Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Measuring average cosine similarity between the groups

I have the following data frame:

Group        Vector
1            [1 1 0 1 0 0]
1            [1 0 0 1 0 0]
1            [1 0 0 1 1 1]
1            [0 0 0 1 0 1]
2            [0 0 0 1 0 1]
2            [0 0 0 1 0 1]
2            [0 1 1 1 0 1]
2            [1 1 0 0 0 1]

How can I calculate the average cosine similarity within each group? This is the expected outcome (note: the numbers below are made up, only to show the layout):

Group        Vector            Average_Similarity
1            [1 1 0 1 0 0]      0.34
1            [1 0 0 1 0 0]      0.34
1            [1 0 0 1 1 1]      0.34
1            [0 0 0 1 0 1]      0.34
2            [0 0 0 1 0 1]      0.48
2            [0 0 0 1 0 1]      0.48
2            [0 1 1 1 0 1]      0.48
2            [1 1 0 0 0 1]      0.48

Question from: https://stackoverflow.com/questions/65908200/measuring-average-cosine-similarity-between-the-groups
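
For reference, "average cosine similarity within a group" is read here as the mean over all distinct pairs of vectors in the group, which is the set of pairs the pdist-based answer below averages over. For a group G holding n vectors v_1, ..., v_n:

```latex
\cos(\mathbf{v}_i, \mathbf{v}_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert},
\qquad
\bar{s}_G = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \cos(\mathbf{v}_i, \mathbf{v}_j)
          = 1 - \bar{d}_G,
\qquad
\bar{d}_G = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \bigl(1 - \cos(\mathbf{v}_i, \mathbf{v}_j)\bigr)
```

For example, the first two rows of group 1, [1 1 0 1 0 0] and [1 0 0 1 0 0], have dot product 2 and norms sqrt(3) and sqrt(2), so their cosine similarity is 2/sqrt(6), about 0.816. Group 1 has n = 4 vectors, so its average is taken over 6 such pairs.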

He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Reply


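Suppose the data from your example is read like this (assuming the clipboard holds the table with | as the column separator and each vector written as a comma-separated list, e.g. 1|[1, 1, 0, 1, 0, 0], because literal_eval cannot parse the space-separated [1 1 0 1 0 0] form):

from ast import literal_eval
import pandas as pd
# literal_eval converts each "[1, 1, 0, ...]" string into a Python list
df = pd.read_clipboard(sep="|", converters={"Vector": literal_eval})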
df
   Group              Vector
0      1  [1, 1, 0, 1, 0, 0]
1      1  [1, 0, 0, 1, 0, 0]
2      1  [1, 0, 0, 1, 1, 1]
3      1  [0, 0, 0, 1, 0, 1]
4      2  [0, 0, 0, 1, 0, 1]
5      2  [0, 0, 0, 1, 0, 1]
6      2  [0, 1, 1, 1, 0, 1]
7      2  [1, 1, 0, 0, 0, 1]

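Then compute the pairwise cosine distances within each group with scipy.spatial.distance.pdist. Note that metric="cosine" returns cosine distance, which is 1 minus cosine similarity, so the mean distance has to be subtracted from 1 to get the average similarity:

from scipy.spatial.distance import pdist

# pdist returns each pair's cosine distance once (condensed form, no self-pairs);
# 1 - mean distance equals the mean pairwise cosine similarity
df["Average_Similarity"] = df.groupby("Group")["Vector"].transform(
    lambda group: 1 - pdist(group.to_list(), metric="cosine").mean()
)
df

   Group              Vector  Average_Similarity
0      1  [1, 1, 0, 1, 0, 0]            0.619385
1      1  [1, 0, 0, 1, 0, 0]            0.619385
2      1  [1, 0, 0, 1, 1, 1]            0.619385
3      1  [0, 0, 0, 1, 0, 1]            0.619385
4      2  [0, 0, 0, 1, 0, 1]            0.634677
5      2  [0, 0, 0, 1, 0, 1]            0.634677
6      2  [0, 1, 1, 1, 0, 1]            0.634677
7      2  [1, 1, 0, 0, 0, 1]            0.634677

Without the "1 -", the same code returns the average cosine distances instead: 0.380615 for group 1 and 0.365323 for group 2.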
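
As a cross-check that does not depend on SciPy, here is a minimal NumPy sketch of the same quantity: scale each row to unit length, take the Gram matrix, and average its strict upper triangle so every pair is counted once and the self-similarities (always 1) are left out. The helper name mean_pairwise_cosine is hypothetical, the example DataFrame is typed in directly, and rows of all zeros are assumed not to occur (their norm is 0, so they would turn into NaN):

```python
import numpy as np
import pandas as pd

# Example data from the question, typed in directly
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2],
    "Vector": [
        [1, 1, 0, 1, 0, 0],
        [1, 0, 0, 1, 0, 0],
        [1, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 1, 1, 1, 0, 1],
        [1, 1, 0, 0, 0, 1],
    ],
})


def mean_pairwise_cosine(vectors):
    """Hypothetical helper: mean cosine similarity over all pairs i < j."""
    X = np.asarray(list(vectors), dtype=float)
    n = len(X)
    if n < 2:
        return np.nan  # a single vector has no pair to compare with
    # Unit-length rows; an all-zero row has norm 0 and would become NaN here
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T  # Gram matrix = pairwise cosine similarities
    i, j = np.triu_indices(n, k=1)  # strict upper triangle: each pair once
    return sims[i, j].mean()


# transform broadcasts the scalar result back to every row of the group
df["Average_Similarity"] = df.groupby("Group")["Vector"].transform(mean_pairwise_cosine)
print(df)
```

On the example data this should give 0.619385 for group 1 and 0.634677 for group 2, i.e. 1 minus the mean cosine distances (0.380615 and 0.365323) that pdist(..., metric="cosine") reports. Using the strict upper triangle (k=1) rather than the whole matrix matters: including the diagonal would add n ones to the average and inflate it.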
