I have a large dataframe with duplicated data; it has three columns: word, emotion, and score.
I want to find the duplicates in the word column and keep only the row with the highest score for each word.
import pandas as pd

df = pd.DataFrame({
'word': ['love', 'sadness', 'love', 'love', 'sadness'],
'emotion': ['trust', 'trust', 'confidente', 'joy', 'sad'],
'score': [0.758, 0.250, 0.828, 0.921, 0.981]
})
df_result = pd.DataFrame({
'word': ['love', 'sadness'],
'emotion': ['joy', 'sad'],
'score': [0.921, 0.981]
})
I tried to drop duplicates, but I couldn't find a way to add a condition so that the kept row is the one with the higher score:
# something like
.drop_duplicates('word') # keep the row with the higher score
.sort_index()
.reset_index(drop=True)
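One way to get the attempt above working, sketched under the assumption that ties are not a concern: `drop_duplicates` always keeps the first (or last) occurrence, so sorting by score beforehand makes "first" mean "highest score".

```python
import pandas as pd

df = pd.DataFrame({
    'word': ['love', 'sadness', 'love', 'love', 'sadness'],
    'emotion': ['trust', 'trust', 'confidente', 'joy', 'sad'],
    'score': [0.758, 0.250, 0.828, 0.921, 0.981]
})

# Sort so the highest score per word comes first, then drop the
# later (lower-scoring) duplicates; restore the original row order.
df_result = (
    df.sort_values('score', ascending=False)
      .drop_duplicates('word')
      .sort_index()
      .reset_index(drop=True)
)
print(df_result)
#       word emotion  score
# 0     love     joy  0.921
# 1  sadness     sad  0.981
```

An equivalent alternative is `df.loc[df.groupby('word')['score'].idxmax()]`, which keeps the row at the index of each group's maximum score.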
There are similar questions on Stack Overflow, but none of them addresses this exact problem.
question from:
https://stackoverflow.com/questions/65946974/how-to-drop-duplicates-in-a-dataframe-and-with-a-condition