Currently I am using the following line:
df = df.groupby('i').filter(lambda g: len(g) > 500)
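For anyone reproducing this, here is a minimal sketch of the intended behaviour on a hypothetical toy DataFrame. The column name x and the lowered threshold of 2 (standing in for 500) are illustrative only. groupby().filter keeps every row whose group passes the lambda and drops the rest, preserving the original index and row order.

```python
import pandas as pd

# Hypothetical toy data: group "a" has 3 rows, group "b" has 1 row.
df = pd.DataFrame({"i": ["a", "a", "a", "b"], "x": [1, 2, 3, 4]})

# Keep only rows whose group has more than 2 rows
# (2 stands in for the threshold of 500 used above).
kept = df.groupby("i").filter(lambda g: len(g) > 2)
print(kept)
```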
This works as intended (I have tested it on other DataFrames), except when there is a large number of groups. I am trying to use it with around 50,000 groups, and so far I have never seen the program get past this line.
The longest I have let the program run is a bit under 48 hours.
Edit: The method works fine with a large number of groups, provided the lambda does not remove every group. Decreasing the minimum group length to 250 allowed the program to finish within 30 seconds.
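A commonly used alternative, which is not something tried above, is to build a boolean mask from each row's group size with transform("size"). This avoids calling a Python lambda once per group. The sketch below uses hypothetical synthetic data (50,000 groups of 10 rows, column x invented for illustration) so that no group survives a threshold of 500, mirroring the "every group removed" case. Whether this sidesteps the slowdown on the real data is an assumption, not something verified here.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: 50,000 groups of 10 rows each,
# so no group is large enough to pass a threshold of 500.
n_groups, group_size = 50_000, 10
df = pd.DataFrame({
    "i": np.repeat(np.arange(n_groups), group_size),
    "x": np.arange(n_groups * group_size),
})

# Size of each row's group, aligned to the original rows.
sizes = df.groupby("i")["x"].transform("size")

# Same selection as groupby("i").filter(lambda g: len(g) > 500),
# expressed as a vectorized boolean mask.
result = df[sizes > 500]
print(len(result))
```

Because the mask is computed once for all rows, the cost does not depend on how many groups end up being removed.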