I have a data frame with 100 columns and 2 million rows. Three of the columns are year, compound_id, and lt_rto. Here:
length(unique(year))
30
length(unique(compound_id))
642
What I want to do is create a new column named avg_rto that holds, for each year and each compound_id, the mean of the lowest 12% of lt_rto values. For example, for year 2001 and compound_id xyz, it should find all the values of lt_rto that fall in the lowest 12% and calculate their mean. This mean should then appear in the rows where year == 2001 & compound_id == "xyz".
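To make the target concrete, here is a minimal base R sketch of the intended result on toy data. The helper name low_mean and the toy values are made up for illustration. It assumes "lowest 12%" means the non-zero values at or below the 0.12 quantile of the group, which is only one possible reading.

```r
# Toy data: two (year, compound_id) groups, each containing one zero.
dt <- data.frame(
  year        = c(rep(2001, 10), rep(2002, 10)),
  compound_id = c(rep("xyz", 10), rep("abc", 10)),
  lt_rto      = c(0, 1:9, 0, 11:19)
)

# Hypothetical helper: mean of the non-zero, non-missing values that lie
# at or below the p-th quantile (p = 0.12 by default).
low_mean <- function(x, p = 0.12) {
  x <- x[!is.na(x) & x != 0]
  if (length(x) == 0) return(NA_real_)  # empty group: nothing to average
  cutoff <- quantile(x, probs = p, names = FALSE)
  mean(x[x <= cutoff])
}

# ave() applies the helper within each year/compound_id combination and
# repeats that group's result on every row of the group.
dt$avg_rto <- ave(dt$lt_rto, dt$year, dt$compound_id, FUN = low_mean)
```

In this toy data every 2001/xyz row gets avg_rto = 1 and every 2002/abc row gets avg_rto = 11, because only the smallest non-zero value in each group falls at or below the 12% cutoff.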
The code I came up with is:
dt <- dt %>% group_by(year, compound_id) %>%
mutate( avg_rto = mean( dt[['lt_rto']] < quantile(fun.zero.omit(dt[['lt_rto']]),
probs = .88, na.rm = TRUE ) ))
Note: I also intend to omit zero values when calculating the lower 12% value.
The above code gives me the same value for all observations, and it also takes a long time to run.
My problem is that I cannot figure out what is wrong with the code or how to reduce the run time.
Thank you for your help.
question from:
https://stackoverflow.com/questions/65869583/creating-new-column-while-using-group-by-quantile-and-other-functions-takes-lon