Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to check if a value in dataframe satisfies a condition based on all or last few values in the column and replace it?

I want to check whether each value in my dataframe is greater than 1.5 times the median of all previous values (or of the last 10 previous values) and, if so, replace it with that median. I have a huge dataset, so I don't want to use loops.

  df
Out[315]: 
      a
0  15.0
1  16.0
2  13.5
3  14.6
4  15.0
5  26.0
6  12.0
7  28.0
8  12.0
9  29.0

I want the 26 to be replaced by the median of the previous values, and so on. Once a value is replaced, I want the new value to be used the next time the median is calculated. Here is what I have tried (for simplicity I have used a condition of >20 and the mean of the past 2 values). What I actually want is to compare each value to 1.5 * the median of the previous 10 values and, if it is greater, replace it with that median, with the new value being used the next time the median is calculated.

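import numpy as np

df["b"] = df["a"]
# shift(1) so that each row is compared against the mean of the two rows before it
df["b"] = np.where(df["b"] > 20, df["b"].rolling(2).mean().shift(1), df["b"])
df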
Out[88]: 
      a     b
0  15.0  15.0
1  16.0  16.0
2  13.5  13.5
3  14.6  14.6
4  15.0  15.0
5  26.0  14.8
6  12.0  12.0
7  28.0  19.0
8  12.0  12.0
9  29.0  20.0

Here the replaced values are not used to calculate the mean the next time. For example, the last value in df["b"] is 20, which is the mean of 28 and 12. But I want it to be the mean of 19 and 12, because 19 is the replaced value.

question from:https://stackoverflow.com/questions/65713692/how-to-check-if-a-value-in-dataframe-satisfies-a-condition-based-on-all-or-last


1 Reply


You can use rolling with a window of 10 and min_periods=1 to get the median. Shift the result by one row, because only the median of the previous values should be considered, not the current value itself.

temp = df['a'].rolling(10, min_periods=1).median().shift(1)

0     NaN
1    15.0
2    15.5
3    15.0
4    14.8
5    15.0
6    15.0
7    15.0
8    15.0
9    15.0
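
The question also mentions the median of all previous values rather than only the last 10. A minimal sketch of that variant, assuming pandas' expanding window is acceptable here (the names prev_10 and prev_all are illustrative and not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]})

# Median of up to 10 previous values; shift(1) excludes the current row.
prev_10 = df["a"].rolling(10, min_periods=1).median().shift(1)

# Median of all previous values; shift(1) again excludes the current row.
prev_all = df["a"].expanding().median().shift(1)

print(pd.DataFrame({"a": df["a"], "prev_10": prev_10, "prev_all": prev_all}))
```

With only 10 rows the two columns cover the same values; they start to differ once the column is longer than the window.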

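If a value is greater than 1.5 times the median of the previous values, replace it with that median. df['a'] > 1.5 * temp is the boolean mask of the rows where this condition holds. The NaN in row 0 makes the comparison False there, so the first value is never replaced.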

df.loc[df['a'] > 1.5 * temp, 'a'] = temp
df

    a
0   15.0
1   16.0
2   13.5
3   14.6
4   15.0
5   15.0
6   12.0
7   15.0
8   12.0
9   15.0
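
Note that temp is computed once from the original column, so replaced values do not feed into later medians, which is what the question asks for. Because each result then depends on the previous results, one possible approach is a single sequential pass. The sketch below assumes a plain Python loop is acceptable despite the stated preference; replace_with_feedback, window and factor are hypothetical names chosen for illustration.

```python
from collections import deque
from statistics import median

import pandas as pd


def replace_with_feedback(values, window=10, factor=1.5):
    """Replace each value exceeding factor * median of the previous cleaned values."""
    recent = deque(maxlen=window)  # last `window` values after replacement
    cleaned = []
    for value in values:
        if recent:  # the first value has no history and is kept as-is
            med = median(recent)
            if value > factor * med:
                value = med  # the replacement is what later medians will see
        recent.append(value)
        cleaned.append(value)
    return cleaned


df = pd.DataFrame({"a": [15.0, 16.0, 13.5, 14.6, 15.0, 26.0, 12.0, 28.0, 12.0, 29.0]})
df["b"] = replace_with_feedback(df["a"].tolist())
print(df)

# A run of consecutive outliers, where feedback changes the outcome.
burst = [10.0, 10.0, 30.0, 30.0, 30.0, 16.0]
print(replace_with_feedback(burst))
```

On the sample column this gives the same result as the vectorized version, because the outliers are isolated. The two approaches diverge on runs of consecutive outliers such as burst, where unreplaced outliers would otherwise raise the median used for later rows.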
