Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Access the result of a previous calculation in custom function passed to apply()

I'm working with Pandas in Python and I would like to access the result of the previous calculation when applying a custom function to a series.

Roughly like this:

import pandas

# How can I obtain previous_result?
def foo(value, previous_result = None):

    # On the first iteration there is no previous result
    if previous_result is None:
        previous_result = value

    return value + previous_result

series = pandas.Series([1,2,3])
print(series.apply(foo))

This can also be generalized to "How to pass the n previous results to the function?". I know about series.rolling() but even with rolling I wasn't able to obtain the previous results, only the previous values of the input series.



1 Reply


The most specialized cases of the operation you describe are built in as cummax, cummin, cumprod and cumsum (cumsum, for example, is f(x) = x + f(x-1)).
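As a minimal sketch of that recurrence (the sample values and the helper name running_sum are made up for illustration), cumsum should give the same numbers as a loop that adds each value to the previous result:

```python
import pandas as pd

ser_small = pd.Series([1, 2, 3, 4])  # illustrative data, not from the question

def running_sum(values):
    """Hypothetical helper: f(x) = x + f(x-1), written as an explicit loop."""
    out, prev = [], 0
    for v in values:
        prev = v + prev      # current value plus the previous result
        out.append(prev)
    return out

cumulative = ser_small.cumsum()
print(cumulative.tolist())          # [1, 3, 6, 10]
print(running_sum(ser_small))       # [1, 3, 6, 10]
print(ser_small.cummax().tolist())  # [1, 2, 3, 4]
```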

More functionality is available on expanding objects: mean, standard deviation, variance, kurtosis, skewness, correlation, etc.
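For instance, a small sketch with made-up data: each expanding statistic is computed over all values seen so far, so row i summarizes rows 0 through i.

```python
import pandas as pd

values = pd.Series([2.0, 4.0, 6.0, 8.0])  # illustrative data

# Mean of all values up to and including the current row
expanding_mean = values.expanding().mean()

# Sample standard deviation over the same growing window;
# the first row is NaN because one observation has no sample std
expanding_std = values.expanding().std()

print(expanding_mean)
print(expanding_std)
```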

And for the most general case, you can use expanding().apply() with a custom function. For example,

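from functools import reduce  # reduce is not a builtin in Python 3.x

# ser is any numeric Series; each window r holds all values up to the current row
ser.expanding().apply(lambda r: reduce(lambda prev, value: prev + 2*value, r))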

is equivalent to f(x) = 2x + f(x-1), with the first value passed through unchanged because reduce is called without an initial value.
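A quick way to check this on a tiny Series (the data and the names demo, via_expanding and expected are illustrative only) is to compare the expanding version against the recurrence written as a plain loop:

```python
from functools import reduce
import pandas as pd

demo = pd.Series([1.0, 2.0, 3.0])  # illustrative data

via_expanding = demo.expanding().apply(
    lambda r: reduce(lambda prev, value: prev + 2 * value, r)
)

# The same recurrence as an explicit loop:
# the first value is kept as is, then f(x) = 2x + f(x-1)
expected = [float(demo.iloc[0])]
for value in demo.iloc[1:]:
    expected.append(2 * value + expected[-1])

print(via_expanding.tolist())  # [1.0, 5.0, 11.0]
```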

The built-in methods listed above are optimized and run quite fast, but performance drops when you pass a custom function. For exponential smoothing, pandas starts to outperform a plain loop once the Series has around 1000 elements, while expanding().apply() with reduce performs very poorly:

import numpy as np
import pandas as pd

np.random.seed(0)
ser = pd.Series(70 + 5*np.random.randn(10**4))
ser.tail()
Out: 
9995    60.953592
9996    70.211794
9997    72.584361
9998    69.835397
9999    76.490557
dtype: float64


ser.ewm(alpha=0.1, adjust=False).mean().tail()
Out: 
9995    69.871614
9996    69.905632
9997    70.173505
9998    70.139694
9999    70.774781
dtype: float64

%timeit ser.ewm(alpha=0.1, adjust=False).mean()
1000 loops, best of 3: 779 μs per loop

With loops:

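def exp_smoothing(ser, alpha=0.1):
    # Seed the recursion with the first value (positional access via iloc)
    prev = ser.iloc[0]
    res = [prev]
    for cur in ser.iloc[1:]:
        # New result = weighted mix of the current value and the previous result
        prev = alpha*cur + (1-alpha)*prev
        res.append(prev)
    return pd.Series(res, index=ser.index)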

exp_smoothing(ser).tail()
Out: 
9995    69.871614
9996    69.905632
9997    70.173505
9998    70.139694
9999    70.774781
dtype: float64

%timeit exp_smoothing(ser)
100 loops, best of 3: 3.54 ms per loop

The total time is still in the millisecond range. With expanding().apply(), however:

ser.expanding().apply(lambda r: reduce(lambda p, v: 0.9*p+0.1*v, r)).tail()
Out: 
9995    69.871614
9996    69.905632
9997    70.173505
9998    70.139694
9999    70.774781
dtype: float64

%timeit ser.expanding().apply(lambda r: reduce(lambda p, v: 0.9*p+0.1*v, r))
1 loop, best of 3: 13 s per loop

Methods like cummin and cumsum are optimized: each step needs only the current value of x and the previous value of the function. With a custom function, however, the complexity is O(n**2), because the function receives the whole window at every step. This is mainly because, in general, the previous result and the current value are not always enough to compute the current result. For cumsum, you can take the previous cumsum and add the current value. You cannot do that for, say, the geometric mean. That is why expanding().apply() becomes unusable for even moderately sized Series.
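One rough way to see the quadratic growth is to count how many elements a custom function is handed in total. This is only a sketch: counting_sum and the stats dictionary are hypothetical names, and raw=True is used so that each window arrives as a NumPy array.

```python
import pandas as pd

stats = {"touched": 0}  # running count of elements handed to the custom function

def counting_sum(window):
    """Hypothetical custom function that records how much data it receives."""
    stats["touched"] += len(window)
    return window.sum()

n = 100
pd.Series(range(n), dtype="float64").expanding().apply(counting_sum, raw=True)

# Windows have lengths 1, 2, ..., n, so the total is n * (n + 1) / 2
print(stats["touched"], n * (n + 1) // 2)
```

A cumulative method such as cumsum only needs to look at each of the n elements once, which is where the difference in running time comes from.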

In general, iterating over a Series is not a very expensive operation. Iterating over the rows of a DataFrame is very inefficient because a copy of each row has to be returned, but this is not the case for a Series. Of course you should use vectorized methods when they are available, but when they are not, a for loop is fine for a task like a recursive calculation.
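Applied to the function from the question, such a loop might look like the sketch below. The helper apply_with_history and its parameter n are hypothetical (not part of pandas), and foo is adapted to receive a tuple of earlier results instead of a single previous_result, so the same helper also covers the "n previous results" case.

```python
from collections import deque
import pandas as pd

def apply_with_history(series, func, n=1):
    """Hypothetical helper: call func(value, history) row by row, where
    history is a tuple of up to the n most recent results (oldest first)."""
    history = deque(maxlen=n)   # old results fall off automatically
    results = []
    for value in series:
        result = func(value, tuple(history))
        history.append(result)
        results.append(result)
    return pd.Series(results, index=series.index)

def foo(value, history):
    # On the first iteration there is no previous result
    previous_result = history[-1] if history else value
    return value + previous_result

series = pd.Series([1, 2, 3])
result = apply_with_history(series, foo)
print(result.tolist())    # [2, 4, 7]

# Two previous results: current value plus the sum of up to two earlier results
two_back = apply_with_history(series, lambda v, h: v + sum(h), n=2)
print(two_back.tolist())  # [1, 3, 7]
```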

