
Use cumulative sum to assign a value in python/pyspark

Using Python, I'd like to write some code that classifies every item where the cumulative sum of the Miles column is <= 2.5 as "IN" and the rest as "OUT". Are there any suggestions on where to start?

Example Data set

Rank  Name  Miles
  1   A     0.5  
  2   A     1
  3   B     1
  4   B     1
  5   C     2

Desired Output

Rank  Name  Miles  Assign
  1   A     0.5     IN
  2   A     1       IN
  3   B     1       IN
  4   B     1       OUT
  5   C     2       OUT
Question from: https://stackoverflow.com/questions/65713474/use-cumulative-sum-to-assign-a-value-in-python-pyspark


1 Reply


It looks like you're using Pandas, though I'm not an expert.

If you have a dataframe like this:

   Rank Name  Miles
0     1    A    0.5
1     2    A    1.0
2     3    B    1.0
3     4    B    1.0
4     5    C    2.0

Then you can simply create a new column where the values are based on the cumulative sum of the Miles column:

df['Assign'] = ['IN' if i <= 2.5 else 'OUT' for i in df['Miles'].cumsum()]

Or, I think this is more idiomatic:

df['Assign'] = ['IN' if i else 'OUT' for i in df['Miles'].cumsum() <= 2.5]

Which becomes:

   Rank Name  Miles Assign
0     1    A    0.5     IN
1     2    A    1.0     IN
2     3    B    1.0     IN
3     4    B    1.0    OUT
4     5    C    2.0    OUT
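
If you'd rather avoid the list comprehension, a vectorized alternative is NumPy's where; it uses the same column names and 2.5 threshold as above and produces the same Assign column:

import numpy as np

# Threshold the running total in one vectorized call.
df['Assign'] = np.where(df['Miles'].cumsum() <= 2.5, 'IN', 'OUT')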

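Since the question title also mentions PySpark, here is a minimal sketch of the same idea using a window-function running sum. The SparkSession setup and ordering by Rank are assumptions based on the example data:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Recreate the example data from the question.
sdf = spark.createDataFrame(
    [(1, "A", 0.5), (2, "A", 1.0), (3, "B", 1.0), (4, "B", 1.0), (5, "C", 2.0)],
    ["Rank", "Name", "Miles"],
)

# Running total of Miles in Rank order, then threshold at 2.5.
# Note: an unpartitioned window pulls all rows onto a single partition,
# which is fine for a small example like this one.
w = Window.orderBy("Rank").rowsBetween(Window.unboundedPreceding, Window.currentRow)

sdf = sdf.withColumn(
    "Assign",
    F.when(F.sum("Miles").over(w) <= 2.5, "IN").otherwise("OUT"),
)
sdf.show()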
