
Use cumulative sum to assign a value in python/pyspark

Using Python, I'd like to write some code that classifies every item where the cumulative sum of the Miles column is <= 2.5 as "IN" and the rest as "OUT". Are there any suggestions on where to start?

Example Data set

Rank  Name  Miles
  1   A     0.5  
  2   A     1
  3   B     1
  4   B     1
  5   C     2

Desired Output

Rank  Name  Miles  Assign
  1   A     0.5     IN
  2   A     1       IN
  3   B     1       IN
  4   B     1       OUT
  5   C     2       OUT
Question from: https://stackoverflow.com/questions/65713474/use-cumulative-sum-to-assign-a-value-in-python-pyspark


1 Reply


It looks like you're using Pandas, though I'm not an expert.

If you have a dataframe like this:

   Rank Name  Miles
0     1    A    0.5
1     2    A    1.0
2     3    B    1.0
3     4    B    1.0
4     5    C    2.0

Then you can simply create a new column where the values are based on the cumulative sum of the Miles column:

df['Assign'] = ['IN' if i <= 2.5 else 'OUT' for i in df['Miles'].cumsum()]

Or, I think this is more idiomatic:

df['Assign'] = ['IN' if i else 'OUT' for i in df['Miles'].cumsum() <= 2.5]

Which becomes:

   Rank Name  Miles Assign
0     1    A    0.5     IN
1     2    A    1.0     IN
2     3    B    1.0     IN
3     4    B    1.0    OUT
4     5    C    2.0    OUT
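
If you'd rather avoid the list comprehension, a vectorized alternative is NumPy's where; it uses the same column names and 2.5 threshold as above and produces the same Assign column:

import numpy as np

# Threshold the running total in one vectorized call.
df['Assign'] = np.where(df['Miles'].cumsum() <= 2.5, 'IN', 'OUT')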

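Since the question title also mentions PySpark, here is a minimal sketch of the same idea using a window-function running sum. The SparkSession setup and ordering by Rank are assumptions based on the example data:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Recreate the example data from the question.
sdf = spark.createDataFrame(
    [(1, "A", 0.5), (2, "A", 1.0), (3, "B", 1.0), (4, "B", 1.0), (5, "C", 2.0)],
    ["Rank", "Name", "Miles"],
)

# Running total of Miles in Rank order, then threshold at 2.5.
# Note: an unpartitioned window pulls all rows onto a single partition,
# which is fine for a small example like this one.
w = Window.orderBy("Rank").rowsBetween(Window.unboundedPreceding, Window.currentRow)

sdf = sdf.withColumn(
    "Assign",
    F.when(F.sum("Miles").over(w) <= 2.5, "IN").otherwise("OUT"),
)
sdf.show()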
