Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
272 views
in Technique[技术] by (71.8m points)

python - Pandas group by cumsum length of values does not match length of index

As most of these help questions begin, I'm new to Python and Pandas. I've been learning by doing, especially when I have a particular task to complete. I have searched the help pages and could not find an answer that addressed by specific problem and I could not devise a solution based on answers to similar problems.

I have a data set with 50K+ entries. The general format is:

    code  value
0    101    0.0
1    102    0.0
2    103   23.2
3    104   10.3
4    105    0.2
5    106    0.0
6    107   22.6
7    108    0.0
8    109    0.0
9    110    2.2
10   111    3.8
11   112    0.0

My first task was to segregate consecutive non-zero values. Through trial and error, I managed to condense my script to one line that accomplished this.

df[df['value'] != 0].groupby((df['value'] == 0).cumsum())
for grp, val in df[df['value'] != 0].groupby((df['value'] == 0).cumsum()):
    print(f'[group {grp}]')
    print(val)

The output is:

[group 2]
   code  value
2   103   23.2
3   104   10.3
4   105    0.2
[group 3]
   code  value
6   107   22.6
[group 5]
    code  value
9    110    2.2
10   111    3.8

I have other manipulations and calculations to do on this data set and I think the easiest way to access these data would be to transform the groupby object into a column (if that is even the correct terminology?), like so:

   code  value  group
0   103   23.2      2
1   104   10.3      2
2   105    0.2      2
3   107   22.6      3
4   110    2.2      5
5   111    3.8      5

Obviously, I get a "Length of values does not match length of index" error. I searched the help pages and it seemed that I needed to do some type of reset_index method. I tried various syntax structures and many other coding solutions suggested in other threads the past day and a half without success. I finally decided to give up and ask for help when I returned from a short break and found my cat rolling on the keyboard, adding and deleting gobs of gibberish to the script snippets I had been testing.

If someone would be kind enough to help me with this script--to get the groupby object into a column, I would greatly appreciate it. Thanks.

question from:https://stackoverflow.com/questions/65837063/pandas-group-by-cumsum-length-of-values-does-not-match-length-of-index

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This will give you the groups then drop the zero rows.

df = pd.DataFrame({'code': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
 'value': [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0]})

df['group'] = df.value.eq(0).cumsum()
df = df.loc[df.value.ne(0)]

Output

    code  value  group
2    103   23.2      2
3    104   10.3      2
4    105    0.2      2
6    107   22.6      3
9    110    2.2      5
10   111    3.8      5

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...