Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


dataframe - extract new columns and fill values based on categorical values in a Python data frame

I have a data frame where one column holds categorical strings and the next one holds the corresponding values:

import pandas as pd

# Three attribute columns, a status label, and the value recorded for that status
df = pd.DataFrame([['a', 'b', 'c', 'buy', 5],
                   ['f', 'b', 'a', 'buy', 2],
                   ['a', 'b', 'c', 'sold', 6],
                   ['a', 'b', 'f', 'buy', 4],
                   ['a', 'b', 'c', 'returned', 'yes']],
                  columns=['attr1', 'attr2', 'attr3', 'status', 'value'])

[Image: initial df, with too many duplicated rows]

I want to create one new column for each distinct value in df.status and fill the missing entries with np.nan. This requires a pivot on multiple index columns:

[Image: result df after pivoting on multiple index columns]

I am looking for an efficient solution that works for large data frames.

Question from: https://stackoverflow.com/questions/65873915/extract-new-columns-and-fill-values-based-on-categorical-values-data-frame-in-py

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Reply


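Use DataFrame.pivot with the three attribute columns as the index, then drop the leftover column-axis name and turn the index back into ordinary columns: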

In [255]: df.pivot(index=['attr1', 'attr2', 'attr3'],columns='status', values='value').rename_axis(None, axis=1).reset_index()
Out[255]: 
  attr1 attr2 attr3 buy returned sold
0     a     b     c   5      yes    6
1     a     b     f   4      NaN  NaN
2     f     b     a   2      NaN  NaN
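A self-contained sketch of the same pivot, assuming pandas 1.1 or later (the release in which pivot started accepting a list for index). It also covers the duplicated rows mentioned in the question: pivot raises a ValueError when the same (attr1, attr2, attr3, status) combination appears more than once, whereas pivot_table with aggfunc='first' keeps the first value. The extra row with value 7 in `dup` is hypothetical test data, not part of the question.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 'c', 'buy', 5],
     ['f', 'b', 'a', 'buy', 2],
     ['a', 'b', 'c', 'sold', 6],
     ['a', 'b', 'f', 'buy', 4],
     ['a', 'b', 'c', 'returned', 'yes']],
    columns=['attr1', 'attr2', 'attr3', 'status', 'value'],
)
keys = ['attr1', 'attr2', 'attr3']

# One row per attribute combination, one column per status value;
# combinations that never occur are filled with NaN.
wide = (
    df.pivot(index=keys, columns='status', values='value')
      .rename_axis(None, axis=1)  # remove the 'status' name from the column axis
      .reset_index()              # move attr1..attr3 back into regular columns
)

# Hypothetical duplicate: a second 'buy' entry for (a, b, c).
dup = pd.concat([df, df.iloc[[0]].assign(value=7)], ignore_index=True)

# pivot refuses duplicate (keys, status) pairs...
try:
    dup.pivot(index=keys, columns='status', values='value')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# ...while pivot_table aggregates them; 'first' keeps the earliest row's value.
wide_dup = (
    dup.pivot_table(index=keys, columns='status', values='value', aggfunc='first')
       .rename_axis(None, axis=1)
       .reset_index()
)

print(wide)
print(wide_dup)
```

Both calls are vectorized reshapes, so they should scale to large frames much better than a row-by-row loop. The aggfunc ('first', 'last', 'sum', and so on) should be chosen according to how repeated entries ought to be combined.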


...