Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
379 views
in Technique[技术] by (71.8m points)

python - How to explode a list inside a Dataframe cell into separate rows

I'm looking to turn a pandas cell containing a list into rows for each of those values.

So, take this:

enter image description here

If I'd like to unpack and stack the values in the nearest_neighbors column so that each value would be a row within each opponent index, how would I best go about this? Are there pandas methods that are meant for operations like this?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In the code below, I first reset the index to make the row iteration easier.

I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of the columns. This nested list will ultimately be concatenated to create the desired DataFrame.

I use a lambda function together with list iteration to create a row for each element of the nearest_neighbors paired with the relevant name and opponent.

Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).

df = (pd.DataFrame({'name': ['A.J. Price'] * 3, 
                    'opponent': ['76ers', 'blazers', 'bobcats'], 
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))

>>> df
                                                    nearest_neighbors
name       opponent                                                  
A.J. Price 76ers     [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           blazers   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           bobcats   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]

df.reset_index(inplace=True)
rows = []
_ = df.apply(lambda row: [rows.append([row['name'], row['opponent'], nn]) 
                         for nn in row.nearest_neighbors], axis=1)
df_new = pd.DataFrame(rows, columns=df.columns).set_index(['name', 'opponent'])

>>> df_new
                    nearest_neighbors
name       opponent                  
A.J. Price 76ers          Zach LaVine
           76ers           Jeremy Lin
           76ers        Nate Robinson
           76ers                Isaia
           blazers        Zach LaVine
           blazers         Jeremy Lin
           blazers      Nate Robinson
           blazers              Isaia
           bobcats        Zach LaVine
           bobcats         Jeremy Lin
           bobcats      Nate Robinson
           bobcats              Isaia

EDIT JUNE 2017

An alternative method is as follows:

>>> (pd.melt(df.nearest_neighbors.apply(pd.Series).reset_index(), 
             id_vars=['name', 'opponent'],
             value_name='nearest_neighbors')
     .set_index(['name', 'opponent'])
     .drop('variable', axis=1)
     .dropna()
     .sort_index()
     )

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...