Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Weird behaviour on pandas dataframe

My problem is that I have two dataframes, and when I modify one of them the other one changes too. Why could this be?

>untokenized_tweet_tp

                                                     text  ...       screenName
0       [month, open, #postdoc, position, chemical, ch...  ...        VRiffault
1       [hardworking, biofuel, producers, iowa, state,...  ...  LindaWa53201017
3       [today, time, imperative, resort, alternate, s...  ...        ROBRAIPUR
4       [special, gaetanos, beach, club, bell, choosin...  ...    buffbiodiesel
7       [stena, bulk, introduce, low, carbon, shipping...  ...      NPortuarias
                                                   ...  ...              ...
130060  [reseter, elite, vegan, make, unacceptable, ea...  ...      Randy_Anglo
130171  [solar, wind, destroy, supply, limited, output...  ...  RealRichardBail
130331  [renewable, energy, defined, wood, wood, waste...  ...      PaulSchmehl
130375                     [guess, aiding, wood, passion]  ...     GraceIrene21
130384  [homogenous, white, state, diversity, propagan...  ...      Randy_Anglo
[52411 rows x 3 columns]
>for i in tweet_tp.index.values:
...     tweet_tp.text[i] = TreebankWordDetokenizer().detokenize(tweet_tp.text[i])
... 
>untokenized_tweet_tp

                                                     text  ...       screenName
0       month open #postdoc position chemical characte...  ...        VRiffault
1       hardworking biofuel producers iowa state worki...  ...  LindaWa53201017
3       today time imperative resort alternate sources...  ...        ROBRAIPUR
4       special gaetanos beach club bell choosing #rec...  ...    buffbiodiesel
7        stena bulk introduce low carbon shipping options  ...      NPortuarias
                                                   ...  ...              ...
130060  reseter elite vegan make unacceptable eat meat...  ...      Randy_Anglo
130171  solar wind destroy supply limited output backe...  ...  RealRichardBail
130331  renewable energy defined wood wood waste munic...  ...      PaulSchmehl
130375                          guess aiding wood passion  ...     GraceIrene21
130384  homogenous white state diversity propaganda wi...  ...      Randy_Anglo
[52411 rows x 3 columns]

Notice I never mentioned untokenized_tweet_tp inside the for loop.

>type(tweet_tp)
<class 'pandas.core.frame.DataFrame'>
>type(untokenized_tweet_tp)
<class 'pandas.core.frame.DataFrame'>

untokenized_tweet_tp is first assigned like this: untokenized_tweet_tp = tweet_tp



1 Reply

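untokenized_tweet_tp = tweet_tp

This is the key. Plain assignment does not copy the dataframe; it only gives the same object a second name.

If you do not want changes to tweet_tp to affect untokenized_tweet_tp, then do

untokenized_tweet_tp = tweet_tp.copy()

Otherwise, any change you make through one name will show up under the other.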
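
To see the difference between a second name and a copy, here is a minimal sketch. The small dataframe is a made-up stand-in for tweet_tp, and the names alias and snapshot are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Made-up stand-in for tweet_tp from the question
tweet_tp = pd.DataFrame({
    "text": [["month", "open", "position"], ["guess", "aiding"]],
    "screenName": ["VRiffault", "GraceIrene21"],
})

alias = tweet_tp            # second name bound to the same DataFrame object
snapshot = tweet_tp.copy()  # separate DataFrame with its own data

# Modify tweet_tp only
tweet_tp.loc[0, "text"] = "month open position"

print(alias is tweet_tp)        # True: both names refer to one object
print(alias.loc[0, "text"])     # shows the change made through tweet_tp
print(snapshot.loc[0, "text"])  # still the original list of tokens
```

Checking `a is b` is a quick way to find out whether two names refer to the same dataframe.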

The question "why should I make a copy of a data frame in pandas" should be a good conceptual reference.
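
Applied to the loop in the question, one possible way to keep the tokenized version around is to copy first and then replace the column in one step. This is only a sketch under assumptions: the data is made up, tokenized_backup is a hypothetical name, and " ".join stands in for TreebankWordDetokenizer().detokenize so the example runs without nltk.

```python
import pandas as pd

# Made-up stand-in for tweet_tp from the question
tweet_tp = pd.DataFrame({
    "text": [["stena", "bulk", "introduce"],
             ["guess", "aiding", "wood", "passion"]],
    "screenName": ["NPortuarias", "GraceIrene21"],
})

# Copy BEFORE modifying, so the token lists are preserved
# (tokenized_backup is a hypothetical name)
tokenized_backup = tweet_tp.copy()

# Assumption: " ".join is a simple stand-in for
# TreebankWordDetokenizer().detokenize from the question
tweet_tp["text"] = tweet_tp["text"].apply(" ".join)

print(tweet_tp["text"].tolist())          # detokenized strings
print(tokenized_backup["text"].tolist())  # original token lists
```

Assigning the whole column at once also avoids the chained indexing in tweet_tp.text[i] = ..., which pandas may warn about.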

