Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Assign unique identifier for dataframe rows based on dataframe with preassigned unique identifier

I have a dataframe with a unique identifier assigned based on three columns, i.e. [col2, col3, col4].

Dataframe1:

col1      col2     col3     col4      col5         unique_id
1         abc       bcv      zxc      www.com        8
2         bcd       qwe      rty      www.@com       12
3         klp       oiu      ytr      www.io         15
4         zxc       qwe      rty      www.com        6

After data preprocessing, I will import Dataframe_2, which has the same columns as shown above but no unique_id. Each row of Dataframe_2 must be assigned a unique identifier based on col2, col3 and col4, looked up from Dataframe1.

If Dataframe_2 has a new row that is not present in Dataframe1, that row should be assigned a new identifier.

Dataframe_2:

col1      col2     col3     col4      col5         
1         bcd       qwe      rty      www.@com              
2         zxc       qwe      rty      www.com
3         abc       bcv      zxc      www.com 
4         kph       hir      mat      www.com            

Expected Dataframe_2:

col1      col2     col3     col4      col5         unique_id        
1         bcd       qwe      rty      www.@com        12     
2         zxc       qwe      rty      www.com         6
3         abc       bcv      zxc      www.com         8 
4         kph       hir      mat      www.com         35

Since Row4 is not present in Dataframe1, a new unique identifier is assigned.
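
For anyone who wants to reproduce the example, the two frames above could be built roughly like this. It is only a sketch: the variable names df1 and df2 are assumptions (the question calls them Dataframe1 and Dataframe_2), and the values are copied from the tables above.

```python
import pandas as pd

# Dataframe1: rows that already carry a unique_id
df1 = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["abc", "bcd", "klp", "zxc"],
    "col3": ["bcv", "qwe", "oiu", "qwe"],
    "col4": ["zxc", "rty", "ytr", "rty"],
    "col5": ["www.com", "www.@com", "www.io", "www.com"],
    "unique_id": [8, 12, 15, 6],
})

# Dataframe_2: same columns, no unique_id yet
df2 = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["bcd", "zxc", "abc", "kph"],
    "col3": ["qwe", "qwe", "bcv", "hir"],
    "col4": ["rty", "rty", "zxc", "mat"],
    "col5": ["www.@com", "www.com", "www.com", "www.com"],
})
```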

question from:https://stackoverflow.com/questions/65914677/assign-unique-identifier-for-dataframe-rows-based-on-dataframe-with-preassigned


1 Reply

# key columns that identify a row (the question matches on col2, col3, col4)
keys = ['col2', 'col3', 'col4']

# assign the old unique_id: left join against df1, unmatched rows get NaN
df2n = df2.join(df1.set_index(keys)[['unique_id']], on=keys, how='left')

# assign new unique_id values, starting at max df1.unique_id + 1
id_max = df1.unique_id.max() + 1
cond = df2n['unique_id'].isnull()
null_num = cond.sum()

df2n.loc[cond, 'unique_id'] = range(id_max, id_max + null_num)
# the NaN placeholders made the column float, so cast back to int
df2n['unique_id'] = df2n['unique_id'].astype(int)

print(df2n)

      col1 col2 col3 col4      col5  unique_id
    0     1  bcd  qwe  rty  www.@com         12
    1     2  zxc  qwe  rty   www.com          6
    2     3  abc  bcv  zxc   www.com          8
    3     4  kph  hir  mat   www.com         16
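
One thing to watch with the join above: if df1 holds several rows with the same key, a left join will duplicate the matching rows of df2. A minimal sketch of the same idea wrapped in a function, using merge and dropping duplicate keys first, might look like the following. The function name assign_unique_id and its keys parameter are hypothetical (they are not from the original answer), and the sketch assumes df1 is non-empty and its unique_id column holds integers.

```python
import pandas as pd


def assign_unique_id(df1, df2, keys=("col2", "col3", "col4")):
    """Copy unique_id from df1 onto df2 by key; number unmatched rows after df1's max.

    Hypothetical helper for illustration; assumes df1 is non-empty.
    """
    keys = list(keys)

    # Keep one id per key combination so the merge cannot multiply rows of df2
    lookup = df1[keys + ["unique_id"]].drop_duplicates(subset=keys)
    out = df2.merge(lookup, on=keys, how="left")

    # Rows with no match in df1 come back with NaN in unique_id
    missing = out["unique_id"].isna()
    next_id = int(df1["unique_id"].max()) + 1
    n_missing = int(missing.sum())

    # Build the new ids on the index of the unmatched rows, then fill the gaps
    new_ids = pd.Series(range(next_id, next_id + n_missing), index=out.index[missing])
    out["unique_id"] = out["unique_id"].fillna(new_ids).astype(int)
    return out
```

With frames shaped like the ones in the question, `df2n = assign_unique_id(df1, df2)` should give the same ids as the join-based version. New ids simply continue from the largest existing one, so they are unique but not guaranteed to match any particular numbering scheme.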
