Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - How to get unique occurrences of these character strings separated by ";"?

So I have a column with values in this structure:

library(tibble)

df <- tribble(
  ~col,
  "AA_BB;AA_AA;AA_BB",
  "BB_BB;AA_AA",
  "AA_BB",
  "BB_AA;BB_AA;AA_AA;BB_AA"
)

Each row holds items separated by ";". The first row has the items AA_BB, AA_AA and AA_BB. I want the first row to become "AA_BB;AA_AA" and the last row to become "BB_AA;AA_AA".

I thought about using separate, but the result didn't really help me (especially since I don't know in advance how many columns there can be at most).

df %>%
  separate(col, into = c("A", "B", "C", "D"), sep = ";")

Any tips on how to do this?

question from:https://stackoverflow.com/questions/65876198/how-to-get-unique-occurrences-of-these-character-strings-separated-by


1 Reply


We can split each string on ";", keep the unique elements, and paste them back together:

library(dplyr)
library(stringr)
library(purrr)

df %>%
  # split on ";", keep unique items in order of first appearance, then rejoin
  mutate(col = map_chr(strsplit(col, ";"), ~ str_c(unique(.x), collapse = ";")))

Output:

# A tibble: 4 x 1
#  col        
#  <chr>      
#1 AA_BB;AA_AA
#2 BB_BB;AA_AA
#3 AA_BB      
#4 BB_AA;AA_AA
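For comparison, the same split / unique / paste idea can be sketched in base R, without any tidyverse packages. This is only an illustration under a few assumptions: `dedupe_items` is a hypothetical helper name that does not appear in the original answer, and a plain `data.frame` stands in for the tibble from the question.

```r
# Hypothetical helper (not from the original answer): de-duplicate the
# sep-delimited items inside each string, keeping first-appearance order.
dedupe_items <- function(x, sep = ";") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  parts <- strsplit(x, sep, fixed = TRUE)
  # vapply guarantees one character value per input element
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

# Same sample data as in the question, as a base data.frame
df <- data.frame(
  col = c("AA_BB;AA_AA;AA_BB",
          "BB_BB;AA_AA",
          "AA_BB",
          "BB_AA;BB_AA;AA_AA;BB_AA"),
  stringsAsFactors = FALSE
)

df$col <- dedupe_items(df$col)
df$col
# [1] "AA_BB;AA_AA" "BB_BB;AA_AA" "AA_BB"       "BB_AA;AA_AA"
```

Because `unique()` keeps the first occurrence of each value, the order of the remaining items matches the order in the input string.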

Alternatively, split into one item per row with separate_rows, keep the distinct rows, then group by the original row number and paste the items back together:

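library(tidyr)

# dplyr and stringr are assumed to be loaded, as in the first snippet
df %>%
  mutate(rn = row_number()) %>%      # remember the original row of each item
  separate_rows(col, sep = ";") %>%  # one item per row
  distinct() %>%                     # drop repeated (col, rn) pairs
  group_by(rn) %>%
  summarise(col = str_c(col, collapse = ";"), .groups = "drop") %>%
  select(col)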
