Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
275 views
in Technique[技术] by (71.8m points)

r - Rearranging Three Similar Tables Based on ID and Stacked Values

Hello I have three tables and there is a process I need to identify to be able to structure these in preparation for a merge. Once I figure this out I feel that it will help me for a long time.

The first table, I have a table which, for each vehicleID involved in a traffic crash, inventories the damages (fender, rear end, etc) sustained to said vehicleID in a traffic crash. I would like to make, for each VEHICLEID, three new IMPACTCODE variables which display the up to three IMPACTTYPE codes (format char) listed in the table below:

itemindex    damageid    impacttype    vehicleid
1            2433        1             VEH1
2            2434        11            VEH1
3            2435        13            VEH1
1            2521        10            VEH012
2            2522        8             VEH012

I would like help to structure this into the following, where there is one line for each vehicleID:

vehicleid    impacttype_1     impacttype_2    impacttype_3    
VEH1         1                11              13
VEH012       10               8               NA

For the second table, I have a list of personID's involved in a traffic crash, and several contributing circumstance codes - which depict whether that personID was reported to be texting, speeding, normal, or other contributing circumstance to said crash. The amount of codes that can be assigned can go up to an unlimited amount, but I want to keep just the first four. I would like to do something similar as the first, but in case of this table, there are a few more variables I am dealing with. So I would like to learn how the two processes differ. Note that in the case of PERSON_06 the codes do not always start in the first position:

itemindex    circumstancecode    circumstanceid    circumstancetype    personid    reportnumber    
1            18                  4231703           Person              PERSON_01   REPORT1
2            17                  4231704           Person              PERSON_01   REPORT1
3            40.88               4231705           Person              PERSON_01   REPORT1
4            70.88               4231706           Person              PERSON_01   REPORT1
1            0                   4231707           Person              PERSON_02   REPORT1
1            61                  4231708           Person              PERSON_03   REPORT2
1            17                  4231709           Person              PERSON_04   REPORT3
1            40.88               4231710           Person              PERSON_05   REPORT4
1            0                   4231711           Person              PERSON_06   REPORT4
2            0                   4231712           Person              PERSON_06   REPORT4
3            55.88               4231713           Person              PERSON_06   REPORT4

I would like to structure this table like the following (code 1 thru 4 is based on circumstancecode) Note that for situations where there is no code to fill the code1-4 variables, I place an NA:

personid    reportnumber     circumstancetype    code_1    code1    code_3    code_4
PERSON_01   REPORT1          PERSON              18        17       40.88     70.88
PERSON_02   REPORT1          PERSON              0         NA       NA        NA
PERSON_03   REPORT2          PERSON              61        NA       NA        NA
PERSON_04   REPORT3          PERSON              17        NA       NA        NA
PERSON_05   REPORT4          PERSON              40.88     NA       NA        NA
PERSON_06   REPORT4          PERSON              0         0        55.88     NA

I would like to keep all the code_ values as character because some values have letters in them, and I want to keep the exact formatting that is in the original data (so 40.88 is a char).

For the final table, there is a little bit different of a need. I would like to transform this table, which lists personID's who witnessed the car crash.

itemindex    personid    reportnumber
1            PERSON_55   REPORT_1
2            PERSON_56   REPORT_1
1            PERSON_44   REPORT_2
1            PERSON_32   REPORT_3
2            PERSON_34   REPORT_3
3            PERSON_35   REPORT_3
4            PERSON_36   REPORT_3
1            PERSON_21   REPORT_4
2            PERSON_22   REPORT_4
3            PERSON_23   REPORT_4

I would like each line to be a single crash reportnumber up to three variables as to list up to three witnesses:

reportnumber    witness1    witness2    witness3
REPORT_1        PERSON_55   PERSON_56   NA
REPORT_2        PERSON_44   NA          NA
REPORT_3        PERSON_32   PERSON_34   PERSON_35
REPORT_4        PERSON_21   PERSON_23   NA

Thank you so much for your help!!!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your 3 examples are similar. I will provide some general explanation, and then the code for each example.

Using tidyverse, you can put your data into wide format with pivot_wider. In this case, you will first want to enumerate rows within each group, which can be done from row_number(). Then, with pivot_wider, you can specify the id_cols which is vehicleid. The column names are derived from the row numbers, and adding the prefix "impacttype_" in front of them. The values are taken from impacttype.

library(tidyverse)

df1 %>%
  group_by(vehicleid) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = vehicleid, names_from = rn, values_from = impacttype, names_prefix = "impacttype_")

Output

  vehicleid impacttype_1 impacttype_2 impacttype_3
  <chr>            <int>        <int>        <int>
1 VEH1                 1           11           13
2 VEH012              10            8           NA

In the second example, you can use slice_head to get the first 4 rows in each group. Otherwise, the pivot_wider is relatively similar.

df2 %>%
  group_by(personid) %>%
  slice_head(n = 4) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(id_cols = c(personid, reportnumber, circumstancetype), names_from = rn, values_from = circumstancecode, names_prefix = "code_")

Output

  personid  reportnumber circumstancetype code_1 code_2 code_3 code_4
  <chr>     <chr>        <chr>             <dbl>  <dbl>  <dbl>  <dbl>
1 PERSON_01 REPORT1      Person             18       17   40.9   70.9
2 PERSON_02 REPORT1      Person              0       NA   NA     NA  
3 PERSON_03 REPORT2      Person             61       NA   NA     NA  
4 PERSON_04 REPORT3      Person             17       NA   NA     NA  
5 PERSON_05 REPORT4      Person             40.9     NA   NA     NA  
6 PERSON_06 REPORT4      Person              0        0   55.9   NA 

In the third example, you already have an index to use for pivot_wider.

df3 %>%
  pivot_wider(id_cols = reportnumber, names_from = itemindex, values_from = personid, names_prefix = "witness")

Output

  reportnumber witness1  witness2  witness3  witness4 
  <chr>        <chr>     <chr>     <chr>     <chr>    
1 REPORT_1     PERSON_55 PERSON_56 NA        NA       
2 REPORT_2     PERSON_44 NA        NA        NA       
3 REPORT_3     PERSON_32 PERSON_34 PERSON_35 PERSON_36
4 REPORT_4     PERSON_21 PERSON_22 PERSON_23 NA 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...