Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
256 views
in Technique[技术] by (71.8m points)

R, dplyr: Using dynamic variables to access a table name

I'm trying to use dynamic variables in a for loop to access a tablename. Other questions on SO (eg here, here and here) seem to be about using dynamic variables to access a column name. I'm using R v4.0.3 and dplyr v1.0.2.

Basically, I'm importing from a .sav (SPSS) file, and seeking to split the 400+ columns into smaller dataframes holding the info for each survey question. That part works, but I then want to do things like add a mean column for each new dataframe. I'm currently trying to do that in the segmentation for loop, but I can't get it to work. (I'd also be happy to do it separately for each new dataframe in another for loop or a list or something, but I can't see how that would work either if I can't get the other one to work!)

Simplifying somewhat, the columns in the original file are named as QX.Y_Z, where Z are items within question Y in the X block.

Some dummy data, setting up a (sav-type) dataframe of 2 questions, each with two items:

    mydata=tibble(Q6.1_1_1=as.numeric(c(2, 1, 3, 1, 2, 3, 1, 3, 2, 1, 1, 1, 2, 2)),
              Q6.1_1_2=as.numeric(c(1, 3, 1, 1, 1, 2, 3, 3, 1, 3, 1, 1, 1, 2)),
              Q7.1_1_1=as.numeric(c(1, 2, 1, 2, 1, 3, 3, 1, 2, 3, 2, 1, 3, 2)),
              Q7.1_1_2=as.numeric(c(3, 1, 3, 1, 2, 1, 3, 2, 3, 1, 3, 1, 1, 3)),
              )
    var_label(mydata$Q6.1_1_1)<-"Rate your effort - before."
    var_label(mydata$Q6.1_1_2)<-"Rate your effort - before." 
    var_label(mydata$Q7.1_1_1)<-"Rate your enthusiasm - before." 
    var_label(mydata$Q7.1_1_2)<-"Rate your enthusiasm - after." 
    val_labels(mydata$Q6.1_1_1)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q6.1_1_2)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q7.1_1_1)<-c(Low=1, Medium=2, High=3)
    val_labels(mydata$Q7.1_1_2)<-c(Low=1, Medium=2, High=3)
mydata
# A tibble: 14 x 4
     Q6.1_1_1   Q6.1_1_2   Q7.1_1_1   Q7.1_1_2
    <dbl+lbl>  <dbl+lbl>  <dbl+lbl>  <dbl+lbl>
 1 2 [Medium] 1 [Low]    1 [Low]    3 [High]  
 2 1 [Low]    3 [High]   2 [Medium] 1 [Low]   
 3 3 [High]   1 [Low]    1 [Low]    3 [High]  
 4 1 [Low]    1 [Low]    2 [Medium] 1 [Low]   
 5 2 [Medium] 1 [Low]    1 [Low]    2 [Medium]
 6 3 [High]   2 [Medium] 3 [High]   1 [Low]   
 7 1 [Low]    3 [High]   3 [High]   3 [High]  
 8 3 [High]   3 [High]   1 [Low]    2 [Medium]
 9 2 [Medium] 1 [Low]    2 [Medium] 3 [High]  
10 1 [Low]    3 [High]   3 [High]   1 [Low]   
11 1 [Low]    1 [Low]    2 [Medium] 3 [High]  
12 1 [Low]    1 [Low]    1 [Low]    1 [Low]   
13 2 [Medium] 1 [Low]    3 [High]   1 [Low]   
14 2 [Medium] 2 [Medium] 2 [Medium] 3 [High] 

Remove the item number from the question string:

varlist<-mydata %>% 
  colnames() %>% 
  as_tibble() %>% 
  separate(value, "qno", sep="_", extra = "drop", fill="right") %>%
  unique() %>% 
  pull()
> varlist
[1] "Q6.1" "Q7.1"

Generate the subtables:

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
} 

This gives me subtables called tableQ6.1 and tableQ7.1. So far, so good.

However, when I try to add a mean column (giving the mean of each row) for each of these subtables as they are generated, I can't find a way to tell mutate() to use the dynamic name of the table. These are a couple of the options I've tried, but all I get (with these and a LOT more) is errors, so I must be missing something obvious:

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
  tabname<-sym(paste0("table", v))
  mutate({{tabname}}, mean=rowMeans(across(where(is.numeric)), na.rm = FALSE))
}

for (v in varlist) {
  assign(paste0("table", v), select(mydata, matches(v)))
  tabname<-"table{v}" %>%
    mutate("mean{v}":=rowMeans(across(where(is.numeric)), na.rm = FALSE))
}

Any guidance (including wider comments on whether this is the best approach) would be welcome!

question from:https://stackoverflow.com/questions/65644014/r-dplyr-using-dynamic-variables-to-access-a-table-name

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could use split.default to split dataframe with similar column names and then take row wise mean of each subset. This would avoid you to create intermediate dataframes and variables.

sapply(split.default(mydata, sub('\..*', '', names(mydata))), rowMeans)

#       Q6  Q7
# [1,] 1.5 2.0
# [2,] 2.0 1.5
# [3,] 2.0 2.0
# [4,] 1.0 1.5
# [5,] 1.5 1.5
# [6,] 2.5 2.0
# [7,] 2.0 3.0
# [8,] 3.0 1.5
# [9,] 1.5 2.5
#[10,] 2.0 2.0
#[11,] 1.0 2.5
#[12,] 1.0 1.0
#[13,] 1.5 2.0
#[14,] 2.0 2.5

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...