Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How do I populate a dataframe in R based on a summary of values in another dataframe?

I have a dataframe of categories, each with an associated p-value. I would like to create a new dataframe that has two columns:

  1. A sequence of p-value thresholds
  2. The number of categories in the first dataframe that have p-values below that threshold

So, ideally something like:

|pVal |SigCats|
|-----|-------|
|0.05 |  100  |
|0.01 |  80   |
|0.001|  50   |

How do I generate this dataframe?

Here's an example dataset:

  set.seed(42) 
  n = 20
  sourceDat <- data.frame(id=1:n, 
                    group=rep(LETTERS[1:2], n/2),
                    p_value=sample(1:10, n, replace=TRUE)/500)

I know I can count the number of categories that meet a given criterion with:

sum(sourceDat$p_value < 0.01) #for categories with pvalues less than 0.01

But I don't know how to use this function to populate a dataframe. My attempt below gives me an error...

pVals   <- c(0.05,10^seq(from = -2, to = -20,by= -1))
pValDat <- data.frame(x=pVals)
pValDat <- pValDat %>%
    dplyr::mutate(sigCats = sum(sourceDat$p_value < x))

I'm most familiar with base R and the tidyverse.



1 Reply


I figured it out using dplyr's rowwise function:

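library(dplyr)

pValDat <- data.frame(x = 10^seq(from = -2, to = -20, by = -1))
pValDat %>%
    rowwise() %>%   # evaluate the comparison once per threshold
    mutate(sigCats = sum(sourceDat$p_value < x)) %>%   # column is p_value, not pvalue
    ungroup()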
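
For anyone who prefers base R, here is a minimal sketch of the same idea without rowwise. In a plain mutate(), x refers to the whole column, so the comparison runs element by element and sum() collapses it to a single number; looping over the thresholds avoids that. The column names pVal and sigCats are taken from the table in the question, and using vapply() is my own substitution, not part of the answer above.

```r
# Recreate the example data from the question
set.seed(42)
n <- 20
sourceDat <- data.frame(id = 1:n,
                        group = rep(LETTERS[1:2], n / 2),
                        p_value = sample(1:10, n, replace = TRUE) / 500)

# Thresholds to evaluate (same vector as in the question)
pVals <- c(0.05, 10^seq(from = -2, to = -20, by = -1))

# For each threshold, count the rows whose p_value is strictly below it.
# vapply() calls the function once per threshold and checks that each
# result is a single integer.
pValDat <- data.frame(
  pVal    = pVals,
  sigCats = vapply(pVals, function(p) sum(sourceDat$p_value < p), integer(1))
)

print(pValDat)
```

Because each threshold is handled in its own function call, sourceDat$p_value is always compared against a single number, which is what rowwise() achieves in the dplyr version.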
