I have a data frame of categories, each with an associated p-value. I would like to create a new data frame with two columns:
- A sequence of p-value thresholds
- The number of categories in the first data frame whose p-values fall below each threshold
So, ideally something like:
| pVal  | SigCats |
|-------|---------|
| 0.05  | 100     |
| 0.01  | 80      |
| 0.001 | 50      |
How do I generate this dataframe?
Here's an example dataset:
n <- 20
sourceDat <- data.frame(id = 1:n,
                        group = rep(LETTERS[1:2], n / 2),
                        p_value = sample(1:10, n, replace = TRUE) / 500)
I know I can count the number of categories that meet a single criterion with:
sum(sourceDat$p_value < 0.01) # number of categories with p-values below 0.01
But I don't know how to use this function to populate a dataframe. My attempt below gives me an error...
pVals <- c(0.05,10^seq(from = -2, to = -20,by= -1))
pValDat <- data.frame(x=pVals)
pValDat <- pValDat %>%
dplyr::mutate(sigCats = sum(sourceDat$p_value < x))
I'm most familiar with base R and the tidyverse.
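
For reference, here is a minimal base R sketch of the per-threshold count described above. It assumes the likely problem with the mutate() attempt is that x refers to the whole column inside mutate(), so sum() is evaluated once over all thresholds rather than once per threshold. The set.seed() call and the output column names pVal and sigCats are choices made for this illustration only.

```r
# Example data from the question; the seed is added only for reproducibility
set.seed(1)
n <- 20
sourceDat <- data.frame(id = 1:n,
                        group = rep(LETTERS[1:2], n / 2),
                        p_value = sample(1:10, n, replace = TRUE) / 500)

# Thresholds from the question: 0.05, then 1e-2 down to 1e-20
pVals <- c(0.05, 10^seq(from = -2, to = -20, by = -1))

# Evaluate the count once per threshold instead of once for the whole vector
pValDat <- data.frame(
  pVal = pVals,
  sigCats = sapply(pVals, function(x) sum(sourceDat$p_value < x))
)

print(head(pValDat))
```

With dplyr, the same idea should work by inserting rowwise() before mutate(), so that the expression is evaluated one row at a time.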