Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Subset rows in a data frame based on a vector of values

I have two data sets that are supposed to be the same size but aren't. I need to trim the values from A that are not in B and vice versa in order to eliminate noise from a graph that's going into a report. (Don't worry, this data isn't being permanently deleted!)

I have read the following:

But I'm still not able to get this to work right. Here's my code:

bg2011missingFromBeg <- setdiff(x=eg2011$ID, y=bg2011$ID)
#attempt 1
eg2011cleaned <- subset(eg2011, ID != bg2011missingFromBeg)
#attempt 2
eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg]

The first try just eliminates the first value in the resulting setdiff vector. The second try yields an unwieldy error:

Error in `[.data.frame`(eg2012, !eg2012$ID %in% bg2012missingFromBeg) 
:  undefined columns selected

1 Reply


This will give you what you want:

eg2011cleaned <- eg2011[!eg2011$ID %in% bg2011missingFromBeg, ]
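
As a minimal sketch with made-up data (the toy data frames eg and bg below are assumptions for illustration, not your actual data), this also shows why the first attempt dropped only one row: != compares the two vectors position by position, recycling the shorter one, whereas %in% tests each ID for membership in the whole vector.

```r
# Hypothetical stand-ins for eg2011 and bg2011
eg <- data.frame(ID = c(1, 2, 3, 4, 5, 6), value = c(10, 20, 30, 40, 50, 60))
bg <- data.frame(ID = c(2, 4, 6), value = c(21, 41, 61))

# IDs present in eg but absent from bg
missing_ids <- setdiff(eg$ID, bg$ID)

# Attempt 1: missing_ids is recycled along eg$ID, so each ID is compared
# only with the element that happens to line up with it
by_recycling <- subset(eg, ID != missing_ids)

# Membership test plus the trailing comma: drops every row whose ID is missing
by_membership <- eg[!eg$ID %in% missing_ids, ]

print(by_recycling$ID)
print(by_membership$ID)
```

With this toy input, the != version removes only the row with ID 1, while the %in% version keeps exactly the IDs shared with bg.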

The error in your second attempt occurs because you forgot the comma (,) after the row index.

In general, for convenience, indexing a data frame with a single index, object[index], selects columns. If you want to subset rows and keep all columns, you have to use the form object[index_rows, index_cols], where index_cols can be left blank, which selects all columns by default.

However, you still need to include the , to indicate that you want to get a subset of rows instead of a subset of columns.
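
The two indexing forms can be compared on a small, hypothetical data frame (df and keep are illustrative names, not from the question). This is only a sketch; it captures the error message with tryCatch instead of letting it stop the script.

```r
# Hypothetical data frame with 3 rows and 2 columns
df <- data.frame(ID = 1:3, value = c("a", "b", "c"))
keep <- c(TRUE, FALSE, TRUE)  # one logical per row

# With the comma: keep is applied to rows, and all columns are kept
rows <- df[keep, ]

# Without the comma: the single index is applied to columns
cols <- df["value"]

# A row-length logical vector used as a column index asks for a third
# column that does not exist, which reproduces the error from the question
msg <- tryCatch(df[keep], error = function(e) conditionMessage(e))

print(rows)
print(names(cols))
print(msg)
```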

