Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
419 views
in Technique[技术] by (71.8m points)

r - Ignore outliers in ggplot2 boxplot

How would I ignore outliers in ggplot2 boxplot? I don't simply want them to disappear (i.e. outlier.size=0), but I want them to be ignored such that the y axis scales to show 1st/3rd percentile. My outliers are causing the "box" to shrink so small its practically a line. Are there some techniques to deal with this?

Edit Here's an example:

y = c(.01, .02, .03, .04, .05, .06, .07, .08, .09, .5, -.6)
qplot(1, y, geom="boxplot")

enter image description here

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use geom_boxplot(outlier.shape = NA) to not display the outliers and scale_y_continuous(limits = c(lower, upper)) to change the axis limits.

An example.

n <- 1e4L
dfr <- data.frame(
  y = exp(rlnorm(n)),  #really right-skewed variable
  f = gl(2, n / 2)
)

p <- ggplot(dfr, aes(f, y)) + 
  geom_boxplot()
p   # big outlier causes quartiles to look too slim

p2 <- ggplot(dfr, aes(f, y)) + 
  geom_boxplot(outlier.shape = NA) +
  scale_y_continuous(limits = quantile(dfr$y, c(0.1, 0.9)))
p2  # no outliers plotted, range shifted

Actually, as Ramnath showed in his answer (and Andrie too in the comments), it makes more sense to crop the scales after you calculate the statistic, via coord_cartesian.

coord_cartesian(ylim = quantile(dfr$y, c(0.1, 0.9)))

(You'll probably still need to use scale_y_continuous to fix the axis breaks.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...