Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

R - Adding cumulative quantities to geom_bar plots drawn with facet_wrap

Newbie here! After a long search I still could not find a satisfying solution to my problem. I have a dataset of heart failure clinical records (https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records) and I would like to display a series of bar plots where the "Survived" and "Dead" patients are counted per category (i.e. sex, smoking and so on).

I think I have done a decent job of preparing the plots, and they look right to me. The problem is that it is difficult to see the ratio between surviving and dying patients for each characteristic.

I have two ideas, but both of them elude me:

  • Put a count on top of every bar so that the ratio becomes obvious
  • Directly show the ratio on every characteristic.

Here is the code I wrote.


    library(ggplot2)  # plotting
    library(dplyr)    # %>% and select()
    library(tidyr)    # gather()
    
    heart_faliure_data <- read.csv(file = "heart_failure_clinical_records_dataset.csv", header = FALSE, skip=1)
    
    #Prepare Column Names
    c_names <- c("Age",
                 "Anaemia",
                 "creatinine_phosphokinase",
                 "diabetes",
                 "ejection_fraction",
                 "high_blood_pressure",
                 "platelets",
                 "serum_creatinine",
                 "serum_sodium",
                 "sex",
                 "smoking",
                 "time",
                 "DEATH_EVENT")
    
    
    #Apply column names to the dataframe
    colnames(heart_faliure_data) <- c_names
    
    
    # Some columns (sex, Anaemia, diabetes, high_blood_pressure, smoking and DEATH_EVENT) are booleans
    # (see the description of the dataset) and should be transformed into factors
    heart_faliure_data$sex <- factor(heart_faliure_data$sex, 
                                     levels=c(0,1), 
                                     labels=c("Female","Male"))
    heart_faliure_data$smoking <- factor(heart_faliure_data$smoking, 
                                         levels=c(0,1), 
                                         labels=c("No","Yes"))
    heart_faliure_data$DEATH_EVENT <- factor(heart_faliure_data$DEATH_EVENT, 
                                             levels=c(0,1), 
                                             labels=c("Survived","Died"))
    heart_faliure_data$high_blood_pressure <- factor(heart_faliure_data$high_blood_pressure, 
                                                     levels=c(0,1), 
                                                     labels=c("No","Yes"))
    heart_faliure_data$Anaemia <- factor(heart_faliure_data$Anaemia, 
                                         levels=c(0,1), 
                                         labels=c("No","Yes"))
    heart_faliure_data$diabetes <- factor(heart_faliure_data$diabetes, 
                                          levels=c(0,1), 
                                          labels=c("No","Yes"))
    # Convert Age to an integer value
    heart_faliure_data$Age <- as.integer(heart_faliure_data$Age)
    
    
    # Select the categorical variables to study the effect of each one on the death event
    categorical.heart_failure <- heart_faliure_data  %>%
      select(Anaemia,
             diabetes,
             high_blood_pressure,
             sex,
             smoking,
             DEATH_EVENT) %>%
      gather(key = "key", value = "value", -DEATH_EVENT)
    
    
    #Visualizing this effect with a grouped barplot
    categorical.heart_failure %>% 
      ggplot(aes(x = value)) +
      geom_bar(aes(fill = DEATH_EVENT), 
               alpha    = .2, 
               position = "dodge", 
               color    = "black",
               width    = .7,
               stat     = "count") +
      labs(x = "",
           y = "") +
      theme(axis.text.y  = element_blank(),
            axis.ticks.y = element_blank()) +
      facet_wrap(~ key, 
                 scales = "free", 
                 nrow = 4) +
      scale_fill_manual(values = c("#FFA500", "#0000FF"), 
                        name   = "Death Event", 
                        labels = c("Survived", "Dead"))

And here is a (not so bad) image of the result: (image)

The goal would be to have some numerical value on top of the bars, or even just a y-axis indication...

I would be glad about any help you can give me!

Question from: https://stackoverflow.com/questions/65875829/adding-cumulative-quantities-to-a-geom-bar-plots-drawn-with-facet-wrap


1 Reply


What about something like this? To make it work, I aggregated the data first:

    tmp <- categorical.heart_failure %>% 
      group_by(DEATH_EVENT, key, value) %>% 
      summarise(n = n())
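
To make the shape of the aggregated table easier to picture, here is a minimal sketch on a made-up data frame. The `toy` rows are invented for illustration and are not taken from the real dataset. The aggregation should yield one row per combination of outcome, variable and level, with the count in `n`. As far as I know, `dplyr::count()` is a shorthand for the same `group_by()` plus `summarise()` step.

```r
library(dplyr)

# Made-up stand-in for categorical.heart_failure (NOT the real data)
toy <- data.frame(
  key         = c("sex", "sex", "sex", "smoking", "smoking", "smoking"),
  value       = c("Male", "Male", "Female", "No", "Yes", "No"),
  DEATH_EVENT = c("Died", "Died", "Survived", "Survived", "Died", "Survived")
)

# Same aggregation as above; .groups = "drop" returns an ungrouped table
toy_counts <- toy %>%
  group_by(DEATH_EVENT, key, value) %>%
  summarise(n = n(), .groups = "drop")

# Shorthand that should give the same counts
toy_counts_short <- toy %>%
  count(DEATH_EVENT, key, value)

print(toy_counts)
```

Each row of the result then maps to exactly one bar, which is what lets `geom_text()` place one label per bar.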


    # Visualize the aggregated counts as a grouped bar plot with count labels
    tmp %>% 
      ggplot(aes(x = value, y = n)) +
      geom_bar(aes(fill = DEATH_EVENT), 
               alpha    = .2, 
               position = position_dodge(width = 1), 
               color    = "black",
               width    = .7,
               stat     = "identity") +
      # Place each count slightly above its bar; group keeps the labels aligned with the dodged bars
      geom_text(aes(y = n * 1.1, label = n, group = DEATH_EVENT), 
                position = position_dodge(width = 1), 
                vjust    = 0) + 
      labs(x = "",
           y = "") +
      theme(axis.text.y  = element_blank(),
            axis.ticks.y = element_blank()) +
      facet_wrap(~ key, 
                 scales = "free", 
                 nrow = 4) +
      scale_fill_manual(values = c("#FFA500", "#0000FF"), 
                        name   = "Death Event", 
                        labels = c("Survived", "Dead")) + 
      # Leave headroom above the tallest bar so the labels are not clipped
      coord_cartesian(ylim = c(0, max(tmp$n) * 1.25))
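
The question's second idea, showing the ratio directly, could probably be handled in a similar way. This is only a sketch under assumptions: the `toy` data frame is made up (it stands in for `categorical.heart_failure`), and `share` is a name I introduced for the fraction of patients with each outcome within one level of one variable. The idea is to divide each count by the total of its `key`/`value` group and plot that fraction instead of the raw count.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for categorical.heart_failure (NOT the real data)
toy <- data.frame(
  key         = rep(c("sex", "smoking"), each = 8),
  value       = c(rep("Female", 4), rep("Male", 4), rep("No", 5), rep("Yes", 3)),
  DEATH_EVENT = c("Survived", "Survived", "Survived", "Died",
                  "Survived", "Survived", "Died", "Died",
                  "Survived", "Survived", "Survived", "Survived", "Died",
                  "Survived", "Died", "Died")
)

# Count each outcome, then divide by the total within each key/value level
ratios <- toy %>%
  count(key, value, DEATH_EVENT, name = "n") %>%
  group_by(key, value) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Bars show the share; labels print it as a percentage above each bar
p <- ggplot(ratios, aes(x = value, y = share, fill = DEATH_EVENT)) +
  geom_col(position = position_dodge(width = 0.9), width = 0.7) +
  geom_text(aes(label = sprintf("%.0f%%", 100 * share)),
            position = position_dodge(width = 0.9),
            vjust    = -0.3) +
  facet_wrap(~ key, scales = "free_x") +
  scale_y_continuous(limits = c(0, 1.1)) +
  labs(x = "", y = "Share within category")

print(p)
```

Because the shares within each level sum to 1, the bars are comparable across levels even when the group sizes differ a lot, which the raw counts do not show as clearly.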
