Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - visualizing clusters extracted from MClust using ggplot2

I am analysing the distribution of my data using mclust (a follow-up to "Clustering with Mclust results in an empty cluster").
Here is my data for download: https://www.file-upload.net/download-14320392/example.csv.html

First, I evaluate the clusters present in my data:

library(reshape2)
library(mclust)
library(ggplot2)

data <- read.csv(file.choose(), header=TRUE,  check.names = FALSE)
data_melt <- melt(data, value.name = "value", na.rm=TRUE)

fit <- Mclust(data$value, modelNames="E", G = 1:7)
summary(fit, parameters = TRUE)

---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust E (univariate, equal variance) model with 4 components: 

log-likelihood    n df       BIC       ICL
-20504.71 3258  8 -41074.13 -44326.69

Clustering table:
1    2    3    4 
0 2271  896   91 

Mixing probabilities:
1         2         3         4 
0.2807685 0.4342499 0.2544305 0.0305511 

Means:
1        2        3        4 
1381.391 1381.715 1574.335 1851.667 

Variances:
1        2        3        4 
7466.189 7466.189 7466.189 7466.189 

Now that the clusters are identified, I would like to overlay the total distribution with the distributions of the individual components. To do this, I tried to extract the cluster assignment of each value using:

df <- as.data.frame(data)
df$classification <- as.factor(df$value[fit$classification])

ggplot(df, aes(value, fill= classification)) + 
  geom_density(aes(col=classification, fill = NULL), size = 1)

As a result, I get the following: [image: density plot with one curve per classification, labelled 1602, 1639 and 1823]

It seems to have worked; however, I wonder:
a) where the labels (1602, 1639 and 1823) of the individual classifications come from,
b) how I can scale the individual densities as a fraction of the total (for example, 1823 contributes only 91 of the 3258 observations; see above),
c) whether it would make sense to use predicted normal distributions based on the fitted means and SD instead.

Any help or suggestions are highly appreciated!

Question from: https://stackoverflow.com/questions/65540959/visualizing-clusters-extracted-from-mclust-using-ggplot2


1 Reply


I think you could get what you want in the following way:

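library(dplyr)  # provides mutate() and re-exports the %>% pipe
# Store the cluster labels themselves, one per observation
data_melt <- data_melt %>% mutate(class = as.factor(fit$classification))
# after_stat(count) (written ..count.. in older ggplot2) scales each curve
# by the number of observations in its class
ggplot(data_melt, aes(x=value, colour=class, fill=class)) + 
    geom_density(aes(y=after_stat(count)), alpha=.25)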
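On question (a): the clustering table shows that only classes 2, 3 and 4 are in use, so `df$value[fit$classification]` most likely looks up rows 2, 3 and 4 of `value` instead of storing the class labels. That would make 1602, 1639 and 1823 simply the 2nd, 3rd and 4th values in the data. A minimal sketch of the effect, using made-up toy vectors (`value` and `classification` below are stand-ins, not the real data):

```r
# Toy stand-in for the data column; only positions 2, 3 and 4 matter here
# and are set to the labels seen in the question's legend.
value <- c(1400, 1602, 1639, 1823, 1350, 1580)

# Hypothetical cluster labels, shaped like fit$classification (class 1 unused)
classification <- c(2, 3, 3, 4, 2, 3)

# What the question did: labels are used as positions into `value`
wrong <- as.factor(value[classification])
print(levels(wrong))

# What was intended: keep the labels themselves
right <- as.factor(classification)
print(levels(right))
```

The first factor ends up with the data values at rows 2 to 4 as its levels, the second with the class numbers.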

[image: resulting density plot, one curve per class]
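On questions (b) and (c), one option is to draw the fitted normal components directly, each weighted by its mixing probability, so that the curves add up to the mixture density. The sketch below is only an illustration under assumptions: the parameters are copied by hand from the `summary(fit)` output in the question, and the names `pro`, `mu`, `sdev`, `comp` and `mix` are mine. In a live session the same numbers should be available in `fit$parameters` (`pro`, `mean` and `variance$sigmasq`).

```r
library(ggplot2)

# Parameters copied from the summary(fit) output in the question
pro  <- c(0.2807685, 0.4342499, 0.2544305, 0.0305511)  # mixing probabilities
mu   <- c(1381.391, 1381.715, 1574.335, 1851.667)      # component means
sdev <- sqrt(7466.189)  # model "E": one variance shared by all components

# Evaluate each weighted component density on a grid (one column per component)
x    <- seq(min(mu) - 4 * sdev, max(mu) + 4 * sdev, length.out = 500)
dens <- sapply(seq_along(mu), function(k) pro[k] * dnorm(x, mean = mu[k], sd = sdev))

# Long format for ggplot: one row per (grid point, component)
comp <- data.frame(
  x         = rep(x, times = length(mu)),
  density   = as.vector(dens),
  component = factor(rep(seq_along(mu), each = length(x)))
)

# The mixture density is the sum of the weighted components
mix <- data.frame(x = x, density = rowSums(dens))

p <- ggplot() +
  geom_line(data = mix, aes(x, density)) +
  geom_line(data = comp, aes(x, density, colour = component))
# To compare with the observed data, a layer such as
# geom_density(data = data_melt, aes(value), linetype = "dashed") could be added.
print(p)
```

With these parameters, components 1 and 2 have almost the same mean and the same variance, and component 2 has the larger weight, which would explain why the hard classification assigns no observations to component 1.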

