Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - DBSCAN epsilon optimisation with precomputed proximity matrix

I want to use DBSCAN to find clusters in my dataset. First, I computed a proximity matrix with an unsupervised random forest, which gives me an N x N matrix of size 516 x 516. I then pass it to DBSCAN as precomputed input.

It is good practice to optimise epsilon in DBSCAN to get meaningful results. I found multiple posts online where this is done on 2D data (x and y). However, my data has more dimensions, and I feel like the elbow plot doesn't make sense here.

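# Use NearestNeighbors to calculate distances between points
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

# Fit the nearest-neighbour model (2 neighbours: the point itself and its closest neighbour)
neigh=NearestNeighbors(n_neighbors=2)
distance=neigh.fit(Prox_mat)

# Distance values and indices of the neighbours
distances,indices=distance.kneighbors(Prox_mat)

# Sort the distances in increasing order
sorting_distances=np.sort(distances,axis=0)

# Column 1 holds the distance to the closest neighbour other than the point itself
sorted_distances=sorting_distances[:,1]

# Plot the sorted nearest-neighbour distances (candidate epsilon values)
plt.plot(sorted_distances)
plt.xlabel('Points sorted by distance')
plt.ylabel('Nearest-neighbour distance (epsilon)')
plt.show()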

The elbow plot looks something like this:

Elbow plot
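
For comparison, the k-distance curve can also be computed on the matrix as a set of pairwise distances instead of treating its rows as feature vectors. The following is only a minimal sketch under stated assumptions: the helper `k_distance_curve`, the synthetic `prox_demo` matrix, and the conversion `distance = 1 - proximity` are hypothetical stand-ins and are not taken from the question.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(dist_mat, k=4):
    """Return each point's distance to its k-th nearest neighbour, sorted ascending."""
    # metric="precomputed" tells scikit-learn that dist_mat already holds pairwise distances
    nn = NearestNeighbors(n_neighbors=k, metric="precomputed").fit(dist_mat)
    # kneighbors() without an argument queries the fitted points and skips each point itself
    distances, _ = nn.kneighbors()
    return np.sort(distances[:, -1])

# Hypothetical stand-in for a proximity (similarity) matrix
rng = np.random.default_rng(0)
points = rng.normal(size=(60, 5))
sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
prox_demo = np.exp(-sq / sq.mean())  # similarities in (0, 1], exactly 1 on the diagonal
dist_demo = 1.0 - prox_demo          # assumed conversion: distance = 1 - proximity

curve = k_distance_curve(dist_demo, k=4)
```

Plotting `curve` gives an elbow plot whose y-axis is on the same scale as the values DBSCAN compares against `eps` when it is run with `metric="precomputed"`. Choosing `k` equal to the intended `min_samples` is a common convention, not a requirement.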

Then I use an epsilon of 1.3 as input to DBSCAN.

from sklearn.cluster import DBSCAN

clustering_model=DBSCAN(eps=1.3, metric="precomputed")
# Fit the model to the proximity matrix
clustering_model.fit(Prox_mat)
# Labels predicted by DBSCAN
predicted_labels=clustering_model.labels_

# Visualise the clusters on the first two principal components
plt.scatter(Prox_mat_PCA.iloc[:,0], Prox_mat_PCA.iloc[:,1],c=predicted_labels, cmap='Paired')
plt.title("DBSCAN")
plt.show()

DBSCAN scatterplot

Unfortunately, every instance is assigned the label 0, meaning that all points end up in the same cluster.

Would it be a good idea to perform PCA on the proximity matrix (technically obtaining PCoAs), and then feed the first 2 PCoAs into DBSCAN to find epsilon and the subsequent clusters?
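
To make the PCoA idea concrete, here is a minimal sketch of classical multidimensional scaling, which is one common definition of PCoA. It assumes the input is a distance matrix (not a similarity matrix); the function name `pcoa` and the synthetic `dist_demo` data are hypothetical and only serve as an illustration.

```python
import numpy as np

def pcoa(dist_mat, n_components=2):
    """Classical MDS / PCoA: coordinates whose Euclidean distances approximate dist_mat."""
    n = dist_mat.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram (inner-product) matrix
    gram = -0.5 * centering @ (dist_mat ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the components with the largest eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean distances can yield negative eigenvalues; clip them to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical check on 2D points, where two components should reproduce the distances
rng = np.random.default_rng(0)
points = rng.normal(size=(40, 2))
dist_demo = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
coords = pcoa(dist_demo, n_components=2)
recovered = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
```

The resulting `coords` could be passed to DBSCAN with its default Euclidean metric, in which case epsilon would have to be re-estimated on the embedded coordinates. Keeping only two components may discard structure that is present in the full matrix, so the clusters found this way are not guaranteed to match those found on the full distances.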



