python - DBSCAN epsilon optimisation with precomputed proximity matrix

Posted Feb 19, 2021 in Technique by 深蓝 (71.8m points)


I want to use DBSCAN to find clusters in my dataset. First, I calculated a proximity matrix with an unsupervised random forest, which gives me an N x N matrix of size 516 x 516. Then I pass it to DBSCAN as precomputed input.
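A note on the input, offered as an assumption rather than a diagnosis: random forest proximities are usually similarities in [0, 1] with 1 on the diagonal, whereas DBSCAN with metric="precomputed" reads the matrix as distances. Below is a minimal sketch of one possible conversion, assuming Prox_mat is such a similarity matrix. The function name proximity_to_distance and the toy values are hypothetical, and 1 - proximity is only one convention (sqrt(1 - proximity) is another).

```python
import numpy as np

def proximity_to_distance(prox):
    """Turn a similarity matrix with entries in [0, 1] into a distance matrix."""
    prox = np.asarray(prox, dtype=float)
    # Symmetrise to remove small numerical asymmetries.
    prox = (prox + prox.T) / 2.0
    # Assumed convention: distance = 1 - proximity, clipped so it is never negative.
    dist = np.clip(1.0 - prox, 0.0, None)
    # A point is at distance 0 from itself.
    np.fill_diagonal(dist, 0.0)
    return dist

# Toy 3 x 3 proximity matrix (hypothetical values, not the data from the question).
toy_prox = np.array([[1.0, 0.75, 0.0],
                     [0.75, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
toy_dist = proximity_to_distance(toy_prox)
```

With real data, the call would be proximity_to_distance(Prox_mat), and the result (not the raw proximities) would be passed to DBSCAN.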

It is good practice to optimise epsilon in DBSCAN to get meaningful results. I found multiple posts online where this is done on 2D data (x and y). However, my data has more dimensions, and I feel like the elbow plot doesn't make sense here.

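# Use NearestNeighbors to calculate distances between points
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

# Fit on the proximity matrix (each row is treated as a feature vector)
neigh = NearestNeighbors(n_neighbors=2)
distance = neigh.fit(Prox_mat)

# Distances and indices of the 2 nearest neighbours of each row
distances, indices = distance.kneighbors(Prox_mat)

# Sort each column in increasing order
sorting_distances = np.sort(distances, axis=0)

# Column 1 is the distance to the nearest neighbour other than the point itself
sorted_distances = sorting_distances[:, 1]

# Plot the sorted nearest-neighbour distances (the elbow suggests epsilon)
plt.plot(sorted_distances)
plt.xlabel('Points sorted by distance')
plt.ylabel('Nearest-neighbour distance (epsilon)')
plt.show()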
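For comparison, here is a sketch of the same elbow (k-distance) curve computed directly from a precomputed distance matrix rather than from rows treated as feature vectors. It assumes the matrix already holds distances. The helper name k_distance_curve and the toy data are hypothetical, and k would normally be chosen to match DBSCAN's min_samples.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(dist_mat, k):
    """Sorted distance from each point to its k-th nearest neighbour (self excluded)."""
    nn = NearestNeighbors(n_neighbors=k, metric="precomputed").fit(dist_mat)
    # kneighbors() without an argument queries the fitted points and
    # does not count a point as its own neighbour.
    distances, _ = nn.kneighbors()
    # Last column holds the k-th neighbour distance; sort it for the elbow plot.
    return np.sort(distances[:, -1])

# Synthetic stand-in for a distance matrix: two tight groups on a line.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2]).reshape(-1, 1)
toy_dist = np.abs(points - points.T)
curve = k_distance_curve(toy_dist, k=2)
```

Plotting curve with plt.plot(curve) gives the elbow plot; the y-value at the bend is a candidate epsilon on the same scale as the distance matrix.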

The elbow plot looks something like this:

[Figure: elbow plot]

Then I use an epsilon of 1.3 as input to DBSCAN.

from sklearn.cluster import DBSCAN

clustering_model = DBSCAN(eps=1.3, metric="precomputed")
# Fit the model to the proximity matrix
clustering_model.fit(Prox_mat)
# Labels predicted by DBSCAN
predicted_labels = clustering_model.labels_

# Visualise the clusters on the first two PCA components
plt.scatter(Prox_mat_PCA.iloc[:, 0], Prox_mat_PCA.iloc[:, 1], c=predicted_labels, cmap='Paired')
plt.title("DBSCAN")
plt.show()

[Figure: DBSCAN scatterplot]

Unfortunately, every instance is assigned label 0, meaning that they all belong to the same cluster.
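One possible reading of this result, assuming the matrix entries lie in [0, 1]: every entry is then below eps=1.3, so every point is a neighbour of every other point and DBSCAN can only return a single cluster. The sketch below scans a few epsilon values on synthetic distances to show the effect. The helper scan_eps and the toy data are hypothetical, not taken from the question.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def scan_eps(dist_mat, eps_values, min_samples=5):
    """Return (eps, n_clusters, n_noise) for each candidate eps."""
    results = []
    for eps in eps_values:
        model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
        labels = model.fit(dist_mat).labels_
        # Label -1 marks noise and is not counted as a cluster.
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        results.append((eps, n_clusters, n_noise))
    return results

# Synthetic distances in [0, 1]: two groups of 10 points, far apart.
rng = np.random.default_rng(0)
coords = np.concatenate([rng.uniform(0.0, 0.1, 10), rng.uniform(0.9, 1.0, 10)])
toy_dist = np.abs(coords[:, None] - coords[None, :])
summary = scan_eps(toy_dist, eps_values=[0.15, 1.3], min_samples=3)
```

On this toy matrix the small epsilon separates the two groups, while 1.3 exceeds every distance and merges everything. A scan like this only makes sense on a distance matrix, with epsilon values inside the range of its entries.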

I was wondering: would it be a good idea to perform PCA on the proximity matrix (technically obtaining PCoAs) and then feed the first 2 PCoAs into DBSCAN to find epsilon and the subsequent clusters?
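For reference, PCoA (classical multidimensional scaling) starts from a distance matrix rather than from PCA on the proximities themselves. A minimal NumPy sketch follows, assuming a symmetric distance matrix. The function name pcoa and the toy points are hypothetical. Keeping only 2 coordinates can discard structure that the full matrix contains.

```python
import numpy as np

def pcoa(dist_mat, n_components=2):
    """Classical MDS / PCoA coordinates from a symmetric distance matrix."""
    d = np.asarray(dist_mat, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy check: three points in the plane (hypothetical data).
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
toy_dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = pcoa(toy_dist, n_components=2)
```

The returned coordinates could then go to DBSCAN with its default Euclidean metric, with epsilon chosen on the scale of those coordinates rather than on the scale of the original matrix.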
