Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to plot the cost / inertia values in sklearn kmeans?

Is it possible to plot the KMeans cost value? I want to plot the cost at each iteration of KMeans, as in the diagram below.

[image]

Could you point me to a relevant thread? Thank you.



1 Reply


Inertia in Kmeans

By cost, I assume you mean the inertia value at each iteration of a KMeans run.

The K-means algorithm aims to choose centroids that minimize the inertia, or within-cluster sum-of-squares criterion. Inertia can be recognized as a measure of how internally coherent clusters are.

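$$\sum_{i=0}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2$$

Here the x_i are the samples and the \mu_j are the cluster centroids in C. This is the quantity KMeans tries to reduce with each iteration.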

More details here.
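To make the definition concrete, here is a minimal NumPy sketch that computes the within-cluster sum of squares by hand for a given set of centers. The function name inertia and the toy data are my own illustration, not part of scikit-learn.

```python
import numpy as np

def inertia(X, centers):
    """Sum of squared distances from each sample to its nearest center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_samples, n_centers)
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Each sample contributes the distance to its closest center only
    return float(sq_dists.min(axis=1).sum())

# Toy data: four points, two centers; every point is at squared distance 1
# from its nearest center, so the total is 4.0
X_demo = [[0, 0], [0, 2], [10, 0], [10, 2]]
centers_demo = [[0, 1], [10, 1]]
print(inertia(X_demo, centers_demo))  # 4.0
```

Applied to a fitted model's cluster_centers_, this should agree with kmeans.inertia_ up to floating-point error.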


Printing inertia values for each iteration

After fitting a KMeans(), kmeans.inertia_ gives you only the final inertia value. If you want the inertia value at each iteration, one way is to set verbose=2.

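import numpy as np
from sklearn.cluster import KMeans

def train_kmeans(X):
    # n_init=1 gives a single run, so only one list of iterations is printed
    kmeans = KMeans(n_clusters=5, verbose=2, n_init=1)
    kmeans.fit(X)
    return kmeans

X = np.random.random((1000,7))
train_kmeans(X)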
Initialization complete
Iteration 0, inertia 545.5728914456803
Iteration 1, inertia 440.5225419317938
Iteration 2, inertia 431.87478970379755
Iteration 3, inertia 427.52125502838504
Iteration 4, inertia 425.75105209622967
Iteration 5, inertia 424.7788124997543
Iteration 6, inertia 424.2111904252263
Iteration 7, inertia 423.7217490965455
Iteration 8, inertia 423.29439165408354
Iteration 9, inertia 422.9243615021072
Iteration 10, inertia 422.54144662407566
Iteration 11, inertia 422.2677910840504
Iteration 12, inertia 421.98686844470336
Iteration 13, inertia 421.76289612029376
Iteration 14, inertia 421.59241427498324
Iteration 15, inertia 421.36516415785724
Iteration 16, inertia 421.23801796298704
Iteration 17, inertia 421.1065220191125
Iteration 18, inertia 420.85788031236586
Iteration 19, inertia 420.6053961581343
Iteration 20, inertia 420.4998816171483
Iteration 21, inertia 420.4436034595902
Iteration 22, inertia 420.39833211852346
Iteration 23, inertia 420.3583721574586
Iteration 24, inertia 420.32684273674226
Iteration 25, inertia 420.2786269304449
Iteration 26, inertia 420.24149714604516
Iteration 27, inertia 420.22255866139835
Iteration 28, inertia 420.2075247585145
Iteration 29, inertia 420.19985517233584
Iteration 30, inertia 420.18983415887305
Iteration 31, inertia 420.18584733421886
Converged at iteration 31: center shift 8.716337631121295e-33 within tolerance 8.370287188573764e-06

NOTE: KMeans may re-initialize its centroids several times (n_init) and runs up to max_iter iterations for each initialization. To get a single list of inertia values, set n_init=1 so that the model is fitted from a single initialization. With a higher n_init, the printed output contains one list of iterations per initialization.
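If you do run with a higher n_init, the printed text can be split back into one list per initialization. The sketch below assumes each run starts with an "Initialization complete" line, as in the output above. The name split_runs and the sample text are hand-written for illustration, not real scikit-learn output.

```python
import re

# Matches lines such as "Iteration 3, inertia 427.52"; a trailing period is ignored
ITER_RE = re.compile(r"Iteration\s+\d+,\s+inertia\s+(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")

def split_runs(verbose_text):
    """Return one list of inertia values per initialization found in the text."""
    runs = []
    for line in verbose_text.splitlines():
        if line.startswith("Initialization complete"):
            runs.append([])          # a new initialization starts here
            continue
        match = ITER_RE.search(line)
        if match:
            if not runs:             # tolerate text without the marker line
                runs.append([])
            runs[-1].append(float(match.group(1)))
    return runs

# Hand-written sample text modeled on the verbose output format (values are made up)
sample = """Initialization complete
Iteration 0, inertia 545.5
Iteration 1, inertia 440.5
Converged at iteration 1
Initialization complete
Iteration 0, inertia 530.0
Iteration 1, inertia 450.25
"""
print(split_runs(sample))  # [[545.5, 440.5], [530.0, 450.25]]
```

Each inner list can then be plotted as its own curve.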


Plotting inertia values for each iteration

The problem is that, to my knowledge, sklearn has no parameter for storing these per-iteration inertia values in a variable. You may therefore need to write a wrapper that redirects the verbose stdout into a text variable and then extracts the inertia value for each iteration.

You can use StringIO to capture the printed output from verbose=2, extract the values, and plot them.
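The capture step on its own can be sketched with the standard library's contextlib.redirect_stdout, which restores sys.stdout automatically. In this sketch, noisy_square is a hypothetical stand-in for the KMeans call and capture_output is an illustrative name, not part of scikit-learn.

```python
import io
from contextlib import redirect_stdout

def capture_output(f, *args, **kwargs):
    """Call f and return (its return value, everything it printed)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):    # sys.stdout is restored on exit, even on error
        returned = f(*args, **kwargs)
    return returned, buffer.getvalue()

# Hypothetical stand-in for a function that prints progress and returns a result
def noisy_square(x):
    print("Iteration 0, inertia 1.5")
    return x * x

result, printed = capture_output(noisy_square, 3)
```

After the call, result holds the return value and printed holds the text that would otherwise have gone to the console.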

Here is the complete code:

import io
import sys
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

#Dummy data
X = np.random.random((1000,7)) 

def train_kmeans(X):
    kmeans = KMeans(n_clusters=5, verbose=2, n_init=1) #<-- n_init=1, verbose=2
    kmeans.fit(X)
    return kmeans

#HELPER FUNCTION
#Calls a function and returns both its return value and its printed output as variables
#In this case, the return value is the model and the printed output is the verbose inertia at each iteration

def redirect_wrapper(f, inp):
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    try:
        returned = f(inp)                #<- Call function
        printed = new_stdout.getvalue()  #<- store printed output
    finally:
        sys.stdout = old_stdout          #<- always restore stdout, even if f raises
    return returned, printed


returned, printed = redirect_wrapper(train_kmeans, X)

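#Extract inertia values
#[1:-2] skips the "Initialization complete" line, the "Converged" line and the trailing empty line
#rstrip('.') guards against a trailing period on the log line
inertia = [float(i[i.find('inertia')+len('inertia')+1:].rstrip('.')) for i in printed.split('\n')[1:-2]]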

#Plot!
plt.plot(inertia)
plt.xlabel('Iteration')
plt.ylabel('Inertia')
plt.show()



EDIT: I have updated my answer with a general helper function that calls a given function (one that both returns and prints something) and returns the returned value and the printed text separately. In this case, the returned value is the fitted model and the printed text is stored as a string in a variable.
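If parsing printed output feels fragile, another option is to run the iterations yourself and record the inertia directly. The following is a minimal NumPy sketch of plain Lloyd iterations with random initialization. It is my own illustration, not scikit-learn's implementation (which defaults to k-means++ initialization and a tolerance-based stopping rule), so its numbers will not match KMeans exactly. The name lloyd_with_history and its parameters are assumptions.

```python
import numpy as np

def lloyd_with_history(X, n_clusters, max_iter=100, seed=0):
    """Plain Lloyd iterations that record inertia after every assignment step."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random initialization: pick n_clusters distinct samples as starting centers
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    history = []
    for _ in range(max_iter):
        # Assignment step: squared distance of every sample to every center
        sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        history.append(float(sq_dists.min(axis=1).sum()))
        # Update step: move each center to the mean of its assigned samples
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):          # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                     # centers stopped moving
        centers = new_centers
    return centers, history

# Dummy data with the same shape as in the answer
X_demo = np.random.default_rng(42).random((1000, 7))
centers, history = lloyd_with_history(X_demo, n_clusters=5)
```

The history list can be passed straight to plt.plot(history), with no stdout capture involved.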

