K-Means Clustering Applications
k-means clustering is straightforward to apply even to large data sets, particularly when using heuristics such as Lloyd's algorithm. It has been used successfully in market segmentation, computer vision, and astronomy, among many other domains. It is often used as a preprocessing step for other algorithms, for example to find a starting configuration.
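Lloyd's algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal NumPy sketch (the function name and the pick-k-points initialization are illustrative choices, not a standard API):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """One run of Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

Note that a single run only finds a local optimum; in practice the algorithm is restarted several times and the result with the lowest within-cluster sum of squares is kept.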
k-means originates from signal processing and still finds use in that domain. In computer graphics, for example, color quantization is the task of reducing the color palette of an image to a fixed number of colors k; the k-means algorithm can easily be applied to this task and produces competitive results. One use case for this approach is image segmentation. Other uses of vector quantization include non-random sampling: k-means can easily be used to choose k different but prototypical objects from a large data set for further analysis.
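Color quantization amounts to clustering the image's pixels in RGB space and snapping each pixel to its nearest centroid. A minimal sketch assuming scikit-learn is available (the helper name is hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(image, k=16, seed=0):
    """Reduce an H x W x 3 image to a palette of k colors via k-means."""
    pixels = image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    # Replace each pixel with its nearest palette color (cluster centroid).
    palette = km.cluster_centers_
    quantized = palette[km.labels_].reshape(image.shape)
    return np.rint(quantized).astype(image.dtype)
```

The same routine doubles as a crude image segmenter: pixels sharing a palette color form one segment.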
In cluster analysis, the k-means algorithm can be used to partition the input data set into k partitions (clusters).
However, the pure k-means algorithm is not very flexible and as such is of limited use (except when vector quantization, as above, is the desired use case). In particular, the parameter k is known to be hard to choose (as discussed above) when it is not given by external constraints. Another limitation is that the algorithm cannot be used with arbitrary distance functions or on non-numerical data. For these use cases, many other algorithms are superior.
k-means clustering has been used as a feature learning (or dictionary learning) step, in either (semi-)supervised or unsupervised learning. The basic approach is first to train a k-means clustering representation using the input training data (which need not be labelled). Then, to project any input datum into the new feature space, an "encoding" function is applied: for example, the thresholded matrix product of the datum with the centroid locations, an indicator function for the nearest centroid, or some smooth transformation of the distance to each centroid. Alternatively, transforming the sample-cluster distance through a Gaussian RBF yields the hidden layer of a radial basis function network.
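The encodings mentioned above can be sketched as follows. This is an illustrative sketch only: the function name, mode names, and the RBF bandwidth are assumptions, not a standard API.

```python
import numpy as np

def kmeans_encode(X, centroids, mode="onehot"):
    """Project data into a k-means feature space."""
    # Distance from each datum to each centroid, shape (n, k).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    if mode == "onehot":
        # Indicator function for the nearest centroid.
        f = np.zeros_like(d)
        f[np.arange(len(X)), d.argmin(axis=1)] = 1.0
        return f
    if mode == "triangle":
        # Soft threshold max(0, mean distance - distance): a smooth
        # transformation of the distance that keeps several nonzero features.
        mu = d.mean(axis=1, keepdims=True)
        return np.maximum(0.0, mu - d)
    if mode == "rbf":
        # Gaussian RBF of the sample-cluster distance: the hidden layer
        # of a radial basis function network. Bandwidth is an assumption.
        gamma = 1.0
        return np.exp(-gamma * d ** 2)
    raise ValueError(mode)
```

The one-hot encoding makes each datum contribute to exactly one feature, which is why this approach tends to need more data than denser representations.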
This use of k-means has been successfully combined with simple, linear classifiers for semi-supervised learning in NLP (specifically for named entity recognition) and in computer vision. On an object recognition task, it was found to exhibit performance comparable to more sophisticated feature learning approaches such as autoencoders and restricted Boltzmann machines. However, it generally requires more data for equivalent performance, because each data point contributes to only one "feature".
The following implementations are available under Free/Open Source Software licenses, with publicly available source code.
- Accord.NET contains C# implementations for k-means, k-means++ and k-modes.
- ALGLIB contains parallelized C++ and C# implementations for k-means and k-means++.
- AOSP contains a Java implementation for k-means.
- CrimeStat implements two spatial k-means algorithms, one of which allows the user to define the starting locations.
- ELKI contains k-means (with Lloyd and MacQueen iteration, along with different initializations such as k-means++) and various more advanced clustering algorithms.
- Julia contains a k-means implementation in the JuliaStats Clustering package.
- KNIME contains nodes for k-means and k-medoids.
- Mahout contains a MapReduce based k-means.
- mlpack contains a C++ implementation of k-means.
- Octave contains k-means.
- OpenCV contains a k-means implementation.
- Orange includes a component for k-means clustering with automatic selection of k and cluster silhouette scoring.
- PSPP contains k-means; the QUICK CLUSTER command performs k-means clustering on a dataset.
- R contains three k-means variations.
- SciPy and scikit-learn contain multiple k-means implementations.
- Spark MLlib implements a distributed k-means algorithm.
- Torch contains an unsup package that provides k-means clustering.
- Weka contains k-means and x-means.
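As a usage illustration, the scikit-learn implementation listed above can be exercised in a few lines on synthetic data (the toy blobs are an assumption for the example):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated groups of 2-D points.
X = np.vstack([rng.normal(0, 0.5, (20, 2)),
               rng.normal(5, 0.5, (20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.predict(X)          # cluster index for each point
centers = km.cluster_centers_   # learned centroids, shape (2, 2)
```

`n_init=10` restarts the algorithm ten times from different initializations and keeps the best result, mitigating Lloyd's-algorithm local optima.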