k-means clustering
Partitioning n
observations into k
clusters, in which each observation belongs to the cluster with the nearest mean
- NP-hard
- exist heuristic algorithms, that provide quicker computation, for an approximation
- tends to form clusters of comparable spatial extent
- expectation-maximization mechanism allows for different shapes
- aim to minimize the sum of square in each cluster
Start with a set of k means m1 ... mk.
- Assign observation to the cluster whose mean point is nearest
- Update new means to be the centroids of the observations in the new cluster