User:NikosA/About Clustering

From GRASS-Wiki

A draft page about clustering, and about i.cluster's similarities to and differences from other well-known clustering algorithms.

Overview

Basic Definitions

  • What is clustering? Clustering is essentially grouping objects based on their observed similar properties. In the field of digital image classification, the objects are the pixels (or observations) of an image. In geospatial terminology, the objects are the cells of a raster map.

What is i.cluster?

From the manual

  • i.cluster is the program that generates the spectral signatures for the land cover types in the image using a clustering algorithm.
  • It results in a signature file that is used as input for the second pass program i.maxlik.
  • The clustering algorithm operates by reading through the imagery data and then building pixel clusters based on the spectral reflectances of the pixels.
  • The spectral distributions of the clusters (which will be the land cover spectral signatures) are influenced by six parameters set by the user.
  • The first parameter set by the user is the initial number of clusters to be discriminated.
  • i.cluster starts by generating spectral signatures for this number of clusters and tries to end up with this number of clusters during the clustering process.
  • The resulting number of clusters and their spectral distributions, however, are also influenced by the range of the spectral values in the image and the other parameters set by the user.
  • These parameters are: the minimum cluster size, minimum cluster separation, the percent convergence, the number of iterations, and the row and column sampling interval.
  • The cluster spectral signatures that result are composed of cluster means and covariance matrices.
  • These cluster means and covariance matrices are used in the second pass program i.maxlik to classify the image.
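
To make "cluster means and covariance matrices" concrete, here is a minimal sketch of how such a signature could be computed from the cells of one cluster. It is not the code of i.cluster; the function name cluster_signature, the sample values, and the choice of the n - 1 divisor are assumptions made for illustration only.

```python
def cluster_signature(pixels):
    """Return (mean vector, covariance matrix) for a list of per-cell band tuples."""
    n = len(pixels)
    bands = len(pixels[0])
    # Mean of each band over all cells of the cluster.
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    # Sample covariance between every pair of bands (divisor n - 1 is an assumption).
    cov = [
        [
            sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1)
            for j in range(bands)
        ]
        for i in range(bands)
    ]
    return mean, cov


# Hypothetical two-band values for the cells of a single cluster.
cells = [(10, 20), (12, 22), (14, 24)]
mean, cov = cluster_signature(cells)
print(mean)  # mean vector, one entry per band
print(cov)   # bands x bands covariance matrix
```

A signature file would hold one such mean vector and covariance matrix per cluster.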


Moritz Lennert's notes

  • i.cluster does not cluster all pixels, but only a sample (see parameter 'sample').
  • The result of that clustering is not that all pixels are assigned to a given cluster, but only that you have signatures that are "representative" of a given cluster.
  • If you run i.cluster on the same data asking for the same number of classes, but with different sample sizes, you will probably get slightly different signatures for each cluster at each run.
  • In the second step, you use i.maxlik to then assign each pixel to one of the clusters / classes created by i.cluster.
  • Labelling is actually a third step in that process.
  • So, i.cluster is used for creating signatures of representative classes; it does not allow you to produce a raster layer indicating, for each pixel, the cluster it is assigned to.
  • However, i.maxlik does not use the same algorithm to assign pixels to clusters / classes as ISODATA. So the result is not exactly the same.
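
The two-step idea in the notes above can be sketched roughly as follows: only a sample of cells is read for clustering, while a separate maximum-likelihood step assigns every cell to a class. This is an illustrative sketch, not the code of i.cluster or i.maxlik. The function names, the sampling offset (starting at the first row and column), the two-band restriction, and the signature values are all assumptions.

```python
import math


def sample_cells(grid, row_step, col_step):
    """Keep every row_step-th row and col_step-th column (offset 0 is an assumption)."""
    return [cell for row in grid[::row_step] for cell in row[::col_step]]


def log_likelihood(pixel, mean, cov):
    """Gaussian log-likelihood of a two-band pixel, up to a class-independent constant."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (maha + math.log(det))


def classify(pixel, signatures):
    """Return the name of the signature under which the pixel is most likely."""
    return max(signatures, key=lambda name: log_likelihood(pixel, *signatures[name]))


# Hypothetical 4x4 raster with two bands per cell.
grid = [[(r * 10, c * 10) for c in range(4)] for r in range(4)]
sample = sample_cells(grid, 2, 2)  # only these cells would feed the clustering step

# Hypothetical signatures (mean vector, covariance matrix) from a clustering step.
signatures = {
    "dark": ([10, 20], [[4, 1], [1, 4]]),
    "bright": ([100, 120], [[9, 2], [2, 9]]),
}
labels = [classify(p, signatures) for p in [(12, 19), (98, 123)]]
print(len(sample), labels)
```

Changing the sampling steps changes which cells are seen during clustering, which is why the signatures can differ slightly between runs with different sample sizes.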

What is not...

Various notes to be sorted...

  • i.cluster is not an implementation of the ISODATA clustering algorithm

Other Clustering Algorithms

Similarities with...

  • i.cluster is a modification of the K-means clustering algorithm
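
For reference, plain K-means can be sketched as below. This shows only the textbook algorithm, not i.cluster's modified variant: it has no minimum cluster size, no minimum cluster separation, and no convergence percentage. The function names, the initial centres, and the sample points are assumptions for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centres, iterations=10):
    """Plain K-means: assign each point to its nearest centre, then recompute centres."""
    groups = [[] for _ in centres]
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda k: sq_dist(p, centres[k]))
            groups[nearest].append(p)
        # New centre = mean of the group; an empty group keeps its old centre.
        new_centres = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
        if new_centres == centres:  # nothing moved, so the assignment is stable
            break
        centres = new_centres
    return centres, groups


# Hypothetical two-band cell values forming two obvious groups.
points = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 9), (9, 10)]
centres, groups = kmeans(points, [(0, 0), (10, 10)])
print(centres)
print(groups)
```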

Differences with...

Various notes, to be sorted...

i.cluster vs ISODATA

  • i.cluster expects at least two input variables (maps), whereas ISODATA can also operate on a single variable (map).
  • ...
  • ...
  • ISODATA, on the other hand, clusters all pixels and thus already assigns each pixel to a given cluster / class, without going through the i.maxlik phase.

Discussions about i.cluster