Butina cluster
Clustering has been described as 'the art of finding groups in data' and is widely used within the pharmaceutical industry to design representative sets of compounds. The most common uses of representative sets are as training sets for developing structure-activity models and as compound sets for biological screening. In both cases, the cluster centroid is assumed to be a good representative of its cluster. It is therefore important to be able to create homogeneous clusters in a consistent way and to handle both small and very large sets equally well. The Butina clustering approach takes the desired similarity within a cluster, as defined by the Tanimoto index, as the only input to the clustering program.
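To illustrate how a single similarity threshold can drive the whole procedure, the following is a minimal sketch of a centroid-based clustering of this kind in plain Python. It rests on several assumptions that are not stated above: fingerprints are represented as sets of "on" bit positions, each item's neighbours are all items at or above the threshold, and candidate centroids are visited in order of decreasing neighbour count. The names `tanimoto` and `butina_cluster` are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # assumption: two empty fingerprints are treated as dissimilar
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_cluster(fps: Sequence[FrozenSet[int]], threshold: float) -> List[List[int]]:
    """Group fingerprint indices into clusters; the first index of each is the centroid."""
    n = len(fps)

    # Build the neighbour list of every item: all others at or above the threshold.
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbours[i].append(j)
                neighbours[j].append(i)

    # Visit candidates with the most neighbours first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))

    assigned = [False] * n
    clusters: List[List[int]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        # The centroid claims every neighbour that is still unassigned.
        members = [centroid] + [j for j in neighbours[centroid] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)  # items with no free neighbours become singletons
    return clusters


# Toy fingerprints (hypothetical data, not real molecules).
example_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 3, 6}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({20}),
]
print(butina_cluster(example_fps, threshold=0.55))
print(butina_cluster(example_fps, threshold=0.45))
```

In this sketch, lowering the threshold merges more items into each cluster, while raising it produces tighter clusters and more singletons. The pairwise loop is quadratic in the number of items, so very large sets would need a more economical neighbour search than the one shown here.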