Clustering datasets

Image data
Bridge (bridge.pgm, 256x256): 4096 vectors, 16-d
4x4 pixel blocks  ts  txt
4x4 binarized pixel blocks  ts  txt
4x4 pixel blocks: 25% randomly sampled (for training)  ts  txt
4x4 pixel blocks: 75% randomly sampled (for testing)  ts  txt
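
The vector count above follows from the block size: a 256x256 image holds 64 x 64 = 4096 non-overlapping 4x4 blocks, each flattened to 16 values. A minimal sketch of that split, assuming a grayscale image held in a NumPy array; the random image stands in for the real file, and the function name and the 25%/75% sampling procedure are illustrative, not taken from the dataset authors:

```python
import numpy as np

def image_to_blocks(image: np.ndarray, block: int = 4) -> np.ndarray:
    """Split a 2-D image into non-overlapping block x block tiles,
    returning one flattened vector per tile."""
    rows, cols = image.shape
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    tiles = image.reshape(rows // block, block, cols // block, block)
    # Reorder axes to (tile_row, tile_col, y, x), then flatten each tile.
    return tiles.transpose(0, 2, 1, 3).reshape(-1, block * block)

# Stand-in for the 256x256 image: random pixels, not the real data.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

vectors = image_to_blocks(image)
print(vectors.shape)  # (4096, 16)

# One possible random 25% / 75% split (the actual sampling is not documented here).
perm = rng.permutation(len(vectors))
n_train = len(vectors) // 4
train, test = vectors[perm[:n_train]], vectors[perm[n_train:]]
print(train.shape, test.shape)  # (1024, 16) (3072, 16)
```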
House (house.ppm, 256x256): 34112 vectors, 3-d
RGB-values, quantized to 5 bits per color  ts  txt
RGB-values, 8 bits per color  ts  txt
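
Quantizing to 5 bits per color can be sketched as dropping the three least significant bits of each 8-bit channel, which maps values 0-255 onto 0-31. This is one common way to do it; whether the published file was produced by truncation or by rounding is an assumption, and the pixel array below is random stand-in data:

```python
import numpy as np

def quantize_rgb(pixels: np.ndarray, bits: int = 5) -> np.ndarray:
    """Keep only the `bits` most significant bits of each 8-bit channel."""
    return pixels >> (8 - bits)

# Stand-in RGB pixels, one row per pixel (not the real House image).
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(256 * 256, 3), dtype=np.uint8)

quantized = quantize_rgb(pixels)
print(quantized.min(), quantized.max())  # values lie in 0..31
```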
Miss America (missa001.pgm, 360x288): 6480 vectors, 16-d
4x4 pixel blocks from the difference image of frame 1 and 2  ts  txt
4x4 pixel blocks from the difference image of frame 2 and 3  ts  txt
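
A difference image is the per-pixel difference of two consecutive frames; cutting it into 4x4 blocks gives (288 / 4) x (360 / 4) = 6480 vectors. The sketch below assumes signed differences (later frame minus earlier frame) with no offset, which may differ from how the published files were encoded; the frames are random stand-ins:

```python
import numpy as np

def difference_blocks(frame_a: np.ndarray, frame_b: np.ndarray, block: int = 4) -> np.ndarray:
    """Return flattened block x block tiles of the signed difference frame_b - frame_a."""
    # Widen to int16 first so uint8 subtraction cannot wrap around.
    diff = frame_b.astype(np.int16) - frame_a.astype(np.int16)
    rows, cols = diff.shape
    tiles = diff.reshape(rows // block, block, cols // block, block)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, block * block)

# Stand-in frames with 288 rows and 360 columns (not the real sequence).
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(288, 360), dtype=np.uint8)
frame2 = rng.integers(0, 256, size=(288, 360), dtype=np.uint8)

vectors = difference_blocks(frame1, frame2)
print(vectors.shape)  # (6480, 16)
```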
 
Birch-sets

Synthetic 2-d data with 100 000 vectors and 100 clusters.

Zhang et al., "BIRCH: A new data clustering algorithm and its applications", Data Mining and Knowledge Discovery, 1 (2), 141-182, 1997.
 
Birch1: Clusters in a regular grid structure  ts  txt
Birch2: Clusters along a sine curve  ts  txt
Birch3: Randomly sized clusters in random locations  ts  txt
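
To make the grid layout concrete, here is a rough sketch that places 100 Gaussian clusters on a 10 x 10 grid with 1000 points each, giving 100 000 vectors. The spacing, standard deviation and the function itself are hypothetical choices for illustration and are not the parameters used to generate the published Birch1 file:

```python
import numpy as np

def grid_clusters(side=10, per_cluster=1000, spacing=10.0, sigma=1.0, seed=0):
    """Generate side*side Gaussian clusters whose centers form a regular grid."""
    rng = np.random.default_rng(seed)
    gx, gy = np.meshgrid(np.arange(side), np.arange(side))
    centers = np.column_stack([gx.ravel(), gy.ravel()]) * spacing
    # Each cluster index is repeated once per point that belongs to it.
    labels = np.repeat(np.arange(side * side), per_cluster)
    points = centers[labels] + rng.normal(scale=sigma, size=(labels.size, 2))
    return points, labels, centers

points, labels, centers = grid_clusters()
print(points.shape, centers.shape)  # (100000, 2) (100, 2)
```

Replacing the grid of centers with points sampled along a sine curve, or drawing `sigma` per cluster, would give layouts in the spirit of the other two sets.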
 
S-sets
Synthetic 2-d data with 5000 vectors and 15 Gaussian clusters with varying degrees of cluster overlap.

P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.

S1:  ts  txt
S2:  ts  txt
S3:  ts  txt
S4:  ts  txt

Source and labels:  zip
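
A minimal loading sketch for the txt variants, assuming each line holds one vector as whitespace-separated numbers (an assumption about the file layout; the ts format is not covered here). The in-memory sample uses made-up values rather than real dataset rows:

```python
import io
import numpy as np

def load_vectors(source) -> np.ndarray:
    """Read whitespace-separated numeric rows into an (n_vectors, n_dims) array."""
    return np.loadtxt(source, ndmin=2)

# Made-up stand-in for a downloaded txt file; a file path works the same way.
sample = io.StringIO("1 2\n3 4\n5 6\n")
data = load_vectors(sample)
print(data.shape)  # (3, 2)
```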
 
A-sets
Synthetic 2-d data with varying number of clusters and vectors.

A1: 3000 vectors, 20 clusters  ts  txt
A2: 5250 vectors, 35 clusters  ts  txt
A3: 7500 vectors, 50 clusters  ts  txt
   
 
Dim-sets
Synthetic data with Gaussian clusters in multi-dimensional space.
1351-10126 vectors, 2-d to 15-d

ts  txt
DIM-sets (other)
Dim-sets.

DIM032: 1024 vectors, 16 clusters, 32 dimensions  ts  txt
DIM064: 1024 vectors, 16 clusters, 64 dimensions  ts  txt
DIM128: 1024 vectors, 16 clusters, 128 dimensions  ts  txt
DIM256: 1024 vectors, 16 clusters, 256 dimensions  ts  txt
DIM512: 1024 vectors, 16 clusters, 512 dimensions  ts  txt
DIM1024: 1024 vectors, 16 clusters, 1024 dimensions  ts  txt
 
 
KDDCUP04Bio set
KDDCUP04Bio biology dataset.
145751 vectors, 2000 clusters, 74 dimensions

KDDCUP04Bio:  ts  txt
Thyroid set
Thyroid dataset.
215 vectors, 2 clusters, 5 dimensions

Thyroid:  ts  txt
Wine set
Wine dataset.
178 vectors, 3 clusters, 13 dimensions

Wine:  ts  txt
Yeast set
Yeast dataset.
1484 vectors, 10 clusters, 8 dimensions

Yeast:  txt
Yeast_times100:  ts  txt
Breast-cancer-Wisconsin set
Breast-cancer-Wisconsin dataset.
699 vectors, 2 clusters, 9 dimensions

Breast:  ts  txt
g2 sets
Gaussian clusters dataset.
1024 vectors per cluster, 2 clusters, 1-1024 dimensions, variance 10-100

g2:  ts files in zip file (53MB)
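
As a rough illustration of this kind of data, the sketch below draws two Gaussian clusters of 1024 vectors each in a chosen dimension. The cluster centers are hypothetical, and the listing does not say whether "variance" means a true variance or a standard deviation; the `spread` parameter here is used as a standard deviation, so treat it as an assumption:

```python
import numpy as np

def two_gaussians(dim: int, spread: float, per_cluster: int = 1024, seed: int = 0):
    """Draw two isotropic Gaussian clusters in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    # Hypothetical centers: the origin and a point 100 units away on every axis.
    centers = np.stack([np.zeros(dim), np.full(dim, 100.0)])
    labels = np.repeat([0, 1], per_cluster)
    points = centers[labels] + rng.normal(scale=spread, size=(labels.size, dim))
    return points, labels

points, labels = two_gaussians(dim=2, spread=30.0)
print(points.shape)  # (2048, 2)
```

Increasing `spread` relative to the distance between the centers makes the two clusters overlap more, which is what makes such sets useful for testing how clustering methods degrade.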

Related links