Clustering Methods (5 cp) 3621552


Course description

Clustering is a basic tool used in data analysis, pattern recognition and data mining for finding groups in data. The main challenge of clustering is to define a cost function, which is then optimized by an algorithm. We consider several cost functions and algorithms for the problem, and study how to determine the number of clusters. Numerical, categorical, text and graph data are considered. Practical clustering methods also need to handle outliers and missing data.
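As a rough illustration of the "cost function plus optimizing algorithm" view described above, the following minimal sketch uses the sum of squared errors (SSE) as the cost function and plain k-means as the optimizer. This pairing is only one common choice and is an assumption made here for illustration; the function names (sse, nearest, kmeans) and the parameters (iterations, seed) are hypothetical and are not taken from the course material.

```python
import random


def sse(points, centroids, labels):
    """Cost function: sum of squared Euclidean distances from each point
    to the centroid of the cluster it is assigned to."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Illustrative k-means: alternate centroid updates and reassignments.
    Points are tuples of numbers; initial centroids are k random points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
        # Assignment step: map every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # partition is stable, so the cost cannot decrease further
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(data, k=2)
    print(labels, sse(data, centroids, labels))
```

Each of the two steps can only lower the SSE or leave it unchanged, which is why the loop stops once the partition no longer changes. The result is a local optimum that depends on the initial centroids.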

The course will be arranged as a series of video lectures. The recordings will be made publicly during the scheduled lecture times. The videos are stored on YouTube, and participants can watch them at any time. Questions and discussions will take place immediately after each recording and during the exercise sessions.

Intro

Lectures

Teacher: Pasi Fränti
Schedule: 28 h, starting from 24.1.2017
Tuesday 14-16 (D106 / F213)
Lectures on YouTube

Lecture 1: 24.1. Introduction (ppt)
Lecture 2: 14.3. Objective functions (ppt)
Lecture 3: 21.3. Clustering text and web pages (ppt)
Lecture 4: 25.4. Fast nearest neighbor searches in high dimensions (ppt) (pdf)
Lecture 5: 10.5. Number of clusters (ppt) (pdf)
Lecture 6: 16.5. Outliers (ppt) (pdf)
Lecture 7: 23.5. Divisive algorithms (ppt) (pdf)

Video lecture 1: Random Swap (ppt) (pdf)
Video lecture 2: Centroid Index (ppt) (pdf)
Video lecture 3: K-means properties (ppt) (pdf)
Video lecture 4: Fast K-means (ppt) (pdf)
Video lecture 5: Agglomerative clustering (to appear)

Exercises

Mondays 14-16 (D106 / F213):

Exercise 1: 30.1.
Exercise 2: 20.2.
Exercise 3: 20.3.
Exercise 4: 27.3.
Exercise 5: 10.4.    Tasks selected
Exercise 6: 8.5.
Exercise 7: 22.5.

Lecture notes and material from 2014
Supplementary material from 2012

Preliminary knowledge

Design & Analysis of Algorithms

Exams

24.5. 12-16, Room OTS 100 (Joensuu), Room F 211 (Kuopio)
16.6. 12-16, Room OTS 100 (Joensuu), Room CA 101 (Kuopio)

Links

Clusterator
Animator
Visualization software