Winter workshop on
Data Mining and Pattern Recognition


March 4-6, 2013
Mekrijärvi Research Station, Ilomantsi, Finland

NEW Pictures from the workshop


Updated 18.3.2013

Monday: Workshop day 1

Tuesday: Workshop day 2

Wednesday: ECSE Progress Support Meeting


Cut-based clustering algorithms, Pasi Fränti
Cut-based methods are divisive algorithms that solve the clustering hierarchically by a sequence of splits. We first discuss the role of the cutting, which is based either on a local criterion or on a global optimization function. We then study how to find a high-quality clustering with an efficient algorithm regardless of the cost function used.

Kernel-based clustering, Tapio Pahikkala
The maximum margin clustering problem is an extension of the concept of support vector machine (SVM) classifiers and their variations to the unsupervised learning setting. While it is appealing from a practical point of view, it leads to a combinatorial optimization problem that is difficult to address. This is especially the case for the multi-class scenario. We present an efficient unsupervised extension of a well-known variation of the SVM classifier, namely the multi-class regularized least-squares classifier. The method combines steepest descent search with a powerful meta-heuristic for avoiding local minima with inferior clustering performance. The computational efficiency of the combinatorial searches is ensured through matrix algebraic optimization. In the experiments, the method significantly outperforms existing approaches in terms of clustering accuracy.

Solving ranking problems, Antti Airola
The problem of learning to rank has attracted considerable attention in the machine learning community over the past decade. The most common approach to solving these problems has been to model them as pairwise classification problems, resulting in methods such as the widely used ranking support vector machine (RankSVM) algorithm. However, straightforward implementations of this approach result in inefficient algorithms that scale quadratically or worse with respect to the amount of available training data. We propose an algorithm that, using cutting-plane optimization techniques and self-balancing binary search trees, allows training the linear RankSVM in O(n*log(n)) time. The method easily scales to millions of data points.
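As a rough illustration of why pairwise ranking quantities need not cost quadratic time, the sketch below counts misordered pairs with a merge sort in O(n*log(n)) and compares the result with a naive O(n^2) count. It is not the algorithm of the talk, which relies on cutting-plane optimization and self-balancing search trees. The function names are hypothetical, and the sketch assumes that all labels and all scores are distinct.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of pairs i < j with seq[i] > seq[j])."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def misordered_pairs(scores, labels):
    """Count pairs ranked in the wrong order, in O(n log n) time.

    Assumption: labels and scores are all distinct (no ties).
    """
    order = sorted(range(len(labels)), key=lambda k: labels[k])
    _, inversions = count_inversions([scores[k] for k in order])
    return inversions


def misordered_pairs_naive(scores, labels):
    """Same count by checking every pair, in O(n^2) time."""
    n = len(labels)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if labels[i] < labels[j] and scores[i] > scores[j]
    )
```

Both functions return the same count; only the merge-sort version stays practical when the number of examples reaches the millions.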

Segmentation of image series, Valery Grishkin
Recognizing objects in a series of images requires segmenting these objects in each image of the series. In this case, automatic segmentation methods do not produce satisfactory results. The best segmentation results are usually obtained with interactive algorithms. However, for large series of images, interactive methods are impractical because the segmentation process cannot be fully automated. The paper proposes a combined method that applies an interactive algorithm once to the base shots and propagates the results to the rest of the images. The other images in the series are transformed to the base image. The objects in these images are segmented using masks obtained by processing the base image. Then, if necessary, the reverse transform can be carried out.

Multivariate mixed-effects models for reflectance of forest trees, Lauri Mehtätalo
Multiple overlapping images with several objects per image provide a dataset with two crossed groupings: on the one hand, there are multiple objects per image, and on the other hand, multiple observations per object. The forest tree data consist of multiple overlapping aerial images containing visual reflectance on four channels: red, green, blue and NIR. We apply mixed-effects models to classify the species of the trees. We first partition the data into sunlit and self-shaded parts, and then use the Mahalanobis distance to perform species classification of the obtained partitions.

Set-matching based comparison of clustering results, Mohammed Rezaei
Finding a good measure for comparing clustering results is vital in clustering evaluation. A good measure should behave consistently under changes in the size of the data and in the number and size of the clusters. External validity indexes measure how well a clustering result matches the ground truth. The indexes can be categorized into pair-counting, set-matching and information-theoretic measures. We study the state-of-the-art indexes in each group and focus on set-matching indexes, providing two new indexes in this group.
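To make the set-matching idea concrete, here is a minimal sketch of purity, a classical measure that matches each cluster to the ground-truth class it overlaps most. It is shown only as a familiar example of the category and is not one of the two new indexes of the talk. The function name and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def purity(predicted, truth):
    """Fraction of points that fall in the majority class of their cluster."""
    overlap = {}
    for cluster, cls in zip(predicted, truth):
        overlap.setdefault(cluster, Counter())[cls] += 1
    # Match every cluster to its best-overlapping ground-truth class.
    matched = sum(counts.most_common(1)[0][1] for counts in overlap.values())
    return matched / len(truth)


# Hypothetical toy data: one point of class "b" ends up in cluster 0.
predicted = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
print(purity(predicted, truth))
```

Purity also shows why consistency matters: putting every point in its own cluster gives a perfect score of 1, so the measure is not consistent with changes in the number of clusters.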

SciFest science festivals 2007-2012 analyzed, Ilkka Jormanainen
SciFest is an annual science and technology festival where school students get hands-on experience of science in concrete workshop activities. The festival is organized in Joensuu in collaboration between Joensuu Science Society and the University of Eastern Finland, and it attracts about 10000 visitors every year. In this presentation, I will present basic statistics and key figures of the festival between 2007 and 2012. We will explore what kind of clusters the festival program forms and how the content has changed over the years. During the presentation, we will also discuss with the audience which data mining techniques would be suitable, for example, for automatically generating program packages according to teachers' professional background and interests.

A simple voice activity detector for speaker verification, Tomi Kinnunen
A voice activity detector (VAD) plays a vital role in robust speaker verification, where energy VAD is most commonly used. Energy VAD works well in noise-free conditions but deteriorates in noisy conditions. One way to tackle this is to introduce speech enhancement preprocessing. We study an alternative, likelihood-ratio-based VAD that trains speech and nonspeech models on an utterance-by-utterance basis from mel-frequency cepstral coefficients (MFCCs). The training labels are obtained from enhanced energy VAD. As the speech and nonspeech models are re-trained for each utterance, minimal assumptions about the background noise are made. According to both VAD error analysis and speaker verification results with a state-of-the-art i-vector system, the proposed method outperforms energy VAD variants by a wide margin.
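For readers unfamiliar with the baseline, a minimal sketch of an energy VAD follows. It is only the simple reference technique that the abstract mentions, not the proposed likelihood-ratio VAD. The frame length, the threshold of 30 dB below the loudest frame and the function name are assumptions chosen for illustration, and the input is assumed to contain at least one full frame.

```python
import math


def energy_vad(samples, frame_len=4, threshold_db=-30.0):
    """Mark a frame as speech if its energy is within threshold_db of the loudest frame.

    Assumptions: non-overlapping frames, at least one full frame in `samples`.
    """
    frames = [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    # Mean energy per frame in decibels; the small constant avoids log(0).
    energies_db = [
        10.0 * math.log10(sum(x * x for x in frame) / frame_len + 1e-12)
        for frame in frames
    ]
    peak_db = max(energies_db)
    return [energy > peak_db + threshold_db for energy in energies_db]


# Hypothetical toy signal: quiet, loud, quiet.
signal = [0.001] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.001] * 4
print(energy_vad(signal))  # [False, True, False]
```

Because the decision depends only on frame energy, loud background noise is labelled as speech just as readily as a voice, which is the weakness in noisy conditions that the abstract describes.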

Phase estimation for single-channel speech enhancement, Rahim Saeidi
Most, if not all, speech enhancement algorithms are designed so that, given the noisy observation, they estimate an amplitude spectrum that is as close as possible to that of the clean signal. This talk presents the results of our recent study on the impact of exploiting spectral phase information to further improve the speech quality of single-channel speech enhancement algorithms. Our experiments confirm the usefulness of phase estimation for minimum mean square error (MMSE) estimation of the spectral amplitude as well as in the last stage of signal reconstruction.

Automatic facial age estimation, Mohamed Eldib
Estimating human age from images is a problem that has recently gained attention from the computer vision community due to its numerous applications as well as the challenges that stand in the way of a satisfactory solution. Besides the traditional challenges of facial images captured under uncontrolled settings, such as different lighting and varying poses and expressions, the effects of aging on appearance depend on many other factors such as lifestyle. In this thesis, a new automatic age estimation framework is proposed. A single image of the subject of interest is required as input to estimate the subject's age. The framework is composed of three main modules: 1) the core system module; 2) the enhancement module; and 3) the application module.

Evaluating confidence of a fingermark general pattern by Bayesian network, Rudolf Haraksim
We have developed two Bayesian networks for human-assisted fingerprint examination, one working at the finger level and the other at the person level. The importance of the fingermark general pattern comes from its capability to exclude false matches from the process. The two networks model the dependencies between different types of using likelihood ratios. These can be further combined with other types of evidence, for example the minutiae configurations.

Studies on phase based features for speech processing, Padmanabhan Rajan
The Fourier transform is a complex-valued function, and thus has both magnitude and phase information. Most features used in speech processing use information from the magnitude. In this talk, I will present some recent experiments we have conducted using features derived from the phase.
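The split into magnitude and phase can be shown with a small sketch that computes a discrete Fourier transform directly from its definition. The test signal, a cosine with a phase offset of pi/4, and the helper name `dft` are assumptions made for illustration; a practical system would use an FFT library instead.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform computed directly from the definition, O(n^2)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


# Hypothetical test signal: one cosine cycle over 8 samples, shifted by pi/4.
signal = [math.cos(2 * math.pi * t / 8 + math.pi / 4) for t in range(8)]
spectrum = dft(signal)

magnitude = [abs(c) for c in spectrum]      # what most speech features use
phase = [cmath.phase(c) for c in spectrum]  # what phase-based features use

print(round(magnitude[1], 6), round(phase[1], 6))
```

In bin 1 the magnitude is 4 (half the number of samples) whatever the offset, while the phase recovers the offset pi/4. That offset is exactly the information that magnitude-only features discard.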

Source device identification from speech signals, Cemal Hanilci
Developments in digital technology bring many challenging problems with them. In many court cases, digital materials produced by sophisticated hardware and software tools are presented as evidence. The forensic examiners are asked to prepare a report that describes the information the digital evidence contains. Thus, tracing the source device that generated the digital material is essential for the investigation. In this presentation, we will show that speech signals contain information about the source device, which enables us to identify the source from the speech signals.

Fusing visual and tactile information for 3-D object reconstruction, Jarmo Ilonen
Manipulation and grasping of unknown objects is a great challenge in service robotics. One reason for this is that in many real-world scenarios, grasping is only the first step in a sequence of more complex object manipulation actions, and planning them requires a 3-D model of an object. An additional problem is that 3-D reconstruction from a single viewpoint is only possible with additional assumptions on object shape. We present a method for constructing a 3-D model for a symmetric object by fusion of visual and tactile information while the object is grasped.

Image preprocessing methods for retinal imaging, Lasse Lensu
Retinal imaging poses a few challenges when the purpose is to characterise and calibrate the imaging system and to estimate the true quantities of substances related to eye disease lesions. When carried out properly, the preprocessed image information together with appropriate expert ground truth enables early detection of lesions using, for example, statistical pattern recognition methods. Here we discuss the problems related to characterising and calibrating the imaging system and the current solutions for the task.

Inter-channel registration of spectral retinal images, Lauri Laaksonen
While spectral images provide significant additional information compared to grayscale and RGB images, the inter-channel registration of spectral retinal images presents many additional challenges. Several feature-based approaches to registering retinal images will be presented along with methods for comparing their performance. In addition to pair-wise registration, the approaches for registering over a whole set of images will be considered.

World-wide location-based search using OpenStreetMap, Andrei Tabarcea
Most of the location-based data on the web is found in indirect forms such as postal addresses, IP addresses or plain descriptions in natural language. We propose to detect locations by identifying the postal addresses found in web pages. This is done by post-processing the results provided by the API of a conventional search engine. A first prototype, which works for Finland, has been demonstrated, and the new challenge is to extend the search to the whole world by using publicly available OpenStreetMap data.

A balanced clustering method, Mikko Malinen
Sometimes, a balanced clustering result is desirable. In balanced clustering, the points are evenly distributed into clusters. We present a clustering method in which the cost function is formed by taking the total squared errors of the clusters multiplied by the number of points in the corresponding clusters. We show that this problem is NP-hard and present a polynomial-time approximate algorithm. Experiments show that 72% of the clustering runs give a more balanced result with the proposed method than with the k-means method.
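The cost function described above can be sketched as follows. The sketch assumes that the total squared error of a cluster is measured from the cluster mean, which the abstract does not state explicitly. The function name and the toy data are hypothetical, and only the cost is evaluated here, not the approximate algorithm.

```python
def balanced_cost(points, labels):
    """Sum over clusters of (number of points) * (total squared error).

    Assumption: the squared error is measured from the cluster mean.
    `points` is a list of equal-length tuples, `labels` the cluster of each point.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    cost = 0.0
    for members in clusters.values():
        size = len(members)
        dim = len(members[0])
        mean = [sum(p[d] for p in members) / size for d in range(dim)]
        squared_error = sum(
            (p[d] - mean[d]) ** 2 for p in members for d in range(dim)
        )
        cost += size * squared_error
    return cost


# Hypothetical one-dimensional toy data.
points = [(0.0,), (2.0,), (10.0,)]
print(balanced_cost(points, [0, 0, 1]))  # 4.0
print(balanced_cost(points, [0, 1, 1]))  # 64.0
```

Multiplying by the cluster size makes large clusters more expensive than under the plain k-means cost, which pushes the optimum towards clusters of similar size.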

Overcoming the complexity of software product management, Andrey Maglyas
This presentation describes an approach to overcoming the complexity of software product management (SPM). It will be based on several studies that investigate the activities and roles in product management, as well as issues related to the adoption of software product management. Product management processes and practices in 13 companies were studied and analysed. Moreover, collected data were analysed with the Grounded Theory method to identify the possible ways to overcome the complexity of SPM. Elements of the Theory of Constraints were used for deeper data

Image analysis for pulpmaking process measurement, Natalia Strokina
To optimize the production processes of pulp and papermaking, we search for solutions to assess and control product quality. We study machine vision methods at the wet stage of papermaking, where the paper products are only being formed. We study dirt particle classification, fiber characterization, pulp flow characterization, and oxygen volume estimation. We search for solutions both to these sub-problems and to the entire process.

Eye tracking for surgical microscope, Shahram Eivazi
The goal of my study is to investigate the possibility of applying an intelligent system in micro-neurosurgery. The system is based on behavioral inputs, particularly eye movements. It can also be used for future command and control applications in micro-neurosurgery. To achieve the goal, I aim to record the eye movements of neurosurgeons in the operating room by integrating a binocular eye tracker into an operating microscope. In this talk, I will present the first prototype of the eye tracker.

Computational approaches to visual attention for interaction inference, Hana Vrzakova
Many aspects of interaction are hard to directly observe and measure. My research focuses on particular aspects of UX such as cognitive workload, problem solving or engagement, and establishes computational links between them and visual attention. Using machine learning and pattern recognition techniques, I aim to achieve automatic inferences for HCI and employ them as enhancements in gaze-aware interfaces.


Free accommodation will be provided for all accepted participants.


Monday 4.3 at 10:00 - leaving from Science Park
Wednesday 6.3 at 12:30 - leaving from Mekrijärvi

Arrival in Joensuu to join the group leaving from Science Park:
08:30 - flight from Helsinki
There is no train connection in the morning.

Public bus connections:
Joensuu-Ilomantsi: 09:30-10:40, 12:30-13:40, 14:05-15:45, 16:10-17:25, 18:20-19:30
Ilomantsi-Joensuu: 14:00-15:35, 16:30-18:00

Previous workshops in Mekrijärvi:

DMPR 2012
ADMPR 2011


ECSE Graduate School
University of Eastern Finland (UEF)

Prof. Pasi Fränti, chairman
Karol Waga, webmaster