Puheteknologian uudet menetelmät ja sovellukset (PUMS)

New Methods and Applications of Speech Technology
University of Joensuu


The PUMS Project

The PUMS project is part of FENIX - Interactive Computing, a technology programme of the National Technology Agency of Finland. The acronym PUMS comes from the project's Finnish name, Puheteknologian uudet menetelmät ja sovellukset, which translates to New Methods and Applications of Speech Technology. These web pages describe the part of the project carried out by the Computer Science (CS) department of the University of Joensuu.

The CS department's part of the PUMS project deals with automatic speaker recognition and the management of speaker profiles. We aim to generate voice profiles automatically from voice samples of individual speakers. The voice profiles can easily be combined with other data, such as a face image or a personal record. Potential applications include any person authentication or speech technology system. A user can create a voice profile over a telephone or with a microphone attached to their PC. Once the profile has been generated, it can later be updated to match changes in the environment and in the speaker, for example the speaker's state of health.

Any speech signal can be roughly thought of as a complex combination of several kinds of information. This information can be divided into, for example, the following components:

Speech content
The idea or message that the speaker is trying to transmit to the listener while speaking.
Speaker characteristics
Physical structure of the speech organs characteristic to a speaker: windpipe length, lung size, tongue mass, nasal cavity structure, etc.
Transmission path
The characteristics of the acoustic speech signal transmission path as well as electronic devices involved in the speech transmission: room size/outside weather, recording and playback device characteristics, etc.
Time varying speaker parameters
The speaker's emotional state and health.
Linguistic encoding
Language, dialect, etc.

In speaker recognition, the goal is to extract and utilise the speaker characteristics from the speech signal. In general, speaker recognition can be divided into two tasks depending on the application: a system performs either speaker identification or speaker verification. Identification is further split into closed set and open set variants. Technically, the easiest case is closed set speaker identification.

Closed Set Speaker Identification
A set of N distinct speaker models has been stored in the identification system by extracting abstract parameters from speech samples of the N speakers. In the identification task, similar parameters are extracted from new speech input, and the system decides which of the N known speakers most closely resembles the input.
Open Set Speaker Identification
Similar to the closed set problem, except that the unknown speaker is not necessarily any of the speakers stored in the system database. Thus, the system must also decide whether the input is sufficiently close to the best matching speaker in the database, or whether it comes from someone outside the database.
Speaker Verification
The task of deciding whether an utterance was produced by the claimed speaker. The identity claim can be given, for example, as a PIN code. This is a special case of the open set identification task with N = 1.
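
The three decision rules above can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions, not the project's implementation: each speaker model is assumed to be a single mean feature vector, matching is assumed to use plain Euclidean distance, and the function names, speaker names and threshold values are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_closed_set(models, features):
    """Closed set: return the stored speaker whose model is closest."""
    return min(models, key=lambda name: distance(models[name], features))


def identify_open_set(models, features, threshold):
    """Open set: return the best match, or None if even it is too far away."""
    best = identify_closed_set(models, features)
    if distance(models[best], features) <= threshold:
        return best
    return None


def verify(models, claimed, features, threshold):
    """Verification: accept the claim only if the input is close to the claimed model."""
    return distance(models[claimed], features) <= threshold


# Hypothetical two-dimensional models; real systems use many more dimensions.
models = {"alice": [1.0, 2.0], "bob": [4.0, 0.5]}
sample = [1.2, 1.9]       # feature vector extracted from new speech input
stranger = [10.0, 10.0]   # input from a speaker who is not in the database

closed_result = identify_closed_set(models, sample)
open_result = identify_open_set(models, stranger, threshold=0.5)
claim_accepted = verify(models, "bob", sample, threshold=0.5)
```

Closed set identification always returns one of the N stored speakers, whereas the open set and verification variants add a threshold test. In this sketch, verification is the open set rule restricted to the single claimed model.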

These problems are attacked with standard methods as well as state-of-the-art speaker recognition techniques developed at the University of Joensuu. Our experiments have shown that unsupervised learning in speaker classifier training gives sufficient accuracy.
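
As an illustration of what unsupervised training of a speaker model can look like, the sketch below builds a small codebook from unlabeled feature vectors using k-means clustering. The choice of k-means is an assumption made for this example. This page does not say which unsupervised method the project uses, and the function names, the deterministic initialisation and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the code vector closest to the given feature vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))


def train_codebook(vectors, k, iterations=20):
    """Assumed k-means training: no labels are used, only the vectors themselves."""
    # Deterministic initialisation for the example: the first k vectors.
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old code vector if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def average_distortion(codebook, vectors):
    """Mean squared distance from each vector to its nearest code vector."""
    total = sum(squared_distance(codebook[nearest(codebook, v)], v)
                for v in vectors)
    return total / len(vectors)


# Toy feature vectors from one hypothetical speaker, forming two clear groups.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
            [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
codebook = train_codebook(training, k=2)
distortion = average_distortion(codebook, training)
```

With one such codebook per speaker, the average distortion between new input and each codebook could serve as the matching score in the identification and verification tasks described above.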

We use C/C++ to implement the products, and Matlab and scripting languages for experimentation.


This page was last updated 2004-10-25