Prof. Tomi H. Kinnunen's corner

	Tomi H. Kinnunen
	Professor, PhD, PhLic, Docent
	Professor of speech technology, specialized in speaker recognition and security (spoofing attacks and countermeasures; co-founder of the ASVspoof challenge series).
	Computational Speech Group
	School of Computing
	University of Eastern Finland
	P.O. Box 111, FI-80101 Joensuu, FINLAND
	E-mail: firstname dot lastname at uef dot fi
	Google Scholar profile | LinkedIn

NEWS (09/2023) !

A funded post-doctoral position AND a funded PhD position are available in my 2022-2026 Academy of Finland funded project "Generalized Voice Anti-Spoofing and Voice Biometrics" (SPEECHFAKES). The project addresses topics including speaker recognition and anti-spoofing, among others. If you are interested in learning more, please contact me by e-mail <tomi.kinnunen@uef.fi> and include your resume and publication list.

PROGRAM CODE:

[Github] for Baselines and Protocols for Household Speaker Recognition
[Github] for ASVtorch toolkit: Speaker verification with deep neural networks
GPU accelerated implementation of i-vector extractor (training / extraction) using PyTorch. The methods are described in our Interspeech 2019 paper.
Semi-supervised speech activity detector, described in [Computer Speech & Language paper]
Audio replay attack detection baseline code (Matlab), for ASVspoof 2017 challenge
Local variability features (Matlab). See [the related paper in Digital Signal Processing]
PLDA for anti-spoofing (Python). See also [IEEE-T-IFS paper]
Fast probabilistic linear discriminant analysis (PLDA) implementation (Matlab and Python). See the related S+SSPR paper [PDF]
Utterance-by-utterance adaptive speech activity detector (SAD) presented in ICASSP 2013 [PDF].
Multiple window (multitaper) spectrum estimators (Matlab). See also the related publications in IEEE T-ASLP, Speech Communication, IEEE SPL, Interspeech 2010 and ASRU 2011.
Regularized all-pole methods as an appendix of Odyssey 2012 paper. See also the related publication in IEEE SPL.
Temporally weighted linear predictors (from Jouni Pohjalainen's page)

DATA AND CHALLENGES

ASVspoof 5 evaluation plan (see further details at www.asvspoof.org)
Voice Conversion Challenge 2020 database v1.0 (see also the challenge website)
ASVspoof 2019 (The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database) data is now publicly available. For further details, see www.asvspoof.org.

ASVspoof 2019 "Real PA" set (hosted at Eurecom)

Corpus of Age-related Voice Disguise (AVOID) is available [Instructions to obtain the data]; the data was used in this Speech Communication and this JASA publication.

The Voice Conversion Challenge 2018: database and results (VCC18). See also the challenge overview paper and another paper containing supplementary speech artifact analysis (both will be presented at Odyssey 2018)
Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof). See [IEEE-J-STSP overview paper of the ASVspoof challenge]

ASVspoof2017 challenge data (audio replay attack detection) [NOTE! This is the patched v2.0 of the corpus, described here and recommended for use instead of the original one]. See also the ICASSP 2017 paper about data collection and the Interspeech 2017 challenge overview paper.

ASVspoof2015 challenge data (voice conversion and text-to-speech attack detection task).

I-vectors (~420 MB) used in IEEE-T-IFS paper (hosted at IDIAP).
I4U consortium filelists for NIST SRE12 development purposes (from Rahim Saeidi's pages) used in [Interspeech 2013 paper]

PUBLICATIONS:

International refereed journal articles:

T. Kinnunen, K.A. Lee, H. Tak, N. Evans, A. Nautsch, "t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators", to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.
X. Liu, X. Wang, M. Sahidullah, J. Patino, H. Delgado, T. Kinnunen, M. Todisco, J. Yamagishi, N. Evans, A. Nautsch, K.A. Lee, "ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 2507--2522, 2023. [Arxiv link]
L. Tavi, T. Kinnunen, R. González Hautamäki, "Improving Speaker De-Identification with Functional Data Analysis of F0 Trajectories," Speech Communication, 140: 1--10, 2022.
A. Kanervisto, V. Hautamäki, T. Kinnunen and J. Yamagishi, "Optimizing Tandem Speaker Verification and Anti-Spoofing Systems," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022 [Arxiv link]
X. Liu, M. Sahidullah, T. Kinnunen, “Optimizing Multi-Taper Features for Deep Speaker Verification”, IEEE Signal Processing Letters, 28: 2187--2191, October 2021. [arxiv]
K.A. Lee, V. Vestman, and T. Kinnunen, “ASVtorch Toolkit: Speaker Verification with Deep Neural Networks”, SoftwareX, Volume 14, 100697, June 2021 [link]
A. Nautsch, X. Wang, N. Evans, T. Kinnunen, V. Vestman, M. Todisco, H. Delgado, M. Sahidullah, J. Yamagishi, K.A. Lee, “ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech”, IEEE Transactions on Biometrics, Behavior, and Identity Science, 3(2): 252--265, April 2021 [arxiv link]
T. Kinnunen, H. Delgado, N. Evans, K.A. Lee, V. Vestman, A. Nautsch, M. Todisco, X. Wang, M. Sahidullah, J. Yamagishi, D.A. Reynolds, “Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2195-2210, 2020 [PDF]
X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. L. Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J. -F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling, “ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech”, Computer Speech & Language, 64, November 2020 [arxiv link]
B. Chettri, T. Kinnunen, E. Benetos, “Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification”, Computer Speech & Language, 63: 1--18, September 2020 [PDF]
A. Sholokhov, T. Kinnunen, V. Vestman, K.A. Lee, “Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores”, Computer Speech & Language, 60: 1--19, March 2020. [PDF]
A. Kato and T. Kinnunen, “Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(2):2336--2349, December 2019 [PDF]
R. González Hautamäki, V. Hautamäki, T. Kinnunen, “On Limits of Automatic Speaker Verification: Explaining Degraded Recognizer Score Through Acoustic Changes Resulting from Voice Disguise”, Journal of the Acoustical Society of America, 146(1): 693--704, July 2019 [PDF]
V. Vestman, T. Kinnunen, R. Gonzalez Hautamäki, M. Sahidullah, “Voice Mimicry Attacks Assisted by Automatic Speaker Verification”, Computer Speech & Language, 59: 36--54, January 2020 [PDF]
E. Jokinen, R. Saeidi, T. Kinnunen, P. Alku, “Vocal Effort Compensation for MFCC Feature Extraction in a Shouted Versus Normal Speaker Recognition Task”, Computer Speech & Language, 53: 1--11, January 2019 [PDF] [doi]
V. Vestman, D. Gowda, M. Sahidullah, P. Alku, and T. Kinnunen, “Speaker Recognition from Whispered Speech: a Tutorial Survey and an Application of Time-Varying Linear Prediction”, Speech Communication, 99: 62--79, May 2018 [PDF]
M. Sahidullah, D. Thomsen, R. Gonzalez Hautamäki, T. Kinnunen, Z.-H. Tan, R. Parts, and M. Pitkänen, “Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(1): 44--56, January 2018 [PDF]
A. Sholokhov, M. Sahidullah, T. Kinnunen, “Semi-Supervised Speech Activity Detection with an Application to Automatic Speaker Verification”, Computer Speech & Language, 47: 132--156, January 2018 [PDF] [Program code]
R. González Hautamäki, M. Sahidullah, V. Hautamäki, and Tomi Kinnunen, “Acoustical and Perceptual Study of Voice Disguise by Age Modification in Speaker Verification”, Speech Communication, 95: 1--15, 2017 [PDF]
Z. Wu, J. Yamagishi, T. Kinnunen, C. Hanilçi, M. Sahidullah, A. Sizov, N. Evans, M. Todisco, H. Delgado, “ASVspoof: the Automatic Speaker Verification Spoofing and Countermeasures Challenge”, IEEE J. on Selected Topics in Signal Processing, 11(4): 588--604, June 2017 [PDF]
A. Sizov, K. A. Lee, T. Kinnunen, “Direct Optimization of the Detection Cost for i-Vector based Spoken Language Recognition”, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25(3): 588--597, March 2017 [PDF]
C. Hanilçi, T. Kinnunen, M. Sahidullah, A. Sizov, “Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise”, Speech Communication, 85: 83--97, December 2016 [PDF]
M. Sahidullah and T. Kinnunen, “Local Spectral Variability Features for Speaker Verification”, Digital Signal Processing, 50: 1--11, March 2016 [PDF] [Program code]
H. Behravan, V. Hautamäki, S. M. Siniscalchi, T. Kinnunen, C.-H. Lee, “i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition”, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24(1): 29--41, January 2016 [PDF]
R. Gonzalez Hautamäki, T. Kinnunen, V. Hautamäki and A.-M. Laukkanen, “Automatic versus Human Speaker Verification: the Case of Voice Mimicry”, Speech Communication, 72: 13--31, September 2015 [PDF]
A. Sizov, E. Khoury, T. Kinnunen, Z. Wu and S. Marcel, “Joint Speaker Verification and Anti-Spoofing in the i-Vector Space”, IEEE Transactions on Information Forensics and Security, 10(4): 821--832, April 2015 [PDF] [i-vector data (hosted at IDIAP)] [Code (hosted at IDIAP)]
Z. Wu, N. Evans, T. Kinnunen, J. Yamagishi, F. Alegre, H. Li, “Spoofing and Countermeasures for Speaker Verification: a Survey”, Speech Communication, 66: 130--153, February 2015 [PDF]
H. Behravan, V. Hautamäki, T. Kinnunen, “Factors Affecting i-Vector Based Foreign Accent Recognition: a Case Study in Spoken Finnish”, Speech Communication, 66: 118--129, February 2015 [PDF]
C. Hanilçi and T. Kinnunen, “Source Cell-Phone Recognition from Recorded Speech Using Non-Speech Segments”, Digital Signal Processing, 35: 75--85, December 2014 [PDF]
J. Pohjalainen, C. Hanilçi, T. Kinnunen, P. Alku, “Mixture Linear Prediction in Speaker Verification Under Vocal Effort Mismatch”, IEEE Signal Processing Letters, 21(12): 1516--1520, December 2014 [PDF] [MATLAB CODE]
P. Rajan, A. Afanasyev, V. Hautamäki, T. Kinnunen, “From Single to Multiple Enrollment i-Vectors: Practical PLDA Scoring Variants for Speaker Verification”, Digital Signal Processing, 31: 93--10, August 2014 [PDF]
V. Hautamäki, T. Kinnunen, F. Sedlak, K.A. Lee, B. Ma, H. Li, “Sparse Classifier Fusion for Speaker Verification”, IEEE Transactions on Audio, Speech and Language Processing, 21(8): 1622--1631, August 2013 [PDF]
O. Schleusing, T. Kinnunen, B. Story, J.-M. Vesin, “Joint Source-Filter Optimization for Accurate Vocal Tract Estimation Using Differential Evolution”, IEEE Transactions on Audio, Speech and Language Processing, 21(8): 1560--1572, August 2013 [PDF]
Md. J. Alam, T. Kinnunen, P. Kenny, P. Ouellet, D. O'Shaughnessy, “Multitaper MFCC and PLP Features for Speaker Verification Using i-Vectors”, Speech Communication, 55(2): 237--251, February 2013 [PDF] [ ISCA Award for the best paper published in Speech Communication (2013 - 2015) ]
Z. Wu, T. Kinnunen, E.S. Chng, H. Li, “Mixture of Factor Analyzers Using Priors from Non-Parallel Speech for Voice Conversion”, IEEE Signal Processing Letters, 19(12): 914--917, December 2012 [PDF]
P. Mowlaee, R. Saeidi, M.G. Christensen, Z.-H. Tan, T. Kinnunen, P. Fränti, S.H. Jensen, “A Joint Approach for Single-Channel Speaker Identification and Speech Separation”, IEEE Transactions on Audio, Speech and Language Processing, 20(9): 2586--2601, November 2012 [PDF] [supplementary audio material].
T. Kinnunen, R. Saeidi, F. Sedlak, K.A. Lee, J. Sandberg, M. Hansson-Sandsten, H. Li, “Low-Variance Multitaper MFCC Features: a Case Study in Robust Speaker Verification”, IEEE Transactions on Audio, Speech and Language Processing, 20(7): 1990--2001, September 2012 [PDF][Multitaper Matlab code].
C. Hanilçi, T. Kinnunen, F. Ertas, R. Saeidi, J. Pohjalainen, P. Alku, “Regularized All-Pole Models for Speaker Verification Under Noisy Environments”, IEEE Signal Processing Letters 19(3), 163--166, March 2012 [PDF]. Find also extended analysis and the RLP program codes from Odyssey 2012 version.
T. Kinnunen, I. Sidoroff, M. Tuononen, P. Fränti, “Comparison of Clustering Methods: a Case Study of Text-Independent Speaker Modeling”, Pattern Recognition Letters, 32(13): 1604--1617, October 2011 [PDF]
K.A. Lee, C.H. You, H. Li, T. Kinnunen, K.C. Sim, “Using Discrete Probabilities with Bhattacharyya Measure for SVM-based Speaker Verification”, IEEE Transactions on Audio, Speech and Language Processing, 19(4): 861--870, May 2011 [PDF].
R. Saeidi, J. Pohjalainen, T. Kinnunen, P. Alku, “Temporally Weighted Linear Prediction Features for Tackling Additive Noise in Speaker Verification”, IEEE Signal Processing Letters, 17(6), pp. 599--602, June 2010 [PDF].
J. Sandberg, M. Hansson-Sandsten, T. Kinnunen, R. Saeidi, P. Flandrin, P. Borgnat, “Multitaper Estimation of Frequency-Warped Cepstra with Application to Speaker Verification”, IEEE Signal Processing Letters, 17(4): 343--346, April 2010. [PDF][Multitaper Matlab code]
T. Kinnunen and H. Li, “An Overview of Text-Independent Speaker Recognition: from Features to Supervectors”, Speech Communication 52(1): 12--40, January 2010 [PDF].
T. Kinnunen, J. Saastamoinen, V. Hautamäki, M. Vinni, P. Fränti, “Comparative Evaluation of Maximum a Posteriori Vector Quantization and Gaussian Mixture Models in Speaker Verification”, Pattern Recognition Letters 30(4): 341--347, March 2009. [PDF]
V. Hautamäki, T. Kinnunen and P. Fränti, “Text-Independent Speaker Recognition Using Graph Matching”, Pattern Recognition Letters, 29(9): 1427--1432, 2008. [PDF]
V. Hautamäki, T. Kinnunen, I. Kärkkäinen, M. Tuononen, J. Saastamoinen, P. Fränti, “Maximum a Posteriori Estimation of the Centroid Model for Speaker Verification”, IEEE Signal Processing Letters, 15: 162--165, 2008. [PDF]
T. Kinnunen, E. Karpov, P. Fränti, “Real-Time Speaker Identification and Verification”, IEEE Transactions on Audio, Speech and Language Processing, 14(1): 277--288, Jan 2006. [PDF]

International book chapters:

M. Sahidullah, H. Delgado, M. Todisco, A. Nautsch, X. Wang, T. Kinnunen, N. Evans, J. Yamagishi, K.A. Lee (2023), "Introduction to Voice Presentation Attack Detection and Recent Advances" In: Marcel, S., Fierrez, J., Evans, N. (eds) Handbook of Biometric Anti-Spoofing. Advances in Computer Vision and Pattern Recognition. Springer, Singapore. https://doi.org/10.1007/978-981-19-5288-3_13
M. Sahidullah, H. Delgado, M. Todisco, T. Kinnunen, N. Evans, J. Yamagishi, K.A. Lee, “Introduction to Voice Presentation Attack Detection and Recent Advances”, book chapter in Handbook of Biometric Anti-Spoofing: Presentation Attack Detection, S. Marcel, M.S. Nixon, J. Fierrez, N. Evans (Eds.), Springer, 2018 [PDF]
N. Evans, T. Kinnunen, J. Yamagishi, Z. Wu, F. Alegre and P. De Leon, “Speaker recognition anti-spoofing”, book chapter in Handbook of Biometric Anti-spoofing, Springer, S. Marcel, S. Li and M. Nixon, Eds., 2014.
N. Evans, F. Alegre, Z. Wu and T. Kinnunen, “Anti-spoofing: voice conversion”, book chapter in Encyclopedia of Biometrics, 2nd Edition, Springer US, Stan Z. Li and Anil K. Jain, Eds, 2014.
F. Alegre, N. Evans, T. Kinnunen, Z. Wu, and J. Yamagishi, “Anti-spoofing: voice databases”, book chapter in Encyclopedia of Biometrics, 2nd Edition, Springer US, Stan Z. Li and Anil K. Jain, Eds, 2014.

Submitted pre-prints:

X. Liu, M. Sahidullah, T. Kinnunen, "Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification" [arxiv]
J.-w. Jung, H. Tak, H.-j. Shim, H.-S. Heo, B.-J. Lee, S.-W. Chung, H.-G. Kang, H.-J. Yu, N. Evans, T. Kinnunen, "SASV Challenge 2022: A Spoofing Aware Speaker Verification Challenge Evaluation Plan" [arxiv]

Internationally refereed conference articles:

H.-j. Shim, R. Gonzalez Hautamäki, M. Sahidullah, T. Kinnunen, "How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning", Proc. INTERSPEECH 2023, 785-789, doi: 10.21437/Interspeech.2023-1901.
X. Liu, M. Sahidullah, K.A. Lee, T. Kinnunen, "Speaker-Aware Anti-spoofing", Proc. INTERSPEECH 2023, 2498-2502, doi: 10.21437/Interspeech.2023-1323
V.P. Singh, M. Sahidullah, T. Kinnunen, "Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech", Proc. INTERSPEECH 2023, 1948-1952, doi: 10.21437/Interspeech.2023-2052
H-j. Shim, J-w. Jung, T. Kinnunen, "Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing", Proc. INTERSPEECH 2023, 3804-3808, doi: 10.21437/Interspeech.2023-1910
S.H. Mun, H.-j. Shim, H. Tak, X. Wang, X. Liu, M. Sahidullah, M. Jeong, M.H. Han, M. Todisco, K.A. Lee, J. Yamagishi, N. Evans, T. Kinnunen, N.S. Kim, J-w. Jung, "Towards Single Integrated Spoofing-aware Speaker Verification Embeddings", Proc. INTERSPEECH 2023, 3989-3993, doi: 10.21437/Interspeech.2023-1402
M. Anderson, T. Kinnunen, N. Harte, "Learnable Frontends That Do Not Learn: Quantifying Sensitivity To Filterbank Initialisation," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10095474. [arxiv]
R. Bhatia and T. Kinnunen, "An Initial Study on Birdsong Re-synthesis Using Neural Vocoders", Proc. 24th International Conference on Speech and Computer (SPECOM 2022), pp. 64–74 [arxiv]
J.-w. Jung, H. Tak, H.-j. Shim, H.-S. Heo, B.-J. Lee, S.-W. Chung, H.-J. Yu, N. Evans, T. Kinnunen, "SASV 2022: The First Spoofing-Aware Speaker Verification Challenge", Proc. Interspeech 2022, pp. 2893--2897, September 2022. [PDF]
A. Sholokhov, X. Liu, M. Sahidullah, T. Kinnunen, "Baselines and Protocols for Household Speaker Recognition", Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), pp. 185--192 [ISCA archive] [arxiv] [Github]
S. Ghimire, T. Kinnunen, R. González Hautamäki, "Gamified Speaker Comparison by Listening", Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), pp. 421--427, 2022 [ISCA archive] [arxiv]
X. Liu, M. Sahidullah, T. Kinnunen, "Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation", Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), pp. 85--91, 2022 [ISCA archive] [arxiv]
H.-j. Shim, H. Tak, X. Liu, H.-S. Heo, J.-W. Jung, J.S. Chung, S.-W. Chung, H.-J. Yu, B.-J. Lee, M. Todisco, H. Delgado, K.A. Lee, M. Sahidullah, T. Kinnunen, N. Evans, "Baseline Systems for the First Spoofing-Aware Speaker Verification Challenge: Score and Embedding Fusion", Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), pp. 330--337, 2022 [ISCA Archive] [arxiv]
X. Liu, M. Sahidullah, T. Kinnunen, "Learnable Nonlinear Compression for Robust Speaker Verification", IEEE ICASSP 2022, pp. 7962--7966, May 2022, Singapore [arxiv]
K. Hechmi, T.N. Trong, V. Hautamäki, T. Kinnunen, ”VoxCeleb Enrichment for Age and Gender Recognition”, IEEE Automatic Speech Recognition and Understanding workshop (ASRU), pp. 687--693, December 2021 [arxiv] [Github]
X. Liu, M. Sahidullah, T. Kinnunen, “Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification”, IEEE Automatic Speech Recognition and Understanding workshop (ASRU), pp. 185--190, December 2021. [arxiv]
X. Liu, M. Sahidullah, T. Kinnunen, “Parameterized Channel Normalization for Far-field Deep Speaker Verification”, IEEE Automatic Speech Recognition and Understanding workshop (ASRU), pp. 1132--1138, December 2021. [arxiv]
L. Tavi, T. Kinnunen, E. Meister, R. González Hautamäki, A. Malmi, ”Articulation During Voice Disguise: A Pilot Study”, Proc. Speech and Computer (SPECOM’21), Springer LNAI 12997, pp. 680–691, St. Petersburg, Russia, September 2021. [PDF]
J.-F. Bonastre, H. Delgado, N. Evans, T. Kinnunen, K.A. Lee, X. Liu, A. Nautsch, G. Noé, J. Patino, M. Sahidullah, B.M.L. Srivastava, M. Todisco, N. Tomashenko, E. Vincent, X. Wang, J. Yamagishi, "Benchmarking and challenges in security and privacy for voice biometrics", Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, pp. 52--56, 2021, doi: 10.21437/SPSC.2021-11 [arxiv]
T. Kinnunen, A. Nautsch, M. Sahidullah, N. Evans, X. Wang, M. Todisco, H. Delgado, J. Yamagishi, K.A. Lee, “Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing”, Proc. Interspeech, 4299-4303, Brno, Czech Republic, 2021. [arxiv link]
B. Chettri, R. González Hautamäki, M. Sahidullah, T. Kinnunen, “Data Quality as Predictor of Voice Anti-Spoofing Generalization”, Proc. Interspeech, 1659-1663, Brno, Czech Republic, 2021. [arxiv link]
X. Liu, M. Sahidullah, T. Kinnunen, “Learnable MFCCs for Speaker Verification”, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2021), Daegu, Korea, May 2021 [arxiv link]
M. Sahidullah, A.K. Sarkar, V. Vestman, X. Liu, R. Serizel, T. Kinnunen, Z.-H. Tan, E. Vincent, “UIAI System for Short-Duration Speaker Verification Challenge 2020”, Proc. IEEE Spoken Language Technology Workshop (SLT 2021), Shenzhen, China, January 2021 [arxiv link]
R. K. Das, T. Kinnunen, W.-C. Huang, Z. Ling, J. Yamagishi, Y. Zhao, X. Tian, T. Toda, “Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions”, Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, pp. 99--120, 2020. [arxiv link]
Y. Zhao, W.-C. Huang, X. Tian, J. Yamagishi, R.K. Das, T. Kinnunen, Z. Ling, T. Toda, “Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion”, Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, pp. 80--98, 2020. [arxiv link]
A. Sholokhov, T. Kinnunen, V. Vestman, K.A. Lee, “Extrapolating False Alarm Rates in Automatic Speaker Verification”, Proc. Interspeech 2020, pp. 4218--4222, Shanghai, China, October 2020 [PDF]
R. K. Das, X. Tian, T. Kinnunen, H. Li, “The Attacker's Perspective on Automatic Speaker Verification: An Overview”, Proc. Interspeech 2020, pp. 4213--4217, Shanghai, China, October 2020 [PDF]
R. González Hautamäki and T. Kinnunen, “Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data”, Proc. Interspeech 2020, pp. 4313--4317, Shanghai, China, October 2020 [PDF]
X. Liu, M. Sahidullah, T. Kinnunen, “A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings”, Proc. Interspeech 2020, pp. 3221--3225, Shanghai, China, October 2020 [PDF]
V. Vestman, K.A. Lee, T. Kinnunen, “Neural i-Vectors”, Proc. Odyssey 2020, pp. 67--74, Tokyo, Japan, Nov. 2020 [PDF]
B. Chettri, T. Kinnunen, E. Benetos, “Subband modeling for Spoofing Detection in Automatic Speaker Verification”, Proc. Odyssey 2020, pp. 341--348, Tokyo, Japan, Nov. 2020 [PDF]
A. Kanervisto, V. Hautamäki, T. Kinnunen, J. Yamagishi, “An Initial Investigation on Optimizing Tandem Speaker Verification and Countermeasure Systems Using Reinforcement Learning”, Proc. Odyssey 2020, pp. 151--158, Tokyo, Japan, Nov. 2020 [PDF]
R. González Hautamäki and T. Kinnunen, “Towards Controlling False Alarm -- Miss Trade-Off in Perceptual Speaker Comparison via Non-Neutral Listening Task Framing”, Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 749--756, December 2019, Singapore. [PDF]
V. Vestman, K. A. Lee, T. Kinnunen, T. Koshinaka, “Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration”, Proc. Interspeech 2019, pp. 351--355, Graz, Austria, September 2019 [PDF]
M. Todisco, X. Wang, V. Vestman, M. Sahidullah, H. Delgado, A. Nautsch, J. Yamagishi, N. Evans, T. Kinnunen, K. A. Lee, “ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection”, Proc. Interspeech 2019, pp. 1008--1012, Graz, Austria, September 2019 [PDF]
K. A. Lee, V. Hautamäki, T. Kinnunen, H. Yamamoto, K. Okabe, V. Vestman, J. Huang, G. Ding, H. Sun, A. Larcher, R. K. Das, H. Li, M. Rouvier, P. Bousquet, W. Rao, Q. Wang, C. Zhang, F. Bahmaninezhad, H. Delgado, M. Todisco, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda, T. N. Trong, M. Sahidullah, F. Lu, Y. Tang, M. Tu, K. K. Teh, H. D. Tran, K. K. George, I. Kukanov, F. Desnous, J. Yang, E. Yılmaz, L. Xu, J. Bonastre, C. Xu, Z. H. Lim, E. S. Chng, S. Ranjan, J. H. L. Hansen, J. Patino, N. Evans, “I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences”, Proc. Interspeech 2019, pp. 1497--1501, Graz, Austria, September 2019 [PDF]
X. Wu, E. Granger, T. Kinnunen, X. Feng, A. Hadid, “Audio-Visual Kinship Verification in the Wild”, Proc. 12th IAPR International Conference On Biometrics (ICB 2019), Crete, Greece, June 2019. [PDF]
T. Kinnunen, R. González Hautamäki, V. Vestman, M. Sahidullah, “Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection”, Proc. IEEE ICASSP, pp. 6146--6150, Brighton, UK, May 2019 [PDF] (Please find extended results/analyses in Computer Speech & Language article here)
V. Vestman, B. Soomro, A. Kanervisto, V. Hautamäki, T. Kinnunen, “Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search”, Proc. IEEE ICASSP, pp. 5781--5785, Brighton, UK, May 2019 [PDF]
F. Fang, J. Yamagishi, I. Echizen, M. Sahidullah, T. Kinnunen, “Transforming Acoustic Characteristics to Deceive Playback Spoofing Countermeasures of Speaker Verification Systems”, Proc. IEEE Int. Workshop on Information Forensics and Security (WIFS) 2018, Hong Kong, China, 2018 [PDF]
M. Todisco, H. Delgado, K.A. Lee, M. Sahidullah, N. Evans, T. Kinnunen, J. Yamagishi, “Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion”, Proc. Interspeech 2018, pp. 77--81, Hyderabad, India, September 2018 [PDF].
A. Kato, T. Kinnunen, “Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks”, Proc. Interspeech 2018, pp. 327--331, Hyderabad, India, September 2018 [PDF]
S. Sieranoja, M. Sahidullah, T. Kinnunen, J. Komulainen, A. Hadid, “Audiovisual Synchrony Detection with Optimized Audio Features”, accepted to IEEE 3rd Int. Conference on Signal and Image Processing (ICSIP 2018), Shenzhen, China, July 2018 [PDF]
T. Kinnunen, K.A. Lee, H. Delgado, N. Evans, M. Todisco, M. Sahidullah, J. Yamagishi, D.A. Reynolds, “t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification”, Proc. Odyssey 2018, pp. 312--319, Les Sables d'Olonne, France, June 2018 [PDF]
T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling, “A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment”, Proc. Odyssey 2018, pp. 187--194, Les Sables d'Olonne, France, June 2018 [PDF (original)], [PDF (corrected version from arXiv with a bug fix)]
J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling, “The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods”, Proc. Odyssey 2018, pp. 195--202, Les Sables d'Olonne, France, June 2018 [PDF]
[The data and challenge results are available here]
V. Vestman and T. Kinnunen, “Supervector Compression Strategies to Speed up I-Vector System Development”, Proc. Odyssey 2018, pp. 357--364, Les Sables d'Olonne, France, June 2018 [PDF]
R. Gonzalez Hautamäki, A. Kanervisto, V. Hautamäki, T. Kinnunen, “Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification”, Proc. Odyssey 2018, pp. 320--326, Les Sables d'Olonne, France, June 2018 [PDF]
A. Kato and T. Kinnunen, “A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech”, Proc. Odyssey 2018, pp. 275--282, Les Sables d'Olonne, France, June 2018 [PDF]
J. Lorenzo-Trueba, F. Fang, X. Wang, I. Echizen, J. Yamagishi, T. Kinnunen, “Can we steal your vocal identity from the Internet? Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data”, Proc. Odyssey 2018, pp. 240--247, Les Sables d'Olonne, France, June 2018 [PDF]
H. Delgado, M. Todisco, M. Sahidullah, N. Evans, T. Kinnunen, K.A. Lee, J. Yamagishi, “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, Proc. Odyssey 2018, pp. 296--303, Les Sables d'Olonne, France, June 2018 [PDF]
H. Delgado, M. Todisco, N. Evans, M. Sahidullah, W. M. Liu, F. Alegre, T. Kinnunen, B. Fauve, ”Impact of Bandwidth and Channel Variation on Presentation Attack Detection for Speaker Verification”, Proc. Int. Conf. of the Biometrics Special Interest Group (BIOSIG 2017), Darmstadt, Germany, September 2017 [PDF]
T. Kinnunen, M. Sahidullah, H. Delgado, M. Todisco, N. Evans, J. Yamagishi, K.A. Lee, ”The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection”, Proc. Interspeech 2017, pp. 2--6, Stockholm, Sweden, August 2017 [PDF]
V. Vestman, D. Gowda, M. Sahidullah, P. Alku, T. Kinnunen, ”Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions”, Proc. Interspeech 2017, pp. 1512--1516, Stockholm, Sweden, August 2017 [PDF]
K. A. Lee, V. Hautamäki, T. Kinnunen, A. Larcher, C. Zhang, A. Nautsch, T. Stafylakis, G. Liu, M. Rouvier, W. Rao, F. Alegre, J. Ma, M. W. Mak, A. K. Sarkar, H. Delgado, R. Saeidi, H. Aronowitz, A. Sizov, H. Sun, T. H. Nguyen, G. Wang, B. Ma, V. Vestman, M. Sahidullah, M. Halonen, A. Kanervisto, G. Le Lan, F. Bahmaninezhad, S. Isadskiy, C. Rathgeb, C. Busch, G. Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, P.-M. Bousquet, M. Ajili, W. B. Kheder, D. Matrouf, Z. H. Lim, C. Xu, H. Xu, X. Xiao, E. S. Chng, B. Fauve, K. Sriskandaraja, V. Sethu, W. W. Lin, D. A. L. Thomsen, Z.-H. Tan, M. Todisco, N. Evans, H. Li, J. H. L. Hansen, J.-F. Bonastre, E. Ambikairajah, ”The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016”, Proc. Interspeech 2017, pp. 1328--1332, Stockholm, Sweden, August 2017 [PDF]
A.K. Sarkar, M. Sahidullah, Z.-H. Tan, T. Kinnunen, ”Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data”, Proc. Interspeech 2017, pp. 2611--2615, Stockholm, Sweden, August 2017 [PDF]
T. Kinnunen, L. Juvela, P. Alku, J. Yamagishi, ”Non-parallel Voice Conversion Using i-Vector PLDA: Towards Unifying Speaker Verification and Transformation”, Proc. ICASSP 2017, pp. 5535--5539, New Orleans, US, March 2017 [PDF]
T. Kinnunen, M. Sahidullah, M. Falcone, L. Costantini, R. Gonzalez Hautamäki, D. Thomsen, A. Sarkar, Z.-H. Tan, H. Delgado, M. Todisco, N. Evans, V. Hautamäki, K. A. Lee, ”RedDots Replayed: A New Replay Spoofing Attack Corpus for Text-Dependent Speaker Verification Research”, Proc. ICASSP 2017, pp. 5395--5399, New Orleans, US, March 2017 [PDF] - see also ASVspoof2017 challenge
A. Kanervisto, V. Vestman, M. Sahidullah, V. Hautamäki, T. Kinnunen, ”Effects of Gender Information in Text-Independent and Text-Dependent Speaker Verification”, Proc. ICASSP 2017, pp. 5360--5364, New Orleans, US, March 2017 [PDF]
H. Delgado, M. Todisco, M. Sahidullah, A.K. Sarkar, N. Evans, T. Kinnunen, Z.-H. Tan, ”Further Optimisations of Constant Q Cepstral Processing for Integrated Utterance Verification and Text-Dependent Speaker Verification”, Proc. IEEE Workshop on Spoken Language Technology (SLT), San Diego, US, December 2016. [PDF]
S. Sieranoja, T. Kinnunen, P. Fränti, ”GPS Trajectory Biometrics: From Where You Were to How You Move”, Proc. Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR), pp. 450--560, Mérida, Mexico, December 2016 [PDF]
T. Kinnunen, A. Sholokhov, E. Khoury, D. Thomsen, M. Sahidullah and Z.-H. Tan, ”HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors”, Proc. Interspeech, pp. 2992--2996, San Francisco, US, September 2016 [PDF]
M. Sahidullah, H. Delgado, M. Todisco, H. Yu, T. Kinnunen, N. Evans and Z.-H. Tan, ”Integrated Spoofing Countermeasures and Automatic Speaker Verification: an Evaluation on ASVspoof 2015”, Proc. Interspeech, pp. 1700--1704, San Francisco, US, September 2016 [PDF]
M. Sahidullah, R. González Hautamäki, D.A.L. Thomsen, T. Kinnunen, Z.-H. Tan, V. Hautamäki, R. Parts, M. Pitkänen, ”Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech”, Proc. Interspeech, pp. 1720--1724, San Francisco, US, September 2016 [PDF]
T. Kinnunen, M. Sahidullah, I. Kukanov, H. Delgado, M. Todisco, A. Sarkar, N. Thomsen, V. Hautamäki, N. Evans, Z.-H. Tan, ”Utterance Verification for Text-Dependent Speaker Recognition: a Comparative Assessment Using the RedDots Corpus”, Proc. Interspeech, pp. 430--434, San Francisco, US, September 2016 [PDF]
R. Gonzalez Hautamäki, M. Sahidullah, T. Kinnunen, V. Hautamäki, ”Age-related voice disguise and its impact on speaker verification accuracy”, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp. 277--282, Bilbao, Spain, June 2016 [PDF]
A. Sizov, K.A. Lee, T. Kinnunen, ”Discriminating languages in a probabilistic latent subspace”, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp. 81--88, Bilbao, Spain, June 2016 [PDF]
A. H. Poorjam, R. Saeidi, T. Kinnunen, V. Hautamäki, ”Incorporating uncertainty as a quality measure in i-vector based language recognition”, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp. 74--80, Bilbao, Spain, June 2016 [PDF]
H. Behravan, T. Kinnunen, V. Hautamäki, ”Out-of-set i-vector selection for open-set language identification”, Proc. Odyssey: the Speaker and Language Recognition Workshop, pp. 303--310, Bilbao, Spain, June 2016 [PDF]
A. Sholokhov, T. Kinnunen, S. Cumani, ”Discriminative multi-domain PLDA for speaker verification”, Proc. ICASSP 2016, pp. 5030--5034, Shanghai, China, March 2016 [PDF]
Z. Wu, T. Kinnunen, N. Evans, J. Yamagishi, C. Hanilçi, M. Sahidullah, A. Sizov, ”ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge”, Proc. Interspeech 2015, pp. 2037--2041, Dresden, Germany, September 2015 [PDF]
M. Sahidullah, T. Kinnunen, C. Hanilçi, ”A comparison of features for synthetic speech detection”, Proc. Interspeech 2015, pp. 2087--2091, Dresden, Germany, September 2015 [PDF]
C. Hanilçi, T. Kinnunen, M. Sahidullah, A. Sizov, ”Classifiers for synthetic speech detection: a comparison”, Proc. Interspeech 2015, pp. 2057--2061, Dresden, Germany, September 2015 [PDF]
R. Saeidi, T. Niemi, H. Karppelin, J. Pohjalainen, T. Kinnunen, P. Alku, ”Speaker recognition for speech under face cover”, Proc. Interspeech 2015, pp. 1012--1016, Dresden, Germany, September 2015 [PDF]
A. Fedorova, O. Glembek, T. Kinnunen, P. Matějka, ”Exploring ANN back-ends for i-vector based speaker age estimation”, Proc. Interspeech 2015, pp. 3036--3040, Dresden, Germany, September 2015 [PDF]
E. Khoury, T. Kinnunen, A. Sizov, Z. Wu, S. Marcel, ”Introducing i-vectors for joint anti-spoofing and speaker verification”, Proc. Interspeech 2014, pp. 61--65, Singapore, September 2014 [PDF]
H. Behravan, V. Hautamäki, S.M. Siniscalchi, E. Khoury, T. Kurki, T. Kinnunen, C.-H. Lee, ”Dialect levelling in Finnish: a universal speech attribute approach”, Proc. Interspeech 2014, pp. 2165--2169, Singapore, September 2014 [PDF]
A. Sizov, K.A. Lee, T. Kinnunen, ”Unifying probabilistic linear discriminant analysis variants in biometric authentication”, Proc. Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014), pp. 464--475, Joensuu, Finland, August 2014 [PDF] [Fast PLDA implementation]
V. Hautamäki, A. Pöllänen, T. Kinnunen, K.A. Lee, H. Li and P. Fränti, ”A comparison of categorical attribute data clustering methods”, Proc. Joint Int. Workshop on Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2014), pp. 53--62, Joensuu, Finland, August 2014 [PDF]
R. Gonzalez Hautamäki, T. Kinnunen, V. Hautamäki, A.-M. Laukkanen, ”Comparison of human listeners and speaker verification systems using voice mimicry data”, Proc. Odyssey 2014: The Speaker & Language Recognition Workshop, pp. 137--144, Joensuu, Finland, June 2014 [PDF]
C.S. Greenberg, D. Bansé, G.R. Doddington, D. Garcia-Romero, J. J. Godfrey, T. Kinnunen, A.F. Martin, A. McCree, M. Przybocki, D.A. Reynolds, ”The NIST 2014 speaker recognition i-vector machine learning challenge”, Proc. Odyssey 2014: The Speaker & Language Recognition Workshop, pp. 224--230, Joensuu, Finland, June 2014. [PDF]
H. Behravan, V. Hautamäki, S.M. Siniscalchi, T. Kinnunen, C.-H. Lee, ”Introducing attribute features to foreign accent recognition”, Proc. ICASSP 2014, pp. 5332--5336, Florence, Italy, May 2014 [PDF]
A. Sholokhov, T. Pekhovsky, O. Kudashev, A. Shulipa, T. Kinnunen, ”Bayesian analysis of similarity matrices for speaker diarization”, Proc. ICASSP 2014, pp. 106--110, Florence, Italy, May 2014 [PDF]
Z. Wu, T. Virtanen, T. Kinnunen, E.S. Chng, H. Li, ”Exemplar-based voice conversion using non-negative spectrogram deconvolution”, Proc. 8th ISCA Speech Synthesis Workshop (SSW'13), pp. 201--206, Barcelona, Spain, September 2013. [PDF]
T. Kinnunen, Md. J. Alam, P. Matejka, P. Kenny, J. Cernocky, D. O'Shaughnessy, ”Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations”, Proc. Interspeech 2013, pp. 3122--3126, Lyon, France, August 2013 [PDF]
C. Hanilci, T. Kinnunen, P. Rajan, J. Pohjalainen, P. Alku, F. Ertas, ”Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort”, Proc. Interspeech 2013, pp. 2881--2885, Lyon, France, August 2013 [PDF]
P. Rajan, T. Kinnunen, C. Hanilci, J. Pohjalainen, P. Alku, ”Using group delay functions from all-pole models for speaker recognition”, Proc. Interspeech 2013, pp. 2489--2493, Lyon, France, August 2013 [PDF]
Z. Wu, A. Larcher, K.A. Lee, E.S. Chng, T. Kinnunen, H. Li, ”Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints”, Proc. Interspeech 2013, pp. 950--954, Lyon, France, August 2013. [PDF]
Z. Wu, T. Virtanen, T. Kinnunen, E.S. Chng, H. Li, ”Exemplar-based unit selection for voice conversion utilizing temporal information”, Proc. Interspeech 2013, pp. 3057--3061, Lyon, France, August 2013 [PDF]
V. Hautamäki, K.A. Lee, D. van Leeuwen, R. Saeidi, A. Larcher, T. Kinnunen, T. Hasan, S.O. Sadjadi, G. Liu, H. Boril, J.H.L. Hansen, B. Fauve, "Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion", Proc. Interspeech 2013, pp. 1609--1613, Lyon, France, August 2013. [PDF]
N. Evans, T. Kinnunen, J. Yamagishi, ”Spoofing and countermeasures for automatic speaker verification”, Proc. Interspeech 2013, pp. 925--929, Lyon, France, August 2013 [PDF]
R. Gonzalez Hautamäki, T. Kinnunen, V. Hautamäki, T. Leino, A.-M. Laukkanen, ”I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry”, Proc. Interspeech 2013, pp. 930--934, Lyon, France, August 2013 [PDF]
R. Gonzalez Hautamäki, V. Hautamäki, P. Rajan and T. Kinnunen, ”Merging human and automatic system decisions to improve speaker recognition performance”, Proc. Interspeech 2013, pp. 2519--2523, Lyon, France, August 2013 [PDF]
R. Saeidi, K. A. Lee, T. Kinnunen, T. Hasan, B. Fauve, P.-M. Bousquet, E. Khoury, P. L. Sordo Martinez, J. M. K. Kua, C. H. You, H. Sun, A. Larcher, P. Rajan, V. Hautamäki, C. Hanilci, B. Braithwaite, R. Gonzales-Hautamäki, S. O. Sadjadi, G. Liu, H. Boril, N. Shokouhi, D. Matrouf, L. El Shafey, P. Mowlaee, J. Epps, T. Thiruvaran, D. A. van Leeuwen, B. Ma, H. Li, J. H. L. Hansen, J.-F. Bonastre, S. Marcel, J. Mason, E. Ambikairajah, "I4U submission to NIST SRE 2012: A large-scale collaborative effort for noise-robust speaker verification", Proc. Interspeech 2013, pp. 1986--1990, Lyon, France, August 2013. [PDF]
H. Behravan, V. Hautamäki, T. Kinnunen, ”Foreign Accent Detection from Spoken Finnish Using i-Vectors”, Proc. Interspeech 2013, pp. 79--83, Lyon, France, August 2013 [PDF]
P. Rajan, T. Kinnunen, V. Hautamäki, ”Effect of multicondition training on i-vector PLDA configurations for speaker recognition”, Proc. Interspeech 2013, pp. 3694--3697, Lyon, France, August 2013 [PDF]
T. Kinnunen, P. Rajan, ”A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data”, Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 7229--7233, Vancouver, Canada, May 2013. [PDF] [MATLAB CODE]
C. Hanilci, T. Kinnunen, R. Saeidi, J. Pohjalainen, P. Alku, F. Ertas, ”Speaker Identification From Shouted Speech: Analysis and Compensation”, Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 8027--8031, Vancouver, Canada, May 2013 [PDF]
Z. Wu, T. Kinnunen, E.S. Chng, H. Li, E. Ambikairajah, ”A Study on spoofing attack in state-of-the-art speaker verification: the telephone speech case”, Proc. 2012 Asia-Pacific Signal & Information Processing Association Annual Summit and Conference (APSIPA ASC 2012), pp. 1--5, Hollywood, USA, December 2012 [PDF] (BEST PAPER AWARD)
T. Kinnunen, R. Saeidi, J. Leppänen, J.P. Saarinen, ”Audio context recognition in variable mobile environments from short segments using speaker and language recognizers”, Proc. Odyssey: the speaker and language recognition workshop, Singapore, June 2012. [PDF]
C. Hanilci, T. Kinnunen, R. Saeidi, J. Pohjalainen, P. Alku, F. Ertas, ”Regularization of all-pole models for speaker verification under additive noise”, Proc. Odyssey: the speaker and language recognition workshop, Singapore, June 2012. [PDF]
V. Hautamäki, K.A. Lee, A. Larcher, T. Kinnunen, B. Ma, H. Li, ”Variational Bayes logistic regression as regularized fusion for NIST SRE 2010”, Proc. Odyssey: the speaker and language recognition workshop, Singapore, June 2012. [PDF]
T. Kinnunen, H. Leisma, M. Machunik, T. Kakkonen, J.-L. Lebrun, “SWAN - Scientific Writing AssistaNt. A Tool for Helping Scholars to Write Reader-Friendly Manuscripts”, demonstrator in the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), Avignon, France, April 2012. [PDF]
T. Kinnunen, Z.-Z. Wu, K. A. Lee, F. Sedlak, E. S. Chng, H. Li, “Vulnerability of Speaker Verification Systems Against Voice Conversion Spoofing Attacks: the Case of Telephone Speech”, Proc. ICASSP 2012, pp. 4401--4404, Kyoto, Japan, March 2012 [PDF].
S. Siddiq, T. Kinnunen, M. Vainio, S. Werner, “Intonational Speaker Verification: a Study on Parameters and Performance Under Noisy Conditions”, Proc. ICASSP 2012, pp. 4777--4780, Kyoto, Japan, March 2012 [PDF].
C. Hanilci, T. Kinnunen, R. Saeidi, J. Pohjalainen, P. Alku, F. Ertas, J. Sandberg, M. Hansson-Sandsten, “Comparing Spectrum Estimators in Speaker Verification Under Additive Noise”, Proc. ICASSP 2012, pp. 4769--4772, Kyoto, Japan, March 2012 [PDF].
J. Alam, T. Kinnunen, P. Kenny, P. Ouellet, D. O'Shaughnessy, “Multitaper MFCC Features for Speaker Verification Using i-Vectors”, Proc. IEEE Automatic Speech Recognition and Understanding (ASRU 2011), pp. 547--552, Hawaii, December 2011. [PDF] [Multitaper Matlab code]
J. Rodriguez-Fuentes, M. Penagarikano, A. Varona, M. Diez, G. Bordel, D. Martinez, J. Villalba, A. Miguel, A. Ortega, E. Lleida, A. Abad, O. Koller, I. Trancoso, P. Lopez-Otero, L. Docio-Fernandez, C. Garcia-Mateo, R. Saeidi, M. Soufifar, T. Kinnunen, T. Svendsen, P. Fränti, “Multi-Site Heterogenous System Fusions for the Albayzin 2010 Language Recognition Evaluation”, Proc. IEEE Automatic Speech Recognition and Understanding (ASRU 2011), pp. 377--382, Hawaii, December 2011. [PDF]
Evgeny Karpov, Zaur Nasibov, Tomi Kinnunen, Pasi Fränti, "Combining voice activity detectors using decision fusion", Proc. Speech and Computer (SPECOM 2011), Kazan, Russia, September 2011, pp. 278--283.
V. Hautamäki, K.A. Lee, T. Kinnunen, B. Ma, H. Li, “Regularized Logistic Regression Fusion for Speaker Verification”, Proc. Interspeech 2011, Florence, Italy, pp. 2745-2748, August 2011 [PDF]
P. Mowlaee, R. Saeidi, Z.-H. Tan, M.G. Christensen, T. Kinnunen, P. Fränti, S.H. Jensen, “Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge”, Proc. Interspeech 2011, Florence, Italy, August 2011, pp. 677-680. [PDF]
F. Sedlak, T. Kinnunen, V. Hautamäki, K.A. Lee, H. Li, “Classifier Subset Selection and Fusion for Speaker Verification”, Proc. ICASSP 2011, pp. 4544--4547, Prague, Czech Republic, May 2011. [PDF] [video and slides]
J. Pohjalainen, P. Alku, T. Kinnunen, “Shout Detection in Noise”, Proc. ICASSP 2011, pp. 4968--4971, Prague, Czech Republic, May 2011. [PDF]
T. Kinnunen, R. Saeidi, J. Sandberg, M. Hansson-Sandsten, “What Else is New Than the Hamming Window? Robust MFCCs for Speaker Recognition via Multitapering”, Proc. Interspeech 2010, pp. 2734--2737, Makuhari, Japan, Sept. 2010. [PDF] [Multitaper Matlab code]
J. Pohjalainen, R. Saeidi, T. Kinnunen, P. Alku, “Extended Weighted Linear Prediction (XLP) Analysis of Speech and its Application to Speaker Verification in Adverse Conditions”, Proc. Interspeech 2010, pp. 1477--1480, Makuhari, Japan, Sept. 2010. [PDF]
Z.-Z. Wu, T. Kinnunen, E.S. Chng, H. Li, “Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion”, Proc. Interspeech 2010, pp. 1732--1735, Makuhari, Japan, Sept. 2010. [PDF]
V. Hautamäki, T. Kinnunen, M. Nosratighods, K.A. Lee, B. Ma, H. Li, “Approaching Human Listener Accuracy with Modern Speaker Verification”, Proc. Interspeech 2010, pp. 1473--1476, Makuhari, Japan, Sept. 2010. [PDF]
R. Saeidi, P. Mowlaee, T. Kinnunen, Z.-H. Tan, M.G. Christensen, S.H. Jensen, P. Fränti, “Improving Monaural Speaker Identification by Double-Talk Detection”, Proc. Interspeech 2010, pp. 1069--1072, Makuhari, Japan, Sept. 2010. [PDF]
R. Saeidi, J. Pohjalainen, T. Kinnunen, P. Alku, “Temporally Weighted Linear Prediction Features for Speaker Verification in Additive Noise”, Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, pp. 40-46, June 2010. [PDF]
R. Saeidi, P. Mowlaee, T. Kinnunen, Z.-H. Tan, M. G. Christensen, S. H. Jensen and P. Fränti, “Signal-to-signal ratio independent speaker identification for co-channel speech signals”, Int. Conf. on Pattern Recognition (ICPR 2010), pp. 4545--456, Istanbul, Turkey, August 2010. [PDF]
K. A. Lee, H. Li, C. H. You, T. Kinnunen, K. C. Sim, “Discrete Expected Likelihood Kernel for SVM-Based Speaker Verification”, Proc. 18th European Signal Processing Conference (EUSIPCO 2010), pp. 591--595, Aalborg, Denmark, August 2010 [PDF]
T. Kinnunen, F. Sedlak, R. Bednarik, “Towards Task-Independent Person Authentication Using Eye Movement Signals”, Proc. of the 2010 Symposium on Eye-Tracking Research and Applications (ETRA 2010), pp. 187--190, Austin, Texas, March 2010. [PDF]
R. Saeidi, T. Kinnunen, H.R.S. Mohammadi, R. Rodman, P. Fränti, “Joint Frame and Gaussian Selection for Text-Independent Speaker Verification”, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp. 4530--4533, Dallas, Texas, USA, March 2010. [PDF]
T. Kinnunen and P. Alku, “On separating glottal source and vocal tract information in telephony speaker verification”, Proc. Int. conference on acoustics, speech, and signal processing (ICASSP 2009), pp. 4545--4548, Taipei, Taiwan, April 2009. [PDF]
T. Kinnunen, J. Saastamoinen, V. Hautamäki, M. Vinni and P. Fränti, “Comparing maximum a posteriori vector quantization and Gaussian mixture models in speaker verification”, Proc. Int. conference on acoustics, speech, and signal processing (ICASSP 2009), pp. 4229--4232, Taipei, Taiwan, April 2009. [PDF]
P. Fränti, J. Saastamoinen, I. Kärkkäinen, T. Kinnunen, V. Hautamäki, I. Sidoroff, “Developing Speaker Recognition System: from Prototype to Practical Application”, Int. Conf. Forensic Applications and Techniques in Telecommunications, Information and Multimedia (e-Forensics'09), Adelaide, Australia, LNICST vol. 8, 101-114, January 2009.
K.A. Lee, C. You, H. Li, T. Kinnunen and D. Zhu, “Characterizing Speech Utterances for Speaker Verification with Sequence Kernel SVM”, Proc. Interspeech 2008, pp. 1397-1400, Brisbane, Australia, 2008. [PDF]
T. Kinnunen, K.A. Lee and H. Li, “Dimension Reduction of the Modulation Spectrogram for Speaker Verification”, Proc. Odyssey: The Speaker and Language Recognition Workshop, Stellenbosch, South Africa, January 2008. [PDF]
T. Kinnunen, E. Chernenko, M. Tuononen, P. Fränti, H. Li, “Voice Activity Detection Using MFCC Features and Support Vector Machine”, Proc. Speech and Computer 2007 (SPECOM), vol. 2, 556-561, Moscow, Russia, October 2007. [PDF]
K.A. Lee, C. You, H. Li, T. Kinnunen, ”A GMM-based Probabilistic Sequence Kernel for Speaker Verification”, Proc. Interspeech 2007, pp. 294-297, Antwerp, Belgium, August 2007. [PDF]
T. Kinnunen, B. Zhang, J. Zhu, Y. Wang, ”Speaker Verification with Adaptive Spectral Subband Centroids”, Proc. Int. Conf. Biometrics, pp. 58-66, Lecture Notes in Computer Science 4642, Seoul, Korea, August 2007. [PDF]
R. Saeidi, R.S. Mohammadi, R. Rodman, T. Kinnunen, "A New Segmentation Algorithm Combined with Transient Frames Power for Text-Independent Speaker Verification", Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Vol. 4, pp. 305-308, Honolulu, Hawaii, April 2007. [PDF]
T. Kinnunen, C.W.E. Koh, L. Wang, H. Li, and E.S. Chng, "Temporal Discrete Cosine Transform: Towards Longer Term Temporal Features for Speaker Verification", Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP'2006), LNAI 4274, pp. 547-558, Singapore, December 2006. [PDF]
T. Kinnunen, V. Hautamäki, P. Fränti, “On the Use of Long-Term Average Spectrum in Automatic Speaker Recognition”, Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP'2006), pp. 559-567, Singapore, December 2006. [PDF]
K.A. Lee, H. Sun, R. Tong, B. Ma, M. Dong, C. You, D. Zhu, C.W.E. Koh, L. Wang, T. Kinnunen, E.S. Chng and H. Li, "The IIR Submission to CSLP 2006 Speaker Recognition Evaluation", Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP'2006), LNAI 4274, pp. 494-505, Singapore, December 2006. [PDF]
R. Tong, B. Ma, K.A. Lee, C. You, D. Zhou, T. Kinnunen, H. Sun, M. Dong, E.S. Chng, H. Li, “The IIR NIST 2006 Speaker Recognition System: Fusion of Acoustic and Tokenization Features”, Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP'2006), pp. 566-577, Singapore, December 2006. [PDF]
T. Kinnunen, "Joint Acoustic-Modulation Frequency for Speaker Recognition", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2006), Vol. I, pp. 665-668, May 14-19, 2006, Toulouse, France. [PDF]
T. Kinnunen, R. Gonzalez-Hautamäki, “Long-Term F0 Modeling for Text-Independent Speaker Recognition”, Proc. Int. Conf. on Speech and Computer (SPECOM'2005), pp. 567-570, Patras, Greece, October 2005. [PDF]
J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Proc. Int. Conf. on Speech and Computer (SPECOM'05), pp. 503-506, Patras, Greece, October 2005 [PDF].
H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Proc. Int. Conf. on Speech and Computer (SPECOM'05), pp. 551-554, Patras, Greece, October 2005. [PDF]
O. Grebenskaya, T. Kinnunen, P. Fränti, “Speaker Clustering in Speech Recognition”, Proc. 2005 Finnish Signal Processing Symposium (FINSIG’05), pp. 46-49, Kuopio, Finland, August 25, 2005. [PDF]
V. Hautamäki, S. Cherednichenko, I. Kärkkäinen, T. Kinnunen, P. Fränti, “Improving K-Means by Outlier Removal”, Proc. 14th Scandinavian Conference on Image Analysis (SCIA 2005), Lecture Notes in Computer Science 3540, pp. 978-987, Joensuu, Finland, June 19-22, 2005.
R. Bednarik, T. Kinnunen, A. Mihaila, P. Fränti, “Eye-Movements as a Biometric”, Proc. 14th Scandinavian Conference on Image Analysis (SCIA 2005), Lecture Notes in Computer Science 3540, pp. 780-789, Joensuu, Finland, June 19-22, 2005. [PDF]
T. Niemi-Laitinen, J. Saastamoinen, T. Kinnunen, P. Fränti, ”Applying MFCC-Based Automatic Speaker Recognition to GSM and Forensic Data”, Proc. Human Language Technologies (HLT’2005), pp. 317-322, Tallinn, Estonia, April 4-5, 2005 [PDF]
T. Kinnunen, E. Karpov, P. Fränti, “Efficient Online Cohort Selection Method for Speaker Verification”, Proc. 8th Int. Conf. on Spoken Language Processing (ICSLP 2004), Vol. III, pp. 2401-2402, Jeju Island, Korea, Oct. 4-8, 2004 [PDF].
T. Kinnunen, E. Karpov, P. Fränti, “Real-Time Speaker Identification”, Proc. 8th Int. Conf. on Spoken Language Processing (ICSLP'2004), Vol. III, pp. 1805-1808, Jeju Island, Korea, Oct. 4-8, 2004. [PDF]
T. Kinnunen, V. Hautamäki, P. Fränti, “Fusion of Spectral Feature Sets for Accurate Speaker Identification”, Proc. 9th International Conference Speech and Computer (SPECOM'2004), pp. 361-365, St. Petersburg, Russia, September 20-22, 2004. [PDF]
E. Karpov, T. Kinnunen, P. Fränti, "Symmetric Distortion Measure for Speaker Recognition", Proc. 9th International Conference Speech and Computer (SPECOM'2004), pp. 366-370, St. Petersburg, Russia, September 20-22, 2004. [PDF]
T. Kinnunen, V. Hautamäki, P. Fränti, "On the Fusion of Dissimilarity-Based Classifiers for Speaker Identification", Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), pp. 2641-2644, Geneva, Switzerland, September 1-4, 2003. [PDF]
T. Kinnunen, E. Karpov, P. Fränti: "A Speaker Pruning Algorithm for Real-Time Speaker Identification", Proc. 4th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA 2003), pp. 639-646, Guildford, UK, June 9-11, 2003. [PDF]
T. Kinnunen: "Designing a Speaker-Discriminative Adaptive Filter Bank for Speaker Recognition", Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002), pp. 2325-2328, Denver, Colorado, USA, September 16-20, 2002. [PDF]
T. Kinnunen, I. Kärkkäinen: "Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification", Proc. Joint IAPR International Workshop on Statistical Pattern Recognition (S+SPR 2002), pp. 681-688, Windsor, Canada, August 6-9, 2002. [PDF]
T. Kinnunen, I. Kärkkäinen, P. Fränti: "Is Speech Data Clustered? - Statistical Analysis of Cepstral Features", Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), vol. 4, pp. 2627-2630, Aalborg, Denmark, September 3-7, 2001. [PDF]
T. Kinnunen, P. Fränti: "Speaker Discriminative Weighting Method for VQ-Based Speaker Identification", Proc. 3rd International Conference on audio-and video-based biometric person authentication (AVBPA 2001), pp. 150-156, Halmstad, Sweden, June 6-8, 2001. [PDF]
T. Kinnunen, T. Kilpeläinen, P. Fränti: "Comparison of Clustering Algorithms in Speaker Identification", Proc. IASTED Int. Conf. Signal Processing and Communications (SPC 2000), pp. 222-227, Marbella, Spain, September 19-22, 2000. [PDF]

Theses:

T. Kinnunen: Optimizing Spectral Feature Based Text-Independent Speaker Recognition, PhD thesis, Department of Computer Science, University of Joensuu, 2005. [PDF]
T. Kinnunen: Spectral Features for Automatic Text-Independent Speaker Recognition, Ph.Lic. thesis, Department of Computer Science, University of Joensuu, 2003. [PDF]. [Summary of the results in PPT presentation]
T. Kinnunen: Automaattinen puhujan tunnistus, Pro gradu -tutkielma, Tietojenkäsittelytieteen laitos, Joensuun yliopisto, 1999. ("Automatic speaker recognition", M.Sc. thesis, Department of Computer Science, University of Joensuu, 1999.) [PDF] (in Finnish).

Other publications:

T. Kinnunen, M. Launiala, E. Silvennoinen, “Forensinen tiede ja puhujan tunnistaminen”, Edilex 2019/16, pp. 1-19 (in Finnish; “Forensic science and speaker recognition”, a brief overview article on speaker recognition technology aimed at legal experts)
T. Kinnunen, V. Hautamäki, “Automaattinen puhujantunnistus”, in O. Aaltonen, R. Aulanko, A. Iivonen, A. Klippi, M. Vainio (Eds.), Puhuva Ihminen - puhetieteiden perusteet, Otava, 2009. (“Automatic speaker recognition”; a book chapter in Finnish about basics of speaker recognition for non-technical audience).
P. Fränti, J. Saastamoinen, I. Kärkkäinen, T. Kinnunen, V. Hautamäki, I. Sidoroff, ”Implementing Speaker Recognition System: from Matlab to Practice”, Report A-2007-4, University of Joensuu, Department of Computer Science and Statistics (ISBN 978-952-219-061-1, ISSN 1796-7317), November 2007. [PDF]
T. Kinnunen, I. Kärkkäinen, P. Fränti, “The Mystery of Cohort Selection”, Report A-2005-1, Report series A, University of Joensuu, Department of Computer Science (ISBN 952-458-676-2, ISSN 0789-7316).
