2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
  • Satellite Venue: Shenzhen, China (Postponed)

PAN-4: Beyond Words: Recognition, Spoofing, and Anonymization of Individual Traits in Speech
Thu, 26 May, 15:30 - 17:00 Singapore Time (UTC +8)
Thu, 26 May, 09:30 - 11:00 France Time (UTC +2)
Thu, 26 May, 07:30 - 09:00 UTC
Thu, 26 May, 03:30 - 05:00 New York Time (UTC -4)
Location: TBD
Moderator: Kong Aik LEE, Institute for Infocomm Research, A*STAR, Singapore (in-person)
  • Emmanuel Vincent, Inria, France (in-person)
  • Tomi Kinnunen, University of Eastern Finland, Finland (virtual)
  • Junichi Yamagishi, National Institute of Informatics, Japan (in-person)
  • Oldrich Plchot, Brno University of Technology, Czech Republic (in-person)
  • Rohan Kumar Das, Fortemedia, Singapore (in-person)

Speech is among the most natural and convenient means of biometric authentication. The individual traits embedded in speech signals form the basis of speaker recognition, or voice authentication. With the widespread availability of speech synthesis tools, the threat of spoofing attacks on speaker recognition systems is growing, since fraudsters can use these tools to produce natural-sounding speech in a victim’s voice. While research on speech anti-spoofing has seen significant progress in the past few years, privacy concerns have created a need for speech anonymization. In this panel, we invite world-leading experts to share their opinions on the security and privacy aspects of handling individual traits in speech, the challenges posed by advances in neural speech synthesizers, and the collaborative efforts that could be made to address these concerns and challenges.

  • 10 minutes of introduction (moderator)
  • 30 minutes of presentation (panelists)
  • 50 minutes of open discussion
  • Call for contribution to ASVspoof5

Kong Aik Lee

Kong Aik Lee is currently a Senior Scientist at the Institute for Infocomm Research, A*STAR, Singapore. He was a Senior Principal Researcher at the Data Science Research Laboratories, NEC Corporation, Japan, from 2018 to 2020. He received his Ph.D. degree from Nanyang Technological University, Singapore, in 2006, after which he joined the Institute for Infocomm Research, Singapore, as a Research Scientist and then a Strategic Planning Manager (concurrent appointment). He was the recipient of the Singapore IES Prestigious Engineering Achievement Award 2013 for his contribution to voice biometrics technology, the Outstanding Service Award by IEEE ICME 2020, and the 2021 A*STAR CRF (UIBR) Award. He was the Lead Guest Editor for the CSL Special Issue on “Two decades into Speaker Recognition Evaluation - are we there yet?” Currently, he serves as an Editorial Board Member for Elsevier Computer Speech and Language (2016 - present) and was an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing (2017 - 2021). He is an elected member of the IEEE Speech and Language Processing Technical Committee (2019 – 2021, 2022 – 2024) and was the General Chair of the Speaker Odyssey 2020 Workshop. His research focuses on the automatic and para-linguistic analysis of speaker characteristics, ranging from speaker recognition, language and accent recognition, and diarization to voice biometrics, spoofing, and countermeasures.

Emmanuel Vincent

Emmanuel Vincent received the Ph.D. degree in music signal processing from IRCAM in 2004 and joined Inria, the French national research institute for digital science and technology, in 2006. He is currently a Senior Research Scientist and the Head of Science of Inria Nancy - Grand Est. His research covers several speech and audio processing tasks, with a focus on privacy preservation, learning from little or no labeled data, source separation and speech enhancement, and robust speech and speaker recognition. He is a founder of the MIREX, SiSEC, CHiME, and VoicePrivacy challenge series. He is a scientific advisor of the startup company Nijta, which provides speech anonymization solutions.

Tomi H. Kinnunen

Tomi H. Kinnunen is a Professor at the University of Eastern Finland. He received his Ph.D. degree in computer science from the University of Joensuu in 2005. From 2005 to 2007, he was an Associate Scientist at the Institute for Infocomm Research (I2R), Singapore. Since 2007, he has been with UEF. From 2010 to 2012, he was funded by a postdoctoral grant from the Academy of Finland. He has been a PI or co-PI in three other large Academy of Finland-funded projects and a partner in the H2020-funded OCTAVE project. He chaired the Odyssey workshop in 2014. From 2015 to 2018, he served as an Associate Editor for IEEE/ACM Trans. on Audio, Speech, and Language Processing and from 2016 to 2018 as a Subject Editor for Speech Communication. In 2015 and 2016, he visited the National Institute of Informatics, Japan, for 6 months under a mobility grant from the Academy of Finland, with a focus on voice conversion and spoofing. Since 2017, he has been an Associate Professor at UEF, where he leads the Computational Speech Group. He is one of the cofounders of the ASVspoof challenge, a nonprofit initiative that seeks to evaluate and improve the security of voice biometric solutions under spoofing attacks.

Junichi Yamagishi

Junichi Yamagishi is a professor at the National Institute of Informatics in Japan. He is also a senior research fellow in the Centre for Speech Technology Research (CSTR) at the University of Edinburgh, UK. He was awarded a Ph.D. by the Tokyo Institute of Technology in 2006 for a thesis that pioneered speaker-adaptive speech synthesis and was awarded the Tejima Prize as the best Ph.D. thesis at the Tokyo Institute of Technology in 2007. Since 2006, he has authored or co-authored over 250 refereed papers in international journals and conferences. He was awarded the Itakura Prize from the Acoustical Society of Japan, the Kiyasu Special Industrial Achievement Award from the Information Processing Society of Japan, the Young Scientists’ Prize from the Minister of Education, Science and Technology, the JSPS Prize, and the Docomo mobile science award in 2010, 2013, 2014, 2016, and 2018, respectively. He served previously as a co-organizer for the bi-annual ASVspoof special sessions at INTERSPEECH 2013-9 and the bi-annual Voice Conversion Challenge at INTERSPEECH 2016 and Odyssey 2018, as an organizing committee member for the 10th ISCA Speech Synthesis Workshop 2019, and as a technical program committee member for IEEE ASRU 2019. He also served as a member of the IEEE Speech and Language Technical Committee, as an Associate Editor of the IEEE/ACM TASLP, and as the Lead Guest Editor for the IEEE JSTSP SI on Spoofing and Countermeasures for Automatic Speaker Verification. He is currently a PI of the JST-CREST and ANR supported VoicePersonae project. He also serves as a chairperson of ISCA SynSIG and as a Senior Area Editor of the IEEE/ACM TASLP.

Oldrich Plchot

Oldrich Plchot (Ing. [MS], Brno University of Technology, 2007; Ph.D., Brno University of Technology, 2014) is a senior researcher in the BUT Speech@FIT research group. He worked on the EU-sponsored project MOBIO (7th FP) as well as on several projects sponsored at the local Czech level. He was the technical lead of the US Air Force EOARD-sponsored project “Improving the capacity of language recognition systems to handle rare languages using radio broadcast data”, and a key member of personnel in the BEST project and the RATS Patrol project, sponsored by U.S. IARPA and DARPA, respectively. He participated in several high-profile international research workshops: BOSARIS, held in Brno in 2010 and 2012, and the Johns Hopkins University (MD, USA) summer research workshop in 2013. He significantly contributed to the success of the BUT team in international evaluations organized by NIST (Speaker recognition since 2010, Language recognition since 2007) as well as in evaluations organized within IARPA and DARPA projects. He has authored or co-authored more than 50 papers, including papers in IEEE Transactions on Audio, Speech, and Language Processing and at high-profile conferences such as ICASSP and Interspeech. He is a recipient of the 2016 “Josef Hlávka Prize”, awarded to the most talented PhD students and young researchers of Czech technical universities.

Rohan Kumar Das

Rohan Kumar Das is currently a Research and Development (R&D) Manager at Fortemedia, Singapore division. Prior to that, he was associated with the National University of Singapore as a Research Fellow from 2017 to 2021 and worked as a Data Scientist at KOVID Research Labs, India, in 2017. He is a Ph.D. graduate of the Indian Institute of Technology (IIT) Guwahati. He was one of the organizers of the special sessions on “The Attacker’s Perspective on Automatic Speaker Verification” and “Far-Field Speaker Verification Challenge 2020” at Interspeech 2020, and of the Voice Conversion Challenge 2020. He served as Publication Chair of the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop 2019 and as one of the Chairs of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. He is a Senior Member of IEEE and a member of ISCA and APSIPA. His research interests are speech/audio signal processing, speaker verification, anti-spoofing, social signal processing, and various applications of deep learning.