2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
  • Satellite Venue: Shenzhen, China (Postponed)

IEP-14: Closing the Gap Between Probabilities and Decisions in Temporal Detection and Classification
Thu, 12 May, 21:00 - 21:45 Singapore Time (UTC +8)
Thu, 12 May, 15:00 - 15:45 France Time (UTC +2)
Thu, 12 May, 13:00 - 13:45 UTC
Thu, 12 May, 09:00 - 09:45 New York Time (UTC -4)
Location: Gather Area P
Presented by: Dr Çağdaş Bilen, Audio Analytic Ltd

Sound recognition is a prominent field of machine learning that has penetrated our everyday lives. It is already in active use in millions of smart homes, and on millions of smartphones and smart speakers.

Temporal event detection applications, such as sound event detection (SED) or keyword spotting (KWS), often aim for low-latency and low-power operation on a broad range of constrained devices without compromising performance.

In these temporal event detection problems, models are often designed and optimized to estimate instantaneous probabilities and rely on ‘ad-hoc decision post-processing’ to determine the occurrence of events. Within the constraints of commercial deployment, the impact of post-processing on the final performance of the product can be significant, sometimes reducing errors by orders of magnitude. However, these constraints are often ignored in academic challenges and publications, where the focus is mainly on improving the performance of the ML models. Furthermore, in both academia and industry, the design and optimization of the ML models often disregard the effect of such decision post-processing, potentially leading to suboptimal performance and products.
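To make the idea of decision post-processing concrete, here is a minimal Python sketch of one common ad-hoc scheme: smooth the per-frame probabilities, threshold them, and discard detections shorter than a minimum duration. It is an illustration only and not the method presented in the talk; the function names (`smooth`, `detect_events`) and the parameters (`threshold`, `window`, `min_frames`) are assumptions made for this example.

```python
from typing import List, Tuple


def smooth(probs: List[float], window: int) -> List[float]:
    """Causal moving average over the most recent `window` frames."""
    out = []
    for i in range(len(probs)):
        recent = probs[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / len(recent))
    return out


def detect_events(
    probs: List[float],
    threshold: float = 0.5,   # hypothetical decision threshold
    window: int = 1,          # smoothing length in frames (1 = no smoothing)
    min_frames: int = 1,      # shortest run accepted as an event
) -> List[Tuple[int, int]]:
    """Turn frame-level probabilities into (start, end) events, end exclusive."""
    smoothed = smooth(probs, window)
    events = []
    start = None
    for i, p in enumerate(smoothed):
        if p >= threshold and start is None:
            start = i                      # an active run begins
        elif p < threshold and start is not None:
            if i - start >= min_frames:    # keep only sufficiently long runs
                events.append((start, i))
            start = None
    # Close a run that is still active at the end of the signal.
    if start is not None and len(smoothed) - start >= min_frames:
        events.append((start, len(smoothed)))
    return events


if __name__ == "__main__":
    # Made-up frame probabilities: one isolated spike, then a sustained event.
    probs = [0.0, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.8, 0.1, 0.1]
    print(detect_events(probs))                  # plain thresholding
    print(detect_events(probs, min_frames=2))    # with a minimum duration
    print(detect_events(probs, window=3))        # with smoothing
```

On this toy input, plain thresholding reports both the single-frame spike and the sustained event, while a two-frame minimum duration keeps only the sustained one. Smoothing over three frames also suppresses the spike but reports the event one frame later, which hints at the latency trade-off that such post-processing introduces on constrained devices.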

In this talk, I aim to demonstrate the importance of decision post-processing in temporal ML problems and to bring it to the attention of the broader research community, so that better-optimized solutions can be realized.


Dr Çağdaş Bilen gained his PhD from NYU Tandon School of Engineering and went on to do postdoctoral work at Strasbourg University, Technicolor and INRIA. He has also worked in other research labs such as AT&T (Bell) Labs and HP Labs before joining Audio Analytic in 2018.

He has a keen interest in a greater sense of hearing.

Dr Bilen has authored articles in highly respected international journals and conferences and holds numerous patents on the topics of audio and multimedia signal representation, estimation and modelling. These include audio inverse problems (audio inpainting, source separation and audio compression) using nonnegative matrix factorization, and fast image search algorithms using sparsity and deep learning.

“My role at Audio Analytic allows me the opportunity to apply my passion for signal processing and machine learning and to explore how a greater sense of hearing can re-shape the way that humans and machines interact.”

Çağdaş leads Audio Analytic’s respected research team in developing core technologies and tools that can further advance the field of machine listening. This cutting-edge work has led to a number of significant technical breakthroughs and patents, such as loss function frameworks, post-biasing technology, a powerful temporal decision engine, and an approach to model evaluation called the Polyphonic Sound Detection Score (PSDS), which has been adopted as an industry-standard metric by the DCASE Challenge.