Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning

Aditya Arie Nugraha; Diego Di Carlo; Yoshiaki Bando; Mathieu Fontaine; Kazuyoshi Yoshii

Conference paper. Year: 2023

Aditya Arie Nugraha

Role: Author

RIKEN Center for Advanced Intelligence Project [Tokyo]

Diego Di Carlo

Role: Author

RIKEN Center for Advanced Intelligence Project [Tokyo]

Yoshiaki Bando

Role: Author

National Institute of Advanced Industrial Science and Technology

Mathieu Fontaine

Role: Author

Laboratoire Traitement et Communication de l'Information

Kazuyoshi Yoshii

Role: Author

Kyoto University

Abstract

This paper revisits single-channel audio source separation based on a probabilistic generative model of a mixture signal defined in the continuous time domain. We assume that each source signal follows a non-stationary Gaussian process (GP), i.e., any finite set of sampled points follows a zero-mean multivariate Gaussian distribution whose covariance matrix is governed by a kernel function over time-varying latent variables. The mixture signal composed of such source signals thus follows a GP whose covariance matrix is given by the sum of the source covariance matrices. To estimate the latent variables from the mixture signal, we use a deep neural network with an encoder-separator-decoder architecture (e.g., Conv-TasNet) that separates the latent variables in a pseudo-time-frequency space. The key feature of our method is to feed the latent variables into the kernel function for estimating the source covariance matrices, instead of using the decoder for directly estimating the time-domain source signals. This enables the decomposition of a mixture signal into the source signals with a classical yet powerful Wiener filter that considers the full covariance structure over all samples. The kernel function and the network are trained jointly in the maximum likelihood framework. Comparative experiments using two-speech mixtures under clean, noisy, and noisy-reverberant conditions from the WSJ0-2mix, WHAM!, and WHAMR! benchmark datasets demonstrated that the proposed method performed well and outperformed the baseline method under noisy and noisy-reverberant conditions.
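The Wiener filtering step described in the abstract can be illustrated with a small NumPy sketch. If each source s_n is a zero-mean GP with covariance matrix K_n, the mixture x has covariance equal to the sum of the K_n, and the posterior mean of source n given x is K_n (sum_m K_m)^{-1} x. The sketch below is only an illustration under stated assumptions: the kernel (an amplitude envelope times an RBF times a cosine) is a hypothetical stand-in for the paper's learned kernel, and the envelopes, frequencies, and function names are invented for the example. In the proposed method the latent variables driving the kernel come from the network, which is not reproduced here.

```python
import numpy as np


def nonstationary_kernel(t, envelope, freq_hz, length_scale):
    """Hypothetical stand-in kernel, NOT the paper's learned kernel.

    k(t, t') = a(t) a(t') * exp(-(t - t')^2 / (2 l^2)) * cos(2 pi f (t - t')).
    It is positive semidefinite because it is a product of PSD kernels.
    """
    dt = t[:, None] - t[None, :]
    stationary = np.exp(-0.5 * (dt / length_scale) ** 2) * np.cos(2.0 * np.pi * freq_hz * dt)
    return np.outer(envelope, envelope) * stationary


def sample_gp(cov, rng):
    """Draw one zero-mean sample with covariance `cov` via an eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = np.sqrt(np.clip(eigvals, 0.0, None))  # clip tiny negative round-off
    return eigvecs @ (scale * rng.standard_normal(cov.shape[0]))


def gp_wiener_separate(x, covariances, jitter=1e-6):
    """Return the posterior means K_n (sum_m K_m + jitter * I)^{-1} x for all sources.

    `jitter` is a small diagonal term added for numerical stability (an assumption
    of this sketch).
    """
    k_mix = sum(covariances) + jitter * np.eye(x.size)
    weights = np.linalg.solve(k_mix, x)
    return [k @ weights for k in covariances]


# Toy setup (all values are assumptions): 200 samples at 8 kHz, two sources with
# opposite amplitude envelopes and different dominant frequencies.
rng = np.random.default_rng(0)
t = np.arange(200) / 8000.0
env1 = np.linspace(1.0, 0.2, t.size)
env2 = np.linspace(0.2, 1.0, t.size)
K1 = nonstationary_kernel(t, env1, 300.0, 0.005)
K2 = nonstationary_kernel(t, env2, 1200.0, 0.005)

s1, s2 = sample_gp(K1, rng), sample_gp(K2, rng)
x = s1 + s2  # single-channel mixture

s1_hat, s2_hat = gp_wiener_separate(x, [K1, K2])
print("error of using the mixture as source 1:", np.linalg.norm(x - s1))
print("error of the Wiener estimate of source 1:", np.linalg.norm(s1_hat - s1))
```

Because the filter uses the full covariance matrices over all samples, the cost of the linear solve grows cubically with the number of samples in this naive form; the sketch makes no attempt to address that.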

Keywords

Time-domain audio source separation, Gaussian processes, deep kernel learning

Domains

Signal and Image Processing [eess.SP], Machine Learning [stat.ML]

Main file

_WASPAA_23__Time_Domain_Audio_Source_Separation_Based_on_Gaussian_Processes_with_Deep_Kernel_Learning-1.pdf (922.8 KB)

Origin: Files produced by the author(s)

Contributor: Mathieu Fontaine

https://hal.science/hal-04172863

Submitted on: Friday, July 28, 2023, 09:52:29

Last modified on: Thursday, July 11, 2024, 14:32:03

Long-term archiving on: Sunday, October 29, 2023, 18:10:49

Dates and versions

hal-04172863, version 1 (28-07-2023)

Identifiers

HAL Id: hal-04172863, version 1

Cite

Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii. Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning. WASPAA, Oct 2023, New Paltz, NY, United States. ⟨hal-04172863⟩

Collections

INSTITUT-TELECOM LTCI IDS S2A IP_PARIS ANR

175 views

180 downloads
