Post-Training Latent Dimension Reduction in Neural Audio Coding - ARCHITECTURE
Preprint, Working Paper. Year: 2024

Abstract

This work addresses the problem of latent space quantization in neural audio coding. A covariance analysis of the latent space is performed on several pre-trained audio coding models (Lyra V2, EnCodec, AudioDec). We propose to truncate the latent space dimension using a fixed linear transform. The Karhunen-Loève transform (KLT) is applied to the learned residual vector quantization (RVQ) codebooks. The proposed method is applied to EnCodec in a backward-compatible way, and we show that quantization complexity and codebook storage are reduced (by 43.4%), with no noticeable difference in subjective AB tests.
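
The idea summarized in the abstract (estimate the latent covariance, derive a KLT basis, truncate it, and express the codebook in the reduced space) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses NumPy on synthetic latents, a single codebook rather than a full RVQ stack, and the names `fit_klt`, `project`, `reconstruct`, and `nearest_codeword` are hypothetical.

```python
import numpy as np

def fit_klt(latents, keep):
    """Estimate a truncated KLT basis from latent vectors (one vector per row)."""
    mean = latents.mean(axis=0)
    cov = np.cov(latents, rowvar=False)        # (dim, dim) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending variance
    return mean, eigvecs[:, order[:keep]], eigvals[order]

def project(x, mean, basis):
    """Map full-dimension vectors to the reduced KLT domain."""
    return (x - mean) @ basis

def reconstruct(y, mean, basis):
    """Map reduced vectors back to the original latent dimension."""
    return y @ basis.T + mean

def nearest_codeword(y, codebook_reduced):
    """Exhaustive nearest-neighbour search carried out in the reduced domain."""
    d = ((y[:, None, :] - codebook_reduced[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Synthetic stand-in for codec latents: 16 dimensions, most energy in 4 of them.
rng = np.random.default_rng(0)
dim, rank, keep = 16, 4, 4
mixing = rng.standard_normal((rank, dim))
latents = (rng.standard_normal((2000, rank)) @ mixing
           + 0.01 * rng.standard_normal((2000, dim)))

# Hypothetical "pre-trained" codebook: 64 codewords sampled from the latents.
codebook = latents[rng.choice(len(latents), size=64, replace=False)]

mean, basis, eigvals = fit_klt(latents, keep)
codebook_reduced = project(codebook, mean, basis)   # stored in place of `codebook`
indices = nearest_codeword(project(latents[:10], mean, basis), codebook_reduced)
decoded = reconstruct(codebook_reduced[indices], mean, basis)

print("stored values per codeword:", dim, "->", keep)
print("fraction of variance kept:", eigvals[:keep].sum() / eigvals.sum())
```

In this sketch both the search cost and the codebook storage scale with `keep` instead of `dim`, and the transmitted indices keep the same format, which is one way to read the backward-compatibility claim. How the authors handle the successive RVQ stages and choose the retained dimension is described in the paper itself, not here.
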

Main file

EUSIPCO_2024_V3.pdf (555.1 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04488929, version 1 (04-03-2024)

Identifiers

  • HAL Id: hal-04488929, version 1

Cite

Thomas Muller, Stéphane Ragot, Pierrick Philippe, Pascal Scalart. Post-Training Latent Dimension Reduction in Neural Audio Coding. 2024. ⟨hal-04488929⟩
