Speech synthesis Tutorial

VALL-E Family

VALL-E 2 is the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Building upon the ...

Gemini Voice Brings Fast Multi-Speaker Audio, Rich Styles and 32k Context Window

Built on Gemini 2.5 Flash and Pro with a 32,000-token context window, you get faster results and precise delivery for ...

Microsoft

Autoregressive Speech Synthesis without Vector Quantization

We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from ...

IEEE

UnitDiff: A Unit-Diffusion Model for Code-Switching Speech Synthesis

Abstract: Given the scarcity of Code-Switching (CS) datasets, most researchers synthesize CS speech using multiple monolingual datasets. However, this approach presents challenges in synthesizing CS ...

IEEE

Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks

Abstract: Articulatory copy synthesis (ACS) refers to the synthetic reproduction of natural utterances. The existing methods of ACS have the limitations of poor generalizability for unknown speakers, ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results