What is Emotion Transfer?

Overview

Emotion Transfer is Dubformer’s in-house technology for generating speech that carries over the performance of the original audio.

It helps reproduce the original delivery — including intonation, intensity, rhythm, and timing — cue by cue.

Emotion Transfer supports two modes:

Mode	Description	Best for
Clones the original voice and recreates its performance.	Clones the original speaker’s voice and transfers the original performance to the generated audio.	Keeping the result close to the original speaker.
Recreates the original performance using any selected voice.	Transfers the original performance to the voice selected for the chunk.	Keeping the selected voice while matching the original delivery.

Emotion Transfer is especially useful for dialogue-heavy content, expressive speech, and videos with a wide range of intonations.

Supported languages

English (United Kingdom)

English (United States)

French

German

Brazilian Portuguese

Castilian Spanish

Latin American Spanish

Italian

Russian

How it works

A voice is created for each phrase individually, using one or two original phrases as references. The technology captures the vocal features and emotional coloring of the original phrase and transfers them to the translated phrase.

By default, the voice for each phrase is based on its corresponding original phrase alone. In some cases, an adjacent phrase is added as a second reference.
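
The reference selection described above can be pictured with a small sketch. This is an illustration only, not Dubformer's implementation: the article does not say when an adjacent phrase is added, so the rule used here (add the nearest neighbour when the original phrase is shorter than `MIN_REFERENCE_SECONDS`) is an assumption, and `Phrase`, `pick_references`, and the threshold value are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: the article does not state when a second
# reference is added, so "phrase is too short" is only an assumed rule.
MIN_REFERENCE_SECONDS = 2.0


@dataclass
class Phrase:
    start: float  # seconds from the beginning of the original audio
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def gap_between(own: Phrase, other: Phrase) -> float:
    """Silence in seconds between two non-overlapping phrases."""
    if other.end <= own.start:
        return own.start - other.end
    return other.start - own.end


def pick_references(phrases: List[Phrase], i: int,
                    min_seconds: float = MIN_REFERENCE_SECONDS) -> List[int]:
    """Return indices of the original phrases used as references for phrase i.

    The phrase's own original is always the first reference. Under the
    assumed rule, the nearest adjacent phrase is added as a second one.
    """
    own = phrases[i]
    refs = [i]
    if own.duration >= min_seconds:
        return refs
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(phrases)]
    if neighbours:
        refs.append(min(neighbours, key=lambda j: gap_between(own, phrases[j])))
    return refs


phrases = [Phrase(0.0, 3.0), Phrase(3.2, 4.0), Phrase(6.0, 9.0)]
long_phrase_refs = pick_references(phrases, 0)
short_phrase_refs = pick_references(phrases, 1)
```

In this sketch the first phrase is long enough to serve as its own sole reference, while the short second phrase borrows the neighbour that sits closest to it in time. Either way, a phrase never gets more than two references.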

The synthesis language is determined by the video’s target language, not by the phrase text. For example, if the video is being translated into French, synthesis treats every phrase in the script as French and pronounces it accordingly.
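
The following sketch shows what this rule implies for a script that mixes languages. It is a hedged illustration rather than the product's API: `SynthesisRequest`, `build_requests`, and the `"fr"` language code are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SynthesisRequest:
    text: str
    language: str  # hypothetical language code, e.g. "fr" for French


def build_requests(script: List[str], target_language: str) -> List[SynthesisRequest]:
    """Tag every phrase with the video's target language.

    No per-phrase language detection happens here: the phrase text has
    no influence on the language used for synthesis.
    """
    return [SynthesisRequest(text=phrase, language=target_language)
            for phrase in script]


# The second phrase is written in English but is still tagged as French,
# so it would be pronounced with French phonetics.
script = ["Bonjour à tous.", "OK, let's go!"]
requests = build_requests(script, "fr")
```

One practical consequence is that a phrase left untranslated in the script would be read with the target language's pronunciation, so it is worth checking the script for stray source-language text before synthesis.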