Voice Cloning MOS Evaluation

MOS Score | Speech Quality Description | Speech Similarity Description
1 | Not understandable at all. | Definitely not the same person; even the gender is different.
2 | Some words are unclear, and there are pronunciation issues. | Low chance of being the same person: there is a large difference.
3 | Generally understandable and acceptable, but the rhythm and pauses are not good enough. | High chance of being the same person: there is slight similarity.
4 | Natural, clear, and understandable. | Sounds like the same person, but tone and speaking style don't match.
5 | Broadcast level: unable to distinguish the synthesized voice from a human voice. | Definitely sounds like the same person: tone and speaking style match.
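
The page does not say how individual ratings are combined into a final score. As a minimal sketch, assuming the conventional approach of averaging the 1 to 5 ratings per sample and reporting a normal-approximation 95% confidence interval, the aggregation could look like the following. The function name `mean_opinion_score` and the example ratings are hypothetical and are not taken from this evaluation.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for a list of 1-5 ratings.

    Assumption: the MOS is the arithmetic mean of listener ratings, and the
    interval uses the normal approximation (z = 1.96). Neither choice is
    stated on the page itself.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {r!r}")

    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        # A single rating gives no estimate of spread.
        return mean, float("nan")

    # Sample standard deviation divided by sqrt(n) is the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from five listeners for one cloned sample.
quality_ratings = [4, 4, 5, 3, 4]
similarity_ratings = [3, 4, 4, 4, 5]

quality_mos, quality_ci = mean_opinion_score(quality_ratings)
similarity_mos, similarity_ci = mean_opinion_score(similarity_ratings)
print(f"Quality MOS: {quality_mos:.2f} +/- {quality_ci:.2f}")
print(f"Similarity MOS: {similarity_mos:.2f} +/- {similarity_ci:.2f}")
```

Quality and similarity are scored on separate columns of the rubric, so the sketch keeps them as two independent averages rather than merging them into one number.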

Cloned Audio Samples

NOTE: For each speaker, evaluate the quality and similarity of the cloned audio samples.

Speaker 1

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 2

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 3

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 4

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 5

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 6

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 7

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 8

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 9

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text

Speaker 10

Original Audio

Cloning with Same Input

Cloning with Different Input

Short Text
Medium Text
Long Text