| MOS Score | Speech Quality Description | Speech Similarity Description |
|---|---|---|
| 1 | Not understandable at all. | Definitely not the same person; even the gender is different. |
| 2 | Some words are unclear and there are pronunciation issues. | Low chance of being the same person: there are large differences. |
| 3 | Generally understandable and acceptable, but the rhythm and pauses are not good enough. | High chance of being the same person: there is slight similarity. |
| 4 | Natural, clear, and understandable. | Sounds like the same person, but tone and speaking style don't match. |
| 5 | Broadcast quality: the synthesized voice cannot be distinguished from a human voice. | Definitely sounds like the same person: tone and speaking style match. |

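Ratings collected on the 1 to 5 scale above are usually summarized as a mean opinion score per condition. The following is a minimal sketch of that aggregation, not the procedure used for this page: the function name `mos_with_ci`, the normal-approximation 95% interval, and the example ratings are all assumptions made for illustration.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of 1-5 ratings.

    half_width is a normal-approximation confidence half-width
    (z=1.96 corresponds to roughly 95%). It is 0.0 when fewer than
    two ratings are available, since no spread can be estimated.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mean, 0.0
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical listener ratings, made up only to show the call shape.
example_quality = [4, 5, 3, 4, 4]
example_similarity = [3, 4, 4, 2, 3]

quality_mos, quality_ci = mos_with_ci(example_quality)
similarity_mos, similarity_ci = mos_with_ci(example_similarity)
print(f"quality MOS: {quality_mos:.2f} ± {quality_ci:.2f}")
print(f"similarity MOS: {similarity_mos:.2f} ± {similarity_ci:.2f}")
```

The same function could be applied separately to each speaker or each cloning condition, so that quality and similarity scores stay comparable across them.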
Cloned Audio Samples
NOTE: For each speaker, evaluate the quality and similarity of the cloned audio samples.
Speaker 1
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 2
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 3
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 4
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 5
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 6
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 7
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 8
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 9
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text
Speaker 10
Original Audio
Cloning with Same Input
Cloning with Different Input
Short Text
Medium Text
Long Text