Twilight Sparkle’s Voice Compromised
Brony AI seizes cartoon vocal cords (via @gwern)
The Pony Preservation Project undertakes to model (with machine learning) the voices of the My Little Pony: Friendship is Magic characters, thus granting them immortality. And, for Twilight Sparkle, the decorum of a sailor.
I don’t know if linking to 4chan is considered bad form - Gwern did the footwork on this, though, so who am I to say? Audio deepfakes, but for cartoon ponies. I’m just going to yank the text from 4chan, since I never know when these pages will disappear.
Pay particular attention to the Google Doc below - it contains rough instructions for training. You need a transcript for each audio clip that you’re processing, so a long-running series like Friendship is Magic is helpful, as you have a wide-ranging corpus to begin with. Background noise also needs to be removed from the clips. There is a ‘sorting’ process, which also seems to involve assigning ‘moods’, and there is some reference to using Praat to annotate the files and identify specific sounds.[1]
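To make the bookkeeping concrete, here is a minimal sketch of how clips, transcripts and mood tags could be paired up into a single manifest. To be clear, this is my own guess at the shape of the problem, not the project's actual tooling: the `<character>_<mood>_<id>.wav` naming scheme, the same-named `.txt` transcripts and the pipe-delimited manifest are all assumptions made for illustration.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Clip:
    audio: Path       # path to an (already cleaned) audio clip
    character: str    # speaker label taken from the file name
    mood: str         # mood tag taken from the file name (hypothetical scheme)
    transcript: str   # the words spoken in the clip


def load_clips(clip_dir: Path) -> List[Clip]:
    """Pair each .wav with a same-named .txt transcript.

    Assumes a hypothetical naming scheme: <character>_<mood>_<id>.wav
    Clips with no transcript, an empty transcript, or an unexpected
    file name are skipped.
    """
    clips = []
    for wav in sorted(clip_dir.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        parts = wav.stem.split("_", 2)
        if not txt.exists() or len(parts) != 3:
            continue
        character, mood, _clip_id = parts
        # Collapse stray whitespace and newlines in the transcript.
        transcript = " ".join(txt.read_text(encoding="utf-8").split())
        if transcript:
            clips.append(Clip(wav, character, mood, transcript))
    return clips


def write_manifest(clips: List[Clip], out_path: Path) -> None:
    """Write one pipe-delimited row per clip: file|character|mood|transcript."""
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        for c in clips:
            writer.writerow([c.audio.name, c.character, c.mood, c.transcript])
```

Calling `write_manifest(load_clips(Path("clips")), Path("manifest.csv"))` on a folder laid out this way would give you one row per usable clip. Noise removal and the Praat annotation are separate steps that this sketch does not attempt.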
TwAIlight welcomes you to the Pony Voice Preservation Project!
https://clyp.it/qrnafm4y
This project is the first part of the “Pony Preservation Project” dealing with the voice. It’s dedicated to save our beloved pony’s voices by creating a neural network based Text To Speech for our favorite ponies. Videos such as “Steamed Hams But It’s Trump & Obama” or “RealTalk Joe Rogan” have proven that we now have the technology to generate convincing voices using machine learning algorithms “trained” on nothing but clean audio clips. With roughly 10 seasons (8 soon to be 9 seasons and 5 movies) worth of voice lines available, we have more than enough material to apply this tech for our deviant needs.
Any anon is free to join, and many are already contributing. Just read the guide to learn how you can help bring on the wAIfu revolution. Whatever your technical level, you can help. Document: docs.google.com
We now have a working TwAIlight that any Anon can play with: Instructions
>Active Tasks
Create a dataset for speech synthesis (https://youtu.be/KmpXyBbOObM)
Test some AI program with the current dataset
Research AI (read papers and find open source projects)
Track down remaining English/Foreign dubs that are missing
Evaluate cleaned audio samples
Phonetic dictionary/tagging
AI Training/Interface
>Latest Developments
https://clyp.it/xp4q1bru [Yay!]
Anons are investigating Deepvoice3, Tacotron2 with GSTs, SV2TTS, and Mellotron
New tool to test audio clips
New “special source” audio
Several new AInons
>Voice samples (So far)
https://clyp.it/2pb4bp05
https://clyp.it/s0klxftk
https://clyp.it/samzm4sk
https://pastebin.com/JUpDRsiw
>Clipper Anon’s Master File:
https://mega.nz/#F!L952DI4Q!nibaVrvxbwgCgXMlPHVnVw
>Synthbot’s Torrent Resources
In the doc at end of resources.
Gwern also found a larger directory of clips, same voice.
Predictions:
- Fanfic will gain a serious boost when AI-generated voices can simply be fed scripts to generate audiobooks.
- Couple this with animation networks (also via gwern) and The Simpsons may never need to end.
- The power of the novel in previous generations was due to the fact that a single writer could produce one without relying on anyone else - finding collaborators in close proximity is a luxury some don’t have. This technology could make cartoons and film largely the domain of lone writers with no staff.
- It will be a long time before this ever catches up to human voices and hand-drawn frames. In fact, this could increase the value of those artworks. (In the way that algorithms have really helped us see the value of human curation.)
- Someone who is able to use the tech with a clever flair will have an edge. (As has been the case with CGI.)
I’m still not too hyped by machine learning, though. It seems pretty weak given the empire frothing around it. But these small iterations are cool. And you have to love when it comes out of a random subculture rather than the military. Who can’t respect this kind of insanely determined fandom? Impressive work for one week.
[1] A good start on this is the “Analyze Your Voice” video by Prof Merryman.