Welcome
VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).
We will make all submitted audio files available under the GPL license, and then 'compile' them into acoustic models for use with Open Source speech recognition engines such as Sphinx, ISIP, Julius and HTK (note: HTK has distribution restrictions).
Why Do We Need Free GPL Speech Audio?
Most acoustic models used by 'Open Source' speech recognition (or Speech-to-Text) engines are 'Closed Source'. They do not give you access to the speech audio and transcriptions (i.e. the speech corpus) used to create the acoustic model.
This is because Free and Open Source ('FOSS') projects are required to purchase large speech corpora under restrictive licenses. Although there are a few small FOSS speech corpora that could be used to create acoustic models, the vast majority of corpora (especially the large corpora best suited to building good acoustic models) are available only under such licenses.
How Can You Help?
Record yourself reading some text, and upload your recordings to VoxForge using one of the following approaches:
- your computer (using a Java applet that gives you a list of prompts to read and a "one-click" uploader);
- your telephone (using free long-distance telephone service providers);
- record an AudioBook chapter with the LibriVox project and then submit it to VoxForge in an uncompressed audio format;
- record a short passage of poetry or prose and submit it to MojoMove411.com;
- Voice2Type (a commercial initiative to collect telephony speech, with the collected speech to be donated to VoxForge under the GPL).


