Persian Speech Corpus

This single-speaker speech corpus of about 2.5 hours was developed using the same methodology as the PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded in a Tehrani accent in a professional studio. Speech synthesized from this corpus has produced a high-quality, natural voice.

It is released here under the non-commercial Creative Commons license specified below. If you need further rights, or consultancy on building Persian speech corpora, please contact Nawar Halabi by email. Thank you for your interest.

Download Corpus Package

Hosting this corpus costs time and money. If you want to support us please click the button below :)

The package includes:

399 .wav files containing spoken utterances.
399 .lab files containing the phonetic transcriptions of the utterances.
399 .TextGrid files containing the phoneme labels, with time stamps marking the boundaries where they occur in the .wav files. These files can be opened with Praat.
aligned.mlf, which contains the HTS-friendly alignments.
orthographic-transcript.txt, in which every line has the form "[wav_filename]" "[Orthographic Transcript]".
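
The line format described for orthographic-transcript.txt can be read with a few lines of Python. This is a minimal sketch, not part of the corpus package. It assumes the file is UTF-8 encoded and that each line holds exactly two double-quoted fields separated by whitespace. The function names and the sample filename utt_001.wav are hypothetical and used only for illustration.

```python
import re

# Assumed line shape, taken from the description above:
#   "[wav_filename]" "[Orthographic Transcript]"
_LINE = re.compile(r'^\s*"(?P<wav>[^"]*)"\s+"(?P<text>.*)"\s*$')


def parse_transcript_line(line):
    """Return (wav_filename, transcript), or None if the line does not match."""
    match = _LINE.match(line)
    if match is None:
        return None
    return match.group("wav"), match.group("text")


def load_transcript(path, encoding="utf-8"):
    """Map each wav filename to its orthographic transcript.

    The UTF-8 default is an assumption; adjust it if the file uses
    another encoding. Lines that do not match the format are skipped.
    """
    entries = {}
    with open(path, encoding=encoding) as handle:
        for line in handle:
            parsed = parse_transcript_line(line)
            if parsed is not None:
                entries[parsed[0]] = parsed[1]
    return entries
```

Calling load_transcript("orthographic-transcript.txt") from the package folder should then give a dictionary keyed by the first quoted field. Whether that field includes the .wav extension depends on the file itself, so check a few entries before joining them to paths on disk.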

More documentation will be added in the future. For more details, please refer to Nawar Halabi's PhD thesis. More information about the corpus is also available on the Persian Speech Corpus Wikipedia page.

The project was funded by MicroLinkPC, Southampton, an assistive technology provider in the UK. MicroLinkPC is the entity responsible for granting more permissive licenses by agreement. Please get in touch with Nawar Halabi or MicroLinkPC for commercial licensing.

Persian Speech Corpus by Nawar Halabi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at www.persianspeechcorpus.com.
