
How Text-to-Speech Works

The text that you type is matched against acoustic data to generate phonemes. These are converted to speech waveforms that your PC speaks out.

Anil Chopra

Friday, April 12, 2002

Text-to-speech (TTS) is an area of major development work. The technology is meant to read out any kind of text, and it does so using one of several techniques. A rule-based technique models the text-to-speech conversion with a set of rules derived from phonetic theories and acoustic analysis of the text data. This technique is, however, highly system dependent: it relies on the system architecture designed for it, so others can’t easily replicate it. The other procedure, known as the corpus-based approach, can be replicated easily because of its structure. It is built on fixed data sets containing acoustic-phonetic labels and syntactic bracketing, which form the foundation for the system.
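The difference between the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the letter-to-sound rules, the phoneme symbols and the two-word labelled data set are all invented for the example, and the function names are hypothetical rather than taken from any real TTS system.

```python
# Toy illustration only: rules, phoneme symbols and data set are invented.

# Rule-based: ordered letter-to-sound rules, longer patterns listed first.
LETTER_RULES = [
    ("sh", "SH"),
    ("ch", "CH"),
    ("ee", "IY"),
    ("a", "AE"),
    ("e", "EH"),
    ("i", "IH"),
    ("o", "AA"),
    ("u", "AH"),
]


def rule_based_phonemes(word):
    """Scan the word left to right, applying the first rule that matches."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        for pattern, phoneme in LETTER_RULES:
            if word.startswith(pattern, i):
                phonemes.append(phoneme)
                i += len(pattern)
                break
        else:
            # No rule matched: fall back to the upper-cased letter itself.
            phonemes.append(word[i].upper())
            i += 1
    return phonemes


# Corpus-based: look the word up in a fixed, labelled data set.
LABELLED_CORPUS = {
    "sheep": ["SH", "IY", "P"],
    "chip": ["CH", "IH", "P"],
}


def corpus_based_phonemes(word):
    """Return the stored labels, or None if the word is not in the data set."""
    return LABELLED_CORPUS.get(word.lower())
```

The rule-based function produces an answer for any word, but its quality depends entirely on the hand-written rules. The corpus-based function is only as good as its data set, yet anyone with the same data set gets the same result.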

The basic challenge of speech synthesis from text is to produce natural, pleasant sound with correct pronunciation. The input to the speech-synthesis engine is therefore a string of phonemes, with the necessary accents and pauses inserted. Transformations are then applied to this string to obtain the acoustical transcript; these include models that generate the fundamental frequency and duration of each speech segment. The last step is synthesis of the speech waveform using the parameters generated in the earlier stage. Three types of speech synthesizers are used: articulatory, formant and concatenative synthesizers.
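The stages described above can be sketched roughly as follows. The prosody table, the sample rate and the function names are assumptions made for this example. The final stage simply renders each segment as a sine tone at its fundamental frequency, a stand-in for a real articulatory, formant or concatenative synthesizer.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

# Hypothetical prosody table:
# phoneme -> (fundamental frequency in Hz, duration in seconds).
# "_" stands for a pause.
PROSODY = {
    "HH": (110.0, 0.05),
    "AH": (120.0, 0.10),
    "L": (115.0, 0.08),
    "OW": (105.0, 0.15),
    "_": (0.0, 0.10),
}


def assign_prosody(phonemes):
    """Attach a fundamental frequency and a duration to each segment."""
    return [(p,) + PROSODY[p] for p in phonemes]


def synthesize(segments):
    """Render each segment as a sine tone at its F0 (silence for pauses)."""
    samples = []
    for _phoneme, f0, duration in segments:
        count = round(duration * SAMPLE_RATE)
        for k in range(count):
            if f0:
                samples.append(math.sin(2 * math.pi * f0 * k / SAMPLE_RATE))
            else:
                samples.append(0.0)
    return samples


# Phoneme string with a trailing pause, as the engine would receive it.
segments = assign_prosody(["HH", "AH", "L", "OW", "_"])
waveform = synthesize(segments)
```

The resulting list of samples will not sound like speech, but it shows how the per-segment frequency and duration parameters from the earlier stage drive the waveform generated in the last one.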

A great deal of research is being conducted on speech synthesis, and the challenge remains to make the converted speech sound as natural as possible. Plus, of course, it also has to suit the geographic location it’s catering to. For instance, speech synthesis with a French accent wouldn’t be suitable in India.

Hear it on your Mac

Sound on the Mac has three different layers. The first is the hardware API, on top of which sits the Sound Manager, and above that come the Speech Manager and the QuickTime API. The Sound Manager allows applications to interact with the audio hardware on your machine. The Speech Manager is the programming interface that deals with synthesized speech and allows applications to communicate with the actual speech synthesizer. This is how it works.

An application passes a string of text to the Speech Manager, which in turn sends it to the speech synthesizer (a piece of code that sits in your system resources). The speech synthesizer contains the dictionaries and punctuation rules needed to read the text. It also determines the type of voice used to read the text string aloud.

The speech synthesizer communicates with the Sound Manager and, through it, with the audio hardware further down the line. As for reading pitch and speed, the Speech Manager has the subroutines to control these.
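The layering can be modelled with a small Python sketch. This is a toy model of the flow described above, not the Mac’s actual programming interface: every class, method, dictionary entry and default value here is hypothetical, and the "audio" is just a list of recorded events.

```python
class SoundManager:
    """Stands in for the layer that talks to the audio hardware."""

    def __init__(self):
        self.played = []

    def play(self, event):
        # A real implementation would hand audio buffers to the hardware.
        self.played.append(event)


class SpeechSynthesizer:
    """Holds a pronunciation dictionary, punctuation rules and a voice."""

    def __init__(self, sound_manager, voice="default"):
        self.sound_manager = sound_manager
        self.voice = voice
        self.dictionary = {"dr.": "doctor"}   # invented entry
        self.pauses = {",": 0.2, ".": 0.5}    # invented pause lengths (seconds)

    def speak(self, text, rate, pitch):
        for token in text.lower().split():
            pause = 0.0
            # Dictionary entries win over punctuation rules ("dr." is a word).
            if token not in self.dictionary and token[-1] in self.pauses:
                pause = self.pauses[token[-1]]
                token = token[:-1]
            word = self.dictionary.get(token, token)
            if word:
                self.sound_manager.play((self.voice, word, rate, pitch))
            if pause:
                self.sound_manager.play(("pause", pause))


class SpeechManager:
    """The interface applications call; it also controls rate and pitch."""

    def __init__(self, synthesizer):
        self.synthesizer = synthesizer
        self.rate = 180    # words per minute (assumed default)
        self.pitch = 1.0   # relative pitch (assumed default)

    def set_rate(self, rate):
        self.rate = rate

    def set_pitch(self, pitch):
        self.pitch = pitch

    def speak_string(self, text):
        self.synthesizer.speak(text, self.rate, self.pitch)


# An application only ever talks to the Speech Manager.
sound = SoundManager()
manager = SpeechManager(SpeechSynthesizer(sound))
manager.set_rate(150)
manager.speak_string("Hello, Dr. Smith.")
```

After the call, `sound.played` holds the spoken words and pauses in order, each word tagged with the voice, rate and pitch in force at the time. The application never touches the synthesizer or the Sound Manager directly, which is the point of the layering.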
