| The voice from the PCM
interface (that is, the input device) goes first to the Echo Canceller, which
cuts out the feedback of your own voice in the earphone or headset. A VAD
(Voice Activity Detector) monitors the input device so that voice is sent
over the network only when there is activity at the mouthpiece. This is
comparable to the VOX (voice-operated switch) feature found in many handheld
tape recorders.
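To make the VAD idea concrete, here is a minimal sketch of an energy-based detector. The article does not say how the VAD decides, so the energy threshold, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate) and every function name below are illustrative assumptions, not a description of any particular product.

```python
import math

FULL_SCALE = 32768.0  # assumed 16-bit signed PCM samples


def frame_energy_db(frame):
    """Mean power of one frame in dB relative to full scale (-inf for pure silence)."""
    if not frame:
        return -math.inf
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return -math.inf
    return 10 * math.log10(mean_square / (FULL_SCALE ** 2))


def is_voice(frame, threshold_db=-40.0):
    """Hypothetical decision rule: a frame is 'voice' if it is louder than the threshold."""
    return frame_energy_db(frame) > threshold_db


def frames_to_send(samples, frame_len=160, threshold_db=-40.0):
    """Yield (frame_index, frame) only for frames the detector marks as active."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if is_voice(frame, threshold_db):
            yield start // frame_len, frame


# Usage: one silent frame, one frame of a 440 Hz tone, one silent frame.
silence = [0] * 160
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(160)]
active = [index for index, _ in frames_to_send(silence + tone + silence)]
```

Only the middle frame would be handed on to the encoder here. Real detectors usually add smoothing and a short hold time so that the ends of words are not clipped.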
A Tone/DTMF Receiver at the user's end checks whether the 'connection'
is alive; the generator for these signals sits at the receiver's end. If the
connection is alive, the voice signal from the PCM interface is encoded by
the Voice Encoder and passed to the host interface, which is the gateway
to a carrier such as TCP/IP. The Voice Decoder decodes this signal at
the receiver's end. Now comes the interesting part. When we talk, we are
silent about 60 percent of the time. These are the breath pauses we take
between words. If we encoded both voice and silence, we would waste an
enormous amount of bandwidth. This is where the Comfort Noise Encoder
comes in. The device at the user's end detects these pauses and 'tells'
the Comfort Noise Generator at the receiver's end to generate 'noise'.
This gives the feeling of the conversation being 'live'.
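The exchange between the Comfort Noise Encoder and the Comfort Noise Generator can be sketched roughly as follows. This is only an illustration under stated assumptions: the packet labels "VOICE" and "SID", the RMS threshold and the function names are hypothetical, and real codecs describe the background noise in more detail than a single level.

```python
import math
import random


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def encode_stream(frames, threshold=500.0):
    """Sender side: send voice frames, but only one small marker per pause."""
    packets = []
    in_pause = False
    for frame in frames:
        level = rms(frame)
        if level >= threshold:
            packets.append(("VOICE", list(frame)))
            in_pause = False
        elif not in_pause:
            # First silent frame of a pause: tell the far end the noise level.
            packets.append(("SID", level))
            in_pause = True
        # Further silent frames in the same pause produce no packet at all.
    return packets


def comfort_noise_frame(level, frame_len=160, rng=None):
    """Receiver side: synthesize one frame of noise at roughly the given RMS level."""
    rng = rng or random.Random()
    peak = level * math.sqrt(3)  # uniform noise in [-peak, peak] has RMS peak / sqrt(3)
    return [int(round(rng.uniform(-peak, peak))) for _ in range(frame_len)]


# Usage: voice, a three-frame pause, then voice again.
voice = [1000] * 160
quiet = [10] * 160
packets = encode_stream([voice, quiet, quiet, quiet, voice])
kinds = [kind for kind, _ in packets]
```

Five input frames become three packets, and the receiver fills the pause by calling `comfort_noise_frame` with the level carried in the marker instead of playing dead silence.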
If you use the same setup for fax as well, a Classifier comes into
the picture; it separates voice and fax signals at the sender's end for
further processing. An optional modem/fax demodulator and remodulator is
also added to the setup to process fax data.
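The article does not say how the Classifier tells fax from voice. One plausible approach, shown here purely as an assumption, is to listen for the standard fax signalling tones (the 1100 Hz calling tone and the 2100 Hz answer tone) using the Goertzel algorithm. The function names, the 8 kHz sampling rate and the 0.5 decision ratio are illustrative choices.

```python
import math


def goertzel_power(samples, target_hz, sample_rate=8000):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel recurrence)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def classify(samples, sample_rate=8000, ratio=0.5):
    """Return 'fax' if a fax tone dominates the block, otherwise 'voice' (the default)."""
    total = sum(x * x for x in samples)
    if total == 0:
        return "voice"  # nothing to detect, so fall through to the voice path
    n = len(samples)
    for tone_hz in (1100, 2100):
        # For a pure tone sitting on a bin, this share is close to 1.0.
        share = goertzel_power(samples, tone_hz, sample_rate) / (total * n / 2)
        if share > ratio:
            return "fax"
    return "voice"


# Usage: 100 ms blocks at 8 kHz.
def sine(freq_hz, n=800, rate=8000, amp=8000):
    return [int(amp * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n)]

fax_result = classify(sine(1100))
voice_result = classify(sine(300))
```

A block dominated by 1100 Hz is routed to the fax path, while other input stays on the voice path. A production classifier would need to check tone duration and cadence as well.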
Ashish Sharma |