Sunday, June 17, 2012

SOURCE CODING OF SPEECH FOR WIRELESS COMMUNICATION





INTRODUCTION
Speech-based wireless communication systems can be analogue or
digital. Digital systems are now far more common, as they have many
advantages over analogue systems.
Speech coding generally refers to converting the analogue speech
signal into a digital signal (analogue-to-digital conversion, ADC).
This is carried out at the transmitter. The corresponding decoding
(digital-to-analogue conversion, DAC) is carried out at the receiver.
The combined coding/decoding system is often referred to as a codec.
Speech coding also refers to ‘coding’ the bits of a digital speech
signal. Again, this is carried out at the transmitter, while the
equivalent decoding is carried out at the receiver.
The main goal of this form of speech coding is to reduce the number
of bits required to represent the speech signal, that is, speech
compression. This leads to a reduction in the required transmitted
bit rate Rb, and consequently to a smaller required channel
bandwidth BC (approximately equal to Rb).
         Generally there is a trade-off between a lower bit rate and coding
accuracy (how accurately the coded signal represents the original
speech signal). Other factors that need to be considered are the
computational complexity of the speech coder and the processing
delay.

SPEECH CODER TYPES

Speech coders fall into two main categories:
          (i) waveform coders
          (ii) source coders.






Waveform coders:
These coders aim to represent the time-domain waveform of the speech
signal; that is, the aim is for the reconstructed speech waveform to
be very similar to the original speech waveform.
These coders are relatively insensitive to the characteristics of the
particular speech signal and to noisy environments. They also have a
relatively low computational complexity, but they provide only a
relatively small reduction in the bit rate.

Source coders:
These coders aim to have the reconstructed speech signal ‘sound’ very
similar to the original speech signal. They represent the speech
signal using dynamic models, which are built on assumptions about
speech signals. For this reason they are relatively sensitive to the
characteristics of the particular speech signal. They also have a
relatively high computational complexity, but they do provide a
relatively large reduction in the bit rate.

We focus here on waveform coders.

Time Domain Waveform Coders
We begin by considering the most basic coder, the pulse code
modulation (PCM) coder, which is essentially an analogue-to-digital
conversion (ADC) device.

Sampling
The first step is to sample the analogue speech signal s(t) such that
the sampled signal accurately represents the analogue signal or,
equivalently, such that the analogue signal can be reconstructed from
the sampled signal.

          
The Nyquist theorem states that a bandlimited analogue signal can be
reconstructed from its uniformly sampled equivalent if the sampling
interval T0 <= 1/(2B), where B = bandwidth of the analogue signal;
equivalently, if the sampling rate fs = 1/T0 >= 2B.
Furthermore, reconstruction of the analogue signal is achieved simply
by passing the sampled signal through an (ideal) LPF with cut-off
frequency fs/2.
If the Nyquist sampling criterion is not satisfied, the reconstructed
signal will show distortion, known as aliasing distortion.
The typical speech signal has negligible signal power outside the
frequency range 300 Hz < f < 3.4 kHz. Based on this, PCM coders
commonly use a sampling frequency of fs = 8 kHz.

Note: In practice, the speech signal is first passed through an LPF
(an anti-aliasing filter) to ensure that signal power above 4 kHz is
negligible.
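
To make the aliasing point concrete, here is a minimal Python sketch. The helper name alias_frequency and the test tones are our own choices for illustration, not part of any standard. It computes where a real tone appears after sampling at fs = 8 kHz, and checks numerically that a 5 kHz tone produces exactly the same samples as a 3 kHz tone.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (Hz) of a real tone at f_hz after sampling at fs_hz."""
    f_folded = f_hz % fs_hz                   # f and f + k*fs give identical samples
    return min(f_folded, fs_hz - f_folded)    # real tones fold about fs/2

fs = 8000  # PCM sampling rate from the text
for f in (1000, 3400, 5000, 7000):            # hypothetical test tones
    print(f, "Hz is seen as", alias_frequency(f, fs), "Hz")

# A 5 kHz cosine and a 3 kHz cosine are indistinguishable once sampled at 8 kHz.
same_samples = all(
    abs(math.cos(2 * math.pi * 5000 * k / fs)
        - math.cos(2 * math.pi * 3000 * k / fs)) < 1e-9
    for k in range(16)
)
print("5 kHz and 3 kHz give the same samples:", same_samples)
```

Tones below fs/2 = 4 kHz are left where they are; anything above folds back into the speech band, which is why the LPF is applied before sampling.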

Quantisation
The second step is to quantise, or discretise, each sample value. The
discrete-valued quantised samples then have values which belong to a
finite alphabet (as opposed to the infinite alphabet of the
continuous-valued non-quantised samples). This is required if we
ultimately want to represent each sample with a finite number of
bits.
Unlike the sampling operation (when the Nyquist criterion is
satisfied), quantisation causes signal distortion. Often, however,
this distortion is referred to as quantisation noise.
If uniform quantisation is used (the quantisation levels are
uniformly spaced) then, ignoring all other forms of distortion and
noise, the signal-to-quantisation-noise ratio (SQNR) is

SQNR = b M^2

where M = number of quantisation levels, and
b = 3, if considering peak signal power to average quantisation noise
power
b = 1, if considering average signal power to average quantisation
noise power.

Bit encoding
The third step is conversion of each quantised sample value into a
sequence of bits. Assuming we use the same number n of bits for each
of the M quantisation levels, we require
n = log2(M), that is, M = 2^n.
It follows that
SQNR(dB) PEAK = 6.02n + 4.77
SQNR(dB) AVE = 6.02n
so each additional bit used in bit encoding leads to a 6 dB
improvement in SQNR. However, this comes at the cost of a higher bit
rate.
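
The 6 dB per bit rule can be checked with a short Python sketch. The function names below are ours, and the measurement assumes samples uniformly distributed over the full quantiser range [-1, 1], which is the case in which b = 1 applies; real speech samples would give a different figure.

```python
import math
import random

def sqnr_db(n_bits, peak=False):
    """Theoretical SQNR in dB for an n-bit uniform quantiser: 10*log10(b * M^2)."""
    b = 3.0 if peak else 1.0
    M = 2 ** n_bits
    return 10.0 * math.log10(b * M * M)

def uniform_quantise(x, n_bits):
    """Uniform quantiser on [-1, 1] with M = 2^n levels at the cell centres."""
    M = 2 ** n_bits
    step = 2.0 / M
    idx = min(int((x + 1.0) / step), M - 1)   # cell index 0..M-1
    return -1.0 + (idx + 0.5) * step          # centre of that cell

def measured_sqnr_db(n_bits, n_samples=100_000, seed=1):
    """Average-signal-power SQNR measured on uniformly distributed samples."""
    rng = random.Random(seed)
    sig = noise = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        e = x - uniform_quantise(x, n_bits)   # quantisation error
        sig += x * x
        noise += e * e
    return 10.0 * math.log10(sig / noise)

for n in (4, 8, 12):
    print(n, "bits:",
          "average", round(sqnr_db(n), 2), "dB,",
          "peak", round(sqnr_db(n, peak=True), 2), "dB,",
          "measured", round(measured_sqnr_db(n), 2), "dB")
```

The measured values should sit close to the theoretical average-power figures, and each extra bit should add roughly 6 dB.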

Non-uniform quantisation

The mean (average) quantisation noise power is given by

NQ = ∫_{-∞}^{+∞} [x - fQ(x)]^2 p(x) dx

where x = non-quantised sample value, fQ(x) = quantised sample value,
and p(x) = pdf of x.

It follows that NQ is reduced if we use finer quantisation level
spacing at values of x for which p(x) is larger. This is the basis of
non-uniform quantisation.

Note: Uniform quantisation is optimal if p(x) is flat, that is, if x
is uniformly distributed.
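
As a worked check of this integral, the derivation below evaluates it for the flat case. The assumptions are ours and not stated in the text above: x is uniformly distributed on [-V, V], and the M uniform quantisation levels sit at the centres of cells of width q = 2V/M. Under those assumptions the result reproduces the two values b = 1 and b = 3 quoted for SQNR = b M^2.

```latex
\begin{align*}
N_Q &= \int_{-\infty}^{+\infty} [x - f_Q(x)]^2 \, p(x) \, dx
     = M \cdot \frac{1}{2V} \int_{-q/2}^{q/2} e^2 \, de
     = \frac{M}{2V} \cdot \frac{q^3}{12}
     = \frac{q^2}{12}
     = \frac{V^2}{3M^2}, \\[4pt]
\frac{\text{average signal power}}{N_Q}
    &= \frac{V^2/3}{V^2/(3M^2)} = M^2,
\qquad
\frac{\text{peak signal power}}{N_Q}
     = \frac{V^2}{V^2/(3M^2)} = 3M^2 .
\end{align*}
```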





The (non-quantised) samples of a sampled speech signal have a greater
probability of having a ‘small’ amplitude than a ‘large’ one. With
the aim of reducing the quantisation noise power for a given number M
of quantisation levels (or, alternatively, of reducing M for a given
quantisation noise power), speech coding systems typically use a
logarithmically based non-uniform quantiser, with more finely spaced
quantisation levels at lower amplitude values.
In practice, non-uniform quantisation is achieved by (i) passing the
analogue signal through a compressor, which non-uniformly compresses
the dynamic range of the signal, or flattens out the signal pdf p(x);
then (ii) passing the resulting analogue signal through a uniform
quantising coder.
The action of the compressor is reversed at the receiver by an
expander.
Two popular compressor systems are μ-law (used in the USA, Canada and
Japan) and A-law (used in Europe and most other countries):
μ-law: |vOUT| = ln(1 + μ|vIN|) / ln(1 + μ), where |vIN| <= 1
A-law: |vOUT| = A|vIN| / (1 + ln A), 0 <= |vIN| <= 1/A
       |vOUT| = [1 + ln(A|vIN|)] / (1 + ln A), 1/A <= |vIN| <= 1.
Note: These formulae assume vIN has been normalised to |vIN| <= 1.
The most common values are μ = 255 and A = 87.6.
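
The two compressor curves, together with the matching expanders obtained by inverting the formulae, can be sketched in Python as follows. The function names are ours, and this works on continuous normalised values only. It is not the segmented 8-bit encoding used in practical telephony codecs.

```python
import math

MU = 255.0   # common mu-law parameter
A = 87.6     # common A-law parameter

def mu_compress(v):
    """mu-law compressor for a normalised input, |v| <= 1."""
    sign = math.copysign(1.0, v)
    return sign * math.log(1.0 + MU * abs(v)) / math.log(1.0 + MU)

def mu_expand(y):
    """Inverse of mu_compress (the expander at the receiver)."""
    sign = math.copysign(1.0, y)
    return sign * ((1.0 + MU) ** abs(y) - 1.0) / MU

def a_compress(v):
    """A-law compressor: linear for |v| <= 1/A, logarithmic above."""
    sign = math.copysign(1.0, v)
    x = abs(v)
    if x <= 1.0 / A:
        y = A * x / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))
    return sign * y

def a_expand(y):
    """Inverse of a_compress."""
    sign = math.copysign(1.0, y)
    x = abs(y)
    if x <= 1.0 / (1.0 + math.log(A)):
        v = x * (1.0 + math.log(A)) / A
    else:
        v = math.exp(x * (1.0 + math.log(A)) - 1.0) / A
    return sign * v

for v in (0.001, 0.01, 0.1, 1.0):
    print(v, round(mu_compress(v), 4), round(a_compress(v), 4))
```

Small input amplitudes are mapped to much larger output values, so a uniform quantiser applied after the compressor spends more of its levels on the low amplitudes that speech produces most often.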

The above approach to converting an analogue signal into a bit stream
is known as Pulse Code Modulation (PCM).
Example: A speech signal is converted into a PCM signal using a
sampling rate of fs = 8 kHz and M = 256 quantisation levels. It
follows that the number of bits per sample is n = log2(256) = 8, and
that the bit rate is Rb = n × fs = 64 kbps. This is the bit rate
commonly used for standard PCM speech signals.
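
The arithmetic of this example is easy to reproduce. The helper below (the name pcm_bit_rate is ours) returns the bits per sample and the resulting bit rate for a given sampling rate and number of levels.

```python
def pcm_bit_rate(fs_hz, levels):
    """Return (bits per sample, bit rate in bit/s) for fixed-length PCM."""
    bits_per_sample = (levels - 1).bit_length()   # smallest n with 2^n >= levels
    return bits_per_sample, fs_hz * bits_per_sample

print(pcm_bit_rate(8000, 256))   # (8, 64000), i.e. 64 kbps
```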




Reducing the bit rate of PCM based signals

Let Δs(k) = s(k) - s(k-1) be the kth sample difference, where
s(k) = current sample value. It follows that

s(k) = s(0) + Σ_{i=1}^{k} Δs(i)

Thus any sample of a sampled signal can be reconstructed from the
initial sample s(0) and the sum of the past sample differences.
Adjacent samples of a speech waveform are often highly correlated. As
a result, on average, Δs(k) has a significantly smaller dynamic range
than s(k).
Based on the above, a reduction in the transmitted bit rate can be
achieved by transmitting quantised Δs(k) sample values, since fewer
quantisation levels are required for Δs(k) than for s(k).
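
A small Python sketch illustrates both points: the samples can be rebuilt from s(0) and the running sum of differences, and the differences span a much smaller range than the samples. The test signal is an assumption of ours (a single 200 Hz tone sampled at 8 kHz), used only as a crude stand-in for a strongly correlated waveform; real speech would give different numbers.

```python
import math

fs = 8000   # sampling rate in Hz, as in the PCM example
f0 = 200    # hypothetical tone frequency in Hz
s = [math.sin(2 * math.pi * f0 * k / fs) for k in range(400)]

# Sample differences: diffs[k-1] holds s(k) - s(k-1) for k = 1, 2, ...
diffs = [s[k] - s[k - 1] for k in range(1, len(s))]

# Rebuild every sample from s(0) plus the running sum of differences.
rebuilt = [s[0]]
for d in diffs:
    rebuilt.append(rebuilt[-1] + d)

range_s = max(s) - min(s)
range_d = max(diffs) - min(diffs)
print("range of s(k):          ", round(range_s, 3))
print("range of differences:   ", round(range_d, 3))
print("max rebuild error:      ", max(abs(a - b) for a, b in zip(s, rebuilt)))
```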
A further improvement (smaller bit rate) can be obtained if we use a
predetermined model of the speech signal to predict the current value
s(k) from the previous values s(k-1), s(k-2), …:

ŝ(k) = a1 s(k-1) + a2 s(k-2) + … + ap s(k-p),

where {ai} = predetermined set of constants (typically determined via
linear prediction techniques).
We then transmit a quantised version of the difference between the
actual value and the predicted value:

Δŝ(k) = s(k) - ŝ(k).

This is known as differential pulse code modulation (DPCM). As long
as the set of coefficients {ai} is available at the receiver, a
quantised version of s(k) can be reconstructed at the receiver.
Adaptive DPCM (ADPCM) adaptively updates the coefficient set {ai} to
ensure the model suitably describes the current speech signal.
(Adaptive) delta modulation is a one-bit version of (adaptive) DPCM.
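
The DPCM idea can be sketched in a few lines of Python. This is an illustration under our own assumptions, not a standard codec: the function names, the single coefficient a1 = 1 (predict the previous sample), the step size, the 4-bit code and the 200 Hz test tone are all hypothetical choices. The encoder predicts from previously reconstructed samples, so that the encoder and decoder stay in step.

```python
import math

def quantise(e, step, n_bits):
    """Uniform quantiser: integer code for e, clipped to the n-bit range."""
    max_code = 2 ** (n_bits - 1) - 1
    code = int(round(e / step))
    return max(-max_code - 1, min(max_code, code))

def predict(history, coeffs):
    """Prediction a1*s(k-1) + a2*s(k-2) + ...; history is ordered oldest first."""
    return sum(a * h for a, h in zip(coeffs, reversed(history)))

def dpcm_encode(samples, coeffs, step, n_bits):
    history = [0.0] * len(coeffs)          # past reconstructed samples
    codes = []
    for s in samples:
        s_hat = predict(history, coeffs)
        code = quantise(s - s_hat, step, n_bits)   # quantised prediction error
        codes.append(code)
        history = history[1:] + [s_hat + code * step]
    return codes

def dpcm_decode(codes, coeffs, step):
    history = [0.0] * len(coeffs)
    out = []
    for code in codes:
        s_rec = predict(history, coeffs) + code * step
        out.append(s_rec)
        history = history[1:] + [s_rec]
    return out

fs, f0 = 8000, 200
tone = [0.8 * math.sin(2 * math.pi * f0 * k / fs) for k in range(200)]
coeffs = [1.0]            # hypothetical predictor: s_hat(k) = s(k-1)
step, n_bits = 0.02, 4    # hypothetical quantiser settings

codes = dpcm_encode(tone, coeffs, step, n_bits)
decoded = dpcm_decode(codes, coeffs, step)
max_err = max(abs(a - b) for a, b in zip(tone, decoded))
print("bits per sample:", n_bits, " max reconstruction error:", round(max_err, 4))
```

As long as the prediction error stays inside the quantiser range, the reconstruction error is bounded by half a quantisation step, here using 4 bits per sample rather than the 8 bits of standard PCM.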




Frequency Domain Waveform Coding
The two most common types are
          (i) subband coding (SBC) and
          (ii) block transform coding.

Subband coding
This involves passing the speech signal through a parallel set of L
bandpass filters (essentially non-overlapping in frequency). The
result is a set of L subband speech signals. Each of these signals is
then passed through a PCM device, each with its own number of
quantisation levels. The lower the power in a particular subband, the
smaller the number of quantisation levels used for it.
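
The text does not say how the number of levels per subband is chosen. As one possible illustration, the Python sketch below uses a common textbook rule (an assumption here, not taken from the text): each subband receives the average number of bits plus half the base-2 logarithm of the ratio of its power to the geometric mean power. The function name and the example powers are hypothetical, and the filter bank itself is not modelled.

```python
import math

def allocate_bits(subband_powers, avg_bits):
    """Bits per subband: avg_bits + 0.5*log2(power / geometric-mean power)."""
    L = len(subband_powers)
    log_gm = sum(math.log2(p) for p in subband_powers) / L   # log2 of geometric mean
    bits = []
    for p in subband_powers:
        n = avg_bits + 0.5 * (math.log2(p) - log_gm)
        bits.append(max(0, round(n)))     # no negative bit counts
    return bits

powers = [64.0, 16.0, 1.0, 0.25]          # hypothetical subband powers
bits = allocate_bits(powers, avg_bits=4)
print("bits per subband:  ", bits)
print("levels per subband:", [2 ** n for n in bits])
```

Subbands with lower power end up with fewer bits, and hence fewer quantisation levels, while the average number of bits per subband is kept roughly constant.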

Block transform coding
This involves passing time-windowed segments of the sampled speech
signal through a transform device. The output coefficients are then
each separately quantised and bit encoded. The smaller coefficients
are coarsely quantised (essentially ignored), leading to a reduction
in the total number of bits required to represent the particular
windowed segment.
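
A minimal Python sketch of block transform coding follows. The text does not name a particular transform, so the discrete cosine transform (DCT) is used here as an assumption. The block length, step size, threshold and test tone are likewise hypothetical, and the DCT is written out directly rather than with a fast algorithm.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a block of samples."""
    N = len(x)
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(c * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N)))
    return out

def idct(X):
    """Inverse of dct()."""
    N = len(X)
    out = []
    for n in range(N):
        total = 0.0
        for k in range(N):
            c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += c * X[k] * math.cos(math.pi * (n + 0.5) * k / N)
        out.append(total)
    return out

def code_block(x, step, threshold):
    """Quantise each coefficient; coefficients below threshold are dropped."""
    return [0 if abs(c) < threshold else int(round(c / step)) for c in dct(x)]

def decode_block(codes, step):
    """Rebuild the block from the quantised coefficient codes."""
    return idct([q * step for q in codes])

N = 32
block = [math.sin(2 * math.pi * n / N) for n in range(N)]   # 250 Hz at fs = 8 kHz
step, threshold = 0.05, 0.1

codes = code_block(block, step, threshold)
rebuilt = decode_block(codes, step)
rms_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(block, rebuilt)) / N)
print("coefficients kept:", sum(1 for q in codes if q != 0), "of", N)
print("rms reconstruction error:", round(rms_err, 4))
```

Only the coefficients that survive the threshold need to be bit encoded, which is where the saving in bits for the windowed segment comes from.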
