SOURCE CODING
OF SPEECH FOR WIRELESS
COMMUNICATION
INTRODUCTION
Speech-based wireless communication systems can be analogue or digital. The digital type is now by far the more common, as digital communication systems have many advantages over analogue systems.
Speech coding generally refers to coding the analogue speech signal into a digital signal (analogue-to-digital conversion, ADC). This is conducted at the transmitter. The decoding (digital-to-analogue conversion, DAC) is conducted at the receiver. The coding/decoding system is often referred to as a codec.
Speech coding also refers to 'coding' the bits of a digital speech signal … again, this is conducted at the transmitter, while the equivalent decoding is carried out at the receiver.
The main goal of this form of speech coding is to reduce the number of bits required to represent the speech signal … speech compression. This leads to a reduction in the required transmitted bit rate Rb, and consequently to a smaller required channel bandwidth BC (approximately equal to Rb).
Generally there is a trade-off between a lower
bit rate and coding
accuracy (how accurately the
coded signal represents the original
speech signal). Other factors
that need to be considered are the
computational complexity of the
speech coder and the processing
delay.
SPEECH CODER TYPES
Speech coders fall into two main categories:
(i) waveform coders
(ii) source coders.
Waveform coders:
These coders aim at representing the time domain waveform of the speech signal … that is, the aim is for the reconstructed speech signal time waveform to be very similar to the original speech signal time waveform.
These coders are relatively insensitive
to the characteristics of the
particular speech signal and to
noisy environments. Also, they
have a relatively low computational
complexity, but they provide
only a relatively small reduction
in the bit rate.
Source coders:
These coders aim to have the
reconstructed speech
signal ‘sound’ very similar to
the original speech signal. These
coders are based on representing
the speech signal using dynamic
models. These dynamic models are based on assumptions about the speech signal. For this reason
they are relatively sensitive to the
characteristics of the particular
speech signal. Also, they have a
relatively high computational
complexity, but they do provide a
relatively large reduction
in the bit rate.
We focus on waveform coders.
Time Domain Waveform Coders
We begin by considering the most basic coder … the pulse code modulation (PCM) coder, which is essentially an analogue-to-digital conversion (ADC) device.

Sampling
The first step is to sample the analogue speech signal s(t), such that the sampled signal accurately represents the analogue signal … or alternatively, such that the analogue signal can be reconstructed from the sampled signal.
The Nyquist theorem states that a bandlimited analogue signal can be reconstructed from its uniformly sampled equivalent if the sampling interval T0 <= 1/(2B), where B = bandwidth of the analogue signal … or alternatively, if the sampling rate fs = 1/T0 >= 2B. Furthermore, reconstruction of the analogue signal is achieved simply by passing the sampled signal through an (ideal) LPF with bandwidth B = fs/2.
If the Nyquist sampling criterion is
not satisfied, then the
reconstructed signal will show
distortion … known as aliasing
distortion.
The typical speech signal has negligible signal power outside of the frequency range 300 Hz < f < 3.4 kHz. Based on this, PCM coders commonly use a sampling frequency of fs = 8 kHz.
Note: In practice, the speech signal is first passed through an LPF to ensure that signal power above 4 kHz is negligible.
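
To make the sampling criterion concrete, here is a minimal Python sketch (illustrative only … the tone frequencies and duration are assumed) showing that a tone above fs/2 = 4 kHz is indistinguishable, once sampled at fs = 8 kHz, from its in-band alias at fs - f:

    import numpy as np

    fs = 8000.0                  # standard PCM speech sampling rate (Hz)
    k = np.arange(800)           # 100 ms worth of sample indices
    t = k / fs                   # uniform sampling instants, T0 = 1/fs

    f1 = 5000.0                  # tone above fs/2 ... violates Nyquist
    f2 = fs - f1                 # its alias at 3000 Hz, inside the band

    x1 = np.cos(2 * np.pi * f1 * t)
    x2 = np.cos(2 * np.pi * f2 * t)
    print(np.allclose(x1, x2))   # True: the sampled tones are identical

This is why the anti-aliasing LPF mentioned in the note above is applied before sampling.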
Quantisation
The second step is to quantise or
discretise each sample value. The
discrete valued quantised samples
then have values which belong
to a finite alphabet (as
opposed to the infinite alphabet of the
continuously valued non-quantised
samples). This is required if we
ultimately want to represent the
sample with a finite number of
bits.
Unlike the sampling operation, quantisation causes signal distortion. Often, however, this distortion is referred to as quantisation noise.
If uniform quantisation is used (the quantisation levels are uniformly spaced) then, ignoring all other forms of distortion and noise, the signal-to-quantisation-noise ratio (SQNR) is

SQNR = b·M^2

where M = number of quantisation levels, and
b = 3, if considering peak signal power to average quantisation noise power;
b = 1, if considering average signal power to average quantisation noise power.
Bit encoding
The third step is conversion of each quantised sample value into a sequence of bits. Assuming we use the same number n of bits for each of the M quantisation levels, then we require n = log2(M) … that is, M = 2^n.
It follows that

SQNR_PEAK (dB) = 6.02n + 4.77
SQNR_AVE (dB) = 6.02n

… each additional bit used in bit encoding leads to a 6 dB improvement in SQNR. However, this comes at the cost of a higher bit rate.
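
The 6 dB-per-bit rule is easy to verify numerically. The sketch below (a minimal illustration … it assumes a mid-rise uniform quantiser and a uniformly distributed test signal, so that both the b = 1 and b = 3 formulas should hold) measures SQNR for a few values of n:

    import numpy as np

    def uniform_quantise(x, n_bits, x_max=1.0):
        # Mid-rise uniform quantiser: M = 2^n levels spanning [-x_max, +x_max]
        M = 2 ** n_bits
        step = 2.0 * x_max / M
        idx = np.clip(np.floor(x / step), -M // 2, M // 2 - 1)
        return (idx + 0.5) * step

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 100000)       # uniformly distributed test signal

    for n in (6, 8, 10):
        xq = uniform_quantise(x, n)
        nq = np.mean((x - xq) ** 2)          # average quantisation noise power
        sqnr_ave = 10 * np.log10(np.mean(x ** 2) / nq)
        sqnr_peak = 10 * np.log10(1.0 / nq)  # peak signal power = x_max^2 = 1
        print(n, round(sqnr_ave, 1), round(sqnr_peak, 1))
        # expect roughly 6.02n (average) and 6.02n + 4.77 (peak)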
Non-uniform quantisation
The mean/average quantisation noise power is given by

NQ = ∫_{-∞}^{+∞} [x - fQ(x)]^2 p(x) dx

where x = non-quantised sample value, fQ(x) = quantised sample value, and p(x) = pdf of x.
It follows that NQ is reduced if we
use finer quantisation level
spacing at values of x for which
p(x) is larger. This is the basis of
non-uniform quantisation.
Note: Uniform quantisation is optimal if p(x) is flat, that is, if x is uniformly distributed.
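
As a numerical sanity check on this idea, the sketch below (hypothetical … a Laplacian pdf is assumed as a crude model of speech amplitudes) evaluates NQ for M = 16 levels, once with uniform level spacing and once with levels spaced more finely near x = 0:

    import numpy as np

    x = np.linspace(-1.0, 1.0, 20001)
    dx = x[1] - x[0]
    lam = 8.0                                  # assumed Laplacian scale
    p = 0.5 * lam * np.exp(-lam * np.abs(x))   # pdf concentrated near zero
    p = p / np.sum(p * dx)                     # renormalise on the finite range

    def nq(levels):
        # NQ = integral of [x - fQ(x)]^2 p(x) dx, with fQ(x) = nearest level
        fq = levels[np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)]
        return np.sum((x - fq) ** 2 * p * dx)

    M, mu = 16, 255.0
    u = np.linspace(-1.0, 1.0, M)              # uniformly spaced levels
    nonuni = np.sign(u) * ((1.0 + mu) ** np.abs(u) - 1.0) / mu  # finer near 0

    print("uniform     NQ =", nq(u))
    print("non-uniform NQ =", nq(nonuni))      # noticeably smaller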
The (non-quantised) samples of a sampled speech signal have a greater probability of having a 'small' amplitude than a 'large' one.
With the aim of reducing the
quantisation noise power (for a given
number M of quantisation levels …
or alternatively with the aim of
reducing M for a given
quantisation noise power) speech coding
systems typically use a
logarithmically based non-uniform
quantiser … more finely spaced
quantisation levels at lower
amplitude values.
In practice, non-uniform quantisation is
achieved via (i) passing
the analogue signal through a compressor
… which nonuniformly
compresses the dynamic range of
the signal or flattens out the
signal pdf p(x); then (ii)
passing the resulting analogue signal
through a uniform quantising
coder.
The action of the compressor is reversed at the receiver by an expandor.
Two popular compressor laws are μ-law (used in the USA, Canada and Japan) and A-law (used in Europe and Australia):
μ-law: |vOUT| = ln(1 + μ|vIN|) / ln(1 + μ), where |vIN| <= 1

A-law: |vOUT| = A|vIN| / (1 + ln A), for 0 <= |vIN| <= 1/A
       |vOUT| = [1 + ln(A|vIN|)] / (1 + ln A), for 1/A <= |vIN| <= 1

Note: These formulae assume vIN has been normalised to |vIN| <= 1. Most common are μ = 255 and A = 87.6.
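
A direct transcription of the two laws into Python (a sketch … vIN is assumed to be already normalised to |vIN| <= 1):

    import numpy as np

    def mu_law_compress(v, mu=255.0):
        return np.sign(v) * np.log1p(mu * np.abs(v)) / np.log1p(mu)

    def mu_law_expand(v, mu=255.0):
        # receiver-side expandor ... the exact inverse of the compressor
        return np.sign(v) * ((1.0 + mu) ** np.abs(v) - 1.0) / mu

    def a_law_compress(v, A=87.6):
        av = np.clip(np.abs(v), 1e-12, 1.0)   # avoid log(0) in the unused branch
        out = np.where(av <= 1.0 / A,
                       A * av / (1.0 + np.log(A)),
                       (1.0 + np.log(A * av)) / (1.0 + np.log(A)))
        return np.sign(v) * out

    v = np.linspace(-1.0, 1.0, 9)
    print(np.max(np.abs(mu_law_expand(mu_law_compress(v)) - v)))   # ~0

In practice, standards such as G.711 implement piecewise-linear (segmented) approximations of these smooth curves.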
The above approach to converting an analogue signal into a bit stream is known as Pulse Code Modulation (PCM).
Example: A speech signal is converted into a PCM signal using a sampling rate of fs = 8 kHz and M = 256 quantisation levels. It follows that the number of bits per sample is n = log2(256) = 8, and that the bit rate is Rb = n × fs = 8 × 8000 = 64 kbps. This is the bit rate commonly used for standard PCM speech signals.
Reducing the bit rate of PCM-based signals
Let Δs(k) = s(k) - s(k-1) be the kth sample difference, where s(k) = current sample value. It follows that

s(k) = s(0) + Σ_{i=1}^{k} Δs(i)

Thus any sample of a sampled signal can be reconstructed from the sum of the past sample differences.
Adjacent samples of a speech waveform are often highly correlated. That is, on average, Δs(k) has a significantly smaller dynamic range than s(k). Based on the above, a reduction in the transmitted bit rate can be achieved by transmitting quantised Δs(k) sample values … since fewer quantisation levels are required for Δs(k) than for s(k).
A further improvement (smaller bit
rate) can be obtained if we use
a pre-determined model of the
speech signal to predict the current
value s(k) from previous values
s(k-1), s(k-2),…
ŝ(k) = a1·s(k-1) + a2·s(k-2) + … + ap·s(k-p),

where {ai} = predetermined set of constants (typically determined via linear prediction techniques). Then we transmit a quantised version of the difference between the predicted value and the actual value:

Δŝ(k) = s(k) - ŝ(k).
This scheme is known as differential pulse code modulation (DPCM).
As long as the set of coefficients {ai} is available at the receiver, a quantised version of s(k) can be reconstructed at the receiver. Adaptive DPCM adaptively updates the coefficient set {ai} to ensure the model suitably describes the current speech signal. (Adaptive) delta modulation is a one-bit version of (adaptive) DPCM.
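
Below is a minimal DPCM sketch (illustrative only … the 2nd-order predictor coefficients and the uniform difference quantiser are assumed, not taken from any standard). The key detail is that the encoder predicts from reconstructed past samples, so the decoder's predictor stays exactly in step with the encoder's despite quantisation:

    import numpy as np

    def dpcm_encode(s, a, quantise):
        # a = [a1, ..., ap]; predict each sample from reconstructed past samples
        p = len(a)
        rec = np.zeros(len(s) + p)          # p leading zeros = initial state
        dq = np.zeros(len(s))               # quantised differences to transmit
        for k in range(len(s)):
            pred = np.dot(a, rec[k:k + p][::-1])   # a1*rec(k-1)+...+ap*rec(k-p)
            dq[k] = quantise(s[k] - pred)
            rec[k + p] = pred + dq[k]
        return dq

    def dpcm_decode(dq, a):
        # runs the identical predictor recursion on the received differences
        p = len(a)
        rec = np.zeros(len(dq) + p)
        for k in range(len(dq)):
            pred = np.dot(a, rec[k:k + p][::-1])
            rec[k + p] = pred + dq[k]
        return rec[p:]

    quant = lambda d: np.round(d * 8.0) / 8.0   # uniform quantiser, step 0.125

    fs = 8000
    t = np.arange(fs) / fs
    s = 0.5 * np.sin(2 * np.pi * 300 * t)       # stand-in for a speech signal
    a = np.array([1.9, -0.95])                  # assumed 2nd-order predictor
    s_hat = dpcm_decode(dpcm_encode(s, a, quant), a)
    print(np.max(np.abs(s - s_hat)))            # bounded by half a quantiser step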
Frequency Domain Waveform Coding
The two most common types are
(i) subband coding (SBC) and
(ii) block transform coding.
Subband coding
This involves passing the speech signal through a parallel set of L bandpass filters (essentially non-overlapping in frequency). The result is a set of L subband speech signals. Each of these signals is then passed through a PCM device, each with its own number of quantisation levels: the lower the power in a particular subband, the smaller the number of quantisation levels.
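
A crude subband-coding sketch follows (hypothetical … FFT-bin masking stands in for a proper bandpass filter bank, and the per-band bit allocation is simply assumed for this toy signal):

    import numpy as np

    def subband_split(s, L):
        # Split s into L contiguous, non-overlapping frequency bands via FFT masks
        S = np.fft.rfft(s)
        bins = np.arange(len(S))
        edges = np.linspace(0, len(S), L + 1, dtype=int)
        return [np.fft.irfft(np.where((bins >= lo) & (bins < hi), S, 0), n=len(s))
                for lo, hi in zip(edges[:-1], edges[1:])]

    def quantise(x, n_bits):
        # Uniform quantiser matched to the band's own dynamic range
        x_max = np.max(np.abs(x)) + 1e-12
        step = 2.0 * x_max / 2 ** n_bits
        return np.round(x / step) * step

    fs = 8000
    t = np.arange(fs) / fs
    s = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)

    bands = subband_split(s, L=4)
    bits = [8, 6, 4, 2]        # more bits where the band power is higher
    s_hat = sum(quantise(b, n) for b, n in zip(bands, bits))
    print(10 * np.log10(np.mean(s ** 2) / np.mean((s - s_hat) ** 2)))  # SQNR (dB)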
Block transform coding
This involves passing time-windowed segments of the sampled speech signal through a transform device. The output coefficients are then each separately quantised and bit encoded. The smaller coefficients are coarsely quantised (essentially ignored), leading to a reduction in the total number of bits required to represent the particular windowed segment.
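
A corresponding block-transform sketch (the DCT is one common transform choice; the block length and the number of retained coefficients below are assumed):

    import numpy as np
    from scipy.fft import dct, idct

    def transform_code(s, block=64, keep=16):
        # DCT each segment, coarsely quantise (here: zero out) the smallest
        # coefficients, and inverse-transform
        out = np.zeros_like(s)
        for i in range(0, len(s) - block + 1, block):
            c = dct(s[i:i + block], norm='ortho')
            small = np.argsort(np.abs(c))[:-keep]   # all but the 'keep' largest
            c[small] = 0.0
            out[i:i + block] = idct(c, norm='ortho')
        return out

    fs = 8000
    t = np.arange(fs) / fs
    s = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 1100 * t)
    s_hat = transform_code(s)
    print(np.max(np.abs(s - s_hat)))    # small: most signal energy is retained

Only the retained coefficient values (plus their positions) need to be bit encoded, which is where the bit-rate saving comes from.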