SOURCE CODING
OF SPEECH FOR WIRELESS
COMMUNICATION
INTRODUCTION
Speech-based wireless communication systems can be analogue or digital. The digital type is now by far the more common, as digital communication systems have many advantages over analogue systems.
Speech coding generally refers to coding the analogue speech signal into a digital signal (analogue-to-digital conversion, ADC). This is conducted at the transmitter. The decoding (digital-to-analogue conversion, DAC) is conducted at the receiver. The coding/decoding system is often referred to as a codec.
Speech coding also refers to 'coding' the bits of a digital speech signal … again, this is conducted at the transmitter, while the equivalent decoding is carried out at the receiver.
The main goal of this form of speech coding is to reduce the number of bits required to represent the speech signal … speech compression. This leads to a reduction in the required transmitted bit rate Rb, and consequently to a smaller required channel bandwidth BC (approximately equal to Rb).
Generally there is a trade-off between a lower
bit rate and coding
accuracy (how accurately the
coded signal represents the original
speech signal). Other factors
that need to be considered are the
computational complexity of the
speech coder and the processing
delay.
SPEECH CODER TYPES
Speech coders fall into two main categories:
(i) waveform coders
(ii) source coders.
Waveform coders:
These coders aim at representing the time domain waveform of the speech signal … that is, the aim is for the reconstructed speech signal time waveform to be very similar to the original speech signal time waveform.
These coders are relatively insensitive
to the characteristics of the
particular speech signal and to
noisy environments. Also, they
have a relatively low computational
complexity, but they provide
only a relatively small reduction
in the bit rate.
Source coders:
These coders aim to have the
reconstructed speech
signal ‘sound’ very similar to
the original speech signal. These
coders are based on representing
the speech signal using dynamic
models. These dynamic models are based on assumptions about the speech signal. For this reason
they are relatively sensitive to the
characteristics of the particular
speech signal. Also, they have a
relatively high computational
complexity, but they do provide a
relatively large reduction
in the bit rate.
We focus on waveform coders.
Time Domain Waveform Coders
We begin by considering the most basic coder … the pulse code modulation (PCM) coder, which is essentially an analogue-to-digital conversion (ADC) device.

Sampling
The first step is to sample the analogue speech signal s(t), such that the sampled signal accurately represents the analogue signal … or alternatively, such that the analogue signal can be reconstructed from the sampled signal.
The Nyquist theorem states that a bandlimited analogue signal can be reconstructed from its uniformly sampled equivalent if the sampling interval T0 <= 1/(2B), where B = bandwidth of the analogue signal … or alternatively, if the sampling rate fs = 1/T0 >= 2B. Furthermore, reconstruction of the analogue signal is achieved simply by passing the sampled signal through an (ideal) LPF with bandwidth B = fs/2.
If the Nyquist sampling criterion is
not satisfied, then the
reconstructed signal will show
distortion … known as aliasing
distortion.
The typical speech signal has negligible signal power outside of the frequency range 300 Hz < f < 3.4 kHz. Based on this, PCM coders commonly use a sampling frequency of fs = 8 kHz.
Note: In practice, the speech signal is first passed through an LPF to ensure that signal power above 4 kHz is negligible.
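
To make the sampling criterion concrete, here is a minimal Python sketch (illustrative only … the tone frequencies and duration are assumed) showing that a tone above fs/2 = 4 kHz is indistinguishable, once sampled at fs = 8 kHz, from its in-band alias at fs - f:

    import numpy as np

    fs = 8000.0                  # standard PCM speech sampling rate (Hz)
    k = np.arange(800)           # 100 ms worth of sample indices
    t = k / fs                   # uniform sampling instants, T0 = 1/fs

    f1 = 5000.0                  # tone above fs/2 ... violates Nyquist
    f2 = fs - f1                 # its alias at 3000 Hz, inside the band

    x1 = np.cos(2 * np.pi * f1 * t)
    x2 = np.cos(2 * np.pi * f2 * t)
    print(np.allclose(x1, x2))   # True: the sampled tones are identical

This is why the anti-aliasing LPF mentioned in the note above is applied before sampling.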
Quantisation
The second step is to quantise or
discretise each sample value. The
discrete valued quantised samples
then have values which belong
to a finite alphabet (as
opposed to the infinite alphabet of the
continuously valued non-quantised
samples). This is required if we
ultimately want to represent the
sample with a finite number of
bits.
Unlike the sampling operation, quantisation causes signal distortion. Often, however, this distortion is referred to as quantisation noise.
If uniform quantisation is used (the quantisation levels are uniformly spaced) then, ignoring all other forms of distortion and noise, the signal-to-quantisation-noise ratio (SQNR) is

SQNR = b·M^2

where M = number of quantisation levels, and
b = 3, if considering peak signal power to average quantisation noise power;
b = 1, if considering average signal power to average quantisation noise power.
Bit encoding
The third step is conversion of each quantised sample value into a sequence of bits. Assuming we use the same number n of bits for each of the M quantisation levels, then we require n = log2(M) … that is, M = 2^n.
It follows that

SQNR_PEAK (dB) = 6.02n + 4.77
SQNR_AVE (dB) = 6.02n

… each additional bit used in bit encoding leads to a 6 dB improvement in SQNR. However, this comes at the cost of a higher bit rate.
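
The 6 dB-per-bit rule is easy to verify numerically. The sketch below (a minimal illustration … it assumes a mid-rise uniform quantiser and a uniformly distributed test signal, so that both the b = 1 and b = 3 formulas should hold) measures SQNR for a few values of n:

    import numpy as np

    def uniform_quantise(x, n_bits, x_max=1.0):
        # Mid-rise uniform quantiser: M = 2^n levels spanning [-x_max, +x_max]
        M = 2 ** n_bits
        step = 2.0 * x_max / M
        idx = np.clip(np.floor(x / step), -M // 2, M // 2 - 1)
        return (idx + 0.5) * step

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 100000)       # uniformly distributed test signal

    for n in (6, 8, 10):
        xq = uniform_quantise(x, n)
        nq = np.mean((x - xq) ** 2)          # average quantisation noise power
        sqnr_ave = 10 * np.log10(np.mean(x ** 2) / nq)
        sqnr_peak = 10 * np.log10(1.0 / nq)  # peak signal power = x_max^2 = 1
        print(n, round(sqnr_ave, 1), round(sqnr_peak, 1))
        # expect roughly 6.02n (average) and 6.02n + 4.77 (peak)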
Non-uniform quantisation
The mean/average quantisation noise power is given by

NQ = ∫_{-∞}^{+∞} [x - fQ(x)]^2 p(x) dx

where x = non-quantised sample value, fQ(x) = quantised sample value, and p(x) = pdf of x.
It follows that NQ is reduced if we
use finer quantisation level
spacing at values of x for which
p(x) is larger. This is the basis of
non-uniform quantisation.
Note: Uniform quantisation is optimal if p(x) is flat, that is, if x is uniformly distributed.
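
As a numerical sanity check on this idea, the sketch below (hypothetical … a Laplacian pdf is assumed as a crude model of speech amplitudes) evaluates NQ for M = 16 levels, once with uniform level spacing and once with levels spaced more finely near x = 0:

    import numpy as np

    x = np.linspace(-1.0, 1.0, 20001)
    dx = x[1] - x[0]
    lam = 8.0                                  # assumed Laplacian scale
    p = 0.5 * lam * np.exp(-lam * np.abs(x))   # pdf concentrated near zero
    p = p / np.sum(p * dx)                     # renormalise on the finite range

    def nq(levels):
        # NQ = integral of [x - fQ(x)]^2 p(x) dx, with fQ(x) = nearest level
        fq = levels[np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)]
        return np.sum((x - fq) ** 2 * p * dx)

    M, mu = 16, 255.0
    u = np.linspace(-1.0, 1.0, M)              # uniformly spaced levels
    nonuni = np.sign(u) * ((1.0 + mu) ** np.abs(u) - 1.0) / mu  # finer near 0

    print("uniform     NQ =", nq(u))
    print("non-uniform NQ =", nq(nonuni))      # noticeably smaller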
The (non-quantised) samples of a sampled speech signal have a greater probability of having a 'small' amplitude than a 'large' one.
With the aim of reducing the
quantisation noise power (for a given
number M of quantisation levels …
or alternatively with the aim of
reducing M for a given
quantisation noise power) speech coding
systems typically use a
logarithmically based non-uniform
quantiser … more finely spaced
quantisation levels at lower
amplitude values.
In practice, non-uniform quantisation is
achieved via (i) passing
the analogue signal through a compressor
… which nonuniformly
compresses the dynamic range of
the signal or flattens out the
signal pdf p(x); then (ii)
passing the resulting analogue signal
through a uniform quantising
coder.
The action of the compressor is reversed at the receiver by an expandor.
Two popular compressor laws are μ-law (used in the USA, Canada and Japan) and A-law (used in Europe and Australia):
μ-law: |vOUT| = ln(1 + μ|vIN|) / ln(1 + μ), where |vIN| <= 1

A-law: |vOUT| = A|vIN| / (1 + ln A), for 0 <= |vIN| <= 1/A
       |vOUT| = [1 + ln(A|vIN|)] / (1 + ln A), for 1/A <= |vIN| <= 1

Note: These formulae assume vIN has been normalised to |vIN| <= 1. Most common are μ = 255 and A = 87.6.
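
A direct transcription of the two laws into Python (a sketch … vIN is assumed to be already normalised to |vIN| <= 1):

    import numpy as np

    def mu_law_compress(v, mu=255.0):
        return np.sign(v) * np.log1p(mu * np.abs(v)) / np.log1p(mu)

    def mu_law_expand(v, mu=255.0):
        # receiver-side expandor ... the exact inverse of the compressor
        return np.sign(v) * ((1.0 + mu) ** np.abs(v) - 1.0) / mu

    def a_law_compress(v, A=87.6):
        av = np.clip(np.abs(v), 1e-12, 1.0)   # avoid log(0) in the unused branch
        out = np.where(av <= 1.0 / A,
                       A * av / (1.0 + np.log(A)),
                       (1.0 + np.log(A * av)) / (1.0 + np.log(A)))
        return np.sign(v) * out

    v = np.linspace(-1.0, 1.0, 9)
    print(np.max(np.abs(mu_law_expand(mu_law_compress(v)) - v)))   # ~0

In practice, standards such as G.711 implement piecewise-linear (segmented) approximations of these smooth curves.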
The above approach to converting an analogue signal into a bit stream is known as Pulse Code Modulation (PCM).
Example: A speech signal is converted into a PCM signal using a sampling rate of fs = 8 kHz and M = 256 quantisation levels. It follows that the number of bits per sample is n = log2(256) = 8, and that the bit rate is Rb = n × fs = 8 × 8000 = 64 kbps. This is the bit rate commonly used for standard PCM speech signals.
Reducing the bit rate of PCM-based signals
Let Δs(k) = s(k) - s(k-1) be the kth sample difference, where s(k) = current sample value. It follows that

s(k) = s(0) + Σ_{i=1}^{k} Δs(i)

Thus any sample of a sampled signal can be reconstructed from the sum of the past sample differences.
Adjacent samples of a speech waveform are often highly correlated. That is, on average, Δs(k) has a significantly smaller dynamic range than s(k). Based on the above, a reduction in the transmitted bit rate can be achieved by transmitting quantised Δs(k) sample values … since fewer quantisation levels are required for Δs(k) than for s(k).
A further improvement (smaller bit
rate) can be obtained if we use
a pre-determined model of the
speech signal to predict the current
value s(k) from previous values
s(k-1), s(k-2),…
ŝ(k) = a1·s(k-1) + a2·s(k-2) + … + ap·s(k-p),

where {ai} = predetermined set of constants (typically determined via linear prediction techniques). Then we transmit a quantised version of the difference between the predicted value and the actual value:

Δŝ(k) = s(k) - ŝ(k).
This scheme is known as differential pulse code modulation (DPCM).
As long as the set of coefficients {ai} is available at the receiver, a quantised version of s(k) can be reconstructed at the receiver. Adaptive DPCM adaptively updates the coefficient set {ai} to ensure the model suitably describes the current speech signal. (Adaptive) delta modulation is a one-bit version of (adaptive) DPCM.
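
Below is a minimal DPCM sketch (illustrative only … the 2nd-order predictor coefficients and the uniform difference quantiser are assumed, not taken from any standard). The key detail is that the encoder predicts from reconstructed past samples, so the decoder's predictor stays exactly in step with the encoder's despite quantisation:

    import numpy as np

    def dpcm_encode(s, a, quantise):
        # a = [a1, ..., ap]; predict each sample from reconstructed past samples
        p = len(a)
        rec = np.zeros(len(s) + p)          # p leading zeros = initial state
        dq = np.zeros(len(s))               # quantised differences to transmit
        for k in range(len(s)):
            pred = np.dot(a, rec[k:k + p][::-1])   # a1*rec(k-1)+...+ap*rec(k-p)
            dq[k] = quantise(s[k] - pred)
            rec[k + p] = pred + dq[k]
        return dq

    def dpcm_decode(dq, a):
        # runs the identical predictor recursion on the received differences
        p = len(a)
        rec = np.zeros(len(dq) + p)
        for k in range(len(dq)):
            pred = np.dot(a, rec[k:k + p][::-1])
            rec[k + p] = pred + dq[k]
        return rec[p:]

    quant = lambda d: np.round(d * 8.0) / 8.0   # uniform quantiser, step 0.125

    fs = 8000
    t = np.arange(fs) / fs
    s = 0.5 * np.sin(2 * np.pi * 300 * t)       # stand-in for a speech signal
    a = np.array([1.9, -0.95])                  # assumed 2nd-order predictor
    s_hat = dpcm_decode(dpcm_encode(s, a, quant), a)
    print(np.max(np.abs(s - s_hat)))            # bounded by half a quantiser step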
Frequency Domain Waveform Coding
The two most common types are
(i) subband coding (SBC) and
(ii) block transform coding.
Subband coding
This involves passing the speech signal through a parallel set of L bandpass filters (essentially non-overlapping in frequency). The result is a set of L subband speech signals. Each of these signals is then passed through a PCM device, each with its own number of quantisation levels: the lower the power in a particular subband, the smaller the number of quantisation levels.
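
A crude subband-coding sketch follows (hypothetical … FFT-bin masking stands in for a proper bandpass filter bank, and the per-band bit allocation is simply assumed for this toy signal):

    import numpy as np

    def subband_split(s, L):
        # Split s into L contiguous, non-overlapping frequency bands via FFT masks
        S = np.fft.rfft(s)
        bins = np.arange(len(S))
        edges = np.linspace(0, len(S), L + 1, dtype=int)
        return [np.fft.irfft(np.where((bins >= lo) & (bins < hi), S, 0), n=len(s))
                for lo, hi in zip(edges[:-1], edges[1:])]

    def quantise(x, n_bits):
        # Uniform quantiser matched to the band's own dynamic range
        x_max = np.max(np.abs(x)) + 1e-12
        step = 2.0 * x_max / 2 ** n_bits
        return np.round(x / step) * step

    fs = 8000
    t = np.arange(fs) / fs
    s = np.sin(2 * np.pi * 300 * t) + 0.1 * np.sin(2 * np.pi * 2500 * t)

    bands = subband_split(s, L=4)
    bits = [8, 6, 4, 2]        # more bits where the band power is higher
    s_hat = sum(quantise(b, n) for b, n in zip(bands, bits))
    print(10 * np.log10(np.mean(s ** 2) / np.mean((s - s_hat) ** 2)))  # SQNR (dB)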
Block transform coding
This involves passing time-windowed segments of the sampled speech signal through a transform device. The output coefficients are then each separately quantised and bit encoded. The smaller coefficients are coarsely quantised (essentially ignored), leading to a reduction in the total number of bits required to represent the particular windowed segment.
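
A corresponding block-transform sketch (the DCT is one common transform choice; the block length and the number of retained coefficients below are assumed):

    import numpy as np
    from scipy.fft import dct, idct

    def transform_code(s, block=64, keep=16):
        # DCT each segment, coarsely quantise (here: zero out) the smallest
        # coefficients, and inverse-transform
        out = np.zeros_like(s)
        for i in range(0, len(s) - block + 1, block):
            c = dct(s[i:i + block], norm='ortho')
            small = np.argsort(np.abs(c))[:-keep]   # all but the 'keep' largest
            c[small] = 0.0
            out[i:i + block] = idct(c, norm='ortho')
        return out

    fs = 8000
    t = np.arange(fs) / fs
    s = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 1100 * t)
    s_hat = transform_code(s)
    print(np.max(np.abs(s - s_hat)))    # small: most signal energy is retained

Only the retained coefficient values (plus their positions) need to be bit encoded, which is where the bit-rate saving comes from.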