
Sound is an analog
phenomenon. The changing air pressure that pushes and pulls on our eardrums
varies smoothly rather than jumping discretely from one pressure to another.
Most electrical audio signals are also analog; the voltage in a cable varies
smoothly in a way that mimics the changing air pressure of the sound represented
by the signal. In fact, the signal's changing voltage is analogous to the changing
pressure of the sound it represents, hence the term analog audio.
Recently, however,
the landscape has been altered somewhat. Audio signals are now commonly stored
and transmitted as digital information. This offers several advantages over
analog audio. For example, there is no loss of audio quality as you make copies
of the data. In addition, it is much easier to edit and assemble digital audio
information. Finally, there is virtually no tape noise when recording digital
audio.
Fortunately,
the same basic principles apply to all forms of digital-audio recording, storage,
and playback. This includes samplers, digital multitrack tape decks, DATs, hard-disk
recorders, and CDs. If this realm remains foreign to you, read on.
DIGITAL BASICS
Humans use ten
digits (0 to 9) to express all numbers; this is called the decimal number
system. The decimal system probably arose because we have ten fingers (which
are also called digits). To express numbers larger than 9, we combine two or
more digits. For example, with two decimal digits, we can express 100 numbers
from 0 to 99. With three decimal digits, we can express 1,000 numbers from 0
to 999.
Computers use
only two digits: 0 and 1. This is called the binary number system, and binary
digits are called bits (short for Binary digITS). Like humans, computers combine
two or more bits to express larger numbers. For example, with two bits, you
can express four numbers: 00, 01, 10, and 11. With three bits, you can express
eight numbers, from 000 to 111.
Are you starting
to see a pattern here? The pattern is this:
Number of numbers
you can express = 2^(number of bits you combine)
So, if you have
eight bits, you can express 2^8 = 256 numbers; with sixteen bits, you can express
2^16 = 65,536 numbers.
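
To check the pattern for yourself, here is a minimal Python sketch. The function name count_values is my own invention for illustration; it simply raises 2 to the number of bits.

```python
def count_values(num_bits: int) -> int:
    """Return how many distinct numbers can be written with num_bits bits."""
    return 2 ** num_bits

for bits in (2, 3, 8, 16):
    print(f"{bits:2d} bits -> {count_values(bits):,} values")

# Listing every 3-bit pattern shows the eight numbers from 000 to 111.
three_bit_patterns = [format(n, "03b") for n in range(count_values(3))]
print(three_bit_patterns)
```
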
Computers almost
universally combine eight bits into what is called a byte; a group of four bits
is half a byte, which is called a nibble. These days, most computers also work
with groups of bits called words.
A/D CONVERSION
The starting
point of most digital-audio systems is an analog audio signal from a microphone
or other analog source. (Some systems can generate digital audio from scratch
without an analog source, but I'm going to put this idea aside for now.) The
goal is to convert the analog audio signal into a series of discrete digital
numbers that a computer can deal with.
A sample-and-hold
circuit measures, or samples, the instantaneous voltage, or amplitude, of an
analog audio signal and holds that value until an analog-to-digital converter
(ADC) converts it into a binary number. The sample-and-hold circuit then reads
the next instantaneous amplitude and holds it for the ADC. This occurs many
times per second as the signal's alternating voltage rises and falls. As a result,
the smoothly varying analog waveform is converted into a series of "stair
steps".
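
The measuring step can be imitated in software. The following Python sketch is only an illustration under simple assumptions: the analog signal is modeled as a function of time, and the names sample_signal and sine_wave are hypothetical. It measures a 1 kHz tone 48,000 times per second and keeps each value in a list, much as the sample-and-hold circuit holds each value for the ADC.

```python
import math

def sample_signal(signal, sampling_rate, duration):
    """Measure signal(t) once per sampling period for `duration` seconds."""
    count = round(duration * sampling_rate)
    # Sample number n is taken at time n / sampling_rate.
    return [signal(n / sampling_rate) for n in range(count)]

def sine_wave(frequency):
    """Return a function giving the value of a unit sine wave at time t."""
    return lambda t: math.sin(2 * math.pi * frequency * t)

# One millisecond of a 1 kHz tone measured 48,000 times per second.
held_values = sample_signal(sine_wave(1000), sampling_rate=48_000, duration=0.001)
print(len(held_values), "measurements for one cycle of the tone")
```
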
In some systems,
the lowest possible instantaneous amplitude is represented by a string of zeros,
and the highest possible instantaneous amplitude is represented by a string
of ones. In other systems, a string of 0s represents the middle of the possible
amplitudes. Values with a zero as the first bit represent amplitudes above the
middle (positive), while values with a one as the first bit represent amplitudes
below the middle (negative). This is called two's-complement representation,
which allows for positive and negative numbers.
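
To see how the first bit behaves, consider this short Python sketch of two's-complement encoding. The helper names are hypothetical, and the 4-bit word length in the examples is chosen only to keep the patterns short.

```python
def to_twos_complement(value: int, bits: int = 16) -> str:
    """Return the two's-complement bit string for a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking wraps negative values into the upper half of the unsigned range.
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(pattern: str) -> int:
    """Return the signed integer encoded by a two's-complement bit string."""
    raw = int(pattern, 2)
    # A leading 1 marks a value below the middle (negative).
    if pattern[0] == "1":
        raw -= 1 << len(pattern)
    return raw

for amplitude in (7, 3, 0, -1, -8):
    print(f"{amplitude:3d} -> {to_twos_complement(amplitude, bits=4)}")
```

Notice that the middle of the range (0) comes out as all zeros, and every negative value starts with a one.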
Stereo signals
are converted separately and then multiplexed, or combined, into a single stream
of binary numbers. The numbers representing the right and left channels are
interleaved, or alternated, in the stream.
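
Interleaving is easy to picture in code. This Python sketch assumes the left channel comes first in each pair, which is a common convention but not something every format guarantees; the function names are hypothetical.

```python
def interleave(left, right):
    """Combine two equal-length channels into one alternating stream."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    stream = []
    for left_sample, right_sample in zip(left, right):
        stream.extend((left_sample, right_sample))
    return stream

def deinterleave(stream):
    """Split an alternating stream back into its left and right channels."""
    return stream[0::2], stream[1::2]

stream = interleave([1, 2, 3], [10, 20, 30])
print(stream)
print(deinterleave(stream))
```

The second function performs the reverse operation, which is needed at playback time.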
The most common
technique for encoding each instantaneous amplitude is called pulse-code modulation
(PCM). Each bit is a code for an electrical or optical pulse; 1 = high-level
pulse, 0 = low-level pulse. For example, if an instantaneous amplitude is represented
by the binary number 1101, four pulses are sent: high, high, low, high. The
rate at which the measurements are taken and the number of bits used to represent
each measurement are the two most fundamental concepts in digital audio.
SAMPLING RATE
The rate at which
the instantaneous-amplitude measurements are taken is called the sampling rate,
and the time between measurements is called the sampling period. The more often
measurements are taken, the higher the frequency that can be accurately represented.
However, more measurements require more storage (which we'll discuss in more
detail shortly).
If the frequency
of the analog signal is low compared with the sampling rate, you get an accurate
representation of the signal. If the frequency of the signal is over half the
sampling rate, though, some weird things start to happen (more in a moment).
The frequency that corresponds to half the sampling rate is called the Nyquist
frequency after American engineer Harry Nyquist. For example, if the sampling
rate is 48 kHz (48,000 measurements per second), the Nyquist frequency is 24
kHz.
The Nyquist frequency
is the maximum frequency that the system can accurately represent and reproduce.
This is called the audio bandwidth of the system. For example, if the sampling
rate is 48 kHz, the system can represent and reproduce audio signals at frequencies
from 0 to 24 kHz. In other words, the audio bandwidth of the system is 24 kHz.
By contrast,
the digital bandwidth of the system is the maximum number of bits per second
it can transmit or receive. For example, if the maximum sampling rate is 48
kHz and each instantaneous-amplitude measurement is represented with sixteen
bits (more in a moment), the digital bandwidth is 48,000 x 16 = 768,000 bits
per second, or 768 kbps. In a stereo system, this digital bandwidth would double
to 1.536 megabits per second (Mbps).
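
The arithmetic is simple enough to express as a one-line Python function. The name bit_rate is hypothetical, and the sketch counts only the raw audio bits, ignoring any extra data a real transmission format might add.

```python
def bit_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the raw audio data rate in bits per second."""
    return sampling_rate_hz * bits_per_sample * channels

mono_bps = bit_rate(48_000, 16)
stereo_bps = bit_rate(48_000, 16, channels=2)
print(f"mono:   {mono_bps / 1000:.0f} kbps")
print(f"stereo: {stereo_bps / 1_000_000:.3f} Mbps")
```
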
When digitizing
a signal whose frequency is greater than the Nyquist frequency, you run into
a problem called aliasing. In this case, the measurements of instantaneous amplitude
don't accurately reflect the shape of the original signal's waveform. The measurements
are taken at disparate points along the waveform. When these measurements are
reconstructed into an analog signal, it has a lower frequency than the original.
(In fact, several alias signals appear above and below the original frequency.)
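
A small Python experiment makes aliasing concrete. This sketch assumes a pure tone and an ideal sampler with no antialiasing filter; the function names are hypothetical. It folds a frequency back into the range from 0 to the Nyquist frequency, then confirms that a 30 kHz cosine sampled at 48 kHz yields exactly the same measurements as an 18 kHz cosine.

```python
import math

def nyquist_frequency(sampling_rate):
    """Return half the sampling rate."""
    return sampling_rate / 2

def alias_frequency(frequency, sampling_rate):
    """Return the frequency between 0 and Nyquist that the samples suggest."""
    folded = frequency % sampling_rate
    return min(folded, sampling_rate - folded)

rate = 48_000
print(alias_frequency(30_000, rate))  # a 30 kHz tone masquerades as 18 kHz

# The two tones produce indistinguishable measurements at this sampling rate.
original = [math.cos(2 * math.pi * 30_000 * n / rate) for n in range(16)]
alias = [math.cos(2 * math.pi * 18_000 * n / rate) for n in range(16)]
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, alias))
print(identical)
```
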
As a precaution
against aliasing, the input signal is sent through an antialiasing filter before
it reaches the sample-and-hold circuit. This lowpass filter blocks any frequencies
that are greater than the Nyquist frequency of the system while passing all
frequencies below the Nyquist limit. The slope of the filter is very steep,
which leads many people to call it a brickwall filter.
All CDs use one
sampling rate, 44.1 kHz, which is also common among samplers, DATs,
hard-disk recorders, and digital multitracks. This rate was adopted as a standard
because its Nyquist frequency is 22.05 kHz, which is just above the top of the
human hearing range. As a result, all frequencies we can hear are accurately
represented. However, there is much debate in the audio industry about whether
or not overtones above 20 kHz make an audible contribution to the entire signal.
In fact, some DATs are now available with a sampling rate of 96 kHz to address
this issue.
Many professional
systems offer a sampling rate of 48 kHz in addition to 44.1 kHz. Multimedia
titles often use lower sampling rates of 11 kHz or 22 kHz to reduce storage
requirements. This yields lower audio quality, which isn't considered as critical
in this application because most computer audio-playback systems have relatively
low fidelity anyway.
In many samplers,
it's possible to use different sampling rates to conserve storage requirements.
For example, you might sample the lowest notes of a bass at 11 kHz; there are
probably no overtones above 5.5 kHz, so you don't lose anything by sampling
these notes at a lower rate. Higher notes can be sampled at 44.1 kHz and combined
with the low notes to form an entire sampled bass.
In some systems,
the input is sampled at a higher rate than will be used to reproduce the signal;
this is called oversampling. As you might imagine, this increases the Nyquist
frequency and reduces aliasing. After the signal has been sampled, a digital
filter removes any frequency components above the final Nyquist frequency, and
the data is output at the final sampling rate.
RESOLUTION
The number of
bits used to represent each instantaneous measurement is called the resolution
or word length. The greater the resolution, the more accurately each measurement
is represented. However, the more bits you use, the greater the storage requirements
(more in a moment). Until very recently, the most common resolution for digital
audio was 16 bits. However, many digital-audio products now use 18 bits, and some
professional systems use 20 or 24 bits, whereas multimedia titles often use
8 bits to conserve storage.
The resolution
determines the number of steps between the lowest and highest instantaneous
amplitude the system can represent. With 16-bit resolution, there are 65,536
steps between the lowest and highest amplitudes. This defines the dynamic range
of the system. Theoretically, the dynamic range of a 16-bit system is 98 dB,
but various factors reduce this figure to about 90 dB for practical purposes.
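
Where does the theoretical figure come from? The Python sketch below assumes the usual textbook model, in which a full-scale sine wave is compared with ideal quantization noise that is spread evenly over one step. Under that assumption, the result works out to roughly 6.02 dB per bit plus 1.76 dB. The function name is hypothetical.

```python
import math

def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio of a full-scale sine wave to ideal quantization noise, in dB."""
    # 20*log10(2**bits) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (8, 16, 20, 24):
    print(f"{bits:2d} bits -> {theoretical_dynamic_range_db(bits):.1f} dB")
```
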
No matter how
many bits are used to represent each instantaneous measurement, the representation
is not always completely accurate. In most cases, the actual measurement value
must be rounded to the nearest binary number. This is called quantization, and
the difference between the actual measured amplitude and the quantized binary
representation is called quantization error.
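
Here is a minimal Python sketch of quantization, assuming the amplitude has been scaled to the range -1.0 to 1.0 and that codes are signed integers. Real converters do this in hardware, and the function names are hypothetical.

```python
def quantize(amplitude: float, bits: int = 16) -> int:
    """Round an amplitude in the range -1.0..1.0 to the nearest integer code."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    code = round(amplitude * max_code)
    # Keep the result inside the range the word length can hold.
    return max(min_code, min(max_code, code))

def quantization_error(amplitude: float, bits: int = 16) -> float:
    """Return the measured amplitude minus its quantized representation."""
    max_code = 2 ** (bits - 1) - 1
    return amplitude - quantize(amplitude, bits) / max_code

for bits in (8, 16):
    print(f"{bits:2d} bits: code {quantize(0.3, bits)}, "
          f"error {quantization_error(0.3, bits):+.6f}")
```

The error is never larger than half a step, so adding bits shrinks it.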
Quantization
error can lead to audible quantization noise, which is particularly apparent
in signals of low amplitude because only a few bits are used to represent the
entire signal. As a result, you should try to keep the input signal's overall
amplitude as close as possible to the maximum level that the system can accommodate.
Optimizing the gain structure of your audio system can be a big help in this
regard (see "Recording Musician: Gain Stages" in the November 1993
EM).
However, you
must be careful not to exceed the system's maximum signal level. If the instantaneous
amplitude of the input signal rises above the highest point that can be represented
by the binary numbers, the signal will be clipped (i.e., the top of the waveform
will be chopped off, forming a horizontal line). This makes a very unpleasant
noise. On a digital recorder, unlike an analog one, the input-signal level must not exceed 0 on
the VU meter in order to avoid clipping. Some digital recorders actually calibrate
the 0 VU point a few dB below the actual clipping point so users can exceed
this level without clipping as if they were using an analog recorder.
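
Clipping can be sketched in a few lines of Python. This example assumes 16-bit signed sample values and a hypothetical function named clip; anything beyond the representable range is simply flattened to the limit.

```python
def clip(samples, bits: int = 16):
    """Limit each sample to the range a signed word of `bits` bits can hold."""
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    return [max(min_code, min(max_code, sample)) for sample in samples]

# One cycle of a waveform whose peaks are too large for 16 bits.
too_hot = [0, 20_000, 40_000, 20_000, 0, -20_000, -40_000, -20_000]
clipped = clip(too_hot)
print(clipped)
```
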
The most common
solution to quantization noise is called dithering. In this process, a small
amount of noise is added to the input signal before it is measured and quantized.
This randomizes the quantization error, reducing its audible effect. For this
reason, it is particularly important to apply dithering to minimize audible
artifacts that arise when the resolution of a digital-audio signal is reduced,
which is a common procedure in multimedia titles.
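
The following Python sketch hints at why dithering works. It makes several assumptions of my own: amplitudes are measured in units of one quantization step, the dither is triangular noise made from two uniform random values, and the random generator is seeded so the result is repeatable. A steady signal only 0.3 steps high vanishes completely when rounded, but with dither its level survives in the average of the quantized values.

```python
import random

def dithered_quantize(value: float, rng: random.Random) -> int:
    """Add triangular dither (in units of one step) and round to a whole step."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(value + dither)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
quiet_level = 0.3       # a steady signal smaller than one quantization step
count = 20_000

plain = [round(quiet_level) for _ in range(count)]
dithered = [dithered_quantize(quiet_level, rng) for _ in range(count)]

plain_average = sum(plain) / count
dithered_average = sum(dithered) / count
print(plain_average, dithered_average)
```

The price is a little steady background noise, which is usually far less objectionable than the distortion it replaces.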
STORAGE
Once the signal
has been digitized into a stream of binary numbers, it is stored in one medium
or another. Common media include magnetic tape or disk, optical disc, RAM, and
ROM. At a sampling rate of 44.1 kHz and a resolution of sixteen bits, digital
audio data consumes over 5 MB per minute for a monaural file or 10 MB per minute
for a stereo file. Digital-audio data stored in this manner is referred to as
being linear.
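
You can verify these storage figures with a short Python sketch. The function name is hypothetical, and the calculation covers only the raw audio data, not any file headers.

```python
def bytes_per_minute(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the storage needed for one minute of linear digital audio."""
    bits_in_one_minute = sampling_rate_hz * bits_per_sample * channels * 60
    return bits_in_one_minute // 8

mono_bytes = bytes_per_minute(44_100, 16)
stereo_bytes = bytes_per_minute(44_100, 16, channels=2)
print(f"mono:   {mono_bytes / 1_000_000:.2f} MB per minute")
print(f"stereo: {stereo_bytes / 1_000_000:.2f} MB per minute")
```
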
To reduce storage
requirements, you can reduce the sampling rate and/or resolution, but this also
reduces audio quality. Another option is called compression, which is often
used in multimedia titles. In this process, the digital-audio data is compressed
to reduce storage requirements by as much as 4:1 or 5:1. In other words, a given
amount of compressed digital-audio data requires 1/4 or 1/5 as much storage as an equivalent
amount of linear data.
There are many
types of digital-audio compression, which can be divided into two broad categories:
lossy and lossless. Lossy compression provides the greatest storage reduction,
but some of the information is lost forever. For this reason, lossy compression
schemes are designed to lose information that in theory represents sound we
wouldn't hear anyway due to masking and other psychoacoustic effects. (However,
with most currently available compression schemes, you can, in fact, hear the
difference.) Lossless compression retains all the information in a file, but
the storage reduction is not as dramatic.
D/A CONVERSION
To play a digital-audio
signal, you must convert it back into analog form. After some error correction,
the digital signal is sent to a digital-to-analog converter (DAC). If it's a
stereo signal, it is first demultiplexed to separate the right and left channels.
The analog output
of the DAC still has a stair-step shape, which introduces high-frequency artifacts
into the signal. In addition, the process of digitization creates images of
the original waveform's harmonic spectrum centered at multiples of the sampling
rate. For example, if the sampling rate is 44.1 kHz, images of the original
spectrum appear centered at 88.2 kHz, etc. You might think that there is no
need to bother with these images, which lie outside the human hearing range.
However, these frequencies can cause audible problems in other audio components.
And if the sampling rate is relatively low (e.g., 11 kHz), the images can be
audible.
To solve both
problems, another brickwall lowpass filter, called an anti-imaging filter, is
traditionally placed after the DAC to remove any sonic components above the
Nyquist frequency and smooth out the stair steps. These days, many systems use
a digital anti-imaging filter before the DAC, which reduces the phase anomalies
that are so problematic with analog brickwall filters.
In many modern
systems, a digital filter uses oversampling to create a smoother, more accurate
output. In this process, the filter interpolates between the original sample
points.
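
As a rough illustration of interpolation, the Python sketch below inserts new points on straight lines between the original samples. This is an assumption made purely for simplicity; real oversampling filters use considerably more sophisticated digital filtering, and the function name is hypothetical.

```python
def oversample_linear(samples, factor: int):
    """Insert factor - 1 straight-line points between each pair of samples."""
    if not samples:
        return []
    output = []
    for current, following in zip(samples, samples[1:]):
        for step in range(factor):
            output.append(current + (following - current) * step / factor)
    output.append(samples[-1])  # the final original sample has no successor
    return output

print(oversample_linear([0, 4, 8], factor=4))
```
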
LOW-BIT SYSTEMS
Although many
systems use sixteen bits or more to represent each instantaneous measurement,
another approach is gaining popularity. This approach is called low-bit conversion
because it uses only a few bits, sometimes even a single bit, to represent the
audio signal.
How is this possible?
Consider the following analogy. Traditional digital-audio systems are like a
row of sixteen light bulbs, each controlled by its own switch. There are 65,536
possible on/off combinations, which determine the brightness in the room. Room
brightness is analogous to the instantaneous amplitude of an audio signal. However,
each bulb has a different inherent brightness, which introduces error into the
system. This is analogous to the error introduced by high-bit converters.
You can also
control the brightness in the room with a single light bulb by switching it
on and off at a high rate. The brightness is determined by how long the light
is on relative to how long it is off. This is analogous to a 1-bit converter.
When the instantaneous amplitude is high, the converter sends mostly ones; when
the amplitude is low, the converter sends mostly zeros. Low-bit converters are
inherently more accurate than high-bit converters, but their sampling rate must
be much higher than that of high-bit designs.
One way to use
fewer bits is called differential coding. This technique is based on measuring
the difference between one instantaneous amplitude and the next rather than
the amplitudes themselves. It generally requires fewer bits to accurately represent
the differences, which are smaller than the actual amplitudes. For example,
delta modulation quantizes the difference (which is often represented by the
Greek letter delta) between consecutive amplitudes.
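
Here is a minimal Python sketch of 1-bit delta modulation under simple assumptions: a fixed step size, a running estimate that starts at zero, and hypothetical function names. Each bit says only whether the estimate should move up or down by one step, and the decoder rebuilds the waveform by following the same instructions.

```python
import math

def delta_encode(samples, step: float):
    """Send a 1 when the input is at or above the running estimate, else a 0."""
    bits = []
    estimate = 0.0
    for sample in samples:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step: float):
    """Rebuild the waveform by stepping up for each 1 and down for each 0."""
    estimate = 0.0
    output = []
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output

# A slow sine wave: 200 samples per cycle, so it never changes faster than the step.
wave = [math.sin(2 * math.pi * n / 200) for n in range(400)]
decoded = delta_decode(delta_encode(wave, step=0.05), step=0.05)
worst_error = max(abs(a - b) for a, b in zip(wave, decoded))
print(f"worst tracking error: {worst_error:.3f}")
```

If the input changes faster than one step per sample, the estimate falls behind, which is one reason such schemes need a high sampling rate.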
A more sophisticated
variation is called delta-sigma modulation. (This is sometimes called sigma-delta
modulation, although some audio professionals make a distinction between these
terms, using them to describe slightly different techniques.) This process takes
the difference (delta) between the current instantaneous amplitude and the integral
of the quantized previous difference. (Integrals are mathematical operations
related to sums, and sums are often represented by the Greek letter sigma.)
Delta-sigma converters provide excellent sound quality at a lower price, which
is why they are used so much these days.
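
The idea can be sketched in Python as a first-order loop. This is the common textbook form, not a description of any particular converter, and the function name is hypothetical. The difference between the input and the previous output is accumulated, and the sign of the accumulated value decides each output bit. With a steady input of 0.5 on a scale from -1.0 to 1.0, about three bits in four come out as ones.

```python
def delta_sigma_1bit(samples):
    """Encode amplitudes in the range -1.0..1.0 as a stream of single bits."""
    integrator = 0.0
    feedback = 0.0  # the previous output expressed as +1.0 or -1.0
    bits = []
    for sample in samples:
        integrator += sample - feedback  # the delta, summed by the sigma
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

high = delta_sigma_1bit([0.5] * 1000)
low = delta_sigma_1bit([-0.5] * 1000)
high_density = sum(high) / len(high)
low_density = sum(low) / len(low)
print(high_density, low_density)
```

Averaging the bit stream recovers the amplitude, which is why a higher amplitude produces mostly ones and a lower amplitude mostly zeros.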
Digital-audio
systems are difficult to design and build, but the basic concepts are relatively
easy to understand. Once you grasp these concepts, you can optimize your use
of samplers, DATs, digital multitracks, and hard-disk recorders and enjoy high-quality
audio for relatively little monetary investment. In addition, digital products
always improve their performance while falling in price, so the future looks
bright for all forms of digital audio.
Scott Wilkinson
digs digital audio. Thanks to Ken Pohlmann for his help with this article.