
Digging Into Digital Audio

 Scott Wilkinson

Electronic Musician, Feb 1 1996




Sound is an analog phenomenon. The changing air pressure that pushes and pulls on our eardrums varies smoothly rather than jumping discretely from one pressure to another. Most electrical audio signals are also analog; the voltage in a cable varies smoothly in a way that mimics the changing air pressure of the sound represented by the signal. In fact, the signal's changing voltage is analogous to the changing pressure of the sound it represents, hence the term analog audio.

Recently, however, the landscape has been altered somewhat. Audio signals are now commonly stored and transmitted as digital information. This offers several advantages over analog audio. For example, there is no loss of audio quality as you make copies of the data. In addition, it is much easier to edit and assemble digital audio information. Finally, there is virtually no tape noise when recording digital audio.

Fortunately, the same basic principles apply to all forms of digital-audio recording, storage, and playback. This includes samplers, digital multitrack tape decks, DATs, hard-disk recorders, and CDs. If this realm remains foreign to you, read on.

DIGITAL BASICS

Humans use ten digits—0 to 9—to express all numbers; this is called the decimal number system. The decimal system probably arose because we have ten fingers (which are also called digits). To express numbers larger than 9, we combine two or more digits. For example, with two decimal digits, we can express 100 numbers from 0 to 99. With three decimal digits, we can express 1,000 numbers from 0 to 999.

Computers use only two digits: 0 and 1. This is called the binary number system, and binary digits are called bits (short for Binary digITS). Like humans, computers combine two or more bits to express larger numbers. For example, with two bits, you can express four numbers: 00, 01, 10, and 11. With three bits, you can express eight numbers, from 000 to 111.

Are you starting to see a pattern here? The pattern is this:

Number of numbers you can express = 2^(number of bits you combine)

So, if you have eight bits, you can express 2^8 = 256 numbers; with sixteen bits, you can express 2^16 = 65,536 numbers.
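If you want to check the pattern yourself, a few lines of Python will do it. This is only an illustrative sketch; the function name count_values is made up for this example.

```python
def count_values(num_bits):
    """Return how many distinct numbers can be expressed with num_bits bits."""
    return 2 ** num_bits


# Print the counts for the bit widths mentioned in the text.
for bits in (2, 3, 8, 16):
    print(bits, "bits ->", count_values(bits), "numbers")
```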

Computers almost universally combine eight bits into what is called a byte; a group of four bits is half a byte, which is called a nibble. These days, most computers also work with larger groups of bits called words; the word length (16 or 32 bits, for example) depends on the machine.

A/D CONVERSION

The starting point of most digital-audio systems is an analog audio signal from a microphone or other analog source. (Some systems can generate digital audio from scratch without an analog source, but I'm going to put this idea aside for now.) The goal is to convert the analog audio signal into a series of discrete digital numbers that a computer can deal with.

A sample-and-hold circuit measures, or samples, the instantaneous voltage, or amplitude, of an analog audio signal and holds that value until an analog-to-digital converter (ADC) converts it into a binary number. The sample-and-hold circuit then reads the next instantaneous amplitude and holds it for the ADC. This occurs many times per second as the signal's alternating voltage rises and falls. As a result, the smoothly varying analog waveform is converted into a series of "stair steps".
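The measuring step can be imitated in software. The following is a rough sketch, not a model of any real converter: it assumes the analog input is a pure sine wave, and the function name sample_sine and its parameters are hypothetical. Each entry in the returned list plays the role of one held value after conversion to a signed whole number.

```python
import math


def sample_sine(freq_hz, sampling_rate_hz, num_samples, num_bits=16):
    """Measure a full-scale sine wave at regular intervals.

    Returns one signed integer per measurement, as a stand-in for the
    output of a sample-and-hold circuit followed by an ADC.
    """
    peak = (1 << (num_bits - 1)) - 1  # largest positive value, e.g. 32767
    samples = []
    for n in range(num_samples):
        t = n / sampling_rate_hz  # time of the n-th measurement in seconds
        samples.append(round(peak * math.sin(2 * math.pi * freq_hz * t)))
    return samples


# One cycle of a 1 kHz tone measured 8,000 times per second.
print(sample_sine(1000, 8000, 8))
```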

In some systems, the lowest possible instantaneous amplitude is represented by a string of zeros, and the highest possible instantaneous amplitude is represented by a string of ones. In other systems, a string of zeros represents the middle of the possible amplitudes. Values with a zero as the first bit represent amplitudes at or above the middle (positive), while values with a one as the first bit represent amplitudes below the middle (negative). This is called two's-complement representation, which allows for positive and negative numbers.
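To make two's-complement representation concrete, here is a small sketch that converts signed whole numbers to bit strings and back. The function names are invented for illustration, and the 4-bit width in the example is chosen only to keep the strings short.

```python
def to_twos_complement(value, num_bits):
    """Encode a signed integer as a two's-complement bit string."""
    low = -(1 << (num_bits - 1))
    high = (1 << (num_bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {num_bits} bits")
    # Masking maps negative values onto the upper half of the bit patterns.
    return format(value & ((1 << num_bits) - 1), f"0{num_bits}b")


def from_twos_complement(bits):
    """Decode a two's-complement bit string back into a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":  # a leading one marks a negative number
        value -= 1 << len(bits)
    return value


for v in (7, 3, 0, -3, -8):
    print(v, "->", to_twos_complement(v, 4))
```

Notice that zero comes out as all zeros, and that every negative value starts with a one.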

Stereo signals are converted separately and then multiplexed, or combined, into a single stream of binary numbers. The numbers representing the right and left channels are interleaved, or alternated, in the stream.
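Interleaving is easy to picture in code. The sketch below assumes the left sample comes first in each pair; that ordering, like the function names, is an assumption made for this example and varies between real formats.

```python
def interleave(left, right):
    """Combine two channels into one stream: L, R, L, R, ..."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    stream = []
    for l_sample, r_sample in zip(left, right):
        stream.extend((l_sample, r_sample))
    return stream


def deinterleave(stream):
    """Split an interleaved stream back into (left, right) channels."""
    return stream[0::2], stream[1::2]


stream = interleave([1, 2, 3], [10, 20, 30])
print(stream)
print(deinterleave(stream))
```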

The most common technique for encoding each instantaneous amplitude is called pulse-code modulation (PCM). Each bit is a code for an electrical or optical pulse; 1 = high-level pulse, 0 = low-level pulse. For example, if an instantaneous amplitude is represented by the binary number 1101, four pulses are sent: high, high, low, high. The rate at which the measurements are taken and the number of bits used to represent each measurement are the two most fundamental concepts in digital audio.
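The 1101 example can be written out as a tiny sketch. The function name pcm_pulses is hypothetical, and the words "high" and "low" simply stand in for the two pulse levels.

```python
def pcm_pulses(bits):
    """Translate a binary number (given as a string) into pulse levels."""
    if any(b not in "01" for b in bits):
        raise ValueError("expected a string of 0s and 1s")
    return ["high" if b == "1" else "low" for b in bits]


print(pcm_pulses("1101"))
```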

SAMPLING RATE

The rate at which the instantaneous-amplitude measurements are taken is called the sampling rate, and the time between measurements is called the sampling period. The more often measurements are taken, the higher the frequency that can be accurately represented. However, more measurements require more storage (which we'll discuss in more detail shortly).

If the frequency of the analog signal is low compared with the sampling rate, you get an accurate representation of the signal. If the frequency of the signal is over half the sampling rate, though, some weird things start to happen (more in a moment). The frequency that corresponds to half the sampling rate is called the Nyquist frequency after American engineer Harry Nyquist. For example, if the sampling rate is 48 kHz (48,000 measurements per second), the Nyquist frequency is 24 kHz.

The Nyquist frequency is the maximum frequency that the system can accurately represent and reproduce. This is called the audio bandwidth of the system. For example, if the sampling rate is 48 kHz, the system can represent and reproduce audio signals at frequencies from 0 to 24 kHz. In other words, the audio bandwidth of the system is 24 kHz.

By contrast, the digital bandwidth of the system is the maximum number of bits per second it can transmit or receive. For example, if the maximum sampling rate is 48 kHz and each instantaneous-amplitude measurement is represented with sixteen bits (more in a moment), the digital bandwidth is 48,000 x 16 = 768,000 bits per second, or 768 kbps. In a stereo system, this digital bandwidth would double to 1.536 megabits per second (Mbps).
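The arithmetic behind both kinds of bandwidth fits in two short functions. This is a sketch only; the function and parameter names are made up for the example.

```python
def nyquist_frequency(sampling_rate_hz):
    """Audio bandwidth: half the sampling rate."""
    return sampling_rate_hz / 2


def digital_bandwidth_bps(sampling_rate_hz, bits_per_sample, channels=1):
    """Digital bandwidth: bits per second needed to carry the samples."""
    return sampling_rate_hz * bits_per_sample * channels


print(nyquist_frequency(48_000))                      # Hz
print(digital_bandwidth_bps(48_000, 16))              # mono, bits per second
print(digital_bandwidth_bps(48_000, 16, channels=2))  # stereo, bits per second
```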

When digitizing a signal whose frequency is greater than the Nyquist frequency, you run into a problem called aliasing. In this case, the measurements of instantaneous amplitude don't accurately reflect the shape of the original signal's waveform. The measurements are taken at disparate points along the waveform. When these measurements are reconstructed into an analog signal, it has a lower frequency than the original. (In fact, several alias signals appear above and below the original frequency.)
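You can see aliasing in the numbers themselves. The sketch below assumes a pure tone and no antialiasing filter; the function names are hypothetical. It works out where a too-high frequency lands after sampling and then confirms that a 30 kHz tone and an 18 kHz tone produce the same measurements at a 48 kHz sampling rate.

```python
import math


def alias_frequency(signal_hz, sampling_rate_hz):
    """Frequency at which a sampled tone appears after reconstruction."""
    nearest_multiple = round(signal_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(signal_hz - nearest_multiple)


def sample_cosine(freq_hz, sampling_rate_hz, num_samples):
    """Instantaneous amplitudes of a cosine wave at each sampling instant."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sampling_rate_hz)
        for n in range(num_samples)
    ]


alias_hz = alias_frequency(30_000, 48_000)
too_high = sample_cosine(30_000, 48_000, 16)
alias = sample_cosine(alias_hz, 48_000, 16)

print(alias_hz)
# The two sets of measurements agree to within rounding error.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, alias)))
```

Once the measurements have been taken, nothing in the data distinguishes the two tones, which is why the filtering has to happen before sampling.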

As a precaution against aliasing, the input signal is sent through an antialiasing filter before it reaches the sample-and-hold circuit. This lowpass filter blocks any frequencies that are greater than the Nyquist frequency of the system while passing all frequencies below the Nyquist limit. The slope of the filter is very steep, which leads many people to call it a brickwall filter.

All CDs use one sampling rate—44.1 kHz—which is also common among samplers, DATs, hard-disk recorders, and digital multitracks. This rate was adopted as a standard because its Nyquist frequency is 22.05 kHz, which is just above the top of the human hearing range. As a result, all frequencies we can hear are accurately represented. However, there is much debate in the audio industry about whether or not overtones above 20 kHz make an audible contribution to the entire signal. In fact, some DATs are now available with a sampling rate of 96 kHz to address this issue.

Many professional systems offer a sampling rate of 48 kHz in addition to 44.1 kHz. Multimedia titles often use lower sampling rates of 11 kHz or 22 kHz to reduce storage requirements. This yields lower audio quality, which isn't considered as critical in this application because most computer audio-playback systems have relatively low fidelity anyway.

In many samplers, it's possible to use different sampling rates to reduce storage requirements. For example, you might sample the lowest notes of a bass at 11 kHz; there are probably no overtones above 5.5 kHz, so you don't lose anything by sampling these notes at a lower rate. Higher notes can be sampled at 44.1 kHz and combined with the low notes to form an entire sampled bass.

In some systems, the input is sampled at a higher rate than will be used to reproduce the signal; this is called oversampling. As you might imagine, this increases the Nyquist frequency and reduces aliasing. After the signal has been sampled, a digital filter removes any frequency components above the final Nyquist frequency, and the data is output at the final sampling rate.

RESOLUTION

The number of bits used to represent each instantaneous measurement is called the resolution or word length. The greater the resolution, the more accurately each measurement is represented. However, the more bits you use, the greater the storage requirements (more in a moment). Until very recently, the most common resolution for digital audio was 16 bits. However, many digital audio products use 18 bits, and some professional systems use 20 or 24 bits, whereas multimedia titles often use 8 bits to conserve storage.

The resolution determines the number of steps between the lowest and highest instantaneous amplitude the system can represent. With 16-bit resolution, there are 65,536 steps between the lowest and highest amplitudes. This defines the dynamic range of the system. Theoretically, the dynamic range of a 16-bit system is 98 dB, but various factors reduce this figure to about 90 dB for practical purposes.
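The 98 dB figure agrees with a common textbook estimate, which compares a full-scale sine wave with the noise left over from rounding and works out to roughly 6 dB per bit plus 1.76 dB. The article doesn't say how its figure was derived, so treat the sketch below as one plausible route to it; the function name is made up.

```python
import math


def theoretical_dynamic_range_db(num_bits):
    """Textbook signal-to-quantization-noise ratio for a full-scale sine wave."""
    step_ratio_db = 20 * math.log10(2 ** num_bits)  # about 6.02 dB per bit
    sine_correction_db = 10 * math.log10(1.5)       # about 1.76 dB
    return step_ratio_db + sine_correction_db


for bits in (8, 16, 18, 20, 24):
    print(bits, "bits:", round(theoretical_dynamic_range_db(bits), 1), "dB")
```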

No matter how many bits are used to represent each instantaneous measurement, the representation is not always completely accurate. In most cases, the actual measurement value must be rounded to the nearest binary number. This is called quantization, and the difference between the actual measured amplitude and the quantized binary representation is called quantization error.
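Here is a minimal sketch of quantization and the error it leaves behind. It assumes amplitudes are scaled to lie between -1.0 and 1.0 and are stored as signed whole numbers; the function name and that scaling are assumptions for illustration.

```python
def quantize(amplitude, num_bits):
    """Round an amplitude in the range -1.0 to 1.0 to the nearest step.

    Returns (code, error): the stored integer and the quantization error.
    """
    levels = 2 ** (num_bits - 1)               # steps on each side of zero
    code = round(amplitude * levels)
    code = max(-levels, min(levels - 1, code))  # stay inside the legal range
    reconstructed = code / levels
    return code, amplitude - reconstructed


for bits in (4, 8, 16):
    code, error = quantize(0.3, bits)
    print(bits, "bits: code", code, "error", error)
```

The error shrinks as the resolution grows, but it never disappears entirely unless the measured value happens to sit exactly on a step.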

Quantization error can lead to audible quantization noise, which is particularly apparent in signals of low amplitude because only a few bits are used to represent the entire signal. As a result, you should try to keep the input signal's overall amplitude as close as possible to the maximum level that the system can accommodate. Optimizing the gain structure of your audio system can be a big help in this regard (see "Recording Musician: Gain Stages" in the November 1993 EM).

However, you must be careful not to exceed the system's maximum signal level. If the instantaneous amplitude of the input signal rises above the highest point that can be represented by the binary numbers, the signal will be clipped (i.e., the top of the waveform will be chopped off, forming a horizontal line). This makes a very unpleasant noise. Unlike with an analog recorder, the input-signal level on a digital recorder must not exceed 0 on the VU meter if you want to avoid clipping. Some digital recorders actually calibrate the 0 VU point a few dB below the actual clipping point so users can exceed this level without clipping, as if they were using an analog recorder.
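In code, clipping amounts to pinning any out-of-range value to the nearest legal one. The sketch below assumes 16-bit two's-complement samples; the function name is hypothetical.

```python
def clip_sample(value, num_bits=16):
    """Force a value into the range the binary numbers can represent."""
    low = -(1 << (num_bits - 1))      # -32768 for 16 bits
    high = (1 << (num_bits - 1)) - 1  # 32767 for 16 bits
    return max(low, min(high, value))


# A waveform that overshoots the maximum gets a flat top.
too_hot = [20_000, 30_000, 40_000, 45_000, 40_000, 30_000]
print([clip_sample(v) for v in too_hot])
```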

The most common solution to quantization noise is called dithering. In this process, a small amount of noise is added to the input signal before it is measured and quantized. This randomizes the quantization error, reducing its audible effect. Dithering is particularly important for minimizing the audible artifacts that arise when the resolution of a digital-audio signal is reduced, which is a common procedure in multimedia titles.
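The following sketch shows dithering at work when 16-bit samples are reduced to 8 bits. The article doesn't specify what kind of noise is added, so the triangular noise used here (the difference of two random numbers, a common choice) is an assumption, as are the function and parameter names.

```python
import random


def reduce_resolution(samples, shift_bits=8, dither=True, rng=None):
    """Requantize integer samples to a resolution shift_bits coarser."""
    rng = rng or random.Random(0)
    step = 1 << shift_bits  # size of one new step, measured in old steps
    output = []
    for sample in samples:
        # Assumed dither: triangular noise spanning plus or minus one new step.
        noise = (rng.random() - rng.random()) * step if dither else 0.0
        output.append(round((sample + noise) / step))
    return output


# A very quiet, steady signal that sits a quarter of the way up one new step.
quiet = [64] * 20_000
plain = reduce_resolution(quiet, dither=False)
dithered = reduce_resolution(quiet, dither=True)

print(sum(plain) / len(plain))        # the quiet signal vanishes completely
print(sum(dithered) / len(dithered))  # the average stays close to 0.25
```

Without dither, the quiet signal is rounded away to nothing. With dither, individual values are noisier, but on average the signal's level survives.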

STORAGE

Once the signal has been digitized into a stream of binary numbers, it is stored in one medium or another. Common media include magnetic tape or disk, optical disc, RAM, and ROM. At a sampling rate of 44.1 kHz and a resolution of sixteen bits, digital audio data consumes over 5 MB per minute for a monaural file or 10 MB per minute for a stereo file. Digital-audio data stored in this manner is referred to as being linear.
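The storage figures follow directly from the sampling rate and resolution. Here is a short sketch of the arithmetic for linear data; the function and parameter names are made up for the example.

```python
def bytes_per_minute(sampling_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Storage consumed by one minute of linear digital audio, in bytes."""
    bits = sampling_rate_hz * bits_per_sample * channels * 60
    return bits // 8  # eight bits per byte


print(bytes_per_minute())            # mono
print(bytes_per_minute(channels=2))  # stereo
```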

To reduce storage requirements, you can reduce the sampling rate and/or resolution, but this also reduces audio quality. Another option is called compression, which is often used in multimedia titles. In this process, the digital-audio data is compressed to reduce storage requirements by as much as 4:1 or 5:1. In other words, a given amount of digital-audio data requires 1/4 or 1/5 as much storage as an equivalent amount of linear data.

There are many types of digital-audio compression, which can be divided into two broad categories: lossy and lossless. Lossy compression provides the greatest storage reduction, but some of the information is lost forever. As a result, lossy compression schemes are designed to lose information that in theory represents sound we wouldn't hear anyway due to masking and other psychoacoustic effects. (However, with most currently available compression schemes, you can, in fact, hear the difference.) Lossless compression retains all the information in a file, but the storage reduction is not as dramatic.

D/A CONVERSION

To play a digital-audio signal, you must convert it back into analog form. After some error correction, the digital signal is sent to a digital-to-analog converter (DAC). If it's a stereo signal, it is first demultiplexed to separate the right and left channels.

The analog output of the DAC still has a stair-step shape, which introduces high-frequency artifacts into the signal. In addition, the process of digitization creates images of the original waveform's harmonic spectrum centered at multiples of the sampling rate. For example, if the sampling rate is 44.1 kHz, images of the original spectrum appear centered at 44.1 kHz, 88.2 kHz, and so on. You might think that there is no need to bother with these images, which lie outside the human hearing range. However, these frequencies can cause audible problems in other audio components. And if the sampling rate is relatively low (e.g., 11 kHz), the images can be audible.
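To see where the images land, take a single tone and mirror it around each multiple of the sampling rate. This sketch assumes a pure tone below the Nyquist frequency; the function name is hypothetical.

```python
def image_frequencies(signal_hz, sampling_rate_hz, num_multiples=2):
    """Frequencies of the images of one tone around multiples of the sampling rate."""
    images = []
    for k in range(1, num_multiples + 1):
        center = k * sampling_rate_hz
        images.extend((center - signal_hz, center + signal_hz))
    return images


# A 1 kHz tone at 44.1 kHz: every image is far above the hearing range.
print(image_frequencies(1_000, 44_100))
# A 5 kHz tone at 11 kHz: the first images fall well inside the hearing range.
print(image_frequencies(5_000, 11_000, num_multiples=1))
```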

To solve both problems, another brickwall lowpass filter, called an anti-imaging filter, is traditionally placed after the DAC to remove any sonic components above the Nyquist frequency and smooth out the stair steps. These days, many systems use a digital anti-imaging filter before the DAC, which reduces the phase anomalies that are so problematic with analog brickwall filters.

In many modern systems, a digital filter uses oversampling to create a smoother, more accurate output. In this process, the filter interpolates between the original sample points.

LOW-BIT SYSTEMS

Although many systems use sixteen bits or more to represent each instantaneous measurement, another approach is gaining popularity. This approach is called low-bit conversion because it uses only a few bits, sometimes even a single bit, to represent the audio signal.

How is this possible? Consider the following analogy. Traditional digital-audio systems are like a row of sixteen light bulbs, each controlled by its own switch. There are 65,536 possible on/off combinations, which determine the brightness in the room. Room brightness is analogous to the instantaneous amplitude of an audio signal. However, each bulb has a different inherent brightness, which introduces error into the system. This is analogous to the error introduced by high-bit converters.

You can also control the brightness in the room with a single light bulb by switching it on and off at a high rate. The brightness is determined by how long the light is on relative to how long it is off. This is analogous to a 1-bit converter. When the instantaneous amplitude is high, the converter sends mostly ones; when the amplitude is low, the converter sends mostly zeros. Low-bit converters are inherently more accurate than high-bit converters, but their sampling rate must be much higher than that of high-bit designs.

One way to use fewer bits is called differential coding. This technique is based on measuring the difference between one instantaneous amplitude and the next rather than the amplitudes themselves. It generally requires fewer bits to accurately represent the differences, which are smaller than the actual amplitudes. For example, delta modulation quantizes the difference (which is often represented by the Greek letter delta) between consecutive amplitudes.
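The idea of storing differences rather than amplitudes can be sketched as follows. Note that this simplified example keeps the differences exact instead of quantizing them to a bit or two as true delta modulation does, and the function names are invented for illustration.

```python
def differential_encode(samples):
    """Replace each sample with its difference from the previous one."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def differential_decode(differences):
    """Rebuild the samples by keeping a running total of the differences."""
    total = 0
    samples = []
    for difference in differences:
        total += difference
        samples.append(total)
    return samples


samples = [1000, 1003, 1007, 1006, 1002]
encoded = differential_encode(samples)
print(encoded)
print(differential_decode(encoded))
```

After the first value, the differences are far smaller than the amplitudes, so they need fewer bits.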

A more sophisticated variation is called delta-sigma modulation. (This is sometimes called sigma-delta modulation, although some audio professionals make a distinction between these terms, using them to describe slightly different techniques.) This process takes the difference (delta) between the current instantaneous amplitude and the integral of the quantized previous difference. (Integrals are mathematical operations related to sums, and sums are often represented by the Greek letter sigma.) Delta-sigma converters provide excellent sound quality at a lower price, which is why they are used so much these days.
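A bare-bones, first-order version of this idea can be sketched in a few lines. It follows a common textbook structure (accumulate the difference between the input and the previous 1-bit output, then output a one whenever the running total is at or above zero), which may differ in detail from any particular converter. The function name and the -1.0 to 1.0 input range are assumptions.

```python
def delta_sigma_1bit(samples):
    """Turn amplitudes in the range -1.0 to 1.0 into a stream of single bits."""
    integrator = 0.0  # running total (the "sigma")
    feedback = 0.0    # previous output expressed as -1.0 or +1.0
    bits = []
    for x in samples:
        integrator += x - feedback  # the difference (the "delta") is accumulated
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


for level in (0.5, 0.0, -0.5):
    bits = delta_sigma_1bit([level] * 1000)
    print(level, "-> fraction of ones:", sum(bits) / len(bits))
```

The fraction of ones tracks the input level, just as the single light bulb's on-time tracks the brightness of the room.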

Digital-audio systems are difficult to design and build, but the basic concepts are relatively easy to understand. Once you grasp these concepts, you can optimize your use of samplers, DATs, digital multitracks, and hard-disk recorders and enjoy high-quality audio for relatively little monetary investment. In addition, digital products tend to improve in performance while falling in price, so the future looks bright for all forms of digital audio.

Scott Wilkinson digs digital audio. Thanks to Ken Pohlmann for his help with this article.



© 2008, Primedia Business Magazines and Media, a PRIMEDIA company. All rights reserved. This article is protected by United States copyright and other intellectual property laws and may not be reproduced, rewritten, distributed, redisseminated, transmitted, displayed, published or broadcast, directly or indirectly, in any medium without the prior written permission of PRIMEDIA Business Corp.
