When shopping for new hardware and software, desktop musicians must wade through a veritable alphabet soup of audio and MIDI protocols. Consumers are constantly confronted with a confusing array of abbreviations, from ASIO to WDM, representing technologies that may or may not be compatible. With so many protocols from which to choose, it's ironic that the word standard is sometimes used to describe them.
For the most part, a protocol is a clearly established, standardized method of handling communication. In a diplomatic context, protocols prevent people from offending folks with big weapons and bad attitudes. In a music-technology context, protocols provide for efficient and reliable handling of audio and MIDI data between hardware and software or between a host program and plug-ins.
Both Macs and PCs have such protocols built into their operating systems, but historically they have been geared toward general-purpose multimedia and gaming rather than critical audio production. The need to achieve consistent professional-level real-time performance with high-resolution multichannel audio is the driving force behind the development of third-party protocols.
Desktop musicians must be concerned with three primary kinds of protocols: audio drivers, plug-in formats, and MIDI-interface engines.
ASIO LIKE IT
Audio drivers are the bits of code that make hardware devices available to programs. No matter how different two audio interfaces may be at the hardware level, they need to present their functions to the software in a uniform way, and the audio driver enables them to do that.
Probably the best-known effort to improve the state of computer audio drivers is Steinberg's Audio Stream Input/Output (ASIO) specification, which applies to Macs and PCs. In essence, it bypasses Apple's Sound Manager and the Windows MME layer and communicates directly with hardware to provide multichannel, high-resolution capabilities. A hardware manufacturer that writes ASIO drivers can expect any ASIO-compliant program to be able to communicate with the company's hardware, and software developers can expect the same.
In addition to making extra features available, ASIO improves on the performance of OS-level audio protocols. In particular, it succeeds in reducing audio-processing delays (latency) that have traditionally been a relatively low priority for the designers of operating systems. With Windows MME drivers, for example, it could take one-half to three-quarters of a second for audio to pass from an input to an output. ASIO brings latencies as low as 6 to 8 ms with current drivers. It accomplishes that in part by giving users control over their audio-buffer settings (see the sidebar “Loving Latency Lost”). The faster your system, the less read-ahead buffering it will need to keep up with the flow of audio, and the lower the latency will be as a result (see Fig. 1).
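To put rough numbers on the buffer-size trade-off, here is a minimal Python sketch. It assumes latency is simply the time needed to play out the queued buffers. The function name, the 44.1 kHz rate, and the buffer sizes are illustrative choices of mine, not figures from any driver, and real systems add converter and bus delays on top.

```python
def buffer_latency_ms(frames_per_buffer, sample_rate_hz, num_buffers=1):
    """Milliseconds needed to play out the queued audio buffers."""
    return 1000.0 * frames_per_buffer * num_buffers / sample_rate_hz


# Hypothetical buffer sizes at the CD sample rate of 44.1 kHz.
for frames in (128, 256, 512, 1024):
    ms = buffer_latency_ms(frames, 44100)
    print(f"{frames:5d} frames -> {ms:6.2f} ms")
```

At 44.1 kHz a single 256-frame buffer accounts for roughly 5.8 ms, which is in the neighborhood of the ASIO figures quoted above. Doubling the buffer size doubles the delay.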
In another approach to reducing latency, many hardware manufacturers build direct input monitoring into their interfaces, allowing incoming audio to go immediately to an output without passing through the computer. That lets a guitarist hear what he or she is playing in real time along with the prerecorded tracks, for example, but it doesn't let you process the guitar with any software effects.
ASIO 2.0 addresses that problem by allowing you to bypass the hardware's direct monitoring and pass the input through the software. You can then process the signal with reverb and other plug-ins and monitor the audio with a manageable 6 to 10 ms delay. Version 2.0 also adds sample-accurate positioning for ADAT Optical transfers and provides for multiclient functionality, letting audio devices be shared among applications.
NICE AND EASI
Although ASIO successfully delivered performance that was not possible with the default Mac or Windows audio drivers, Steinberg wasn't the only company working to make things better. Emagic developed its own solution, called EASI (Enhanced Audio Streaming Interface). EASI was announced after ASIO 1.0 was released, and it anticipated some of ASIO 2.0's improvements, including sample-accurate synchronization. EASI was the audio driver protocol of Logic Audio starting with version 4.0, and Emagic opened it to developers as a potential standard.
Ultimately, ASIO 2.0 grabbed the spotlight, and EASI now enjoys only limited support. Some hardware manufacturers continue to provide EASI drivers, but in a bow to the market's realities, all Logic Audio versions also support the ASIO standard.
MODEL DRIVER
Meanwhile, Cakewalk, unconcerned with cross-platform issues for its Windows-only Pro Audio 9, was convinced that the shortest path to superior performance was through the operating system itself. Instead of jumping on the ASIO bandwagon, Cakewalk worked to convince Microsoft of the importance of pro-level audio support. As a result, Cakewalk's support for input monitoring and other features lagged behind that of its competitors. However, with the release of Sonar, the successor to Pro Audio 9, Cakewalk's efforts have finally paid off.
Sonar is the first application to take full advantage of the new Windows Driver Model (WDM), the core driver architecture for Windows 98 SE/ME and Windows 2000/XP. WDM kernel streaming represents a huge leap forward for audio performance on a PC, providing a much more efficient and flexible link between hardware and software. WDM audio drivers finally solve, at the OS level, the problems for which ASIO and EASI had previously been the only solutions. For example, when I switched from the old MME drivers to WDM drivers on my Pentium II/266 MHz laptop system, my latency when playing a virtual instrument dropped from more than half a second to an imperceptible amount. The performance of WDM drivers is said to be especially good under Windows 2000.
Microsoft isn't alone in finally seeing the light. Apple has implemented a new set of audio application programming interfaces (APIs) called CoreAudio in Mac OS X, and its stated intention is to make third-party protocols (such as ASIO and EASI) unnecessary. Apple's list of objectives for CoreAudio includes goodies such as multichannel audio and application-determined latency, and an independent test shows that CoreAudio is indeed capable of impressively low latency.
Does that mean ASIO is in the position of answering a question nobody's asking? CoreAudio's impact is not yet clear, but many major hardware manufacturers are or soon will be offering WDM drivers, and other developers of Windows audio software are working on WDM compatibility.
Still, ASIO is familiar ground for many developers and consumers, and because a lot of companies make products for both Macs and PCs, it might make sense for them to keep ASIO around to facilitate cross-platform development. My crystal ball shows hardware manufacturers turning out three drivers for every audio interface — ASIO/Windows, ASIO/Mac, and WDM — at least until some consensus about WDM and CoreAudio is reached. (The same crystal ball misses the Lotto numbers week after week, so who knows?)
A couple of other terms you'll often hear in the same breath with ASIO, EASI, and WDM are MAS (MOTU Audio System) and DAE (Digidesign Audio Engine). They are horses of a different color, though: technically, they're hard-disk recording engines, not I/O protocols. As such, they function at a different level, communicating with audio hardware through the drivers. MAS, for example, can work with ASIO or Sound Manager.
PLUGGED IN
Plug-ins operate at yet another level by assimilating into the host application. Plug-ins are everywhere these days, even popping up in music-notation programs. I'll focus on host-based real-time audio plug-in formats that are open to third-party developers, because that's where much of the action is. Two of the major real-time native plug-in formats, Steinberg's Virtual Studio Technology (VST) and Microsoft's DirectX, are supported by multiple host programs (see the sidebar “Who's Plugged In?”). The other biggie is MAS (yes, that name does double duty). Although it's exclusive to Mark of the Unicorn's (MOTU's) Digital Performer and AudioDesk, it is supported by a significant roster of big-name third-party developers.
Although some people use the term VST to refer to Cubase VST, it actually applies specifically to Steinberg's cross-platform, real-time native plug-in format. VST defines the ways in which plug-in developers integrate their designs with VST-compatible host programs.
Like other native plug-in formats, VST provides a 32-bit floating-point architecture with automation and MIDI control. Those features let you record changes to plug-in settings and also sync effects to the tempo of your sequence. Cubase VST's Automatic Plug-In Delay Compensation (see Fig. 2) can shift tracks slightly ahead to offset the small amount of time required for plug-ins to process audio.
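The idea behind delay compensation can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Steinberg's implementation: each plug-in is assumed to report a fixed processing delay in samples, and the host is assumed to start each track early by the sum of the delays on that track. The track names and delay values are hypothetical.

```python
def track_advance_samples(plugin_delays):
    """Map each track to the number of samples it should be played early.

    plugin_delays: dict of track name -> list of per-plug-in delays in samples.
    """
    return {track: sum(delays) for track, delays in plugin_delays.items()}


def samples_to_ms(samples, sample_rate_hz=44100):
    """Convert a sample count to milliseconds at the given sample rate."""
    return 1000.0 * samples / sample_rate_hz


# Hypothetical session: drums are dry, the vocal runs through two plug-ins.
session = {"drums": [], "vocal": [64, 512], "guitar": [256]}
advances = track_advance_samples(session)
for track, samples in advances.items():
    print(f"{track}: start {samples} samples early ({samples_to_ms(samples):.2f} ms)")
```

With every track advanced by exactly the delay its plug-ins introduce, the processed outputs line up again at the mixer.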
VST version 2.0 introduced support for virtual instruments, sometimes informally referred to as VSTi. In the past, software synthesizers ran as separate programs, and you had to redirect MIDI and audio between applications using so-called virtual cables such as the Mac's IAC Bus or Hubi's Loopback on the PC. By integrating software synths into the host application, VST 2.0 not only simplified the setup but also reduced the strain on system resources. It's not unreasonable to suggest that VSTi added momentum to the software-synthesizer revolution by making the technology so easy to use.
THE DIRECT APPROACH
Microsoft's DirectX 8 is the other factor in the WDM and Sonar revolution. DirectX (technically DirectShow, part of the DirectX family) has provided basic real-time plug-in support for Windows audio programs for years, but now its feature set has been enhanced significantly.
Prior to version 8, DirectX just wasn't in the same league as VST or MAS. Although its 32-bit architecture allowed developers to write competitive-quality effects, DirectX's lack of support for automation and virtual instruments made it a less interesting format than the other two. That has changed, and DirectX 8 supports automation with subsample timing resolution.
Working closely with Microsoft, Cakewalk developed a way to incorporate virtual instruments into the DirectX plug-in environment. Cakewalk's DirectX Instruments (DXi) format was launched in early 2001 with support from several software-synthesizer manufacturers, and the instrument list continues to grow (see Fig. 3). Like VSTi, DXi supports SysEx and provides for communication of patch names from the synth to the host program.
There has been a perception in the industry that DirectX plug-ins run less efficiently than their VST counterparts, and some circumstantial evidence has supported that impression. According to plug-in developers with whom I spoke, though, that is not the fault of the DirectX specification — VST and DirectX offer identical latency specifications in most cases.
With the major plug-in formats sporting such attractive features, the only potential downside for users would be having to choose just one. Fortunately, the major VST hosts support DirectX plug-ins in their Windows versions, and there are programs designed to make VST plug-ins operate under DirectX or MAS. That is accomplished by wrapping a plug-in in a thin layer of code that makes a VST plug-in appear to be a DirectX or MAS plug-in to a DirectX- or MAS-compatible host. One such program is VST-DX Adapter from FXpansion Audio. Digital Performer devotees can use Audio Ease's VST Wrapper to add VST plug-ins to their palettes.
In fact, plug-in developers may write for one format and then simply add a wrapper layer to make their plug-in function in the other format rather than completely rewrite the plug-in code. The wrapper layers can be implemented efficiently enough that performance degradation is negligible.
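In programming terms, a wrapper of this kind is an adapter. The Python sketch below shows only the shape of the idea. Every class and method name in it is hypothetical, and none of them matches the real VST, DirectX, or MAS programming interfaces.

```python
class VstStyleEffect:
    """Hypothetical plug-in written against a VST-like interface."""

    def __init__(self, gain):
        self.gain = gain

    def process_replacing(self, samples):
        # Apply a simple gain change to every sample.
        return [s * self.gain for s in samples]


class DxStyleWrapper:
    """Hypothetical adapter exposing the method a DirectX-like host calls."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def process(self, buffer):
        # Translate the host's call into the one the wrapped plug-in expects.
        return self._wrapped.process_replacing(buffer)


def dx_style_host_render(plugin, buffer):
    """A stand-in host that knows only the DirectX-like 'process' method."""
    return plugin.process(buffer)


wrapped = DxStyleWrapper(VstStyleEffect(gain=0.5))
print(dx_style_host_render(wrapped, [1.0, -0.5, 0.25]))
```

The wrapper adds just one extra function call per buffer, which suggests why a well-written layer of this kind costs so little.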
BATTLING MIDI INTERFACES
It seems as though everybody is coming out with new MIDI-interface protocols these days. The good news is that they're typically cross-platform solutions; the bad news is that they're all proprietary.
Emagic led the way with AMT (Active MIDI Transmission), first implemented in its Unitor MIDI interface. In an effort to improve MIDI timing, Emagic devised a way to buffer MIDI messages in the interface before they are sent. That allows data at different ports to be sent simultaneously, instead of sequentially, for tighter timing.
Emagic claims accuracy of better than 1 ms when the Unitor is used with Logic Audio. That qualifier is the sword's other edge. Although such precise computer-to-interface timing is great, it requires an interface capable of buffering the MIDI data and a compatible host program that can presend the data to the interface. That's why AMT and its rivals are proprietary systems. Use the Unitor with a different program, and AMT no longer works for you.
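A back-of-the-envelope Python sketch shows why sending to ports one after another smears a chord in time. It rests on a simplifying assumption of mine: handing each message to a port costs as long as a three-byte note-on takes at the standard MIDI rate of 31,250 bits per second. Real computer-to-interface links run at other speeds, and the function names here are hypothetical.

```python
MIDI_BITS_PER_SECOND = 31250   # standard MIDI serial rate
BITS_PER_BYTE_ON_WIRE = 10     # 8 data bits plus start and stop bits
NOTE_ON_BYTES = 3              # status, note number, velocity

# Time one note-on occupies at MIDI speed, in whole microseconds.
NOTE_ON_US = NOTE_ON_BYTES * BITS_PER_BYTE_ON_WIRE * 1_000_000 // MIDI_BITS_PER_SECOND


def sequential_departures_us(ports, dispatch_us):
    """Each port's message leaves only after the previous one is handed off."""
    return {port: index * dispatch_us for index, port in enumerate(ports)}


def buffered_departures_us(ports, release_us=0):
    """Messages were present in the interface and are released together."""
    return {port: release_us for port in ports}


def spread_us(departures):
    """Gap between the earliest and latest departure."""
    return max(departures.values()) - min(departures.values())


ports = list(range(1, 9))  # one note of a chord on each of eight ports
print("sequential spread:", spread_us(sequential_departures_us(ports, NOTE_ON_US)), "us")
print("buffered spread:  ", spread_us(buffered_departures_us(ports)), "us")
```

Under that assumption, the eighth port's note trails the first by several milliseconds when sent sequentially, whereas the buffered notes leave together.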
A conceptually similar timing scheme joins MOTU's Digital Performer with its line of USB MIDI interfaces. A hardware-based MIDI-transmission engine called MIDI Time Stamping (MTS) leverages the increased bandwidth of USB (over serial) to deliver what MOTU claims to be ⅓ ms timing between the computer and the interface. The latency for recording MIDI data is said to be even lower: only 1/12 ms. Like AMT, MTS makes use of data buffering within the interface. But MTS time-stamps every data event, regardless of which cable it's going to, so MTS improves the accuracy of every data event, not just data events distributed across multiple cables. Digital Performer's MIDI resolution has been upgraded to take advantage of that precision, so now you can adjust the resolution over a wide range and select microresolutions with as many as four decimal places.
Not to be outdone, Steinberg jumped into the fray with a similar technology called Linear Time Base (LTB). LTB is automatically enabled when you use Steinberg's Midex8 interface with the latest revision of Cubase VST/32 5.0. Like its counterparts, LTB makes use of time stamping and buffering to increase the accuracy of MIDI timing.
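All three schemes share the same basic recipe: the host sends events ahead of time with a time stamp, and the interface holds them until they are due. Here is a minimal Python sketch of that recipe. The class and method names are hypothetical, and none of the vendors has published its engine in this form.

```python
import heapq


class TimestampedMidiBuffer:
    """Hypothetical model of an interface that holds presented MIDI events."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps arrival order for equal time stamps

    def presend(self, due_us, port, message):
        """Store an event that should leave the interface at time due_us."""
        heapq.heappush(self._heap, (due_us, self._counter, port, message))
        self._counter += 1

    def release_due(self, now_us):
        """Return every stored event whose time stamp has arrived, earliest first."""
        released = []
        while self._heap and self._heap[0][0] <= now_us:
            due_us, _, port, message = heapq.heappop(self._heap)
            released.append((due_us, port, message))
        return released


buf = TimestampedMidiBuffer()
buf.presend(2000, 2, (0x90, 64, 100))  # note-on for E, due at 2,000 microseconds
buf.presend(1000, 1, (0x90, 60, 100))  # note-on for C, due at 1,000 microseconds
buf.presend(1000, 3, (0x90, 67, 100))  # note-on for G, same instant, other port
print(buf.release_due(1000))
```

Because timing is decided by the stamps rather than by when the host happened to deliver the data, events for different ports that share a stamp leave at the same moment.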
STANDARD DREAMS
If, like me, you dream of genuine universal standards (of which MIDI itself may be the best example), this part of the story doesn't have a fairy-tale ending, at least not yet. The three proprietary MIDI timing schemes aren't likely to merge, but Apple's Mac OS X includes another little gem called the CoreMIDI API, which, like CoreAudio, is intended to make third-party standards obsolete. Its performance objectives include latency lower than 1 ms and very low jitter (timing variations). Windows reportedly also has such an infrastructure. If hardware and software manufacturers adopt these standards, everyone will benefit.
Now that the major audio and MIDI developers have raised the bar and, in doing so, have finally made Apple and Microsoft take notice, professional-level support for MIDI and audio is widely available through operating-system protocols and through various independent standards.
Brian Smithers is the associate course director of MIDI at Full Sail Real World Education in Winter Park, Florida.
LOVING LATENCY LOST
Achieving the lowest latency your system can support without degrading audio performance requires balancing three factors: the Disk Block Buffer size, the number of buffers, and the amount of memory that is allocated for each channel. Fundamentally, the smaller the buffer size, the lower the latency. Keep in mind, however, that the smaller the buffer size, the closer to the ragged edge of disaster your system is running. If you hear audio degradation (clicking or dropouts), you should increase the buffer size or number of buffers. That requires a corresponding increase in the amount of memory that is allocated per channel, which can decrease the maximum number of channels your system will support (see Fig. A).
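The knock-on effect on channel count is easy to see with some hypothetical numbers. This Python sketch assumes, purely for illustration, that each channel needs memory equal to the block size times the number of buffers and that all channels share a fixed memory budget. Cubase's actual allocation rules may differ, and the function names and figures are mine.

```python
KIB = 1024
MIB = 1024 * KIB


def memory_per_channel_bytes(block_bytes, num_buffers):
    """Assumed rule of thumb: every buffer holds one full disk block."""
    return block_bytes * num_buffers


def max_channels(memory_budget_bytes, block_bytes, num_buffers):
    """Whole channels that fit in the budget under the assumption above."""
    return memory_budget_bytes // memory_per_channel_bytes(block_bytes, num_buffers)


budget = 32 * MIB  # hypothetical amount of RAM set aside for audio
for block_kib in (64, 128, 256):
    channels = max_channels(budget, block_kib * KIB, num_buffers=4)
    print(f"{block_kib:3d} KiB blocks x 4 buffers -> up to {channels} channels")
```

Under these assumptions, each doubling of the block size halves the number of channels that fit in the same memory.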
Steinberg's Cubase VST manual has detailed instructions for adjusting those settings and encourages experimentation to find the optimum settings for your system and needs. Some users deliberately set their buffers a bit too low when recording VSTi parts, enduring some sonic compromises in favor of more responsive timing. During mixing, they set their buffers higher, ensuring flawless audio performance despite somewhat slower response.
Your system's best performance will probably be achieved by using your audio interface's dedicated ASIO drivers (if it has them) rather than the ASIO Multimedia or ASIO DirectX Full-Duplex drivers. For information about audio cards that offer ASIO drivers, check out the web site http://service.steinberg.net/testbase.nsf or www.kvr-vst.com/asio.php.
WHO'S PLUGGED IN?
Here's a partial list of host programs that offer native (without the benefit of wrappers) support for the ubiquitous DirectX and VST real-time plug-in formats.
DirectX: Cakewalk Pro Audio and Sonar (only Sonar supports DXi); Emagic Logic Audio (PC version); IQS SAW; Magix Samplitude; Sonic Foundry Sound Forge and Vegas; Steinberg Cubase VST and WaveLab (PC versions); Syntrillium Cool Edit Pro
VST: BIAS Peak; Cakewalk Metro; Emagic Logic Audio; E-mu PARIS; IQS SAW; Steinberg Cubase VST, WaveLab, and Nuendo; TC Works Spark