The EMU 8000 and the AWE -
the beauty and the beast

It has been claimed that the AWE's synth chip is the same one that is used in E-Mu's latest pro-level samplers. Even if this is true, the AWE is far from professional. Read this and find out why.

Written by Mathias C. Hjelt
V3.0 - 18 Sep 95
PREFACE

This is a replacement for the old "EMU-8000 - Professional or not?" page that previously was found here. Large parts are the same, but the idea behind the whole page has changed - it approaches the problem from another point of view, as you can see if you compare the old and the new titles.

INTRODUCTION

Anyone should understand this much: a sound-card this cheap can't be as good as a sampler that is made for professional musicians and costs ten times more. Still, many seem to believe it can.

On the other hand, some people seem to believe that the AWE's built-in General MIDI sounds are lousy only because the EMU-8000 isn't capable of producing better sound. Wrong again. Anyone could try pressing 128 sounds and a drum kit into one megabyte of memory and see what it sounds like. It's a matter of space, not performance.

However, the performance of the AWE's synth is limited. Badly. On this page, I've described the various terrible drawbacks of what CL so proudly calls "Advanced WavEffects synthesis". But first you have to know a little about how synth chips work in general, in order to be able to understand where all these problems come from.

A BEAUTY AND A BEAST? REALLY?

For a while, I believed that the EMU-8000 was a very cheesy synth - something that perhaps was based on E-Mu's real chips but had been modified heavily to fit Creative Labs' requirements regarding production costs and other quality-tearing matters. However, although nothing has been proved, I have now changed my opinion. I believe that the EMU-8000 would behave in a much more professional way if it wasn't treated like a caged pig.

The catch is that a synth chip alone is as dumb as a computer without a program. A synth chip needs something that tells it what to do - what sounds to play, when to play them, how loud, at what pitch, etc. Without such a companion it does nothing at all - it simply doesn't know what to do. In normal synthesizer keyboards and similar devices, there is a microprocessor or micro-controller that handles this, and several sound-cards and all Waveblaster-compatible boards for the PC have built-in processors that do it. On the AWE, there's no such processor - your PC is the master.

To reduce costs, Creative Labs decided not to put a processor on the AWE. The result is that the EMU won't respond to music sent to the MPU-401 port like other General MIDI cards do - simply because it can't. It can't, because it does not understand MIDI data, which is what the MPU port delivers. We all know what this means - many games won't work with the AWE, although they work with many other wavetable cards and Waveblaster-compatible daughter-boards.

This also means that the driver the PC is running (the Windows driver, AWEUTIL, or any DOS program using CL's API) has to keep track of a lot of things. It reads MIDI data, checks which samples to use for the current patches, determines volumes, pitches and so on, and sends short commands to the EMU, which then handles the practical side - it reads samples from memory and plays them with the pitch, volume and other settings demanded by the driver.

It is difficult to know how big a part of the work is done by the PC and how much the EMU-8000 can do itself, but it seems like the driver does everything except the actual playback of the individual samples. This means that the reliability of the various functions that the SBK format provides is equal to the reliability of the driver - if the driver works perfectly, the patches sound perfect; if the driver doesn't do its duties right, the... well, we'll come to that.

To put all that in a nutshell - many of the things that make the AWE useless for demanding musicians are not caused by the EMU-8000, but by the drivers. And that explains the title of this whole page, that stuff about the beauty and the beast - the power and partly unknown beauty of the EMU is imprisoned by the rough and dirty things that make up the rest of the AWE, the beast. (Would this be the concept for a new-age Disney hit? That's a joke, folks.)

SO, WHAT'S WRONG WITH IT?

Several things. Let's start with some of the more embarrassing stuff to humiliate ourselves a bit. It's better to face the facts than to live in an imaginary euphoria based on vicious self-delusion, right? (Anyone who doesn't agree and wants to discuss this with a philosophical approach is free to mail me.)

Note: Everything I write here is very likely true - anyone can try it out and experience it themselves. If something is not completely correct, it is still close to the truth (some of the matters discussed here can't be known for sure without access to things like the micro-code of the EMU, so they are based on careful observations and experiments). I do not want to give the AWE an unfairly negative reputation (why would I? I have one myself and I like it), I just want to make the truth known - many have noticed that the EMU acts funny, and have thanked me for explaining it all on these pages. So please stay cool. The sections below tell you all about the EMU being Creative's slave - but honestly, it's nothing to get upset about. It's just a PC card, right?



STEREO AND MULTI-LAYERING - OH YEAH?

Right, the EMU-8000 is a stereo synthesizer, with panning, stereo effects and all, but under the command of the present controlling system (the beast), it can't play stereo samples correctly. Not as long as you have to create them through multi-layering, i.e. having one layer panned to the left and another to the right. The problem is very simple but hard to accept (you DID buy a stereo card, didn't you?) and really annoying (this particular card IS famous for its multi-layering and stacking capabilities, isn't it?). The catch is that the various samples of a multi-layered patch do not start playing simultaneously when a note is triggered, and this leads to nasty phase errors between the layers. And the worst thing of all is that the phase error isn't predictable - it's totally random. Every time you trigger a note, you get a new phase offset for each layer, and a new sound.

This is caused by sloppy driver code in combination with a simple instrument definition standard. When the driver causes a delay between the triggering of two layers, the EMU simply can't be held responsible for the phase errors that occur - it doesn't know that anything's wrong, and happily plays the layers out of sync. Another question is - how on earth can the driver do things like this in any case? I mean, you wouldn't expect people to write drivers in BASIC, right?

What's the audible result of this? It depends on how the layers are configured (panning, volume, etc.), but usually it results in frequencies (or even whole layers, if you layer several similar samples) being more or less cancelled out. The timbre of the sound changes slightly, and the volume can vary drastically, especially with doubled samples. In a left-right panned stereo patch, the sound can seem to appear in different places in the stereo field every time it is triggered, although it ought to stay centered. All this sounds rather sad, doesn't it? Then think about all the cool things about it: your music becomes less robotic, it gets an unpredictable element added to it, panning is no longer as strict as it could be, and sounds don't always sound the same. Now isn't that exciting?
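If you want to put numbers on this, below is a minimal sketch in plain C - my own illustration, not driver code - that triggers two identical 440 Hz sine layers with a random delay between them and prints the peak level of the mix. Depending on the offset you get anything from full reinforcement (2.00) to near-total cancellation (0.00):

    /* Two identical layers triggered with a random delay between them.
     * This only simulates the phase math - it is not AWE driver code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define PI   3.14159265358979323846
    #define RATE 44100.0    /* sampling rate in Hz       */
    #define FREQ 440.0      /* frequency of both layers  */

    int main(void)
    {
        int trial;
        srand(12345);                       /* fixed seed, reproducible run */
        for (trial = 0; trial < 5; trial++) {
            /* random trigger delay of 0..20 ms, in whole samples */
            int offset = rand() % (int)(0.020 * RATE);
            double peak = 0.0;
            int i;
            for (i = offset; i < 4410; i++) {        /* 100 ms of audio */
                double a = sin(2 * PI * FREQ * i / RATE);            /* layer 1 */
                double b = sin(2 * PI * FREQ * (i - offset) / RATE); /* layer 2 */
                if (fabs(a + b) > peak)
                    peak = fabs(a + b);
            }
            printf("offset %3d samples -> peak %.2f\n", offset, peak);
        }
        return 0;
    }

Run it with a different seed and you get a different "patch" every time - which is exactly what happens on the AWE, except that there you can't even choose the seed.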

But don't get desperate - according to Creative, our hero and saviour, a new SoundFont standard is about to be released. The new drivers already support it, and along with various new features it has got a new way of defining stereo samples which should fix this problem. At least that's what they say.



GOODBYE, ATTACKS

Now that we know that the AWE can't do multi-layering and stereo samples correctly, we can continue making the day even worse (nothing is ever so bad it can't get worse, right?). To accomplish this, let's think about what it actually means that all samples always have a 10 ms attack (fade-in) time, even when the amplitude envelope's attack time is set to zero.

It is very obvious that a fade-in delay like this will cause short attacks and transients in the beginning of samples to be either left out or more or less damped. In either case the sound will lose part of its original punch or sharpness. Percussion samples especially are in the danger zone. Melodic sounds will just get a few cycles cut out, and that's not very critical, but stuff like a hihat or a really punchy kick will suffer. You can prevent this by putting a padding space of some 300-600 zero samples in the beginning of all samples - and get one hell of a problem with timing instead (a 7-14 ms delay at 44.1 kHz).
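Here's the arithmetic of that workaround as a little C sketch (my own illustration, nothing from CL's code): a kick drum peaking 50 samples into the waveform, run through a forced 10 ms linear attack, with varying amounts of zero padding in front:

    /* What a forced 10 ms linear attack does to a transient, and what
     * zero-padding costs in timing.  Illustration only, not EMU code. */
    #include <stdio.h>

    #define RATE   44100          /* samples per second              */
    #define ATTACK (RATE / 100)   /* 10 ms = 441 samples at 44.1 kHz */

    int main(void)
    {
        int pad;
        for (pad = 0; pad <= 600; pad += 150) {
            int peak_pos = pad + 50;   /* transient peak, shifted by padding */
            double gain  = peak_pos >= ATTACK
                         ? 1.0 : (double)peak_pos / ATTACK;
            printf("%3d padding samples: %4.1f ms delay, peak at %3.0f %% level\n",
                   pad, pad * 1000.0 / RATE, gain * 100.0);
        }
        return 0;
    }

With no padding, the peak comes through at about 11 % of its level; with 450-600 samples of padding it survives intact, but every note is then 10-14 ms late.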

This forced attack time was probably implemented to avoid the risk of pops and clicks in the beginning of samples that do not start at zero level. And it's not only the attack that is smoothed - the same thing happens with the release envelope parameter. Not even a zero setting makes the sound stop immediately when you release the key; there's a 10 ms delay here too. It makes the EMU foolproof but stupid.

"AWE Tech - a look under the cover".

The only poor consolation we have is that the same operators can used for both DRAM refresh and FM passthrough - we should be happy that we didn't lose four ops, only two!

No, don't say that 30 voices is more than enough. All you need to do is play a few cool chords using a few multi-layered (did you hear that? multi-layered! ha!) patches, and poof - you're out of polyphony. Let's have a band enter the stage. First we have a Hammond B-3 player. With all those rumblings and funky chords, he needs maybe 5 notes, and since the Hammond patch happened to be multi-layered, he eats up 10 voices. Another person plays the piano with a lot of sustained stuff, and easily consumes 8 voices just like that. Strings or other synth pads - ah well, let's say 6 voices (multi-layering is fun, just like playing chords). Sax lead, one voice - or two for those occasions when you have to let two tones overlap to get the right growl. Vast percussion section, 5 voices. Now, where the heck do you think you're gonna put the bass player? Two more voices could be very valuable, no question about it.

Since the EMU (or the driver) has got some pretty smart way of determining what to do when it is running out of voices, it wasn't easy to confirm that these 2 voices really are missing, but after a while I succeeded - and there were indeed only 30 voices left. Once again, sad but true.



WHY 16 BITS ARE NEVER 16



Have you ever wondered why your samples don't sound quite right when played through the EMU? Have you ever noticed that they become noisier? Then read this. Read it even if you haven't found anything to complain about - this is just a piece of common digital audio theory which you may find interesting.

This is something that isn't specific to the AWE/EMU. I guess most "16-bit" wavetable / sample playback cards operate this way, simply because there is no way around it. It's about maths and bits. To make it as comprehensible as possible for as many as possible (even those who don't quite know what a "bit" is, mathematically speaking), I'll first give a few general examples, and then get into the AWE/EMU details. So this is not only about why the EMU sucks, it's about sample mixing in general.

Some theory..

Since sound is a wave motion, it can be described as a function of time. In a graph, the X-axis is time, and the Y-axis shows the level. A sampled sound has got a finite number of "Y" levels that can be used, and the number of bits determines how many they are. A 16-bit sample has got 65536 different levels, since 2^16 = 65536. During the digitizing process, the level of the incoming signal is quantized - rounded - to the closest of these 65536 values, giving a 16-bit value that describes the current level; this is repeated 44100 times per second if a sampling rate of 44.1 kHz is used. Since signed numbers are used to describe the waveform, the scale goes from -32768 to 32767, so a graph of a sine wave with maximal peak-to-peak amplitude (65536) would look like this:

Graph of sinewave
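In code, the digitizing step is nothing more than a scaling and a rounding. A minimal C sketch, assuming a full-amplitude input and round-to-nearest quantizing:

    /* Quantizing an "analog" sine into 16-bit signed samples - the
     * rounding is where the quantizing error is born.               */
    #include <stdio.h>
    #include <math.h>

    #define PI 3.14159265358979323846

    int main(void)
    {
        int i;
        for (i = 0; i < 8; i++) {
            double analog = sin(2 * PI * i / 8);   /* ideal level, -1..1 */
            short sample  = (short)floor(analog * 32767.0 + 0.5); /* round */
            printf("point %d: analog %+.4f -> 16-bit value %+6d\n",
                   i, analog, sample);
        }
        return 0;
    }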

Now imagine you want to play two such waves at the same time - you want to mix them. (In this case we only play with sine waves of the same frequency, phase and amplitude to make it easier, but you could imagine that one of the waves is a piano and the second is a snare or whatever.) Mixing is done simply by adding: the value ("level") of each sample point in wave 1 is added to the value of the corresponding sample point in wave 2. This is the result - simply a sine wave with an amplitude twice that of the two original waves:

Graph of double sinewave

The important thing is that the amplitude has now been doubled - it's now 65536 + 65536 = 131072. To express a value this large, we need 17 bits (2^17 = 131072). And this is where we run into trouble. We want to play this sound through our 16-bit DAC, but that's impossible, because we're playing with too large numbers. Somehow, we have to squeeze it down.
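You can watch the 17th bit appear with nothing but integer types (the example uses peak values of 32767 instead of peak-to-peak 65536, but the principle is the same):

    /* Adding two full-amplitude 16-bit samples needs a 17th bit. */
    #include <stdio.h>

    int main(void)
    {
        short a = 32767, b = 32767;   /* two 16-bit samples at max level */
        long  sum = (long)a + b;      /* widened first: the exact result */
        short bad = (short)(a + b);   /* forced back into 16 bits        */
        printf("exact sum:  %ld\n", sum);  /* 65534                      */
        printf("forced sum: %d\n",  bad);  /* typically wraps to -2      */
        return 0;
    }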

The clipping method

If we just try to limit the values of our doubled wave by clipping off everything that exceeds the 16-bit range, we get a wave that looks like this:

Graph of clipped sinewave

It does indeed fit into a 16-bit DAC, but it sounds like crap. When you do that to a sound, you add an infinite number of harmonics to it, and basically that's just what distortion is all about. That's what you do to your guitar sound to get that killer sound. (If you don't want a killer sound but a smooth fuzz or tube-like overdrive, you'll have to do smooth clipping, but that's a whole other story.) In any case, this is definitely not what you want on regular music.
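As code, the clipping method is just an addition in a wide variable followed by two comparisons. A sketch, not the EMU's actual mixer:

    /* The clipping method: add everything, chop off what doesn't fit. */
    #include <stdio.h>

    /* mix n voices into one output sample, hard-clipped to 16 bits */
    short mix_clip(const short *voices, int n)
    {
        long sum = 0;
        int  i;
        for (i = 0; i < n; i++)
            sum += voices[i];           /* exact sum in a wide variable  */
        if (sum >  32767) sum =  32767; /* this hard limiting is where   */
        if (sum < -32768) sum = -32768; /* the distortion harmonics live */
        return (short)sum;
    }

    int main(void)
    {
        short peaks[2] = { 30000, 30000 };
        printf("30000 + 30000 clips to %d\n", mix_clip(peaks, 2)); /* 32767 */
        return 0;
    }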

In the example above, only two 16-bit samples were mixed. Imagine what it'd be like if you mixed 32 sounds - you'd get a wave with a maximal amplitude of 65536 * 32 = 2097152. 21 bits are needed to express that. If you clipped that down to fit into a 16-bit range, you'd get more distortion than you could bear. But on the other hand, this example with two sine waves playing at the same frequency, in the same phase, and at maximum volume was an extreme case - in reality you seldom do that kind of thing. It is very likely that the values of the mixed samples are so different - one may be negative while another one is positive - that they don't exceed the 16-bit limits when they are added, especially since you never play samples at maximum volume. However, in the case of normal music with lots of complex waves being mixed, clipping would occur on completely unpredictable occasions, probably pretty often - and one clipped sample point is one too many. So, this way of mixing samples doesn't seem very good. Let's have a look at another one.

The dividing method

Here it is: after adding all the sounds together, divide the resulting values by the number of mixed sounds. This way, clipping can never occur - the result of the division always fits into a 16-bit number. For example, put four identical waves on top of each other, each with an amplitude of 65536. The amplitude of the resulting mix is 4 * 65536 = 262144, so you can see very clearly why the mix should be divided by 4 to get it back in range.

Dividing has got one big drawback: each mixed sample loses a bunch of bits. When the result of the previous example is divided by four, it has the same effect as if each original sample had an amplitude of only 16384 (65536 / 4). This means that each of the four mixed samples effectively has only 14 bits (2^14 = 16384). If you did this with 32 samples, each sample could have a maximum amplitude of 2048 (65536 / 32), which means 11 bits (2^11 = 2048). You never get any clipping, no matter what kind of samples you have, but instead only 11 bits are used of your cool 16-bit samples. The rest (the 5 lowest bits) are scaled out by the final division that puts the mix back into the 16-bit range.
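As a sketch (again, not the EMU's actual mixer), the dividing method differs from the clipping one by a single division - and the example shows exactly what that division costs a quiet voice:

    /* The dividing method: add n voices, divide by n.  No clipping,
     * but each voice effectively loses log2(n) bits of resolution.  */
    #include <stdio.h>

    short mix_divide(const short *voices, int n)
    {
        long sum = 0;
        int  i;
        for (i = 0; i < n; i++)
            sum += voices[i];
        return (short)(sum / n);   /* always in range - but the lowest   */
                                   /* bits of each voice are scaled away */
    }

    int main(void)
    {
        short voices[32] = { 23 }; /* one quiet voice, 31 silent ones */
        printf("a level-23 voice in a 32-voice mix comes out as %d\n",
               mix_divide(voices, 32));   /* 23 / 32 = 0: gone        */
        return 0;
    }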

How necessary are those bits? The lowest bits of a binary number are called LSB - least significant bits. When talking about digital audio, they are far from LS; they are "VS" - very significant. The more bits you have, the more precision and accuracy you get, which means that the quantizing or rounding errors get smaller and the sample sounds more like the original. Quantizing errors caused by too few bits are audible as noise - that's why 16-bit samples sound cleaner than 8-bit ones. The lack of bits is usually most evident in sounds that have pretty silent parts, like the decay or fade-out of a piano note. As an example I'll use a continuous sine tone that fades from maximum amplitude to silence. In the beginning of the fade, 65536 levels are used to describe the sine wave, but as the amplitude decreases, the number of usable levels decreases too, so the quantizing errors become bigger and bigger. In the very end, only two or three levels are left, and with the values -1, 0 and 1 you can't describe a very good sine wave. It's not a plain sine wave anymore; it gets lots of overtones = harmonic distortion = noise. When only two levels are left, -1 and 0, it's nothing but a square wave, and a square wave consists of an infinite number of harmonics and does not sound like a sine wave at all. When the volume decreases even more, the square wave becomes less and less square, and soon there are just a few peaks left - nothing but noise. Here's a simulated sine tone that fades from maximum amplitude to zero in an ideal manner, with a few points magnified so you can see how the sine wave gets more and more mistreated as the number of levels decreases and the quantizing becomes rougher:

Graph of fading sine tone

This is natural - it always happens when you store and play samples digitally using a linear scale, without any companding (compressor-expander) system. But the catch is that with 16 bits, the volume can get very low before this distortion starts happening, so it'll be so quiet that no one except Neil Young can hear it. If you have fewer bits to play with, the wave starts getting distorted much earlier, at a much higher volume, so all this noise becomes audible even for us normal mortal human beings.
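Here's the same degeneration in numbers instead of a graph - a simulated sine with an amplitude of just 1.4 quantizing steps, rounded to the nearest available level:

    /* A sine quantized at very low amplitude: with only the levels
     * -1, 0 and 1 available, it degenerates into a square-ish wave. */
    #include <stdio.h>
    #include <math.h>

    #define PI 3.14159265358979323846

    int main(void)
    {
        int i;
        double amplitude = 1.4;    /* less than two quantizing steps */
        for (i = 0; i < 16; i++) {
            double analog = amplitude * sin(2 * PI * i / 16);
            int quantized = (int)floor(analog + 0.5);  /* nearest level */
            printf("%+6.3f -> %+d\n", analog, quantized);
        }
        return 0;
    }

The output hugs +1 for the whole positive half-cycle and -1 for the negative one - a square wave, with all the harmonics that implies.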

A compromise

So, neither of the two methods described above is perfect. If you clip, you don't lose any bits but you get distortion; if you divide, you don't get distortion but you lose bits instead. So the best way to go is to combine the two: divide by less than the full number of voices, and clip the rare peaks that still overshoot. The clipping risk stays small, since all samples are seldom in phase and never play at maximum volume and velocity, and a mild division doesn't destroy the sound completely - the effect is hardly noticeable when several instruments are playing at a reasonable volume.
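A sketch of such a combined mixer, with an assumed headroom factor of 4 (the EMU's real factor is discussed below):

    /* The compromise: divide by less than the voice count, then clip
     * the rare peaks that still overshoot.  Sketch only.             */
    #include <stdio.h>

    #define HEADROOM 4   /* assume at most ~4 full-blast voices coincide */

    short mix_combined(const short *voices, int n)
    {
        long sum = 0;
        int  i;
        for (i = 0; i < n; i++)
            sum += voices[i];
        sum /= HEADROOM;                /* mild division: only 2 bits lost */
        if (sum >  32767) sum =  32767; /* clip whatever still overshoots  */
        if (sum < -32768) sum = -32768;
        return (short)sum;
    }

    int main(void)
    {
        short quiet[32] = { 23 };                      /* 31 voices silent */
        short loud[4]   = { 30000, 30000, 30000, 30000 };
        printf("quiet voice survives as %d\n", mix_combined(quiet, 32));
        printf("four loud voices mix to %d\n", mix_combined(loud, 4));
        return 0;
    }

The quiet voice comes out as 5 instead of disappearing, and four simultaneous full blasts still fit - at the price of clipping anything beyond that.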

EMU details

The EMU employs a variation of this combined method to produce a 16-bit signal. Exactly how much it divides and clips is hard to tell, and in practice it's even more difficult to measure because of the Equalize() preprocessing routine, which increases the volume of silent parts and evens out certain peaks, making testing a bit harder.

Basically, the three lowest bits of a sample won't get through directly (given that all volumes are set to max). They are not left out completely, since the EMU's internal data paths are wider than 16 bits, but they really disappear when the amplitude of a signal falls below 16 - then no sound is output at all. This is around 72 dB below the full amplitude of a sample. This means that the end of long fade-outs gets cut off. Since you never play your samples at maximum volume and amplitude, even more bits are left out. However, reverb and chorus lower the threshold somewhat, since they increase the volume of the sound slightly.

The worst part of this bit-kicking concerns samples that don't oscillate perfectly around zero but have a slight DC offset, which creates a really weird effect. Unfortunately most sounds seem to cause these problems, perhaps because of the code that all samples are prepared with, or perhaps because of a slight bug in the EMU code. The catch is that when the levels get sort of few as a sample decays, even a slight offset error will cause the signal to become very asymmetric, and thus have a lot of harmonics/noise added. The noise can for example sound like pulse width modulation, clipping, and stuff like that. In the extreme end, the signal does not only become asymmetric. If, let's say, a sine wave does not go from 10 to -10, but from 13 to -7 because of the offset error, only the positive peaks will reach over the 3-bit threshold, causing a positive peak in the output, while the negative peaks don't get through at all. So what you get is a short peak at half the frequency of the original wave. Noise. Digital noise. Ugly noise. Without the offset bug, the bit chopping wouldn't be half as bad. In any case, this is why silent fade-outs sound so incredibly bad when played through the EMU. Reverb trails are often so silent that they only play around in the lowest bits, bringing up a lot of this type of noise caused by the symmetry and threshold problems. That's the digital "waves on the beach" noise effect I've been talking about for quite a while. (Not until recently did I manage to find out what the hell is exactly going on. All-digital experiments made it all very clear.)
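Here is that effect as an all-digital simulation, in the spirit of the experiments mentioned above. The exact gate level is my assumption for the sketch (mute everything with |x| < 8, which fits the 13-to--7 example); the +3 is the DC offset error:

    /* A decaying sine that should swing +10..-10 swings +13..-7 because
     * of a DC offset.  With a gate that mutes everything under the
     * low-bit threshold, only the positive peaks get through.          */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define PI        3.14159265358979323846
    #define THRESHOLD 8   /* assumed low-bit cutoff, illustration only */

    int main(void)
    {
        int i;
        for (i = 0; i < 16; i++) {
            double ideal = 10.0 * sin(2 * PI * i / 16);   /* intended wave */
            int offs = (int)floor(ideal + 0.5) + 3;       /* +3 DC offset  */
            int out  = (abs(offs) < THRESHOLD) ? 0 : offs;   /* the gate   */
            printf("ideal %+7.2f   with offset %+3d   output %+3d\n",
                   ideal, offs, out);
        }
        return 0;
    }

The positive half-cycle produces a burst of output while the negative one produces nothing but silence - the ugly blips described above.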

Back to the numbers. Given that the digital bass/treble EQ is disabled, a full-amplitude wave at max volume reaches up to a little less than 1/5 of the maximum full-range 16-bit amplitude - more precisely, something between 1/4.5 and 1/4.7, depending on the frequency (the digital EQ can't be disabled totally). Now, a division by 4.6 is in theory the same thing as leaving out 2.2 bits (log2(4.6) = 2.2). My tests show that 3 bits are left out, which would suggest a division by 8 - and it very clearly isn't, since that would allow 8 full-amplitude waves to play without clipping. However, whole bits and fractional divisions don't fit very well together, so it's very likely that a 2.2-bit kick shows up as a 3-bit one. In any case, everything that exceeds the limit of 4.6 full-amplitude waves gets cut off. If you managed to get 32 sine waves triggered in phase (impossible, thanks to the bad driver code), you'd get very nice distortion, and the output would be more like a square wave than anything else.

How bad is this?

So much for numbers and theory. I guess you are more interested in what this does to your music. The result of having this many LSBs cut out is that all voices sink drastically towards the noise floor, and silent stuff drowns in noise and distortion caused by quantizing, offset/symmetry errors, lousy linearity, and other nasty things. Take an open hihat or a crash cymbal, for instance. They simply won't sound right when played at a volume that fits into your music, simply because lots of necessary bits are missing and the ones that are left are screwed up.

And what about the clipping? As long as you only use normal volume settings and don't hit lots of peaky voices simultaneously, you can have all 30 voices playing without clipping. However, as soon as you blast a little, you'll see what hardware clipping is like. "SYMPHONY.MID", one of the demo songs that came with the AWE, has got exceptionally high volumes, and the clipping can clearly be heard by ear if you know what to listen for. And that's just a demo song.

The dynamics suffer terribly from this. You simply don't have the headroom that some music styles require. If you keep your forte stuff quiet enough that it won't clip, your pianissimo parts get dangerously close to the noise floor, and not even the best DAC in the world can rescue the rudely removed bits, because this noise is not just analog, it's also mathematical. Pro gear doesn't have 20-bit DACs just because it's cool - there is a very good reason for it. Even some PC sound-cards (naturally some Turtle Beach model, what else, plus X-Technologies' TopWave series) have 18-bit DACs to make the situation slightly better.

CD and DAT only have 16 bits, why would I need more?

You may ask: what the heck is so great about an 18-, 20- or even 24-bit DAC, as long as the output ends up on a CD which only has got 16 bits anyway? The catch is that if you mix the analog signals of several synthesizers and instruments before they are put on the CD, or want to tweak the signal in some effects processor, equalizer, or anything, good SNR and dynamics are necessary.

An example of processing that often is done to music before it's put on a 16-bit medium is volume maximizing and boosting. The point is that there is always quite a lot of unused headroom in music - only a few blasts reach the 16-bit max, and if you use synths or sound cards that, unlike the AWE, refuse to clip, your music may never even get close to this max, so there is a lot of free dynamic space to be used. On classical music and the like, a mere increase of volume is done, which takes the highest peaks to the max but not the slightest bit higher. On popular music a process called boosting is usual. It increases the volume so much that most of the music is continuously very close to the max, while the peaks that would normally get clipped are damped. This gives a thicker and more powerful sound, and lifts the silent parts up from the noise floor that a 16-bit digital audio medium like the CD is bound to have. It is done by running the sound through a compressor and/or limiter which does not apply a constant, arithmetical increase of volume, but a smooth, dynamic processing that attenuates peaks and lifts up the more silent parts. If the input signal has no more than 16 bits, the noise may rise dangerously high. So, whether you need more than 16 bits or not depends on what you want to do with the signal. For a direct digital transfer to CD, you can do fine with 16 bits all the way, but if you want to do any processing at all (boosting, equalizing, just anything...), 16 bits won't do.
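To make the difference between maximizing and boosting concrete, here's a toy sketch - not any real mastering processor - comparing a constant gain against a tanh-style soft limiter on one quiet and one loud sample value:

    /* Plain volume maximizing vs. "boosting" - toy illustration only. */
    #include <stdio.h>
    #include <math.h>

    /* maximizing: constant gain taking the highest peak exactly to 1.0 */
    double maximize(double x, double peak) { return x / peak; }

    /* boosting: big gain plus a smooth limiter that damps the peaks
     * instead of clipping them                                        */
    double boost(double x, double gain) { return tanh(x * gain); }

    int main(void)
    {
        double peak = 0.8;                 /* loudest sample in the "song" */
        double quiet = 0.05, loud = 0.8;   /* levels on a -1..1 scale      */
        printf("maximized: quiet %.3f, loud %.3f\n",
               maximize(quiet, peak), maximize(loud, peak));
        printf("boosted:   quiet %.3f, loud %.3f\n",
               boost(quiet, 3.0), boost(loud, 3.0));
        return 0;
    }

The constant gain lifts everything by the same factor; the boost lifts the quiet value proportionally more and still keeps the loud one below the maximum - which is exactly why it needs clean bits below the 16th to work with.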