The EMU 8000 and the AWE -
the beauty and the beast
It has been claimed that the AWE's synth chip is the same one
that is used in E-Mu's latest pro-level samplers. Even if this is true,
the AWE is far from professional. Read this and find out why.
Written by Mathias C. Hjelt
V3.0 - 18 Sep 95
PREFACE
This is a replacement for the old "EMU-8000 - Professional or not?" page that was previously
found here. Large parts are the same, but the idea behind the whole page has changed; it approaches
the problem from another point of view - you can see that if you compare the old and the new titles.
INTRODUCTION
Anyone should understand this much: a sound-card this cheap can't be as good as a
sampler that is dedicated to professional musicians and costs ten times more. Still,
many seem to believe it.
On the other hand, some people seem to believe that the AWE's built-in General MIDI sounds are
lousy only because the EMU-8000 isn't capable of producing better sound. Wrong again.
Anyone could try pressing 128 sounds and a drum kit into one megabyte of memory and see
what it sounds like. It's a matter of space, not performance.
However, the performance of the AWE's synth is limited. Badly. On this page, I've described the
various terrible drawbacks of what CL so proudly calls "Advanced WavEffects synthesis". But first you
have to know a little about how synth chips work in general, in order to be able to understand where all these
problems come from.
A BEAUTY AND A BEAST? REALLY?
For a while, I believed that the EMU-8000 was a very cheesy synth, something that perhaps was
based on E-Mu's real chips but had been modified heavily to fit Creative Labs' requirements
regarding production costs and other quality-degrading matters. However, although nothing has been
proved, I have now changed my opinion on this. I believe that the EMU-8000 would behave in a much
more professional way if it was not treated like a caged pig.
The catch is that a synth chip alone is as dumb as a computer without a program. A synth chip
needs something that tells it what to do - what sounds to play, when to play them, how loud,
what pitch, etc. Without such a companion it does nothing at all; it simply does not know what to do.
In normal synthesizer keyboards and such devices, there is a microprocessor or micro-controller
that handles this, and several sound-cards and all Waveblaster-compatible boards for PC have
built-in processors that do it. On the AWE, there's no such processor - your PC is the master.
To reduce costs, Creative Labs decided not to put a processor on the AWE. The result is that
the EMU won't respond to music sent to the MPU-401 port like other General MIDI cards do - simply because
it can't. It can't, because it does not understand MIDI information, which is what the MPU port
delivers. We all know what this means - many games won't work with the AWE although they work with
many other wavetable cards and Waveblaster-compatible daughter-boards.
This also means that the driver that the PC is running (the Windows driver, AWEUTIL, or any DOS
program using CL's API) has got to keep track of a lot of things. It reads MIDI data, checks which
samples to use for the current patches, determines volumes, pitches and so on, and sends short commands
to the EMU, which then handles the practical things - it reads samples from memory and plays them with
the pitch, volume and other settings demanded by the driver.
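To make the division of labour more concrete, here is a minimal sketch in Python of what the
host side has to do for every single note. All the names and the data layout are made up - the
real driver's internals aren't public - but the principle is the one just described: the PC
interprets the MIDI data, and the chip only receives short "play this sample" commands.

    def note_on(key, velocity, patch, chip):
        # All of this is driver-side work: pick the layer that covers the key..
        layer = next(l for l in patch["layers"]
                     if l["lo_key"] <= key <= l["hi_key"])
        pitch = 2 ** ((key - layer["root_key"]) / 12.0)  # ..compute playback rate..
        volume = velocity / 127.0                        # ..and a crude volume.
        # Only now is the hardware involved, with one short low-level command:
        chip.start_voice(layer["sample_addr"], pitch, volume)

    class FakeEmu:
        def start_voice(self, addr, pitch, volume):
            print(f"EMU: sample at {addr:#06x}, pitch x{pitch:.3f}, vol {volume:.2f}")

    piano = {"layers": [{"lo_key": 0, "hi_key": 127,
                         "root_key": 60, "sample_addr": 0x1000}]}
    note_on(64, 100, piano, FakeEmu())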
It is difficult to know how big a part of the work is done by the PC, and how much the EMU-8000 can
do itself, but it seems like the driver does everything except the actual playback of the individual
samples. This means that the reliability of the various functions that the SBK format provides is equal
to the reliability of the driver - if the driver works perfectly, the patches sound perfect; if
the driver doesn't do its duties right, the.. well, we'll come to that.
To put all that in a nutshell - many of the things that make the AWE useless for demanding musicians
are not caused by the EMU-8000, but by the drivers. And that explains the title of this whole page,
that stuff about the beauty and the beast - the power and the partly unknown beauty of the EMU is imprisoned
by the rough and dirty things that make up the rest of the AWE, the beast. (Would this be the concept for
a new-age Disney hit? That's a joke, folks.)
SO, WHAT'S WRONG WITH IT?
Several things. Let's start with some of the more embarrassing stuff to humiliate ourselves a bit.
It's better to face the facts than to live in an imaginary euphoria based on vicious self-delusion,
right? (Anyone who doesn't agree and wants to discuss this with a philosophical approach is free to
mail me.)
Note: Everything I write here is very likely true - anyone can try it out and experience it themselves.
If something is not completely correct, it is still close to the truth (some of the matters
discussed here can't be known for sure without access to things like the micro-code of the EMU,
so they are based on careful observations and experiments only). I do not want to give the
AWE an unfair negative reputation (why would I? I have one myself and I like it), I just want to make
the truth known - many have noticed that the EMU acts funny, and have thanked me for giving the explanation
for it all on my pages. So please stay cool. This is what I'm going to tell you about the EMU being Creative's
slave:
- that it can't play stereo samples
- that it can't play multi-layered samples correctly
- that it can't play any samples correctly
- that it only has got 30 voices (polyphony), not 32
- that it makes smooth looping unnecessarily difficult
- that it doesn't use more than 13 bits of your 16 bit samples, clips, adds digital noise, and much more..
..but honestly, it's nothing to get upset about. It's just a PC card, right?
STEREO AND MULTI-LAYERING - OH YEAH?
Right, the EMU 8000 is a stereo synthesizer, with panning, stereo effects and all, but
under the command of the present controlling system (the beast), it can't play stereo samples
correctly. Not as long as you have to create them through multi-layering, i.e. having one layer panned to
the left, another to the right. The problem is very simple but hard to accept (you DID buy a
stereo card, didn't you?) and really annoying (this particular card IS famous for its
multi-layering and stacking capabilities, isn't it?). The catch is that the various samples
of a multi-layered patch do not start playing simultaneously when a note is triggered, and this
leads to nasty phase errors between the layers. And the worst thing of all is that the phase
error isn't predictable - it's totally random. Every time you trigger a note, you get a new phase
offset for each layer, and a new sound.
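The maths behind this is easy to check. If one of two identical layers is delayed by d seconds,
the mix sin(wt) + sin(w(t-d)) has a peak amplitude of 2*|cos(pi*f*d)| - anything from full
doubling down to complete cancellation. This little Python sketch (my own illustration, nothing
to do with the AWE's code) shows what a random delay of a couple of milliseconds does to the level:

    import math, random

    def mixed_peak(freq_hz, delay_s):
        # Peak amplitude of sin(wt) + sin(w(t-d)), from the identity above.
        return 2 * abs(math.cos(math.pi * freq_hz * delay_s))

    random.seed(1)
    for _ in range(5):
        d = random.uniform(0.0, 0.002)   # a random 0..2 ms trigger offset
        print(f"delay {d*1000:4.2f} ms -> peak x{mixed_peak(440.0, d):.2f}")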
This is caused by sloppy driver code in combination with a simple instrument definition standard. When
the driver causes a delay between the triggering of two layers, the EMU simply can't be held responsible for
the phase errors that occur - it doesn't know that anything's wrong, and happily plays the layers out of
sync. Another question is - how on earth can the driver do things like this in any case? I mean, you
wouldn't expect people to write drivers in BASIC, right?
What's the audible result of this? It depends on how the layers are configured (panning,
volume, etc), but usually this results in frequencies (or even whole layers, if you layer
several similar samples) being more or less cancelled out. The timbre of the sound changes
slightly, and the volume can vary drastically, especially with doubled samples. In a left-right
panned stereo patch, the sound can seem to appear in different places in the stereo field every
time it is triggered, although it ought to stay centered. All this sounds rather sad, doesn't it?
Then think about all the cool things about it: your music becomes less robotic, it gets an
unpredictable element added to it, panning is no longer as strict as it could be, and sounds don't
always sound the same. Now isn't that exciting?
But don't get desperate - according to Creative, our hero and saviour, a new SoundFont standard is
about to be released. The new drivers already support it, and along with various new features
it has got a new way of defining stereo samples which will fix this problem. At least that's what
they say.
GOODBYE, ATTACKS
Now that we know that the AWE can't do multi-layering and stereo samples correctly, we can continue making
the day even worse (nothing is ever so bad it can't get worse, right?). To accomplish this, let's
think about what it actually means that all samples always have a 10ms attack (fade-in) time,
even when the amplitude envelope's attack time is zero.
It is very obvious that a fade-in delay like this will cause short attacks and transients in the beginning
of samples to be either left out or just more or less damped. In either case the sound will lose a part of
its original punch or sharpness. Percussion samples especially are in the danger zone. Melodic sounds will
just get a few cycles cut out, and that's not very critical, but stuff like a hihat or a really punchy kick
will suffer. Prevent this by putting a padding space of some 300-600 zero samples in the beginning
of all samples - and get one hell of a timing problem instead (a 7-14 ms delay).
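The padding workaround in numbers - a trivial Python check of where that 7-14 ms figure comes from:

    RATE = 44100                     # samples per second
    for pad in (300, 600):
        print(f"{pad} zero samples = {1000.0 * pad / RATE:.1f} ms of delay")
    # -> 300 zero samples = 6.8 ms of delay
    #    600 zero samples = 13.6 ms of delay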
This forced attack time was probably implemented to avoid the risk of having pops and clicks in the
beginning of samples that do not start at zero level. It's not only the attack that gets smoothed -
the same goes for the release envelope parameter. Not even a zero setting makes the sound stop immediately
when you release the key - there's a 10ms delay here too. It makes the EMU foolproof but stupid.
"AWE Tech - a look under the cover".
The only
poor consolation we have is that the same operators can used for both DRAM refresh and FM passthrough
- we should be happy that we didn't lose four ops, only two!
No, don't say that 30 voices is more than enough. All you need to do is play a few cool chords
using a few multi-layered (did you hear that? multi-layered! ha!) patches, and poof - you're
running out of polyphony. Let's have a band enter the stage. First we have a Hammond B-3 player.
With all those rumblings and funky chords, he needs maybe 5 notes, and since the Hammond patch
happened to be multi-layered, he eats up 10 voices. Another person plays the piano with a lot
of sustained stuff, and easily consumes 8 voices just like that. Strings or other synth pads,
ah well, let's say 6 voices (multi-layering is fun, just like playing chords). Sax lead, one
voice (for those occasions when you have to let two tones overlap to get the right growl).
Vast percussion section, 5 voices. Now, where the heck do you think you're gonna put the bass
player? Two more voices could be very valuable, no question about it.
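The same band as plain arithmetic, using the voice counts guessed at above:

    voices = {"Hammond (5 notes x 2 layers)": 10, "piano": 8,
              "pads": 6, "sax lead": 1, "percussion": 5}
    used = sum(voices.values())
    print(f"{used} of 30 voices used, {30 - used} left for the bass player")
    # -> 30 of 30 voices used, 0 left for the bass player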
Since the EMU (or the driver) has got some pretty smart way of determining what to do when it is running
out of voices, it wasn't easy to confirm that these 2 voices really are missing, but after a while
I succeeded - and there were indeed only 30 voices. Once again, sad but true.
WHY 16 BITS ARE NEVER 16
Have you ever wondered why your samples don't sound quite right when played through the EMU? Have
you ever noticed that they become more noisy? Then read this. Read it even if you haven't found
anything to complain about - this is just a piece of common digital audio theory which you may
find interesting.
This is something that isn't specific to the AWE/EMU. I guess most "16-bit" wavetable / sample
playback cards operate this way, simply because there is no way around it. It's about maths and bits.
To make it as comprehensible as possible for as many as possible (even for those who don't quite
know what a "bit" is mathematically speaking), I'll first give a few general examples, and then get
into the AWE/EMU details. So this is not only about why the EMU sucks, it's about sample mixing
in general.
Some theory..
Since sound is usually treated as a wave motion, it can be described as a function over
time. In a graph, the X-axis is time, and the Y-axis shows the level. A sampled sound has got a
finite number of "Y"-levels that can be used, and the number of bits determines how many they are.
A 16-bit sample has got 65536 different levels, since 2^16 = 65536. During the digitizing process,
the level of the incoming signal is quantized - rounded - to the closest of these 65536 values,
and a 16-bit value describing the current level is obtained; this is repeated 44100 times per second
if a sampling rate of 44.1kHz is used. Since signed numbers are used to describe the waveform, a scale
that goes from -32768 to 32767 is used, so a graph of a sine wave with maximal peak-to-peak
amplitude (65536) would look like this:
Now imagine you want to play two such waves at the same time - you want to mix them (in this case
we only play with sine waves of the same frequency, phase and amplitude, to make it easier, but you
could imagine that one of the waves is a piano and the second is a snare or whatever..). That is done
simply by adding them; that is, the value ("level") of each sample point in wave 1 is added to the value of the
corresponding sample point of wave 2. The result is simply a sine wave with an amplitude
that is twice that of the two original waves:
The important thing is that the amplitude has now been doubled - it's now 65536 + 65536 = 131072. To
express a value this large, we need 17 bits (2^17 = 131072). And this is where we run into trouble.
We want to play this sound through our 16-bit DAC, but that's impossible, because we're playing with
too large numbers. Somehow, we have to squeeze it down.
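You can watch the 17th bit appear with a couple of lines of Python:

    a = b = 32767                      # two full-scale positive 16-bit sample points
    mix = a + b                        # 65534: outside the signed 16-bit range
    print(mix, mix.bit_length() + 1)   # -> 65534 17 (16 magnitude bits + sign bit)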
The clipping method
If we just try to limit the values of our doubled wave, by clipping off everything that exceeds the
16-bit range, we get a wave that looks like this:
It does indeed fit into a 16-bit DAC, but it sounds like crap. When you do that to a sound, you add
an infinite number of harmonics to it, and basically that's just what distortion is all about. That's
what you do to your guitar sound to get that killer sound. (If you don't want a killer sound but a smooth
fuzz or tube-like overdrive, you'll have to do smooth clipping, but that's a whole other story.) In any
case, this is definitely not what you want on regular music.
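As code, the clipping method is nothing more than this (a generic illustration of the principle,
not the EMU's actual mixing routine):

    def clip16(x):
        # Chop off everything outside the signed 16-bit range.
        return max(-32768, min(32767, x))

    print(clip16(32767 + 32767))   # -> 32767: the top of the doubled wave is gone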
In the example above, only two 16-bit samples were mixed. Imagine what it'd be like if you'd mix
32 sounds - you'd get a wave with a maximal amplitude of 65536 * 32 = 2097152. 21 bits are needed
to express that. If you'd clip that down so it'd fit into a 16-bit range, you'd get more distortion
than you could bear. But on the other hand, this example with two sine waves playing at the same
frequency, in the same phase, and at maximum volume was an extreme - in reality you seldom do that
kind of thing. It is very likely that the values of the mixed samples are so different - one may
be negative while another one is positive - that they don't exceed the 16-bit limits when they
are added, especially since you never play samples at maximum volume. However, in case of normal music
with lots of complex waves being mixed, clipping would occur on completely unpredictable occasions,
probably pretty often - and one clipped sample point is one too many. So, this way of mixing samples
doesn't seem very good. Let's have a look at another one.
The dividing method
Here it is: after adding all the sounds together, divide the resulting values by the number of mixed
sounds. This way, clipping can never occur - the result of the division will always fit into a 16-bit
number. For example, put four identical waves on top of each other, each with an amplitude of 65536.
The amplitude of the resulting mix is 4 * 65536 = 262144, so you can see very clearly why the amplitude
of the mix should be divided by 4 to get it back in range.
Dividing has got one big drawback: each mixed sample involved loses a bunch of bits. When the result of the
previous example is divided by four, it has the same effect as if each original sample only had an
amplitude of 16384 (4 * 16384 = 65536 = 262144 / 4). This means that each of the four mixed samples
only has 14 bits (2^14 = 16384). If you'd do this with 32 samples, each sample could have a
maximum amplitude of 2048 (65536 / 32), which means 11 bits (2^11 = 2048). You never get any clipping,
no matter what kind of samples you have, but instead only 11 bits of your cool 16-bit samples are used.
The rest (the 5 lowest bits) are scaled out by the final division that puts the mix back into a 16-bit
range.
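In code, the dividing method and its cost in bits look like this (again, just an illustration of
the principle):

    import math

    def mix_divide(values):
        # Add all voices, then divide by the voice count - clipping is impossible.
        return sum(values) // len(values)

    print(mix_divide([30000, -10000, 500, 500]))   # -> 5250, safely in range

    for n in (2, 4, 32):
        levels = 65536 // n      # levels left per voice after the division
        print(f"{n:2d} voices: {levels:5d} levels = {int(math.log2(levels))} bits each")
    # ->  2 voices: 32768 levels = 15 bits each
    #     4 voices: 16384 levels = 14 bits each
    #    32 voices:  2048 levels = 11 bits each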
How necessary are those bits? The lowest bits of a binary number are called LSBs - least significant
bits. When talking about digital audio, they are far from LS, they are "VS" - very significant.
The more bits you have, the more precision and accuracy you get, which means that the quantizing
or rounding errors get smaller and the sample sounds more like the original. Quantizing errors
caused by too few bits are audible as noise - that's why 16-bit samples sound cleaner than 8-bit
ones. The lack of bits is usually very evident in sounds that have pretty silent parts, like the decay
or fade-out of a piano note. As an example, I'll use a continuous sine tone
that fades from maximum amplitude to silence. In the beginning of the fade, 65536 levels are used
to describe the sine wave, but as the amplitude decreases, the number of levels decreases too, so
the quantizing errors become bigger and bigger. In the very end, only two or three levels are left,
and with the values -1, 0 and 1 you can't describe a very good sine wave. It's not a plain sine wave
anymore - it gets lots of overtones = harmonic distortion = noise. When only two levels are left, -1 and
0, it's nothing but a square wave, and a square wave consists of an infinite number of harmonics
and does not sound like a sine wave at all. When the volume decreases even more, the square wave
becomes less and less square, and soon there are just a few peaks left - nothing but noise. Here's
a simulated sine tone that fades from maximum amplitude to zero in an ideal manner, with a few points
magnified so you can see how the sine wave gets more and more mistreated as the number of levels decreases
and the quantizing becomes rougher:
This is natural. It always happens when you store and play samples digitally using a linear scale, without
any companding (compressor-expander) system. But the catch is that with 16 bits, the volume can
become very low before this distortion starts happening, so it'll be so quiet that no one except Neil Young
can hear it. If you have fewer bits to play with, the wave starts getting distorted much earlier, at a much
higher volume, so all this noise becomes audible even for us normal mortal human beings.
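The fading sine is easy to simulate in Python. At a healthy amplitude the rounded values still
trace a sine; at amplitude 1 the very same wave has collapsed into the square-ish shape
described above:

    import math

    def quantized_cycle(amplitude, points=8):
        # One cycle of a sine wave, rounded to whole sample levels.
        return [round(amplitude * math.sin(2 * math.pi * i / points))
                for i in range(points)]

    for amp in (1000, 3, 1):
        print(f"amplitude {amp:4d}: {quantized_cycle(amp)}")
    # amplitude 1000: [0, 707, 1000, 707, 0, -707, -1000, -707]
    # amplitude    3: [0, 2, 3, 2, 0, -2, -3, -2]
    # amplitude    1: [0, 1, 1, 1, 0, -1, -1, -1]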
A compromise
So, neither of the two methods described above is perfect. If you clip, you don't lose any bits but
get distortion, and if you divide, you don't get distortion but you lose bits instead. So, the best way
to go would be to combine these two methods. A bit of clipping can always be allowed without any big
risk of actually hearing it, since all samples are seldom in phase and they don't play at maximum volume
and velocity, and a little dividing doesn't destroy the sound completely - the effect is hardly noticeable
when several instruments are playing at a reasonable volume.
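A sketch of the compromise, assuming a fixed headroom divisor of 4 (the EMU's real figure seems
to be around 4.6, as discussed below):

    def mix_compromise(values, headroom=4):
        # Divide by a small fixed factor instead of the voice count,
        # then clip the rare peaks that still overflow.
        x = sum(values) // headroom
        return max(-32768, min(32767, x))

    print(mix_compromise([4000] * 30))    # -> 30000: 30 modest voices, no clipping
    print(mix_compromise([32767] * 30))   # -> 32767: an extreme case gets clipped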
EMU details
The EMU employs a variation of this combined method to produce a 16-bit signal. Exactly how much it cuts
and clips is hard to tell, and in practice it's even more difficult to know because of the Equalize()
preprocessing routine, which increases the volume of silent parts and evens out certain peaks, making testing
a bit harder.
Basically, the three lowest bits of a sample won't get through directly (given that all volumes are
set to max). They are not left out completely, since the EMU's internal data paths are wider than 16 bits,
but they really disappear when the amplitude of a signal falls below 16, and no sound is output at all. This
is around 72dB below the full amplitude of a sample, and it means that the end of long fade-outs will get cut out.
Since you never play your samples at maximum volume and amplitude, even more bits are left out. However,
reverb and chorus lower the threshold somewhat, since they increase the volume of the sound slightly.
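The 72dB figure checks out if the threshold of 16 is compared against the full peak-to-peak
amplitude of 65536, the scale used throughout this page:

    import math
    print(f"{20 * math.log10(16 / 65536):.1f} dB")   # -> -72.2 dB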
The worst part of this bit-kicking concerns samples that don't oscillate perfectly around zero, but have a slight
DC offset, which creates a really weird effect. Unfortunately most sounds seem to cause these problems, perhaps
because of the code that all samples are prepared with, or perhaps because of a slight bug in the EMU code.
The catch is that when the levels get sort of few as a sample decays, even a slight offset error
will cause the signal to become very asymmetric, and thus have a lot of harmonics/noise added. The
noise can for example sound like pulse width modulation, clipping, and stuff like that. In the extreme end, the
signal does not only become asymmetric. If, let's say, a sine wave does not go from 10 to -10, but from 13 to -7
because of the offset error, only the positive peaks will reach over the 3-bit threshold, causing a positive
peak in the output, while the negative peaks don't get through at all. So what you get is a short peak at half
the frequency of the original wave. Noise. Digital noise. Ugly noise. Without the offset bug the bit chopping wouldn't
be as bad. In any case, this is why silent fade-outs sound so incredibly bad when played through the EMU. Reverb
trails are often so silent that they only play around in the lowest bits, bringing up a lot of this type of noise
caused by the symmetry and threshold problems. That's the digital "waves on the beach" noise effect I've been talking
about for quite a while. (Not until recently did I manage to find out what the hell exactly is going on. All-digital
experiments made it all very clear.)
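The offset effect is simple to simulate. Below, a small sine wave with a DC offset of +3 is pushed
through a crude "anything below 8 disappears" gate - the EMU's exact behaviour is guesswork, but
this reproduces the one-sided peaks described above:

    import math

    def gate(x, threshold=8):
        # Levels below the 3-bit threshold produce no output at all.
        return x if abs(x) >= threshold else 0

    cycle = [round(3 + 10 * math.sin(2 * math.pi * i / 8)) for i in range(8)]
    print(cycle)                      # -> [3, 10, 13, 10, 3, -4, -7, -4]
    print([gate(x) for x in cycle])   # -> [0, 10, 13, 10, 0, 0, 0, 0]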
Back to the numbers. Given that the digital bass/treble EQ is disabled, a full-amplitude wave at max volume reaches
up to a little less than 1/5 of the maximum full-range 16-bit amplitude - more precisely, it's something between
1/4.5 and 1/4.7, depending on the frequency - the digital EQ can't be disabled totally. Now, a division by 4.6 is in
theory the same thing as leaving out 2.2 bits. My tests show that 3 bits are left out, which would mean that a division
by 8 is done - and it very clearly isn't, since that would allow 8 full-amplitude waves to play without clipping.
However, bits and floating point stuff don't fit very well together, so it's very likely that a 2.2-bit kick equals
a 3-bit one in practice. In any case, everything that exceeds the limit of 4.6 full-amplitude waves gets cut off.
If you'd manage to get 32 sine waves triggered in phase (impossible, thanks to the bad driver code), you'd get very
nice distortion, and the output would be more like a square wave than anything else.
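For the record, the bit arithmetic behind that paragraph:

    import math
    print(f"division by 4.6 = {math.log2(4.6):.2f} bits lost")   # -> 2.20 bits lost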
How bad is this?
So much for numbers and theory. I guess you are more interested in what this does to your music.
The result of having this many LSBs cut out is that all voices sink drastically towards the noise
floor, and silent stuff drowns in noise and distortion caused by quantizing, offset/symmetry errors,
lousy linearity, and other nasty things. Take an open hihat or a crash cymbal, for instance. They simply
won't sound right when played at a volume that fits into your music, simply because lots of
necessary bits are missing and the ones that are left are screwed up.
And what about the clipping? As long as you only use normal volume settings and don't hit lots of peaky
voices simultaneously, you can have all 30 voices play without clipping. However, as soon as you
blast a little, you'll see what hardware-clipping is like. "SYMPHONY.MID", one of the demo songs that
came with the AWE, has got exceptionally high volumes, and the clipping can clearly be heard by ear if
you just know what to listen for. And that's just a demo song.
The dynamics suffer terribly from this. You simply don't have the headroom that some music styles
require. If you keep your forte stuff so quiet that it won't clip, your pianissimo parts will get
dangerously close to the noise floor, and not even the best DAC in the world can rescue the rudely
removed bits, because this noise is not just analog, it's also mathematical. Pro gear doesn't have
20-bit DACs just because it's cool; in fact there is a very good reason for them. Even some PC sound-cards
(naturally some Turtle Beach model, what else, plus X-Technologies' TopWave series) have 18-bit DACs to
make the situation slightly better.
CD and DAT only have 16 bits, why would I need more?
You may ask: what the heck is so great about an 18-, 20- or even 24-bit DAC, as long as the output is
placed on a CD which only has got 16 bits anyway? The catch is that if you mix the analog signals of
several synthesizers and instruments before they are placed on the CD, or want to tweak the signal
in some effects processor, equalizer, or anything, good SNR and dynamics are necessary.
An example of processing that often is done to music before it's put on a 16-bit medium is volume maximizing
and boosting. The point is that there is always quite a lot of unused headroom in music - only a few blasts reach
up to the 16-bit max, and if you use synths or sound cards that, unlike the AWE, refuse to do clipping, your music
may never reach even close to this max, so there is a lot of free dynamic space to be used. On classical
music and such, a mere increase of volume is done, which takes the highest peaks to the max, but not the
slightest bit higher. On popular music, a process called boosting is common. It increases the volume so much
that most of the music is continuously very close to the max, and the peaks that normally would get clipped
are damped. This gives a thicker and more powerful sound, and lifts the silent parts up from the noise floor
that a 16-bit digital audio medium like the CD is bound to have. It is done by running the sound through a
compressor and/or limiter which does not do a constant, arithmetical increase of volume, but a smooth and
dynamic processing that attenuates peaks and lifts up the more silent parts. If there are no more than
16 bits in the input signal, the noise may rise dangerously high. So, whether you need more than 16 bits or
not depends on what you want to do with the signal. For a direct digital transfer to CD, you can do fine with
16 bits all the way, but if you want to do any processing at all (boosting, equalizing, just anything..),
16 bits won't do. A rough sketch of the difference between maximizing and boosting follows below.
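Here, maximizing is an exact arithmetical scaling, and boosting is modelled with a tanh curve -
one simple choice of smooth limiter, not what any real mastering processor uses:

    import math

    def maximize(samples):
        # Constant scaling: the highest peak lands exactly at full scale (1.0).
        peak = max(abs(s) for s in samples)
        return [s / peak for s in samples]

    def boost(samples, drive=4.0):
        # Smooth limiting: push through a tanh curve (peaks get damped,
        # quiet parts pass almost linearly), then scale the peaks to full scale.
        return maximize([math.tanh(drive * s) for s in samples])

    quiet = [0.02, -0.01, 0.2, -0.25, 0.05]
    print([f"{s:.2f}" for s in maximize(quiet)])
    # -> ['0.08', '-0.04', '0.80', '-1.00', '0.20']
    print([f"{s:.2f}" for s in boost(quiet)])
    # -> ['0.10', '-0.05', '0.87', '-1.00', '0.26']  (quiet parts lifted)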