VOGONS


Mp3 decoder


First post, by nocash

Rank Newbie

Hello vogons forum, I've made an mp3 decoder in 80x86 asm code aimed at old hardware. The problem is that I have no idea if it's any faster (or slower) than other mp3 decoders. I would be glad if somebody could run the built-in benchmark (the /test switch) on an 80486; benchmarks for Pentium 1 or 80386 would also be interesting.

The source code and win32 executables can be found here:
http://problemkaputt.de/mp3.htm

It's currently win32 only. I would consider making a DOS version if the benchmarks on old windows PCs suggest that it could work fast enough on old DOS computers. I was told that the win32 executable could also work using things like HXDOS without needing windows.

Reply 2 of 33, by wbahnassi

Rank Oldbie

386 and win32? I think that's a bit of a stretch. I guess someone can try it with Win32s on Windows 3.11, as I doubt Win95 will allow the 386 to breathe, let alone run your program.

Turbo XT 12MHz, 8-bit VGA, Dual 360K drives
Intel 386 DX-33, TSeng ET3000, SB 1.5, 1x CD
Intel 486 DX2-66, CL5428 VLB, SBPro 2, 2x CD
Intel Pentium 90, Matrox Millenium 2, SB16, 4x CD
HP Z400, Xeon 3.46GHz, YMF-744, Voodoo3, RTX2080Ti

Reply 3 of 33, by debs3759

Rank Oldbie

Nice! What assembler did you use? (I normally use nasm, as it's free and runs on most OSes, but I'm sure I could easily convert it.)

See my graphics card database at www.gpuzoo.com
Constantly being worked on. Feel free to message me with any corrections or details of cards you would like me to research and add.

Reply 4 of 33, by nocash

Rank Newbie

The assembler is Borland TASM, but using Microsoft MASM syntax.
If you want to play around with the source code, do you have an idea how to produce stable waveout playback?

The v1.0 version playback could run into nested callbacks on slow PCs (with less than 800MHz or so).
I have fixed that problem in v1.1, and it's working fine for me (I am using win98), but the v1.1 playback seems to hang for everyone else on newer PCs.
The v1.2 update contains no fewer than 3 executables that are supposed to fix that problem, but none of them work on newer PCs.

I don't have much of a clue what is wrong there, it could be an out-of-memory error when calling waveout, but could such things really happen with eight small 4.5Kbyte blocks? My best (or wildest) guess is that the last block might end in an incomplete 4Kbyte page, causing memory locking to fail.
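The page arithmetic behind that guess can be checked quickly. This is a minimal sketch in Python (not part of the decoder), assuming 4608-byte blocks (4.5 Kbyte), 4096-byte pages, and two hypothetical buffer start addresses. It only shows which pages each block touches; it says nothing about what waveout actually does when it locks memory.

```python
PAGE_SIZE = 4096      # x86 page size in bytes
BLOCK_SIZE = 4608     # 4.5 Kbyte per block, as described above
NUM_BLOCKS = 8

def pages_touched(base, index):
    """Return (first_page, last_page) page numbers touched by block `index`."""
    start = base + index * BLOCK_SIZE
    end = start + BLOCK_SIZE - 1              # last byte of the block
    return start // PAGE_SIZE, end // PAGE_SIZE

def tail_bytes_in_last_page(base):
    """Number of buffer bytes that fall into the buffer's final page."""
    end = base + NUM_BLOCKS * BLOCK_SIZE      # one past the last byte
    remainder = end % PAGE_SIZE
    return remainder if remainder else PAGE_SIZE

if __name__ == "__main__":
    # Hypothetical start addresses: one page-aligned, one not.
    for base in (0x10000, 0x10200):
        spans = [pages_touched(base, i) for i in range(NUM_BLOCKS)]
        print(hex(base), spans, tail_bytes_in_last_page(base))
```

With a page-aligned start, the eight blocks fill exactly nine pages (8 x 4608 = 9 x 4096 bytes), so an incomplete last page would only occur if the buffer start is not page-aligned.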

Anyways, the /test switch for the benchmark test works even on PCs that hang during waveout playback.

wbahnassi wrote on 2024-09-08, 22:00:

386 and win32? I think that's a bit of a stretch. I guess someone can try it with Win32s on Windows 3.11, as I doubt Win95 will allow the 386 to breathe, let alone run your program.

Do you mean HXDOS won't work on 386, or is it more difficult to install than windows?
Running the program shouldn't be a problem, mp3 decoding is just number crunching, and opcodes don't execute slower in windows than dos.

Reply 5 of 33, by wbahnassi

Rank Oldbie
nocash wrote on 2024-09-09, 16:46:

Do you mean HXDOS won't work on 386, or is it more difficult to install than windows?
Running the program shouldn't be a problem, mp3 decoding is just number crunching, and opcodes don't execute slower in windows than dos

You suggested trying the program on a 386, and you provided a Win32 executable, so that only runs on Win9X or Win3.1+Win32s. Win9X is too heavy on a 386, as it is probably multitasking more programs than Win3.1. That can leave even fewer cycles for your program to crunch numbers, resulting in more stuttering. It is best to make a pure DOS program if you'd like to target slower systems like a 386. This has nothing to do with opcode perf on one OS vs. another.


Reply 6 of 33, by nocash

Rank Newbie

Using windows on a 386 was your idea, not mine.
If you really don't understand what I am talking about, you can read more about HXDOS here: https://www.japheth.de/HX.html

Reply 7 of 33, by nocash

Rank Newbie

I have meanwhile released a few more updates, and added more mp3 benchmarks from people with faster PCs on the webpage.
As for older PCs, the oldest working PC that I could find is my 1GHz Pentium III.

The mp3 decoder's CPU load on the Pentium III is between 12MHz (perfect quality) and 4MHz (good quality). If the 80486 opcodes, pipeline, multiplier, cache are perhaps 10x slower (?) then good quality mp3 decoding on 80486 might require about 40MHz (or maybe twice as much or half as much, it's hard to estimate).
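The estimate above is a plain scaling. A minimal sketch in Python, assuming the 10x per-clock slowdown guessed in the post (that factor is a guess, not a measurement) and a hypothetical helper name:

```python
def estimate_required_mhz(p3_load_mhz, slowdown):
    """Scale a load measured on the Pentium III by an assumed per-clock slowdown."""
    return p3_load_mhz * slowdown

if __name__ == "__main__":
    for label, load_mhz in (("perfect quality", 12), ("good quality", 4)):
        guess = estimate_required_mhz(load_mhz, 10)   # 10x is the guess from the post
        low, high = guess / 2, guess * 2              # "half as much or twice as much"
        print(f"{label}: about {guess} MHz (range {low:.0f} to {high:.0f} MHz)")
```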

Reply 8 of 33, by Tiido

Rank l33t

My main use 486 is being revived finally. It ended up having bad RAM, right now I'm testing new RAM on it and afterwards hopefully the win95 installation on it isn't corrupt (there was a scandisk with that bad RAM...) and I can give it a go there.

EDIT: That machine has a more serious problem than RAM, the old stick tests fine in another machine and new good stick tests bad in this one...

So I set up another 486 machine based on 100MHz AMD DX4 and 32MB RAM... and that ended up accidentally running Win98SE, with optimisations for a PIII machine I normally use it with... I realised my mistake only when I saw windows boot logo appear and then I was in for an exercise in patience 🤣. After a looooooong waiting time, it managed to install all the new hardware and I could finally run the program although general UI navigation took looooong time, you could see screen updates go by so slowly... Unfortunately the test results are all zeros with nonsensical decode speed.

I then went the HXDOS route, which I should have done from the start, and ran the thing again with a 133MHz AMD 5x86. This time I timed the execution and it was near 1 minute and 50 seconds, slower than the playback time of the test MP3 file by a large margin.

T-04YBSC, a new YMF71x based sound card & Official VOGONS thread about it
Newly made 4MB 60ns 30pin SIMMs ~
mida sa loed ? nagunii aru ei saa 😜

Reply 9 of 33, by nocash

Rank Newbie

Thanks for the testing efforts!
The 134,472ms decoding time on a pre-Pentium 133MHz PC is at least twice as slow as what I had hoped for.
But with the /fast /half /mono switches it should be about 3x faster, which would be almost twice as fast as needed for realtime decoding on that PC.
(Longer reply on the nesdev forum: https://forums.nesdev.org/viewtopic.php?p=296002#p296002).

Reply 10 of 33, by Many Bothans

Rank Newbie

nocash, I'm playing around with my 486 system this weekend and might be able to get you some benchmarks for the upper end... give me an hour or two.

  • Zenith Z386SX-20, 8MB FPM, Video 7 1024i, Unhoused
  • AOpen AP43, Am5x86-133@160, 1MB L2, 128MB FPM, Stealth III S540 32MB Savage4, SB32
  • ITX-Llama, 3Dfx V3
  • Asus CUV4X-E, P3-933, 512MB PC133, Hercules 3D Prophet II MX 32MB, SB Live!

Reply 11 of 33, by Many Bothans

Rank Newbie

Okay, here are the results using the AOpen AP43 in my sig with an AMD Am5x86-P75 (133 ADW) and Cyrix 5x86-100.

Make	Model	Speed (MHz)	FSB (MHz)	Multi	Decode (ms)
Cyrix	5x86	120	40	3	40,663
Cyrix	5x86	100	50	2	45,914
Cyrix	5x86	100	33	3	48,979
AMD	Am5x86	160	40	4	50,126
AMD	Am5x86	150	50	3	50,855
Cyrix	5x86	80	40	2	59,960
AMD	Am5x86	133	33	4	63,540
AMD	Am5x86	120	40	3	65,610
Cyrix	5x86	66	33	2	72,499
AMD	Am5x86	100	33	3	77,355
AMD	Am5x86	75	25	3	102,654

The Cyrix 5x86 likes your code better than the AMD.
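To put a rough number on that, the decode times can be compared at matching clock and FSB settings. This is a minimal sketch in Python using only the figures from the table above; the helper names are made up for illustration, and the cycle count is simply clock rate times decode time, not a measured value.

```python
# (make, clock MHz, FSB MHz, decode time ms), copied from the table above
RESULTS = [
    ("Cyrix", 120, 40, 40663),
    ("Cyrix", 100, 50, 45914),
    ("Cyrix", 100, 33, 48979),
    ("AMD", 160, 40, 50126),
    ("AMD", 150, 50, 50855),
    ("Cyrix", 80, 40, 59960),
    ("AMD", 133, 33, 63540),
    ("AMD", 120, 40, 65610),
    ("Cyrix", 66, 33, 72499),
    ("AMD", 100, 33, 77355),
    ("AMD", 75, 25, 102654),
]

def total_cycles(mhz, decode_ms):
    """Clock cycles spent on the whole test file: (MHz * 1e6) * (ms / 1e3)."""
    return mhz * 1000 * decode_ms

def lookup(make, mhz, fsb):
    """Decode time in ms for one table row."""
    for row_make, row_mhz, row_fsb, decode_ms in RESULTS:
        if (row_make, row_mhz, row_fsb) == (make, mhz, fsb):
            return decode_ms
    raise KeyError((make, mhz, fsb))

def amd_to_cyrix_ratio(mhz, fsb):
    """How many times longer the AMD takes than the Cyrix at the same settings."""
    return lookup("AMD", mhz, fsb) / lookup("Cyrix", mhz, fsb)

if __name__ == "__main__":
    for mhz, fsb in ((100, 33), (120, 40)):
        print(mhz, fsb, round(amd_to_cyrix_ratio(mhz, fsb), 2))
```

At the two settings both chips were run at (100MHz at 33x3 and 120MHz at 40x3), the AMD takes roughly 1.6x as long as the Cyrix.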

Nothing exotic on my setup, standard voltage/BIOS settings for 60ns FPM RAM and 15ns WB L2 cache. 120MHz is the best my Cyrix can do, been chasing a 5x86-133 for what seems like decades now.

Result files attached for review...


Reply 12 of 33, by Many Bothans

Rank Newbie

Hit the per post attachment limit, here's the balance of my results.

Clean install of MS Win98SE on a CF card, btw.


Reply 13 of 33, by nocash

Rank Newbie

Wow, sweet! Somehow your AMD 5x86-133 is twice as fast as Tiido's (big relief), and most of your processors seem to be fast enough for realtime decoding, without even needing the commandline switches for reduced quality : )

Taking the three slowest ones:

The decoding on Cyrix-66 and AMD-100 seems to be 7-12ms faster than the song duration. Did you test if the audio comes out without stuttering on those processors (just in case the audio playback adds extra cpu/bus load)?
The disk loading takes place before starting playback, so (with enough RAM) the disk loading shouldn't affect the playback time (but it might add a long delay before starting playback, especially when loading longer songs from slow disk drives - if that's a problem then I could change it to on-the-fly loading).

The AMD-75 is a bit too slow, but I guess even that should work comfortably when adding 1-2 commandline switches (either /mono, or /half /fast).

I have no experience with 5x86, my first internet-capable PC was a 80486DX2-66 with win95, it would be neat if the mp3 decoder would work on such hardware, too. Looking at this table (Re: The Ultimate 486 Benchmark Comparison), the ALU timings for Cyrix 5x86 seem to be about the same as for Intel 80486, so it might work on Intel 80486 chips, too (?)
I am a bit confused because the name 5x86 makes it sound more powerful than 486, but as far as I understand that seems to be true only when it comes to FPU opcodes (which would be no problem here, as I am using only integer maths, with lots of imul opcodes).

PS. What OS did you use for the tests?

Reply 14 of 33, by nocash

Rank Newbie

PPS. Does your mainboard have unusual upgrades like super-fast RAM and extra caches that won't be found on regular 486 boards?
The 1MB cache looks huge, but - I think - my decoder should be fine with less than that (maybe 256KB for variables including a wastefully huge huffman lookup table).

Reply 15 of 33, by jtchip

Rank Member
nocash wrote on 2024-09-21, 23:11:

I am a bit confused because the name 5x86 makes it sound more powerful than 486, but as far as I understand that seems to be true only when it comes to FPU opcodes (which would be no problem here, as I am using only integer maths, with lots of imul opcodes).

There are two 5x86s: the AMD Am5x86-P75 is simply a 133MHz DX4 (4x33), so it's just a higher-clocked 486. The Cyrix Cx5x86 is slightly faster per-clock (only at integer instructions) as it's supposedly a cut-down 6x86, but it is still non-superscalar and on the 486 bus.

Reply 16 of 33, by Many Bothans

Rank Newbie
nocash wrote on 2024-09-21, 23:11:

Did you test if the audio comes out without stuttering on those processors (just in case the audio playback adds extra cpu/bus load)?

I only played it a handful of times and there was no audible stutter at 100MHz+ speeds.

nocash wrote on 2024-09-21, 23:11:

The AMD-75 is a bit too slow, but I guess even that should work comfortably when adding 1-2 commandline switches (either /mono, or /half /fast).

Yes, playback was stutter free with the /half switch in the 50~75MHz range.

nocash wrote on 2024-09-21, 23:11:

I have no experience with 5x86, my first internet-capable PC was a 80486DX2-66 with win95, it would be neat if the mp3 decoder would work on such hardware, too. Looking at this table (Re: The Ultimate 486 Benchmark Comparison), the ALU timings for Cyrix 5x86 seem to be about the same as for Intel 80486, so it might work on Intel 80486 chips, too (?)

I have further tested with an Intel 486DX2-50 (and OC'd to 66) and an Intel 486SX-25 (with and without L2 cache, as you'd likely never see an SX paired with 1MB L2 in the wild.)

nocash wrote on 2024-09-21, 23:11:

I am a bit confused because the name 5x86 makes it sound more powerful than 486, but as far as I understand that seems to be true only when it comes to FPU opcodes (which would be no problem here, as I am using only integer maths, with lots of imul opcodes).

PS. What OS did you use for the tests?
PPS. Does your mainboard have unusual upgrades like super-fast RAM and extra caches that won't be found on regular 486 boards?
The 1MB cache looks huge, but - I think - my decoder should be fine with less than that (maybe 256KB for variables including a wastefully huge huffman lookup table).

I think jtchip covered the 5x86 situation. All runs were done on Windows 98 Second Edition on a CF card.

My AOpen AP43 system is not period correct... It has been upgraded to the max fast page mode RAM that would fit on the motherboard and the max L2 cache capacity of the SiS 496/497 486 chipset. Some 486 PCI boards based on the UMC UM8881 or ALi M1489/M1487 chipsets can use faster EDO RAM and can support the same 1MB L2 cache (but maybe even PBurst?)

Here are the runs with the balance of my 486 chips:

Make	Model	Speed (MHz)	FSB (MHz)	Multi	Decode (ms)
Intel	486DX2	66	33	2	117,499
Intel	486DX2	50	25	2	156,670
Intel	486SX	33	33	1	223,563
Intel	486SX-NoL2	33	33	1	240,540
Intel	486SX	25	25	1	296,439
Intel	486SX-NoL2	25	25	1	314,215

The 486SX needed /quarter & /mono switches for mostly stutter free playback.


Reply 17 of 33, by Cyberdyne

Rank Oldbie

Windows 3.1 + Winplay does realtime with a fast 486, what am I missing here?

I am aroused about any X86 motherboard that has full functional ISA slot. I think i have problem. Not really into that original (Turbo) XT,286,386 and CGA/EGA stuff. So just a DOS nut.
PS. If I upload RAR, it is a 16-bit DOS RAR Version 2.50.

Reply 18 of 33, by nocash

Rank Newbie
Many Bothans wrote on 2024-09-22, 19:31:

Here are the runs with balance of my 486 chips...
The 486SX needed /quarter & /mono switches for mostly stutter free playback.

Many thanks! The 486SX-25 seems to be a bit too slow for stutter-free playback. Did the 486SX-33 stutter, too? The decoding time looks as if it could be stutter-free when using the fastest switch settings (perhaps even with /half rate, without needing to go down to /quarter rate).

I have added the test results for Cyrix, AMD, Intel chips on the webpage. I also calculated the decompression speed in clks/second (based on the decoding time vs song duration vs cpu clock), and added a column with the "cpu load" (decoding time vs song duration). A value bigger than 1.000 means that the cpu is too slow for perfect quality (but it may work with reduced output quality).
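For reference, the two derived columns can be sketched as follows. This is a minimal Python sketch of how such values could be computed, not the code used for the webpage; the function names are made up, and the song duration in the example is a placeholder because the real length of the test song is not stated in this thread.

```python
def cpu_load(decode_ms, song_ms):
    """Decoding time relative to song duration; above 1.0 is too slow for realtime."""
    return decode_ms / song_ms

def clocks_per_audio_second(cpu_mhz, decode_ms, song_ms):
    """CPU clock cycles needed to decode one second of audio."""
    return cpu_mhz * 1_000_000 * cpu_load(decode_ms, song_ms)

if __name__ == "__main__":
    SONG_MS = 60_000                      # placeholder duration, NOT the real test song
    load = cpu_load(45_000, SONG_MS)      # hypothetical 45 s decode time
    clks = clocks_per_audio_second(100, 45_000, SONG_MS)
    print(f"cpu load {load:.3f}, {clks:.0f} clks per second of audio")
```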

The table with the different switch combinations now also includes a "speed" column that indicates how much faster it's getting with each switch combination. Generally, it should work if the "speed" is a bit higher than the "cpu load" from the previous table.
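That rule of thumb can be written down as a small check. This is a minimal Python sketch under assumptions: the 10% safety margin stands in for "a bit higher" and is an arbitrary choice, the function name is made up, and the example numbers are placeholders rather than values from the webpage tables.

```python
def should_play(cpu_load, speed, margin=1.1):
    """True if the switch speed-up exceeds the cpu load by the safety margin.

    margin=1.1 (10% headroom) is an assumed value for "a bit higher".
    """
    return speed >= cpu_load * margin

if __name__ == "__main__":
    # Placeholder numbers: a cpu load of 1.4 at full quality,
    # checked against no switches (1.0x) and against a 3x speed-up.
    for speed in (1.0, 3.0):
        print(f"speed {speed}x:", "ok" if should_play(1.4, speed) else "too slow")
```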

The switch settings can make the decoding about 3x faster. If that isn't enough, and if one wants to go through the hassle of resampling the mp3 files before playback, there's now also a table with switch settings for low-quality mp3's. That can be 4x faster with good quality (22kHz output), or 8x faster with low quality (11kHz output).

Cyberdyne wrote on 2024-09-23, 08:47:

Windows 3.1 + Winplay does realtime with a fast 486, what am I missing here?

Depends on what you are up to. The thread topic is testing if the asm decoder works on old hardware, and if it's faster/slower than other decoders. And realtime, yes, it's a realtime decoder, too.

With the 486 benchmarks, I am quite optimistic that the asm decoder could also work on a 386, although I don't know at which quality with how many MHz.

Reply 19 of 33, by analog_programmer

Rank Oldbie

According to this video MPXPLAY for DOS can play 256kbit stereo MP3s on i486DX4-100 in real time almost flawlessly: https://www.youtube.com/watch?v=b0zZpzxHSeM

Here is another video with MPXPLAY and Am486DX2-66 and different quality MP2s, MP3s (problems with higher quality tracks) and FLAC: https://www.youtube.com/watch?v=Zm5s_Le7TV4

I think Am5x86-133 will play 256kbit stereo MP3s in real time with no problems.

As for the Am386DX-40 with coprocessor... maybe something like 8 or 16kbit MP3 in mono. I don't know if there's a coprocessor for the fastest 386-class CPU, the UMC Green U5D. So, probably some very low quality MP3s can be played in real time on a 386 system, but who will listen to such cr*ppy tracks?

The word Idiot refers to a person with many ideas, especially stupid and harmful ideas.
This world goes south since everything's run by financiers and economists.
This isn't voice chat, yet some people overusing online communications talk and hear voices.