VOGONS


Duke Nukem 3D on an i80386

First post, by Tronix

Rank Member

Hello,

Today I tried to run Duke Nukem 3D on my Am386-DX40 with 128 KB cache and 8 MB RAM: the game console starts, displays the version, displays "loading strings", and then I get a DOS4GW error: "06h - invalid opcode". The bytes at CS:EIP are 0F C8, which is BSWAP, a 486+ instruction. I downloaded the original Duke3D source code (very hard to find, and I have now lost the link again), found all BSWAP instructions in the sources and replaced them with:

  "rol ax, 8",\
  "rol eax, 16",\
  "rol ax, 8"\

I am using the Watcom 11.0 compiler. I recompiled the Build engine and the game with the /3r compiler option instead of /5r.
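
As a sanity check, the three-rotate replacement for BSWAP can be modelled in portable C and compared against a plain byte swap. This is only a sketch: the helper names (`rol16`, `rol32`, `bswap_386`, `bswap_ref`, `check_bswap_386`) are made up for illustration and do not come from the Duke3D or Build sources.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate helpers; n must be in 1..15 (rol16) or 1..31 (rol32). */
static uint16_t rol16(uint16_t v, unsigned n)
{
    return (uint16_t)(((uint32_t)v << n) | ((uint32_t)v >> (16 - n)));
}

static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Models: rol ax, 8 ; rol eax, 16 ; rol ax, 8 */
static uint32_t bswap_386(uint32_t eax)
{
    eax = (eax & 0xFFFF0000u) | rol16((uint16_t)eax, 8); /* rol ax, 8   */
    eax = rol32(eax, 16);                                /* rol eax, 16 */
    eax = (eax & 0xFFFF0000u) | rol16((uint16_t)eax, 8); /* rol ax, 8   */
    return eax;
}

/* Reference byte swap, i.e. what BSWAP EAX computes on a 486+. */
static uint32_t bswap_ref(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical self-check; call it from main() in a test build. */
void check_bswap_386(void)
{
    assert(bswap_386(0x11223344u) == 0x44332211u);
    assert(bswap_386(0xDEADBEEFu) == bswap_ref(0xDEADBEEFu));
}
```

One difference the model does not capture: ROL updates the carry flag while BSWAP leaves the flags untouched, which only matters if surrounding code relies on flags across the swap.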

/3r - (wcc386/wpp386 only) generate 386 instructions based on 386 instruction timings, and use register-based argument-passing conventions
/5r - (wcc386/wpp386 only) generate 386 instructions based on Intel Pentium instruction timings, and use register-based argument-passing conventions (default)

Then I bound a new DOS extender, DOS32A, to the EXE file.
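
To double-check that a rebuilt binary no longer contains the offending opcode, one option is a naive scan for the BSWAP encoding (0F C8 through 0F CF, one per 32-bit register). This is only a heuristic sketch: the function name `count_bswap_candidates` is hypothetical, it assumes the code image is not compressed, and because it does not disassemble anything, data bytes that happen to match will show up as false positives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts byte pairs that look like BSWAP r32 (0F C8+r, r = 0..7). */
size_t count_bswap_candidates(const uint8_t *buf, size_t len)
{
    size_t hits = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0x0F && (buf[i + 1] & 0xF8u) == 0xC8u)
            hits++;
    }
    return hits;
}

/* Hypothetical self-check on a hand-made byte sequence. */
void check_count_bswap_candidates(void)
{
    /* nop ; bswap eax ; bswap edi ; 0F D0 (not a BSWAP encoding) */
    static const uint8_t sample[] = { 0x90, 0x0F, 0xC8, 0x0F, 0xCF, 0x0F, 0xD0 };
    assert(count_bswap_candidates(sample, sizeof sample) == 2);
    assert(count_bswap_candidates(sample, 0) == 0);
}
```

In practice the buffer would be filled by reading the EXE from disk; any hit still needs to be confirmed in a disassembler.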

So the game started normally on my 386. It is slow, about 5 fps with high detail, shadows and full screen, but it works. If anyone is interested, you can download my precompiled EXE from here: Duke Nukem 3D v1.4/i386 - Atomic Edition
UPD: the link is broken, so I have attached the precompiled Duke3D to this message.

PS: UPD: I found the source code at http://legacy.3drealms.com/downloads.html

Last edited by Tronix on 2018-01-30, 06:26. Edited 1 time in total.

https://github.com/Tronix286/

Reply 1 of 34, by noshutdown

Rank Oldbie

5 fps in Duke3D on a 386 is awesome; I got 1.5 fps in Doom on a 486DX-25.

Reply 2 of 34, by HighTreason

Rank Oldbie

That's awesome! So awesome that I posted it over at Duke4.net forums because there was a discussion there about the lowest spec the game would run on a while back.

Oh, I keep a mirror of 3D Realms' FTP, so if the new owners (Bunch of poseur dickheads) ever take those pages down again, it's in my sig. Edit: It was meant to be anyway. It is now.

Last edited by HighTreason on 2015-06-13, 13:29. Edited 1 time in total.

My Youtube - My Let's Plays - SoundCloud - My FTP (Drivers and more)

Reply 3 of 34, by leileilol

Rank l33t++

I would think the CON system would have a large overhead. Makes me wonder how Shadow Warrior could perform, or maybe just the Build editor's 3D view.

long live PCem

Reply 4 of 34, by Tronix

Rank Member
leileilol wrote:

makes me wonder how Shadow Warrior could perform

I tried to recompile Shadow Warrior, but I have the same problem as in https://forums.3drealms.com/vb/showthread.php?t=16657. I am not registered on the 3D Realms forums, so I can't download the attached htfile.zip. If anyone is registered, please download this file and attach it to this thread...

Reply 6 of 34, by boxpressed

Rank Oldbie

Very cool. I know that you can turn on framerate with "DNRATE", but is there a way to get an average framerate for a demo loop (like Quake), or do you just have to observe the dynamic framerate on the opening demo?

Reply 7 of 34, by Tronix

Rank Member
boxpressed wrote:

Very cool. I know that you can turn on framerate with "DNRATE", but is there a way to get an average framerate for a demo loop (like Quake), or do you just have to observe the dynamic framerate on the opening demo?

I start a new game, then type DNRATE and observe the average framerate in the real game. If I turn off shadows and shrink the game window, I get ~15-18 fps, so I finished the first Duke level on a 386 -)

keropi wrote:

^ here ya go

Thank you. I have now compiled the Shadow Warrior game EXE to run on an i386, but Shadow Warrior needs more than 9 MB to start. My 386 has 8 MB and I can't increase the memory size (I don't have 2 MB or 4 MB SIMMs). So I tried disabling the memory check; the game started, but after a few seconds it crashed with stack or heap overflow messages. If anyone has a 386 with 12 MB RAM, they might try the attached EXE...

Reply 8 of 34, by wbc

Rank Member
Tronix wrote:

Thank you. I have now compiled the Shadow Warrior game EXE to run on an i386, but Shadow Warrior needs more than 9 MB to start. My 386 has 8 MB and I can't increase the memory size (I don't have 2 MB or 4 MB SIMMs). So I tried disabling the memory check; the game started, but after a few seconds it crashed with stack or heap overflow messages. If anyone has a 386 with 12 MB RAM, they might try the attached EXE...

Try using a swap file on the hard disk (not sure whether it will work with DOS32A, but with DOS4GW it seems to run fine): http://ukrfaq.narod.ru/ru/game/games4mb.htm

--wbcbz7

Reply 10 of 34, by Snayperskaya

Rank Member

Hey, that's actually quite playable! I'd expect 10fps or lower on a 386 machine.

Reply 11 of 34, by elianda

Rank l33t

On my 386DX-40, DNRATE shows 7 to 9 fps without sound (low detail, shadows off, 320x200).
Enabling sound (SB 2.0, 8 voices, 22 kHz) and music on the GUS costs 1 fps.

So it's really quite playable.

Retronn.de - Vintage Hardware Gallery, Drivers, Guides, Videos. Now with file search
Youtube Channel
FTP Server - Driver Archive and more
DVI2PCIe alignment and 2D image quality measurement tool

Reply 12 of 34, by idspispopd

Rank Oldbie
Tronix wrote:

Hello,

Today I tried to run Duke Nukem 3D on my Am386-DX40 with 128 KB cache and 8 MB RAM: the game console starts, displays the version, displays "loading strings", and then I get a DOS4GW error: "06h - invalid opcode". The bytes at CS:EIP are 0F C8, which is BSWAP, a 486+ instruction. I downloaded the original Duke3D source code (very hard to find, and I have now lost the link again), found all BSWAP instructions in the sources and replaced them with:

  "rol ax, 8",\
  "rol eax, 16",\
  "rol ax, 8"\

No idea if these instructions are relevant for performance, but I was wondering whether these could be optimized.
A rol ax, 8 could be replaced by xchg ah, al. Both should take 3 cycles on a 386, but the rol instruction uses 3 bytes while xchg reg,reg only uses 2 bytes (the 1-byte xchg reg,accum form exists only for 16/32-bit registers, so it does not apply to ah/al). If the code runs in 32-bit mode, xchg also won't need the operand-size prefix byte that rol with a 16-bit register needs. I don't know if any stalls can happen on a 386, but IIRC those are a Pentium/PPro thing.
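
A sketch of that variant, modelled in portable C: `xchg ah, al` swaps the two low bytes, so xchg / rol eax, 16 / xchg should again amount to a full byte swap. The function and constant names are made up for illustration, and the byte counts assume a 32-bit code segment (so 16-bit operands carry a 66h prefix).

```c
#include <assert.h>
#include <stdint.h>

/* Models `xchg ah, al`: swap the two low bytes, keep the upper 16 bits. */
static uint32_t xchg_ah_al(uint32_t eax)
{
    return (eax & 0xFFFF0000u) | ((eax & 0x000000FFu) << 8) |
           ((eax & 0x0000FF00u) >> 8);
}

/* Models `rol eax, 16`: swap the upper and lower 16-bit halves. */
static uint32_t rol_eax_16(uint32_t eax)
{
    return (eax << 16) | (eax >> 16);
}

/* xchg ah, al ; rol eax, 16 ; xchg ah, al */
static uint32_t bswap_xchg(uint32_t eax)
{
    eax = xchg_ah_al(eax);
    eax = rol_eax_16(eax);
    return xchg_ah_al(eax);
}

/* Encoded sizes in bytes (assumption: 32-bit code segment). */
enum {
    SIZE_ROL_AX_8      = 4, /* 66 C1 C0 08 */
    SIZE_ROL_EAX_16    = 3, /* C1 C0 10    */
    SIZE_XCHG_AH_AL    = 2, /* 86 /r       */
    SIZE_ROL_SEQUENCE  = 2 * SIZE_ROL_AX_8 + SIZE_ROL_EAX_16,
    SIZE_XCHG_SEQUENCE = 2 * SIZE_XCHG_AH_AL + SIZE_ROL_EAX_16
};

/* Hypothetical self-check; call it from main() in a test build. */
void check_bswap_xchg(void)
{
    assert(bswap_xchg(0x11223344u) == 0x44332211u);
    assert(SIZE_ROL_SEQUENCE == 11);
    assert(SIZE_XCHG_SEQUENCE == 7);
}
```

Under these assumptions the xchg form saves 4 bytes per swap; whether that is measurable in frame rate is a separate question.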

Reply 13 of 34, by alexanrs

Rank l33t

I wonder how this would perform on a Blue Lightning machine xD

Reply 14 of 34, by Scali

Rank l33t
idspispopd wrote:

No idea if these instructions are relevant for performance, but I was wondering whether these could be optimized.

I can't really think of anything in the inner loop of a software renderer that would benefit from bswap, so I wonder what exactly they are using it for.
Is it in the inner loop at all?
And if so, what is its purpose? It may well be that they only need a few of the bytes, instead of all of them, so perhaps you don't need to do a full bswap, but only two of the three instructions.

Edit: Browsed the code quickly, and I see it only in two places:
1) A CRC routine for the level data. We need all 4 bytes for this. However, it shouldn't be performance-critical.
2) There's a copybufreverse() function, which uses it for the mirror-effect, to mirror the screen x-wise.

So I think it only affects performance when there are mirror effects in view.
I wonder if there is a faster way to do copybufreverse() on 386.
You could do it with word loads/stores so you'd only need the xchg, not the rol eax, 16. But you'd need two extra mov instructions, so I doubt it's faster.
But you could use stosw/stosd instead of the mov/add which they do.
Perhaps you could also reverse the whole routine, and then use lodsd to read data forward and push eax to write it in reverse order.

All in all I don't think it's going to affect performance that much, because the mirror effect is not used everywhere.
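
For reference, here is roughly what a reversed copy built on a 32-bit byte swap looks like in portable C, next to the word-based variant mentioned above. This is a sketch under assumptions: the names `copy_reversed_*` and the `dst[i] = src[n - 1 - i]` contract are chosen for illustration and are not the actual copybufreverse() signature or code from the Build sources. Source and destination must not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Byte-wise reference: dst[i] = src[n - 1 - i]. */
void copy_reversed_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}

/* Dword variant: load 4 bytes from the end, byte-swap, store at the front. */
void copy_reversed_dwords(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; n - i >= 4; i += 4) {
        uint32_t v;
        memcpy(&v, src + (n - i - 4), sizeof v);
        v = swap32(v);
        memcpy(dst + i, &v, sizeof v);
    }
    for (; i < n; i++)          /* leftover 0..3 bytes */
        dst[i] = src[n - 1 - i];
}

/* Word variant: only a 16-bit swap per step, but twice as many steps. */
void copy_reversed_words(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; n - i >= 2; i += 2) {
        uint16_t v;
        memcpy(&v, src + (n - i - 2), sizeof v);
        v = swap16(v);
        memcpy(dst + i, &v, sizeof v);
    }
    for (; i < n; i++)          /* leftover 0..1 byte */
        dst[i] = src[n - 1 - i];
}

/* Hypothetical self-check: all three variants must agree. */
void check_copy_reversed(void)
{
    const uint8_t src[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    uint8_t a[10], b[10], c[10];
    copy_reversed_bytes(a, src, sizeof src);
    copy_reversed_dwords(b, src, sizeof src);
    copy_reversed_words(c, src, sizeof src);
    assert(memcmp(a, b, sizeof a) == 0);
    assert(memcmp(a, c, sizeof a) == 0);
    assert(a[0] == 9 && a[9] == 0);
}
```

On a 386 the `swap32` step is where the three-instruction rol/xchg sequence would sit; whether the word variant ends up faster would have to be measured on real hardware.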

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 15 of 34, by elianda

Rank l33t

Here is a capture of how it runs on one of my 386s: https://www.youtube.com/watch?v=Rt6xf43nnFg

Reply 16 of 34, by xjas

Rank l33t

^^ Why didn't you use the GUS for sound? Wouldn't you see some benefit from hardware mixing?

twitch.tv/oldskooljay - playing the obscure, forgotten & weird - most Tuesdays & Thursdays @ 6:30 PM PDT. Bonus streams elsewhen!

Reply 17 of 34, by elianda

Rank l33t

The GUS uses software mixing in Duke3D. This has additional overhead since it has to transfer the sound buffer each time. On the 386 it is slower than SB due to this additional overhead.
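
To illustrate why software mixing costs CPU time: every output sample has to be summed from all active voices before the finished buffer can be handed to the card. A minimal sketch for unsigned 8-bit samples follows; it is a generic illustration with a hypothetical `mix_voices` function, not code from Duke3D or its sound library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mixes nvoices unsigned 8-bit streams into out, clamping on overflow. */
void mix_voices(uint8_t *out, const uint8_t *const *voices,
                size_t nvoices, size_t nsamples)
{
    for (size_t s = 0; s < nsamples; s++) {
        int acc = 0;
        for (size_t v = 0; v < nvoices; v++)
            acc += (int)voices[v][s] - 128;   /* unsigned -> signed */
        if (acc > 127)
            acc = 127;
        if (acc < -128)
            acc = -128;
        out[s] = (uint8_t)(acc + 128);        /* signed -> unsigned */
    }
}

/* Hypothetical self-check with two short voices. */
void check_mix_voices(void)
{
    static const uint8_t v0[3] = { 128, 200, 255 };
    static const uint8_t v1[3] = { 128, 128, 255 };
    const uint8_t *const voices[2] = { v0, v1 };
    uint8_t out[3];
    mix_voices(out, voices, 2, 3);
    assert(out[0] == 128);  /* silence + silence */
    assert(out[1] == 200);  /* one voice passes through */
    assert(out[2] == 255);  /* sum clamps at full scale */
}
```

The work grows with voices times samples, for example 8 voices at 22 kHz, and on top of that comes the transfer of the mixed buffer to the card.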

Reply 18 of 34, by SquallStrife

Rank l33t
elianda wrote:

The GUS uses software mixing in Duke3D.

For sounds, right? It should use the GF1 for music playback, and the performance would be the same as AdLib, since OPL2/3 also does hardware 'mixing'.

VogonsDrivers.com | Link | News Thread

Reply 19 of 34, by elianda

Rank l33t

Exactly, that's why I chose SB for sound and GUS for music, as seen in the video.
