VOGONS


First post, by Marco

Rank Member

Hello all,

I have a fairly deep technical question. I already know a lot about DOS extenders and their advantages: direct memory access and bypassing the 640K barrier.

What it is about:
- With the introduction of DOS4GW, all games using this protected mode came with higher hardware requirements, mainly 4 MB of RAM.
- This also applies to games that are technically identical to their predecessors with real-mode "engines", mainly Sierra adventures. Why is that?

System setup:
- 386SX/25 with 2 MB, later 4 MB of RAM
- Focus on Sierra games like Quest for Glory 3 and 4, Police Quest 4, Gabriel Knight

Examples:
- Sierra's adventures using the DOS4GW extender needed at least 4 MB of RAM, even though they offered the same graphical detail as their predecessors.
- Even better: I had a Sierra On-Line demo CD with a demo version of Quest for Glory 4 that ran in real mode without DOS4GW on my 386SX/25. Result: flawless and smooth as usual, no issues with 2 MB of RAM. The final version of QfG4 used DOS4GW, was terribly slow and required 4 MB of RAM. The upgrade cost me 400 DM back in the day, by the way.
Larry 6, an exception as a non-DOS4GW game, also ran fantastically.

Question:
Why is that? The 386SX supports 32-bit addressing and instructions. Is it because it has to split 32-bit transfers across its 16-bit bus, which could double the number of CPU cycles per instruction (wait states)? Unfortunately I never had a 386DX/25 to compare against, but I found some forum remarks on the internet stating that 386SX performance falls quite far behind its DX counterpart when using DOS4GW.

Thanks a lot

Reply 1 of 11, by Jo22

Rank l33t++

Re: What is the main bottleneck in dos programming?

Not sure if that's helpful, though.
Maybe the 386SX has to use some external logic to interface with the 286 bus (ISA).

Technically, the 386DX can work in 386SX "mode", too, by the way.
It can transfer in 16-bit chunks like a 286 or 386SX (the 386SX did not exist yet when the original 80386 was made).

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 2 of 11, by mkarcher

Rank Oldbie

DOS4GW most likely isn't the cause of the performance issue, but just one piece of the puzzle. Performance in protected mode and in real mode is not significantly different. You lose some performance when you run with paging enabled (EMM386 does that, for example), but unless EMM386 is already loaded, DOS extenders like DOS4GW don't enable paging.
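For context on the paging cost: with paging enabled, the 386 translates every linear address through a two-level table (10-bit directory index, 10-bit table index, 12-bit offset into a 4 KB page), and each TLB miss costs at least two extra memory reads. A minimal Python sketch of that arithmetic; the 32-entry TLB is the 386's, but the miss count is a made-up input, not a measurement:

```python
# Sketch of 386 two-level paging arithmetic (4 KB pages).
# TLB size is the 386's 32 entries; the miss count below is hypothetical.

PAGE_SIZE = 4 * 1024
TLB_ENTRIES = 32

def split_linear(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (addr >> 22) & 0x3FF  # top 10 bits select the page directory entry
    table = (addr >> 12) & 0x3FF      # next 10 bits select the page table entry
    offset = addr & 0xFFF             # low 12 bits: offset inside the 4 KB page
    return directory, table, offset

def page_walk_reads(tlb_misses: int) -> int:
    """Each TLB miss reads one directory entry and one table entry
    (accessed/dirty-bit updates are ignored here)."""
    return 2 * tlb_misses

# Memory the TLB can map at once: 32 * 4 KB.
tlb_reach = TLB_ENTRIES * PAGE_SIZE

print(split_linear(0x00403004))
print(tlb_reach // 1024, "KB of TLB reach")
print(page_walk_reads(1000), "extra memory reads for 1000 TLB misses")
```

In this model, code whose working set stays within the TLB's reach pays almost nothing for paging, which fits the point that the overhead is "some performance" rather than a dominant cost.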

DOS4GW is a DOS extender that came with the Watcom C/C++ compiler. So code using DOS4GW was most likely compiled with that compiler, as 32-bit code. And that is the main point. In 32-bit code, all pointers take 32 bits, and integers (by default) use 32 bits, too. In standard 16-bit DOS code, most integer variables are just 16 bits wide, and depending on the memory model, pointers might also be limited to 16 bits. At the same time as going from a real-mode 16-bit game engine to a 32-bit game engine, a lot of highly optimized assembler code was probably dropped and replaced by more maintainable C code. If code uses 32-bit pointers and 32-bit integers, it issues a lot of 32-bit memory access cycles. Those are twice as fast on the 386DX as on the 386SX. So the bus is a bottleneck mostly for 32-bit code. The performance penalty of the 386SX on 32-bit code, combined with less optimized (but possibly more generic) game engine code, is likely the explanation for the low performance of the 386SX in 32-bit Sierra games.
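The bus argument can be made concrete with a back-of-the-envelope model. This Python sketch assumes naturally aligned accesses, no cache, and a bus cycle of 2 clocks plus a fixed number of wait states (2 clocks is the 386's minimum non-pipelined bus cycle). The workload sizes and the single wait state are made-up numbers, not measurements of any Sierra game, and instruction fetches, which compete for the same bus, are left out:

```python
# Back-of-the-envelope model of data-bus cost: 386SX (16-bit bus) vs 386DX (32-bit bus).
# Assumptions: aligned accesses, no cache, no instruction fetches counted,
# and each bus cycle takes 2 CPU clocks plus a fixed number of wait states.

SX_BUS, DX_BUS = 2, 4  # data bus width in bytes

def bus_cycles(access_bytes: int, bus_bytes: int) -> int:
    """Bus cycles for one aligned access (ceiling division)."""
    return -(-access_bytes // bus_bytes)

def workload_clocks(accesses: dict, bus_bytes: int, wait_states: int = 0) -> int:
    """Total bus clocks for a workload given as {access size in bytes: count}."""
    return sum(count * bus_cycles(size, bus_bytes) * (2 + wait_states)
               for size, count in accesses.items())

# Hypothetical inner loop touching 10,000 variables.
code16 = {2: 10_000}  # 16-bit ints / near pointers
code32 = {4: 10_000}  # 32-bit ints / flat-model pointers

for name, work in (("16-bit code", code16), ("32-bit code", code32)):
    sx = workload_clocks(work, SX_BUS, wait_states=1)
    dx = workload_clocks(work, DX_BUS, wait_states=1)
    print(f"{name}: 386SX {sx} bus clocks, 386DX {dx} bus clocks")
```

In this model the 16-bit workload costs the same on both chips, while the 32-bit workload takes twice as many bus clocks on the SX.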

Reply 4 of 11, by pentiumspeed

Rank l33t

Don't miss the fact that the 386*SX* has a 16-bit data path even though the processor core is 32-bit. Every 32-bit transfer costs extra cycles, because it has to be done as two 16-bit transfers, and that penalty repeats for every 32-bit access, again and again.

If you had a 386DX, which is fully 32-bit, it would handle 32-bit data and instructions without that extra latency.

PS: If the 386SX motherboard has a cache, that helps a lot indeed.

Cheers,

Great Northern aka Canada.

Reply 5 of 11, by Marco

Rank Member

Thanks as well. That's what I initially meant by:

„Is it because it has to split 32-bit transfers across its 16-bit bus, which could double the number of CPU cycles per instruction (wait states)?“

I'd really like to see a benchmark of an identical app/game, once with a DOS extender and once in 16-bit real mode.

Reply 6 of 11, by AlexZ

Rank Member

The 386SX was crippled, so it was sort of like a 286 with 386 instructions, but not capable of executing 32-bit code very fast. It was meant mostly for 16-bit software, with the option of executing 32-bit code in theory. As far as I remember, it didn't even have a cache. As described above, it suffers huge penalties with 32-bit code.

A 386DX/40 is what you need to play early DOS4GW games. But only a few are playable, as memory (usually upgradable to 8 MB), the slow ISA bus (which affects video transfers; partially solvable by running ISA at 12 MHz) and the CPU became bottlenecks. It is true that those early games do not offer a better experience than the 16-bit real-mode ones coded for the 286.

Pentium III 900E, ECS P6BXT-A+, 384MB RAM, NVIDIA GeForce FX 5600 128MB, Voodoo 2 12MB, 80GB HDD, Yamaha SM718 ISA, 19" AOC 9GlrA
Athlon 64 3400+, MSI K8T Neo V, 1GB RAM, NVIDIA GeForce 7600GT 512MB, 250GB HDD, Sound Blaster Audigy 2 ZS

Reply 7 of 11, by bakemono

Rank Oldbie

It should also be noted that running in 32-bit protected mode is a choice of the developers, and one of the biggest reasons to make that choice is needing more than 640 KB of memory. It means the developers set out to make a heavier game. So the situation is more like "higher memory requirements lead to using DOS4GW" and not "using DOS4GW leads to higher memory requirements".
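For reference, the 640 KB limit falls out of real-mode address arithmetic: a 16-bit segment shifted left by 4 plus a 16-bit offset (so each segment spans at most 64 KB), with the PC memory map reserving everything from A000h:0000h upward for video memory and ROMs. A small Python sketch of that calculation:

```python
# Real-mode address arithmetic on the IBM PC (sketch).

def linear_address(segment: int, offset: int) -> int:
    """Real-mode linear address = segment * 16 + offset."""
    return (segment << 4) + offset

# Video memory starts at segment A000h, so ordinary DOS programs
# get the region below it: conventional memory.
conventional_top = linear_address(0xA000, 0x0000)

# Highest address reachable with a 16-bit segment and a 16-bit offset
# (the part above 1 MB is only usable on a 286+ with the A20 line enabled).
highest = linear_address(0xFFFF, 0xFFFF)

print(hex(conventional_top), conventional_top // 1024, "KB")
print(hex(highest))
```

Anything beyond that (large bitmaps, big game data sets) needs EMS/XMS juggling or, as described above, a protected-mode extender.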

If I'm not mistaken, there are also instructions on the 386/486 that take more cycles to execute in protected mode because of MMU overhead, so that can also slow things down a bit.

yet another retro game on itch: https://90soft90.itch.io/super-wild-war-22

Reply 8 of 11, by Jo22

Rank l33t++

AlexZ wrote on 2022-08-18, 15:04:

The 386SX was crippled, so it was sort of like a 286 with 386 instructions, but not capable of executing 32-bit code very fast. It was meant mostly for 16-bit software, with the option of executing 32-bit code in theory. As far as I remember, it didn't even have a cache. As described above, it suffers huge penalties with 32-bit code.

The 386SX really was a castrated i80386.
(i80386 is what the 386 was called originally, before the DX suffix was introduced.)

From what I remember, critics of the day were really disappointed by Intel's announcement of the 386SX (words like "lazy" and "boring" were used).
That's what computer magazines from the late 80s said, at least.

Ironically, the 80386/386DX already had the ability to do 16-bit bus transfers.
Its BS16# pin can be used to switch between 16-bit and 32-bit data size.
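To illustrate dynamic bus sizing: in a simplified model, a 386DX access is first split at dword boundaries, and if BS16# is asserted, each piece needs one bus cycle per 16-bit half it touches. The Python sketch below counts bus cycles only; it ignores wait states and the external data-lane steering logic, and the function names are made up for illustration:

```python
# Simplified model of 386DX bus cycles with and without BS16# (16-bit bus sizing).
# Counts bus cycles only; ignores wait states and data-lane steering logic.

LOW_HALF = {0, 1}   # byte lanes on D0-D15
HIGH_HALF = {2, 3}  # byte lanes on D16-D31

def dword_chunks(address: int, size: int) -> list:
    """Split an access into pieces that each stay inside one aligned dword."""
    chunks = []
    end = address + size
    while address < end:
        dword_end = (address | 3) + 1  # first address of the next dword
        piece_end = min(end, dword_end)
        chunks.append((address, piece_end - address))
        address = piece_end
    return chunks

def byte_lanes(address: int, size: int) -> set:
    """Byte lanes (0-3) enabled for a piece inside one dword (BE0#-BE3#)."""
    first = address & 3
    return set(range(first, first + size))

def bus_cycles_386dx(address: int, size: int, bs16: bool = False) -> int:
    """Bus cycles for one access; with BS16#, one cycle per 16-bit half touched."""
    cycles = 0
    for addr, n in dword_chunks(address, size):
        lanes = byte_lanes(addr, n)
        if bs16:
            cycles += int(bool(lanes & LOW_HALF)) + int(bool(lanes & HIGH_HALF))
        else:
            cycles += 1
    return cycles

for addr, size in ((0, 4), (2, 2), (1, 4)):
    print(f"{size} bytes at offset {addr}: "
          f"{bus_cycles_386dx(addr, size)} cycle(s) at 32 bits, "
          f"{bus_cycles_386dx(addr, size, bs16=True)} with BS16#")
```

In this model, the BS16# counts equal the number of aligned 16-bit words an access touches, i.e. what a 16-bit bus like the 386SX's needs for the same access.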

So the 386SX did not really introduce anything new. It relates to the 386DX like the 8088 relates to the 8086.
- At least it gave new life to intelligent, mature 80286 chipsets (they used to be great: UMBs and EMS in hardware, no EMM386/V86 needed).
That was maybe its only real reason to exist.

The only notably different variant was the 386SL, a low-power notebook processor.
It introduced System Management Mode (SMM) and power-saving support.

However, the most castrated spin-off was perhaps the 80376.
It had both 16-bit and 32-bit registers (AX, EAX etc.) but ran in 32-bit protected mode only.
No V86, no paging. But the MMU's segmentation unit was still operational.

AlexZ wrote on 2022-08-18, 15:04:

A 386DX/40 is what you need to play early DOS4GW games. But only a few are playable, as memory (usually upgradable to 8 MB), the slow ISA bus (which affects video transfers; partially solvable by running ISA at 12 MHz) and the CPU became bottlenecks.

Yes, the 386DX-40 was neat. My father used one for professional software development in the early/mid 90s.

By dividing the clock by 4 (or was it 8? The oscillator runs at 80 MHz), the ISA bus could be set to a clean 10 MHz, which was a bit less restrictive than 8.33 MHz.
Speaking of overclocking, 12 MHz (as you said) up to 16 MHz was possible with good hardware.
Around 16.66 MHz would be ideal, in theory, because that is exactly twice the default clock.
Programs like MOD4WIN encouraged ISA bus overclocking in their help files.
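The "4 or 8" question resolves arithmetically: the 386 is driven by a CLK2 input at twice its rated speed, so dividing the 40 MHz processor clock by 4 or the 80 MHz CLK2 by 8 both give 10 MHz. This Python sketch just does that arithmetic; which clock a particular chipset actually divides is not something it claims:

```python
# ISA clock arithmetic for a 386DX-40 (sketch).
# The 386 takes a double-frequency CLK2 input, so a 40 MHz CPU runs from 80 MHz.

CPU_MHZ = 40.0
CLK2_MHZ = 2 * CPU_MHZ

def isa_clock(source_mhz: float, divisor: int) -> float:
    """ISA bus clock derived by dividing a source clock."""
    return source_mhz / divisor

print(isa_clock(CPU_MHZ, 4))    # CPU clock / 4
print(isa_clock(CLK2_MHZ, 8))   # CLK2 / 8, same 10 MHz

# The common ~8.33 MHz ISA clock, e.g. 33.33 MHz / 4, and twice that value.
default_isa = isa_clock(100 / 3, 4)
print(round(default_isa, 2), round(2 * default_isa, 2))
```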

AlexZ wrote on 2022-08-18, 15:04:

It is true that those early games do not offer a better experience than the 16-bit real-mode ones coded for the 286.

+1

I remember that LucasArts DOS4GW games like Sam & Max, running on a slow 386,
performed worse than the 16-bit Sierra VGA titles (the Larry 1 and Space Quest 1 remakes) on a 10 MHz 286.

I do like Sam & Max Hit the Road a lot, but the engine was SCUMM.
It moved very slowly, as if it had been tarred and feathered.

That's why I never loved that 32-bit, flat-mode cult.
Real mode and 16-bit protected mode were uncomfortable to work with,
but software using them often was not that slow in practice.

And then there's the slowdown from V86 mode and the 80386 MMU's paging unit.
The 386 also didn't support Enhanced V86 mode (VME, Virtual-8086 Mode Extensions) yet.
The Pentium (586) core and late 486 cores had VME.
QEMM 7 was one of the early memory managers with official support for it.
There's even a sticker on the big box that mentions the special support.

Edit: Typos fixed. Sorry, working from a smartphone.


Reply 9 of 11, by Horun

Rank l33t

Good explanations of the difference between SX and DX!

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor.

Reply 10 of 11, by rasz_pl

Rank Oldbie

pentiumspeed wrote on 2022-08-18, 14:35:

PS: If the 386SX motherboard has a cache, that helps a lot indeed.

The biggest help would be a 1 KB cache on the CPU itself, as evidenced by the 486SLC running faster than the 386DX: https://youtu.be/ldYQQPYlRAU?t=220
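Why even a small cache helps can be shown with the usual average-access-time formula, t_avg = h * t_hit + (1 - h) * t_miss. In this Python sketch the hit rates and clock counts are hypothetical placeholders, not measured 486SLC or motherboard-cache figures:

```python
def average_access_clocks(hit_rate: float, hit_clocks: float, miss_clocks: float) -> float:
    """Average clocks per memory access for a given cache hit rate."""
    return hit_rate * hit_clocks + (1 - hit_rate) * miss_clocks

# Hypothetical numbers: a cache hit costs 1 clock, a trip over the bus costs 3.
for h in (0.0, 0.5, 0.9):
    print(f"hit rate {h:.0%}: {average_access_clocks(h, 1, 3):.2f} clocks/access")
```

The same formula applies to an external motherboard cache; only the hit and miss costs change.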

Reply 11 of 11, by jakethompson1

Rank Oldbie

Another part of this is that a DOS extender, Windows 386-enhanced mode included, is actually an operating system that picks and chooses which DOS calls it wants to implement and which it wants to pass back to DOS or the BIOS, running them in Virtual 8086 mode. The book Undocumented Windows 95 has a good explanation of this. For example, a DOS extender author would want to substitute their own code for things like memory allocation (being the whole point of the extender) while passing calls to open a file, read a file, list a directory, etc., back to DOS so as not to have to implement that functionality. So that explains some of the overhead.
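A rough Python sketch of that split, plus the extra work a reflected file read implies. The INT 21h function numbers are real DOS AH values, but which calls a given extender services itself, the 8 KB transfer-buffer size and the function names are assumptions for illustration, not DOS/4GW's actual behavior:

```python
# Hypothetical sketch of how a DOS extender might route INT 21h calls.

HANDLED_IN_PM = {0x48: "allocate memory", 0x49: "free memory", 0x4A: "resize memory block"}
REFLECTED = {0x3D: "open file", 0x3F: "read file", 0x4E: "find first file"}

TRANSFER_BUFFER = 8 * 1024  # hypothetical size of the buffer below 1 MB

def route(ah: int) -> str:
    """Decide where an INT 21h call with function number AH is serviced."""
    if ah in HANDLED_IN_PM:
        return "protected mode"
    return "reflected to real-mode DOS"

def reflected_read_calls(nbytes: int, buffer_size: int = TRANSFER_BUFFER) -> int:
    """DOS can only read into memory below 1 MB, so a large read destined for
    extended memory is split into buffer-sized DOS calls, each needing a switch
    down to DOS and back plus a copy."""
    return max(1, -(-nbytes // buffer_size))

for ah in (0x48, 0x3D, 0x3F):
    name = HANDLED_IN_PM.get(ah) or REFLECTED.get(ah)
    print(f"AH={ah:02X}h ({name}): {route(ah)}")

calls = reflected_read_calls(64 * 1024)
print(calls, "DOS read calls and", 2 * calls, "mode switches for a 64 KB read")
```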

As for your feeling that the increased requirements don't have much of a payoff... isn't that the whole history of the evolution of personal computers? A typical system has 500 times as much memory as in the dial-up era, and say 50 times the CPU power, but the modern web certainly isn't 50 or 500 times better, because the bloat cancels out much of the benefit, with the exception of streaming video. Writing in 32-bit C for a PC in 1990 rather than in hand-optimized assembly was probably viewed at the time the way writing an Electron app is today...