VOGONS


Reply 60 of 90, by javispedro1

Rank Member
javispedro1 wrote on 2024-08-18, 17:11:

With an Intel 13700K with stock settings, I got:

With 1W as the TDP limit (PL1 & PL2):
"This computer performs like a 900 MHz AT with a 400 Mhz 80287"
CPU 9000 Mhz
FPU 4800 Mhz
Video 3461 chr/ms
(particularly surprised at the round numbers, but I guess this is due to the nature of throttling).

AIDA16: 190 (between a K6-200 and a P55C-233)
AIDA32: 2350

Note: the actual power consumption only seems to drop by 20W. So basically one P-core in real mode can consume 20W tops.
The overall consumption is still pretty high since nothing is entering power-saving states, most glaringly the GPU. Just a guess; I don't have the tools to check.
Likewise, I also suspect that turbo is not entered at all.

edescourtis wrote on 2024-08-24, 00:55:

Interesting, so an Intel 13700K loses to a Core2 Duo E6600?! Crazy!

This is the entire point of this thread (and in fact the AIDA16 documentation mentions as much), but I was hoping for more data points....

Reply 61 of 90, by javispedro1

Rank Member
Riikcakirds wrote on 2024-08-23, 23:00:

Compare the benchmarks by using EMM386 for V86 mode and then using Jemmex for V86 mode (VME is enabled by default with Jemmex). You should see a slowdown with FPU in EMM386.

Are you sure? The Jemmex documentation says the opposite (that NOVME is the default). There's no discernible difference between explicit VME and NOVME anyway, as far as I can see. My suspicion is that V86 mode is simply faster on these CPUs anyway.

I never managed to boot MS's HIMEM + EMM386 on this machine (though Win9x works, with mem limit patches).

Reply 62 of 90, by Ringding

Rank Member
Riikcakirds wrote on 2024-08-23, 22:47:

The software/programs don't need to support VME, the CPU just needs it to be enabled, which EMM386 and any version of Win3.x will not, so you will get a slowdown in anything that uses V86 mode.
Loading EMM386 will generally slow down everything in DOS by 25-35% (video, I/O performance, FPU). The NSSI benchmark is just an easy way to show it as a real-time bar graph. It is the same with other DOS benchmarks/games. The same whether using a 486DX4 or a Core i7. The only difference is that the slowdown is much more noticeable on a 386/486 than on a 3GHz Core i7 slowed down by 25%.
This was well known back in the early 1990s when using a 386 or 486, as it had such a big impact on performance.

Well yes, but claiming that everything gets slower is very different from saying that it decreases FPU performance.

Reply 63 of 90, by Ringding

Rank Member
edescourtis wrote on 2024-07-24, 13:10:

I understand that Real Mode is quite slow on modern CPUs

I never believed this claim, and verifying or refuting it has been sitting in the back of my head for quite some time. I still cannot think of a single piece of software that has no graphics interaction and can be used to test real mode performance. Then I stumbled across a neat SHA256 implementation and thought I could build a small 16-bit DOS program around it using Open Watcom, so that is exactly what I did. Additionally, and this was actually the most time-consuming part, I disassembled the inner loop and adapted it to run in 32-bit protected mode under Linux, still using the exact same 16-bit registers. The assembly listing is almost exactly the same apart from some pointer-reload operations that required slight adjustments, but the machine code is not, because almost every instruction acquired an operand size override prefix. If you are interested in the code or the binaries, let me know, but I will need a little time to prepare them.
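To illustrate the point about prefixes: in a 16-bit code segment the 16-bit operand size is the default, while in a 32-bit code segment every instruction that works on 16-bit registers needs a 0x66 operand size override byte in front. The Python sketch below is not taken from the benchmark; `encode_add_ax_bx` is a hypothetical helper that hand-encodes one instruction, `add ax, bx`, for both cases.

```python
OPERAND_SIZE_PREFIX = 0x66  # x86 operand size override prefix


def encode_add_ax_bx(default_operand_bits):
    """Encode `add ax, bx` for a code segment with the given default operand size.

    Hypothetical helper for illustration only; it knows exactly one instruction.
    """
    # Opcode 0x01 is ADD r/m, r; ModRM 0xD8 selects reg=BX and r/m=AX.
    core = bytes([0x01, 0xD8])
    if default_operand_bits == 16:
        return core  # 16-bit operands are already the default
    if default_operand_bits == 32:
        # Without the prefix these bytes would mean `add eax, ebx`.
        return bytes([OPERAND_SIZE_PREFIX]) + core
    raise ValueError("default operand size must be 16 or 32")


print(encode_add_ax_bx(16).hex())  # 01d8
print(encode_add_ax_bx(32).hex())  # 6601d8
```

The mnemonic is identical in both listings, but the protected-mode encoding is one byte longer per instruction, which matches the description of the listing staying the same while the machine code changes.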

My findings:

- There is absolutely no difference in performance between actual real mode, v86 mode, VME, virtualization and any combination thereof.
- The equivalent protected mode code runs at almost exactly the same speed as well, actually slightly faster, which might be attributed to the outer code being compiled in a more efficient environment.
- Compiling the same code for 32 bits (Open Watcom, DOS4GW) makes it an order of magnitude faster (10x), mostly because shifting/rotating 32 bit values is just a lot more efficient in 32 bit registers.
- Compiling the same code using a modern compiler for modern x86 (both 32 and 64 bit) doubles the performance again. And it is not even utilizing the special SHA instruction set.

I don't know if you consider my CPU "modern", as it is an i3-8100 from 2018, but it is at least not ancient and reasonably fast.

Conclusion: There is no such thing as "real mode performance". There is just CPU performance.

Reply 64 of 90, by Jo22

Rank l33t++
Ringding wrote on 2024-10-27, 23:03:

My findings:

- There is absolutely no difference in performance between actual real mode, v86 mode, VME, virtualization and any combination thereof.

Last time I checked on an AMD Athlon X2 with Virtual PC 2007 it was different.

MS-DOS in Real-Mode (himem loaded) had lower numbers than if EMM386 was used.

The reason was that Virtual PC, as a virtualizer, had to emulate Real-Mode code, whereas V86 "code" could be passed through to the real CPU.

Edit: The technical background here was the x86 ring scheme of privilege levels used in protected mode, I think.
VMs running V86 code can probably be integrated more easily and need no Real-Mode emulation.

Edit: The use of AMD-V or Intel-VT may have an effect on this. Things like "Ring -1" are possible with it.
https://blog.codinghorror.com/virtualization- … g-negative-one/
I'm not sure whether I've used hardware-assisted virtualization back then or not.
The CPU/PC had the feature (AMD-V), but I noticed compatibility issues with certain VMs if it was enabled (except for XP, which loved it; Win32s needed it).

By contrast, on a physical 386 PC, it was the other way round.
The 386 didn't feature an Enhanced V86 (VME) yet and was slower in V86 mode.
Using EMM386 made the PC a little bit slower. About 20%, at least, if memory serves.

Ringding wrote on 2024-10-27, 23:03:

Conclusion: There is no such thing as "real mode performance". There is just CPU performance.

To draw a final conclusion after testing just one CPU, that's cool! 😎

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 65 of 90, by zyzzle

Rank Member
Ringding wrote on 2024-10-27, 23:03:

I don't know if you consider my CPU "modern", as it is an i3-8100 from 2018, but it is at least not ancient and reasonably fast.

Conclusion: There is no such thing as "real mode performance". There is just CPU performance.

Please release your binaries and let us test your claim for ourselves. It will be a very interesting test. I suspect "CPU performance" will be the determining factor for speed as well, but I would like to test on my various Intel 32-bit and 64-bit CPUs in bare metal DOS.

Reply 66 of 90, by javispedro1

Rank Member
Ringding wrote on 2024-10-27, 23:03:

Conclusion: There is no such thing as "real mode performance". There is just CPU performance.

"If you ignore everything that makes real mode different, there is no difference with real mode" is not really a surprising statement, though.

I agree with your guess that it is likely the speed of I/O, interrupts, etc. that makes all the quoted benchmarks appear so relatively slow. In fact, there was a thread on the FreeDOS mailing list where someone observed that the speed of plain 'out' loops has improved nowhere near as much in the last 10 years as everything else. I believe that effect is likely what people mean when they say "real mode", and a benchmark biased towards I/O will likely give very different results between real mode and V86.

Are you actually running the benchmarks on DOS? I also suspect, and have practically confirmed based on the TDP experiment, that under DOS the CPU is stuck at the lowest operating frequency. In recent generations there is an increasingly huge difference between the lowest and highest frequencies (so-called Turbo), and I would assume this by itself is the main reason my benchmark results seem so low. Under VirtualBox everything is much faster, but that could also be due to several other reasons (better BIOS, much faster hardware emulation, etc).
Well, actually not everything. AIDA itself shows exactly the same MIPS16/32 numbers in VirtualBox as on bare metal with EMM386 (i.e. with V86), thereby destroying my theory. Interestingly, under VirtualBox I always get the same number I got under DOS with V86 (1283), no matter whether I use V86 or not. On bare metal, real mode was reproducibly a bit slower (1266).

Reply 67 of 90, by BitWrangler

Rank l33t++

All I got from most of the above was "If you recompile it for your CPU, it's faster". Well, no shit. I am thinking anyone who wants fast real mode is dealing with legacy binaries though, not code they can recompile to best suit a modern CPU.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 68 of 90, by Ringding

Rank Member
javispedro1 wrote on 2024-10-28, 20:55:

Are you actually running the benchmarks on DOS?

Yes, I ran it on real DOS.

I also suspect, and have practically confirmed based on the TDP experiment, that under DOS the CPU is stuck at the lowest operating frequency. In recent generations there is an increasingly huge difference between the lowest and highest frequencies (so-called Turbo), and I would assume this by itself is the main reason my benchmark results seem so low.

That's an interesting observation, but as far as I am aware, Turbo Boost works by itself, without any involvement from the OS. My CPU does not even have Turbo Boost, so I cannot experiment with this.

zyzzle wrote on 2024-10-28, 10:30:

Please release your binaries and let us test your claim for ourselves. It will be a very interesting test. I suspect "CPU performance" will be the determining factor for speed as well, but I would like to test on my various Intel 32-bit and 64-bit CPUs in bare metal DOS.

Ok, there it is: main.exe. On my system, the DOS version runs in 4.8 seconds, the Linux version in 4.7. Linux compilation requires the ability to build 32 bit executables and JWasm.

Something else that is a bit surprising to me is the fact that the Watcom compiler does not use a more efficient method for shifting 32 bit values when invoked with -3 and instead goes for loops shifting the bits one by one. Both Borland C++ 3.1 and Visual C++ 1.52 use a 32 bit register for this and just use different methods for getting the 32 bit result back into two 16 bit registers. Using shld/shrd directly on the 16 bit registers would also be possible, which I would assume to be the exact use case that Intel had in mind when creating these instructions for the 80386.
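As a rough model of the two strategies mentioned above (this is not Watcom's actual output and not code from main.exe), the Python sketch below rotates a 32-bit value that is held in two 16-bit halves, once with a loop that moves a single bit per iteration and once with two double shifts in the style of shrd. All function names are made up for illustration, and `shrd16` only models the effect of a 16-bit SHRD for counts from 1 to 15.

```python
MASK16 = 0xFFFF


def rotr32_bit_by_bit(hi, lo, n):
    """Rotate hi:lo right by n using one single-bit step per loop iteration."""
    for _ in range(n % 32):
        carry = lo & 1                       # bit that falls off the low half
        lo = (lo >> 1) | ((hi & 1) << 15)    # low bit of hi moves into lo
        hi = (hi >> 1) | (carry << 15)       # wrapped bit re-enters at the top
    return hi, lo


def shrd16(dst, src, count):
    """Model of 16-bit SHRD for 1 <= count <= 15: shift dst right, fill from src."""
    return ((dst >> count) | (src << (16 - count))) & MASK16


def rotr32_shrd(hi, lo, n):
    """Rotate hi:lo right by n using at most one swap and two double shifts."""
    n %= 32
    if n >= 16:            # a rotation by 16 is just a swap of the halves
        hi, lo = lo, hi
        n -= 16
    if n == 0:
        return hi, lo
    return shrd16(hi, lo, n), shrd16(lo, hi, n)


def rotr32_reference(x, n):
    """Plain 32-bit rotate, as a single 32-bit register would do it."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


x = 0x6A09E667
for n in (2, 6, 7, 11, 13, 17, 18, 19, 22, 25):  # rotate counts used by SHA-256
    hi, lo = x >> 16, x & MASK16
    expected = rotr32_reference(x, n)
    assert rotr32_bit_by_bit(hi, lo, n) == (expected >> 16, expected & MASK16)
    assert rotr32_shrd(hi, lo, n) == (expected >> 16, expected & MASK16)
```

The bit-by-bit version needs n iterations per rotation, while the double-shift version does a fixed amount of work regardless of n, which is why the choice matters for a hash whose inner loop is dominated by rotations.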

Reply 69 of 90, by Jo22

Rank l33t++
Ringding wrote on 2024-11-02, 10:45:
javispedro1 wrote on 2024-10-28, 20:55:

Are you actually running the benchmarks on DOS?

Yes, I ran it on real DOS.

I also suspect, and have practically confirmed based on the TDP experiment, that under DOS the CPU is stuck at the lowest operating frequency. In recent generations there is an increasingly huge difference between the lowest and highest frequencies (so-called Turbo), and I would assume this by itself is the main reason my benchmark results seem so low.

That's an interesting observation, but as far as I am aware, Turbo Boost works by itself, without any involvement from the OS. My CPU does not even have Turbo Boost, so I cannot experiment with this.

Turbo Boost!

turbo-tv.gif

Reply 70 of 90, by Concupiscence

Rank Newbie
Dothan Burger wrote on 2024-07-29, 22:39:
edescourtis wrote on 2024-07-24, 13:14:

For benchmarking, does anyone have results from the following tools?:

• Norton SysInfo
• Landmark Speed Test

If you have any benchmark results from these tools, please share them.

I compared two systems because they are so close; Core 2 Duo T7600 2.33 GHz Merom vs Core i5 520m 2.4 GHz Arrandale.

Aida16
Core2 1096 - Core i5 1108

RayeR's PI
Core2 24.82 - Core i5 20.31

I was interested in testing this because the Core i5 was going to replace the Core2 in my ultimate 98 system but it scores 200 realtics lower in Shareware DOOM. Is DOOM shareware a 16bit application?

Doom is a protected-mode 32-bit application through and through. The fact that it performs slightly worse might be down to the virtualization of Gate A20 support in Nehalem, which could impact performance in protected mode, or a regression from Conroe (which did happen in a handful of cases where the integrated memory controller was not the bottleneck). I don't have a point of reference for period hardware benchmarks in those apps, but would expect that, all things being equal, both would be absurdly capable for DOS applications...

Reply 71 of 90, by Dothan Burger

Rank Member
Concupiscence wrote on 2024-11-07, 15:35:

Doom is a protected-mode 32-bit application through and through. The fact that it performs slightly worse might be down to the virtualization of Gate A20 support in Nehalem, which could impact performance in protected mode, or a regression from Conroe (which did happen in a handful of cases where the integrated memory controller was not the bottleneck). I don't have a point of reference for period hardware benchmarks in those apps, but would expect that, all things being equal, both would be absurdly capable for DOS applications...

I really appreciate explanations like this, though 1/3 slower isn't what I would call slightly slower.

Reply 72 of 90, by javispedro1

Rank Member

Even in protected mode, it is still doing all the fancy DOS-y stuff (I/O, interrupts, 32/16 code segments or entering/exiting protected mode, etc). All of it is playing a role, I suppose, not just the A20 gate toggling speed.

Reply 73 of 90, by Jo22

Rank l33t++
javispedro1 wrote on 2024-11-10, 15:50:

Even in protected mode, it is still doing all the fancy DOS-y stuff (I/O, interrupts, 32/16 code segments or entering/exiting protected mode, etc). All of it is playing a role, I suppose, not just the A20 gate toggling speed.

To my understanding, isn't the physical A20 gate "open" all the time if a V86 memory manager such as QEMM is running (so it can access the whole address range)?
Which means that the A20 which DOS and applications "see" is entirely emulated at this point? Like in a multi-tasking OS?

Reply 74 of 90, by javispedro1

Rank Member

I was going to add ", since I really don't think A20 toggling speed is going to be the bottleneck in any case", but I didn't want to enter into that discussion and I think it distracts from the point 😀
Technically, no EMM386-like manager needs to be running for a protected mode program, and the DPMI host (or equivalent) may do whatever it wants with the A20 gate. While I suppose it will just enable it on entry and disable it on exit, I would not bet on it...

If you only have an XMS/HIMEM-like driver, it is perfectly valid for it to toggle the A20 gate repeatedly on each access to high memory.
And if you have EMM386 active, it has to keep A20 enabled all the time: you can't know when an ISR/TSR/whatever is going to access some page that has been remapped to XMS memory and crash.
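For readers less familiar with what the A20 gate actually changes, here is a minimal Python sketch of real-mode address formation with and without the gate. It only models the classic behaviour (address line 20 forced low when the gate is disabled) and says nothing about how any particular memory manager toggles it; `physical_address` is an illustrative name, not an API from any of the tools discussed.

```python
A20_BIT = 1 << 20  # address line 20, the first bit above the 1 MiB boundary


def physical_address(segment, offset, a20_enabled):
    """Return the physical address for a real-mode segment:offset pair."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= ~A20_BIT  # line 20 is held low, so addresses wrap at 1 MiB
    return linear


# FFFF:0010 is the first byte above 1 MiB (the start of the HMA).
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))   # 0x100000
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
```

With the gate disabled, the same segment:offset pair silently lands at the bottom of memory instead of above 1 MiB, which is why code that relies on memory above that boundary cannot tolerate the gate being off at the wrong moment.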

Reply 75 of 90, by kagura1050

Rank Newbie

I'd like to help with this investigation, as I have a variety of PCs with CPUs that might be useful (MII, GX1, LX800, K6/K6-2, K7, K8 (754 & 939), 65nm K10, Piledriver/Steamroller, Zen2, Willamette/Prescott/Cedar Mill, Yonah/Conroe/Wolfdale, Clarkdale, Sandy to Skylake, Cedarview, Bay Trail, Gemini Lake, Nano U3100/Eden X2 U4200, maybe others).
However, I don't know much about DOS. Would it help if I just boot into the FreeDOS environment I created with Rufus, run the Landmark benchmark from the DOSBENCH pack, and share the results?

I like running new OSes (Linux/NetBSD) on old machines.
Timezone : UTC+9

Reply 76 of 90, by Jo22

Rank l33t++
javispedro1 wrote on 2024-11-10, 23:28:

I was going to add ", since I really don't think A20 toggling speed is going to be the bottleneck in any case", but I didn't want to enter into that discussion and I think it distracts from the point 😀
Technically, no EMM386-like manager needs to be running for a protected mode program, and the DPMI host (or equivalent) may do whatever it wants with the A20 gate. While I suppose it will just enable it on entry and disable it on exit, I would not bet on it...

If you only have an XMS/HIMEM-like driver, it is perfectly valid for it to toggle the A20 gate repeatedly on each access to high memory.
And if you have EMM386 active, it has to keep A20 enabled all the time: you can't know when an ISR/TSR/whatever is going to access some page that has been remapped to XMS memory and crash.

True. That pretty much sums up the experience on my 286-12 PC back in the 90s.
It was running Real-Mode DOS through and through, as far as DOS applications were concerned. Himem.sys only.

However, I was under the impression that 386/486 power users always ran EMM386, QEMM or 386Max. Or any DPMI provider (Helix Netroom).
If only to be able to use all those VESA VBE, CD-ROM, ZIP and network drivers.
Realistically, for most power users SmartDrive was often a must-have, too.

Reply 77 of 90, by zyzzle

Rank Member
kagura1050 wrote on 2024-11-11, 01:25:

I'd like to help with this investigation, as I have a variety of PCs with CPUs that might be useful (MII, GX1, LX800, K6/K6-2, K7, K8 (754 & 939), 65nm K10, Piledriver/Steamroller, Zen2, Willamette/Prescott/Cedar Mill, Yonah/Conroe/Wolfdale, Clarkdale, Sandy to Skylake, Cedarview, Bay Trail, Gemini Lake, Nano U3100/Eden X2 U4200, maybe others).
However, I don't know much about DOS. Would it help if I just boot into the FreeDOS environment I created with Rufus, run the Landmark benchmark from the DOSBENCH pack, and share the results?

I'd suggest not using the Landmark benchmark; instead, just download Ringding's main.exe posted above, copy it to your FreeDOS floppy / environment, run it, and post your results for the various systems you listed.

As for mine, I got a time of 6.42 seconds on a Sandy Bridge i3400 laptop in pure DOS, with only XMS running, at a clock of 2.8 GHz. Reducing the clock speed (it supports multipliers 8x-28x in CPUSPD) gives a purely linear relationship with clock speed. On an i5 8250 laptop running at 3.4 GHz, I got 4.83 seconds. The A20 line / gate was open on both systems. Both have 8 GB of RAM; the i3400 has DDR2 dual-channel 1600 MHz, the i5 8250 has DDR3 dual-channel 2400 MHz RAM.
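One way to read these numbers is to convert each runtime into an approximate cycle count (seconds times clock), which is what "purely linear in clock speed" implies should stay constant on a given machine. The Python sketch below uses only the two timings quoted above; the 0.8 GHz figure in the last line is an assumption (the 8x multiplier with an assumed 100 MHz base clock), not a measured result.

```python
def gigacycles(seconds, clock_ghz):
    """Approximate cycles spent, in units of 10^9, assuming a constant clock."""
    return seconds * clock_ghz


def predicted_seconds(ref_seconds, ref_ghz, new_ghz):
    """Runtime at another clock if runtime scales purely linearly with clock."""
    return ref_seconds * ref_ghz / new_ghz


runs = {
    "Sandy Bridge laptop @ 2.8 GHz": (6.42, 2.8),
    "i5 8250 laptop @ 3.4 GHz": (4.83, 3.4),
}
for name, (seconds, ghz) in runs.items():
    print(f"{name}: {gigacycles(seconds, ghz):.2f} Gcycles")
# Sandy Bridge laptop @ 2.8 GHz: 17.98 Gcycles
# i5 8250 laptop @ 3.4 GHz: 16.42 Gcycles

# Hypothetical: the Sandy Bridge machine at the 8x multiplier (assumed 0.8 GHz).
print(f"{predicted_seconds(6.42, 2.8, 0.8):.2f} s")  # 22.47 s
```

On these two data points the newer laptop needs roughly 9% fewer cycles for the same work, so most of the difference in runtime is explained by clock speed alone.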

Reply 78 of 90, by Ringding

Rank Member

It would be interesting to know if there are modern CPUs that exhibit different behavior in "real" real mode (i.e. bare metal DOS, no virtualization) vs. virtualized real mode (DOS under VMware, KVM, …).

Reply 79 of 90, by Inhibit

User metadata
Rank Newbie
Rank
Newbie

Huh. My curiosity is piqued. I've got one of the Piledriver-era AMD efficiency-CPU-based micro-PCs that ran DOS well on bare metal. I didn't notice it listed and now I'm curious whether it performs as well as I assumed with that benchmark. Have to go pull it out.