VOGONS


Reply 20 of 32, by FGB

Rank: Oldbie

A PPro isn't bad for DOS games at all. This myth has to be busted, I think.
For old 16-bit code it is on par with a standard Pentium. But as soon as 32-bit code (many games since 1993-94) or FPU-demanding stuff comes into play, the PPro really flies away.

www.AmoRetro.de Visit my huge hardware gallery with many historic items from 16MHz 286 to 1000MHz Slot A. Includes more than 80 soundcards and a growing Wavetable Recording section with more than 300 recordings.

Reply 21 of 32, by BeginnerGuy

Rank: Oldbie
Scali wrote:

The missing information here seems to be that Pentium Pros were bad at running 16-bit code because of the register renaming that was implemented as part of the Out-of-Order-Execution logic.
The key word here is 'partial register stall'.
That is, register renaming treats all registers, including 'partial' registers (eax being the full register, ax, ah and al the partial ones, for example) as separate internal registers.
However, there is overlap, so when another part of the register is accessed (eg, al was modified, then ax or eax is read), multiple internal registers need to be combined. This requires a pipeline flush to make sure that all register values are current.

Since 16-bit code always uses partial registers, this leads to excessive pipeline flushes in the Pentium Pro.
I don't think there's any 'special case' for realmode, because you cannot assume that you don't use 32-bit registers in realmode. Code can and will use the full 32-bit registers in realmode as well.

In the Pentium II this was fixed by adding some extra logic: when a full register is zero'ed (usually with xor eax, eax or such), the zero-flag also triggers a special state in the register renaming logic: Because it knows the full register was zero, there is no recombining required, and the pipeline flush can be skipped. Of course this still fails on legacy code where there's no explicit zeroing of the registers. But when you're in 16-bit mode, the registers start out as zero'ed, and the problem will not occur until you explicitly start using full 32-bit registers.
See also: http://qcd.phys.cmu.edu/QCDcluster/intel/vtun … rtial_Stall.htm
Pentium II also added caching for segment registers to improve 16-bit code performance.

So in short, I expect a Pentium Pro to be quite bad for DOS in general. But I've only used it with Win9x and NT4 myself, so I can't be 100% sure.
The problem is mixing registers of different sizes, either 16-bit and 32-bit or 8-bit and 16-bit. Especially legacy code will often use partial registers, because there was no penalty for it before the Pentium Pro. You could often optimize things considerably with clever use of partial registers.
Mixing 16-bit OS/BIOS code with 32-bit applications or vice-versa is both going to be a recipe for disaster on the Pentium Pro.
32-bit applications are no guarantee that they won't use partial registers though. The only 'good' 32-bit applications for Pentium II are ones that are compiled with a compiler that is Pentium II-aware, and always inserts the xor reg, reg sequence to avoid stalls. For Pentium Pro, even that doesn't really help, I believe.

I'd like to get my hands on one for some testing, but I believe the general understanding at the time was that the PPro was ~5% faster than a P5 Pentium in 16-bit ops. I'm not sure that was a clock-for-clock comparison, though, since in those days we just went by Quake or Doom benchmarks and word of mouth. The extra decode stages and speculative execution certainly hurt 16-bit performance, but not enough to call it bad as a retro machine today. At the time, they were way more expensive than a P5, and consumers weren't interested in paying a huge premium for minute gains.

Today, if you come across a cheap / Goodwill PPro, I wouldn't call it "bad" for DOS when it's still going to pull slightly ahead of a Pentium AND way ahead in 32-bit. The issue is that I believe these chips sell for their gold content, and the cost is much higher because of it. All that said, I would still go with Slot 1, as I said earlier, if top speed is the concern: those CPUs will annihilate the Pentium Pro AND can be had cheaper.

FGB wrote:

A PPro isn't bad for DOS games at all. This myth has to be busted, I think.
For old 16-bit code it is on par with a standard Pentium. But as soon as 32-bit code (many games since 1993-94) or FPU-demanding stuff comes into play, the PPro really flies away.

This is true, though if you were into CPUs at the time, the Pentium Pro was generally shunned because of the extremely high price and minimal performance gains in the apps consumers cared about. It was the equivalent of a gamer today buying a 32-core Xeon. In fact, these "pro" chips went on to become the Xeon shortly thereafter. The PPro really could tear the head right off of a horse in 32-bit in its heyday.

Sup. I like computers. Are you a computer?

Reply 22 of 32, by Scali

Rank: l33t
BeginnerGuy wrote:

I'd like to get my hands on one for some testing

I have a Compaq DeskPro 6200, which has a Pentium Pro 200 with 256k L2 (there were also 1 MB L2 models, which would probably perform better in certain scenarios).
It runs Win98SE. I suppose I could put the DOS versions of Quake and DOOM on there and run some timedemos.

BeginnerGuy wrote:

but I believe the general understanding at the time was that the PPro was ~5% faster than a p5 Pentium in 16 bit ops

It's very difficult to give a general comparison, because the difference depends a lot on the exact code you're executing. Sometimes the Pentium Pro will be faster, but in the worst case it can be way, WAY slower. You have to figure that the Pentium has no penalty for partial register access, so it can execute all these instructions in 1 cycle. The Pentium Pro will have a penalty of at least 8 cycles every time a partial register stall occurs.
This can add up quickly in certain bits of code. It's not too far-fetched to have code that runs twice as fast on a Pentium as on a Pentium Pro.
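
To put rough numbers on that, here is a toy cost model in Python. It is only a sketch, not a cycle-accurate simulator: it assumes one cycle per instruction on both CPUs, charges a flat 8 cycles (the minimum quoted above) per stall, and only knows the a and b register families. The names REGS, count_stalls, estimated_cycles and the example trace are made up for illustration.

```python
# Toy model of partial register stalls. All numbers are assumptions:
# 1 cycle per instruction, plus a flat penalty per stall.
STALL_PENALTY = 8  # "at least 8 cycles" is treated as exactly 8 here

# Register name -> (register family, width in bits)
REGS = {
    "eax": ("a", 32), "ax": ("a", 16), "al": ("a", 8), "ah": ("a", 8),
    "ebx": ("b", 32), "bx": ("b", 16), "bl": ("b", 8), "bh": ("b", 8),
}

def count_stalls(trace):
    """Count reads of a register wider than the last write to its family.

    trace is a list of (dest, sources) tuples; dest may be None when the
    instruction only writes to memory.
    """
    last_write_bits = {}  # family -> width of the most recent write
    stalls = 0
    for dest, sources in trace:
        for src in sources:
            family, bits = REGS[src]
            if last_write_bits.get(family, 32) < bits:
                stalls += 1
                # After the stall the full register is current again.
                last_write_bits[family] = 32
        if dest is not None:
            family, bits = REGS[dest]
            last_write_bits[family] = bits
    return stalls

def estimated_cycles(trace, penalty):
    """One cycle per instruction plus `penalty` cycles per stall."""
    return len(trace) + count_stalls(trace) * penalty

trace = [
    ("al", []),               # mov al, [mem]
    ("eax", ["eax", "ebx"]),  # add eax, ebx   (reads eax after al write)
    ("bl", []),               # mov bl, [mem]
    (None, ["bx"]),           # mov [mem], bx  (reads bx after bl write)
]

print("stalls:", count_stalls(trace))
print("no-penalty model:", estimated_cycles(trace, 0))
print("penalty model:", estimated_cycles(trace, STALL_PENALTY))
```

In this model, four instructions with two stalls cost 4 cycles without the penalty and 20 with it, which is how a short, hot loop full of partial register writes can erase the Pentium Pro's other advantages.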

We had similar things happening with Pentium MMX vs Pentium II. Hand-optimized code for Pentium MMX could be really fast. Pentium II however had additional latency on each instruction. So where you could use the results of one MMX instruction as the input for the next instruction on PMMX, the PII would stall for 2-3 cycles on that. You had to rewrite the code to interleave the operations more.
We've seen code where a PII-266 was actually dropping to the performance of a P166 MMX because of this.
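
The interleaving point can be sketched with a small latency model in Python. This is an assumption-heavy illustration, not a description of either CPU's real pipeline: one instruction issues per cycle, strictly in program order, and a result becomes usable `latency` cycles after issue. A latency of 1 stands in for the PMMX case and 3 for the PII case; the function name finish_time and both example schedules are hypothetical.

```python
def finish_time(schedule, latency):
    """Return the cycle at which the last result becomes available.

    schedule is a list of (name, dependency) pairs in program order;
    dependency is the name of an earlier instruction or None.
    """
    ready_at = {}  # instruction name -> cycle its result is usable
    cycle = 0      # earliest cycle the next instruction may issue
    for name, dep in schedule:
        start = max(cycle, ready_at.get(dep, 0))  # wait for the input
        ready_at[name] = start + latency
        cycle = start + 1
    return max(ready_at.values())

# Two independent three-step dependency chains, a and b.
back_to_back = [("a1", None), ("a2", "a1"), ("a3", "a2"),
                ("b1", None), ("b2", "b1"), ("b3", "b2")]
interleaved = [("a1", None), ("b1", None), ("a2", "a1"),
               ("b2", "b1"), ("a3", "a2"), ("b3", "b2")]

for latency in (1, 3):
    print(latency,
          finish_time(back_to_back, latency),
          finish_time(interleaved, latency))
```

With a latency of 1 the ordering makes no difference in this model (6 cycles either way). With a latency of 3 the back-to-back version takes 16 cycles and the interleaved one 10, which is the kind of gap that forced rewrites of hand-scheduled code.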

BeginnerGuy wrote:

The extra decode stages and speculative execution certainly hurt 16 bit performance

That's not the problem. As long as you don't run into partial register stalls, the Pentium Pro will generally perform better because of it. It's just those few things it wasn't that good at that ruined the overall experience.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 23 of 32, by kixs

Rank: l33t

Nice reading, guys... it brings back memories 😉 I skipped the PPro at the time but read about it a lot. I didn't even bother searching for one now. But I got lucky with a lot of 6 motherboards: there it was, hiding in plain view. The BIG heatsink finally gave it away 😉 I still don't know what to do with it, but it might end up in one of my future complete builds.

Requests are also possible... /msg kixs

Reply 24 of 32, by FGB

Rank: Oldbie
BeginnerGuy wrote:
FGB wrote:

A PPro isn't bad for DOS games at all. This myth has to be busted, I think.
For old 16-bit code it is on par with a standard Pentium. But as soon as 32-bit code (many games since 1993-94) or FPU-demanding stuff comes into play, the PPro really flies away.

This is true, though if you were into CPUs at the time, the Pentium Pro was generally shunned because of the extremely high price and minimal performance gains in the apps consumers cared about. It was the equivalent of a gamer today buying a 32-core Xeon. In fact, these "pro" chips went on to become the Xeon shortly thereafter. The PPro really could tear the head right off of a horse in 32-bit in its heyday.

Sure, you're right. Also, the Socket 8 platform was very short-lived and didn't see any good upgrades apart from the very late (..too late..) yet powerful PII Overdrive chip.
But from a retro gamer's perspective, the PPro remains a very interesting, historically relevant and capable device to play with. Today we don't have to care about upgrade paths, the high price or other things, because we don't depend on them.
Every time I start Quake or GLQuake on my dual PPro with 50ns EDO DIMMs, I am impressed by its great performance.

Reply 25 of 32, by jheronimus

Rank: Oldbie

I own a Pentium Pro@200MHz machine. I never bothered to run extensive benchmarks on it, but I have yet to see any game that runs better on a regular Pentium/Pentium MMX.

It is my understanding that a lot of late DOS games aren't really 16-bit. For example, DOS/4GW games like Doom, Duke Nukem 3D, Warcraft, Tomb Raider or NFS are 32-bit, which pretty much makes the argument pointless. If a game needs a lot of resources, then it is probably 32-bit (and will run faster on a PPro). If it doesn't, then it's probably something even a 386 or a 486 could run, and the difference between a PPro and a regular Pentium becomes negligible.

MR BIOS catalog
Unicore catalog

Reply 26 of 32, by sunaiac

Rank: Oldbie

Some homemade benches.

[Attachment: Sans titre.png (file comment: "benchs")]

R9 3900X/X470 Taichi/32GB 3600CL15/5700XT AE/Marantz PM7005
i7 980X/R9 290X/X-Fi titanium | FX-57/X1950XTX/Audigy 2ZS
Athlon 1000T Slot A/GeForce 3/AWE64G | K5 PR 200/ET6000/AWE32
Ppro 200 1M/Voodoo 3 2000/AWE 32 | iDX4 100/S3 864 VLB/SB16

Reply 27 of 32, by Jo22

Rank: l33t++
kixs wrote:

Nice reading, guys... it brings back memories 😉 I skipped the PPro at the time but read about it a lot. I didn't even bother searching for one now.

Same here. 😀 I became aware of the Pentium Pro because of Visual Basic 6.0 (or more precisely Visual Studio 6).
It had an "optimize" option for the PPro. Because of how it was translated, that option remained a mystery to my fellow countrymen for many years.
No one had any idea what it really meant. Some were worried it would break compatibility with i586 or newer processors and whatnot. *sigh*
Anyway, it seems that option simply creates code and data structures (pointers, etc.) that are always 32 bits wide, so they can be handled well by the PPro.
On a 486, for example, this won't cause any trouble except for slightly degraded performance.

[Attachment: vb6ppro.gif]

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 28 of 32, by Scali

Rank: l33t
Jo22 wrote:

Anyway, it seems that option simply creates code and data structures (pointers, etc.) that are always 32bit wide, so they can be handled well by the PPro.

I would assume it also favours not using partial registers, except with xor reg, reg/sub reg, reg or movzx.
The fun part is that instead of movzx reg, byte ptr [var], compilers generally did a xor reg, reg ; mov al, [var]. This is because movzx was a 3-cycle instruction on 486 and Pentium, and the xor/mov could be done in 2 cycles.
Still suboptimal for these CPUs of course, because they could just do the mov al in 1 cycle without a penalty.
It's a typical case of CISC showing its legacy: movzx was once added to save space and cycles with the 386, but as CPUs got more advanced, they couldn't implement it as fast as just doing it 'manually' anymore. So there's no point in the movzx instruction on those newer CPUs.
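
As a rough illustration of why the two sequences are interchangeable, here is a small Python sketch that models eax as a 32-bit integer. The function names are hypothetical, and the cycle counts in the CYCLES table are simply the 486/Pentium figures quoted above, not measurements.

```python
MASK32 = 0xFFFFFFFF

def movzx(byte_value):
    """movzx eax, byte ptr [var]: zero-extend a byte into the full register."""
    return byte_value & 0xFF

def xor_then_mov_al(eax, byte_value):
    """xor eax, eax ; mov al, [var]: clear the register, then write the low byte."""
    eax = (eax ^ eax) & MASK32                       # xor eax, eax
    return (eax & 0xFFFFFF00) | (byte_value & 0xFF)  # mov al, [var]

def mov_al_only(eax, byte_value):
    """mov al, [var] alone: the upper 24 bits keep whatever was there before."""
    return (eax & 0xFFFFFF00) | (byte_value & 0xFF)

# Cycle counts as stated in the post for 486 and Pentium (assumed, not measured).
CYCLES = {"movzx": 3, "xor + mov al": 2, "mov al": 1}

old_eax = 0xDEADBEEF
print(hex(movzx(0x7F)))
print(hex(xor_then_mov_al(old_eax, 0x7F)))
print(hex(mov_al_only(old_eax, 0x7F)))
```

The first two calls give the same zero-extended value regardless of the old register contents, while the bare mov al leaves stale upper bits behind. That leftover state is exactly what a later read of the full register has to be merged with.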

Reply 29 of 32, by Jo22

Rank: l33t++
Scali wrote:

I would assume it also favours not using partial registers, except with xor reg, reg/sub reg, reg or movzx.
The fun part is that instead of movzx reg, byte ptr [var], compilers generally did a xor reg, reg ; mov al, [var]. This is because movzx was a 3-cycle instruction on 486 and Pentium, and the xor/mov could be done in 2 cycles.
Still suboptimal for these CPUs of course, because they could just do the mov al in 1 cycle without a penalty.
It's a typical case of CISC showing its legacy: movzx was once added to save space and cycles with the 386, but as CPUs got more advanced, they couldn't implement it as fast as just doing it 'manually' anymore. So there's no point in the movzx instruction on those newer CPUs.

That makes sense, thanks Scali. I assume most of the VB/VS devs at the time were also unsure
whether or not this option would cause the compiler to insert any i686-specific instructions.
Neither the help file nor the documentation (MSDN Library CD, if someone had one) made it clear.
(By the way, I heard the VB6 compiler really was a somewhat hacked C++ compiler! Curious, but interesting.)

Reply 30 of 32, by Scali

Rank: l33t
Jo22 wrote:

I assume most of the VB/VS devs at the time were also unsure
whether or not this option would cause the compiler to insert any i686-specific instructions.

I would assume not.
The C++ compiler has two separate settings for that:
1) Preference of the scheduler for a certain architecture
2) Usage of new instructions

Since there is only one setting in VB6, I doubt they would use i686 instructions. Also the way they phrased it: 'favor'. Which seems to imply that it's a Pentium Pro-friendly scheduler, but not a Pentium Pro-exclusive path.
In general, Microsoft stuck to 386-only instructions in their compilers for a long time. I wouldn't be surprised if even today their 32-bit compilers would only generate 386-code unless you turned on specific settings.
What I do know is that I compiled code with Visual Studio 2010 some time ago, which I successfully ran on my 486.

Reply 31 of 32, by Jo22

Rank: l33t++

Thanks a lot for the explanation. I can't disagree with that reasoning.

Scali wrote:

Also the way they phrased it: 'favor'. Which seems to imply that it's a Pentium Pro-friendly scheduler, but not a Pentium Pro-exclusive path.

Yes, in English language versions this was rather well phrased.

Reply 32 of 32, by Azarien

Rank: Oldbie
Scali wrote:

I wouldn't be surprised if even today their 32-bit compilers would only generate 386-code unless you turned on specific settings.

Since VS2012, SSE2 is enabled by default in place of FPU even in 32-bit mode (and it can use SSE/SSE2 for some integer and memory-copying operations too).

You can disable it, though. But I don't know exactly what that means.
Visual Studio officially requires at least XP on target machines since VS2010 up to and including VS2017, and XP officially requires Pentium MMX. It doesn't make sense to limit the compiler to 386 or 486 instructions.