VOGONS


Reply 20 of 22, by melbar

Rank: Oldbie
kanecvr wrote:

Mainboard: Aopen AX59PRO VIA MVP3 2MB cache, ATX format -> http://www.motherboards.org/mobot/motherboard … pen/AX59%2BPro/

Do you really have 2MB onboard on your Aopen? When I downloaded the original manual for the Aopen AX59PRO, the specifications list
512KB or 1MB. Do you have a Deluxe version of this board? Anyway, it's used as L3 cache with your K6-III(+).

Attachments

  • spec_ax59pro-ol-e.png (47.74 KiB, fair use/fair dealing exception)

#1 K6-2/500, #2 Athlon1200, #3 Celeron1000A, #4 A64-3700, #5 P4HT-3200, #6 P4-2800, #7 Am486DX2-66

Reply 21 of 22, by kool kitty89

Rank: Member
meljor wrote:

I never was much of a Quake fan, so I don't know. Maybe Quake doesn't rely as strongly on the FPU, then?

And I didn't say it can't run games nicely, because it can, and a whole bunch of them as well; the P2 is simply quite a bit faster in most of them.

Quake and Quake II are VERY FPU-intensive, but more so in the software renderers, as they do weird things with pixel data buffering in the FP registers. (Quake I is also extremely P5-specific in its optimization, to the exclusion of P6 and everything else, though it also seems to have an affinity for Socket 5/7/SS7 motherboards, possibly due to the direct-mapped board-level cache arrangement. I haven't seen tests with board-level cache enabled/disabled to demonstrate this; using a K6-2+ or K6-III would be significant there, with the onboard L2 putting things more in line with Slot 1/370 or Socket 8.)

Quake II is P6-friendly, but doesn't take advantage of the quirks/efficiencies the K6 family and Cyrix CPUs offered.

MMX does most of what Quake's weird FPU pixel buffering/operations (and perspective-correction calculations) do, better and faster, albeit with integer precision limits vs. FP, and does so much more consistently across different CPUs. Unreal's software renderer should be a much fairer comparison of the MMX-enabled CPUs of the era, though the non-MMX-capable ones will look a bit bad. (Even the 6x86MX, with its relatively modest MMX unit, will smoke the 6x86 classic at the same clock speed.)
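To make the integer-pixel-math point concrete, here's a minimal sketch (my own illustration, not Epic's or id's actual code) of the kind of packed 16-bit operation MMX does in one instruction per four values:

    #include <mmintrin.h>

    /* Average two buffers of 16-bit color channels, four at a time.
       Function name and data layout are made up for illustration. */
    void blend_pixels_mmx(const short *a, const short *b, short *dst, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m64 va  = *(const __m64 *)(a + i);         /* load 4 channels */
            __m64 vb  = *(const __m64 *)(b + i);
            __m64 sum = _mm_add_pi16(va, vb);            /* paddw: 4 adds at once */
            *(__m64 *)(dst + i) = _mm_srai_pi16(sum, 1); /* psraw: 50/50 mix */
        }
        _mm_empty(); /* emms: release the register file shared with the x87 FPU */
    }

One paddw does four adds in a single instruction, where the FPU route needs a load/convert/op/store sequence per value.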

As an aside: Unreal's software renderer has a pretty decent selection of detail options accessed through the console (using the 'preferences' command). These can disable that ugly dithered 'fast translucency' (which doesn't speed things up much anyway, at least with MMX enabled), and you can also enable/disable the pseudo-bilinear filtering effect that's normally forced at resolutions over 400x300 (and force-disabled at lower res, and at all resolutions on non-MMX CPUs).
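For reference, a sketch of where those switches live; this is from memory, so the exact section/key names may differ between Unreal versions:

    ; Open the console and type 'preferences' for the Advanced Options
    ; window, or edit the equivalent keys in Unreal.ini:
    [SoftDrv.SoftwareRenderDevice]
    FastTranslucency=False   ; turn off the dithered 'fast translucency'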

I'm not sure if Unreal or Unreal Gold uses 3DNow!, but given the way the software renderer works, it probably wouldn't have been tough to include. Hell, I wouldn't be surprised if they used MMX for the geometry engine rather than going floating-point there: aside from the P55C, with its relatively slow integer multiply, all the Socket 7, Slot 1/Socket 370, and Socket 8 CPUs can do integer matrix math faster in MMX than on the FPU. And given that 3DNow! is mostly just a floating-point extension of MMX's feature set, it wouldn't be tough to adapt the code to use it in place of some (or most, or all) of the integer MMX ops. (The P5 family is the only one of the bunch with FPU multiply significantly faster than ALU multiply, let alone MMX; all the others, including Intel's P6 chips, have integer multiply as fast as or faster than FPU multiply, and usually with better parallelism, given the greater superscalar optimization of the integer pipelines and the broader wealth of register, and renamed-register, resources for ALU operations.)
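As a rough illustration of why MMX matrix math is fast (my own sketch, not anything from Unreal's source): pmaddwd does two 16x16-to-32-bit multiply-accumulates per instruction, so a fixed-point row-times-vertex dot product takes only a handful of ops:

    #include <mmintrin.h>

    /* Dot product of one 4x4 matrix row with a vertex, both packed as
       four 16-bit fixed-point values (4.12 format assumed here). */
    int dot4_fixed(__m64 row, __m64 vert)
    {
        __m64 prod = _mm_madd_pi16(row, vert); /* pmaddwd: two partial sums */
        __m64 hi   = _mm_srli_si64(prod, 32);  /* move the high partial down */
        int dot    = _mm_cvtsi64_si32(_mm_add_pi32(prod, hi));
        return dot >> 12;                      /* .12 * .12 = .24, back to .12 */
    }
    /* In real code you'd transform a whole batch of vertices, then issue
       a single _mm_empty() (emms) before touching the FPU again. */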

On hardware-accelerated rendering: a lot of engines/drivers just use plain FPU grunt (not special-case code like Quake's) for the 3D math, and given the relatively light AI/logic portion of games of the period, the CPU's geometry work was the main bottleneck. Software rendering favors non-FPU ops more and more as resolution goes up (except in Quake's special-case texture-mapping engine, where FPU ops get heavier, especially on divide, as you scale resolution), but this isn't the case for 3D cards: resolution scales up with CPU geometry as the main limit until GPU fillrate hits and starts slowing things down at further resolution increases.

The K6-2's combination of very fast integer operations and strong MMX performance should make MMX-accelerated software renderers favor it more than most game types of the period did.

Unreal's also a neat case as it supports 16- and 32-bit color when most other software renderers were 8-bit (256 color), and in 32-bit mode it actually has some nicer shading/lighting than the Voodoo2/3 is capable of, albeit without texture filtering. (Shading/lighting and fog lack the color banding and dithering artifacts of the Voodoo, though you're limited to no texture filtering or the fake dithered texture filtering, which looks OK at higher res; it kind of depends how much you like blurred textures over raw nearest-neighbor pixels, particularly with how high-res Unreal's textures are.)

Another plus is that texture resolution/detail has next to zero performance impact on the software renderer, since pixel fillrate, not texel fillrate, is the main limiting factor there. 3D accelerators, by contrast, benefit a lot from dropping texture res (usually due to fitting better into texture caches, a bigger deal on some GPUs than others, like the Rage Pro and Riva 128 vs. the Voodoo2/3/Banshee). So in software you can just max out detail settings and trade resolution alone for performance. (32-bit rendering is slightly slower than 16-bit, but still not very noticeable and well worth the visual quality gain.)

Plus, the Direct3D renderer has some bugs (aside from crashing ones, there are a few things like depth/Z errors and scene clipping/culling errors that show up on a number of cards), so visual quality might not be better than software rendering, depending on your preference (especially with in-era cards, and considering the AGP issues with some cards on SS7 boards). The Voodoo2/3 seems to avoid most/all Direct3D errors, but you might as well run the Glide renderer anyway. (I'm not sure if the Glide renderer's geometry pipeline makes use of MMX or 3DNow!, or whether it's heavily P5- or P6-schedule biased like some other games, but I somewhat doubt it's K6-schedule optimized; as nice as that would be, it would take the most going out of their way to do, and 3DNow! optimization would be much more worthwhile. Or an integer MMX geometry engine, for that matter, which would benefit everything but the P5.)

And a note on integer vs. FP: there are some added rounding headaches to deal with (or you risk visual errors like seams between polygons), but on the whole, especially for stuff of this era, fixed-point geometry works perfectly fine, and it was only the raw speed of the P5's FPU (and its sluggish ALU multiply) that favored floating-point geometry engines over integer- or MMX-based ones. (All the consoles prior to the Dreamcast used 16-bit or 32-bit integer math for their geometry, mostly with DSP coprocessors or fast CPU math, though the N64's GPU included a fixed-point, i.e. integer, vector processor more analogous to MMX. The Dreamcast's SH4 CPU had floating-point vector extensions more like 3DNow! or SSE; I think more akin to SSE, as they featured 128-bit registers rather than 64-bit.)
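A minimal sketch of that rounding headache (generic 16.16 fixed point, nothing era- or engine-specific):

    /* A plain shift truncates, so two polygons sharing an edge can land
       the same coordinate on different pixels (visible seams). Adding
       half an LSB before the shift rounds to nearest instead. */
    typedef int fixed16;                      /* 16.16 fixed point */

    fixed16 fx_mul(fixed16 a, fixed16 b)
    {
        long long p = (long long)a * b;       /* 32.32 intermediate */
        return (fixed16)((p + 0x8000) >> 16); /* round, don't truncate */
    }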

I'm not familiar with how many drivers or engines actually opted for integer-based 3D geometry engines, but on paper there was plenty of reason to do so at the time. (An MMX geometry engine would be very fast on PII/Celeron processors as well as the K6-2/III, and decently fast on the K6 classic, Cyrix MX/MII, and Pentium MMX.) And prior to SSE becoming common, MMX was the most universal feature set offering the fastest 3D vertex performance potential, so there was plenty of reason to support it.

I know DirectX 6.x included 3DNow! (and eventually SSE) support, but I'm not sure if it falls back on raw FPU or MMX operation when both of those are absent.

DirectX 5.x based games would almost certainly fall back on FPU-driven geometry engines, though. (DirectX 5 included some sort of MMX support, or at least detected its presence, but I don't think there was much optimization for it, let alone full geometry/T&L engines. And while a raw ALU-based geometry engine would work well for everything except the P5 family, including P6, that sort of programming seemed just plain unpopular for whatever reason, even though a lot of common 3D accelerators from D3D 5.x's heyday relied on integer-based pipelines with precision errors such that FPU-based geometry was rather pointless ... even sloppy 16-bit integer geometry math with rounding errors wouldn't be very noticeable.)

BTW, since both 3DNow! and SSE are heavily used in 3DMark99, it is normal for your P2 to score much lower there.

Given 3DMark99 is a DirectX 6.x based benchmark, this should rely on the Direct3D drivers installed and similarly benefit any games that use them. (Which should be almost anything released from 1999 onward that wasn't OpenGL-specific or proprietary, and probably a number of 1998 releases too, like Unreal.)

I've seen anecdotes about DirectX 7 dropping 3DNow! support, but I'm not sure this is correct. (It might depend on the drivers, but it's also notable that 3DNow! and SSE will matter much less if you have hardware T&L enabled. A K6 classic, PII, and Celeron should all do around the same with a DirectX 7 engine at comparable clock speeds; the Cyrix 6x86 too, for that matter, though probably not close to its PR rating, maybe slightly better than equal clock, but not by a big margin.)

3DMark 2000 with a Radeon or GeForce would likely even the playing field and (with sound disabled) render multimedia extensions moot as well, aside from possible physics tests. (3DNow! and SSE would be significant for physics handling.)

melbar wrote:
kanecvr wrote:

Mainboard: Aopen AX59PRO VIA MVP3 2MB cache, ATX format -> http://www.motherboards.org/mobot/motherboard … pen/AX59%2BPro/

Do you really have 2MB onboard on your Aopen? When I downloaded the original manual for the Aopen AX59PRO, the specifications list
512KB or 1MB. Do you have a Deluxe version of this board? Anyway, it's used as L3 cache with your K6-III(+).

Sandra can tell you the board-level cache size pretty fast. (One of the easiest ways I've found; confirming on the motherboard itself can be troublesome, as some cache SRAMs don't have easily accessible datasheets, and counting the number of cache chips isn't a good route either, since boards of that era used both 512kB and 1MB 64-bit pipeline-burst SRAMs. From most I've seen, though, all MVP3 boards that use two SRAM chips are 2MB, while 1MB boards use a single 1MB SRAM and 512kB boards a single 512kB SRAM. It's confusing given the PCB often has '512kB' printed on the unfilled SRAM spot, making one assume the single fitted SRAM is only 512kB; FIC's 503+ and 503A confused me with this.)

Reply 22 of 22, by kool kitty89

Rank: Member
kanecvr wrote:

Some games were written for the Pentium's pipeline. For example, GLQuake and software Quake will run faster on a 233MHz Pentium MMX than they will on a 233MHz Pentium II; you can google it (the P55C's advantage in these games caused quite some noise when the P2 launched, too). So older games should perform better on a P1 and PII due to optimizations, but not so much on a P3 for some reason. The more the CPU's architecture deviates from the P54/P55C design, the bigger the performance hit. Quake 1 is just one example. Another would be DOOM, which doesn't even use the FPU, and as far as I recall it also runs faster on a Pentium 1 than on a Pentium II at the same clock speed.

GLQuake is not a good example of what you describe (it scales similarly to Quake II in OpenGL), but Quake I's software renderer is.

Quake I's software renderer (not GLQuake) is also a good example of non-P6-optimized code competing on roughly even footing (clock for clock) between the K6, PPro, PII, Celeron, and 6x86 (classic and MX/MII). Quake II's software renderer is very P6-biased, though, as are GLQuake and Quake II's OpenGL drivers. (Later-revision OpenGL drivers for Quake II do better and better on the K6 and K6-2, even without the 3DNow! patch, while showing no gains on the P5 and P6; the same goes for later updates to the MiniGL drivers, though there's a negligibly small loss in P5 performance with the K6-optimized drivers.)

See: The Ultimate 686 Benchmark Comparison
Unfortunately no Cyrix CPU results to compare there. (Though if it's K6 FPU-schedule optimization in play, I'm not sure that would help Cyrix CPUs any. Cyrix used a fairly slow FPU with a decent-sized buffer for ALU parallelism, while AMD used a fairly fast single-stage FPU at the mercy of the RISC86 scheduler. The Cyrix FPU also wasn't 486-slow as some sources imply, but it was still much slower in common add/subtract/exchange/multiply operations than the K5, K6, P5, or P6. The K6 ranks as fast or faster than the P5 and P6 in some FPU benchmarks because it IS faster when optimized for, or when no optimization for either architecture is applied; using Intel compilers or Intel assembly-language rules for the P5 or P6, though, will definitely put the K6 at a disadvantage. Also note that FPU performance is far from the only area where Intel-specific compilers crippled the K6's, and Cyrix's, potential: there were several areas where the P5 and P6 were slow and compilers thus avoided instructions that were much faster on the competition, like LOOP. And older software targeting the 386/486, even fully 32-bit software, had a greater bias towards the non-Intel parts.)

There's some neat info on this here:
http://www.azillionmonkeys.com/qed/cpuwar.html

Tertz wrote:

Comparison of CPUs should be done in software modes, without 3D acceleration and not with different video cards. For example, Unreal has a software mode.
A Pentium 3 Coppermine would also be interesting to use, as the more common P3.

There's a very comprehensive compilation of tests using Quake I, Quake II, and MDK in software modes and OpenGL (and D3D for the latter), though unfortunately no Unreal, and only at 640x480, which more heavily biases toward id's quirky, Pentium-FPU-heavy texture mapping. (320x200 would've been neat.)

The Ultimate 686 Benchmark Comparison

No Unreal though, which would be the interesting one to compare for MMX and maybe even 3DNow! in software mode. (If it uses MMX for the geometry calculations, I'm not sure 3DNow! would help at all, as it's not really faster than MMX on the K6-2. MMX is also quite fast on the PII/III/Celeron, if not quite as fast as on the K6-2. The K6 classic has a slower MMX unit, though, and should be slower than any of the others, while still faster than a Pentium MMX or 6x86MX.)

havli wrote:

huh?
Running games in software rendering is kinda pointless - that's the reason 3D accelerators were invented.

Running any of these games on these old systems is kind of pointless ... it's for the sake of comparison, though, and back in 1998 or '99 there was very real reason to consider software rendering on a fast CPU, or even a modestly fast (MMX-capable) CPU, in Unreal. (Especially if you wanted max texture res ... and didn't have a Voodoo2 or better.)

See above for more details on the benefits of Unreal's software renderer.

From a benchmarking perspective, Unreal is probably one of the best to test in software, as it's not weirdly Pentium-optimized like Quake and also uses high/truecolor video modes with tons of features (and makes heavy use of MMX ... and maybe 3DNow!; the K6-2 was released after Unreal, but very close, close enough that the programming specs would've been well within Unreal's design cycle, and even more likely to be added to the Unreal Gold re-release, possibly SSE-enabled too).

Testing at 320x240 in software mode vs. higher resolutions (might as well go really high, 1024x768 or better) would contrast geometry performance (the bigger bottleneck at low res) with fillrate scaling.

Edit:
Another thought on Unreal's MMX renderer: given the overhead of switching between MMX and FPU operations, there's even more incentive to use an integer MMX-based geometry engine and not touch the FPU at all. (This might be true for hardware-accelerated modes as well, aside from the Direct3D renderer, which uses whatever the DirectX driver assigns it; that would include 3DNow! and SSE, though they'd likely offer no performance gain over MMX, aside from on the P55C, which probably falls back to raw FPU grunt and thus might run faster than with the MMX path, or break even. MMX is used for the sound driver in all cases, though, so FPU use would take a hit there.)
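To illustrate that switching overhead (my own sketch of the general emms rule, not anything Unreal-specific):

    #include <mmintrin.h>

    /* MMX aliases the x87 register stack, so every MMX-to-FPU transition
       needs an emms instruction (dozens of cycles on P5/P6-era chips).
       Keeping the whole geometry pass in MMX pays that cost once per
       batch instead of once per vertex. */
    void transform_batch(__m64 *verts, int n)
    {
        for (int i = 0; i < n; i++)
            verts[i] = _mm_add_pi16(verts[i], verts[i]); /* stand-in MMX work */
        _mm_empty(); /* one emms for the whole batch */
    }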

I'm not sure if Phil's Voodoo2 tests used Glide or OpenGL for Unreal (might be OpenGL, to better compare with Quake), and if the Glide version used a custom MMX-based geometry engine, it wouldn't show up in those tests. (The P5 and P6 are clearly favored there and not representative of the K6-2/3's strong MMX performance.)

kanecvr wrote:

I would have liked to use a Coppermine in my tests, but the slowest one I have is 700MHz (7x100), Socket 370, and it's multiplier-locked. In any case, I'm sure a 500MHz Coppermine (if one could force it to run at that speed) would whip everything in my tests, including the Katmai. I'll add a Celeron 466 to the test suite as soon as I find more time for benchmarking. Coppermines are quite common these days, but I don't have one in Slot 1 format and they're not easy to find in these parts; the most common Slot 1 CPUs here are 450 to 600MHz Katmais and 266-350MHz Deschutes. In fact, I've personally never encountered a Slot 1 Coppermine in the wild...

You might be surprised. The Coppermine 500 or 600 doesn't do much better (if any better) in a lot of tests compared to the Katmai or PII at the same clock/bus speed. (For the PII, depending on whether the test uses SSE.)

Both the 686 benchmark thread and Phil's Voodoo SLI scaling tests show this. (And I believe it's the games that already favored the Celeron, especially at 100 MHz FSB, over the PII that more consistently favor Coppermine over Katmai at similar clock/bus speeds; relatively few seem to heavily benefit from the Coppermine cache specifically, at least not from that era.)

As for Coppermine PIIIs in the wild: I've seen a fair share locally (Santa Clara County, CA, so ... Silicon Valley) in old systems, garage sales, and remaindered workstations (I've got a Compaq Slot 1 733 MHz PIII i820 RDRAM-based system at home that Dad got from work when they discarded it well over a decade ago), plus a mix of all sorts of Celerons and PIIIs down at the Weirdstuff warehouse. (They started selling most of their CPUs online a few years back, but a lot of Slot 1 CPU cartridges end up tossed into their heatsink/fan bins intact.) Or at least this was the case a couple years ago when I was frequenting it more. (Also lots of Socket 370 to Slot 1 adapters, and even a few Socket 8 to Slot 1 adapters ... or one or two once in a while. Huh, running a PPro on a BX board would be interesting, assuming it's compatible; I'm not sure how many BX boards would supply the correct core voltage, for that matter.)

Oh, and I forgot to mention it earlier, but 640x480 might be too HIGH to properly show the full CPU dependency (with hardware acceleration) in a lot of the games in question, especially above 400 MHz. (it took 640x480 AND SLI to show substantial scaling above a PII 400 in Quake, Quake II, Forsaken, or Unreal in Phil's Voodoo tests, and 512x384 SLI to show much beyond 700 MHz) Some games might not support low resolutions, but I know Unreal, Quake, and Quake II do. (aside from Unreal's OpenGL renderer)

If the pure interest is CPU performance with the GPU not being a bottleneck, I'd just go for the lowest resolution options available. (Software rendering is another story; performance scaling with resolution is more interesting to compare there, as it's CPU-dependent and GPU-independent, aside from PCI/AGP DMA bandwidth.)