McMick wrote:Just a comment about FPU and Cyrix / AMD / Intel and Winstone 98: I made the mistake of relying on that benchmark to make my purchasing decision when it came time to replace my P166. I could have gotten a pentium 233 MMX, but instead I opted for the K6-233, based on reviews that used Winstone/Winbench 98, Tom's Hardware being prominent among them. We ran that demo in the store I worked at all the time, as it had a loop function. The problem was, and I didn't really understand this until later, that AMD's floating point performance suuuuuuuucked compared to the intel chip. So why did the Ziff Davis benchmarks show the AMD chip as faster than the Intel chip? Because none of the programs used in the Winstone 98 benchmarks used floating point arithmetic. They are all integer-based programs!
Looking at some actual K6 and Pentium MMX benchmarks, the FPU performance actually seems comparable, though it varies by benchmark and application. (due to the types of floating point operations used -performance isn't distributed evenly across both FPUs, plus there's the issue of mixed/simultaneous FPU/integer performance and how well each CPU handles that)
This comparison reflects quite favorably on the K6 FPU:
133 MHz Challenge - 5th/6th gen CPU per clock performance
However, as mentioned in several previous posts, there's more to the picture than raw floating point performance.
There's the huge issue of overall optimization of an application for a specific CPU architecture, and that's a massive issue for games like Quake (which was hand-tailored to the P5 architecture: not just FPU heavy, but catering to Pentium-specific advantages and avoiding its weak points -while similarly ignoring the trade-offs of 486s, 5x86s, K5s, and 6x86s -let alone 386s).
Quake was the first notable game to do this, but others followed (though it wasn't really routine, and many 1995/96 games still optimized for the 486 -or even 386- and thus also tended to favor 5x86/K5/6x86 chips much more, as those ran 486-optimized code much better -again, Tomb Raider and Descent are good counterexamples, as would be Wing Commander III and IV, as far as texture mapped polygonal 3D games go)
The bigger issue appeared with API programmed games relying on DirectX, OpenGL, or various early alternative APIs.
And in those cases, performance was largely up to the drivers used and how well those catered to a given CPU. (if only Pentium-optimized drivers were available, then you were out of luck, but there certainly would have been the technical possibility for patches/drivers catering to 486/6x86/etc or even using only fixed point libraries -no FPU use at all, heavily catering to the 6x86's and K5's strengths -and to a lesser extent the K6, and of course the 486 and C6 -and potentially allowing software compatibility with DLC/DRX/SX/386 chips without FPUs at all)
MMX performance could also be a factor for software supporting it. (with either lack of MMX support or lower performing MMX -or different performing, not catering to the same set of trade-offs as Intel's implementation)
And on that note, even if there were patches or alternate drivers available, it would tend to be up to the user to find and install those to improve performance. (the default drivers would almost certainly be Pentium optimized -at least from the late 90s onward, when 486 had fallen out of the mainstream)
As I said in a previous post, I don't have enough personal experience on this issue to say how available such drivers were, or how they performed when they did exist. (it really wasn't a problem for most games and multimedia stuff up through 1996 -Quake being the sole major exception- since games/multimedia software weren't solely optimized for Pentium -and often not optimized at all for Pentium, but more 486 specific)
I was pretty young at the time all of this was going on and I haven't researched further on this specific issue. (at the time my family generally sidestepped it anyway, so my dad's experience doesn't really help -he did a lot of tweaks/patches on a lot of stuff -including things like getting beta DVD video drivers to work for our Rage Pro PCI card, but we didn't use the chips/systems that had those major problems with games -we went from a 486 to a Pentium Classic to a K6-2 300, then a K6-2 550, and then to various Socket 370 and Socket A based systems -plus a couple of one-offs like the Pentium Overdrive upgrade for my dad's 486 office PC, some Slot 1 stuff from his work, and a Pentium Pro. He usually bought from local wholesalers back then, so the retail markup was avoided -and there were much better than average deals on Intel parts, especially from overstock, which included the Socket 7 Pentiums and Pentium Overdrive -otherwise he'd almost certainly have ended up with a Cyrix or AMD 5x86 -and he did do something like that with a 486DLC previously)
You technically don't even need an FPU at all to run those sorts of games . . . it makes life a bit easier on programmers (and allows for better accuracy in some operations), but fixed point math is a quite viable alternative. (for handling matrix math for 3D vertex calculations, shading, texture mapping, perspective correct rendering, etc -and that's still what's used today on embedded/portable systems without hardware floating point support, as well as almost all 3D PC games up to 1996 and all game consoles prior to the N64)
With a 3D accelerator card, only the 3D vertex math would be handled by the CPU with the rest done by the GPU (until hardware T&L came along -then the CPU didn't even need to do that), on top of running the game engine itself of course. (logic, physics, AI, etc)
The P5 architecture's bias toward floating point performance was a huge catalyst for game programmers to start relying on the FPU for certain operations (since the P5 Pentium's FPU was actually faster at some operations than the ALU -like multiply- and the pipelined FPU plus superscalar architecture allowed overlapped execution of multiple floating point operations as well as simultaneous floating point and integer operations -the 6x86 only allowed int+int or int+float to execute simultaneously, not float+float).
But FPU usage alone was, again, only part of the overall problem with Pentium optimized code/compilers/libraries/drivers being the real issue. (catering to both integer and floating point operations that worked best for the P5 architecture specifically -so often underutilizing both ALU and FPU of other chips -let alone special features/functionality not present on the pentium)
There's also the issue of motherboard performance, though that's much more of an issue for the Cyrix chips than K6 it seems. (K6 worked better with a wider range of boards)
For AMD chips, the massive popularity of the K6 led to increased priority on architecture-specific optimizations, and especially the introduction of 3DNow!-supporting software with the K6-2. (as well as the 100 MHz FSB/L2 cache and improved MMX performance)
Plus, for the Cyrix chips specifically, the PR rating being significantly higher than the clock rate (due to the fast ALU) made them look especially bad compared to the K6 and Pentium, while the actual per-clock FPU performance wasn't that much worse. (ie a 3x66/200 MHz -PR 233- 6x86MX should have held up decently well against a Pentium MMX 200 for FPU intensive applications without biased optimization)
feipoa wrote:
I wonder if any of the older (non 100 MHz) 6x86s/MIIs will actually run well at a 100 MHz bus. (with multipliers set to match similar or lower clock rates to their rated speed -so avoiding heat/core stability problems)
A great question. I should test for this. Considering that the Cyrix 5x86 seems to run OK at 2x66 MHz implies there is some wiggle room with the buses. The MII-400 might be wiggled out though. It is clear to me that they were gunning for a 100 MHz bus. Perhaps the MII-400's are just nice performing MII-366 2.2V pieces and some new growth/yield strategies were implemented for the MII-433? Sometimes little things like humidity and temperature combinations during wafer growth can make or break your yields.
I wonder how many overclockers attempted to just increase the bus speed on Cyrix chips.
Most of the (non 2.2V) 6x86/MII family chips were known to be poor overclockers (already near their limits at rated speeds -and relatively hot running as well), so increasing the core frequency wasn't very practical, but increasing the bus speed could have been more attractive. (at least from the heat generation standpoint -no higher core clock or voltage to increase heat dissipation)
I was actually wondering about that while reading the Red Hill CPU descriptions. They did specifically mention overclocking the K6 300 to 3x100 (and that it worked about 2 out of 3 times), and Cyrix was lagging in getting 100 MHz bus parts out in general (and 83 MHz -and to lesser extent 75 MHz- on many motherboards tended to be finicky, while 100 MHz was not).
A 2x100 MHz MII probably would have fared reasonably well against a K6-2 300 (for integer and I/O performance at least), and probably would have merited the "300" designation much more than the 3x75 and 3.5x66 parts.
I think you would enjoy reading the Ultimate 486 Benchmark Comparison and Cyrix 5x86 Register Enhancements Revealed (links are in my signature); it will answer some of your questions. There are easier to read PDF files in those links which make for great bedtime stories. Enabling the register feature FP_FAST on the Cyrix 5x86 boosted FPU performance by an average of 18%, and 39% in some tests. The average FPU performance of a 5x86-133 bests the POD83 by about 10 Pentium ratings (so the difference between a P90 and a P100). The exception seems to be with Quake, whereby a properly configured Cyrix 5x86-133 scored 18.4 FPS and the POD83 scored 20.8 FPS. If the Cyrix 100/120/133 data is linearly extrapolated to 150 MHz, a Cyrix 5x86-150 would marginally beat the POD83 in Quake. When overclocking a Cyrix 5x86, it is important to keep in mind that your mileage will vary; not all Cyrix 5x86 next generation features overclock well, however my tests have shown that the important ones seem to overclock.
Yes, that is very interesting and informative, and it addresses a lot of what I was wondering about except the issue of the 6x86 vs 5x86 FPU.
This is what the Ultimate 586/686 Benchmark Comparison will tell us. I plan on including the Cyrix 5x86's in this comparison since they are a sorta 486/686 hybrid. I'm pretty sure the 6x86's will win clock-for-clock for ALU performance since they included two, as opposed to one, integer unit. I also plan on running the 5x86-133 at 2x66 MHz to be a fairer comparison with the 6x86-133 MHz. I'll use the same sticks of RAM, graphics card, etc to be as consistent as possible.
The 6x86 should smoke the 5x86 for clock-per-clock ALU performance (it's got the dual pipelines and dual integer units for superscalar operation -plus added features not present, or buggy/disabled, on the production 5x86)
I/O performance should also be better due to the 64-bit bus of the 6x86. (obviously more so for comparing a non-overclocked 33 MHz bus 5x86 vs 66 MHz bus 6x86)
The FPU would definitely be the questionable part though. If the full 6x86 FPU was simple/small enough (in transistor count), they may have implemented it in its entirety in the 5x86 core. If so, that would mean 5x86 and 6x86 parts would have roughly similar per-clock FPU performance (with some variables due to I/O performance) and thus better FPU performance relative to PR ratings of 5x86 parts. (ie an 80 MHz 6x86 vs 120 MHz 5x86)
And this would also make that hypothetical Socket 5/7 5x86 even more interesting. (smaller chip, higher yields, higher clock speeds, and much better matched FPU+ALU performance to the Pentium relative to PR ratings than the 6x86 or K5)
Cyrix was targeting super low power consumption and reduced transistor count with their 5x86 series, which is probably why the die size is so small and the yields so high.
This very much matches the target of the WinChip as well. (though the 5x86 is more advanced than the WinChip -aside from the smaller cache- with generally better per-clock integer and FPU performance)
It definitely would have been interesting to see what Cyrix might have done with a socket 5/7 5x86 derivative to complement the 6x86 in the low end (after they moved away from socket 3 boards).
A small-die, cool-running, high yield part topping out at higher clock speeds than contemporary 6x86 parts (with lower PR ratings), but with potentially stronger FPU performance and better matched FPU/ALU performance to the Pentium. (it would have been really weird if the 5x86 ended up becoming a better option for later FPU-intensive games because of that -which really wasn't the case for the WinChip)
I'd have liked to see Cyrix continue with the 5x86 chips like AMD did. The AMD X5-133 is what kept AMD profitable while developing the K5/6. If Cyrix had the resources to simultaneously develop the 5x86 into 150, 160, and 200 MHz pieces, perhaps their fate would have been different. As the literature mentioned, Cyrix had yield issues with their 6x86's and couldn't keep up the pace. Had there been 160-200 MHz Cyrix 5x86's in Q1/2 of 1996, I don't think many people would have bothered buying Pentiums until the PII's came out.
From a business standpoint, moving on to Socket 5 made a lot of sense (catering to the higher end sector tends to have much higher profit margins -even for the relatively aggressive pricing of Cyrix compared to Intel).
For users, longer production and development of the 5x86 would obviously have been very attractive, but if anything, it probably would have made more business sense to invest in a Socket 5 based 5x86 than long-term support for Socket 3 models. (even at lower/more aggressive prices than 6x86 parts, bottom-end Socket 5 parts could have had considerably higher margins than comparable Socket 3 parts -plus wider and officially supported bus speeds to work with, faster chipsets, faster RAM, etc)
Or, if they did support socket 3 a bit longer, it still could have made sense to move a 5x86 derivative into socket 5/7 as well.
Albeit, had they actually done that, it raises the question of what would have happened when the MII/MX was replacing the 6x86. (would there have been a 5x86 MX follow-on directly based on the 5x86 core but with MMX added -and maybe a larger cache- or would it have made more sense to just use a small-cache version of the MII instead -or just push the die-shrunk vanilla 6x86 into that role, though MMX support was significant and a 5x86+MMX part might have been more efficient)
Indeed, it would have been more ideal for Cyrix if game makers targeted the MII's strengths, but if I was a game maker, I'd still probably want to appease the big guy, Intel.
While 486s were still significant (or 386s still relevant for that matter), the issue was more open-ended though, which is why the majority of games even up through '96 were 486 optimized or at least catered to the 486 as well as the Pentium.
And many offered more flexible detail settings than Quake as well. (again, you can't even disable perspective correction in Quake -which is a substantial chunk of floating point overhead- whereas Tomb Raider does allow that and can also run with no FPU present at all)
Again, I wasn't suggesting game developers cater specifically to the 6x86's design quirks (like Quake does for the Pentium), but just generalized support for non-Pentium parts (486/5x86/6x86/K5).
And the same goes for drivers for graphics APIs too. (though, again, I'm not sure whether such support was actually available in many cases -I don't know one way or the other -and the main period when this would have been important would be prior to 3DNow! and SSE becoming popular, as floating point performance became incredibly attractive with both of those . . . albeit hardware T&L became common just a couple years after that, rendering it less important for graphics and more relevant for multimedia sound/video acceleration and physics computations -though much of that has since been offloaded to GPUs as well)