VOGONS


First post, by justmex

Rank Newbie

Hello there, I'm curious about the performance of the K5 and K6 families in old DOS apps. Since all models are unlocked (good for speed-sensitive apps), with the L2 cache disabled, would they perform almost identically at the same clock speed, or is the IPC between them that much different?

Reply 1 of 11, by jakethompson1

Rank l33t

K5 is pretty obscure (I've never seen one in person), and K6 was a complete redesign.

Reply 2 of 11, by Anonymous Coward

Rank l33t++

Well, a different design... not exactly a redesign, since AMD didn't design it.

I remember somebody did a 133MHz comparison with a whole bunch of different CPUs, including the K5. I believe it was feipoa. If you can dig up that thread, you might be able to find your answer (or part of it).
The K5 uses the "PR" (Performance Rating) system, and the fastest one (PR200) runs at 133MHz; that's why that speed was chosen for the test.
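For a rough sense of what that rating implies: a PR rating names the Pentium clock speed the chip was claimed to match, so dividing it by the real clock gives an implied per-clock advantage over a Pentium. A minimal sketch of that arithmetic (the helper name is made up for illustration, and the factor only holds for whatever benchmarks the rating was based on):

```python
def implied_per_clock_factor(pr_rating_mhz: float, actual_clock_mhz: float) -> float:
    """Ratio of the PR-rated Pentium clock to the chip's real clock.

    Hypothetical helper: a factor of 1.5 means the chip was rated to do
    roughly 1.5x the work of a Pentium per clock cycle on the rating benchmarks.
    """
    return pr_rating_mhz / actual_clock_mhz


# K5 PR200 at its real 133MHz clock
print(round(implied_per_clock_factor(200, 133), 2))  # 1.5
```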

"Will the highways on the internets become more few?" -Gee Dubya
V'Ger XT|Upgraded AT|Ultimate 386|Super VL/EISA 486|SMP VL/EISA Pentium

Reply 3 of 11, by Gmlb256

Rank l33t

K6 microarchitecture is based on NexGen technology.

K6-2 is identical to the vanilla K6 but with support for 3DNow! instructions; later it got a CXT revision that added write combining to improve graphics performance. K6-III is pretty much a K6-2 CXT with 256 KB of on-die L2 cache.

Lastly, there is also the K6plus (K6-2+ and K6-III+, with 128 KB and 256 KB of L2 cache respectively), which are more efficient and let you change the CPU multiplier on the fly through software, from 2.0x to 6.0x (excluding 2.5x).
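To make that range concrete, here is a small sketch that lists the software-selectable multipliers and the resulting core clocks for a given bus speed. It assumes 0.5x steps between 2.0x and 6.0x with 2.5x left out, as described above; the function names are hypothetical and nothing here talks to real hardware.

```python
def software_multipliers(excluded=(2.5,)):
    """Multipliers from 2.0x to 6.0x in (assumed) 0.5x steps, minus any excluded ones."""
    # Work in half-steps (4..12 -> 2.0..6.0) so no floating-point drift creeps in.
    return [n / 2 for n in range(4, 13) if n / 2 not in excluded]


def core_clocks(fsb_mhz, multipliers):
    """Core clock (MHz) for each multiplier at the given front-side bus speed."""
    return {m: fsb_mhz * m for m in multipliers}


mults = software_multipliers()
print(mults)                         # [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
print(core_clocks(100, mults)[5.5])  # 550.0
```

Passing `excluded=()` includes 2.5x as well.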

Reply 4 of 11, by Sphere478

Rank l33t++

On the K6-3+, I remember noticing once that there was one more multiplier supported in software that you couldn’t get to in hardware. It may have been 2.5x.

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 5 of 11, by justmex

Rank Newbie
Gmlb256 wrote on 2024-07-12, 23:39:

K6 microarchitecture is based on NexGen technology.

K6-2 is identical to the vanilla K6 but with support for 3DNow! instructions; later it got a CXT revision that added write combining to improve graphics performance. K6-III is pretty much a K6-2 CXT with 256 KB of on-die L2 cache.

Lastly, there is also the K6plus (K6-2+ and K6-III+, with 128 KB and 256 KB of L2 cache respectively), which are more efficient and let you change the CPU multiplier on the fly through software, from 2.0x to 6.0x (excluding 2.5x).

I guess these new instruction sets only get used by newer apps (Win9x). If that's the case, would a K5 at 133MHz and a K6-III at 133MHz (no L2 cache) perform almost the same in DOS apps?

Reply 6 of 11, by Nemo1985

Rank Oldbie

They are different architectures; it would be weird if they had the same performance just because they run at the same frequency.
As someone already told you, look for feipoa's roundup: he compared all the available CPUs, and it will clear up your doubts.

Edit: Here is the link, have fun: The Ultimate 686 Benchmark Comparison
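The reasoning can be written down with the usual back-of-the-envelope formula: run time is the instruction count divided by instructions per clock (IPC) times clock frequency. Assuming both chips run the same DOS binary (same instruction count N) at the same clock f, the clock cancels and only the IPC ratio is left:

```latex
T = \frac{N}{\mathrm{IPC} \cdot f}
\qquad\Longrightarrow\qquad
\frac{T_{\mathrm{K5}}}{T_{\mathrm{K6}}}
  = \frac{N / (\mathrm{IPC}_{\mathrm{K5}} \cdot f)}{N / (\mathrm{IPC}_{\mathrm{K6}} \cdot f)}
  = \frac{\mathrm{IPC}_{\mathrm{K6}}}{\mathrm{IPC}_{\mathrm{K5}}}
```

So equal clocks only mean equal performance if the IPC is equal, and IPC depends on the microarchitecture (and on the workload), which is what a same-clock benchmark roundup measures.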

Reply 7 of 11, by Anonymous Coward

Rank l33t++

I'd have to go back and re-examine the results of the 133MHz test, but if my memory is correct then the K5 actually outperforms the K6 at the same clock speed (not PR rated speed). Like the Cyrix 6x86 the K5 did more work per CPU cycle, and like the 6x86 the architecture did not scale very well, which is why AMD bought NexGen and used their design instead.

I also seem to remember that the K5 did okay against the K6 at PR rated speed. Slower in some tests, faster in others.
In the summer of 1997 when I was building my K6-200, I could also get K5-200 for about half the price. I think the main reason I passed was lack of MMX and worries about being fully Pentium compatible.

Reply 8 of 11, by MSxyz

Rank Member

The K5 was an in-house design by the same team that designed AMD's older RISC processors (which were quite popular in embedded applications back in the '90s). If I remember correctly, the K5 core logic was meant to be their next RISC processor, but AMD also had to come up with a new x86 processor, since the market for the 486 was dwindling. The early K5s were a disappointment: barely faster clock for clock than a Pentium, and also built on an old process. The redesign arrived only in the second half of 1996 and was a fast and efficient processor, although its short pipelines and complex design didn't allow it to scale well. Oh, and its FPU is supposed to be very fast: even if it's not pipelined, it has very low latency. If I remember correctly, in the 133MHz test someone mentioned above, clock for clock it placed right behind the original P5 in DOS Quake. Since Quake's DOS graphics routines were hand-coded and optimized for the P5 architecture, it wouldn't be far-fetched to think that if somebody had optimized the code for the K5, it would probably have placed first.

Much of the K6 design came from AMD's acquisition of the NexGen engineering team. It's a different design. Much of its speed derives from having two 32KB L1 caches (instruction and data), an additional 20KB 'pre-decode' cache (it's not a decoded-instruction cache like the Pentium 4's 'trace cache', although it's also a way to speed up the fetch, align and decode operations) and, last, an 8KB BTB. It has longer pipelines than the K5, and this allows it to scale up to 500MHz at 0.25µm and around 750MHz at 0.18µm, although the fastest commercial version never went beyond 570MHz (95MHz x 6).

The K6-2 introduced the 3DNow! instructions, in which a pair of 32-bit floating-point values is processed in parallel and stored in a single 64-bit register (the same registers MMX uses). Sort of MMX, but for floating-point numbers. Possibly the best additions in 3DNow! are the approximate reciprocal and approximate reciprocal square root instructions. Some RISC architectures already had such instructions, which were useful in 3D graphics before hardware T&L became a thing. Unfortunately for AMD, I don't think any DOS software made use of 3DNow!. A 3DNow!-optimized version of DOS Quake would have blown the P5 and even the P6 out of the water.
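The approximate-reciprocal idea is normally paired with a Newton-Raphson refinement step: a cheap, low-precision estimate x of 1/d is improved with x' = x(2 - dx), which roughly squares the relative error. Below is a generic sketch of that estimate-then-refine technique applied to both lanes of a pair, the way a packed operation would. It is a model under stated assumptions, not the 3DNow! instructions themselves: the hardware table-lookup estimate is faked by truncating an exact reciprocal to 12 mantissa bits, and the exact 3DNow! semantics are not reproduced.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def estimate_recip(d: float, bits: int = 12) -> float:
    """Low-precision 1/d estimate (stand-in for a hardware table lookup).

    Assumption: compute 1/d exactly, then truncate its mantissa to `bits`
    bits, which leaves a relative error below 2**-(bits - 1) for d > 0.
    """
    m, e = math.frexp(1.0 / d)          # 1/d == m * 2**e with 0.5 <= m < 1
    scale = 2 ** bits
    m = math.trunc(m * scale) / scale   # throw away the low mantissa bits
    return to_f32(math.ldexp(m, e))


def refine_recip(d: float, x: float) -> float:
    """One Newton-Raphson step for 1/d: x' = x * (2 - d * x)."""
    return to_f32(x * (2.0 - d * x))


def packed_recip(pair):
    """Estimate + one refinement on both lanes of a (d0, d1) pair."""
    return tuple(refine_recip(d, estimate_recip(d)) for d in pair)


r0, r1 = packed_recip((3.0, 7.0))
print(r0, r1)
```

The refined results land within about one float32 rounding step of 1/3 and 1/7, while the raw 12-bit estimates are off in roughly the fourth decimal place.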

With 3DNow!, not only could 3D computations have been done on two numbers at the same time, but, since the x87 architecture has only 8 registers arranged as a 'shallow' stack, halving the register requirements would also have made quite a difference.

In addition to that, texturing done right (no wobbly pixels like on the PlayStation!) requires perspective correction using a divide. With 3DNow! you could do two in parallel in much less time, using the approximations. If I remember correctly, the first Voodoo card also needed the dX/dW and dY/dW perspective correction parameters computed by the CPU for each vertex.
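The reason perspective correction needs a divide: texture coordinates are not linear in screen space, but u/w and 1/w are. A minimal sketch comparing affine interpolation (which produces the PlayStation-style warping) with the perspective-correct version along one span; the function names and the single-coordinate setup are illustrative assumptions, not any particular engine's code.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def affine_u(u0: float, u1: float, t: float) -> float:
    """Affine mapping: interpolates u directly in screen space and ignores depth."""
    return lerp(u0, u1, t)


def perspective_u(u0: float, w0: float, u1: float, w1: float, t: float) -> float:
    """Perspective-correct mapping: interpolate u/w and 1/w, then divide per pixel."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w  # this per-pixel divide is the expensive part


# Halfway across a span whose far end is three times as deep as its near end:
print(affine_u(0.0, 1.0, 0.5))                 # 0.5
print(perspective_u(0.0, 1.0, 1.0, 3.0, 0.5))  # 0.25: the texel sits much closer to the near end
```

Real software renderers of the era typically did the divide only every few pixels and interpolated affinely in between, which is the same trade-off in a cheaper form.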

Reciprocal square root is used in lighting calculations, and it's so important that Quake III contains a software routine (famously credited to John Carmack, although he didn't write it) that computed the approximation much faster than the FPU could. With 3DNow! you had a single, dedicated instruction for that. How cool is that?
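For reference, this is the well-known trick from the released Quake III Arena source, ported to Python as a sketch: reinterpret the float's bits as an integer, subtract half of it from the magic constant 0x5F3759DF to get a rough 1/sqrt(x), then apply one Newton-Raphson step. The `struct` calls stand in for the pointer casts of the C original, and the function name is ours.

```python
import math
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 using the Quake III bit-level trick."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic constant minus half the bits: a rough first guess at 1/sqrt(x).
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a 32-bit float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration for f(y) = 1/y**2 - x.
    return y * (1.5 - 0.5 * x * y * y)


for value in (0.25, 1.0, 2.0, 100.0):
    approx = fast_inv_sqrt(value)
    exact = 1.0 / math.sqrt(value)
    print(value, approx, abs(approx - exact) / exact)
```

With the single refinement step the relative error stays below about 0.2%, good enough for normalizing vectors in lighting code.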

But, like I said before, none of that is useful under DOS. To my knowledge, nobody ever optimized Quake or any other early 3D game running under DOS for these extensions. Even AMD eventually moved from 3DNow! to SSE, starting with the second-generation Athlons.

Lastly, the K6-2 core moved the FXCH instruction from the FP unit to one of the integer units, and this is possibly the only modification that benefits DOS Quake, because its hand-optimized x87 code uses this instruction a lot.
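To see why FXCH shows up so often: x87 arithmetic always works against the top of the register stack, ST(0), so operands frequently have to be swapped into place first. Below is a toy model of the stack, assuming Intel-syntax semantics for FSUBP ST(1), ST(0) (ST(1) = ST(1) - ST(0), then pop). It is only an illustrative sketch with no flags, tags, precision control or timing. On the Pentium, an FXCH that pairs with a preceding FP instruction is essentially free, which is presumably why Quake's hand-written routines lean on it.

```python
class X87Stack:
    """Toy model of the 8-entry x87 register stack; regs[0] is ST(0)."""

    DEPTH = 8

    def __init__(self):
        self.regs = []

    def fld(self, value: float) -> None:
        """Push a value; it becomes ST(0)."""
        if len(self.regs) == self.DEPTH:
            raise OverflowError("x87 stack overflow")
        self.regs.insert(0, value)

    def fxch(self, i: int = 1) -> None:
        """Swap ST(0) with ST(i)."""
        self.regs[0], self.regs[i] = self.regs[i], self.regs[0]

    def fsubp(self) -> None:
        """FSUBP ST(1), ST(0): ST(1) = ST(1) - ST(0), then pop."""
        top = self.regs.pop(0)
        self.regs[0] = self.regs[0] - top

    def st(self, i: int = 0) -> float:
        """Read ST(i)."""
        return self.regs[i]


# Operands were loaded in the "wrong" order for y - x, so swap before subtracting.
fpu = X87Stack()
fpu.fld(10.0)    # x -> ST(0)
fpu.fld(3.0)     # y -> ST(0), x -> ST(1)
fpu.fxch()       # ST(0) = x, ST(1) = y
fpu.fsubp()      # ST(0) = y - x
print(fpu.st())  # -7.0
```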

The K6-III was a K6-2 with an additional, larger, slightly slower unified cache that acts as an L2 cache, except that it runs on its own dedicated bus at the same frequency as the core. This lets the CPU scale better with frequency, as it reduces the need to fetch data over the slower external memory bus.
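The scaling argument can be put in numbers with the textbook average-memory-access-time formula. A minimal sketch; every latency and miss rate below is a made-up illustrative value, not a measurement of any K6 system. The point is only that an L2 running at core clock keeps a fixed hit latency in core cycles, while a cache or memory on the slower external bus costs more core cycles every time the multiplier goes up.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (in core cycles) for a two-level cache."""
    l1_miss_penalty = l2_hit + l2_miss_rate * mem_latency
    return l1_hit + l1_miss_rate * l1_miss_penalty


# Purely hypothetical numbers, in core cycles. Doubling the core clock over a
# fixed bus doubles off-chip latencies when counted in core cycles.
for scale in (1, 2):
    on_die = amat(1, 0.05, 10, 0.3, 60 * scale)        # L2 at core clock
    board = amat(1, 0.05, 20 * scale, 0.3, 60 * scale)  # L2 on the external bus
    print(scale, round(on_die, 2), round(board, 2))
```

With these invented inputs the gap between the two configurations widens as the clock multiplier grows, which is the scaling benefit described above.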

Reply 9 of 11, by justmex

Rank Newbie

Thanks everyone for your detailed answers! Interesting that the K6 is not a successor of the K5 but comes from NexGen.

Reply 10 of 11, by MSxyz

Rank Member

Yesterday, after writing my post, I surfed around a bit for extra info and found this:
https://websrv.cecs.uci.edu/~papers/mpr/MPR/A … CLES/081401.pdf

It seems the K5 instruction cache is actually also a 'pre-decode cache', meaning that the fetch-align-(pre)decode step happens before a sequence of instructions is stored into the L1 cache. So this feature was apparently not a novelty of the K6. AMD probably carried it over from the K5, but kept the 32 + 32 KB L1 cache organization inherited from the NexGen 686.
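The pre-decode idea is that instruction boundaries are worked out once, when a line is filled into the cache, and stored as extra bits next to the bytes, so the decoders don't have to rediscover where each variable-length x86 instruction starts on every fetch. A toy sketch of that; the instruction-length rule here is entirely hypothetical (real x86 length decoding is far more involved), and the names are illustrative, not AMD's.

```python
from dataclasses import dataclass


def toy_length(first_byte: int) -> int:
    """HYPOTHETICAL encoding: the low two bits of the first byte give length - 1."""
    return (first_byte & 0x3) + 1


@dataclass
class CacheLine:
    data: bytes
    start_bits: list  # start_bits[i] is True if an instruction begins at byte i


def fill_line(data: bytes, first_start: int = 0) -> CacheLine:
    """Pre-decode at fill time: walk the bytes once and mark instruction starts.

    `first_start` is where the first instruction begins in this line (an
    instruction carried over from the previous line may occupy the first bytes).
    """
    starts = [False] * len(data)
    i = first_start
    while i < len(data):
        starts[i] = True
        i += toy_length(data[i])
    return CacheLine(data, starts)


def instruction_starts(line: CacheLine) -> list:
    """What the fetch/decode stage reads later: boundaries with no length decoding."""
    return [i for i, is_start in enumerate(line.start_bits) if is_start]


line = fill_line(bytes([0x01, 0xAA, 0x00, 0x03, 0xBB, 0xCC, 0xDD, 0x02, 0x11, 0x22]))
print(instruction_starts(line))  # [0, 2, 3, 7]
```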

Reply 11 of 11, by justmex

Rank Newbie
Anonymous Coward wrote on 2024-07-13, 03:12:

I'd have to go back and re-examine the results of the 133MHz test, but if my memory is correct then the K5 actually outperforms the K6 at the same clock speed (not PR rated speed). Like the Cyrix 6x86 the K5 did more work per CPU cycle, and like the 6x86 the architecture did not scale very well, which is why AMD bought NexGen and used their design instead.

I also seem to remember that the K5 did okay against the K6 at PR rated speed. Slower in some tests, faster in others.
In the summer of 1997 when I was building my K6-200, I could also get K5-200 for about half the price. I think the main reason I passed was lack of MMX and worries about being fully Pentium compatible.

Indeed, in Doom the K5 is the best-performing CPU at 133MHz!