What AMD processors did beat their Intel counterparts?

Reply 60 of 92, by Joey_sw

Posted on 2017-07-14, 06:55

Rank: Oldbie
Posts: 550
Joined: 2011-08-17, 12:03

My personal belief is that the IBM PC won the market thanks to Lotus 1-2-3; other similar competing software/solutions for non-IBM architecture computers kinda sucked.

-fffuuu

Reply 61 of 92, by Scali

Posted on 2017-07-14, 07:23

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

kanecvr wrote:
P3 and the Athlon have nothing in common architecturally

What a ridiculously wrong claim.
Do you even know anything about architectures? Did you even try to compare them?
Let me, as a long-time assembly programmer, give you a few hints.
First, let's take the block diagrams of the PIII:

And the Athlon:

From the get-go you can already see that both have a 3-way x86 decoder, and the general layout is basically the same, except that the Athlon has 3 integer units instead of 2, as I already said (and they use slightly different caching strategies etc).

Then let's look at the pipeline for the PIII:

And the Athlon pipeline:

Look, both are exactly 10 stages!
Look closer, and you see that in general, every stage even does the exact same thing in both architectures.
E.g.: the first few stages are fetch and decode (they use slightly different names, but both fetch in the first stage and the last decoding stage is 5, so they basically both need the same number of stages to do the same thing).
Then stage 6 is the register renaming stage on both.
The final stages have some slightly different names again, but that seems to be more of a detail of how you look at things... Eg, is an instruction scheduled when you read it from the reorder buffer, or when it actually arrives in an execution unit?
Likewise, is an instruction executed when the result is available from the execution unit, or when the instruction is retired?

But at the end of the 10 stages, both CPUs can retire a maximum of 3 instructions per cycle. So the theoretical maximum performance is exactly the same.

Anyway, there are more similarities than differences here. So as I already said, K7 looks basically like a P6 with a few tweaks here and there.
I can tell you, as an assembly programmer, optimizing for both architectures is also mostly the same. They both have very similar latencies and throughput on all instructions, and in general code that is optimal for one architecture will work very well on the other. The main differences are in the instruction decoders. The P6 can decode certain instructions in parallel that the K7 can't, and vice versa (the P6 has one 'uber' decoder and two 'small' decoders; the K7 has basically three 'small' decoders, although its 'small' is not exactly the same as the P6's).
Another big difference is that the K7 is not capable of reordering reads and writes, while the P6 is.
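To make the decoder difference a bit more concrete, here is a toy model. It assumes the commonly described P6 '4-1-1' rule (decoder 0 accepts any instruction of up to 4 uops, decoders 1 and 2 only accept single-uop instructions, and in-order decode stops at the first instruction that doesn't fit), and treats the K7 as three fully symmetric decoders taking any three instructions per cycle. Fetch limits, instruction lengths, microcoded and VectorPath instructions are all ignored, so this is a sketch of the ordering effect only, not a cycle-accurate model; the function names are hypothetical.

```python
from typing import Sequence


def p6_decode_cycles(uop_counts: Sequence[int]) -> int:
    """Cycles to decode a sequence under a simplified P6 4-1-1 rule.

    Each entry is the uop count of one instruction (assumed to be <= 4).
    """
    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        i += 1  # decoder 0 (the 'uber' decoder) takes the next instruction, whatever its size
        for _ in range(2):  # decoders 1 and 2 ('small'): single-uop instructions only
            if i < n and uop_counts[i] == 1:
                i += 1
            else:
                break  # in-order decode: stop at the first instruction that doesn't fit
    return cycles


def k7_decode_cycles(uop_counts: Sequence[int]) -> int:
    """Simplified K7: three symmetric decoders, any 3 instructions per cycle."""
    return -(-len(uop_counts) // 3)  # ceiling division


if __name__ == "__main__":
    # Two-uop instructions, e.g. a register-memory ADD on the P6 (load + ALU op)
    load_op = [2, 2, 2, 2, 2, 2]
    print(p6_decode_cycles(load_op), k7_decode_cycles(load_op))
    # Same mix, with and without the complex instruction placed first in each group
    print(p6_decode_cycles([2, 1, 1, 2, 1, 1]))
    print(p6_decode_cycles([1, 1, 2, 1, 1, 2]))
```

In this model the all-load-op sequence takes 6 cycles on the P6 versus 2 on the K7, and simply reordering the same mix changes the P6 from 3 cycles to 2, which is why instruction scheduling for the P6 decoders mattered so much.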

Last edited by Scali on 2017-07-14, 12:53. Edited 1 time in total.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 62 of 92, by Scali

Posted on 2017-07-14, 07:54

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Joey_sw wrote:
My personal belief is that the IBM PC won the market thanks to Lotus 1-2-3; other similar competing software/solutions for non-IBM architecture computers kinda sucked.

Indeed. The PC had a few 'killer apps', and Lotus 1-2-3 was perhaps the most important of those. It was very important for clone-makers to be at least compatible with Lotus 1-2-3 (another good clone-test was being able to run Microsoft Flight Simulator).
Note that this is different from how CP/M worked, or how DOS was originally intended: The idea was that the OS was merely an API layer, where OEMs could make their own custom drivers for their keyboard, display and other hardware. So the original idea behind DOS was: as long as you have an x86 CPU, and provide custom drivers for your hardware, the DOS APIs will work. All software should use only the DOS APIs, so the hardware is abstracted, and software will work anywhere.
Because of limitations of the DOS APIs in both functionality and performance, in practice, most DOS applications assumed they were running on an IBM PC (which was the only DOS machine anyway, at the time), and used the hardware directly.
Some very early clones were designed with the original philosophy, so they ran DOS, but were not entirely compatible at the hardware or BIOS level. Apps like Lotus 1-2-3 made it clear that clones had to be hardware-compatible and BIOS-compatible as well, in order to run popular DOS software.

Funnily enough, VisiCalc was the spreadsheet 'killer app' before that, which boosted the popularity of the Apple II a lot.
But Lotus 1-2-3 did it better, on PC.

Software was all-important back then.
As an Amiga-owner I often ran into that. Yes, my machine was far more advanced than a PC, and also much better value for money, but I couldn't run Lotus 1-2-3 or other popular DOS programs on it. So people just bought PCs anyway (there even were extensions for the Amiga to make them PC-compatible, complete with real 8088 CPU. Commodore offered this already with the Amiga 1000 in 1985).
And of course, most of that software was x86-only, so that's why Intel became so successful (for example, the first Mac release for Lotus 1-2-3 was as late as 1991, by which time the damage to the 68000 had already been done).

Last edited by Scali on 2017-07-14, 10:23. Edited 1 time in total.

Reply 63 of 92, by jheronimus

Posted on 2017-07-14, 10:04

Rank: Oldbie
Posts: 1473
Joined: 2015-12-10, 00:09

Jade Falcon wrote:
And I'm amazed that no one said anything about the Am5x86.

I guess the same goes for the Am386DX40 and the K6-2/3 family. Yes, they are faster than the 386DX-33, 486DX4-100 and P-MMX@233, but Intel already had a next-generation chip at every point.

MR BIOS catalog
Unicore catalog

Reply 64 of 92, by spiroyster

Posted on 2017-07-14, 10:57

Rank: Oldbie
Posts: 696
Joined: 2015-10-12, 12:26

Scali wrote:
I couldn't run Lotus 1-2-3 or other popular DOS programs on it.

I ran Lotus 123 on an Amiga 500+...

Excruciating, but it worked... did 286 stuff too iirc, but of course, any software dependent on timings of some sort ... erm yeah ... just no.

I will add, rebooting back into the Amiga was like going back to the future in comparison to DOS... mouse, higher res, decent sound, more vibrant colours... more text on TV screen! The flight simulators really started showing the Amiga up.

Reply 65 of 92, by Tetrium

Posted on 2017-07-14, 12:31

Rank: l33t++
Posts: 9607
Joined: 2010-01-27, 18:53
Location: Netherlands

kwyjibo wrote:
First, I do not want to start a flame war 😀

What I want to know, in order to be able to build some period-correct PCs (gaming or workstation), since I am getting very confused by benchmarks, is: which AMD high-end processors were considered to be on par with or to surpass their Intel high-end counterparts? Athlon 1GHz? First Opteron? First Athlon 64?

Tetrium wrote:
Athlon 1GHz is roughly equivalent to Pentium 3 1GHz
I'm not sure about Opteron, but my guess is that it's roughly similar in performance to an Athlon 64 of the same clock frequency.
Athlon 64 @2.2GHz is roughly equal to Netburst @3.2GHz (A64 is probably a bit faster).
Athlon 64 was about 25% faster than Barton clock for clock. A Barton at 2.2GHz was maybe a little bit slower than a 3.2GHz Netburst chip.

I'm not sure if this is what you meant, please correct me if I'm wrong.
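Rules of thumb like these boil down to "performance ≈ work per clock × clock speed". A minimal sketch that turns the figures above into per-clock factors, with Netburst as the baseline. It assumes performance scales linearly with clock, which real chips don't quite do (memory and bus speeds don't scale with the core); the constant and function names are hypothetical.

```python
# Per-clock factors derived from the rough figures above (assumptions, not measurements)
NETBURST_PER_CLOCK = 1.0                 # baseline
A64_PER_CLOCK = 3.2 / 2.2                # A64 @2.2GHz ~ Netburst @3.2GHz
BARTON_PER_CLOCK = A64_PER_CLOCK / 1.25  # A64 ~25% faster than Barton per clock


def relative_perf(per_clock: float, ghz: float) -> float:
    """Performance in 'Netburst-GHz equivalents', assuming linear clock scaling."""
    return per_clock * ghz


if __name__ == "__main__":
    barton = relative_perf(BARTON_PER_CLOCK, 2.2)
    netburst = relative_perf(NETBURST_PER_CLOCK, 3.2)
    print(f"Barton 2.2GHz ~ {barton:.2f}, Netburst 3.2GHz = {netburst:.2f}")
    print(f"ratio ~ {barton / netburst:.2f}")
```

Under those assumptions a 2.2GHz Barton lands at roughly 80% of a 3.2GHz Netburst chip; actual results will depend heavily on the benchmark.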

Scali wrote:
kanecvr wrote:

Do you even know anything about architectures?

Well...at least I tried 😵

🤣

Whats missing in your collections?
My retro rigs (old topic)
Interesting Vogons threads (links to Vogonswiki)
Report spammers here!

Reply 66 of 92, by Phreeze

Posted on 2017-07-14, 12:58

Rank: Member
Posts: 158
Joined: 2016-10-11, 08:24

jheronimus wrote:
Jade Falcon wrote:
And I'm amazed that no one said anything about the Am5x86.

I guess the same goes for the Am386DX40 and the K6-2/3 family. Yes, they are faster than the 386DX-33, 486DX4-100 and P-MMX@233, but Intel already had a next-generation chip at every point.

That's why the AMDs were upgrade kits for older boards. The AMD X5-133 was a 133MHz 486 that performed like a Pentium 75, hence the P75 rating in its name. I think that's pretty cool, and those AMD X5s run pretty cool on my 486 PCI board 😎
Core 2 ended AMD's streak. I had an Opteron before; with a bit of overclocking it performed really fast.

ArGUS Parts list: http://bit.ly/2Ddf89V

Reply 67 of 92, by kwyjibo

Posted on 2017-07-17, 07:12

Rank: Newbie
Posts: 40
Joined: 2009-09-20, 09:14
Location: Cartagena, Spain

Tetrium wrote:
kwyjibo wrote:
First, I do not want to start a flame war 😀

What I want to know, in order to be able to build some period-correct PCs (gaming or workstation), since I am getting very confused by benchmarks, is: which AMD high-end processors were considered to be on par with or to surpass their Intel high-end counterparts? Athlon 1GHz? First Opteron? First Athlon 64?

Athlon 1GHz is roughly equivalent to Pentium 3 1GHz
I'm not sure about Opteron, but my guess is that it's roughly similar in performance to an Athlon 64 of the same clock frequency.
Athlon 64 @2.2GHz is roughly equal to Netburst @3.2GHz (A64 is probably a bit faster).
Athlon 64 was about 25% faster than Barton clock for clock. A Barton at 2.2GHz was maybe a little bit slower than a 3.2GHz Netburst chip.

I'm not sure if this is what you meant, please correct me if I'm wrong.

Thank you all!

As I said, I did not want to start a war, but with this subject it seems difficult to avoid it 😀

Yes, I feel you gave me a great response to my question.

As for price/performance, I think it is an important matter with current CPUs, but once time has passed and you are building computers with old parts just for fun, we always try to get the best-performing CPUs (although there is always room for "special" cases).

To continue with the discussion, have you seen benchmarks of Athlon Slot A versus Pentium III Xeon Slot 2? Does Pentium III Xeon differ from regular Pentium III in anything but the increased cache?

Reply 68 of 92, by Scali

Posted on 2017-07-17, 07:17

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

kwyjibo wrote:
As I said, I did not want to start a war, but with this subject it seems difficult to avoid it 😀

Not sure what some people think a 'war' is... But in this case, there were apparently people talking about things they know nothing about, such as claiming that "P3 and the Athlon have nothing in common architecturally" (in a response/'correction' to someone who DOES know what they're talking about).
The response was an in-depth explanation of the two architectures and the glaringly obvious similarities once you actually bother to dive into the architecture.
I'm not sure how anyone can construe that as a 'war' unless they have some kind of personal agenda.

Bottom line is: too many people are talking about things they don't understand anything about... and doing that in a way as though they are an authority on the subject. This is annoying. Some people should know when not to talk.
Facts (and architectural characteristics such as pipeline depth, number of x86 decoders, number of instructions to retire per clk etc are certainly facts) are something that do not require any kind of discussion, let alone a 'war'. Just make sure you get your facts straight before you open your mouth. Worst mistake you can make is opening your mouth trying to 'correct' someone who DID have their facts straight, and you don't know what you're talking about.

Reply 69 of 92, by Palladium

Posted on 2017-07-17, 10:18

Rank: Newbie
Posts: 35
Joined: 2017-01-14, 17:30

The only times AMD decisively won in performance post-P5 against Intel's best were Athlon Classic vs P3 Katmai, Palomino AXP vs Willamette, and the A64 vs P4 era, and it wasn't until 2002 that they finally got a chipset that didn't suck.

Reply 70 of 92, by Tetrium

Posted on 2017-07-17, 12:32

Rank: l33t++
Posts: 9607
Joined: 2010-01-27, 18:53
Location: Netherlands

kwyjibo wrote:
Tetrium wrote:
kwyjibo wrote:
First, I do not want to start a flame war 😀

What I want to know, in order to be able to build some period-correct PCs (gaming or workstation), since I am getting very confused by benchmarks, is: which AMD high-end processors were considered to be on par with or to surpass their Intel high-end counterparts? Athlon 1GHz? First Opteron? First Athlon 64?

Athlon 1GHz is roughly equivalent to Pentium 3 1GHz
I'm not sure about Opteron, but my guess is that it's roughly similar in performance to an Athlon 64 of the same clock frequency.
Athlon 64 @2.2GHz is roughly equal to Netburst @3.2GHz (A64 is probably a bit faster).
Athlon 64 was about 25% faster than Barton clock for clock. A Barton at 2.2GHz was maybe a little bit slower than a 3.2GHz Netburst chip.

I'm not sure if this is what you meant, please correct me if I'm wrong.

Thank you all!

As I said, I did not want to start a war, but with this subject it seems difficult to avoid it 😀

Yes, I feel you gave me a great response to my question.

As for price/performance, I think it is an important matter with current CPUs, but once time has passed and you are building computers with old parts just for fun, we always try to get the best-performing CPUs (although there is always room for "special" cases).

To continue with the discussion, have you seen benchmarks of Athlon Slot A versus Pentium III Xeon Slot 2? Does Pentium III Xeon differ from regular Pentium III in anything but the increased cache?

We do have a Slot 2 Xeon expert on Vogons; I can't remember his name, but I think he had the XO of BSG as his avatar??
I don't think the Slot 2 Xeons were really a lot better clock for clock compared to something like Coppermine. Btw, we do have a couple of benchmark threads, you could have a look there? There is one with SuperPi, but I'm not sure that's what you're looking for.

You can see a link to Vogonswiki in my signature; at the bottom are several benchmark threads. Maybe there's something for you there, so you don't have to wait for an answer 😜

Reply 71 of 92, by kwyjibo

Posted on 2017-07-18, 07:37

Rank: Newbie
Posts: 40
Joined: 2009-09-20, 09:14
Location: Cartagena, Spain

Thank you Tetrium for the info. I saw a couple of benchmarks (486, P6), but I did not search any further. And thanks all for being patient with me 😜

Reply 72 of 92, by Tetrium

Posted on 2017-07-18, 13:30

Rank: l33t++
Posts: 9607
Joined: 2010-01-27, 18:53
Location: Netherlands

kwyjibo wrote:
Thank you Tetrium for the info. I saw a couple of benchmarks (486, P6), but I did not search any further. And thanks all for being patient with me 😜

Yw, Frank 😜

Reply 73 of 92, by Jade Falcon

Posted on 2017-07-18, 14:34

Rank: BANNED
Posts: 3216
Joined: 2016-05-08, 19:23
Location: Nar Shaddaa.

jheronimus wrote:
Jade Falcon wrote:
And I'm amazed that no one said anything about the Am5x86.

I guess the same goes for the Am386DX40 and the K6-2/3 family. Yes, they are faster than the 386DX-33, 486DX4-100 and P-MMX@233, but Intel already had a next-generation chip at every point.

Very true, but these CPUs' counterparts were Intel's older CPUs. Or that's the way I look at it.

Reply 74 of 92, by kanecvr

Posted on 2017-07-19, 00:38

Rank: Oldbie
Posts: 1957
Joined: 2015-04-22, 20:30
Location: Bucharest, Romania

Scali wrote:

What a ridiculously wrong claim.
Do you even know anything about architectures? Did you even try to compare them?
Let me, as a long-time assembly programmer, give you a few hints.

STOP thinking like a programmer. I'm talking about physical differences between the silicon as well as how the chips actually process data and instructions, and how they handle cache and memory. If we're to compare diagrams, most x86 compatible chips look similar.

1. Instruction decoding and pipeline.
While the P6 and K7 can both decode 3 instructions at once, they do it very differently. On a hardware level, the K7's instruction decode unit is symmetrical and homogeneous. The K7's decoders convert instructions into fixed-length macro ops and finally RISC ops, much like the K6, while the P6's decoders interpret instructions quite differently.

"Unlike the P6, and more like the K6, the K7’s decoders are generalized and completely symmetric. Whereas the P6’s decode pipeline will stall if any instruction other than the first in an issue packet is even mildly complex (e.g., not register to register), the K7 won’t skip a beat. Furthermore, each of the K7’s three pipelined decoders can handle moderately complex x86 instructions, including instructions such as load-operate-store and as long as 15 bytes." .... "The K7’s decoders convert variable-length x86 instructions into fixed-length “macro ops” (MOPs) and deliver them to the in-order instruction control unit (ICU). The ICU dispatches these MOPs to the instruction schedulers in the out-of-order core. The schedulers convert MOPs into RISC ops (ROPs), which they issue to the execution units. The execution units can execute up to nine ROPs per clock. It is the job of the K7’s direct-path decoders to keep the ICU fed, so that it never stalls for want of MOPs to dispatch to the core" - more here: http://cgi.di.uoa.gr/~halatsis/Advanced_Comp_ … h/Papers/k7.pdf

2. Cache.
The P6 and K7 handle cache differently. To start, L1 cache size, structure and management on the P6 and the K7 are very different. The K7 has 4 times more L1 cache than the P6: 2-way 64KB instruction and 64KB data caches vs 4-way 16KB instruction and 16KB data caches. The P6 can move instructions and data faster through its L1 cache than the K7, but the latter can store and index more instructions and data at once. Also, on the K7 the L2 cache's translation lookaside buffer (TLB) is physically located in the same area as the L1 cache - 256 entries for data, 256 entries for instructions. The P6 lacks this altogether: its L2 cache entries are handled by the L1 cache TLBs and limited to 32 instruction entries and 72 data entries, while the K7 has dedicated hardware for this purpose.

Later P6 chips like the Coppermine and Tualatin use a 256-bit path to the L2 cache, while the K7 chips have a single 64-bit path - BUT - unlike the P6 (and the P68 Netburst chips), the K7's cache controller is able to access the L2 cache "directly" and store 512 (256 data + 256 instruction) entries, while the L2 cache on P6 chips is slaved to the L1 cache and can only store 72+32 entries in the L1 cache's TLB. Also, the FPU on the K7 can make direct use of the L2 cache, which is part of the reason the Athlon's FPU performance is much better than that of any Pentium III, including Tualatin chips.

"Having built a powerful execution engine, Meyer was determined to provide enough memory bandwidth to keep it fed. To this end, the K7 implements aggressive memory reordering, a large multiported L1 data cache, an associative backside L2 cache with on-chip tags, a multilevel TLB (vs the P6's single level TLB and sidelined rudimentary L2 cache controller), and a memory interface with significantly more bandwidth than is provided by either K6’s Socket 7 or Intel’s Slot 1 or 2"
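The L1 size and associativity figures above translate into quite different cache geometries. A small sketch computing the number of sets and the tag/index/offset split of an address for a set-associative cache, assuming 32-byte lines on the P6 and 64-byte lines on the K7 (line sizes aren't stated in the post); the function names are hypothetical.

```python
from typing import Tuple


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> Tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes all sizes are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


def split_address(addr: int, offset_bits: int, index_bits: int) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    p6_l1d = cache_geometry(16 * 1024, ways=4, line_bytes=32)  # (128, 5, 7)
    k7_l1d = cache_geometry(64 * 1024, ways=2, line_bytes=64)  # (512, 6, 9)
    print("P6 L1D:", p6_l1d, split_address(0x12345, *p6_l1d[1:]))
    print("K7 L1D:", k7_l1d, split_address(0x12345, *k7_l1d[1:]))
```

All else being equal, the K7's 2-way sets leave fewer places per set for lines that map to the same index than the P6's 4-way sets, traded against four times the capacity.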

3. Branch Prediction.
"The K7 implements a surprisingly simple branch predictor for a machine with such a long pipeline. The K7 uses a 2,048-entry branch history table (BHT) with a simple two bit Smith prediction algorithm. This predictor stands in sharp contrast to the K6’s elaborate 8,192-entry BHT with its two-level GAs predictor - a feature that AMD now admits was overkill. The K7’s BHT is accessed in the fetch stage using the branch address, and the prediction is made in the scan stage. The prediction is fed back to direct instruction fetch on the next cycle. Branch target addresses are computed on the first misprediction and stored in a 2,048-entry branch target address cache (BTAC)." As for the P6, I remember its BPU being derived from the P55C's, and it's not much faster than the latter. Performance-wise, the K7's BPU is not much faster than the P6's.
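The two-bit Smith scheme from the quote is simple enough to sketch. Below is a minimal model of a 2,048-entry table of 2-bit saturating counters. Indexing by the low bits of the branch address and the initial "weakly not-taken" state are assumptions for illustration, not documented K7 details, and the class/function names are hypothetical.

```python
from typing import Iterable


class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, entries: int = 2048) -> None:
        self.entries = entries
        self.counters = [1] * entries  # assumed initial state: weakly not-taken

    def _index(self, branch_addr: int) -> int:
        return branch_addr % self.entries  # hypothetical indexing: low address bits

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(bht: TwoBitBHT, branch_addr: int, outcomes: Iterable[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if bht.predict(branch_addr) != taken:
            misses += 1
        bht.update(branch_addr, taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through; the loop is entered 3 times
    loop = ([True] * 9 + [False]) * 3
    print(count_mispredictions(TwoBitBHT(), 0x401000, loop))
```

The point of the second bit is hysteresis: the single not-taken loop exit only moves the counter from 3 to 2, so the branch is still predicted taken when the loop is re-entered, and after warm-up each pass through the loop costs one misprediction.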

4. System bus.
AMD licensed Digital’s Alpha 21264 processor bus, making the K7 the first PC processor to depart from a traditional multidrop shared bus. This enables the machine to provide much better bandwidth for other devices, like fast video cards, which is why a 1400MHz Athlon can keep an FX 5900XT fed with data while running at 1600x1200, where a 1400MHz Tualatin can't. On the latter, stuttering and frame drops can be observed in some 1999-2000 and almost all 2001-2003 games at that resolution. A nice experiment anyone can run is Dungeon Keeper II @ 1600x1200 / high / 32-bit color or Black and White 1 @ 1280x1024 / high. B&W also benefits heavily from the Athlon's L2 cache architecture and faster floating point unit.

"The K7’s system bus, which AMD borrowed (licensed) from Digital’s Alpha 21264 processor (see MPR 10/28/96, p. 11), is the first PC processor to depart from a traditional multidrop shared bus. Unlike other PC processors, the K7 uses a point-to-point interconnect" .... "The K7’s bus uses source-synchronous clocking to minimize skew and latch-to-latch signaling to reach 200 MHz, with Slot 1–type printed-circuit module packaging. Up to 400 MHz is possible on impedance-controlled multilayer boards."
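The bandwidth difference is simple arithmetic: peak bandwidth = bus width in bytes × transfers per second. A sketch assuming a 64-bit data path for both the K7 bus and the P6 Slot 1 front-side bus (the post doesn't state widths), treating the 200/400 MHz figures above as transfer rates, and using 100/133 MHz for the P6 bus (also not from the post); the function name is hypothetical.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, mega_transfers_per_s: int) -> int:
    """Theoretical peak in MB/s (1 MB = 10**6 bytes); ignores protocol overhead."""
    return bus_width_bits // 8 * mega_transfers_per_s


if __name__ == "__main__":
    buses = {
        "K7 bus @200 MT/s": (64, 200),
        "K7 bus @400 MT/s (board limit quoted above)": (64, 400),
        "P6 Slot 1 FSB @100 MT/s": (64, 100),
        "P6 Slot 1 FSB @133 MT/s": (64, 133),
    }
    for name, (width, rate) in buses.items():
        print(f"{name}: {peak_bandwidth_mb_s(width, rate)} MB/s")
```

These are theoretical peaks only; the 133 MHz bus actually runs at 133.33 MHz, so its real peak is about 1066 MB/s, and sustained throughput on any of these buses is lower.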

5. Floating point
"....fully pipelined (the P6 is partially pipelined when performing multiplies) AMD had the gall to make a superpipelined FPU. I would have thought that this was impossible given the horribly constipated x87 instruction set, but I was shocked to find that it's really possible to execute well above one floating point operation per clock (on things like multiply accumulates.) The K7 architecture shows a three-way pipeline (FADD, FMUL, and FSTORE) for the FPU; however, "FSTORE" does not appear to be all that important (it's used for FST(P), FLD(CONST) and "miscellaneous" instructions.) So the only question you'd think remains is "how fast is FXCH"? However, upon reflection it seems to me that the use of FXCH is far less important with the K7. Since the K7 can combine ALU and load instructions with high performance, pervasive use of memory operands in floating point instructions (which reduces the necessity of using FXCH) seems like a better idea than the Intel recommended strategies.

A floating point test I did that uses this strategy confirms that the K7 is indeed significantly faster than the P6's floating point performance. My test ran about 50% faster. I suspect that as I become more familiar with the Athlon FPU I will be able to widen that gap (i.e., no I can't show what I have done so far.)

Nevertheless the top two stages of the FPU pipeline are stack renaming then internal register renaming steps. The register renaming stage would be unnecessary if FXCH (which helps treat the stack more like a register file) did not execute with very high bandwidth, so I can only assume that FXCH must be really fast. Update: The latest Athlon optimization guide says that FXCH generates a NOP instruction with no dependencies. Thus it has an effective latency of 0 cycles (though it apparently has an internal latency of 2 clocks -- I can't even think of a way to measure this.)

Holy cow. Nobody in the mainstream computer industry can complain about the K7's floating point performance.

The 21264 also has two main FP units (Mul and Add) on top of a direct register file. So while the 21264 will have better bandwidth than the K7 on typical code which has been optimized in the Intel manner (with wasteful FXCHs) on code fashioned as described above, I don't see that the Alpha has much of an advantage at all over the K7. Both have identical peak FP throughput of 2 ops per clock, that in theory should be able to be sustainable by either processor."

^taken directly from here: http://www.azillionmonkeys.com/qed/cpujihad.shtml
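Why FXCH can be nearly free once the stack is renamed: in a toy model, the x87 stack slots ST(0)..ST(7) are just entries in a mapping table pointing into a register file, so FXCH only swaps two table entries and moves no data. The sketch below illustrates that idea under heavy simplifications (no free list, no tag word, no exceptions); it is not a description of the actual K7 hardware, and the class and method names are hypothetical.

```python
from typing import List


class X87RenameMap:
    """Toy x87 stack: logical stack slots map to physical registers via a table."""

    def __init__(self) -> None:
        self.top = 0                                   # x87 TOP-of-stack pointer
        self.slot_to_phys: List[int] = list(range(8))  # stack slot -> physical register
        self.regfile: List[float] = [0.0] * 8          # physical register contents

    def _slot(self, i: int) -> int:
        return (self.top + i) % 8  # ST(i) is relative to TOP

    def st(self, i: int) -> float:
        return self.regfile[self.slot_to_phys[self._slot(i)]]

    def fld(self, value: float) -> None:
        self.top = (self.top - 1) % 8  # push: TOP moves down
        self.regfile[self.slot_to_phys[self._slot(0)]] = value

    def fxch(self, i: int = 1) -> None:
        a, b = self._slot(0), self._slot(i)
        # Only the mapping changes; no register contents are copied
        self.slot_to_phys[a], self.slot_to_phys[b] = self.slot_to_phys[b], self.slot_to_phys[a]


if __name__ == "__main__":
    fpu = X87RenameMap()
    fpu.fld(1.0)
    fpu.fld(2.0)
    before = list(fpu.regfile)
    fpu.fxch(1)
    print(fpu.st(0), fpu.st(1), fpu.regfile == before)  # 1.0 2.0 True
```

After the swap ST(0) and ST(1) have traded values while the register file itself is untouched, which is the sense in which a renamed FXCH has no data dependency to wait on.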

In the same way you're thinking as a programmer, I look at it the way an engineer would. Physically, the two chips have very little in common, and while the end result might be the same (they are both x86 compatible microprocessors after all), they do things very differently.

P.S. Don't get your panties in a bunch.

Reply 75 of 92, by Scali

Posted on 2017-07-19, 07:11

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

kanecvr wrote:
STOP thinking like a programmer.

We're talking about microarchitectural similarities here. The only way to judge those is to use those microarchitectures, as in running code and studying their behaviour.

kanecvr wrote:
I'm talking about physical differences between the silicon

This is mostly irrelevant. The same microarchitecture can be built in a number of configurations, on a number of different process nodes, using a number of different packages.
So it can very well be that there are CPUs that look very different physically, yet stem from the same microarchitectural family.

kanecvr wrote:
as well as how the chips actually process data and instructions

Which, as I've explained above, is extremely similar between K7 and P6.

kanecvr wrote:
and how they handle cache and memory.

This is arguably not part of the microarchitecture itself, but of the 'un-core'.
Aside from that, you won't see Intel and AMD use exactly the same caching algorithms due to a combination of trade secrets and patented technologies. So that is pretty much a given.

kanecvr wrote:
If we're to compare diagrams, most x86 compatible chips look similar.

They do *now*, that's the whole point of the P6. It was revolutionary at the time. All successful x86 CPUs still basically follow the same mold.
The biggest outliers so far have been Netburst and Bulldozer, and we all know how well that went.
Any x86 CPU before P6 looked very different from the P6. There was no out-of-order execution, no register renaming, no separated decoding of instructions vs execution backend etc. Take a look at the 286 architecture, for example: https://en.wikipedia.org/wiki/Microarchitectu … i80286_arch.svg
Which actually makes it all the more remarkable that P6 and K7 look so similar.
Don't forget the timeframe here:
P6 was introduced in 1995.
K7 was introduced in 1999.
P4 was introduced in 2000.

So AMD had 4 years to develop a newer architecture; it could have been anything. Intel released their P4 not much later, and as you see, they took it into new directions, with much longer pipelines, trace cache, and deprecating the x87 in favour of SSE2.
Apparently AMD did not do any of that, and stuck very closely to that original P6 mold.

kanecvr wrote:
In the same way you're thinking as a programmer, I look at it the way an engineer would.

I disagree. You get caught up in irrelevant details and how AMD vs Intel marketing spins them. Not what I expect an engineer to do.
The arguments you present aren't very strong either. They are mostly differences that I had already mentioned myself; you just try to make a huge deal out of them.
Besides, I'm an engineer as well. A software engineer (and yes, we do get taught computer organization and digital technology as well, so we do know how CPUs work, down to the transistor level, so don't bother even going there).

Last edited by Scali on 2017-07-19, 13:24. Edited 1 time in total.

Reply 76 of 92, by Tetrium

Posted on 2017-07-19, 12:08

Rank: l33t++
Posts: 9607
Joined: 2010-01-27, 18:53
Location: Netherlands

Scali wrote:

kanecvr wrote:
I'm talking about physical differences between the silicon

This is mostly irrelevant. The same microarchitecture can be built in a number of configurations, on a number of different process nodes, using a number of different packages.
So it can very well be that there are CPUs that look very different physically, yet stem from the same microarchitectural family.

Scali...when kanecvr referred to "silicon", he was actually referring to the silicon (as in the material "silicon") and not the plastic, ceramic or metal parts that make up the packaging of the CPU. As you may already know, the Intel Pentium MMX rated for 166MHz was constructed in a ceramic package, but was later also packaged in PPGA. The two are physically different, but both run at the same speed and should perform identically, since the silicon inside the Pentium MMX 166MHz is the same. Sometimes the silicon inside the CPU package will be rated at a higher clock speed, thus improving performance of said CPU, but the package itself (which is what you're rambling about here) is totally irrelevant to performance. This has very little to do with the packaging, and the packaging was not what kanecvr was talking about (no idea why you are making so much of a point of this here. Packaging also has nothing to do with programming, so why are you bringing this up anyway? 😕 ).

Scali wrote:
Bottom line is: too many people are talking about things they don't understand anything about... and doing that in a way as though they are an authority on the subject. This is annoying. Some people should know when not to talk.

Looks like you're the one who doesn't understand why kwyjibo even created this thread. Have you actually answered his question yet? You're only making this thread unnecessarily long and giving it more pages without giving kwyjibo an answer or even reading what else he stated.

Reply 77 of 92, by Scali

Posted on 2017-07-19, 12:15

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Tetrium wrote:
Scali...when kanecvr referred to "silicon", he was actually referring to the silicon (as in the material "silicon") and not the plastic, ceramic or metal parts that make up the packaging of the CPU.

As am I... a different socket would mean different pin layout, and therefore you may also need to change the interconnects on the die-part to make it fit, for example.
Or what about the aforementioned Smithfield vs Presler? Both are a two-core Pentium 4 derivative, same microarchitecture, but one is a single piece of silicon, the other is two. Etc. Or on a smaller scale, there may be different steppings of a CPU, which means some parts of the die have been physically changed, but those are not microarchitectural differences, just small changes/rearrangements to improve manufacturing and such.
In the case of the P6 architecture, it was used in its original form in the Pentium Pro, PII and PIII, and in various derivatives from then on. At least the PPro, PII and PIII are considered the same P6 microarchitecture, yet obviously they have very different physical characteristics.
You may want to read and think about what I write, before making stupid knee-jerk 'corrections' like this.

My point was: physical characteristics of the die do not say much about the microarchitecture (and as such make no sense as a basis for a comparison). I didn't expect that I had to spell that out, but apparently I do, to some people. People who, as mentioned before, are trying to discuss topics they know little or nothing about.

Tetrium wrote:
Have you actually answered his question yet?

I was one of the first to answer in this thread: What AMD processors did beat their Intel counterparts?

Reply 78 of 92, by Tetrium

Posted on 2017-07-19, 12:50

Rank: l33t++
Posts: 9607
Joined: 2010-01-27, 18:53
Location: Netherlands

Scali wrote:
Tetrium wrote:
Scali...when kanecvr referred to "silicon", he was actually referring to the silicon (as in the material "silicon") and not the plastic, ceramic or metal parts that make up the packaging of the CPU.

As am I... a different socket would mean different pin layout

Pentium MMX 166MHz in CPGA and PPGA packages were actually both available in identical s7 pinout, but the differences are actually not part of the silicon (dunno where you're getting that).
I think you're the one who needs to do a bit of thinking, as I already told you (2 or 3 years ago or something?) why people are not really listening to the knowledge you possess. Maybe you should think again instead of only trying to lecture everyone in the too-often unpleasant ways that you do. It hasn't worked for you for years, why would it suddenly start working from today?

Reply 79 of 92, by Scali

Posted on 2017-07-19, 12:56

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Tetrium wrote:
Pentium MMX 166MHz in CPGA and PPGA packages were actually both available in identical s7 pinout, but the differences are actually not part of the silicon (dunno where you're getting that).

I don't know where *you* are getting this.
You're moving the goalposts here. You are talking about Pentium MMX all of a sudden. I'm not (and I did not include that part of your message in the quote).
I was talking about the P6 architecture (topic was P6 vs K7 remember?), which has been available in multiple sockets and also slot packages, not to mention different process nodes. So there are a number of physically different pieces of silicon, which are all P6 microarchitecture.

Nevertheless, both Pentium Classic and Pentium MMX are also the same P5 microarchitecture, despite being available in different sockets, different process nodes, packages etc, so the argument still holds anyway: physical characteristics of silicon and/or packaging are irrelevant to microarchitecture.
