gdjacobs wrote:Those segments (as a complete platform) pretty much require an IGP.
IGP as in integrated on the motherboard? Because those existed for Nehalem/Lynnfield (H57/H55/Q57 chipsets).
Or specifically integrated into the CPU package? I would say that is somewhat arbitrary. Penryn had the IGP in the chipset. Clearly in that segment you'd want to integrate it in the CPU package eventually, but Westmere is not necessarily the make-or-break point. They could have left the IGP in the chipset for another generation.
I think you are putting goalposts in arbitrary positions. I don't like that type of 'discussion'.
I think your view of how well you conduct yourself in a mature, technical discussion does not quite match reality.
gdjacobs wrote:Depends on whether you're interested in scoring points or a real discussion. To me, engineering decisions are interesting.
Engineering decisions are interesting to me as well, but there's this thing about discussions and making statements in a certain context.
I made the statement that Westmere consisted of two dies, like Zen 2.
The engineering decisions behind that choice may have been slightly different, but that is not part of the context in which I made my statement.
I never claimed Intel worked from the same engineering decisions as AMD did. I merely said that Intel has taken the same multi-die approach to a CPU package in the past as well.
And that's where the problem is. You are trying to pull my statement into a different context, and then trying to say my argument was 'imperfect'.
Clearly I would object to that, because it's just very unsportsmanlike to use a strawman fallacy in a discussion like that. Again, don't try to shift the blame on me. Look at your own contribution first. You know people are not going to like it if you twist and turn their words and use fallacies to try and 'win' a debate.
gdjacobs wrote:Sure, but neither was Intel. So what?
I merely responded to someone who claimed that AMD *was* being innovative with their chiplet architecture.
Again, context, it is important.
The "so what" here is that someone made a comment that was incorrect, and I responded with a correction.
That would have been the end of that, but apparently some people, including you, just can't let it end there, and have to keep harassing me about it.
gdjacobs wrote:Sure, I believe I said so. Most innovations touted in the media are just rewarmed versions of stuff from before.
Then we agree on that. And then you understand why I try to give a more realistic view of developments, by pointing out how things pushed as 'innovations' look eerily similar to what others have done in the past.
gdjacobs wrote:You trumpeted Intel as the prior art for package level integration.
I object to your use of the word "trumpeted".
Why would you phrase it in a subjective way like that?
I gave Westmere as an example, because it was the first that came to mind.
I never claimed Intel was the first, nor that Intel was innovative in doing so.
I merely wanted to debunk the claim that Zen 2 was 'innovative', because Zen 2 clearly was NOT the first.
You understand the concept of falsification? To disprove something, you only need one counterexample. Westmere is one such counterexample. I never implied it was the first, or the only one. I merely disproved Zen 2 being the first. Nothing more. Anything you want to read into it on top of that is your own bias.
gdjacobs wrote:Even if you don't believe so, it comes across as very defensive of Intel as the better engineering house.
That is your subjective view.
I am just stating the fact that Intel got there before AMD. Intel being "the better engineering house", well, that's your interpretation of it all. I think it says a lot more about you than about me. Think about it.
gdjacobs wrote:Stacked dies has been quite successful since it debuted with mobile chipsets, but it has physics issues against it. Stacking won't be viable for functional areas with high thermal load, so GPUs and desktop CPU cores are probably out. They will be using some variant of chiplet with EMIB and some stacking, but I'm not sure how far along they are.
I think you are missing the bigger picture here.
"Functional areas with high thermal load" -> "GPUs and desktop CPU cores"
See the mental leap you made there? You are still thinking of GPUs and CPUs as single dies. You forget to factor in chiplets: You can slice and dice the functional logic of a CPU or GPU into any number of sub-modules. This is a divide-and-conquer approach: You can divide the logic in such a way that there will be entire chiplets that do not contain functional areas with high thermal load. And you can stack those together.
That's also how HBM is implemented by the way: They are put on top of a GPU package. Which by your logic wouldn't work, because you say GPUs are 'functional areas with high thermal load'. Yes, they are... but you can create a 'low thermal load' area at the edges of the GPU package, and stack your HBM there.
gdjacobs wrote:SMT yields the greatest benefit in architectures that suffer disproportionately from pipeline issues, so long pipelines with a large stall or miss penalty and in-order designs.
This is where your view is again limited. Long pipelines are just one case where SMT helps, they are not the only case.
In-order designs are a special case altogether: their take on SMT is a very limited one, because being in-order is more or less mutually exclusive with proper SMT.
As said, SMT originated from IBM, and they applied it to their POWER architecture, which is an entirely different beast from a Pentium 4. Sun also used SMT, again with entirely different types of CPUs.
I think what you are missing is that x86 is inherently inefficient. The instruction set dates from the 1970s, and only allows two operands per instruction, where one operand doubles as source and destination.
This inherently causes inefficiencies and requires all sorts of fancy register renaming and recombining logic to extract more ILP.
And that is why SMT works so well on any x86, no matter how efficient it is.
I mean, I hate to point out the obvious, but Nehalem was at its introduction the x86 CPU with the highest IPC ever. It couldn't possibly sustain that IPC if it suffered severely from stalls. There must be more to the story. Which is what I said above. Even the fastest x86 CPUs are inherently inefficient in their execution backend because the x86 instruction set has a lot of dependencies hardwired into the instruction format.
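To make the two-operand point a bit more concrete, here is a minimal Python sketch. It is only a toy model of my own: the tuple format, the register names (r1, r2, ...) and the 'tmp' scratch register are made up for illustration and are not real x86 encodings. It takes operations written in three-operand form (dst = src1 op src2) and lowers them to a destructive two-operand form (dst = dst op src), which is roughly what a compiler has to do for classic x86 integer instructions.

```python
# Toy model only: the tuple format and register names (r1, r2, ..., tmp) are
# hypothetical and do not correspond to real x86 encodings.

COMMUTATIVE = {"add", "and", "or", "xor"}


def lower_to_two_operand(program):
    """Lower (op, dst, src1, src2) tuples to destructive (op, dst, src) form."""
    lowered = []
    for op, dst, src1, src2 in program:
        if dst == src1:
            # Destination already holds the first source: one instruction.
            lowered.append((op, dst, src2))
        elif dst == src2 and op in COMMUTATIVE:
            # Operands can be swapped, so still one instruction.
            lowered.append((op, dst, src1))
        elif dst == src2:
            # Non-commutative op whose destination is the second source:
            # go through a scratch register so src2 is not overwritten early.
            lowered.append(("mov", "tmp", src1))
            lowered.append((op, "tmp", src2))
            lowered.append(("mov", dst, "tmp"))
        else:
            # Copy the first source, then operate on the copy in place.
            lowered.append(("mov", dst, src1))
            lowered.append((op, dst, src2))
    return lowered


three_operand = [
    ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    ("sub", "r4", "r1", "r2"),  # r4 = r1 - r2
    ("add", "r1", "r1", "r3"),  # r1 = r1 + r3
]

lowered = lower_to_two_operand(three_operand)
for insn in lowered:
    print(*insn)
print(len(three_operand), "three-operand ops ->", len(lowered), "two-operand ops")
```

In this toy case three operations turn into five instructions, and every extra 'mov' creates one more register dependency that the renaming logic then has to untangle. How much that costs on real hardware depends on the microarchitecture, so take this as an illustration of the principle, not as a measurement.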
gdjacobs wrote:(although it's worth noting the Nehalem instruction pipe isn't exactly short compared to, say Penryn)
It is short compared to Pentium 4. Again, context, goalposts... Look at yourself first. You really think you are conducting a good discussion?
Anyway:
Penryn: 12-14 stages
Nehalem: 20-24 stages
Pentium 4: 28-31 stages
What needs to be noted however, is that there's a thing known as a 'Loop Stream Detector', where part of the pipeline can be skipped (similar to Pentium 4's trace cache). Penryn has a simple one, the Nehalem one is more advanced.
So effectively not all 20-24 stages are used during execution, bringing it closer to Penryn. Which makes sense, because otherwise it would never have gotten better IPC than Penryn.
gdjacobs wrote:Okay, but that's not a reason for trying to weaponize the BB.
Again, there's the moral superiority.
You are making assumptions, and then working from those, while your assumptions are flawed.
gdjacobs wrote:If we're able to talk in a reasoned and complete way instead of always angling to destroy everyone else in the thread, I bet we can dump 90% of the drama, be much more productive, and have a way better time.
Again, I'm not the problem.
The way I see it, some AMD fanboys were convinced that Zen 2 was the best thing since sliced bread, and chiplets were so awesome and innovative.
Then I pointed out Westmere, and cognitive dissonance ensued. Of course I was a heretic, and needed to be stoned!
The 'best' argument I've found from the other side is "they are not the same"... Yeah, that's a reasoned, productive discussion!
Either you have arguments why they're (fundamentally) different, or you accept my arguments why they're remarkably similar (two dies, I/O split off to a die on older manufacturing). There's no other option.