AMD drops the mic

Reply 60 of 279, by Scali

Rank l33t
gdjacobs wrote:

I'm not a game programmer, so I'm not really aware of how guys like Tim Sweeney are evolving their approach with increasingly parallel hardware, but I am familiar with numerical programming on dozens or hundreds of cores. In general I can say that parallel programming is not straightforward or cheap, and any opportunity to use knowledge of the target platform to simplify the task of adapting software to a multi-core environment will be embraced by developers, especially if doing so (assuming small core counts) has little or no downside over the economic lifetime of your product.

The most versatile approach to multiprocessor computing is to partition the problem in a fine-grained fashion amongst your processors, but this is very expensive in terms of engineering. Instead, if your network code, audio code, or some other subsystem does not lend itself to parallelism, doesn't achieve worthwhile gains from partitioning, or isn't worth the cost in developer time, it can be spun off on a thread in coarse fashion for the scheduler to load-balance. On a processor with a few cores, large elements such as graphics remain unitary within their respective threads, but have a core fully dedicated to their execution. Housekeeping threads and the OS share what's left.

This is useful for small core counts as it allows the available resources to be utilized while avoiding the significant cost of completely re-engineering the game engine, potentially down to its inner loops. The approach runs out of steam with large core counts due to the limited number of application elements to be spun off and the significant difference in compute time required by the different threads (compare network code vs. scene setup and render).

Well, my point is that all the 'low-hanging fruit' of splitting up tasks into multiple threads at a coarse level was picked about a decade ago. Since then, they have been working on optimizing the computationally intensive tasks, such as physics, with parallelism.
Don't forget, the most parallel task in a game is the rendering (known as an 'embarrassingly parallel' problem), and that was already tackled with specially optimized hardware before multi-core CPUs were even available in the mainstream.

People tend to talk like there's a lot of performance still on the table, and if those lazy developers would just start writing multithreaded code, you know, that silver bullet, we'd get incredible performance boosts. It's nothing like that.
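
For reference, the coarse decomposition gdjacobs describes, the 'low-hanging fruit' that was picked years ago, boils down to something like this minimal C++ sketch (the subsystem loops are hypothetical placeholders, not any real engine's code):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> running{true};

// Hypothetical subsystem loops; each stays unitary within its own thread.
void network_loop() { while (running) { /* poll sockets, apply state updates */ } }
void audio_loop()   { while (running) { /* mix and submit audio buffers */ } }
void render_loop()  { while (running) { /* scene setup and draw submission */ } }

int main() {
    // Coarse partitioning: one thread per subsystem; the OS scheduler
    // load-balances them across the available cores.
    std::thread net(network_loop), audio(audio_loop), render(render_loop);

    // The main thread keeps game logic and housekeeping, then shuts down.
    running = false;
    net.join(); audio.join(); render.join();
}
```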

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 62 of 279, by vladstamate

Rank Oldbie

What would be an interesting approach is non-symmetrical multiprocessing. The Cell processor did that. PS3 games were able to take advantage of the 6 SPUs, which had incredible performance:

1) they ran at 3.2 GHz
2) all the instructions were effectively in cache (the local store), and so was all of their data
3) they had a dedicated bus (the EIB, Element Interconnect Bus)
4) in some circumstances they could issue 2 instructions per cycle. A vector-matrix multiplication was just a handful of cycles once you splat the vectors (see the sketch below).
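
The 'splat' style looks like this; a minimal sketch using SSE intrinsics as a stand-in, since SPU intrinsics only run on Cell hardware (the column-major matrix layout is an assumption):

```cpp
#include <xmmintrin.h>  // SSE

// Column-major 4x4 matrix (col[0..3]) times a 4-component vector.
__m128 mat4_mul_vec4(const __m128 col[4], __m128 v) {
    // Splat each component of v across a full register...
    __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    // ...and the whole multiply collapses into four mul/add pairs:
    // a handful of cycles on dual-issue SIMD hardware.
    __m128 r = _mm_mul_ps(col[0], x);
    r = _mm_add_ps(r, _mm_mul_ps(col[1], y));
    r = _mm_add_ps(r, _mm_mul_ps(col[2], z));
    r = _mm_add_ps(r, _mm_mul_ps(col[3], w));
    return r;
}
```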

And you also had the PPE, a proper CPU core with 2 hardware threads. Games that took advantage of the SPUs looked incredible. I did a lot of SPU coding and SPU code performance analysis.

While Cell had its issues, there might still be a place for non-symmetrical computing. Not all tasks are created equal, and some of them could use a dedicated type of core.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 63 of 279, by Scali

Rank l33t
vladstamate wrote:

While Cell had its issues, there might still be a place for non-symmetrical computing. Not all tasks are created equal, and some of them could use a dedicated type of core.

Well, isn't the PC equivalent of this just a combination of multi-core CPU and GPGPU programming?

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 64 of 279, by PhilsComputerLab

Rank l33t++
Azarien wrote:

I like how old-school this new Socket AM4 looks.

Did you see the coolers? Oh my 😊

YouTube, Facebook, Website

Reply 65 of 279, by vladstamate

Rank Oldbie
Scali wrote:
vladstamate wrote:

While Cell had its issues, there might still be a place for non-symmetrical computing. Not all tasks are created equal, and some of them could use a dedicated type of core.

Well, isn't the PC equivalent of this just a combination of multi-core CPU and GPGPU programming?

It almost is, yes. There are still a few things that make GPGPU not as good as the SPUs:

1) There is a noticeable latency to task start. It is on the order of microseconds. On the SPUs, all you had to do was write to a register and the task would start. However, you can batch things up in advance on the GPU, true.
2) GPUs do not yet run at 3.2 GHz. In fairness, you do get A LOT of ALUs in recent GPUs to make up for the lack of clock speed.
3) Divergent code still has issues on GPUs.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 66 of 279, by swaaye

Rank l33t++
vladstamate wrote:

It almost is, yes. There are still a few things that make GPGPU not as good as the SPUs:

1) There is a noticeable latency to task start. It is on the order of microseconds. On the SPUs, all you had to do was write to a register and the task would start. However, you can batch things up in advance on the GPU, true.
2) GPUs do not yet run at 3.2 GHz. In fairness, you do get A LOT of ALUs in recent GPUs to make up for the lack of clock speed.
3) Divergent code still has issues on GPUs.

Isn't PCIe less than ideal too? Or is that the latency you're referring to? The consoles certainly don't have to deal with PCIe.

Reply 67 of 279, by vladstamate

Rank Oldbie
swaaye wrote:

Isn't PCIe less than ideal too? Or is that the latency you're referring to? The consoles certainly don't have to deal with PCIe.

No, I was not referring to the bus (PCIe, etc.) latency. And you are right, consoles do not have that issue (being SoC-based). It is what I call the "0-to-60 mph" time for the GPU: how long does it take, starting from nothing, to get my compute task running on the GPU? That time is at least an order of magnitude longer than doing the same on a CPU. I am not talking about execution time, just start time.

GPUs have to deal with a fairly complicated pipeline and internal resource management (do I have enough local memory for this task? enough temporary registers? etc.) that has to be resolved before the first instruction can be executed. This is not true for the CPU.
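
As a rough CPU-side illustration of that start cost, here is a minimal sketch that times only task start, not execution; a real GPU dispatch pays this kind of latency plus the pipeline and resource setup described above:

```cpp
#include <chrono>
#include <future>
#include <iostream>

using clk = std::chrono::steady_clock;

int main() {
    // Time from dispatching a task to its first instruction running.
    auto dispatched = clk::now();
    auto task = std::async(std::launch::async,
                           [] { return clk::now(); });  // task records when it starts
    auto started = task.get();
    std::cout << "start latency: "
              << std::chrono::duration_cast<std::chrono::microseconds>(
                     started - dispatched).count()
              << " us\n";
}
```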

Then you also lack the ability for one compute task to start another (although GPU makers are now adding that).

But we are digressing here; no point in turning this into a CPU vs. GPGPU thread. My initial point was that maybe, just maybe, there can be some win in having some asymmetrical multiprocessing in the CPU.

For example, I did a lot of chess engine programming, and my chess engine (Plisk) ranked 8th in the 2010 World Computer Fast Chess Championship. I can tell you there is no way I could use the GPU for any of its algorithms. But give me some 20-30 cores with fast, simple ALUs and almost zero task-start latency, and I would love you for it!
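
To illustrate the shape of that workload, a minimal root-split sketch in C++ (Move and search_root_move are hypothetical stand-ins, not Plisk's actual code):

```cpp
#include <algorithm>
#include <future>
#include <vector>

struct Move { int from, to; };  // hypothetical move representation

// Placeholder for a real alpha-beta search of one root move.
int search_root_move(const Move& m, int depth) {
    return (m.from + m.to) % (depth + 1);  // dummy score
}

// Root splitting: one independent search per root move. This scales with
// core count and wants many fast, simple cores with near-zero start cost;
// its irregular, branchy control flow is a poor fit for a GPU.
int parallel_root_search(const std::vector<Move>& moves, int depth) {
    std::vector<std::future<int>> scores;
    for (const Move& m : moves)
        scores.push_back(std::async(std::launch::async,
                                    search_root_move, m, depth));
    int best = -1000000;
    for (auto& f : scores)
        best = std::max(best, f.get());
    return best;
}
```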

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 68 of 279, by Standard Def Steve

Rank Oldbie

Full Ryzen spec and price list leaks
http://techreport.com/news/31427/rumor-full-r … ist-leaks#metal

8c/16t, 16MB L3, unlocked multi, and a 65W TDP for $319 is absolutely bonkers. I just might have to order one on launch day. 😊

94 MHz NEC VR4300 | SGI Reality CoPro | 8MB RDRAM | Each game gets its own SSD - nooice!

Reply 69 of 279, by archsan

Rank Oldbie

Ahh... long time no checking the forum. Forgot where the one thread with the "Zen" title on it was, so here goes...

Yeah, the hype is getting too big to ignore. https://videocardz.com/66065/first-cpu-z-scre … yzen-cpu-leaked

Also some recent leaked benches on the site. At 65~95W... these aren't even OC results yet. I know, I know... I'll wait a week or two for legit results. Someone please make an "official (RY)ZEN thread" then. 😀

Waiting for Naples as well. I hope there will be fast 10~12-core options in addition to the slower 16/32-core. Put the Xeons in check (damn unjustifiable prices).
Return of the Opteron workstations (& servers)...

"Any sufficiently advanced technology is indistinguishable from magic."—Arthur C. Clarke
"No way. Installing the drivers on these things always gives me a headache."—Guybrush Threepwood (on cutting-edge voodoo technology)

Reply 70 of 279, by Scali

Rank l33t

I could not find an official release date... Just some reference to Lisa Su saying "Early March"...
Intel approaches launches very differently. They give you a real date long in advance, and send out engineering samples to reviewers early. They also perform more than 1 benchmark when they 'introduce' the product.
So you know what you're going to get.

With AMD it's all hype so far. Which wouldn't be so bad if only their track record weren't so poor. Both Barcelona and Bulldozer were hyped for months, if not years, in advance, and completely failed to deliver on the promise. So all this hype isn't exactly getting me excited. I get this feeling of "Oh no, not again".

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 71 of 279, by archsan

Rank Oldbie

Looks like the samples are already in the hands of reviewers. Just waiting til March 2nd for the NDA lift. Two weeks from now.

True, it's all hype at this point for us -- though the hype is not that it's "the fastest" stuff, but that it's "more cores for less money, at speeds around a 5th/6th-gen i7", which is plenty good for a lot of purposes today.

Vega is probably the one that will tank. But Zen could save a LOT of money, especially for those of us procuring lots of new machines.

"Any sufficiently advanced technology is indistinguishable from magic."—Arthur C. Clarke
"No way. Installing the drivers on these things always gives me a headache."—Guybrush Threepwood (on cutting-edge voodoo technology)

Reply 73 of 279, by shiva2004

Rank Member

Odd? Why? AMD has never included GPUs in their CPUs, only in the APUs, and there was never any indication of a change in that policy. Of course, sooner or later we'll see APUs with Ryzen cores, but not a GPU in every processor à la Intel.

Reply 74 of 279, by ODwilly

Rank l33t

Why waste die space in performance desktop chips that are going to be used with a discrete card anyway? Granted, they should be releasing some new APUs for the mainstream market. Hopefully there is a solid improvement over the FM2 stuff.

Main pc: Asus ROG 17. R9 5900HX, RTX 3070m, 16gb ddr4 3200, 1tb NVME.
Retro PC: Soyo P4S Dragon, 3gb ddr 266, 120gb Maxtor, Geforce Fx 5950 Ultra, SB Live! 5.1

Reply 75 of 279, by vladstamate

Rank Oldbie
Scali wrote:

Intel approaches launches very differently. They give you a real date long in advance, and send out engineering samples to reviewers early. They also perform more than 1 benchmark when they 'introduce' the product.
So you know what you're going to get.

In all fairness, that is not a proper comparison. Intel CPUs as of late have not been revolutionary. Their tick-tock cadence is not really holding up anymore either. They have not made major pipeline changes in years. Of course their launches are going to be more structured, since their CPUs are less risky.

Also, Intel was not always like that. Case in point: Larrabee. I was there at SIGGRAPH 2008, at the talk that announced it to the public for the very first time. A LOT, and I mean A LOT, of handwaving. They did not have engineering samples, nor did they have more than one benchmark to show. I was involved in the project from the game-console-manufacturer point of view, and it was a mess.

I can understand that for CPUs launched in the last 5 years Intel has been more structured, but let's judge AMD and Intel by the same stick.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 76 of 279, by Scali

Rank l33t
vladstamate wrote:

In all fairness, that is not a proper comparison. Intel CPUs as of late have not been revolutionary.

I'm not just talking about now; they've done this for as long as I can remember. Even with the Core2.
That caused quite a stir. The early Core2 benchmarks seemed 'too good to be true' (they showed 20-40% better IPC than the fastest x86 at the time, the Athlon64) and were not accepted by the AMD crowd.
However, those benchmarks were 100% correct, as people found out later when the chips hit the retail channel.

The fact that Intel has not done anything that revolutionary in recent years doesn't change anything about that. Their approach to launches is still the same.
And if you want to judge AMD by the same stick: AMD hasn't released anything new at all since Bulldozer. I wouldn't count Zen as revolutionary either, since it seems to be mostly a carbon copy of the architecture that Intel has been using since the Core i7: relatively short pipelines focused on high IPC, combined with SMT for parallel scalability.
Which is, as I've argued before, the best strategy for AMD in my opinion: AMD doesn't have the R&D resources to come up with some revolutionary architecture that will blow the doors off Intel. Even if it did, it would still face the problem of legacy code: people only consider an x86 CPU good if it can run current software faster than the existing CPUs.
Itanium, the Pentium 4 and Bulldozer all suffered from this: they were considerably better when you compiled and optimized code specifically for those architectures than when you ran legacy x86 code on them, code optimized for the Pentium Pro/2/3/Core architectures.
One of the biggest strengths of the Athlon and Core2 architectures was that their performance characteristics closely matched the Pentium Pro, for which most software was (and in a way still is) optimized.

So AMD's best gamble is to take the Intel CPUs as their guideline, take from them what works, and refine things where they can.
This is also what AMD did with the original Athlon: it was very similar to a Pentium Pro, but introduced a few things (some taken from technology acquired with the DEC Alpha team) that led to better performance in some cases.
They also simplified some parts of the design, which didn't hurt performance in practice.

vladstamate wrote:

I can understand that for CPUs launched in the last 5 years Intel has been more structured but lets judge AMD and Intel using the same stick.

I am: Intel shows accurate benchmarks well in advance. AMD generally hypes up their CPUs without much substance, and then they disappoint.
AMD has done that since the Core2 era as well, starting with Barcelona, then Bulldozer. With Bulldozer they even had the John Fruehe fiasco. You can't fail much harder than that. Intel has never pulled any stunt even remotely as underhanded as that.
You put that arbitrary '5 years' metric on there; that didn't come from me. So in essence you are pulling what I said out of its original context.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 77 of 279, by gdjacobs

Rank l33t++
Scali wrote:

And if you want to judge AMD by the same stick: AMD hasn't released anything new at all since Bulldozer. I wouldn't count Zen as revolutionary either, since it seems to be mostly a carbon copy of the architecture that Intel has been using since the Core i7: relatively short pipelines focused on high IPC, combined with SMT for parallel scalability.

To be clear, SMT is generally most profitable with architectures which suffer heavily from pipeline stalls. It's a big win for long-pipeline, out-of-order architectures like NetBurst, and for in-order architectures like UltraSPARC and POWER6. Wide architectures were pioneered by the RISC vendors: MIPS, IBM, DEC, and HP. Essentially, everyone is iterating on the RISC concept (with the EPIC VLIW architecture having failed).

Where Intel remains ahead of everyone is in manufacturing process technology. This might see some erosion with the shift in the semiconductor market to handset SoCs, but even there the focus is on cost and efficiency rather than raw performance.

All hail the Great Capacitor Brand Finder

Reply 78 of 279, by Scali

Rank l33t
gdjacobs wrote:

To be clear, SMT is generally most profitable with architectures which suffer heavily from pipeline stalls.

No, this is a common misconception, going back to the days of the Pentium 4 (AMD claimed they didn't need the technology because they didn't have the technology; everyone bought into that story except me, because I do critical thinking).
Did you ever bother to think it through in the context of a modern x86?
A Core i7 has a lot of execution units per core. The legacy two-operand x86 instruction set, however, is inadequate to feed all of those units every cycle.
This is not a 'pipeline stall' in the traditional sense, but you do have many units sitting idle every cycle.
By feeding two or more instruction streams, you can reach better utilization of these units.
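
A minimal way to observe this, assuming Linux and assuming logical CPUs 0 and 1 are SMT siblings of one physical core (the pairing varies; check /proc/cpuinfo):

```cpp
// Build with: g++ -O2 -pthread smt_demo.cpp (glibc provides the *_np call)
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Pin the calling thread to one logical CPU (Linux-specific).
static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// A serial dependency chain: each step waits on the previous one, so most
// of the core's execution units sit idle every cycle.
static long long chain(long long n) {
    long long x = 1;
    for (long long i = 1; i <= n; ++i)
        x = x * 3 + i;  // each iteration depends on the last
    return x;
}

std::atomic<long long> sink;  // keeps the optimizer honest

int main() {
    const long long N = 500'000'000;
    auto t0 = std::chrono::steady_clock::now();
    // Two chains on sibling hyperthreads of one physical core: the second
    // stream fills units the first leaves idle, so the pair often finishes
    // in far less than twice the single-chain time.
    std::thread a([&] { pin_to_cpu(0); sink = chain(N); });
    std::thread b([&] { pin_to_cpu(1); sink = chain(N); });
    a.join(); b.join();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("two dependent chains via SMT: %lld ms\n", (long long)ms);
}
```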

Which is why HT works at least as well on a modern Core i7 as it did on the Pentium 4, even though the Core i7 is the complete opposite of the Pentium 4 in terms of pipeline design, stalls, and the cost of those stalls.
(Why else would Intel have brought HT back? It was gone from the Core2, because the Core2 was not based on the Pentium 4 architecture, so HT had to be migrated over, which they did in Nehalem, and it has stayed there ever since. If there were no merit to it, it wouldn't be here today, and AMD certainly wouldn't be trying to copy it.)

As for RISC: there's no such thing anymore. We are well into the post-RISC era, where even 'true' RISC architectures are now very similar to x86 in how they execute legacy code: a lot of instructions aren't implemented in hardware, but are decoded to microcode, or even emulated in software altogether.
x86 itself has of course also been using a RISC backend since the Pentium Pro era. The boundaries between the two are fading fast.
But if you want to argue about older RISC: if anything, the main characteristic of RISC has been very short and simple pipelines, with few and short stalls. Yet RISC is where you saw SMT first, on the DEC Alpha architecture, which is where Intel got their HT technology. It was of course originally developed by IBM, and also implemented on their POWER RISC architecture. Both have considerably shorter pipelines than the Pentium 4, in fact very similar to modern Core i7 pipelines (in the range of 14-16 stages, where the Pentium 4 was 28+).

You might want to update your knowledge, because it all sounds more than 10 years out of date.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 79 of 279, by carlostex

Rank l33t

If all the hyped leaks and such are accurate, then Zen actually has slightly better IPC than Broadwell. That would be really good. If the silicon works well enough that it can clock higher, maybe AMD has a winner and can indeed shake up the market. They won't win the performance crown, for sure, but they don't need to anyway.