dexvx wrote:You do not believe in mathematical theories such as the average, geometric mean, median, etc? GTX 1080 is, on average, about 5-10% faster in 3D games than the Vega 64. Are all the singular datapoints (games) where the GTX 1080 is faster just cherry-picked and therefore be discarded? Or are you implying the opposite?
I'm not sure if you understand what cherry picking is...
If you want to talk about maths and statistics, the relevant concept here is the 'outlier'... If a certain result deviates far from the average/mean/median/whatnot, then there is something interesting going on there, which warrants deeper inspection to try and explain why that particular result is faster or slower than most.
Cherry-picking is where you only pick the results that support your argument. In some cases that can be people only picking the 'outliers' where their brand of choice is faster, discarding the results that bring the average down.
In other cases, people may discard the 'outliers' where their brand of choice is slower.
It all depends on what outliers you have.
In the case of DX12/Vulkan, the sample space is currently too small to even have a good idea of what the expected results are, vs what the outliers would be.
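To make the difference concrete, here is a minimal sketch (the ratios and the 15% threshold are completely made up): instead of keeping or dropping outliers to suit an argument, you flag anything that deviates far from the median and then go figure out why.
[code]
// Minimal sketch (made-up numbers, made-up 15% threshold): flag the per-game
// results that deviate far from the median, then go investigate those.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Hypothetical relative results: fps of card A divided by fps of card B.
    std::vector<std::pair<std::string, double>> results = {
        {"Game 1", 1.08}, {"Game 2", 1.05}, {"Game 3", 1.10},
        {"Game 4", 0.85}, {"Game 5", 1.07}, {"Game 6", 1.32},
    };

    // Median of the ratios.
    std::vector<double> ratios;
    for (const auto& r : results) ratios.push_back(r.second);
    std::sort(ratios.begin(), ratios.end());
    const double median = ratios[ratios.size() / 2];

    // Anything more than 15% away from the median is an outlier: not something
    // to silently keep or throw away, but something that needs an explanation.
    for (const auto& r : results) {
        if (std::fabs(r.second - median) / median > 0.15)
            std::printf("%s: %.2f vs median %.2f -> investigate\n",
                        r.first.c_str(), r.second, median);
    }
    return 0;
}
[/code]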
dexvx wrote:I'll proceed and tell those game devs to shut up.
Game devs don't necessarily understand hardware these days. Heck, many of them don't even know assembly language anymore. They're probably doing the same as you: repeating stuff they read on the internet, because of confirmation bias.
Really, you'll have to come up with something better.
dexvx wrote:It's software from my perspective because I can't change the SM's on the fly. How the fvk would I know what my fractional workload (which is often times dynamic) is going to be?
As far as I recall, Maxwell can very well repartition the SMs on the fly (as pointed out, they have been able to do that since Kepler with Hyper-Q); the only limitation is that it cannot do so while a draw call is executing, because draw calls are not pre-emptible. So it can repartition the SMs between any two draw calls.
In which case you are misunderstanding the hardware and misrepresenting the facts.
dexvx wrote:Please cite anywhere that AoTS was developed with the intention of deliberately hurting performance.
Why would I need to cite anything? How about common sense?
They released a benchmark that performed worse on Maxwell when async compute was enabled.
If you didn't want to hurt performance on NV, you would either not enable the async path at all, or you would make an alternative path that doesn't hurt performance.
In fact, why would they even release a benchmark at all, of a game that was still far from finished at the time?
Not to mention that AoTS was an AMD-sponsored game, so the writing is on the wall, isn't it?
If you look at DOOM, yes it was AMD-sponsored, and yes, it has AMD-specific shader optimizations, and yes it uses async compute optimized for AMD... But at least they don't enable async compute on NV at all, so it doesn't HURT performance. That's what one expects in a game.
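Just to make concrete what 'not enabling it' looks like from the engine's side, here is a minimal sketch (the function name and the policy are mine, purely for illustration; the DXGI call and the PCI vendor IDs are real): the separate compute-queue path is only taken on hardware where it has actually been verified to help.
[code]
// Minimal sketch of gating an async compute path per adapter. The function
// name and the policy are purely illustrative; the DXGI call and the PCI
// vendor IDs (0x1002 = AMD, 0x10DE = NVIDIA) are real.
#include <dxgi.h>

bool ShouldEnableAsyncCompute(IDXGIAdapter1* adapter) {
    DXGI_ADAPTER_DESC1 desc = {};
    if (FAILED(adapter->GetDesc1(&desc)))
        return false;               // when in doubt, stay on the default path

    switch (desc.VendorId) {
        case 0x1002: return true;   // assumed: QA verified a gain on this hardware
        case 0x10DE: return false;  // no verified gain -> plain single-queue path
        default:     return false;  // unknown hardware -> plain single-queue path
    }
}
[/code]
The rendered output is identical either way; it's purely a performance switch, so defaulting to 'off' costs nothing except on hardware where you've actually measured a win.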
dexvx wrote:From the papers I've read, AoTS's design was to just emulate a single logical compute queue and then serializing tasks into the graphics queue. It just so happens that Maxwell's very static implementation of async compute (with explicit scheduling of graphics and compute tasks) was terrible at executing the task structured as such.
Putting the cart before the horse, are we?
The point of writing a game should be to make it run as fast as possible, and make it look as good as possible. What you're saying just proves my point: they ran a task that was structured in a way that it ran very poorly on Maxwell.
Why would you even allow such a code path to run on the hardware? QA should have figured out that this didn't work on that hardware, so you disable it. After all, async compute doesn't change anything about how the game looks. It's merely a basic tool that may or may not allow you to get small gains on certain hardware if you can use it correctly.
It should be disabled by default, unless you made specific optimizations and have verified that they indeed improve performance during QA. This is also what the DX12 best practices docs say.
Instead, not only did they enable it by default, they even went as far as to shout in the media that NV's hardware was broken and whatnot. Which is what got us to where we are today, with people like you arguing that only AMD has "true async". It's a dirty game that AMD has been playing, and you fell for it.
dexvx wrote:Anyways, you give nice and detailed technical explanations (like some of our PE's and higher). However, like many PE's, you seem to ignore real world results that don't conform to your internal theories (and dismiss anomalies as 'cherry picked' data).
Actually, it's the other way around.
See, async compute is mainly an AMD marketing tool. It is basically the only DX12 feature they can sorta do. Not to mention that they get it 'for free' on the PC platform, since game devs also use it on consoles.
As a result, AMD's DX12 strategy has been to focus 100% on async compute (and completely ignore other new features of the API, many of which they didn't even implement). The only software out there that uses async compute and ISN'T AMD-sponsored/biased is Futuremark's Time Spy.
So the async compute consensus on the net is very much AMD-biased. I am one of the few who has a more balanced view, and actually understands how async compute works and what the differences are between the architectures.
Pretty much everything else is "Look, AMD is faster, NV is fake!", which is nonsense of course. It's about as nonsensical as saying that AMD's CPUs must be 'pseudo-hardware' because they can't run x86 software as quickly as Intel can.
Different architectures just have different solutions to the same problem, which comes with different performance characteristics and optimization strategies.
Time Spy is the only 'fair' async compute test we have so far, and we can see that it does indeed work on NV hardware. It doesn't get as much of a boost as it does on AMD hardware, but does that make NV's implementation bad or fake? No. Their architecture is just different. As I already said before, NV's pipeline is far more efficient than AMD's, so there is less to gain from async compute in the first place.
Even if NV were to copy AMD's async compute implementation 1:1 and glue it onto Pascal, you wouldn't see the same gains as you get on AMD hardware. That's not because the async compute wouldn't work as well; it's because the other parts of the GPU are more efficient and already take up more of the available resources, leaving fewer gaps for async compute to fill. Just look at how much more performance NV gets out of the same memory bandwidth or GFLOPS: compare the RX 580 with the GTX 1070, which have the same memory interface. Then think about what that means for something like async compute. NV cards that generally perform at the same level as AMD cards are far 'lighter' in terms of raw resources; they just perform the same because they're more efficient.
Of course you also have to consider the compute units themselves, which are not the same between AMD and NV.
dexvx wrote:What does HPC compute tasks have to do with async compute?
Isn't that obvious?
Async compute is about running compute tasks asynchronously. It doesn't necessarily have to be paired with graphics tasks. The concept existed long before DX12, because keeping the GPU fed with multiple independent compute tasks was already a real problem in HPC.
In the end, a graphics task is just another shader task. At the time of Kepler, the hardware wasn't generalized this far yet, so you could not run graphics and compute in parallel. But later hardware did make this generalization, and the basic concept is just the same: you have multiple shader tasks running in parallel, and they can be dynamically scheduled over the SMs.
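On the API side, that is really all there is to it. Here is a minimal D3D12-flavoured sketch of the idea (device setup, command-list recording and error handling omitted; the names are mine):
[code]
// Minimal D3D12-flavoured sketch: one DIRECT (graphics) queue and one COMPUTE
// queue, with a fence marking the only point where they have to synchronize.
// Queues and the fence would normally be created once at init, not per frame.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitIndependentWork(ID3D12Device* device,
                           ID3D12GraphicsCommandList* gfxList,
                           ID3D12GraphicsCommandList* computeList) {
    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;

    D3D12_COMMAND_QUEUE_DESC gq = {};
    gq.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics + compute + copy
    device->CreateCommandQueue(&gq, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC cq = {};
    cq.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // compute + copy only
    device->CreateCommandQueue(&cq, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off the compute work; the hardware is free to run it alongside
    // the graphics work submitted below.
    ID3D12CommandList* c[] = { computeList };
    computeQueue->ExecuteCommandLists(1, c);
    computeQueue->Signal(fence.Get(), 1);

    // Graphics work that does not depend on the compute results.
    ID3D12CommandList* g[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, g);

    // Only work submitted to the graphics queue after this point has to wait
    // for the compute results.
    gfxQueue->Wait(fence.Get(), 1);
}
[/code]
Note that the application never says which SMs or CUs run what; it only expresses that the two streams of work are independent. How much of that independence gets exploited, and with what granularity of pre-emption, is exactly where Kepler, Maxwell, Pascal and GCN differ.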
You can see that in Maxwell v2, they made the first step here: a graphics task could run in parallel with compute tasks, but they had not implemented pre-emption of graphics tasks yet, so an entire draw call had to complete before context switches could be made.
In Pascal they improved granularity considerably: they can now pre-empt graphics tasks at the pixel level.
You could argue that they could still take another step: pre-empting graphics tasks at the instruction level. However, unless you have some extremely long pixel shaders, it is probably not going to make a difference in practice. And I can see why they did it per-pixel... this way they can perform synchronization at the raster operation level. That probably makes context switching for the special case of graphics tasks a whole lot less complicated to implement.
As far as I know, AMD has not specified exactly when or how they pre-empt, but I doubt that they go beyond pixel level on graphics tasks.