Reply 60 of 74, by dexvx
More partial responses.
wrote:Cherry-picking is where you only pick the results that support your argument. In some cases that can be people only picking the 'outliers' where their brand of choice is faster, discarding the results that bring the average down.
In other cases, people may discard the 'outliers' where their brand of choice is slower.
???
I literally picked the next AAA title due for release (it came out on Oct 3, a few days before my post). It was purely coincidental that it is super optimized for Radeon at the moment. I would've done the same for Destiny 2 and SW: BF II, regardless of the outcome.
wrote:In the case of DX12/Vulkan, the sample space is currently too small to even have a good idea of what the expected results are, vs what the outliers would be.
True (I discount 'ported' titles unless they were total revamps), but that doesn't mean you can dismiss results as they come in.
wrote:Game devs don't necessarily understand hardware these days. Heck, many of then don't even know assembly language anymore. They're probably doing the same as you do: repeat stuff they read on the internet, because confirmation bias.
Really, you'll have to come up with something better.
Confirmation bias that is confirmed by real world application results. Sure.
wrote:As far as I recall, Maxwell can very well change the SMs on the fly (as pointed out, they have been able to do that since Kepler with HyperQ), the only limitation being that it cannot do it while a draw call is executing, because draw calls are not pre-emptive. So it can change the SMs between any two draw calls.
In which case you are misunderstanding the hardware and misrepresenting the facts.
Let me spell out the problem with Maxwell for you. I can change SM allocations on the fly, yes. But how the fvk am I supposed to know ahead of time what I'm going to change them to? Am I supposed to dynamically detect when an end user (e.g. a gamer) happens to wander somewhere that requires ALU-heavy work (and thus more compute)? You can't be serious when you say 'let's change it on the fly' is a solution. That's akin to scheduling CPU workloads manually (without an OS scheduler).
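To make that concrete, here's a toy model (made-up SM counts and workload numbers, obviously not real driver code) of a static graphics/compute split vs an ideal dynamic scheduler:

```cpp
// Toy model: why a static SM split between graphics and compute is awkward.
// The split has to be chosen before the work is submitted, but the actual
// graphics/compute mix changes from scene to scene.
#include <algorithm>
#include <cstdio>

// Frame time with a static split: gated by whichever partition is overloaded.
double staticSplitTime(int gfxSMs, int cmpSMs, double gfxWork, double cmpWork) {
    return std::max(gfxWork / gfxSMs, cmpWork / cmpSMs);
}

// An ideal dynamic scheduler (roughly what GCN's ACEs approximate) feeds every
// idle SM from whichever queue still has work.
double dynamicTime(int totalSMs, double gfxWork, double cmpWork) {
    return (gfxWork + cmpWork) / totalSMs;
}

int main() {
    const int totalSMs = 16;
    const int gfxSMs = 12, cmpSMs = 4;   // split tuned for a "typical" 75/25 frame

    // The gamer wanders into an ALU-heavy area and the mix flips to 40/60.
    const double gfxWork = 40.0, cmpWork = 60.0;

    std::printf("static split: %.1f time units\n",
                staticSplitTime(gfxSMs, cmpSMs, gfxWork, cmpWork));
    std::printf("dynamic     : %.1f time units\n",
                dynamicTime(totalSMs, gfxWork, cmpWork));
    return 0;
}
```

With the split tuned for yesterday's frame, the 4 compute SMs become the bottleneck while the 12 graphics SMs sit half idle. That's the 'how do I know ahead of time' problem.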
wrote:Why do I have to need to cite anything? How about common sense?
They released a benchmark which had worse performance on Maxwell when async is enabled.
If you didn't want to hurt performance on NV, you would either not enable the async path at all, or you would make an alternative path that doesn't hurt performance.
In fact, why would they even release a benchmark at all, of a game that was still far from finished at the time?
Not to mention that AoTS was an AMD-sponsored game, so the writing is on the wall, isn't it?
Because your wording was malicious towards AoTS. If you think that's an outlier (which I think it is, btw), then that would be acceptable. But no, because AoTS is developed in a way that's unfriendly to Maxwell (at the time), you go on and denigrate them.
I could literally say the same thing about Project Cars (where it performs like sh1t on Radeon).
wrote:Putting the cart before the horse, are we?
The point of writing a game should be to make it run as fast as possible, and make it look as good as possible. What you're saying just proves my point: they ran a task that was structured in a way that it ran very poorly on Maxwell.
Actually, the point of writing a game is to get as many sales as possible. There are plenty of titles that run like sh1t on both Nvidia and AMD hardware (Dishonored). In a perfect world, game companies would release bug-free games running optimally on all hardware paths while looking superb, but the reality is that 'good enough' performance is the target. Squashing user-facing bugs is far more important than eking out that last 20% of performance. Unless... some hardware company that wants to show off certain aspects of its hardware is willing to pay for that development. And since Nvidia has way more free cash than AMD, guess which way game companies usually go?
wrote:Why would you even allow such a code path to run on the hardware? QA should have figured out that this didn't work on that hardware, so you disable it. After all, async compute doesn't change anything about how the game looks. It's merely a basic tool that may or may not allow you to get small gains on certain hardware if you can use it correctly.
It should be disabled by default, unless you made specific optimizations and have verified that they indeed improve performance during QA. This is also what the DX12 best practices docs say.
Instead, not only did they enable it by default, they even went as far as shout out in the media that NV's hardware was broken and whatnot. Which is what got us to where we are today, with people like you arguing about how only AMD has "true async". It's a dirty game that AMD has been playing, and you fell for it.
So you're upset that a beta build of AoTS had async compute enabled by default, and a bunch of reviewers posted stuff about it and sent the fanbois crazy? Last I heard, the released version of AoTS disabled async compute by default when it detected a Maxwell-based card.
And please cite where the AoTS devs directly say Maxwell hardware was broken. And yes, I stand by my wording that Maxwell has software async compute, due to the issues I've outlined above.
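For what it's worth, that kind of vendor gating is trivial to do. A hypothetical sketch (NOT AoTS code, just the standard DXGI adapter query; Windows-only, link against dxgi.lib) might look like this:

```cpp
// Hypothetical sketch of leaving an async compute path off by default on
// NVIDIA adapters. This is not AoTS code, just the usual DXGI vendor check.
#include <dxgi.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

bool ShouldEnableAsyncComputeByDefault() {
    ComPtr<IDXGIFactory1> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return false;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue;  // skip WARP / software adapters
        // PCI vendor IDs: 0x10DE = NVIDIA, 0x1002 = AMD.
        // Leave async compute off unless QA has verified a win on this vendor.
        return desc.VendorId != 0x10DE;
    }
    return false;
}

int main() {
    std::printf("async compute default: %s\n",
                ShouldEnableAsyncComputeByDefault() ? "on" : "off");
}
```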
wrote:See, async compute is mainly an AMD marketing tool. It is basically the only DX12-thing that they can sorta do. Not to mention that they get it 'for free' on the PC platform since game devs also use it on consoles.
As a result, AMD's DX12 strategy has been to focus 100% on async compute (and completely ignore other new features of the API, many of which they didn't even implement). The only software out there that uses async compute and ISN'T AMD-sponsored/biased is FutureMark's Time Spy.
You are quite correct about AMD using async compute as a marketing tool throughout the Hawaii/Fiji era. Nvidia was totally silent about the whole thing until Pascal. Then, magically, with Pascal, Nvidia was publicly talking up async compute and its potential (at GDC). How curious and convenient that Nvidia embraced async compute just when they had capable hardware!
wrote:Pretty much everything else is "Look, AMD is faster, NV is fake!", which is nonsense of course. It's about as nonsensical as saying that AMD's CPUs must be 'pseudo-hardware' because they can't run x86 software as quickly as Intel can.
Different architectures just have different solutions to the same problem, which comes with different performance characteristics and optimization strategies.
Oh, now you're venturing into my space. Actually, AMD CPUs can't run x86 (+ full extensions) as well. I mean FFS, there was a time when Ryzen couldn't reliably compile code in certain situations*. And it's confusing to say 'run x86 as fast as', since compute speed varies with the workload (int vs fp, in a general sense). For fp, Zen has much lower throughput (Skylake does 16 ops/clock with 2x 256-bit FMA units vs Zen's 8 ops/clock with 2x 128-bit FMA) for whatever reason. I would venture that they envision their servers using Zen for int and Vega for fp.
x86 also includes many extensions, and AMD usually falls behind in implementing them. E.g. Ryzen's AVX-256 implementation is 4x 128-bit units that can be paired up for a 256-bit op; however, only one pair can do adds and one pair can do muls. Compare that to Skylake's AVX-256 implementation, which is 2x 256-bit units that can each do an add or a mul. Ryzen is also missing the AVX-512 extensions.
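If you want to see where that bites, here's a minimal AVX2 FMA loop (build with -mavx2 -mfma; nothing vendor-specific in the source, the difference is purely in how the hardware executes it):

```cpp
// The same 256-bit FMA loop: Skylake executes each _mm256_fmadd_ps natively on
// one of its two 256-bit FMA ports, while Zen 1 cracks it into two 128-bit
// micro-ops, roughly halving peak FP throughput per clock.
#include <immintrin.h>
#include <cstddef>
#include <cstdio>

// c[i] += a[i] * b[i] over n floats (n assumed to be a multiple of 8 here).
void fma_loop(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        vc = _mm256_fmadd_ps(va, vb, vc);   // one 256-bit FMA instruction
        _mm256_storeu_ps(c + i, vc);
    }
}

int main() {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {2, 2, 2, 2, 2, 2, 2, 2};
    float c[8] = {};
    fma_loop(a, b, c, 8);
    std::printf("c[0]=%.1f c[7]=%.1f\n", c[0], c[7]);  // expect 2.0 and 16.0
    return 0;
}
```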
And their CCX structure is a mess at higher core counts (cross-CCX communication carries a massive penalty because of the limited fabric bandwidth). Surprise! It depends on how you schedule your workload (rough sketch below).
* https://www.phoronix.com/scan.php?page=news_i … Compiler-Issues
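If you don't want to eat that penalty, you schedule around it yourself, e.g. by keeping a group of cooperating threads on one CCX. Rough Linux sketch (it assumes logical CPUs 0-3 sit on one CCX, which is not true on every Ryzen SKU; build with g++ -pthread):

```cpp
// Rough sketch: keep cooperating threads on one CCX so they share an L3 slice
// and avoid cross-CCX fabric hops. The core numbering here is an assumption.
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <cstdio>

// Pin the calling thread to logical CPUs 0-3 (assumed here to be CCX 0).
void pin_self_to_ccx0() {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 4; ++cpu)
        CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::thread worker([] {
        pin_self_to_ccx0();  // stay on one CCX: shared L3, no fabric hops
        std::printf("worker pinned to CCX 0\n");
        // ...latency-sensitive work goes here...
    });
    worker.join();
    return 0;
}
```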
So feel free to take your leave from CPU world 😀
wrote:Time Spy is the only 'fair' async compute test we have so far, and we can see that it indeed works on NV hardware. It doesn't get as much as a boost as it does on AMD hardware, but does that make NV's bad or fake? No. Their architecture is just different. As I already said before, NV's pipeline is far more efficient than AMD's, so there is less to gain with async compute in the first place. Even if NV would copy AMD's async compute implementation 1:1 and glue it onto Pascal, you wouldn't see the same gains as you get on AMD hardware.
Speaking of which.