How long until we have socketed GPUs?

Reply 20 of 27, by Jade Falcon

Posted on 2017-08-07, 15:14

Jade Falcon Offline

Rank BANNED

Rank: BANNED
Posts: 3216
Joined: 2016-05-08, 19:23
Location: Nar Shaddaa.

Your still dealing with a PCI card. if you put a pci card on a PCIe slot via an pci to PCIe bridge it does not suddenly run at PCIe speeds.
Wile it would help, but the card it self just cant handle the bandwidth needed over anything other then a sli bridge.

Reply 21 of 27, by Malik

Posted on 2017-08-07, 15:26

Malik Offline

Rank l33t

Rank: l33t
Posts: 2582
Joined: 2004-04-10, 14:37
Location: Ipoh, Malaysia

Uh...the whole video card IS a slot-able, add-in, GPU processor... 🙄

It's just..... bigger than a CPU. 🤣

Save the time and discussion and let's continue playing, shall we?... or go out for a beer... 😀

SB Dos Drivers

Reply 22 of 27, by dexvx

Posted on 2017-08-07, 16:29

dexvx Offline

Rank Oldbie

Rank: Oldbie
Posts: 725
Joined: 2017-03-07, 03:32
Location: USA

xplus93 wrote:

Of course I know how expensive it is. Anything labeled xeon is expensive until you try to sell it used after decommissioning it. How are you going to say the die size or core architecture has any impact? Do you not remember socket 775? That's not even a good comparison for various reasons, but not the ones you mentioned. Socketed would mean more possibilities for cooling actually. And I'm starting to get where you're confusion comes in. I'm not saying put a socket on a gpu card and just making it an expansion planar, although I already explained that. Yeah, having the memory separate is a bit pointless and only a minor brainstorming-only addition

The reason used Xeons are (relatively) worthless used is because for their target audience, their value is decreased considerably after 5 years. The E5-2690 (v1 SandyBridge 8C/16T/2.9GHz) is about $150 on eBay. The Xeon Platinum 8180 (Skylake-SP 28C/56T/2.5GHz) is $10000. That's a 66:1 ratio. Assuming you can put together 40 dual-socket (80 E5-2690 CPU's) servers against a single dual socket Xeon Platnuim 8180 server, you would probably be ahead computationally with the older E5-2690's.

Great Value? Nope. The data center world also cares about cooling (typically 1-1.5 watt usage = 1 watt cooling requirement) and rackspace/density. Enthusiasts and smaller businesses are (relatively) insensitive to that, but their volume is a drop in the bucket (well maybe 1/10 the bucket). I remember someone on HardOCP bought a Quad Xeon X7560 (4x 8C/16T 2.27GHz Beckton/Nehalem) thinking it was a great deal. People did the math for him, and with his usage model, buying a new single Xeon E5-2695 v4 or something, would allow him to break even on power in like a year.

The argument is that Socket 775/1366/115x were so small because those Xeons were designed against a desktop core. LGA-3647 is the first socket designed NOT to be only for CPU's. Thus the massive increase in pin count. A lot of the pins are 'reserved'. The reason Epyc is huge is because of the trace requirements for the inter-CPU communications (that is the drawback of a non-monolithic die).

Socketed is worse for cooling. You lose whatever height that socket was from your heatsink. Not a big deal for cheapo or mid-range GPU's. Bigger problems with the high end 200W+ designs. Even in the server world, 1U (40mm high) servers for 200W Xeon Platinums are a minor problem that require more customized heatsink designs (which adds to cost).

xplus93 wrote:

PCI-E is certainly modular, but really, how close is it to the CPU and main memory? Compare that to the relationship between CPUs and FPUs. We haven't needed anything like that until now where more and more people need specialized data processing. What i'm saying is that we're moving towards the need for modularity in that context. Intel certainly thinks so.

PCI-E from SandyBridge onwards is on the CPU die. It should have the similar bandwidth (given enough lanes), but slightly worse latency. And judging from the recent Intel Optane demo, HPC needs more memory than it needs latency. Fitting your entire data set into Optane DRAM (at 10x the latency of DRAM) is far better than fitting half your data set into standard DRAM and paging from PCIe SSD's.

Intel's strategy for LGA-3647 is a small hedge against GPGPU. But the bigger hedge is against FPGA's and potentially ASIC's (thus the Altera acquisition). And that is divided into 2 main components (IMO):

The datacenter is quickly devolving from general purpose to (mostly) fixed functions. Simple example is Google/MS/Amazon dedicating entire data centers for search algorithms or AI. Why use complex general purpose CPU's when you can use simpler FPGA's? Benefit of FPGA is that any algorithm change and you can reprogram the FPGA (unlike an ASIC).

The other is the network processing aspect, an FPGA NIC is several orders of magnitude faster than the CPU at a fraction of power consumption. A network processing is a massive bottleneck for the Tier1 cloud providers (e.g. Google/MS/Amazon/FB). It's also of huge importance to Telcos. Because Comcast and Verizon would like nothing more than to use hardware based deep packet inspection to make your basic service seem slow but magically get improved with their premium packages. That cynicism aside, it should greatly speed up packet sorting and make everything cheaper in countries where ISP's don't control the government.

Reply 23 of 27, by Kreshna Aryaguna Nurzaman

Posted on 2017-08-11, 10:04

Kreshna Aryaguna Nurzaman Offline

Rank l33t

Rank: l33t
Posts: 3363
Joined: 2006-11-15, 06:44
Location: Indonesia

Is closeness to CPU and system memory that important for GPU? Intel tried that with Intel 740, and it turns out the PCI version --which uses its own RAM-- is faster than the AGP version --which uses system RAM. It's kinda ironic, because Intel 740 is supposed to promote AGP architecture.

Never thought this thread would be that long, but now, for something different.....
Kreshna Aryaguna Nurzaman.

Reply 24 of 27, by Scali

Posted on 2017-08-11, 13:58

Scali Offline

Rank l33t

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Kreshna Aryaguna Nurzaman wrote:
Is closeness to CPU and system memory that important for GPU? Intel tried that with Intel 740, and it turns out the PCI version --which uses its own RAM-- is faster than the AGP version --which uses system RAM. It's kinda ironic, because Intel 740 is supposed to promote AGP architecture.

Well, AGP has the 'AGP aperture' that makes using system RAM possible. Perhaps the idea was cost-reduction rather than absolute performance, so they wanted it as a showcase for AGP's ability to run without dedicated video RAM?
Or perhaps they thought that AGP would be good enough that it was faster anyway, but it backfired 😀

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 25 of 27, by Kreshna Aryaguna Nurzaman

Posted on 2017-08-11, 14:12

Kreshna Aryaguna Nurzaman Offline

Rank l33t

Rank: l33t
Posts: 3363
Joined: 2006-11-15, 06:44
Location: Indonesia

Scali wrote:
Kreshna Aryaguna Nurzaman wrote:
Is closeness to CPU and system memory that important for GPU? Intel tried that with Intel 740, and it turns out the PCI version --which uses its own RAM-- is faster than the AGP version --which uses system RAM. It's kinda ironic, because Intel 740 is supposed to promote AGP architecture.

Well, AGP has the 'AGP aperture' that makes using system RAM possible. Perhaps the idea was cost-reduction rather than absolute performance, so they wanted it as a showcase for AGP's ability to run without dedicated video RAM?
Or perhaps they thought that AGP would be good enough that it was faster anyway, but it backfired 😀

Intel seems to promote AGP and Intel 740 those days, so it's probably the "backfired" thing, isn't it?

Never thought this thread would be that long, but now, for something different.....
Kreshna Aryaguna Nurzaman.

Reply 26 of 27, by dexvx

Posted on 2017-08-11, 22:51

dexvx Offline

Rank Oldbie

Rank: Oldbie
Posts: 725
Joined: 2017-03-07, 03:32
Location: USA

Kreshna Aryaguna Nurzaman wrote:
Is closeness to CPU and system memory that important for GPU?

You must've missed what I said here:

dexvx wrote:

Are you just concerned that PCI-E length is a problem? We have external PCI-E connectors and you can literally run a GPU from 5m away (at gen3 x8 link) and you wouldn't notice the difference in gaming FPS or 'feel'

Or you can use external Thunderbolt GPU's that are the new rage these days. Usually those cables are 1-2m long. When normalized to the fact that they max at PCI-E gen3 x4 link, they perform nearly identical as if the GPU were on the local motherboard.

Kreshna Aryaguna Nurzaman wrote:

Intel tried that with Intel 740, and it turns out the PCI version --which uses its own RAM-- is faster than the AGP version --which uses system RAM. It's kinda ironic, because Intel 740 is supposed to promote AGP architecture.

In what planet is the PCI i740 faster than the AGP i740 on a consistent basis (assuming both have the same addressable VRAM)? I have a Real StarFighter 12MB PCI and a Diamond G460 8GB AGP. The G460 is faster. And this guy pretty much confirms it.

http://vintage3d.org/i740.php#sthash.eYmpsAfS.dpbs

Vintage3d wrote:

Starfighter PCI is 15% slower than its AGP counterpart.

Reply 27 of 27, by xplus93

Posted on 2018-10-14, 05:16

xplus93 Offline

Rank Oldbie

Rank: Oldbie
Posts: 561
Joined: 2016-12-25, 22:56

https://youtu.be/gwQ8mBaDTJg

Main menu

Common searches