VOGONS


First post, by Sphere478

Rank l33t++

Allow me to explain,

In my testing it seems that PCIe NVMe, at least on some older hardware, basically runs in some kind of cursed PIO mode where the CPU gets very loaded down.

I noted that even on a socket 775 Core 2 Quad based system the CPU load would be rather heavy during read/write tasks, whereas SATA would mostly leave the CPU free.

Is it the case that, even if you can get PCIe NVMe working on a Core 2 Duo/Quad or earlier, SATA is still going to be faster? Does PCIe NVMe only make sense on more modern CPUs?

Btw, I once got PCIe NVMe working on a K6-3+ setup, and performance could be described as kbps.
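
For anyone who wants to put a number on the "CPU gets loaded down" impression, here is a minimal sketch, assuming a Python 3 interpreter is available on the test machine. The function name `read_cpu_share` and the 1 MiB block size are my own choices for illustration. It reads a file sequentially and reports throughput together with the CPU seconds the reading process used per second of wall time. It only counts CPU time charged to this process, so interrupt handling done elsewhere in the kernel may not show up, and a file that is already in the OS cache will give misleading results.

```python
import sys
import time


def read_cpu_share(path, block_size=1 << 20):
    """Read `path` sequentially; return (MB/s, CPU seconds used per wall second)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    total = 0
    # buffering=0 keeps Python's own buffer layer out of the measurement
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    wall = max(time.perf_counter() - wall_start, 1e-9)  # avoid division by zero
    cpu = time.process_time() - cpu_start
    return total / 1e6 / wall, cpu / wall


if __name__ == "__main__" and len(sys.argv) > 1:
    mb_s, share = read_cpu_share(sys.argv[1])
    print(f"{mb_s:.1f} MB/s, {share:.0%} of one core busy while reading")
```

Running it against a large file on the NVMe drive and then against a similar file on a SATA drive would give two figures to compare. A share close to 100% at low throughput would be consistent with the PIO-like behaviour described above.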

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 1 of 20, by douglar

Rank Oldbie
Sphere478 wrote on 2024-05-08, 13:28:
Allow me to explain,

In my testing it seems that PCIe NVMe, at least on some older hardware, basically runs in some kind of cursed PIO mode where the CPU gets very loaded down.

I noted that even on a socket 775 Core 2 Quad based system the CPU load would be rather heavy during read/write tasks, whereas SATA would mostly leave the CPU free.

Is it the case that, even if you can get PCIe NVMe working on a Core 2 Duo/Quad or earlier, SATA is still going to be faster? Does PCIe NVMe only make sense on more modern CPUs?

Btw, I once got PCIe NVMe working on a K6-3+ setup, and performance could be described as kbps.

I imagine you used a PCIe to PCI converter, yes? Did you somehow get your K6-III to support AHCI ?

Reply 2 of 20, by Sphere478

Rank l33t++
douglar wrote on 2024-05-08, 13:50:
I imagine you used a PCIe to PCI converter, yes? Did you somehow get your K6-III to support AHCI ?

M.2 NVMe working on old school regular pci

Linux to the rescue.


Reply 3 of 20, by douglar

Rank Oldbie
Sphere478 wrote on 2024-05-08, 14:05:

Thanks for the link. I remember the thread now. Fun stuff.

I imagine that there is a significant performance inflection point when you stop operating in AHCI "PATA emulation mode" and start operating in "AHCI native mode". That probably makes the biggest difference.

Now if you had a SATA device, AHCI "PATA emulation mode" should be able to emulate UDMA modes for legacy OS stuff, but I think that when you use an NVMe device, AHCI only emulates PIO in "PATA mode". That's not such a big deal if you are only doing it on a fast CPU until your OS can load a real driver, but I suspect that your legacy systems are suffering under the weight of both using PIO and emulating PIO at the same time on a slow CPU, without ever loading a native mode driver.

Reply 4 of 20, by Sphere478

Rank l33t++

I’m asking about the crossover point for PCIe M.2, assuming that SATA is working properly as the competition in the hypothetical system.


Reply 5 of 20, by weedeewee

Rank l33t

Do you mean you are asking at which hardware point, i.e. from which hardware specs onward, an NVMe drive is faster?

I've had an NVMe drive attached to a Thunderbolt 3 controller connected to an X58 mainboard attain 2 Megabyte/s in Linux. It was even hot-swappable.
Pretty sure it's also possible in Windows if the BIOS can be modified to report some of the things that Windows requires to see before enabling some features.

Right to repair is fundamental. You own it, you're allowed to fix it.
How To Ask Questions The Smart Way
Do not ask Why !
https://www.vogonswiki.com/index.php/Serial_port

Reply 6 of 20, by douglar

Rank Oldbie
Sphere478 wrote on 2024-05-08, 15:43:

I’m asking about the crossover point for PCIe M.2, assuming that SATA is working properly as the competition in the hypothetical system.

Let me try again.

AHCI storage works in two modes, "PATA emulation" mode and "standard" mode.

While SATA devices have hardware that allows them to do UDMA transfers while in AHCI "PATA emulation" mode, my understanding is that NVMe devices don't, and only manage to do PIO via software emulation while in AHCI "PATA emulation" mode. As long as your system is both emulating PIO in software and using PIO to pull data, performance is going to be poor compared to a SATA device doing UDMA transfers.

In order to get good performance from an NVMe drive, you need a device driver that supports the device in AHCI "standard" mode. So it's not so much a CPU threshold as it is a lack of proper drivers.

Please let me know if I am incorrect about this.

Reply 7 of 20, by Sphere478

Rank l33t++

Sounds like you know more about it than I, I was reading that trying to learn haha.

So there may be a way to get that k6 running nvme faster? 🤔


Reply 8 of 20, by Hoping

Rank Oldbie

Well, not long ago I installed Windows 7 on an NVMe drive on an M5A99X with a Phenom II 1100T and 16GB. I don't think the processor was the limit; rather, the limit should have been the chipset's PCIe 2.0.
It was a PCIe 3.0 x4 NVMe drive and only reached 1600 MB/s.
Needless to say, Windows 7 on an NVMe drive feels like a real-time operating system, every click gets an instant response, but it is clear that the interface limits the drive.
To install Windows 7 on an NVMe drive you need the driver, just as XP needed one to use SATA controllers in RAID/AHCI mode, etc.
The first thing is to find out which controller the NVMe drive uses (Micron, Phison, Marvell, etc.) and search for its driver.
It would be interesting to find a driver for those controllers for Win9x. For XP I think there is something, but for 9x, no idea.
On this page they have a good collection of drivers for NVMe controllers.
https://winraid.level1techs.com/t/recommended … e-drivers/28310

Reply 9 of 20, by swaaye

Rank l33t++

Just having a system SSD not chained to the AMD SATA controllers from the Phenom times is already a substantial win.

I'm not sure I understand how NVMe is going to help a K6 with only 133MB/s PCI though. An NVMe drive might rival its L1 cache bandwidth 🤣

Reply 11 of 20, by darry

Rank l33t++

It is worth mentioning that on a PCIe 1.x system, 4 lanes of PCIe give a max of about 1 Gigabyte per second. On such a system, a garden variety 2-lane SATA 3 controller will get about half that, but will be easier to boot from.

On anything pre-PCIe, the difference between SATA and NVMe will be drowned out by bandwidth limitations, assuming everything even works ideally with the NVMe drive through a PCIe to PCI bridge.
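
For reference, here is a rough sketch of where those theoretical figures come from. The per-lane signalling rates and line encodings are the published ones for each PCIe generation; the table and function names are just my own for illustration, and the result only subtracts line-encoding overhead, nothing else.

```python
# Per-lane raw signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_mb_s(generation, lanes):
    """Theoretical one-direction bandwidth in MB/s after line encoding only."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes
    return bits_per_s / 8 / 1e6


for gen in (1, 2, 3):
    print(f"PCIe {gen}.x x4: {pcie_bandwidth_mb_s(gen, 4):.0f} MB/s")
```

Four lanes of PCIe 1.x work out to 1000 MB/s this way and four lanes of PCIe 2.0 to 2000 MB/s, which is why a PCIe 3.0 x4 drive cannot reach its rated speed in an older slot.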

Reply 12 of 20, by darry

Rank l33t++
weedeewee wrote on 2024-05-08, 16:36:

Do you mean you are asking at which hardware point, i.e. from which hardware specs onward, an NVMe drive is faster?

I've had an NVMe drive attached to a Thunderbolt 3 controller connected to an X58 mainboard attain 2 Megabyte/s in Linux. It was even hot-swappable.
Pretty sure it's also possible in Windows if the BIOS can be modified to report some of the things that Windows requires to see before enabling some features.

I just realized that I did actually run some informal benchmarks on an X58 platform with 2 Samsung 970 EVO Plus drives in RAID 0. I can't recall the CPU load (the machine has 2 sockets for a total of 12 cores/24 threads), but it seemed reasonable at the time. The throughput was essentially about what one would expect, AFAICR, with 4 lanes of PCIe 2.0 being the bottleneck for each drive. FWIW, both drives were in the same NUMA zone, also AFAICR. I was using another NVMe drive as a boot device, using an option ROM on a PCIe-adapted 3Com 3C905C NIC. As for the NVMe option ROM, that is a question for the ROM experts.

I have never used an NVME drive in anything older than that.

Reply 13 of 20, by douglar

Rank Oldbie
Sphere478 wrote on 2024-05-08, 19:52:

Sounds like you know more about it than I, I was reading that trying to learn haha.

So there may be a way to get that k6 running nvme faster? 🤔

Maybe. There are some NVMe drivers for Windows XP. There's a chance that something could work on a PCI bus, even one as soft and tender as a Super 7 PCI bus.

Just so I understand, it sounds like you are using GRUB & libata to make the NVMe drive look like a PATA device, yes?

Reply 14 of 20, by Sphere478

Rank l33t++
douglar wrote on 2024-05-09, 12:57:
Maybe. There are some NVMe drivers for Windows XP. There's a chance that something could work on a PCI bus, even one as soft and tender as a Super 7 PCI bus.

Just so I understand, it sounds like you are using GRUB & libata to make the NVMe drive look like a PATA device, yes?

It was years ago. The Debian Jessie installer was able to see it, so I started the install and it began installing. The plan was to put the boot loader on another drive that the BIOS/floppy could boot from first, then boot from the NVMe. But the install to the NVMe ran for like a day and still wasn’t done, so it seemed pointless and I ended the installer.


Reply 15 of 20, by Matchstick

Rank Newbie

It will only be as fast as the slowest connection. In this case, that would be the PCI bus.

PCI comes in two widths, 32-bit and 64-bit, and two clock speeds, 33 MHz and 66 MHz. The potential bandwidth is 133 MB/s in 32 bits at 33 MHz (the standard configuration), 266 MB/s in 32 bits at 66 MHz or in 64 bits at 33 MHz, and 533 MB/s in 64 bits at 66 MHz.

The bus width and clock are properties of the motherboard slot and the card, not of the operating system, so Windows 98, 2000 and XP all see the same ceiling. I am not sure, but I believe SS7 boards only offer 32-bit PCI at a nominal 33 MHz, which would put the ceiling at 133 MB/s.
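
A quick sketch of the arithmetic behind the PCI figures, assuming one transfer per clock and no arbitration or protocol overhead. The function name is just for illustration, and 33.33/66.66 MHz stand in for the nominal 33 and 66 MHz clocks.

```python
def pci_bandwidth_mb_s(width_bits, clock_mhz):
    """Peak burst bandwidth of a parallel PCI bus in MB/s (one transfer per clock)."""
    return width_bits / 8 * clock_mhz


for width in (32, 64):
    for clock in (33.33, 66.66):
        rate = pci_bandwidth_mb_s(width, clock)
        print(f"{width}-bit @ {clock:.0f} MHz: about {rate:.0f} MB/s")
```

The four combinations land on roughly the 133, 266 and 533 MB/s figures quoted above. Real-world throughput is lower because the bus is shared with every other PCI device.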

Reply 16 of 20, by douglar

Rank Oldbie
Sphere478 wrote on 2024-05-09, 13:36:

It was years ago. The Debian Jessie installer was able to see it, so I started the install and it began installing. The plan was to put the boot loader on another drive that the BIOS/floppy could boot from first, then boot from the NVMe. But the install to the NVMe ran for like a day and still wasn’t done, so it seemed pointless and I ended the installer.

Maybe you could feed the driver in from a floppy during the install?

Reply 17 of 20, by The Serpent Rider

Rank l33t++

Keep in mind that for optimal NVME performance you need to connect it to a north bridge.

It is worth mentioning that on a PCIE 1.x system, 4 lanes of PCIE gives a max of about 1 Gigabyte per second.

4 lanes PCIe 1.x can give 800 Megabytes per second tops.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 18 of 20, by Ozzuneoj

Rank l33t

I'm not trying to derail the thread or diminish the value of what is being discussed, but I think as long as SATA drives exist there won't be much of a reason to go with anything else for retro PCs. Even on modern systems there is hardly any perceptible difference between a decent SATA drive and an NVMe in average consumer workloads, despite the big difference in benchmark numbers.

And it isn't because I use crappy drives... I have always been a stickler for good drive performance, beyond what is really necessary, because the price gap between mid/low end drives and upper-mid/high end drives is far smaller than in other devices (CPU, GPU, motherboard). So I have installed Samsung 840 Pro, 850 EVO, Crucial MX500, SK hynix P31 Gold and Solidigm P44 Pro drives in my main PC over the past 10 years or so, as well as many hard drives for bulk storage. I have all of the storage in my main PC currently on NVMe, simply because the drives got cheap and I thought it'd be "neat" to have a PC with no SATA drives.

... and yet, I fully admit that when I get on my 12 year old Asus Q500A laptop with an Ivy Bridge Core i5 3210M (recently upgraded to an i7 Quad core because reasons) and an Adata SU800 SATA SSD, it runs totally fine for browsing the web (with modern bloated browsers in Windows 10) and isn't that much slower than more modern PCs. There is certainly a difference as the workload gets heavier, but if I was dealing with all of the bottlenecks of a 20 year old PC, I don't think the difference could ever be noticeable between a SATA and an NVMe drive in common retro workloads.

Now for some blitting from the back buffer.

Reply 19 of 20, by darry

Rank l33t++
The Serpent Rider wrote on 2024-05-09, 18:51:

Keep in mind that for optimal NVME performance you need to connect it to a north bridge.

It is worth mentioning that on a PCIE 1.x system, 4 lanes of PCIE gives a max of about 1 Gigabyte per second.

4 lanes PCIe 1.x can give 800 Megabytes per second tops.

I was quoting the most commonly mentioned theoretical bandwidth figures (from which low level protocol/signaling overhead has been subtracted). But yeah, there's additional overhead, beyond the aforementioned and accounted for protocol overhead, due to things like payload size.
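
To illustrate the payload-size effect, here is a minimal sketch. It assumes about 20 bytes of per-packet overhead on a PCIe 1.x/2.0 link (framing, sequence number, a 3-DW header and the link CRC) and ignores acknowledgement and flow-control traffic, so treat the output as a ballpark only. The function name and the default overhead value are my own assumptions.

```python
def tlp_efficiency(payload_bytes, overhead_bytes=20):
    """Fraction of link bandwidth left for data when each packet carries fixed overhead."""
    return payload_bytes / (payload_bytes + overhead_bytes)


LINK_MB_S = 1000  # PCIe 1.x x4 after 8b/10b encoding

for payload in (64, 128, 256, 512):
    eff = tlp_efficiency(payload)
    print(f"{payload:4d} B payload: {eff:.1%} efficient, about {LINK_MB_S * eff:.0f} MB/s")
```

With the 128-byte maximum payload that many older chipsets use, this lands in the mid-800s MB/s before any other overhead, which is in the same neighbourhood as the 800 MB/s figure mentioned.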