dionb wrote: and due to the assumption that offloading CPU always improves throughput too (it does not, at least not automatically)
Yes, that is an excellent point in general.
I can name some examples where 'offloading' seemed like a good idea at some point, but was surpassed by CPU performance eventually.
One example is DMA transfers: on the original PC/XT machines, the DMA controller was great for transferring HDD data directly between memory and the HDD controller.
The Intel 8237 DMA controller was never upgraded, however, and was only rated for about 4-5 MHz. So when the AT was introduced at 6 MHz, they had to run the DMA controller at half the CPU clock, 3 MHz, to keep it in spec. Even on the later 8 MHz AT it only ran at 4 MHz, which is still slower than the 4.77 MHz it ran at on a PC/XT.
Since HDDs had actually become faster at the same time, it became a better option to just use the CPU for transfers, and skip the DMA controller.
Another interesting case is graphics acceleration. Early video cards with hardware line drawing and the like can be worth using with a slow CPU of that era. But at some point CPUs became so much faster that it actually takes more time to set up the video card to draw a line, wait for it to complete, set up the next line and so on, than to just draw the whole thing with the CPU.
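To put some rough numbers on that trade-off, here is a minimal sketch of a cost model. It assumes each accelerated line costs a fixed setup time (register writes over the bus, plus waiting for completion) on top of the drawing itself, while the CPU has no setup cost at all. The function names and every figure below are made up for illustration; they are not measurements of any real card or CPU.

```python
def accel_time(lines, pixels_per_line, setup_s, accel_pixels_per_s):
    """Total time when every line pays a fixed setup cost before the card draws it."""
    return lines * (setup_s + pixels_per_line / accel_pixels_per_s)


def cpu_time(lines, pixels_per_line, cpu_pixels_per_s):
    """Total time when the CPU plots every pixel itself, with no setup cost."""
    return lines * pixels_per_line / cpu_pixels_per_s


# Hypothetical workload and hardware figures, chosen only to show the crossover.
LINES = 1000
PIXELS_PER_LINE = 50
SETUP_S = 20e-6          # assumed per-line setup cost, fixed by the bus
ACCEL_RATE = 2_000_000   # assumed pixels/s drawn by the card

SLOW_CPU_RATE = 500_000      # assumed pixels/s for a CPU of the card's era
FAST_CPU_RATE = 20_000_000   # assumed pixels/s for a much later CPU

card = accel_time(LINES, PIXELS_PER_LINE, SETUP_S, ACCEL_RATE)
slow = cpu_time(LINES, PIXELS_PER_LINE, SLOW_CPU_RATE)
fast = cpu_time(LINES, PIXELS_PER_LINE, FAST_CPU_RATE)

print(f"card: {card * 1000:.1f} ms, slow CPU: {slow * 1000:.1f} ms, fast CPU: {fast * 1000:.1f} ms")
```

The point of the model is that the setup cost is tied to the bus and the card, so it stays the same while the CPU's own drawing rate keeps going up. Once the CPU can draw a line in less time than the setup alone takes, the card can no longer win.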
So with ISA network cards it may well be that some cards have some kind of 'acceleration' that works faster than doing it on the CPU of that time (perhaps a 286 or 386), but by the time you strap a super-fast Pentium III onto it, the CPU can do it faster than the primitive acceleration circuit, so you won't be able to measure any gains, even if they were there at the time.
So it's very important to understand what the different cards do exactly, and what kind of systems they would have been aimed at.
In my case, the SMC apparently did something that was more efficient on a 486 than the 3COM was. But I would not expect that to translate to a high-end CPU.
Perhaps it's as simple as just the size of the onboard buffer. I mean, even the simplest of network cards would require some kind of buffering to send or receive data, and generate an interrupt once an operation is complete, so that the CPU can service it.
It could be as simple as the SMC having a larger onboard buffer than the 3COM, and thereby requiring fewer CPU interrupts per second to transfer the same amount of data.
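As a back-of-the-envelope sketch of that idea: if we assume, as a simplification, that a card raises one interrupt each time its buffer fills, then the interrupt rate is just throughput divided by buffer size. The buffer sizes and throughput below are hypothetical and are not the actual specs of the SMC or 3COM cards.

```python
import math


def interrupts_per_second(throughput_bytes_per_s, buffer_bytes):
    """Interrupt rate if the card interrupts once per full buffer (simplified model)."""
    return math.ceil(throughput_bytes_per_s / buffer_bytes)


# Hypothetical figures, for illustration only.
THROUGHPUT = 500_000     # bytes/s moved over the network
SMALL_BUFFER = 2 * 1024  # assumed buffer size of the "simpler" card
LARGE_BUFFER = 8 * 1024  # assumed buffer size of the "better" card

small = interrupts_per_second(THROUGHPUT, SMALL_BUFFER)
large = interrupts_per_second(THROUGHPUT, LARGE_BUFFER)

print(f"small buffer: {small} interrupts/s, large buffer: {large} interrupts/s")
```

In this model a buffer four times as large needs roughly a quarter of the interrupts for the same throughput. On a 486, where every interrupt costs a noticeable number of cycles, that could plausibly show up in a benchmark; on a much faster CPU the same saving would likely disappear into the noise.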