VOGONS


Fast Ethernet on ISA


Reply 20 of 41, by mpe

Rank: Oldbie

Makes sense.

The impact of FD vs. HD is negligible, given that you are mostly transferring data in one direction only.

Without bus-master DMA, it will likely fall back to the motherboard's DMA controller, which slows performance down to roughly the levels you are seeing (the upper limit is about 1.6 MB/s on systems with 16-bit ISA). In fact, it could be slightly faster if you don't set a DMA channel and use simple PIO instead (not sure whether the Linux driver supports that).
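As a back-of-envelope check (my own arithmetic, not a measured or documented figure), the ~1.6 MB/s ceiling implies roughly this many ISA bus clocks per 16-bit DMA transfer:

```python
# Infer ISA bus clocks per DMA transfer from the quoted ~1.6 MB/s ceiling.
# Illustrative only: BCLK and the limit itself are approximate values.

BCLK = 8.33e6         # typical ISA bus clock, Hz
LIMIT = 1.6e6         # quoted upper limit for 16-bit ISA DMA, bytes/s
BYTES_PER_XFER = 2    # a 16-bit DMA channel moves 2 bytes per transfer

transfers_per_sec = LIMIT / BYTES_PER_XFER
cycles_per_xfer = BCLK / transfers_per_sec

print(f"~{cycles_per_xfer:.1f} BCLK cycles per 16-bit DMA transfer")  # ~10.4
```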

Blog | NexGen 586 | S4

Reply 21 of 41, by Grzyb

Rank: Oldbie
mpe wrote:

The impact of FD vs. HD is negligible, given that you are mostly transferring data in one direction only.

With TCP (and any protocol on top of TCP), the impact is noticeable - ACK packets need some bandwidth in the opposite direction.

With 100 Mbps Fast Ethernet cards, and the rest of the system fast enough, FTP transfers should reach 11+ MB/s.
With 10 Mbps Ethernet cards, FTP transfers should reach 1.1+ MB/s, but... it's usually something like 900-1000 KB/s.

Why?
Because 10 Mbps cards - even those that support FD, e.g. the 3C509B - lack NWay, so they can't auto-negotiate FD with the switch.
So, to get FD working, it's necessary to manually enable it on both the card and the switch - and the latter requires a managed switch.
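As a rough sanity check on those expected figures (illustrative arithmetic only, assuming standard 1500-byte MTU frames with no IP/TCP options):

```python
# Theoretical TCP goodput over Ethernet, as a sanity check for the
# "11+ MB/s at 100 Mbps, 1.1+ MB/s at 10 Mbps" figures above.
# Illustrative sketch; assumes standard 1500-byte MTU frames.

PREAMBLE_SFD = 8      # bytes
ETH_HEADER   = 14
FCS          = 4
IFG          = 12     # inter-frame gap, expressed in byte times
IP_TCP_HDRS  = 40     # IPv4 (20) + TCP (20), no options

def tcp_goodput(link_bps, mtu=1500):
    payload = mtu - IP_TCP_HDRS                            # 1460 data bytes
    wire    = PREAMBLE_SFD + ETH_HEADER + mtu + FCS + IFG  # 1538 byte times
    return (link_bps / 8) * payload / wire                 # bytes per second

print(f"100 Mbps: {tcp_goodput(100e6) / 2**20:.2f} MiB/s")  # ~11.32
print(f" 10 Mbps: {tcp_goodput(10e6)  / 2**20:.2f} MiB/s")  # ~1.13
```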

As you can see, the difficulty is not only that you cannot climb my mountain, but also that I cannot come down to you whole, for in descending I lose along the way what I was meant to deliver.

Reply 22 of 41, by mpe

Rank: Oldbie

A TCP ACK is about 40 bytes per every 1500 bytes of data in the worst case. Furthermore, the TCP window will grow up to 65,535 bytes, so there will be very few ACKs unless there are transmission errors.

Since you are likely using a crossover cable or a switch, there will be no Ethernet collisions. Thus FD vs. HD should hardly change things, especially as your bottleneck is the host bus speed and not the link speed. It should only improve things if you have mixed inbound/outbound traffic on a much faster system.
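To put rough numbers on the ACK overhead (illustrative arithmetic, assuming a 40-byte ACK padded to the 64-byte Ethernet minimum frame):

```python
# Rough reverse-channel cost of TCP ACKs, illustrating why ACK traffic
# barely matters on a full-duplex link. Illustrative numbers, 1500-byte MTU.

DATA_WIRE = 8 + 14 + 1500 + 4 + 12   # 1538 byte times per full data frame
ACK_WIRE  = 8 + 64 + 12              # 84 byte times: a 40-byte ACK padded
                                     # to the 64-byte Ethernet minimum frame

worst_case = ACK_WIRE / DATA_WIRE            # one ACK per data segment
delayed    = ACK_WIRE / (2 * DATA_WIRE)      # typical delayed-ACK behaviour

print(f"worst case: {worst_case:.1%}")   # ~5.5% of the forward direction
print(f"delayed:    {delayed:.1%}")      # ~2.7%
```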


Reply 23 of 41, by Grzyb

Rank: Oldbie
mpe wrote:

It should only improve things if you have mixed inbound/outbound traffic on a much faster system.

TCP is mixed inbound/outbound; FTP, probably even more so.

Just did a few quick "wget -O /dev/null ftp://..." tests, this time with a Davicom 9102 - i.e. PCI, 100 Mbps, supporting NWay - so by default it auto-negotiates 100baseTx-FD:

10.60 MB/s

mii-tool -A 100baseTx-HD eth0

8.97 MB/s

mii-tool -A 10baseT-FD eth0

1.12 MB/s

mii-tool -A 10baseT-HD eth0

781 KB/s

So, the difference is far from negligible, and in the case of 10baseT-FD vs. HD it's even greater than I expected - there must be something wrong with this card/driver, as the 3C509B in 10baseT-HD achieved 891 KB/s.


Reply 24 of 41, by mpe

Rank: Oldbie

I wouldn't expect such a difference.

Could be a result of TX/RX errors or buffer overruns, which then cause retransmissions and prevent the TCP window from growing.


Reply 25 of 41, by Grzyb

Rank: Oldbie

Back to testing that card, this time with mTCP...

Hardware - same as in the original post.

Software:
LSL 2.20 (960401)
BUFFERS 16 1600
3C515 1.00 (960708) ODI driver
ODIPKT 3.1
mTCP Mar 31 2023 FTP client
MTU 1500
download to NUL

10 Mbps HD (KB/s):
996
995
997

10 Mbps FD (KB/s):
700
701
701

100 Mbps HD (KB/s):
1718
1718
1695

100 Mbps FD (KB/s):
1675
1686
1689

Obviously NWay auto-negotiation doesn't work very well on that card - I can't check what exactly gets negotiated (unmanaged switch), but Full Duplex being slower than Half Duplex clearly means a mismatch.

Also, 3C515CFG allows selecting 10 HD, 10 FD, and 100 HD - but not 100 FD.
Auto Select sometimes manages to link at 100 FD, but sometimes at 10 HD.
It's confusing - the documentation doesn't clearly state that 100 FD is supported; HELP/README.TXT even mentions: "NOTE: Full duplex operation is only supported when running in 10BASE-T mode."
The Linux and Windows NT drivers, however, allow selecting 100 FD manually.

Note to self: look at this more closely some time, with a managed switch...


Reply 26 of 41, by Grzyb

Rank: Oldbie

Now, let's see how the number of BUFFERS affects the download speed...

The card is manually set to what seems to be the optimal setting - 100 HD.

1 - seems to hang after transferring 8192 bytes...
2 - seems to hang after transferring 40960 bytes...
4 - 1703 KB/s
8 - 1703 KB/s
16 - 1721 KB/s
32 - 1715 KB/s

The necessary number depends on the CPU speed - I recall that a 386DX-40 needed 16 for decent performance, but for this Celeron-266, just 4 is already good enough.


Reply 27 of 41, by Sphere478

Rank: l33t++
cyclone3d wrote on 2019-10-30, 19:48:

Don't forget that ISA on newer systems (past 486) is slower. This can be easily shown by using an ISA video card on a newer than 486 system that has ISA slots.

What happened to cause this?

Sphere's PCB projects.
-
Sphere’s socket 5/7 cpu collection.
-
SUCCESSFUL K6-2+ to K6-3+ Full Cache Enable Mod
-
Tyan S1564S to S1564D single to dual processor conversion (also s1563 and s1562)

Reply 28 of 41, by kingcake

Rank: Oldbie
Sphere478 wrote on 2024-05-26, 07:05:
cyclone3d wrote on 2019-10-30, 19:48:

Don't forget that ISA on newer systems (past 486) is slower. This can be easily shown by using an ISA video card on a newer than 486 system that has ISA slots.

What happened to cause this?

Assuming they're referring to ISA being demoted to a second-class citizen attached to the southbridge.

Reply 29 of 41, by Tiido

Rank: l33t

Yeah, ISA only gets a shot if no PCI device claims the cycle, and it takes several clock cycles before the timeout passes the access to the ISA bus.

T-04YBSC, a new YMF71x based sound card & Official VOGONS thread about it
Newly made 4MB 60ns 30pin SIMMs ~
what are you reading? you won't understand it anyway 😜

Reply 30 of 41, by mkarcher

Rank: l33t
Tiido wrote on 2024-05-26, 07:37:

Yeah, ISA only gets a shot if no PCI device claims the cycle and it will take several clock cycles before the timeout passes the access to the ISA bus.

That's even older than PCI: the chipset doesn't even know what resources are on VL cards, so any chipset that allows resources to live on any kind of local bus needs to check for "/LDEV" (or whatever the line used by a local-bus device to claim a cycle is called) before taking "too many" actions on the ISA bus.

The ISA bus architecture, especially the support for 0WS 16-bit memory cycles, is tightly tied to the 286 bus interface design: the 286 outputs the address of a memory cycle at least 0.5 clocks in advance of the nominal start of that cycle, even with the previous cycle still active. This enables the MEMCS16 signal to already be valid at the nominal cycle start. This feature was extended into "address pipelining" on the 386, where the new address can be obtained even earlier, but it was completely ditched on the 486.

Instead of allowing pipelined address decoding to optimize bus performance, the 486 plays the "burst" card instead: in a cache-line fill, the difference between 2-1-1-1 and 3-1-1-1 is not that big, and you don't expect the addressed device to change inside a single cache line, so decoding is no longer considered a bottleneck. This obviously applies to reads only, but you can cope with writes externally by using a write-post buffer, which might even implement the classic ISA "advance address on LA17..LA23" protocol.

Fun fact: On the Pentium processor, the address pipelining feature returned.
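To put rough numbers on the 2-1-1-1 vs. 3-1-1-1 point (a hypothetical 33 MHz bus and 16-byte cache line - figures are illustrative only):

```python
# Cache-line fill rate at 2-1-1-1 vs 3-1-1-1 burst timing, illustrating
# why losing pipelined address decode costs little once bursts exist.
# Hypothetical 486-class system: 33 MHz bus, 16-byte line, 32-bit path.

BUS_MHZ   = 33.0
LINE_SIZE = 16  # bytes per cache line

def fill_rate(clocks_per_line):
    return LINE_SIZE * BUS_MHZ / clocks_per_line  # MB/s during line fills

fast = fill_rate(2 + 1 + 1 + 1)  # 5 clocks per line
slow = fill_rate(3 + 1 + 1 + 1)  # 6 clocks per line

print(f"2-1-1-1: {fast:.0f} MB/s, 3-1-1-1: {slow:.0f} MB/s "
      f"({(fast - slow) / fast:.0%} slower)")
```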

Reply 31 of 41, by Deunan

Rank: Oldbie
Grzyb wrote on 2024-05-26, 03:51:

Full Duplex being slower than Half Duplex clearly means mismatch.

Did you ever run these tests on the 3C509B? I've been forcing FD on my cards, but now I wonder whether this actually works properly with the dumb switch I have, or whether it is slowing things down as well. If the card sends a packet while receiving one, the switch might consider it a collision and abort - that's the problem, right?

Reply 32 of 41, by Tiido

Rank: l33t
mkarcher wrote on 2024-05-26, 11:13:

That's even older than PCI: The chipset doesn't even know what resources are on VL cards, so any chipset supporting some resources to be on any kind of local bus will need to check for "/LDEV" before taking "too many" actions on the ISA bus. [...]

Fun fact: On the Pentium processor, the address pipelining feature returned.

What a mess 🤣

I haven't looked very closely at the 286 bus, only the 386 and 486, and not at all at the Pentium. Address pipelining didn't really make much sense to me, but now it sort of clicks in the context of ISA bus 16-bit cycles - they're meant to leverage it.


Reply 33 of 41, by maxtherabbit

Rank: l33t
Deunan wrote on 2024-05-26, 16:25:
Grzyb wrote on 2024-05-26, 03:51:

Full Duplex being slower than Half Duplex clearly means mismatch.

Did you ever run these tests on the 3C509B? I've been forcing FD on my cards but now I wonder if this actually works properly with the dumb switch I have, or it is slowing things down as well. If the card sends a packet while receiving one the switch might consider it a collision and abort, that's the problem right?

I've had nothing but bad results when forcing FD on my fleet of 3C509B cards. My understanding is that it only works if you have a managed switch that can also be manually set to 10/FD. There is no auto-detect.

Reply 34 of 41, by mdog69

Rank: Newbie
maxtherabbit wrote on 2024-05-26, 17:27:
Deunan wrote on 2024-05-26, 16:25:

Did you ever run these tests on the 3C509B? [...]

I've had nothing but bad results when forcing FD on my fleet of 3C509B cards. My understanding is it only works if you have a managed switch which can also be manually set to 10/FD. There is no "auto detect"

Matches my experience - as stated before, if you want a 3C509 to do FD, you need to force both ends (NIC and switch). In my case it was back-to-back 3C509s (forcing both NICs).

Reply 35 of 41, by Grzyb

Rank: Oldbie

The 3C509B - and I think all 10 Mbps NICs - doesn't support NWay auto-negotiation.
The only way to use Full Duplex is to set it manually at *both* ends - and there's no way to do that on an unmanaged switch.
Full Duplex NIC + Half Duplex switch = collisions = dropped frames = slow transfers

The 3C515, however, claims to support NWay.
But it doesn't work well, at least with my modern switch.
The 3C515 is obviously an early implementation of Fast Ethernet, and such problems are pretty much expected...


Reply 36 of 41, by Deunan

Rank: Oldbie

I've decided to run some tests on my cards, the reason being that I also have a non-B 3C509 that I wanted to benchmark against the B model in DOS. So far, on a 486DLC-40 with DOS/mTCP + packet driver, I can't really see much difference between the HD and FD modes. In fact, if anything, FD mode seems to be a tad faster on my cheap, dumb switch (Sitecom LN-113). I'm doing get and put via FTP on a ~100 MB file 3 times (to make sure I don't catch some glitch) - not sure if that's the best way to test, but it is my use case. I do realize I'm also limited by my CF card (and XTIDE) here, but I figured that if there were a major difference, it would show.

When I have some more time, I will re-run these tests on a K6-400 and also on Linux (though I'm not sure what the Linux driver does with the FD setting). On Linux I'll also try NFS performance, since that is also my use case.

Reply 37 of 41, by mbbrutman

Rank: Member
Deunan wrote on 2024-05-27, 18:55:

I've decided to run some tests on my cards. [...] I do realize I'm also limited by my CF card (and XTIDE) here but I figured if there is a major difference it would show.

That's why I have the SPDTEST program as a standalone program. On the PC side you run SPDTEST, and you run it against another machine (usually Linux) that has netcat available. SPDTEST does all of the work that TCP/IP requires, but there is no disk I/O, so you measure the actual speed of the machine and network card, not the I/O subsystem.

I usually run my tests 5 times and take the average of the top three.

Reply 38 of 41, by Deunan

Rank: Oldbie

A synthetic test would probably give me clearer-cut results (I will try something like that with the Linux tests), but then again, if I see no difference in a test that is my typical use case, why would I worry about the FD setting? It's not like I'm trying to squeeze out the last bit of performance here; I just want to avoid problems and slowdowns (that are possibly my own doing).

I had to redo some tests because I switched the test mobo to slow refresh recently, and this turned out to freeze the machine during transfers. It seemed stable otherwise, even on Linux with networking enabled, but I wasn't transferring as much data then.
Anyway, I see no difference at all between the FD and HD settings with the DOS packet driver. Either it is not capable of FD operation, or the whole mTCP stack never triggers such a situation due to how it's all a single thread with blocking calls. And/or my switch doesn't care about it either (but you'd think there would be some, even minor, speedup with FD). This is true both for the 486DLC at 40 MHz and the K6-2 at 400 MHz. Frankly, the performance with the K6 leads me to believe the card is always in HD mode.

I might also do Win9x tests, but for that I first need to install Windows on the 386DX/486DLC, and that won't happen anytime soon. The Linux tests will also have to wait until I have more free time.

In other news, non-B 3C509 performance is very acceptable in DOS on both CPUs, assuming a "quiet" network like mine. Results might be different with coax instead of TP, connected to multiple busy machines sending quite a few broadcast packets. But since I won't be using a setup like that in the foreseeable future, I'm not going to worry about it. That card is meant primarily to host XTIDE for my 286-class systems, and to pull some data from FTP every now and then. I will still test it on Linux, though, for the fun of it.

Reply 39 of 41, by Grzyb

Rank: Oldbie
Deunan wrote on Yesterday, 09:00:

A synthetic test would probably give me more clear-cut results (I will try something like this with Linux test) but then again if I see no difference in a test that is my typical use case then why would I worry about the FD setting? It's not like I'm trying to squeeze the last bit of the performance here, I just want to avoid problems and slowdowns (that are possibly my own doing).

Still, be careful - the slowdown resulting from a duplex mismatch may be moderate on a quiet network, but much heavier when the network gets busy.

BTW: 3C5X9CFG allows setting Half/Full Duplex on the 3C509B, but there's no such option for the original 3C509, right?
