VOGONS


First post, by superfury

Rank: l33t++

What happens when a 16-bit or 32-bit port is accessed? Is such an access always split into byte accesses, or is a 32-bit access split into 16-bit accesses when word-aligned and into 8-bit accesses when not, i.e. 32-bit -> 16-bit (when word aligned) -> 8-bit (when not word aligned)?

So, unaligned accesses are always split into smaller chunks for as long as they remain unaligned (dword to word or byte, whichever comes first). And of course the lack of an I/O acknowledge causes the same split?

So:
word 16-bit (aligned): response. STOP.
word 16-bit (aligned): no response. Split into bytes at n and n+1. STOP.
word 16-bit (unaligned): split into bytes at n and n+1. STOP.

dword 32-bit (aligned): response. STOP.
dword 32-bit (aligned): no response. Word access logic at n and n+2 (see the word logic above). STOP.
dword 32-bit (unaligned): word access logic at n and n+2. STOP.

Is this logic correct? And is it applied at the CPU level, at the motherboard level (i.e. the CPU issues the full-width access and the motherboard splits it transparently within a single word's bus timing), or by a combination of the two splitting methods?
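A minimal sketch of that splitting cascade, in C, assuming hardware only acknowledges accesses that match its port width; the io_try_write16/io_try_write32 helpers (reporting whether any device acknowledged the full-width access) and io_write8 are hypothetical names purely for illustration:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helpers: return true only if a device acknowledged the full-width access (these stubs always report "no response"). */
static bool io_try_write32(uint16_t port, uint32_t value) { (void)port; (void)value; return false; }
static bool io_try_write16(uint16_t port, uint16_t value) { (void)port; (void)value; return false; }
static void io_write8(uint16_t port, uint8_t value) { printf("OUT %04X, %02X\n", port, value); }

static void io_write16_split(uint16_t port, uint16_t value)
{
    /* Aligned word: try the 16-bit access first; otherwise (or when unaligned) split into bytes at ports n and n+1. */
    if ((port & 1) == 0 && io_try_write16(port, value))
        return;
    io_write8(port, (uint8_t)value);
    io_write8(port + 1, (uint8_t)(value >> 8));
}

static void io_write32_split(uint16_t port, uint32_t value)
{
    /* Aligned dword: try the 32-bit access first; otherwise (or when unaligned) fall back to the word logic at ports n and n+2. */
    if ((port & 3) == 0 && io_try_write32(port, value))
        return;
    io_write16_split(port, (uint16_t)value);
    io_write16_split(port + 2, (uint16_t)(value >> 16));
}

int main(void)
{
    io_write32_split(0x1CE, 0xDEADBEEF); /* with the stubs above this falls all the way through to four byte writes */
    return 0;
}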

Anyone?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 4, by vladstamate

Rank: Oldbie

You have to take into account 2 factors:

- alignment, and whether your machine has a 16-bit (8086, 80286, 80386SX) or a 32-bit (80386DX) bus
- whether it is an ISA or an on-board device. If ISA, it matters whether the card is 8-bit or 16-bit. For example, for 8-bit cards you ALWAYS break the access into 8-bit accesses, regardless of anything else (see the sketch below).

Other than that, there is no difference between memory accesses and port in/out.
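As a purely illustrative sketch of that decision (the function name and the idea of returning a chunk width in bits are made up here; real chipsets do this in bus logic, not software):

/* Pick the chunk size (in bits) that an access gets broken into, given the machine's bus width and the target device's width. */
static int io_chunk_bits(int bus_bits, int device_bits)
{
    /* An 8-bit ISA card always gets 8-bit accesses, regardless of anything else. */
    if (device_bits == 8)
        return 8;
    /* Otherwise the access width is capped by both the bus and the device. */
    return bus_bits < device_bits ? bus_bits : device_bits;
}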

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 2 of 4, by superfury

Rank: l33t++

Well, currently I apply the logic mentioned above during the first bus cycle only; the remaining cycles are processed like memory cycles (although no I/O is actually performed during those cycles, whereas with memory cycles it is, in byte quantities, processed in variable-sized loops depending on alignment and final position, stopping at addresses that are a multiple of the bus size, and assuming the hardware always matches the bus width). So it's currently inaccurate in that respect when using word/dword input/output on e.g. the PIT registers.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 3 of 4, by superfury

Rank: l33t++

Well, what I can find from the 80386DX is this:
https://www.slideshare.net/Raunaqss/pin-descr … iagramof80386dx

Look at slides 10-11:
• ADS# - Address Status.
a) The address status output pin indicates that the address bus and bus cycle definition pins are carrying their respective valid signals.
b) This signal becomes active whenever the 80386 has issued a valid memory or I/O address.
• NA# - Next Address causes the 80386 to output the address of the next instruction or data in the current bus cycle. This pin is used for address pipelining.

• BS16# - Bus Size 16 pin selects either a 32-bit data bus (BS16#=1) or a 16-bit data bus (BS16#=0).
a) In most cases, the 80386DX is operated on a 16-bit data bus.
b) The bus size 16 input pin allows the interfacing of 16-bit devices with the 32-bit wide 80386 data bus.
• READY# - The ready signal indicates to the CPU that the previous bus cycle has been terminated and the bus is ready for the next cycle. The signal is used to insert WAIT states in a bus cycle and is useful for interfacing slow devices with the CPU.

So 32-bit vs. 16-bit hardware can be detected through the BS16# input pin, which, according to something else I read (don't remember where atm), is asserted (driven low) by 16-bit bus devices. In that way the CPU knows to split the 32-bit access up into aligned 16-bit accesses. I can't find anything about 8-bit accesses, though.
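As an emulator-style sketch of that split (device_asserts_bs16 and bus_write16 are hypothetical names; this only mirrors the behaviour described above, not the real 386 bus state machine):

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical: true when the addressed device drives BS16# low, i.e. it only has a 16-bit data path. */
static bool device_asserts_bs16(uint32_t address) { (void)address; return true; }

/* Hypothetical 16-bit bus cycle primitive. */
static void bus_write16(uint32_t address, uint16_t value) { (void)address; (void)value; }

static void cpu_bus_write32(uint32_t address, uint32_t value)
{
    if (device_asserts_bs16(address)) {
        /* BS16# asserted: run the transfer as two 16-bit bus cycles. */
        bus_write16(address, (uint16_t)value);
        bus_write16(address + 2, (uint16_t)(value >> 16));
    } else {
        /* BS16# negated: a single 32-bit bus cycle (omitted here). */
    }
}

int main(void)
{
    cpu_bus_write32(0xB8000, 0x12345678); /* example: a dword write split into two word cycles */
    return 0;
}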

Edit: Thinking about it more, it looks like actual 8-bit accesses resulting from a 16-bit in/out cycle are achieved using wait states? So adding 2 bus wait states for each additional 8-bit output? Or maybe even 5 bus wait states (accounting for one bus cycle to detect whether or not the hardware will respond to the 16-bit output, so one failing access (1 cycle), then two 8-bit transfers (2 cycles each)), which is done through the READY# pin? Thus delaying the CPU by 1 cycle while a 32-bit access is being split, and by 1+(2x2) cycles during the 16-bit transfer (which might be a split or a non-split transfer)?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 4 of 4, by superfury

Rank: l33t++

Looking even further, I've found this:
http://collaboration.cmc.ec.gc.ca/science/rpn … 9005f/9005f.htm

It looks like it's even worse than that:


16-Bit Accesses to 8-Bit Adapters
There are two fundamental classes of adapters that may be plugged into the AT bus: 8-bit adapters and 16-bit adapters. The two are distinguished by the extra bus connector that appears only on 16-bit adapters; in addition, 16-bit adapters must announce to the bus that they are indeed capable of handling 16-bit accesses, by raising a particular bus line on the 16-bit connector early on during each access.

What happens if an adapter doesn't have the 16-bit connector, or if it doesn't announce that it's a 16-bit device? Why, then the AT's bus does two things. First, the bus splits each word-sized access to that adapter into two byte-sized accesses, sending the adapter first one byte and then the other. That's not all the bus does, though: During each of those byte-sized accesses to an 8-bit adapter, the AT bus inserts three extra wait states (in addition to the one wait state that's routinely inserted), effectively doubling the access time per byte of such adapters to six cycles, as shown in Figure 2. These extra wait states, which I'll refer to as 8-bit-device wait states, form a pivotal and little-understood element of 16-bit VGA. Together with the splitting of word-sized accesses into two byte-sized accesses, 8-bit-device wait states can quadruple the access time per word of 8-bit adapters; instead of accessing one word every three cycles, as is possible with 16-bit adapters, the AT can access only one byte every six cycles when working with 8-bit adapters.

Three extra wait states are inserted on accesses to 8-bit adapters because the first 8-bit adapters were designed for the PC's 4.77-MHz bus, not the AT's 8-MHz bus. In order to ensure that PC adapters worked reliably in ATs, the designers of the AT decided to slow accesses to 8-bit adapters to PC speeds by inserting wait states to double the access time. Modern adapters, such as the VGA, can easily be designed to run at AT speeds or faster, whether they're 8- or 16-bit devices -- but the AT bus has no way of knowing this, and insists on slowing them down -- just in case. It should be obvious that true 16-bit operation, where an adapter responds as a 16-bit device and handles a word at a time, is most desirable. Not at all obvious is that it's also desirable that an adapter respond as a 16-bit device even if it can internally handle only a byte at a time. In this mode, an inherently 8-bit adapter announces to the bus that it's a 16-bit device; on writes, it accepts a word from the bus and then performs two 8-bit writes internally, and on reads, it performs two 8-bit reads internally and then sends a word to the bus. From the perspective of the bus, each word-sized operation seems to be a 16-bit operation to a true 16-bit adapter, but in truth two accesses are performed, so the operation takes twice as long as if the adapter were a 16-bit device internally.

Why bother? The advantage of having an 8-bit adapter respond as if it were a 16-bit adapter is this: The bus is fooled into thinking the adapter is a 16-bit device, so it doesn't assume that the adapter must run at PC speeds and doesn't insert three extra wait states per byte. From now on, I'll use the word "emulated" to describe the mode of operation in which an adapter that's internally an 8-bit device responds as a 16-bit adapter; this mode contrasts with the true 16-bit operation offered by adapters that not only respond as 16-bit devices but are 16-bit devices internally. AT plug-in memory adapters, for example, are true 16-bit adapters. 16-bit VGAs, on the other hand, may be either true or emulated 16-bit adapters; in fact, as we'll see, a single VGA may operate as either one, depending on the mode it's in.

Emulated 16-bit operation is at heart nothing more than a means of announcing to the AT bus that an inherently 8-bit adapter can run at AT speeds thereby making the three 8-bit-device wait states vanish. While emulated 16-bit adapters can run up to twice as slowly as true 16-bit adapters (word-sized accesses must still be performed a byte at a time), emulated 16-bit operations can double the performance of an inherently 8-bit adapter that is otherwise capable of responding instantly, by cutting access time from six to three cycles.

So converting to 16-bit probably takes only one cycle (during a 32-bit access; a 16-bit access takes no extra cycles at this point). Converting the (then) 16-bit access into 8-bit accesses takes 3 extra wait states on top of the wait states already introduced by the motherboard (e.g. 1 wait state on AT motherboards). So for 16-bit converted to 8-bit this adds up to 1 (start) + 1 (wait state) + 3 (split wait states) + 1 (T2) = 6 cycles, which is then multiplied by 2 (= 12 cycles) because two byte accesses are performed in the end. Probably add 1 cycle to that for the 32-bit access failing (it has to be detected anyway, thus 12 wait states in total).
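A quick arithmetic sketch of that estimate in C, using the numbers above (the constants are assumptions taken from this thread and the quoted article, not measured values):

#include <stdio.h>

int main(void)
{
    int base_cycles = 2; /* T1 + T2 of one bus cycle */
    int at_wait     = 1; /* wait state routinely inserted by the AT bus */
    int split_wait  = 3; /* extra 8-bit-device wait states per byte access */

    int per_byte   = base_cycles + at_wait + split_wait; /* = 6 cycles */
    int per_word   = 2 * per_byte;                       /* = 12 cycles, two byte accesses */
    int with_probe = per_word + 1;                       /* + 1 assumed cycle to detect the failed wider access */

    printf("byte access to an 8-bit adapter: %d cycles\n", per_byte);
    printf("word split into two byte accesses: %d cycles\n", per_word);
    printf("including the detection cycle: %d cycles\n", with_probe);
    return 0;
}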

Also, reading further:

Wait States in Other AT-Bus Computers
All AT-bus 80386-based computers slow down both 8- and 16-bit adapters considerably. (Obviously, 16-bit VGAs are wasted in 8-bit PCs, in which they operate as 8-bit devices.) AT-bus 80386 computers insert wait states -- often a great many wait states -- on accesses to 16-bit devices in order to slow the bus down to approximately the 375-nanosecond access time of the AT bus, so that AT plug-in adapters will work reliably. A 33-MHz 80386 is capable of accessing memory once every 60 nanoseconds (two cycles); ten wait states must be inserted to slow accesses down to about the 375-nanosecond access time of a standard AT. Clearly, memory on 16-bit plug-in adapters responds considerably more slowly than 32-bit memory in 80386 computers; the 80386 in the above example is idle more than 80 percent of the time when accessing plug-in 16-bit memory. Because of this, you can expect to see VGAs built onto the motherboards of most high-performance computers in the future, thereby completely bypassing the many wait states inserted by the AT bus.

In many 80386 computers, 8-bit adapters are worse still. A number of 80386 motherboards slow accesses to 8-bit adapters down to about the PC's bus speed of 838 nanoseconds per access, which could mean as many as about 25 wait states in the above example. However, a number of 80386 computers slow both 8- and 16-bit adapters down to AT speeds; in these computers, the performance distinction between 8- and 16-bit adapters vanishes.

So it's slowing accesses down with wait states, so that 16-bit accesses actually end up at ~375 nanoseconds each (3 cycles of the 8 MHz AT bus clock) and 8-bit accesses at ~838 nanoseconds each (PC 4.77 MHz speed, so four 4.77 MHz cycles, the 4.77 MHz clock being the 14.31818 MHz clock divided by three)? That's of course including the 1 wait state that's already inserted, as well as the two cycles that are performed at T1 and T2 on a 80286+ CPU.
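A quick check of those clock figures (clock rates assumed from the quoted article's 8 MHz AT bus and the PC's 14.31818 MHz master clock divided by three):

#include <stdio.h>

int main(void)
{
    double at_bus_mhz = 8.0;             /* AT bus clock */
    double pc_bus_mhz = 14.31818 / 3.0;  /* ~4.77 MHz PC bus clock */

    printf("375 ns = %.1f AT bus cycles\n", 375e-9 * at_bus_mhz * 1e6); /* about 3 */
    printf("838 ns = %.1f PC bus cycles\n", 838e-9 * pc_bus_mhz * 1e6); /* about 4 */
    return 0;
}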

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io