VOGONS


Reply 20 of 36, by superfury

Rank l33t++

Looking some more, you're right: the 80386 does up to 2 back-to-back bus cycles for operands crossing a 32-bit (dword) boundary:
https://books.google.nl/books?id=-pz8rvnhFDkC … Q6AEwAHoECBIQAQ

So that means any access can be done in one bus cycle as long as it doesn't cross a modulo-4 boundary? Otherwise the part past the next dword boundary is fetched in the next cycle?

So e.g. reading a dword from offset 1/2/3 will in each case take two cycles: bytes 1, 2, 3, then 4 on the second cycle; bytes 2, 3, then 4, 5 on the second cycle; byte 3, then 4, 5, 6 on the second cycle, respectively?

So effectively: read/write until either done or (address modulo 4 == 0) on a 32-bit bus (80386DX, 80486+), and until either done or (address modulo 2 == 0) on a 16-bit bus (80386SX). The 80286 and earlier stay unmodified (already correct)?
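A minimal C sketch of that splitting rule, assuming a little-endian bus where each cycle carries the bytes from the current address up to the next bus-width boundary (4 bytes on the 386DX, 2 on the 386SX). The names `chunk_len` and `split_access` are hypothetical and not UniPCemu's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the cycle starting at addr can carry: stop when the next address is
   aligned to the bus width ((nextaddress & (bus_bytes-1)) == 0) or when the
   access is finished. */
static unsigned chunk_len(uint32_t addr, unsigned remaining, unsigned bus_bytes)
{
    unsigned n = 0;
    assert(bus_bytes == 1 || bus_bytes == 2 || bus_bytes == 4);
    assert(remaining > 0);
    do {
        ++n;
        ++addr;
    } while (n < remaining && (addr & (bus_bytes - 1)) != 0);
    return n;
}

/* Splits an access of `size` bytes at `addr` into bus cycles and stores the
   byte count of each cycle in lens[]; returns the number of cycles. */
static unsigned split_access(uint32_t addr, unsigned size, unsigned bus_bytes,
                             unsigned lens[], unsigned max_cycles)
{
    unsigned cycles = 0;
    while (size > 0 && cycles < max_cycles) {
        unsigned n = chunk_len(addr, size, bus_bytes);
        lens[cycles++] = n;
        addr += n;
        size -= n;
    }
    return cycles;
}
```

For example, a dword at offset 1 on a 4-byte bus gives cycles of 3 and 1 bytes; on a 2-byte bus it gives 1, 2 and 1 bytes.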

Last edited by superfury on 2018-01-22, 01:11. Edited 2 times in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 21 of 36, by vladstamate

Rank Oldbie
superfury wrote:

Looking some more, you're right: the 80386 does up to 2 back-to-back bus cycles for operands crossing a 32-bit (dword) boundary:
https://books.google.nl/books?id=-pz8rvnhFDkC … Q6AEwAHoECBIQAQ

So that means any access can be done in one bus cycle as long as it doesn't cross a modulo-4 boundary? Otherwise the part past the next dword boundary is fetched in the next cycle?

I think so, yes.

superfury wrote:

So e.g. reading a dword from offset 1/2/3 will in each case take two cycles: bytes 1, 2, 3, then 4 on the second cycle; bytes 2, 3, then 4, 5 on the second cycle; byte 3, then 4, 5, 6 on the second cycle, respectively?

So effectively: read/write until either done or (address modulo 4 == 0) on a 32-bit bus (80386DX, 80486+), and until either done or (address modulo 2 == 0) on a 16-bit bus (80386SX). The 80286 and earlier stay unmodified (already correct)?

Yes. However, for the 286 I still follow the 386SX bus rules, so it will read in multiples of 16 bits only and mask out whatever it does not need.
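A rough sketch of that 286 approach: issue only aligned 16-bit transfers and keep just the requested bytes. The 64 KiB `mem` array, `bus_read16` and `read_via_words` are hypothetical stand-ins; a real emulator would route the transfer to its memory and device handlers.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[0x10000]; /* hypothetical 64 KiB test memory */
static unsigned bus_reads;   /* number of 16-bit bus transfers issued */

/* One aligned 16-bit bus transfer (little-endian). */
static uint16_t bus_read16(uint16_t aligned)
{
    assert((aligned & 1u) == 0);
    ++bus_reads;
    return (uint16_t)(mem[aligned] | (mem[(uint16_t)(aligned + 1)] << 8));
}

/* Reads `size` bytes at addr using only aligned 16-bit transfers and keeps
   just the requested bytes; the others are masked out. */
static uint32_t read_via_words(uint16_t addr, unsigned size)
{
    uint32_t result = 0;
    unsigned got = 0;
    uint16_t word_addr = (uint16_t)(addr & ~1u);
    assert(size >= 1 && size <= 4);
    while (got < size) {
        uint16_t w = bus_read16(word_addr);
        for (unsigned b = 0; b < 2; ++b) {
            /* position of this byte inside the requested operand; bytes
               below addr wrap to a large value and are skipped */
            uint16_t off = (uint16_t)(word_addr + b - addr);
            if (off < size) {
                result |= (uint32_t)((w >> (8 * b)) & 0xFFu) << (8 * off);
                ++got;
            }
        }
        word_addr = (uint16_t)(word_addr + 2);
    }
    return result;
}
```

A dword at address 1 takes three aligned word transfers (0-1, 2-3, 4-5); bytes 0 and 5 are fetched but discarded.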

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 22 of 36, by superfury

Rank l33t++

It seems I was right (sort of, but the result is equivalent: 2 byte-select lines on the SX, using the same method as the 8086/80286, vs 4 byte-select lines on the DX): http://www.ece.ubc.ca/~edc/379/lectures/lec2.pdf
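To illustrate 2-line vs 4-line byte selection, a sketch that computes which byte lanes are enabled in each bus cycle of an access. The masks are active-high for readability (the real pins, BE0#-BE3# on the DX and BLE#/BHE# on the SX, are active-low); `access_enables` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* For each bus cycle of an access, stores an active-high byte-lane mask:
   bit i set = byte lane i enabled (DX: lanes 0-3 ~ BE0#-BE3#;
   SX: bit 0 ~ BLE#, bit 1 ~ BHE#). Returns the number of cycles. */
static unsigned access_enables(uint32_t addr, unsigned size, unsigned bus_bytes,
                               unsigned masks[], unsigned max_cycles)
{
    unsigned cycles = 0;
    assert(bus_bytes == 2 || bus_bytes == 4);
    while (size > 0 && cycles < max_cycles) {
        unsigned first = addr & (bus_bytes - 1); /* first lane used */
        unsigned n = bus_bytes - first;          /* lanes up to the boundary */
        if (n > size)
            n = size;
        masks[cycles++] = ((1u << n) - 1u) << first;
        addr += n;
        size -= n;
    }
    return cycles;
}
```

A dword at address 1 gives lane masks 1110b then 0001b on the DX, and BHE only, both, then BLE only on the SX.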


Reply 23 of 36, by superfury

Rank l33t++

Essentially, DON'T read the entire 16-bit/32-bit block from RAM/hardware; instead, read only the requested byte(s), starting at the operand's address and going up until the mod 2 (386SX) or mod 4 (386DX) boundary is reached. Read the rest the same way on the next cycle(s). A 32-bit read from address 1/2/3 doesn't actually read addresses 0,5,6,7 / 0,1,6,7 / 0,1,2,7 after all. Only read what's requested, but in aligned word/dword chunks (so my method is a simple way to handle all 32-bit cycle splits with a simple comparison: mod 2 (SX) / mod 4 (DX)). Processing the chunks before and after on all hardware using masks is too heavy (you'd have to calculate and parse those masks, while simply incrementing until the modulo boundary is much simpler, with the same effect).

You can't read whole dwords from e.g. VRAM first and then discard part of them. The end result is that you only read the sub-blocks of a 16-bit (386SX/286) or 32-bit (386DX/486+) block that you need, then on the next cycle do the same until everything is read. So a dword read from address A0001 will read A0001, A0002 and A0003 on the first cycle and A0004 on the second. The other data in the blocks isn't read (reading it would break VGA latches or cause similar hardware problems).
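A sketch of reading only the requested bytes per cycle, so a device with read side effects never sees the neighbouring bytes. `bus_read`, `byte_reader` and the `probe` device are hypothetical; the probe merely counts accesses per address as a stand-in for hardware such as VGA latches.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t (*byte_reader)(void *ctx, uint32_t addr);

/* Little-endian read of `size` bytes at addr. A new bus cycle starts at each
   bus-width boundary, and only requested bytes are passed to the device.
   *cycles receives the number of bus cycles used. */
static uint32_t bus_read(byte_reader rd, void *ctx, uint32_t addr,
                         unsigned size, unsigned bus_bytes, unsigned *cycles)
{
    uint32_t value = 0;
    unsigned done = 0;
    assert(size >= 1 && size <= 4);
    *cycles = 0;
    while (done < size) {
        ++*cycles; /* start a new bus cycle */
        do {
            value |= (uint32_t)rd(ctx, addr + done) << (8 * done);
            ++done;
        } while (done < size && ((addr + done) & (bus_bytes - 1)) != 0);
    }
    return value;
}

/* Hypothetical device that records which addresses were read. */
struct probe {
    uint32_t base;
    unsigned hits[8];
};

static uint8_t probe_read(void *ctx, uint32_t addr)
{
    struct probe *p = ctx;
    p->hits[addr - p->base]++;
    return (uint8_t)(addr & 0xFF); /* dummy data: low byte of the address */
}
```

Reading a dword at A0001 on a 4-byte bus takes two cycles and touches only A0001-A0004; A0000 and A0005 are never read.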

Last edited by superfury on 2018-01-22, 02:06. Edited 2 times in total.


Reply 24 of 36, by vladstamate

Rank Oldbie
superfury wrote:

Essentially, DON'T read the entire 16-bit/32-bit block from RAM/hardware; instead, read only the requested byte(s), starting at the operand's address and going up until the mod 2 (386SX) or mod 4 (386DX) boundary is reached. Read the rest the same way on the next cycle(s). A 32-bit read from address 1/2/3 doesn't actually read addresses 0,5,6,7 / 0,1,6,7 / 0,1,2,7 after all. Only read what's requested, but in aligned word/dword chunks (so my method is a simple way to handle all 32-bit cycle splits with a simple comparison: mod 2 (SX) / mod 4 (DX)). Processing the chunks before and after on all hardware using masks is too heavy (you'd have to calculate and parse those masks, while simply incrementing until the modulo boundary is much simpler, with the same effect).

I've done it that way in CAPE, though: I build a mask for each 32-bit request and, when actually reading memory, discard everything that is not masked. But I gave up on speed a while ago. 😀

My point is that the code you showed before from UniPCemu is not correct for a 32-bit bus. Your code behaves the same (and that is wrong) on a 32-bit bus as on a 16-bit bus, because you break down the reads/writes and end up with the same behaviour on the DX and SX when they should differ. Case in point: read 32 bits from offset 3, then from offset 6, on a 386SX, and do the same on a 386DX. If your implementation is correct, the 386SX is faster (than itself; we are not comparing DX vs SX speeds here) when reading from offset 6 than from offset 3, whereas the 386DX gets no benefit: it is the same speed for both offset 3 and 6. If your code achieves that, then you have correct code.

Here is what you should see for a 32-bit access:

Offset   386SX bus transactions   386DX bus transactions
3        4 (or 3)                 2
6        2                        2
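One way to reproduce that table is to count the aligned bus-width chunks the operand touches. This is a simplified model (an assumption): it yields the lower "3" figure for the 386SX at offset 3 and does not model a fourth SX transfer. `bus_transactions` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of aligned bus-width chunks touched by an access of `size` bytes
   at `addr`, i.e. the minimum number of bus transactions (simplified model). */
static unsigned bus_transactions(uint32_t addr, unsigned size, unsigned bus_bytes)
{
    assert(size >= 1 && bus_bytes >= 1);
    uint32_t first = addr / bus_bytes;
    uint32_t last = (addr + size - 1) / bus_bytes;
    return (unsigned)(last - first + 1);
}
```

With `bus_bytes` = 2 (SX) a dword at offset 3 touches words 2-3, 4-5 and 6-7 (3 transactions) but at offset 6 only 6-7 and 8-9 (2); with `bus_bytes` = 4 (DX) both offsets touch two dwords.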


Reply 25 of 36, by vladstamate

Rank Oldbie

While the 386SX has a 16-bit data bus, it is a 32-bit processor, and all addresses it generates are 32-bit. It wants to read 32-bit aligned, and its bus unit will try to do as few 16-bit reads as possible. Hence it will perform worse on an odd address, as my table above shows.


Reply 26 of 36, by superfury

Rank l33t++

Thinking about it: what happens when a word is read/written within a modulo-4 block on a DX? Will it always be read in 1 cycle (also following my mod 4 rule)? That is, will e.g. a word at offset 1 be read/written in 1 cycle (no modulo-4 boundary crossed, and the DX has 4 byte-enable lines)?

Hmmmm.... http://www.phatcode.net/res/260/files/html/Sy … nizationa2.html
It literally says (about 16-bit operations on a 32-bit bus):

With a 32 bit memory interface, the 80x86 CPU can access any byte with one memory operation. If (address MOD 4) does not equal three, then a 32 bit CPU can access a word at that address using a single memory operation.

So on a 32-bit bus, an unaligned word read/write at address 1 takes 1 cycle? So essentially, on all those buses there's only one rule: each cycle, read until "((nextaddress and (bussizeinbytes-1))==0) or finished". That single rule defines the 8-bit, 16-bit AND 32-bit (maybe even 64-bit+) buses!
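The quoted MOD 4 statement can be checked against the general "stop at the next bus-width boundary" rule with a small sketch; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted rule: on a 32-bit bus a word needs one memory operation
   unless (address MOD 4) == 3. */
static int word_fits_one_cycle(uint32_t addr)
{
    return (addr % 4) != 3;
}

/* General form: an access fits in one cycle if it does not run past the
   next bus-width boundary. */
static int fits_one_cycle(uint32_t addr, unsigned size, unsigned bus_bytes)
{
    assert(bus_bytes > 0);
    return (addr % bus_bytes) + size <= bus_bytes;
}
```

For words on a 4-byte bus both functions agree; the general form also shows that the same word at offset 1 needs two cycles on a 2-byte bus.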

Edit: Just applied the new method of handling 16/32-bit memory accesses on 8/16/32-bit buses using masks:
https://bitbucket.org/superfury/unipcemu/comm … 6f1dfdebb2b5cfe

I do notice a clear difference between the 80386SX and DX in the Compaq Deskpro 386 emulation (it keeps working as it did before, just at a different speed), but I can't verify the behaviour until I get a (HDD/floppy) disk image and fix the related problems (I can't install MS-DOS 6.22 on an HDD using format.com from the AT emulation without MS-DOS erroring out when executing format.com).


Reply 27 of 36, by superfury

Rank l33t++

I've just tested (using the #11h DMA test routine of the IBM AT BIOS, 6MHz, 2nd revision motherboard emulated) and modified the Intel Inboard 386/AT to apply the following clocks, since I can't find any reference stating the exact waitstates used on the AT board. The XT board does have its values noted in its documentation, so I've used the XT as a base for calculating the middle two clocks, which seem to use a roughly 0.5 factor relative to each other. So x4 x2 x1 x0 are the four waitstates actually used?

It's simply done with a breakpoint at F000:05B8, tuning until CX gets as close as possible to the 'ideal' value F952h (according to the IBM AT BIOS source code, in Intel assembler).

This gives me the following waitstates (the XT uses 30 16 8 0, which seems a bit rounded up according to the reference on minuszerodegrees; the AT documentation says nothing about it): 94 47 24 0 (using the same rough rounding-up method on the third clock; the others are already whole).

Using those, the AT once again boots the Inboard 386 with the new timings :D I'm getting a CX value of F94A with the above values (the AT values, not the XT ones; the XT should now work better too, like on a real XT).
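The 94 47 24 0 values can be reproduced by halving with rounding up and fixing the last entry at 0. This is only a sketch of the rough derivation described above (an assumption, not documented AT timing); `halving_series` is hypothetical.

```c
#include <assert.h>

/* Hypothetical rough derivation: halve repeatedly, rounding up,
   and fix the last waitstate at 0. */
static void halving_series(unsigned first, unsigned out[4])
{
    assert(out);
    out[0] = first;
    out[1] = (out[0] + 1) / 2; /* ceil(first / 2) */
    out[2] = (out[1] + 1) / 2; /* ceil(out[1] / 2) */
    out[3] = 0;
}
```

Starting from 94 this gives 94 47 24 0. Starting from 30 it gives 30 15 8 0 rather than the XT's 30 16 8 0, consistent with the remark that the XT figures look slightly rounded up.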


Reply 28 of 36, by vladstamate

Rank Oldbie
superfury wrote:

Thinking about it: what happens when a word is read/written within a modulo-4 block on a DX? Will it always be read in 1 cycle (also following my mod 4 rule)? That is, will e.g. a word at offset 1 be read/written in 1 cycle (no modulo-4 boundary crossed, and the DX has 4 byte-enable lines)?

Hmmmm.... http://www.phatcode.net/res/260/files/html/Sy … nizationa2.html
It literally says (about 16-bit operations on a 32-bit bus):

With a 32 bit memory interface, the 80x86 CPU can access any byte with one memory operation. If (address MOD 4) does not equal three, then a 32 bit CPU can access a word at that address using a single memory operation.

So on a 32-bit bus, an unaligned word read/write at address 1 takes 1 cycle? So essentially, on all those buses there's only one rule: each cycle, read until "((nextaddress and (bussizeinbytes-1))==0) or finished". That single rule defines the 8-bit, 16-bit AND 32-bit (maybe even 64-bit+) buses!

No. Sure, you can implement it like that, but that is not at all how the HW works. After all, PCem is a good emulator that works, but it has nothing to do with how the HW works internally.

Yes, a word read from offset 1 is 1 bus transfer; I never said otherwise. My whole discussion was about a 32-bit read from address 1, as that is what whichcpu does. On a 32-bit bus that takes two 32-bit bus transactions (not reads: it will only read 4 bytes, but it will request 8 bytes on the bus and mask out the actual reads with the "byte enable" lines). On a 16-bit bus that will be either 4 or 3 transactions.

In your emulator you can implement this however you want, as long as you get the correct number of bus transactions.

superfury wrote:

Edit: Just applied the new method of handling 16/32-bit memory accesses on 8/16/32-bit buses using masks:
https://bitbucket.org/superfury/unipcemu/comm … 6f1dfdebb2b5cfe

I do notice a clear difference between the 80386SX and DX in the Compaq Deskpro 386 emulation (it keeps working as it did before, just at a different speed), but I can't verify the behaviour until I get a (HDD/floppy) disk image and fix the related problems (I can't install MS-DOS 6.22 on an HDD using format.com from the AT emulation without MS-DOS erroring out when executing format.com).


Reply 29 of 36, by superfury

Rank l33t++

Eventually I just took out my MS-DOS 5.0a disk image and am now using it to format the disk image. Then it's back to MS-DOS 6.22 to copy the OS, and finally I convert the disk image to a static disk image to copy over all files from the old disk image.

Still odd that MS-DOS 6.22 completely fails on the FDC, while MS-DOS 5.0 reads it just fine.

Just finished generating a basic MS-DOS 6.22 (nothing but the prompt) sys'ed 2GB HDD image (Bochs/DOSBox format).
Edit: While looking for the problem that occurred with static/dynamic disk images, I noticed that:
- The dynamic disk image header was checked incorrectly (size check), causing the disk to be formatted in the wrong format (the compatibility format, i.e. the old UniPCemu format that is incompatible with Bochs/DOSBox) and the wrong CHS geometry to be reported to the OS (BIOS/MS-DOS).
- The static disk images didn't properly create their metadata files (.unipcemu.txt and .bochs.txt) when converting dynamic disk images to static ones, causing the format to be lost when converting back to UniPCemu's dynamic disk image (.sfdimg) format.

I've managed to boot MS-DOS 5.0a properly from the XT boot floppy and fdisk/format the hard disk, but MS-DOS 6.22 doesn't seem to boot?

Edit: The IBM AT with the Intel Inboard AT seems to crash with a system board error?
Edit: It seems to program all DMA mode control registers for test mode instead of the proper floppy write mode for reading sectors? Most DMA registers are filled with 0xAAAA?

The BIOS gives a 303 (Keyboard or System unit error) and a 601 (Diskette error). Does anyone know the cause?

Edit: Looking at a possible cause of the 601 error (E601) shows that the FDC might not finish initializing because the HDD test fails (port 2F4 accesses)?
Edit: The AT 601 error seems to be caused by the INT 13h reset of the FDC failing (AH=20h, while it expects 00h)? It happens around the second 3Ch being written to the diagnostic port, so I've placed a breakpoint on that code, then single-stepped down to the failing INT 13h call (test2.asm, line 1169: verify status after reset). That address corresponds to BIOS execution at f000:00001260 (real mode).

Attachment: debugger.log (220.52 KiB): faulting diskette.asm and related part.

Can anyone see what's going wrong there?
Edit: Since AH=20h, it must be an FDC controller failure (from a hardware point of view I only see it resetting via the DOR), so maybe a processor error or missing FDC handling?


Reply 30 of 36, by superfury

Rank l33t++

Just cross-referenced with the AT BIOS source code. It seems the second drive is expected to give ST0=D1 on Sense Interrupt, but my emulator gives C1 instead, causing it to abort with an error?

@line@address:desc
@329> after:interrupt(fdc?) @f000:0000217b.
Then: call 259f(WAIT_INT). WAIT_INT allows 0us response, delays up to 2sec otherwise for timeout.
28a0: NEC_OUTPUT@339. Succeeds sending command 8h: Sense interrupt. Executes.
@341@218e: call RESULTS(2987)
@1881@29b1: E8 82 F0 call 00001a36(WAITF): wait 15-30us for next result byte. returns after 15*cx delay correctly.
@1885@29b8: jump to popres, @29c4. restores di and ret from results.
f000:00002191 returned from results.
@348: goto next_drv(cl+1>=c3). @2183. is log row 3064
second drive expects ST0=D1, gets C1 instead? jumps to DR_ERR. Sets BAD_NEC as a result.

Edit: Modifying the reset to clear the ST0 drive fault bit during a pending reset now lets it continue properly. So far I've checked it recalibrating and seeking properly.
Edit: Yay! I see it continue on to the keyboard initialization now! The test passes now!

Edit: The only errors left now are the 303 system/keyboard error (which of the two, I don't know), a BIOS checksum error, and, instead of booting from the floppy, it's reading the floppy disk sector IDs?
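For reference, a sketch decoding the fields of FDC status register 0 (ST0) as defined for the NEC µPD765 / Intel 8272A. C1h and D1h differ only in bit 4 (EC, equipment check), the drive-fault bit that the reset fix in this post clears. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of FDC status register 0 (ST0). */
struct st0 {
    unsigned ic; /* bits 7-6: interrupt code */
    unsigned se; /* bit 5: seek end */
    unsigned ec; /* bit 4: equipment check (drive fault) */
    unsigned nr; /* bit 3: not ready */
    unsigned hd; /* bit 2: head address */
    unsigned us; /* bits 1-0: unit select (drive number) */
};

static struct st0 decode_st0(uint8_t v)
{
    struct st0 s;
    s.ic = (v >> 6) & 3u;
    s.se = (v >> 5) & 1u;
    s.ec = (v >> 4) & 1u;
    s.nr = (v >> 3) & 1u;
    s.hd = (v >> 2) & 1u;
    s.us = v & 3u;
    return s;
}
```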


Reply 31 of 36, by vladstamate

Rank Oldbie

How long is a 386 bus cycle in processor cycles? I was under the impression that it is 2 processor cycles, like on the 286. Is that true? I looked around for this info, but I cannot find it.

EDIT: even the datasheet is not fully clear (I am trying to ignore all bus hold operations). I need to know how long a "bus cycle" takes (with 0WS memory) when the 386 is bus master and nothing interrupts it. At full speed, can a 20MHz 386DX read 20,000,000 * 4 bytes = 80,000,000 bytes per second?

EDIT2: datasheet link


Reply 32 of 36, by superfury

Rank l33t++

AFAIK it's like the 286: 2 T-states for each memory transaction for an aligned byte/word/dword.

https://www.google.nl/url?sa=t&source=web&rct … S9Dm8VIAmFZcOhy

The only difference between the SX and DX being the data bus width (16 vs 32 bits) and two (low/high byte enables, BLE#/BHE#?) vs four byte-enable lines?

So 20MHz = 40MB/s (4 bytes every 2 clocks).
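The bandwidth arithmetic as a sketch, assuming back-to-back 0WS bus cycles of 2 clocks each and ignoring refresh, bus hold and instruction-fetch overhead. `peak_bytes_per_sec` is a hypothetical helper.

```c
#include <assert.h>

/* Peak transfer rate with back-to-back cycles and no wait states. */
static unsigned long peak_bytes_per_sec(unsigned long clock_hz,
                                        unsigned clocks_per_cycle,
                                        unsigned bus_bytes)
{
    assert(clocks_per_cycle > 0);
    return clock_hz / clocks_per_cycle * bus_bytes;
}
```

For a 20MHz 386DX (4-byte bus) this gives 40,000,000 bytes/s rather than 80,000,000; a 20MHz 386SX (2-byte bus) gets 20,000,000.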

Edit: Confirmed: http://microsig.webs.com/Events/Pipelined_Arc … _MA%26P_FDP.pdf

So 2 T-states, with each T-state taking two periods of the double-speed clock (CLK2). So essentially 1 clock per T-state if you ignore CLK2?

Edit: According to https://books.google.nl/books?id=qoJBa2jR3p0C … epage&q&f=false each T-state takes 2 double-speed clock cycles, and a double-speed cycle takes half the period of the rated clock (e.g. the 80386DX-16 has a 32MHz CLK2, the 80386DX-33 a 66MHz CLK2). So the input clock is double the rated clock; it's divided by 2 internally (after the CLK2 input), and the resulting signal is the T-clock (or whatever you want to call it). The documentation calls it the Internal 80386 Processor Clock (same frequency as the 82384 clock signal). See section 13.3.1 of the document.
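The clock relationship described above, sketched in C: CLK2 runs at twice the rated frequency, one T-state is two CLK2 periods, and a 0WS bus cycle is two T-states, with each wait state adding one more. The function names are hypothetical.

```c
#include <assert.h>

/* CLK2 input frequency for a given rated (internal) clock. */
static double clk2_mhz(double rated_mhz)
{
    return 2.0 * rated_mhz;
}

/* One T-state = two CLK2 periods. */
static double tstate_ns(double rated_mhz)
{
    return 2.0 * (1000.0 / clk2_mhz(rated_mhz));
}

/* Bus cycle = 2 T-states plus one T-state per wait state. */
static double bus_cycle_ns(double rated_mhz, unsigned wait_states)
{
    return (2 + wait_states) * tstate_ns(rated_mhz);
}
```

For an 80386DX-16: CLK2 is 32MHz, a T-state is 62.5ns and a 0WS bus cycle is 125ns.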


Reply 33 of 36, by superfury

Rank l33t++

Just looked at the source code for the 303 error. It only seems to be used during POST 38h, which is a keyboard/8042 controller I/O test (test2.asm line 1017)?

Edit: After fixing the PS/2 keyboard being reset when enabling the first PS/2 port on the AT+, it now continues on to try to reset the PS/2 keyboard. This somehow fails (CX==0 in the result, causing the BIOS to error out)?

My current notes on this issue:

line 1017 is the 39h 303 error cause?
BIOS version is 06/10/85 according to the ROMs (and the source code matches).
POST 35h triggered a second time.
POST 36h triggered as well.
Reaches line 977 after turning the PS/2 keyboard port on (8042 enable keyboard port, command AEh).
Eventually reaches line 996(KBD_RESET CALL) @F000:1149 -> KBD_RESET gives CX=0(timeout?)? Line 476/test4.asm
XMIT_8042@F000:1FE1
XMIT_8042 returns successfully! CX=FFFF
AA 301 error?
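When debugging KBD_RESET it helps to keep in mind what a keyboard sends back: on a reset command (FFh) it answers FAh (ACK), then AAh once its self-test passes. A minimal sketch of that reply queue; `struct kbd` and its functions are hypothetical, and only reset plus a generic ACK are modelled (other commands, such as ECHO, reply differently).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical keyboard reply queue (8-entry ring buffer). */
struct kbd {
    uint8_t out[8];
    unsigned head, tail;
};

static void kbd_push(struct kbd *k, uint8_t b)
{
    assert(k->tail - k->head < 8); /* queue must not overflow */
    k->out[k->tail++ % 8] = b;
}

/* Returns 1 and stores the next byte for the host, or 0 if empty. */
static int kbd_pop(struct kbd *k, uint8_t *b)
{
    if (k->head == k->tail)
        return 0;
    *b = k->out[k->head++ % 8];
    return 1;
}

/* Host sends a command byte to the keyboard. */
static void kbd_command(struct kbd *k, uint8_t cmd)
{
    kbd_push(k, 0xFA);     /* ACK */
    if (cmd == 0xFF)
        kbd_push(k, 0xAA); /* reset: self-test (BAT) passed */
}
```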


Reply 34 of 36, by SarahWalker

Rank Member

386 0W/S access = 2 cycles. As far as I've ever been able to tell, the 386 bus is just a wider version of the 286 bus. The 486 can drop down to 1 cycle but only as part of a 4-word burst when performing a cache fill.
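The cycle counts for a 486 cache-line fill, as a sketch: a line is 16 bytes, i.e. four 32-bit transfers. Non-burst 0WS transfers take 2 clocks each, while the best-case 486 burst is 2-1-1-1. The function names are hypothetical.

```c
#include <assert.h>

/* 0WS non-burst: every transfer is a full 2-clock bus cycle. */
static unsigned nonburst_clocks(unsigned transfers)
{
    return 2 * transfers;
}

/* Best-case 486 burst (2-1-1-1): 2 clocks for the first transfer,
   1 clock for each following one. */
static unsigned burst_clocks(unsigned transfers)
{
    assert(transfers >= 1);
    return 2 + (transfers - 1);
}
```

Filling one line therefore takes 5 clocks in burst mode versus 8 clocks as four separate 2-clock cycles.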

Reply 35 of 36, by vladstamate

Rank Oldbie
SarahWalker wrote:

386 0W/S access = 2 cycles. As far as I've ever been able to tell, the 386 bus is just a wider version of the 286 bus. The 486 can drop down to 1 cycle but only as part of a 4-word burst when performing a cache fill.

Perfect, that is very clear. Thank you, Sarah. That is how I have it implemented now, but I wanted to double-check.


Reply 36 of 36, by vladstamate

Rank Oldbie
superfury wrote:

So 2 T-states, with each T-state taking two periods of the double-speed clock (CLK2). So essentially 1 clock per T-state if you ignore CLK2?

That makes sense, thank you Superfury.
