VOGONS


Compaq Deskpro 386 CPU emulation issues?

Reply 20 of 163, by superfury

Rank l33t++

Just lowered the CMOS clock speed to 32kHz, by:
1) Halving the 128kHz clock speed by removing the unused second Square Wave output.
2) Halving the then-64kHz clock speed by doubling the rate of the half clock (decreasing all taps by 1, which is possible because taps 1&2 aren't used; they're redirected to taps 8&9, now 7&8 with the new base speed).

Edit: Just tried firing up the http://minuszerodegrees.net/5170/setup/5170_gsetup_720.htm disk image and ran gsetup, but now I see something odd: the time is counting like crazy (1-2 hours per second)? That's odd, seeing as time is counted by the CPU clock, which is downgraded to a 32kHz timing base?

Edit: Delta timing doesn't clear correctly, it seems!
Edit: Whoops: it was adding the remainder of the processed time to the old remainder instead of setting it as the new remainder (thus adding way more time than was actually used).
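
For anyone following along, the remainder bug above boils down to something like the following. This is only a minimal Python sketch and not UniPCemu's actual code: `TickAccumulator`, `period_ns` and `advance` are made-up names for illustration.

```python
class TickAccumulator:
    """Converts elapsed nanoseconds into whole clock ticks (hypothetical helper)."""

    def __init__(self, period_ns):
        self.period_ns = period_ns  # duration of one tick in nanoseconds
        self.remainder_ns = 0.0     # carried-over time that hasn't formed a tick yet

    def advance(self, elapsed_ns):
        """Add elapsed time and return how many whole ticks it produced."""
        total = self.remainder_ns + elapsed_ns
        ticks = int(total // self.period_ns)
        # Correct: the leftover REPLACES the old remainder.
        # The bug described above is equivalent to
        # `self.remainder_ns += leftover`, which keeps re-counting old time.
        self.remainder_ns = total - ticks * self.period_ns
        return ticks
```

With a 1000ns period, two calls of 2500ns each give 2 ticks and then 3 ticks, with nothing left over afterwards.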

Edit: Yay! After those have been fixed, no errors so far anymore on the setup wizard! 😁
Edit: Did various tests on the used hardware (all except parallel ports), and almost all of them checked out. The system clock test says:

00:05:44
ERROR -
SYSTEM BOARD 152

That's quite odd, since it seems to be functioning correctly now, fully cycle accurate?

But at least it doesn't crash anymore 😁
Edit: Whoops, spoke too soon: selecting some of the log options from a write-protected A drive seems to send it into a frenzy, scanning memory for some 5XXX word?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 21 of 163, by superfury

Just tried the Compaq Deskpro 386 emulation again. It still seems to program the FDC/DMA incorrectly? It's still set to self-test mode accessing the Floppy disk?

Reply 22 of 163, by superfury

The CMOS almost seems to work, but the BCD hours (12-hour format) seem to be screwing up: 00:00 (24-hour) is reported as 92:00 in Checkit Diagnostics 3.0?

Docs seem to say midnight is 12:00PM(bit 7 set)? Or is it 12:00AM(bit 7 clear)?
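
A small sketch of the 12-hour encoding, assuming the MC146818-style convention (hours 01-12 in BCD, bit 7 acting as the PM flag). Under that assumption midnight is 12 AM, so the register would hold 12h with bit 7 clear, and 92h would mean noon. The function names are made up for illustration.

```python
def to_bcd(value):
    """Pack a value 0..99 into two BCD digits."""
    return ((value // 10) << 4) | (value % 10)


def rtc_hours_12h_bcd(hour24):
    """Encode a 24-hour value as a 12-hour BCD hours byte (bit 7 = PM)."""
    if not 0 <= hour24 <= 23:
        raise ValueError("hour out of range")
    hour12 = hour24 % 12
    if hour12 == 0:
        hour12 = 12  # both midnight and noon are shown as 12
    pm_bit = 0x80 if hour24 >= 12 else 0x00
    return pm_bit | to_bcd(hour12)
```

So `rtc_hours_12h_bcd(0)` gives 0x12 and `rtc_hours_12h_bcd(12)` gives 0x92, which would explain a "92:00" at midnight if the PM bit is set from the wrong condition.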

Edit: The FDC is giving errors again, when executing a 0x1D command (Scan High or Equal), which isn't supported by my FDC emulation. It will respond with ST0=80h and reset to command mode (reset data mode) to wait for a new command to be given? Is that correct behaviour?

Reply 23 of 163, by superfury

Just implemented the FDC verify and Scan Equal/High or Equal/Low or Equal commands, completing the FDC with all known commands on the 82077AA 😁

Although I just used a simple hack for the Verify command: it executes a normal Read Data command, but doesn't allow data to be transferred in the data phase, and instead executes said transfer in a more limited form (with DMA/TC support stripped) from a timed handler (just like the SEEK/RECALIBRATE commands).

The Scan commands use my own implementation, except that I looked at https://www.aptanet.org/eightyone/downloads/a … lib765/765fdc.c for information on how to process the byte checks on each byte received from the CPU (the comparison data against the data read from the floppy disk).

Reply 24 of 163, by superfury

Just tried the AT diagnostics disk again. The 0152 error is back! 🙁

Edit: Just a little question: when the FDC is seeking on any of the four drives, does it indicate that it's busy or not?

Edit: Trying a simple seek from track 0 to track 72 seems to time out somehow? It performs the associated track before it completes? It should take ~1.8 seconds for the seek to complete, but the BIOS stops waiting sooner and starts reading the result too soon?

Edit: It does seem to be a CPU timing problem: it can only seek to track 49 (maybe 50) before the main WAIT_INT loop times out?

;-------------------------------------------------------------------------------
; WAIT_INT: THIS ROUTINE WAITS FOR AN INTERRUPT TO OCCUR. A TIME OUT ROUTINE
;           TAKES PLACE DURING THE WAIT, SO THAT AN ERROR MAY BE RETURNED
;           IF THE DRIVE IS NOT READY.
;
; ON EXIT:  @DSKETTE_STATUS, CY REFLECT STATUS OF OPERATION.
;-------------------------------------------------------------------------------
WAIT_INT PROC NEAR
        STI                             ; TURN ON INTERRUPTS, JUST IN CASE
        CLC                             ; CLEAR TIMEOUT INDICATOR
        MOV     AX,09001H               ; LOAD WAIT CODE AND TYPE
        INT     15H                     ; PERFORM OTHER FUNCTION
        JC      J36A                    ; BYPASS TIMING LOOP IF TIMEOUT DONE

        MOV     BL,4                    ; CLEAR THE COUNTERS
        XOR     CX,CX                   ; FOR 2 SECOND WAIT
J36:
        TEST    @SEEK_STATUS,INT_FLAG   ; TEST FOR INTERRUPT OCCURRING
        JNZ     J37
        LOOP    J36                     ; COUNT DOWN WHILE WAITING
        DEC     BL                      ; SECOND LEVEL COUNTER
        JNZ     J36

J36A:   OR      @DSKETTE_STATUS,TIME_OUT ; NOTHING HAPPENED
        STC                             ; ERROR RETURN

J37:
        PUSHF                           ; SAVE CURRENT CARRY
        AND     @SEEK_STATUS,NOT INT_FLAG ; TURN OFF INTERRUPT FLAG
        POPF                            ; RECOVER CARRY
        RET                             ; GOOD RETURN CODE
WAIT_INT ENDP

So the seek command is executed, then this little routine is used to wait '2 seconds', but it only waits about

My notes:

F000:2452 CALL SETUP_DBL(271D): Returns doing nothing(Checks flag, clears carry and returns).
F000:2459 CALL DMA_SETUP: Sets up and returns, without carry flag set(SETUP OK).
F000:2460 CALL NEC_INIT(Sets Carry Flag @25AA). Reached correctly?
F000:25B1 CALL SEEK (@28C1, line 1094)
F000:28D9 CALL RECAL(@2921, line 1713)
F000:28F1 Check for double step required: single step required, bypass double! line 1730
F000:2906 Seek command to NEC
F000:290D Drive number to NEC for seek
F000:2914 Cylinder number to NEC for seek. Seek starts.
F000:2917 CALL CHK_STAT_2, line 1744 (Carry set on timeout)
F000:293C CALL WAIT_INT (Waits 2 seconds for an interrupt)
F000:293F Returned from WAIT_INT with carry set(Timeout waiting 2 seconds using a loop)?
Incorrect result at this point:
F000:2463 CALL RWV_COM(25CF)

It goes at a speed of 26000026.66... ns for each track seeked (the first one being 26000026.66ns after the final byte of the seek command). So the 72-track seek should be finished after 1872001920ns, or only 1.87 seconds. But the loop seems to finish way before that? Does that mean that the 286 is running too fast, even though it's following all the documented timings (minus the memory timings, which are handled by the BIU dynamically) at a 6MHz clock speed? Since it only seeked 50 cylinders, it's only taken 1.3 seconds, which is far below the 2 seconds the loop is supposed to wait. So it's running (2/1.3=1.54) 54% too fast? Is that purely because I've modified the BIU to behave differently on even/odd word accesses? Afaik the behaviour should be the same, and it's the same code as the DMA check loop (which verifies), with one memory access added? So why the massive difference?
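
The numbers above can be cross-checked with a few lines of Python. This is just a back-of-the-envelope sketch: the step time and the 6MHz clock come from the post, the iteration count follows from the listing (BL=4 outer passes, each running LOOP through all 65536 values of CX), and the function names are made up.

```python
STEP_NS = 26_000_026 + 2 / 3   # per-track step time quoted above, in ns
CPU_HZ = 6_000_000             # 6MHz 80286
LOOP_ITERATIONS = 4 * 65536    # BL=4 passes x 65536 LOOP iterations each


def seek_time_s(tracks):
    """Time the emulated FDC needs to step over the given number of tracks."""
    return tracks * STEP_NS / 1e9


def clocks_per_iteration(loop_seconds):
    """CPU clocks each TEST/JNZ/LOOP iteration must take for the loop to last that long."""
    return loop_seconds * CPU_HZ / LOOP_ITERATIONS


full_seek = seek_time_s(72)            # about 1.87 s, needs the full wait
timed_out_at = seek_time_s(50)         # about 1.30 s, where the loop gave up
budget = clocks_per_iteration(2.0)     # about 45.8 clocks per iteration
observed = clocks_per_iteration(1.3)   # about 29.8 clocks per iteration
```

In other words, for the loop to really last 2 seconds at 6MHz each iteration has to cost roughly 46 clocks including all wait states, while a timeout at 1.3 seconds corresponds to roughly 30 clocks per iteration.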

Last edited by superfury on 2018-01-30, 21:51. Edited 1 time in total.

Reply 25 of 163, by superfury

Such a simple test, jnz, loop sequence can't be that wrong in timing, can it?

https://bitbucket.org/superfury/unipcemu/src/ … ngs.c?at=master

What timings do you use for your instructions, vladstamate? Simply the documented timings minus 2 (for each applicable read-only or write-only memory operand) and minus 4 for read&write operands (like AND on a memory operand)?

Hmm... The technical reference contains some interesting stuff about things like waitstates (currently 1 for all RAM and memory accesses):

The 80286 Microprocessor operates at 6 MHz, which results in a
clock cycle time of 167 nanoseconds.
A bus cycle requires three clock cycles (which includes 1 wait
state) so that a 500-nanosecond, 16-bit, microprocessor cycle
time is achieved. 8-bit bus operations to 8-bit devices take 6
clock cycles (which include 4 wait states), resulting in a
1000-nanosecond microprocessor cycle. 16-bit bus operations to
8-bit devices take 12 clock cycles (which include 10 I/O wait
states) resulting in a 2000 nanosecond microprocessor cycle.
The refresh controller operates at 6 MHz. Each refresh cycle
requires 5 clock cycles to refresh all of the system's dynamic
memory; 256 refresh cycles are required every 4 milliseconds.
The following formula determines the percent of bandwidth used
for refresh.
% Bandwidth used for Refresh = (5 cycles x 256) / (4 ms / 167 ns) = 1280 / 24000 = 5.3%
The DMA controller operates at 3 MHz, which results in a clock
cycle time of 333 nanoseconds. All DMA data-transfer bus
cycles are five clock cycles or 1.66 microseconds. Cycles spent in
the transfer of bus control are not included.
DMA channels 0, 1, 2, and 3 are used for 8-bit data transfers, and
channels 5, 6, and 7 process 16-bit transfers. Channel 4 is used
to cascade channels 0 through 3 to the microprocessor.

So BUS cycles (port I/O) don't use 1 waitstate like RAM, but 4 (8-bit) or 10 (16-bit access to an 8-bit device) waitstates, so 6 clocks per byte? Also, DMA runs at 3MHz instead of 4.77MHz?
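
To sanity-check the figures in the quoted text, here is a short Python sketch of the arithmetic. It uses the exact 6MHz and 3MHz clocks rather than the manual's rounded 167ns and 333ns, so the results differ slightly in the last digits; the function names are made up.

```python
def cycle_time_ns(clock_hz, clocks):
    """Duration of a bus cycle that takes the given number of clocks."""
    return clocks * 1e9 / clock_hz


def refresh_bandwidth_percent(clocks_per_refresh, refreshes, period_s, clock_hz):
    """Share of all clocks in one refresh period that is spent refreshing."""
    available_clocks = period_s * clock_hz
    return 100.0 * clocks_per_refresh * refreshes / available_clocks


ram_16bit = cycle_time_ns(6_000_000, 3)         # 500 ns, 1 wait state
io_8bit = cycle_time_ns(6_000_000, 6)           # 1000 ns, 4 wait states
io_16bit_on_8bit = cycle_time_ns(6_000_000, 12) # 2000 ns, 10 wait states
dma_transfer = cycle_time_ns(3_000_000, 5)      # about 1667 ns
refresh_share = refresh_bandwidth_percent(5, 256, 0.004, 6_000_000)  # about 5.3 %
```

Note that the 12-clock case is simply two 6-clock byte cycles back to back, which is where the 10 wait states come from (12 clocks minus the 2 of a zero-wait cycle).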

http://www.minuszerodegrees.net/manuals/IBM_5 … 80070_SEP85.pdf

Having applied the timings (waitstates) to my BIU, the BIOS beeps out an error (normal beep, high beep) on the DMA Refresh test?

Also, what's that about a 'refresh controller'? Isn't that done by DMA channel 0?

Edit: Implemented the memory&bus part into my BIU, but setting the Bus waitstates to 4 cycles makes the DMA test (toggle speed test at POST 11h) go wrong, so I disabled it right away. Memory accesses fare better now: waitstates are applied during a bus cycle only (once for a 16-bit access (previously twice), once for an 8-bit access, but now also twice for a 16-bit access broken up into 8-bit parts (NEW), being two 8-bit transfers with their own waitstate each (3 cycles/access instead of 5 cycles in total (2+3=>3+3))).

Edit: That 1 cycle did seem to be enough to make the FDC seek correctly to the cylinder now? CheckIt! Diagnostics starts without a seek error (timeout), but executing the floppy tests makes it report there is no floppy disk #0?

Reply 26 of 163, by superfury

Just tried booting MS-DOS 6.22. It'll eventually make the CPU hang when starting an interrupt, while the BIU still has result data of a memory operation loaded? Thus some instruction using the BIU isn't terminating its BIU access correctly, prematurely ending its operation? I see the BIU runs normally, fetching instruction data from RAM/ROM, but the instruction handler can't continue, waiting for the result buffer to be cleared (which will never happen, locking up the Execution Unit as a safety precaution indicating an error in the last executed instruction). Now the question is: what instruction is prematurely terminated (before it has a chance to finish)?

Edit: Looking at the CS:IP value I see 0070:FFFF, so somewhere in IO.SYS it gets to offset FFFF, which causes a pseudo-#GP fault, causing the EU to crash, because somehow the BIU has a response that's left unread?

Last edited by superfury on 2018-01-31, 09:46. Edited 1 time in total.

Reply 27 of 163, by vladstamate

Rank Oldbie
superfury wrote:

Also, what's that about a 'refresh controller'? Isn't that done by DMA channel 0?

Not on an AT machine. On the XT, DMA channel 0 is used, but the AT architecture uses dedicated HW, which makes it slightly easier for us to emulate, as we really only have to emulate the extra delay now and then.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 28 of 163, by superfury

So on AT and up there's a 5-cycle eating coprocessor, like DMA, that's eating cycles every x cycles (neither when nor at what interval is mentioned?), sharing the bus with CPU and DMA? What's its priority? Sub-DMA? Sub-CPU? Highest priority of all?

Reply 29 of 163, by vladstamate

Since it is memory refresh, highest of all.

Reply 30 of 163, by superfury

So, DMA is changed to 3MHz, Refresh to 6MHz (what about the 8MHz 286 boards and other CPU speeds (Compaq Deskpro 386)?), taking 5 cycles (at what interval?)? Does it start like DMA (when the CPU releases the BUS)?

Reply 31 of 163, by superfury

Just took a look at the documentation at pcjs.org:

http://minuszerodegrees.net/manuals/IBM_5170_ … 02243_MAR84.pdf
http://www.minuszerodegrees.net/manuals/IBM_5 … 80070_SEP85.pdf
http://bitsavers.trailing-edge.com/pdf/ibm/pc … rence_Mar86.pdf

The first two have identical information. The third one says the same as the first two, but adds 8MHz timings(which only result in the change of the oscillators for memory refresh and CPU):

Note: Where timing considerations between 6- and
8-MHz are different, the 8-MHz time is shown in
parentheses.
The 80286 microprocessor operates at 6 MHz (8 MHz), resulting
in a clock cycle time of 167 nanoseconds (125 nanoseconds).
A bus cycle requires 3 clock cycles (which includes 1 wait state)
so that a 500-nanosecond (375-nanosecond), 16-bit,
microprocessor cycle time is achieved. Eight-bit bus operations
to 8-bit devices take 6 clock cycles (which include 4 wait states),
resulting in a 1000-nanosecond (750-nanosecond)
microprocessor cycle. Sixteen-bit bus operations to 8-bit devices
take 12 clock cycles (which include 10 wait states) resulting in a
2-microsecond (1.5-microsecond) microprocessor cycle.
The refresh controller steps one refresh address every 15
microseconds. Each refresh cycle requires 8 clock cycles to
refresh all of the system's dynamic memory; 256 refresh cycles
are required every 4 milliseconds but the system hardware
refreshes every 3.89ms. The following formula determines the
percentage of bandwidth used for refresh for the 6 MHz clock.
% Bandwidth used for Refresh = (8 cycles x 256) / (3.89 ms / 167 ns) = 2048 / 23293 = 8.7%
The following formula determines the percentage of bandwidth
used for refresh for the 8 MHz clock.
% Bandwidth used for Refresh = (8 cycles x 256) / (3.89 ms / 125 ns) = 2048 / 31120 = 6.5%
The DMA controller operates at 3 MHz (4 MHz), which results
in a clock cycle time of 333 nanoseconds (250 nanoseconds). All
DMA data-transfer bus cycles are 5 clock cycles or 1.66
microseconds (1.25 microseconds). Cycles spent in the transfer
of bus control are not included.

Thus the third one gives the same as the first two (the second and third ones adding information about the interval of the refresh controller (15us to be exact)). I'd assume the CPU is still the master of the BUS?

Edit: So, thinking about the refresh question:
every (3.89ms/cycle time/256) cycles, 8 cycles (5 cycles on non-8MHz AT motherboards) are spent on a memory refresh read?
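
Worked out in Python, assuming the figures from the quoted manual (256 refresh cycles every 3.89ms, 8 clocks each), the schedule would look roughly like this. `refresh_schedule` is a made-up helper, not anything from an actual emulator.

```python
def refresh_schedule(cpu_hz, full_refresh_s=3.89e-3, rows=256, clocks_per_refresh=8):
    """Return (interval in us, interval in CPU clocks, % of clocks spent refreshing)."""
    interval_s = full_refresh_s / rows          # time between two refresh cycles
    interval_clocks = interval_s * cpu_hz       # same interval in CPU clocks
    percent = 100.0 * clocks_per_refresh / interval_clocks
    return interval_s * 1e6, interval_clocks, percent


us_6, clocks_6, percent_6 = refresh_schedule(6_000_000)  # ~15.2 us, ~91 clocks, ~8.8 %
us_8, clocks_8, percent_8 = refresh_schedule(8_000_000)  # ~15.2 us, ~122 clocks, ~6.6 %
```

So the interval is the same 15.2us (the manual's "every 15 microseconds") on both boards, which means one refresh about every 91 CPU clocks at 6MHz and about every 122 clocks at 8MHz. The percentages land a hair above the manual's 8.7% and 6.5%, presumably because of rounding in the manual.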

DMA channel 0 and PIT 0 aren't connected to anything (PIT output/DMA input), thus having no effect? But the Bochs port.lst still specifies them as connected?

Edit: Odd: PIT 0 output on http://www.minuszerodegrees.net/misc/schemati … oard_type_1.pdf : The PIT OUT1 seems to say (SHT21), which I find back on Page 1-72, where a pin says +REFRESH ? So the PIT is still used for refresh? Combined with SH20 (+IO CH RDY)? So it still has some effect on the motherboard?

Reply 32 of 163, by superfury

vladstamate wrote:

Since it is memory refresh, highest of all.

That cannot be: the CPU in my emulator provides the base timing for ALL hardware, thus it'll need to execute first.

Also, memory refresh can't execute while the CPU is using the BUS? When it's not in T1 while not transferring anything(idle bus cycle of the CPU), the memory refresh has to wait for the BUS to be freed by the CPU? Thus it might be 1-2 cycles later, just like DMA? Also, how does it combine with DMA/CPU interruption logic?

Edit: Currently, trying to boot MS-DOS 6.22 from a floppy disk on the IBM AT emulation somehow causes it to fault (thus starting an interrupt). The internal EU state indicates that it has finished processing a BIU operation (memory/BUS I/O) and the BIU is in its finished state (prefetching), but the interrupt handler won't continue because it detects that the BIU response buffer isn't empty (indicating incomplete BIU request/response handling by the EU). The BIU maintains two FIFO buffers (request and response): the EU adds a request to the request buffer and reads its result afterwards from the response buffer (delaying 1 cycle at a time while the condition 'request buffer empty' or 'response buffer full' isn't met). It works like this:
1. EU: The EU tries to put a request in the request buffer. While it's filled(BIU is busy processing a request), the BIU isn't ready, thus delay 1 cycle intervals while checking.
2. EU: The EU has successfully written the request to the request buffer(request buffer full). The EU starts waiting for the response buffer to fill(delaying 1 cycle intervals waiting).
3. BIU: Detects the request buffer is filled. Reads the request buffer(emptying it) to start the current request. It then starts the transfer to/from Memory/BUS I/O.
4. BIU: Finishes the request. It then puts a result value in the response buffer (always 1 for writes, the value read for Memory/BUS reads). The BIU then starts back at step 3 (all the while filling the Prefetch buffer using the same cycle method if nothing is requested).
5. EU: The EU detects the response buffer is filled. It then reads the response buffer and continues the instruction (the value read from the buffer is the value read from memory/IO; with writes, it's always 1 and the value is discarded, though it could theoretically be used to indicate success or other kinds of non-zero information).

Of course, the EU method of filling the request buffer is handled using three kinds of requests: direct memory access request(physical layer), virtual address request(translates and redirects to the direct memory access request function) and BUS request(for IN/OUT instructions). Each and every one of them writes to the request buffer using (translated to physical) values and memory/BUS differentiation information. The reading of the result is the same in all cases(reading the response buffer by the same function calling one of those three methods). This is step 1 in the above list.
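
The five steps above can be sketched in a few lines of Python. This is a simplified model and not UniPCemu's real code: both buffers are single slots, memory is a plain dict, there is no prefetching, and every class and method name is invented for illustration.

```python
class Biu:
    def __init__(self, memory):
        self.memory = memory
        self.request = None    # single-entry request buffer
        self.response = None   # single-entry response buffer

    def try_request(self, kind, address, value=None):
        """Steps 1/2: the EU posts a request; False means 'still busy, retry next cycle'."""
        if self.request is not None:
            return False
        self.request = (kind, address, value)
        return True

    def tick(self):
        """Steps 3/4: the BIU takes the pending request and posts its result."""
        if self.request is None or self.response is not None:
            return  # nothing to do (a real BIU would prefetch here)
        kind, address, value = self.request
        self.request = None
        if kind == "write":
            self.memory[address] = value
            self.response = 1  # writes always answer 1
        else:
            self.response = self.memory[address]

    def try_read_response(self):
        """Step 5: the EU collects the result; None means 'not ready yet'."""
        result, self.response = self.response, None
        return result

    def is_idle(self):
        """What an interrupt start would check: no request or response left over."""
        return self.request is None and self.response is None


def eu_access(biu, kind, address, value=None):
    """Run one complete request/response pair, returning (result, cycles spent)."""
    cycles = 0
    while not biu.try_request(kind, address, value):
        biu.tick()
        cycles += 1
    while True:
        biu.tick()
        cycles += 1
        result = biu.try_read_response()
        if result is not None:
            return result, cycles
```

The hang described above corresponds to an instruction doing `try_request` (and the BIU ticking) without ever reaching `try_read_response`: `is_idle()` then stays false forever, so anything that waits for an idle BIU before starting an interrupt never proceeds.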

Reply 33 of 163, by vladstamate

superfury wrote:
vladstamate wrote:

Since it is memory refresh, highest of all.

That cannot be: the CPU in my emulator provides the base timing for ALL hardware, thus it'll need to execute first.

Those two things are orthogonal. I meant highest bus priority.

superfury wrote:

Also, memory refresh can't execute while the CPU is using the BUS? When it's not in T1 while not transferring anything(idle bus cycle of the CPU), the memory refresh has to wait for the BUS to be freed by the CPU? Thus it might be 1-2 cycles later, just like DMA? Also, how does it combine with DMA/CPU interruption logic?

That is correct: if the CPU has a bus request in flight, or if there is a DMA in flight, I expect the refresh to wait. Once the refresh starts, though, nothing else can touch the bus for a certain amount of cycles. By highest priority I meant that if the CPU has a request ready but it is time to do the RAM refresh, the RAM refresh wins.
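
As a rough illustration of that arbitration rule, here is a minimal Python sketch. The refresh-first rule comes from the description above; ranking DMA ahead of the CPU is an extra assumption made only for this example, and `next_bus_owner` is a made-up name.

```python
def next_bus_owner(current_owner, refresh_pending, dma_pending, cpu_pending):
    """Decide who drives the bus next.

    current_owner is None when the bus is free, otherwise the unit whose
    transfer is still in flight ("refresh", "dma" or "cpu").
    """
    if current_owner is not None:
        return current_owner   # an in-flight transfer is never interrupted
    if refresh_pending:
        return "refresh"       # highest priority once the bus is free
    if dma_pending:
        return "dma"           # assumption: DMA ranks above the CPU
    if cpu_pending:
        return "cpu"
    return None                # bus stays idle
```

For example, `next_bus_owner(None, True, False, True)` picks the refresh even though the CPU has a request ready, while `next_bus_owner("cpu", True, False, False)` lets the CPU finish first.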

superfury wrote:

Edit: Currently, trying to boot MS-DOS 6.22 from a floppy disk on the IBM AT emulation somehow causes it to fault (thus starting an interrupt). The internal EU state indicates that it has finished processing a BIU operation (memory/BUS I/O) and the BIU is in its finished state (prefetching), but the interrupt handler won't continue because it detects that the BIU response buffer isn't empty (indicating incomplete BIU request/response handling by the EU). The BIU maintains two FIFO buffers (request and response): the EU adds a request to the request buffer and reads its result afterwards from the response buffer (delaying 1 cycle at a time while the condition 'request buffer empty' or 'response buffer full' isn't met). It works like this:
1. EU: The EU tries to put a request in the request buffer. While it's filled(BIU is busy processing a request), the BIU isn't ready, thus delay 1 cycle intervals while checking.
2. EU: The EU has successfully written the request to the request buffer(request buffer full). The EU starts waiting for the response buffer to fill(delaying 1 cycle intervals waiting).
3. BIU: Detects the request buffer is filled. Reads the request buffer(emptying it) to start the current request. It then starts the transfer to/from Memory/BUS I/O.
4. BIU: Finishes the request. It then puts a result value in the response buffer (always 1 for writes, the value read for Memory/BUS reads). The BIU then starts back at step 3 (all the while filling the Prefetch buffer using the same cycle method if nothing is requested).
5. EU: The EU detects the response buffer is filled. It then reads the response buffer and continues the instruction (the value read from the buffer is the value read from memory/IO; with writes, it's always 1 and the value is discarded, though it could theoretically be used to indicate success or other kinds of non-zero information).

Of course, the EU method of filling the request buffer is handled using three kinds of requests: direct memory access request(physical layer), virtual address request(translates and redirects to the direct memory access request function) and BUS request(for IN/OUT instructions). Each and every one of them writes to the request buffer using (translated to physical) values and memory/BUS differentiation information. The reading of the result is the same in all cases(reading the response buffer by the same function calling one of those three methods). This is step 1 in the above list.

That is more or less how my BIU <->EU interaction is implemented. What we are missing is that for 286+, there is a decode unit that runs in parallel with EU and BIU. I have not implemented that either.

Reply 34 of 163, by superfury

Just modified all EU instruction handlers to perform the memory checks that are used in an instruction only once during its runtime (based on the current instruction step). That should prevent memory updates from influencing the behaviour of the current instruction (although afaik there are none), while also increasing emulation speed (by not checking all memory accesses each cycle 😁). It should also prevent those exceptions from triggering in the middle of handling an instruction on the BIU (since those checks aren't made in the middle of those memory/BIU transfers anymore).

Edit: Now the instruction at 40:FFFD changes to being in the middle of a modr/m read/write? The request buffer is empty, the response buffer is filled(but not processed due to an exception interrupt (fetching the instruction))?

Reply 35 of 163, by superfury

Strange that the incomplete transfer still occurs: I've checked each and every transfer function giving such requests to the BIU, and each of them has a proper request and response stage in the common I/O, direct memory and modr/m handlers?

Could it be some problem with non-memory transfers (BUS (I/O port) transfers)? Or maybe a simple case of an interrupt occurring at the wrong time? Hardware interrupts are handled in the emulator core instead, moving the CPU execution phase away from the opcode handlers. Those handlers are all checked and should be working properly, assuming they're using the correct base (the start phase number, which is used to step the execution unit through its different phases: data fetch (common handlers, usually an even number), store (common handlers, usually an even number), execution (handled by the CPU core itself) and response-read (usually odd, following either of the first two)).

The main instruction handlers(e.g. CPU8086_OP8E) usually call the instruction variant of the common handlers. Interrupts call the interrupt variant(reads only, only for real mode atm) and the common instruction handlers(called by their main instruction handlers) call the internal variants.

E.g. CPU8086_OP00 calls the instruction modrm read byte common function to buffer the source value with base 0. Then it calls CPU8086_internal_ADD8 with the modr/m parameter pointer (NULL for memory) and the value buffered from the previous read (kept through execution).

CPU8086_internal_ADD8 calls the common internal modrm read function to read and buffer the value to process (if the pointer passed is NULL (memory operand)). It then advances the internal instruction state to the next number. Said number causes it to skip the modr/m checks and the read phase (although not required for the read phase, it having been incremented into a NOP due to not matching base and base+1 anymore).
The next internal state within CPU8086_internal_ADD8 is the execution state. It will calculate the result, update flags and apply the cycle count (adding it to the result count which is used to time the BIU afterwards, in the main CPU_exec function calling BIU_exec). It will then advance the internal state into the writeback phase. After that, when the destination is NULL (memory), it simply aborts like a common modr/m handler, allowing the CPU main core/BIU to tick until it's processed the instruction, which will end up in the result phase part of CPU8086_internal_ADD8.

In the result phase, a register destination (non-NULL pointer) is written through the pointer (pointing to the register). Otherwise (dest==NULL), it calls the common internal modr/m write function to write the result back to memory. When these finish, 0 is returned, causing the main instruction handler to perform its next parts (usually none) and return with a non-cleared executed flag. That causes the CPU main core to perform post-instruction operations (debugger registers executing post-instruction (DRn register support) and REPeat instructions resetting EIP) and reset counters and phases for the next instruction (setting isfetching and fetchphase to 1). isfetching is a flag indicating it's to fetch instructions (when not repeating).

Fetchphase 1 is the new instruction start, which will clear instruction information, phase counters etc. The CPU_readOP_prefix function, which handles opcode reading and parsing using the data tables (reading data from the prefetch buffer), will then see phase 1 and initialize itself to phase 2: read prefixes. Phase 2 fetches prefixes and, when the byte isn't a prefix, switches to phase 3 (modr/m parameters). Phase 3 delegates to modrm_readparams if one is to be used (from this point onwards, everything is controlled by the table values). Phase 3 eventually advances to phase 4 (data parameters). Phase 4 (the last fetching phase) fetches byte/word/dword/ptr16:16(uint_32)/ptr16:32(uint_64)/offset16/32 (into one uint_32 variable) from prefetch and enters the finishing phase when done. The finishing phase resets the CPU phases (used during execution of e.g. CPU8086_OP8E), sets the correct instruction handler pointer from the opcode handler lookup table and returns 0, causing the CPU main handler to initialize the remaining stuff (REPeat pre-instruction support (blocking operation with (E)CX==0)) and launch the current execution handler.

The current execution handler currently has three states: instruction (the pointer stored during CPU_readOP_prefix), interrupt (run an interrupt handler, cycle-accurate for real mode, like a special instruction handler) and task switch (currently non-cycle accurate). It executes a simple pointer to one of the three functions, which is changed by calls to one of the three start handlers (the instruction one is called by CPU_readOP_prefix to start a new instruction and proceed to the execution phase, bypassing the fetching/pre-instruction phases).

The final part in the main CPU core handler simply handles the REP and executed (flag) states (finishing the instruction and resetting to fetching mode) and calls the BIU to tick cycles (which always reports 1 cycle due to cycle accuracy for the main timing to tick). Finally, on executed, it stores the current opcode information state (modr/m byte (if any, 0x00 otherwise), opcode byte, 0F prefix) for debugging and then returns to the main execution loop, which handles hardware (and the loop around that executing video renderer updates (up to 60FPS), I/O for mouse/keyboard/touch/SDL events), emulator creation/destruction (initEMU/doneEMU), app termination and SDL video/audio creation/destruction.

That's the basics of my cycle-accurate emulation.
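
The fetch phases described above (1 to 4, then finishing) can be modelled as a small resumable state machine that consumes at most one prefetched byte per call. The Python below is only a sketch under simplifying assumptions: the opcode table holds four entries, only 16-bit modr/m addressing is handled, displacement bytes are lumped into phase 4, and `Fetcher`, `OPCODES` and `disp_size` are invented names rather than anything from CPU_readOP_prefix.

```python
from collections import deque

# opcode -> (has_modrm, immediate_bytes); tiny illustrative table, not a full map
OPCODES = {0x00: (True, 0), 0x04: (False, 1), 0x05: (False, 2), 0x90: (False, 0)}
PREFIXES = {0x26, 0x2E, 0x36, 0x3E, 0xF0, 0xF2, 0xF3}


def disp_size(modrm):
    """Displacement bytes that follow a 16-bit modr/m byte."""
    mod, rm = modrm >> 6, modrm & 7
    if mod == 1:
        return 1
    if mod == 2 or (mod == 0 and rm == 6):
        return 2
    return 0


class Fetcher:
    def __init__(self):
        self.phase = 1

    def step(self, prefetch):
        """Consume at most one byte; return True once an instruction is complete."""
        if self.phase == 1:                      # phase 1: clear per-instruction state
            self.prefixes, self.opcode, self.modrm, self.data = [], None, None, []
            self.needed = 0
            self.phase = 2
        if not prefetch:
            return False                         # prefetch empty: wait for the BIU
        byte = prefetch.popleft()
        if self.phase == 2:                      # phase 2: prefixes, then the opcode
            if byte in PREFIXES:
                self.prefixes.append(byte)
                return False
            self.opcode = byte
            has_modrm, self.needed = OPCODES[byte]
            self.phase = 3 if has_modrm else 4
        elif self.phase == 3:                    # phase 3: modr/m byte
            self.modrm = byte
            self.needed += disp_size(byte)
            self.phase = 4
        else:                                    # phase 4: displacement/immediate data
            self.data.append(byte)
            self.needed -= 1
        if self.phase == 4 and self.needed == 0:
            self.phase = 1                       # finishing: ready for the next one
            return True
        return False
```

Feeding it the bytes 2E 00 06 34 12 (an ADD to a memory word with a CS override) one call at a time returns False four times and True on the fifth call, with the prefix, opcode, modr/m byte and displacement collected along the way.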

Reply 36 of 163, by superfury

Could it be that the error is caused by an external influence, like an incorrectly triggering hardware interrupt in the middle of handling a BIU request/response combination? The logic for it to trigger is currently very simple, but it shouldn't happen while an instruction is running: it only checks when it's in fetching phase 1 (starting a new instruction), before phase 2 (read instruction) is triggered. So there should be no BIU request running, as the previous instruction should already have finished (its common BIU handler won't let it finish while it's handling the request, before it retrieves the result, except with modr/m pointing to a register (mod=3)). Oddly enough, the instruction status has stage 2 in its modr/m parameter, which indicates its last BIU operation for the current instruction was modr/m related: either a request and response read, or pointing to a register (mod=3), which causes it to skip the response phase (since it's not supposed to have requested anything from memory/BUS, giving result code 2 (register, no response to expect) instead of 1 (memory BIU request buffer filled)).

Reply 37 of 163, by superfury

Just looked through the code that handles a new instruction being read: the new opcode active information is only reset/loaded when an instruction is completely fetched&decoded (starting to pend the modr/m decode phase). Only then does it clear the stepping states, thus it indicates that a modr/m read/write operation is finishing execution (as a register?) without actually reading its results? The opcode handler and related information should at that point still point to the offending instruction terminating prematurely. That should be the case for the actual handler pointer, the stepping states (even/odd states in the inner handlers) as well as the actual executing opcode byte for the lookup table. That should indicate something...

So the problem happens at 0070:fff9, after 0070:fff7, during an ADD(opcode 00h) instruction.

So, breakpoint at 0070:fff7(thus after it executes it's waiting in the debugger), single step into the next instruction and debugging the execution should show what's going wrong.

Edit: Found the bug: it starts executing 00h ADD, requesting a memory read. The BIU processes it and finishes. Then the ADD requests a write, which gets queued. The BIU handles it and starts processing. Then eventually it starts to fetch opcodes, which fault on the memory unit itself (MMU.invaddr=1). Because that flag is set, the ADD terminates mid-transfer without ever reading its result, and the next opcode starts with an invalid EU state. This faults properly, hanging on INT.


Reply 38 of 163, by superfury


Just removed all those MMU.invaddr checks from the instructions. The flag gets set on faulting memory transactions, and isn't used by the EU directly. Imagine a word access touching the end of RAM (valid) and one byte further (invalid): the check aborted the whole transaction, while a real CPU just reads 0xFFxx or writes the low byte only (xxh to the last address), or wraps.
This should fix the incomplete memory transactions(untested atm).
Edit: Just confirmed. It's now no longer hanging (although it's infinitely faulting and returning to address 70:ffff (#GP fault)).
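As a rough illustration of the word access at the end of RAM described above, here is a small sketch. It assumes unmapped bytes read back as 0xFF and that writes to unmapped bytes are silently dropped; the wrapping case is not modeled, and the helper names are invented for this example rather than taken from the emulator.

```python
def read_byte(ram, address):
    """Return the byte at address, or 0xFF when the address is past the end of RAM."""
    if 0 <= address < len(ram):
        return ram[address]
    return 0xFF  # assumption: unmapped memory floats high


def read_word(ram, address):
    """Read a little-endian word byte by byte, never aborting the whole access."""
    low = read_byte(ram, address)
    high = read_byte(ram, address + 1)
    return low | (high << 8)


def write_word(ram, address, value):
    """Write a little-endian word; bytes that fall outside RAM are dropped."""
    for offset in range(2):
        target = address + offset
        if 0 <= target < len(ram):
            ram[target] = (value >> (8 * offset)) & 0xFF


ram = bytearray([0x11, 0x22, 0x34])   # tiny 3-byte "RAM" for the example
word_at_end = read_word(ram, 2)       # low byte valid, high byte unmapped
write_word(ram, 2, 0xABCD)            # only the low byte (0xCD) lands in RAM
```

The point is that each byte of the transaction is handled on its own, so an invalid second byte no longer cancels the valid first one.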


Reply 39 of 163, by superfury


Now the only question left: what causes the invalid jump from IO.SYS (MS-DOS 6.22) and the XT-IDE AT ROM (trying to reboot after an invalid boot)?

Edit: There's also the strange case of the Compaq Deskpro 386 crashing due to invalid programming of DMA channel 0 when using a CGA/MDA adapter. And there's the odd case of it programming DMA channel 2 (the FDC DMA channel) for mode 0 (verify) when setting up DMA for an FDC read?

Edit: The interrupt seems to be located at an IBM PC-compatible address within the Compaq Deskpro 386 ROM, at F000:EC59. So that's a starting point to check...
Edit: So, after some stepping further (skipping the reset call), I get to F000:ECC0, which seems to be the read diskette (function 02h of interrupt 13h). That's a next starting point:D
Edit: ECC0 calls 8FA4, which returns with zero flag set.
It continues on to ECC5, which calls 9061, which writes 0x42 as the DMA mode? That's an invalid value?

Then it executes a subroutine, which returns BX=0090. That's some offset in the BDA? I can't find anything about that offset(not listed in any tables)?
Then it calls D490, which starts a seek, then returns with zero flag set.
Now it calls subroutine x9072, which calls subroutine x9190 at x9072. The 9190 subroutine loads the BDA offset in BX and returns.
Then, at x9084, it calls xd3f3, which returns with zero flag set.
Then it calls xd3ff at x9089, which updates the CMOS floppy drive type to 0x34, which is 1.44MB on A drive(correct) and 720K on the B drive(no disk inserted, so correct by default)? So it correctly detects the 1.44MB disk drive:D

Afterwards, it gets to that point again, now writing 0x34 to the CMOS.

Then it compares something on the stack(BP+06, which is the drive number) to 0-3, to check for invalid floppy drive numbers? It doesn't detect any, so it continues on to x909B. It then calls xCA03, which returns. Then x9190 again to load the floppy BDA offset. Then it loads that BDA entry with 0x61.
Edit: After some more looking, x9190 gives the media state for disk 0, 1, 2 or 3 (state, double stepping and data rate), which is then set to: double stepping required, drive established, 360Kb diskette/1.2Mb drive not established, according to http://stanislavs.org/helppc/bios_data_area.html.

It then calls xd490, which returns. Then it calls xECF0, which sets the DMA mode control register?

Eventually it calls xd3f3 (which returns the CMOS checksum status: OK (zero flag set) or bad (zero flag cleared)). In this case, the zero flag is set, thus the CMOS checksum is OK. Good to know.

It then calls xd3ff, which returns the floppy disk types from the CMOS RAM.

The trouble all starts at f000:000090a8, according to what my debugger tells me.

So at f000:000090a8, it sets al=6, dma lookup value(AH storage) to 4,AL storage to 1. This happens at row 12556 in my log. Then at f000:0000efc0, it compares it against 2, which fails, so no jump(it's 4 after all). That's at row 12623. Then at f000:0000ed65, it loads the value 4, multiplies it by 2 by shifting it left, and loads the DMA mode register from the table at EFD1, which results in AX=E642? That 42h in AL is then loaded in the DMA mode register, which is invalid?
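The lookup at f000:0000ed65 can be sketched as follows. This is only a model of the arithmetic: the six word values are the table entries as listed further down in this post, laid out little-endian as they would sit in ROM, and the function name is invented for the example.

```python
import struct

# The six table words as listed in this post, stored little-endian like in the ROM.
TABLE_BYTES = struct.pack("<6H", 0xE908, 0x9915, 0xE646, 0xC54A, 0xE642, 0xB703)


def lookup_dma_entry(index):
    """Shift the index left once (x2, one word per entry) and fetch AX from the table."""
    offset = index << 1
    (ax,) = struct.unpack_from("<H", TABLE_BYTES, offset)
    al = ax & 0xFF   # low byte: value written to the DMA mode register
    ah = ax >> 8     # high byte: command byte for the floppy disk controller
    return ax, al, ah


ax, al, ah = lookup_dma_entry(4)  # the index found at BP+01 in this trace
```

Index 4 lands on byte offset 8, so the word fetched is E642h and AL ends up as 42h, matching what the log shows.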

The specified table contains the following elements. As far as I can see, the low byte is the DMA mode and the high byte following it is the command to issue to the floppy disk controller. So for a read it should contain 0xE6X6, where X should be 4, maybe with 1 or-ed in (auto-initialize) if it's wanted to auto-initialize on completion (usually not, since it's a single transfer for a single (set of) sectors).

The values in the looked up table(thus indexed by the value in BL, which is 4(th entry, with the entries starting at 0) in the current case):
0=E908(Command E9, DMA 08). This is a Write deleted data command. Uses DMA: channel 0/4, on demand, reading from memory. Maybe some odd controller, or invalid?
1=9915(Command 99, DMA 15). This is a Scan low or equal command. Uses DMA: transfer on demand, channel 1/5, write to memory, auto-initialize. Maybe some Sound Blaster channel for recording? It looks a bit out-of-place for a Scan low or equal command?
2=E646(Command E6, DMA 46). This is the correct command: Read sector, writing to memory, single DMA transfer mode.
3=C54A(Command C5, DMA 4A). This is the reverse: Write sector, reading from memory, single DMA transfer mode.
4=E642(Command E6, DMA 42). This is the value currently used with the above case: Read sector, Verify mode(no reading/writing to memory or hardware, Self Test of the controller)
5=B703(This probably is something completely different, using DMA channel 3, which isn't connected to the FDC, thus probably past the end of the table).
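For reference, the DMA halves of the entries above can be decoded with a small helper. This is a sketch based on the standard 8237 mode register layout (bits 0-1 channel, bits 2-3 transfer type, bit 4 auto-initialize, bit 5 address decrement, bits 6-7 transfer mode); the function name and dictionary keys are my own.

```python
TRANSFER_TYPES = {0: "verify", 1: "write to memory", 2: "read from memory", 3: "illegal"}
TRANSFER_MODES = {0: "demand", 1: "single", 2: "block", 3: "cascade"}


def decode_dma_mode(mode):
    """Split an 8237 mode register byte into its fields."""
    return {
        "channel": mode & 0x03,
        "transfer": TRANSFER_TYPES[(mode >> 2) & 0x03],
        "auto_init": bool(mode & 0x10),
        "decrement": bool(mode & 0x20),
        "mode": TRANSFER_MODES[(mode >> 6) & 0x03],
    }


# Decode the low bytes of table entries 0..4 from the list above.
decoded = {value: decode_dma_mode(value) for value in (0x08, 0x15, 0x46, 0x4A, 0x42)}
```

Decoding 46h and 42h side by side shows they differ only in the transfer type field: write to memory versus verify, both single mode on channel 2.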

It's currently using entry 4(as read from BP+01), but looking at the table, it actually should be entry 2 instead.

The value 4 comes directly from the location [BP+01], which is loaded with 4 by a MOV instruction.

Looking at the simple function list that's passed in AH to the interrupt handler shows something interesting:
AH = 02h Read Sectors From Drive: Entry 2 in the table contains its command and DMA mode.
AH = 03h Write Sectors To Drive: Entry 3 in the table contains its command and DMA mode.
AH = 04h Verify Sectors: Entry 4 contains a rough equivalent, although the controller is in Verify mode? Maybe this has something to do with it?
AH = 05h Format Track : Entry 5 doesn't match up, so this confirms the end of the table is at the previous entry.

So, it expects the value at BP+01 to actually contain what it's documented to be: the function number the low level disk services gets passed when called. But, at f000:000090aa, it actually overwrites it with value 04, which is the verify sectors command!!!

That line isn't supposed to be executed at all? So there's something going wrong earlier in the code?

Edit: According to the disassembly, opcode 86h has its parameters backwards? XCHG AH,AL should be XCHG AL,AH according to the documentation? According to Online Disassembler (https://onlinedisassembler.com/odaweb/) my behaviour is correct?
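On the operand order question, a quick sketch of how opcode 86h (XCHG r/m8, r8) splits its modr/m byte may help. The byte values C4h and E0h are my own examples, not bytes taken from the Compaq ROM, and the helper names are invented.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]  # 8-bit register encoding order


def decode_xchg8(modrm):
    """Return (r/m operand, reg operand) for opcode 86h with a register modr/m (mod=3)."""
    if (modrm >> 6) != 3:
        raise ValueError("this sketch only handles register operands (mod=3)")
    reg = REG8[(modrm >> 3) & 7]
    rm = REG8[modrm & 7]
    return rm, reg  # Intel syntax lists the r/m operand first


def xchg(registers, first, second):
    """Exchange two register values in place."""
    registers[first], registers[second] = registers[second], registers[first]


state_a = {"AL": 0x12, "AH": 0x34}
state_b = dict(state_a)
xchg(state_a, *decode_xchg8(0xC4))  # 86 C4 decodes as XCHG AH,AL
xchg(state_b, *decode_xchg8(0xE0))  # 86 E0 decodes as XCHG AL,AH
```

Since an exchange is symmetric, both encodings leave the registers in the same state, so a disassembler printing the operands in the other order shouldn't change the result.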

Why is it loading the invalid value into the function number, which (according to the start of the interrupt handler's documentation) isn't supposed to change during execution?
