VOGONS


Reply 60 of 142, by superfury

Rank l33t++

One quick question: what happens when a memory access is made through a 48-bit pointer (32-bit offset) that passes the protection checks, but the access itself uses a 16-bit offset (16-bit address size)?

I've just tried the MS-DOS 6.22 HIMEM.SYS (instead of the older one, which I believe is the Windows 3.0 one). It reports bad memory at address 0x100000 (the 1MB of extended memory is reported as bad)?

Edit: Following the execution flow from the 8042's point of view, I see it keeps writing the 8042 output port, toggling the A20 line high and low again. It then accesses the 1MB range while the A20 line is disabled, which doesn't work: the access wraps around to the low memory area instead?
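A minimal sketch of the wrap described above, assuming the A20 gate simply forces physical address bit 20 low while it is disabled (apply_a20 is a hypothetical helper, not a UniPCemu function). With A20 off, a test of 0x100000 actually lands on 0x000000, which would be consistent with HIMEM.SYS reporting that memory as bad:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply the A20 gate to a physical address.
   With A20 disabled, address line 20 is forced low, so 0x100000
   aliases 0x000000, 0x100050 aliases 0x000050, and so on. */
static uint32_t apply_a20(uint32_t physical, int a20_enabled)
{
    if (a20_enabled) {
        return physical;
    }
    return physical & ~(uint32_t)0x100000; /* clear bit 20 only */
}
```

In an emulator this mask is typically applied at the last step, right before the physical address reaches RAM, so every bus master (CPU, prefetcher) sees the same wrap.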

Also, when running CheckIt Diagnostics on the 80186 XT, it runs without problems, but on the 80386 it crashes (the last thing I see is the 10/12 check in red text; after that, most of the screen is cleared to black)?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 61 of 142, by superfury


I've been noting down the 8042 accesses that keep toggling the A20 line on and off (from HIMEM.SYS, as far as I know):

025F:046D=D1(command)
025F:0476=DF(data)
025F:048D=FF(command)
025F:046D=D1(command)
025F:0476=DD(data)
025F:048D=FF(command)
025F:040D=XX(1MB access)
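For reference, a sketch decoding that sequence, assuming standard AT-style 8042 behaviour: after command D1, the next byte written to port 60h goes to the output port, whose bit 1 drives the A20 gate (bit 0 is the active-low system reset, so it stays 1). DF therefore enables A20 and DD disables it; FF is the "pulse output port" command with no lines selected, commonly used as a harmless no-op/sync. The kbc8042 type and its functions are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, minimal 8042 model: only what is needed to decode
   the logged D1/DF/FF/D1/DD/FF sequence. Pulses are not modeled. */
typedef struct {
    uint8_t pending_command; /* command waiting for a data byte, 0 if none */
    uint8_t output_port;     /* last value written to the output port */
} kbc8042;

static void kbc_write_command(kbc8042 *k, uint8_t cmd) /* port 64h */
{
    if (cmd == 0xD1) {
        k->pending_command = 0xD1; /* next port 60h byte -> output port */
    } else if (cmd >= 0xF0) {
        /* F0-FF pulse output port bits 0-3 whose command bit is 0;
           FF selects no bits, so it only acts as a sync point. */
        k->pending_command = 0;
    }
}

static void kbc_write_data(kbc8042 *k, uint8_t data) /* port 60h */
{
    if (k->pending_command == 0xD1) {
        k->output_port = data;
        k->pending_command = 0;
    }
}

static int kbc_a20_enabled(const kbc8042 *k)
{
    return (k->output_port >> 1) & 1; /* output port bit 1 = A20 gate */
}
```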

This keeps repeating for every 8-byte block, with 8 bytes skipped after each one (e.g. 100050-100057, then 100060-100067, 100070-100077, 100080-100087 etc.). Each of these blocks is accessed right after one of those 8042 command sequences, so it wraps to low memory (0MB-1MB) because the A20 gate is disabled?

The only opcode executing is the 8086+ A7 instruction (the 16-bit CMPSW; A6 is the byte-sized CMPSB).

Edit: I've made a small log dump of what happens while executing the loop, along with the 8042 writes that keep happening after each REP block.

Attachment: debugger_failingCMPSW.zip (698.93 KiB)
File comment: The CMPSW that keeps failing because the memory is wrapped.
File license: Fair use/fair dealing exception

Also, the REP CMPSB instruction in the log is actually a REP CMPSW instruction. That's a bug in the debugger, which I fixed right after making this dump: the simple if-else that selects the text to log was choosing between the two based on the address size (CPU_Address_size[activeCPU]?"CMPSW [%s:ESI],[ES:EDI]":"CMPSB [%s:SI],[ES:DI]").
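A sketch of a corrected selection, assuming the logger has the opcode, operand size and address size available (format_cmps is a hypothetical name, not UniPCemu's actual code). The mnemonic comes from the opcode (A6 = CMPSB, A7 = CMPSW or CMPSD) and the operand size, while SI/DI versus ESI/EDI comes from the address size:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical debugger helper: build the CMPS disassembly text.
   Mnemonic: opcode + OPERAND size. Index registers: ADDRESS size.
   Deriving the mnemonic from the address size is what turned a
   16-bit CMPSW into "CMPSB" in the log. */
static void format_cmps(char *out, size_t outsize, unsigned char opcode,
                        int operand32, int address32, const char *segment)
{
    const char *mnemonic = (opcode == 0xA6) ? "CMPSB"
                         : (operand32 ? "CMPSD" : "CMPSW");
    const char *si = address32 ? "ESI" : "SI";
    const char *di = address32 ? "EDI" : "DI";
    snprintf(out, outsize, "%s [%s:%s],[ES:%s]", mnemonic, segment, si, di);
}
```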


Reply 62 of 142, by superfury


I've just tried implementing part of the separated interrupt (and exception)/task switch model, but whenever I enable the new code, the Turbo XT BIOS crashes when it reaches the memory test. It displays the blank space before "KB", but never actually gets to checking RAM?

This commit still works without problems: https://bitbucket.org/superfury/unipcemu/comm … ef37b0b89cf3e4c
This is the fully modified commit, with the new separated model of interrupts (and exceptions)/task switching: https://bitbucket.org/superfury/unipcemu/comm … 6091acb901310f9

Can anyone see why this is happening? It shouldn't go wrong; something else (related to the new separated module, which replaces the old one that executed instructions directly) is causing the entire software running in the emulator to go haywire?


Reply 63 of 142, by superfury


I've made a small log (with the memory I/O stripped out to make the logs easier to compare) that shows what's going wrong (it seems to go wrong at a conditional jump for some unknown reason):
Old methods (not separated, and the AAD instruction still using 8-bit ADD flags):

Attachment: debugger_UniPCemuWorking_OldMethods_20170818_2216.zip (95.21 KiB)
File comment: Old methods, which still work correctly.
File license: Fair use/fair dealing exception

New methods (separated interrupt/exception and task switching handling, AAD instruction using 16-bit ADD flags):

Attachment: debugger_UniPCemu_newmethods_20170819_1945.zip (2.12 MiB)
File comment: New methods, which are failing.
File license: Fair use/fair dealing exception

Can anyone see what's going wrong? What's the cause of this strange behaviour?

Edit: Besides splitting out the code that handles interrupts/task switching, I've also modified the error handling mechanism. Maybe that's the problem (the new faultraised status handling)?


Reply 64 of 142, by superfury


Strangely enough, separating out the interrupt mechanism makes the CPU fail when starting the memory test, jumping into bogus memory (Turbo XT BIOS with VGA). Can anyone see what's going wrong?


Reply 65 of 142, by superfury


I've just modified the BIU to be dumber than before: it fetches instructions starting from an address that's loaded whenever EIP is reloaded (and the PIQ is flushed). Paging is currently unsupported with this method, as I haven't built a cycle-accurate version of the Paging system yet. Loading EIP (through CS or JMP) causes the PIQ address to be reloaded, after which the BIU dumbly fetches data from that address onwards into the PIQ. So it only knows a 32-bit address to put on the memory address lines, which is now masked by A20 in the BIU itself. The BIU no longer does anything with the segment registers or their descriptors; it only accepts physical memory addresses. The Segmented -> Linear -> Physical mapping is handled in the split execution module itself.

The only thing left to make the execution module fully cycle-accurate on all CPUs is to make the Linear -> Physical translation (on 80386+ CPUs) go through an intermediate layer that turns the 32-bit linear address into a physical one by asking the BIU to retrieve the Paging tables and parsing them correctly. So Paging is currently supported, but it loads its entries directly from RAM (without using the BIU). There's also the problem that the (E)IP address is now a linear address without translation to a physical address: no Paging is applied when fetching its data from memory, because the dumb BIU doesn't use the Paging tables itself and just uses the physical addresses supplied by the EU.

How does a real BIU handle these cases? If the BIU only uses physical addresses, it can't apply 64K wrapping or Paging to addresses like EIP directly?

Edit: Yay! The most recent changes somehow made Megarace play its videos again! 😁


Reply 66 of 142, by superfury


I have a little question(maybe Reenigne or one of the guys of the demo can answer this): Does the prefetching also wrap around 64K? I know that (E)IP does, but does the prefetching on the 80(1)8X wrap around 64K? Or does it just continue into the next 64K of memory? Does the prefetcher know anything about IP? Is it even able to detect wrapping around 64K? Does it operate on a copy of CS&IP directly, or does it just generate a 20-bit address to use when loading CS or IP(through a jump) and simply increase that 20-bit address until it overflows at 1MB? So:
Prefetch(CS/IP loaded): A000:FFFF(Physical address AFFFF)
Prefetch: Physical address B0000
Prefetch: Physical address B0001
etc.

Does the prefetcher work with a segmented address(Seg:Offs, as A000:FFFF, A000:0000, A000:0001 etc.) or with a linear address(AFFFF, B0000, B0001 etc.)?
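The two hypotheses in the question, written out as a small C sketch (both function names are hypothetical): a prefetcher that keeps a 16-bit offset wraps inside the segment, while one that keeps a 20-bit physical address just keeps counting. For A000:FFFF they already disagree on the very next byte:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothesis 1 (segmented): the prefetcher keeps a 16-bit offset, so it
   wraps inside the segment: A000:FFFF -> A000:0000 = physical A0000. */
static uint32_t next_fetch_segmented(uint16_t cs, uint16_t *ip)
{
    *ip = (uint16_t)(*ip + 1);                     /* 16-bit wrap */
    return (((uint32_t)cs << 4) + *ip) & 0xFFFFF;  /* 20-bit address bus */
}

/* Hypothesis 2 (linear): the prefetcher keeps a 20-bit physical address
   that simply counts up: AFFFF -> B0000, wrapping only at 1MB. */
static uint32_t next_fetch_linear(uint32_t *address)
{
    *address = (*address + 1) & 0xFFFFF;
    return *address;
}
```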


Reply 67 of 142, by superfury


Just found a bug in the 80286+ fault handling: while the fault was pushing its return information etc. onto the stack, the memory handler was refusing the memory writes (simply ignoring them) because it saw that a fault had been raised. Removing this check makes it properly write its error information to the stack, call the interrupt vector and return safely on an 80286+ CPU. This might also have affected the 8086+ emulation.

Edit: Nope, the 8086 is still crashing the Turbo XT BIOS for some unknown reason. Jazz Jackrabbit's setup under MS-DOS 5.0a still crashes with a #UD exception (on an FFFF instruction).


Reply 68 of 142, by vladstamate

Rank Oldbie
superfury wrote:

I have a little question(maybe Reenigne or one of the guys of the demo can answer this): Does the prefetching also wrap around 64K? I know that (E)IP does, but does the prefetching on the 80(1)8X wrap around 64K? Or does it just continue into the next 64K of memory? Does the prefetcher know anything about IP? Is it even able to detect wrapping around 64K? Does it operate on a copy of CS&IP directly, or does it just generate a 20-bit address to use when loading CS or IP(through a jump) and simply increase that 20-bit address until it overflows at 1MB? So:
Prefetch(CS/IP loaded): A000:FFFF(Physical address AFFFF)
Prefetch: Physical address B0000
Prefetch: Physical address B0001
etc.

Does the prefetcher work with a segmented address(Seg:Offs, as A000:FFFF, A000:0000, A000:0001 etc.) or with a linear address(AFFFF, B0000, B0001 etc.)?

I do not know, but I would assume the prefetcher would not behave differently than the CPU would. So I would expect it to use CS:IP somehow and build a 20-bit address from that. Unless the prefetch buffer is address-tagged (which it could be), it must behave correctly when an instruction straddles the end of a 64K segment.

In CAPE I designed it by having the prefetcher maintain its own "prefetch-IP" and combine it with the real CS each time it is allowed to issue a memory read to prefetch, then increment it. When the real IP changes (JMP, CALL, INT, etc.), the prefetcher is told to update its own prefetch-IP.
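A minimal sketch of that design, assuming an 8086-style 6-byte queue and a flat 1MB memory array (the prefetcher type and its functions are hypothetical, not CAPE's or UniPCemu's code). Because prefetch_ip is a 16-bit register combined with CS on every read, prefetching wraps within the segment exactly like IP does:

```c
#include <assert.h>
#include <stdint.h>

#define PIQ_SIZE 6 /* 8086 queue length; the 8088's queue is 4 bytes */

typedef struct {
    uint16_t prefetch_ip;     /* the prefetcher's own copy of IP */
    uint8_t queue[PIQ_SIZE];  /* circular instruction queue */
    unsigned head, count;
} prefetcher;

/* Called when the real IP changes (JMP, CALL, INT, ...): flush and restart. */
static void prefetch_reload(prefetcher *p, uint16_t new_ip)
{
    p->prefetch_ip = new_ip;
    p->head = 0;
    p->count = 0;
}

/* One prefetch read: combine the real CS with prefetch_ip, then increment
   it as a 16-bit value. Returns 0 (no read) when the queue is full. */
static int prefetch_step(prefetcher *p, uint16_t cs, const uint8_t *memory)
{
    uint32_t physical;
    if (p->count == PIQ_SIZE) return 0;
    physical = (((uint32_t)cs << 4) + p->prefetch_ip) & 0xFFFFF;
    p->queue[(p->head + p->count) % PIQ_SIZE] = memory[physical];
    p->count++;
    p->prefetch_ip = (uint16_t)(p->prefetch_ip + 1);
    return 1;
}

/* EU side: take one byte from the queue. Returns 0 when it is empty. */
static int prefetch_take(prefetcher *p, uint8_t *out)
{
    if (p->count == 0) return 0;
    *out = p->queue[p->head];
    p->head = (p->head + 1) % PIQ_SIZE;
    p->count--;
    return 1;
}
```

Keeping the counter 16 bits wide is what removes the need for address-tagged queue entries: the queue contents are always the bytes the EU would have read at CS:IP anyway.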

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 69 of 142, by superfury


I've modified the CPU's prefetch unit to fetch instruction bytes into the PIQ in one of two ways:
On 80(1)8X CPUs: use the base of CS combined with the PIQ EIP (actually a copy of EIP, which is loaded each time the PIQ is flushed). The buffered EIP value is incremented and passed through the usual segmentation system to form a physical address (or linear address, but that's the same thing on these CPUs).
On 80286+ CPUs: the PIQ EIP address variable is loaded with the EIP address already passed through segment translation (when the PIQ is flushed), forming a linear address. This linear address is incremented each time a byte is fetched from memory into the PIQ, and is translated to a physical address through the Paging Unit (on 80386+) to access memory. When the Paging Unit doesn't find the address in the TLB (and Paging is enabled), the prefetcher doesn't fault itself; it simply doesn't read the data from RAM into the PIQ (protecting against reads past the mapped memory, just like a real CPU is supposed to do). The code that reads from the PIQ checks against the CS descriptor and runs the Paging checks (raising a Page Fault if needed) when the address isn't in the TLB, so the EU handles it instead of the BIU, which simply waits for the data to be loaded by the EU. The Paging protection handling in the EU reading the opcode loads the TLB from physical memory, which lets the BIU resume prefetching (since the TLB entry is now loaded).
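A toy model of that split, assuming a tiny TLB in front of a flat one-level page table (toy_mmu and its functions are hypothetical and much simpler than real 80386 two-level paging): the BIU only consults the TLB and stalls on a miss, while the EU does the lookup itself, either filling the TLB (after which prefetching can resume) or reporting a page fault:

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 4
#define TOY_PAGES 16

/* Toy model, not UniPCemu code: linear page -> physical page,
   where 0 in page_table means "not present". */
typedef struct {
    uint32_t page_table[TOY_PAGES];
    uint32_t tlb_linear[TLB_ENTRIES];
    uint32_t tlb_physical[TLB_ENTRIES];
    int tlb_valid[TLB_ENTRIES];
    unsigned next_victim; /* round-robin replacement */
} toy_mmu;

static int tlb_lookup(const toy_mmu *m, uint32_t linear, uint32_t *physical)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (m->tlb_valid[i] && m->tlb_linear[i] == (linear >> 12)) {
            *physical = (m->tlb_physical[i] << 12) | (linear & 0xFFF);
            return 1;
        }
    }
    return 0;
}

/* BIU prefetch: only uses what the TLB already holds. A miss is not a
   fault; the prefetcher just stalls and tries again later. */
static int biu_prefetch_address(const toy_mmu *m, uint32_t linear, uint32_t *physical)
{
    return tlb_lookup(m, linear, physical);
}

/* EU opcode fetch: on a miss it walks the (toy) page table itself.
   Returns -1 for a page fault, 0 on success (TLB filled). */
static int eu_translate(toy_mmu *m, uint32_t linear, uint32_t *physical)
{
    uint32_t page = linear >> 12;
    if (tlb_lookup(m, linear, physical)) return 0;
    if (page >= TOY_PAGES || m->page_table[page] == 0) return -1; /* #PF */
    m->tlb_linear[m->next_victim] = page;
    m->tlb_physical[m->next_victim] = m->page_table[page];
    m->tlb_valid[m->next_victim] = 1;
    m->next_victim = (m->next_victim + 1) % TLB_ENTRIES;
    *physical = (m->page_table[page] << 12) | (linear & 0xFFF);
    return 0;
}
```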

That way, the BIU doesn't have to deal with the whole protection ordeal and faults; it just does physical RAM input/output, while the EU handles all the faults. The EU can ask the BIU to read data from RAM, such as the PDEs and PTEs (although this is currently still done by using the memory unit directly instead of the BIU).

For this reason, the PIQ_EIP is renamed PIQ_Address, since it can contain either an IP value(on 80(1)8X) or linear memory address(on 80286+).


Reply 70 of 142, by reenigne

Rank Oldbie
superfury wrote:

I have a little question(maybe Reenigne or one of the guys of the demo can answer this): Does the prefetching also wrap around 64K? I know that (E)IP does, but does the prefetching on the 80(1)8X wrap around 64K?

I haven't tried it, but I'm sure it wraps. Otherwise it would prefetch (and therefore execute) the wrong byte. The only way it could avoid doing that is tagging the bytes in the prefetch queue with their location which would be a lot of extra transistors for no good reason. Therefore the prefetcher's internal "next address to fetch" register must be a 16-bit register which is combined with CS before prefetching, rather than a 20-bit register.

Reply 71 of 142, by superfury


OK. It now uses that method on the 80(1)8X processors. The 80286+ processors will use a simple linear memory approach (since overflowing past 64K/4G will trigger a fault anyway, that shouldn't be a problem).


Reply 72 of 142, by superfury


I notice something odd while modifying the interrupts to use the new mechanism (cycle-accurate and separated from the usual EU core processing): when an 80286+ is used, the POST succeeds, but when an 8086 is used, the fourth IRQ0 (which fires when the BIOS starts checking the first block of memory, at address F000:F965) starts, but ends up at an IRET at F000:EFF7 that returns to address 0000:0000 (which is invalid). The stack pointer seems to have been changed/corrupted somehow during the interrupt handling sequence: right after starting the interrupt it points to 0030:00F4, but when executing the IRET it points to 0030:0285. So SP is corrupted somehow during the third IRQ0?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 73 of 142, by vladstamate

superfury wrote:

OK. It now uses that method on the 80(1)8X processors. The 80286+ processors will use a simple linear memory approach(since overflowing past 64K/4G will trigger a fault anyways, so that shouldn't be a problem).

I do not think that is correct. As reenigne and I said, the prefetcher cannot behave differently than a CPU without a prefetcher. Think of it this way: there is only one address generation unit in the CPU, or rather one implementation. It takes a segment and an offset. On a 286 the segmentation (protection) logic is applied first, and on a 386 the linear -> physical (paging) translation is appended after it. The prefetch mechanism has to use that. In other words, you have to do all this translation before every prefetch, but with the prefetcher's own EIP register.

The same addressing mechanism is used in two places:

1) For prefetching: all instruction bytes/words go through there, so the CPU HAS to use it.
2) Being fed the output of the EA unit for memory accesses (such as when an instruction is reading/writing through the BIU).


Reply 74 of 142, by superfury


What about things like segment limits on a 286+? If it prefetches past the limit, it might read something invalid (like VRAM etc.). Is that handled the same way as Paging faults during prefetching (i.e. the prefetcher simply stops prefetching)?

Edit: I've modified the prefetcher to work the same on 80(1)8X and 80286+: it uses the regular protection and paging checks, just like normal instructions do. The main difference is that it doesn't raise faults itself (on a fault, it simply aborts prefetching). The EU, when reading opcodes from the PIQ, checks for protection/paging faults before reading the current location from the PIQ. If errors occur, they're handled and nothing is taken from the PIQ. If no errors occur, the byte is taken from the PIQ into the EU's current opcode handling (which then parses and times it in the normal EU way).


Reply 75 of 142, by vladstamate

superfury wrote:

What about stuff like segment limits on a 286+? If it prefetches past that it might be invalid(like prefetching into VRAM etc.)? Is that handled the same way as with Paging faults during prefetching(e.g. the prefetcher stops prefetching)?

I think the prefetcher has to be able to generate either an exception or a page fault when a read happens while the prefetch queue is empty. Unless the CPU issues a direct read when the prefetch queue is empty, which is also possible but unlikely.

If the PIQ is empty, then it is likely that the next byte/word/dword is actually needed by the CPU, so proper exception/page faulting must occur if necessary. For anything else, no. Couple this with the fact that during JMP/LOOP/INT, etc. the prefetcher is disabled early on.

Think about it: you can have a JMP at the end of the segment, in which case the PIQ will not read past it, so no errors. Now if the last instruction was, say, a MOV, then the prefetcher will try to read past it, fail (because it's outside valid memory as per the segment size) and eventually drain as the CPU consumes the queue. Then the CPU REALLY needs a byte to do anything, so now we are in a scenario where the PIQ is empty, and the next read by the PIQ will generate exceptions/page faults as expected.

EDIT: clarified some ideas.


Reply 76 of 142, by superfury


What I mean is the following:
The EU(which parses and handles the opcodes) will check for protection/page faults on every byte that's to be read from the PIQ. That way, it can handle faults while also being able to request the BIU to fetch data(like protection-related table contents, like the GDT, IDT/IVT, LDT and Paging tables) and process the data only if no faults occur. So, essentially it's:
- EU: Fault occurs? Abort and start fault handling. Else, read byte from PIQ and parse it normally.
- BIU: Fault occurs? Abort and do a NOP. Else, read byte from memory.

That way, the BIU doesn't have to do any fault handling(which might be in the middle of an instruction when the EU is executing one). The EU will handle all faults in a cycle-accurate way.

UniPCemu's BIU fetching algorithm:

void CPU_fillPIQ() //Fetch the next byte into the PIQ!
{
    uint_32 realaddress;
    if (BIU[activeCPU].PIQ==0) return; //No PIQ allocated? Abort!
    realaddress = BIU[activeCPU].PIQ_Address; //Next address to fetch!
    if (checkMMUaccess(CPU_SEGMENT_CS,CPU[activeCPU].registers->CS,realaddress,0x10|3,getCPL(),0,0)) return; //Abort on fault (the BIU never raises the fault itself)!
    realaddress = MMU_realaddr(CPU_SEGMENT_CS,CPU[activeCPU].registers->CS,realaddress,0,1); //Generate the actual address directly!
    if (is_paging()) //Are we paging?
    {
        realaddress = mappage(realaddress,0,getCPL()); //Map it using the paging mechanism!
    }
    //Note: PIQ_Address advances even if the write below fails, so this assumes it's only called while the PIQ has room.
    writefifobuffer(BIU[activeCPU].PIQ, BIU_directrb(realaddress,0)); //Add the next byte from memory into the buffer!
    ++BIU[activeCPU].PIQ_Address; //Increase the address to the next location!
    //Next data! Takes 4 cycles on the 8088; on the 8086, 2 cycles when loading words and 4 when loading a single byte.
}

EU fetching algorithm:

byte CPU_readOP(byte *result) //Reads the opcode (byte) at CS:EIP
{
    uint_32 instructionEIP = CPU[activeCPU].registers->EIP; //Current fetch position (CS:EIP)!
    if (CPU[activeCPU].resetPending) return 1; //Disable all instruction fetching when we're resetting!
    if (BIU[activeCPU].PIQ) //PIQ present?
    {
        PIQ_retry: //Retry after refilling PIQ!
        //if ((CPU[activeCPU].prefetchclock&(((EMULATED_CPU<=CPU_NECV30)<<1)|1))!=((EMULATED_CPU<=CPU_NECV30)<<1)) return 1; //Stall when not T3(80(1)8X) or T0(286+).
        //Execution can start on any cycle!
        //Protection checks have priority over reading the PIQ! The prefetcher stops when errors occur while prefetching; the error is handled here, when the EU reads the opcode, before the byte is taken from the PIQ!
        if (checkMMUaccess(CPU_SEGMENT_CS, CPU[activeCPU].registers->CS, instructionEIP,3,getCPL(),!CODE_SEGMENT_DESCRIPTOR_D_BIT(),0)) //Error accessing memory?
        {
            return 1; //Abort on fault!
        }
        if (readfifobuffer(BIU[activeCPU].PIQ,result)) //Read from PIQ?
        {
            if (cpudebugger) //We're an opcode retrieval and debugging?
            {
                MMU_addOP(*result); //Add to the opcode cache!
            }
            ++CPU[activeCPU].registers->EIP; //Increase EIP to give the correct point to use!
            ++CPU[activeCPU].cycles_Prefetch; //Fetching from prefetch takes 1 cycle!
            return 0; //Give the prefetched data!
        }
        //Not enough data in the PIQ? Wait for the BIU to refill it!
        return 1; //Wait for the PIQ to have new data! Don't change EIP (this is still the same)!
        //Remnant of the old always-buffered method (unreachable after the return above):
        CPU_fillPIQ(); //Fill instruction cache with next data!
        goto PIQ_retry; //Read again!
    }
    //Fallback when there's no PIQ: read directly from memory.
    if (checkMMUaccess(CPU_SEGMENT_CS, CPU[activeCPU].registers->CS, instructionEIP,3,getCPL(),!CODE_SEGMENT_DESCRIPTOR_D_BIT(),0)) //Error accessing memory?
    {
        return 1; //Abort on fault!
    }
    *result = MMU_rb(CPU_SEGMENT_CS, CPU[activeCPU].registers->CS, instructionEIP, 3,!CODE_SEGMENT_DESCRIPTOR_D_BIT()); //Read opcode directly from memory!
    if (cpudebugger) //We're an opcode retrieval and debugging?
    {
        MMU_addOP(*result); //Add to the opcode cache!
    }
    ++CPU[activeCPU].registers->EIP; //Increase EIP, since we don't have to worry about the prefetch!
    ++CPU[activeCPU].cycles_Prefetch; //Fetching the opcode takes 1 cycle!
    return 0; //Give the result!
}

Also, yes: that last part is a simple fallback for when prefetching is disabled (or no buffer is allocated), which reads directly from memory (everything past goto PIQ_retry). Also, some remnants of the old always-buffered method are still there.


Reply 77 of 142, by vladstamate

superfury wrote:

What I mean is the following:
The EU(which parses and handles the opcodes) will check for protection/page faults on every byte that's to be read from the PIQ. That way, it can handle faults while also being able to request the BIU to fetch data(like protection-related table contents, like the GDT, IDT/IVT, LDT and Paging tables) and process the data only if no faults occur. So, essentially it's:
- EU: Fault occurs? Abort and start fault handling. Else, read byte from PIQ and parse it normally.
- BIU: Fault occurs? Abort and do a NOP. Else, read byte from memory.

Right but how do you deal with this:

You are at the end of the segment. The next read would cause an exception/page fault. The PIQ is empty because it tried to read earlier but couldn't, for obvious reasons. Now the CPU really needs a byte. You cannot have the EU validate the PIQ, as there is nothing in it. Instead, you let the PIQ issue a read when empty, and this read can cause exceptions/page faults.
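A sketch of that scenario, assuming a single shared address-translation step that here only checks the segment limit (code_segment, translate and the other names are hypothetical). The prefetcher treats a failed translation as "stop prefetching", while the EU validates CS:EIP before looking at the queue, so an empty queue at the segment end still produces the fault:

```c
#include <assert.h>
#include <stdint.h>

/* Toy code segment: base and limit only (no paging, no access rights). */
typedef struct { uint32_t base; uint32_t limit; } code_segment;

/* Shared address generation used by both prefetcher and EU.
   Returns 0 and the linear address, or -1 if the offset is past the limit. */
static int translate(const code_segment *cs, uint32_t offset, uint32_t *linear)
{
    if (offset > cs->limit) return -1;
    *linear = cs->base + offset;
    return 0;
}

/* BIU: a failed translation just means "don't prefetch"; no fault. */
static int biu_can_prefetch(const code_segment *cs, uint32_t prefetch_offset)
{
    uint32_t linear;
    return translate(cs, prefetch_offset, &linear) == 0;
}

enum fetch_result { FETCH_OK, FETCH_WAIT, FETCH_GP_FAULT };

/* EU: validate CS:EIP first, regardless of what the queue holds. */
static enum fetch_result eu_fetch(const code_segment *cs, uint32_t eip,
                                  unsigned queue_count)
{
    uint32_t linear;
    if (translate(cs, eip, &linear) != 0) return FETCH_GP_FAULT;
    return queue_count ? FETCH_OK : FETCH_WAIT;
}
```

The design choice is that faults are only ever reported on the EU's own check, which happens exactly when the byte is architecturally needed; speculative prefetch reads past the limit are simply dropped.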


Reply 78 of 142, by vladstamate


Your solution will work: when the EU needs a new byte, it first checks whether CS:IP is valid, and in this case it won't be, regardless of what is in the PIQ.


Reply 79 of 142, by Jepael

Rank Oldbie

I don't know if this helps, but the last time I tried this was a long time ago, on some Athlon/Duron type of CPU. Here goes:

Using plain real mode, I filled a whole 64k segment (65536 bytes) with NOPs, and wrote a RETF into some low 32k address, and then I made a FAR CALL into some of the high 32k address.

Apparently the code runs to end of 64k segment, wraps around to low address, and eventually encounters the RETF and code executes as usual.

Since the opcodes were plain NOPs, nothing ever tried to, for example, read a word at offset 0xFFFF; I believe that would have raised an exception.