OK. Added a testsuite for checking the behaviour of string instructions (zero flag, with and without prefix, CX and memory results).
It's basically a more complete version of the checks done by test386.asm (although I wrote it myself), and it can be compiled as either an MS-DOS .COM program or a BIOS ROM.
In DOSBox 0.74-3 it gives the expected result ("No errors with string instructions.") and at least tries to terminate the app cleanly (using the INT 21h AH=4Ch function call, followed by INT 20h and a simple RET). The caller's (MS-DOS) FLAGS register is also pushed onto the stack on program entry and restored before said MS-DOS exit function call.
But for some reason, DOSBox becomes unresponsive? I don't know whether the issue is with DOSBox itself or with my app, though.
Now I'd just need to run it inside my own emulator. One nice thing is that if it's run as a BIOS ROM (when compiled with the flag enabled), it will post its results on the POST diagnostics port instead (default 80h, configurable), and then 'terminate the app' by hanging the CPU (CLI followed by HLT, just like other test ROMs do).
Oddly enough, when changing the setting to use 80386 opcodes (and adding a 32-bit version of the same tests, using 32-bit operand size with 16-bit address size), DOSBox doesn't give any results anymore, perhaps crashing for some unknown reason? The code should still be functional afaik. Unfortunately, with DOSBox you can't inspect the POST card status, if it even has one, can you?
Perhaps an issue with compiling for 32-bit somehow? Or specifically the 32-bit operand size versions of the opcodes and data?
The code can be found in UniPCemu's repository, in the UniPCemu/assembly subfolder, under the name teststring.asm. It can be compiled with nasm using the supplied makefile, which compiles to the same folder as the UniPCemu executable (relative path ../../projects_build/UniPCemu).
Edit: I think I found the issue. The startup routine creates a data segment out of a part of the payload of the executable. It takes the start of that part and rounds the segment selector up to the first 16-byte boundary (to keep the data inside aligned as much as possible). That's why the segment is made 32 bytes larger: just to make sure there's enough room for all the data.
The program saves the caller's stack pointer, together with the DS and ES segments (the code segment doesn't change), and then replaces the data segments with its own. The stack segment is the exception: it's stored and restored after execution, but otherwise used as before. The data and extra segments are overwritten to point to the aligned segment address inside its data block, which sits directly after the executable code (after the text constants).
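To make the rounding a bit more concrete, here's a small Python sketch of the arithmetic (not the actual startup code from teststring.asm). The function names and the example CS:offset values are made up for illustration. It assumes plain real-mode addressing, where a segment value is the linear address divided by 16.

```python
def aligned_data_segment(cs: int, data_offset: int) -> int:
    """Return the first paragraph-aligned segment at or after CS:data_offset.

    Hypothetical helper: models 'round the segment up to the next
    16-byte boundary' for real-mode addresses.
    """
    linear = (cs << 4) + data_offset      # real-mode linear address
    return ((linear + 15) >> 4) & 0xFFFF  # round up to a 16-byte paragraph


def padding_bytes(cs: int, data_offset: int) -> int:
    """Number of bytes skipped to reach that boundary (always 0..15)."""
    linear = (cs << 4) + data_offset
    return (-linear) % 16


# Example with made-up values: data starting at 1234:0567
seg = aligned_data_segment(0x1234, 0x0567)
pad = padding_bytes(0x1234, 0x0567)
print(f"aligned segment = {seg:04X}h, padding = {pad} bytes")
```

Since the padding can never be more than 15 bytes, reserving 32 extra bytes leaves plenty of slack.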
Having fixed the global routine to do its job properly, using the data segment that's required instead of the segment that's incorrectly located at 0000:0000 in memory (where the IVT and a whole bunch of other things, like MS-DOS itself, live! 😖), it now properly writes and reads the data from the blocks it creates within said memory area (at offsets 0, 1000h and 2000h if needed).
The program now finishes executing on the DOSBox emulators without any visible issues, so it seems to be operating properly.
Of course the BIOS version strips away much of the DOS functionality, replacing it with simple I/O to the diagnostic port (configured in the configuration section, much like how test386.asm does it, except not in a separate file).
It also has an option to add 80386 support, which really only switches the instruction set to a 386-compatible one (instead of 8086-compatible) and adds the 32-bit operand size instruction tests. Those work in the same way as the 16-bit and 8-bit ones, which should theoretically be unmodified from the 8086 version, unless the assembler does weird things with them.
I do know it converts some conditional jumps to the error handler into a 2-instruction combination: an inverted conditional jump that skips the error case, followed by an unconditional jump to the error handler. The "not an error" target takes up no space, of course, being simply the next instruction of the testsuite. The assembler somehow creates this combination itself: in the source code it's just a single conditional jump, but not in the executable and the assembly listing. If you look at the instruction bytes and know how x86 instructions are encoded, you can clearly see a conditional jump to just past the jump to the error handler if you do the math (the conditional jump's immediate byte added to the address of the unconditional jump instruction that follows it).
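For anyone who wants to do that math on a listing themselves, here's a small Python sketch of the short-jump target calculation. The addresses and bytes in the example are made up, not taken from the actual teststring listing, and the helper names are hypothetical. It assumes a 2-byte short conditional jump (opcode plus a signed 8-bit displacement) and 16-bit offsets.

```python
def sign_extend8(value: int) -> int:
    """Interpret the low 8 bits of value as a signed displacement."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def short_jump_target(instr_addr: int, rel8: int, instr_len: int = 2) -> int:
    """Target of a short jump: address of the NEXT instruction plus rel8."""
    next_instr = instr_addr + instr_len
    return (next_instr + sign_extend8(rel8)) & 0xFFFF  # wrap to a 16-bit offset


# Made-up example listing:
#   0150: 75 03      jnz  short 0155   ; inverted condition, skips the JMP
#   0152: E9 xx xx   jmp  error        ; 3-byte near jump to the error handler
#   0155: ...                          ; next instruction of the testsuite
print(f"{short_jump_target(0x0150, 0x03):04X}h")
```

The displacement is relative to the instruction after the conditional jump, which here is the unconditional JMP at 0152h. So a displacement of 3 lands exactly past that 3-byte JMP.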
So I now have a pretty good testsuite, I think. It's roughly the same as test386.asm, just checking more cases: with and without REP, the zero flag behaving correctly in multiple cases (being set, cleared or left alone are all tested), and REPE/REPNE/REP all operating correctly for every instruction they apply to.
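To illustrate the kind of cases being checked, here's a simplified Python model of CMPSB with and without a repeat prefix. It's only a sketch of the documented x86 behaviour, not code from the testsuite or from UniPCemu. The function name and return layout are made up, and it assumes DF=0 (incrementing), a 16-bit CX, and buffers long enough for the requested count.

```python
def cmpsb(src: bytes, dst: bytes, cx: int, zf: bool, prefix=None):
    """Model CMPSB with prefix None, 'repe' or 'repne'.

    Returns (cx, si, di, zf) after execution, with SI and DI starting at 0.
    """
    si = di = 0
    if prefix is None:
        # No prefix: exactly one compare, CX is not touched at all.
        zf = src[si] == dst[di]
        return cx, si + 1, di + 1, zf
    while cx != 0:                      # CX=0: nothing runs, ZF is left alone
        zf = src[si] == dst[di]         # the compare sets or clears ZF
        si += 1
        di += 1
        cx = (cx - 1) & 0xFFFF          # CX also counts the terminating element
        if prefix == "repe" and not zf:
            break                       # REPE stops on the first mismatch
        if prefix == "repne" and zf:
            break                       # REPNE stops on the first match
    return cx, si, di, zf


print(cmpsb(b"abcd", b"abXd", 4, True, "repe"))   # stops at the mismatch
print(cmpsb(b"abcd", b"abcd", 0, False, "repe"))  # CX=0: ZF stays as it was
```

The two things that are easy to get wrong in an emulator are that CX is decremented for the element that ends the loop as well, and that a repeat with CX=0 must not touch the zero flag.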
This should help me figure out what happens to the string instructions inside UniPCemu, if there's anything going wrong with them at all in the first place (and the issue isn't something entirely different).