Managed to fix some issues with the 32-bit RGB modes of the SC11487. This covers an undocumented case from WhatVGA's mode 3a, where pixel repack register bit 0 isn't set. That combination produces a 32-bit RGBA mode that clocks on the rising edge only. The documentation only describes the 32-bit mode with the repack bit set (clocking on both edges), so the other mode that's now implemented is probably undocumented behaviour? It's based on WhatVGA's results. The top byte might or might not be ignored; the software always leaves it at 0.
Also, the BGR mode of repack mode 2 (24-bit mode) is enabled again. It was being applied, but not precalculated from the register values, so it was effectively forced to 0 in the precalcs.
Edit: Also, CRTC/sprite start addresses above 2MB are now fixed by adding the missing 22nd address bit to the register precalcs.
Just added support for touch inputs to act like the mouse inputs when performing light pen inputs. This is achieved by touching the middle mouse button area on the touch screen, then touching the right mouse button area to enable the custom input. After that, as long as both of those areas stay touched, a third touch registers the location to hold the light pen at (touching again with that finger moves it). A fourth touch after the third registers as the light pen button instead. So it amounts to two touches on specific locations to enable the mode, then one finger for the location and another finger for the button (re-touching works for both the location and the button).
Afaik, this is now the first emulator supporting light pen input on a touch screen! 🤣
After lots of bug fixing, the new touch-based input and the existing mouse input now properly map to the light pen inputs.
I've also added a simple indicator. An L is shown while the light pen input is activated (using the first two of the four touch inputs to activate the mode; the third touch isn't shown, and the fourth is shown using another LED display). While the L is shown, a P appears to the left of it when the light pen's button is pressed. The location keeps being tracked into the emulation (updating the location) as long as the light pen input is enabled and the third touch is (re)pressed. Releasing the mode (the middle and right mouse button areas no longer being pressed at the same time) removes the light pen from the screen. The button is unaffected by this and is only released when the left mouse button (when using a mouse) or the fourth touch is released.
I've also modified the VGA to not detect the light pen input when the pen is at a location outside of the active display, since the latched address is incorrect in that case: the pen could be at the right or left part of the screen while being detected as being at the end of the last scanline, which of course is incorrect. So the overscan area won't detect the light pen input and will ignore it.
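To make the touch sequence above easier to follow, here is a minimal C sketch of it as a small state machine. All names (lightpen_touch, lp_set_enable_areas and so on) are hypothetical rather than UniPCemu's actual code, and it assumes the two enable areas are reported as simple held/not-held flags.

```c
#include <stdbool.h>

/* Hypothetical touch-to-light-pen state: two enable areas (middle and right
   mouse button areas), a third touch for the location, a fourth for the button. */
typedef struct {
    bool middle_held;   /* touch held on the middle mouse button area */
    bool right_held;    /* touch held on the right mouse button area */
    bool pen_visible;   /* light pen currently placed on the screen */
    int  pen_x, pen_y;  /* last registered light pen location */
    bool button;        /* light pen button state */
} lightpen_touch;

static bool lp_mode_enabled(const lightpen_touch *lp)
{
    return lp->middle_held && lp->right_held;
}

/* Updates the enable areas; releasing the mode removes the pen from the
   screen, but deliberately leaves the button state alone. */
static void lp_set_enable_areas(lightpen_touch *lp, bool middle, bool right)
{
    lp->middle_held = middle;
    lp->right_held = right;
    if (!lp_mode_enabled(lp))
        lp->pen_visible = false;
}

/* Third touch: (re)registers the location, only while the mode is enabled. */
static void lp_location_touch(lightpen_touch *lp, int x, int y)
{
    if (!lp_mode_enabled(lp))
        return;
    lp->pen_x = x;
    lp->pen_y = y;
    lp->pen_visible = true;
}

/* Fourth touch: pressing needs the mode, releasing always works. */
static void lp_button_touch(lightpen_touch *lp, bool pressed)
{
    if (pressed && !lp_mode_enabled(lp))
        return;
    lp->button = pressed;
}

/* Indicator text: "L" while the mode is active, "PL" with the button held. */
static const char *lp_indicator(const lightpen_touch *lp)
{
    if (!lp_mode_enabled(lp))
        return "";
    return lp->button ? "PL" : "L";
}
```

Keeping the button separate from the enable areas mirrors the behaviour described above: dropping the mode hides the pen, but the button only releases when its own touch (or the left mouse button) is released.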
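A minimal sketch of that overscan check, assuming the beam position is known in pixels relative to the start of the active display. The function name and parameters are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical check: only latch the light pen address when the pen is
   inside the active display; overscan/blanking positions are ignored, since
   the latched address would be wrong there. */
static bool lightpen_should_latch(int x, int y, int active_width, int active_height)
{
    return x >= 0 && x < active_width && y >= 0 && y < active_height;
}
```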
I've made a little bugfix to the way the i430fx/i440fx motherboards handle memory access in the 640K-1MB address space. They should now properly not respond with memory when they shouldn't (when that range is disabled using the chipset registers).
I've also been busy simplifying the Android Studio project a bit, removing many of the files it uses from source control entirely. That should help with the constantly changing files that are specific to each user.
Most of those files are recreated immediately when the IDE starts anyway.
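For reference, the 640K-1MB routing on these Intel chipsets is controlled by the PAM registers. The sketch below assumes the usual Intel PAM layout (config registers 59h-5Fh, read enable in bit 0 and write enable in bit 1 of each nibble) and only models those two bits. It is an illustration, not UniPCemu's implementation; the A0000-BFFFF (VGA/SMRAM) handling is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: pam[0] = PAM0 (59h), bits 7:4 cover F0000-FFFFF.
   pam[1]..pam[6] = PAM1..PAM6 (5Ah-5Fh), each covering two 16KB blocks
   from C0000 to EFFFF (low nibble = lower block, high nibble = upper). */
#define PAM_RE 0x1  /* DRAM read enable */
#define PAM_WE 0x2  /* DRAM write enable */

static uint8_t pam_nibble(const uint8_t pam[7], uint32_t addr)
{
    if (addr >= 0xF0000 && addr <= 0xFFFFF)
        return pam[0] >> 4;
    if (addr >= 0xC0000 && addr <= 0xEFFFF) {
        unsigned block = (addr - 0xC0000) >> 14;   /* 16KB blocks 0..11 */
        uint8_t reg = pam[1 + (block >> 1)];
        return (block & 1) ? (uint8_t)(reg >> 4) : (uint8_t)(reg & 0x0F);
    }
    return 0; /* A0000-BFFFF is not PAM-controlled in this sketch */
}

/* true: DRAM claims the access; false: it is forwarded to PCI/ISA instead. */
static bool pam_dram_responds(const uint8_t pam[7], uint32_t addr, bool write)
{
    return (pam_nibble(pam, addr) & (write ? PAM_WE : PAM_RE)) != 0;
}
```

With all PAM registers at 0 (the reset default), nothing in C0000-FFFFF is claimed by DRAM, which is the "not responding with memory" case.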
Edit: Also, the new events in SDL 2.0.2 and up (the new audio and video events) are now supported and used. So when a new audio output device connects, the app automatically switches to it, and switches back to the default when it disconnects. The same applies to recording sources, except that the app goes silent when the source disconnects (as there is no known default to fall back to) until a new recording source is connected.
The new video refresh events are also handled: one causes the textures to be recreated (by recreating the entire SDL window and related textures etc.), and the other causes the display inside the window to be updated with actual pixels.
Just added a command line parameter "fullscreenwindow" to the app. When using windowed mode (the initial mode), it stretches the window using SDL2 to the full desktop size (minus the taskbar on Windows). That should make it more usable on large or high-resolution displays, instead of ending up with a much too small window (and otherwise requiring fullscreen to be usable).
Managed to implement a bit of extra support for high-DPI monitors. When the DPI reported by SDL is above 96 DPI, UniPCemu changes the rendering of its text surfaces (all the text UniPCemu displays that doesn't come from the emulated video card itself) to roughly the same mode as used on Android and fixed-resolution devices (like the PSP). The PSP doesn't actually need any such rendering (1:1 rendering is enough there), but Android (unchanged behaviour here) and now high-DPI monitors as well use the same method for rendering the text surfaces. So on high-DPI monitors, UniPCemu now stretches the text display to fit the entire window, improving readability (otherwise the text would become too small).
Just adjusted the ISA DMA accesses to the UMB in the i430fx/i440fx motherboard emulation. This floats the bus when any UMB segment other than the E segment is addressed (so for the address ranges A0000-DFFFF and F0000-FFFFF).
I've just adjusted the Sound Blaster (including its 1.5 version and the DSP 2.01 version) to use partial decoding on the low 4 bits of its I/O address. It now decodes only the upper 3 of those 4 bits to select the chip being addressed (as in the Sound Blaster clone, see my other thread "Unknown port in/out to Sound Blaster ports on Windows 98?"), and DSP chip addressing uses nothing more. The other chips in the 22xh range use bit 0 as well (in this case that only applies to the OPL2 chip). The SAA-1099 chips are decoded directly instead (as is the OPL2's 388h/389h I/O range), with full 16-bit decoding (although they might not be decoded that way on the actual chipset).
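A tiny sketch of that rule. It assumes, as is common on ISA, that a floating data bus reads back as all ones; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ISA DMA accesses into the UMB area float the bus unless they target the
   E segment (E0000-EFFFF). */
static bool isa_dma_umb_floats(uint32_t addr)
{
    return (addr >= 0xA0000 && addr <= 0xDFFFF) ||
           (addr >= 0xF0000 && addr <= 0xFFFFF);
}

/* Assumption: a floating bus reads back as 0xFF. ram must cover 1MB. */
static uint8_t isa_dma_read_byte(const uint8_t *ram, uint32_t addr)
{
    return isa_dma_umb_floats(addr) ? 0xFF : ram[addr];
}
```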
So this only applies to the part of the 22xh range from 224h up (224h-22Fh). This also means that the DSP ports alias to the odd addresses next to them (for example, 22Eh aliasing to 22Fh).
Just fixed a bug in the UART: when the UART is in loopback mode and the attached device can't receive data (or no device is plugged in at all), loopback now properly sends and receives data to the UART itself, instead of not sending any data.
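The decoding described above can be sketched as a small lookup for the 224h-22Fh part of the range. The register groups follow the standard Sound Blaster DSP/FM port layout; the enum and function names are illustrative, not UniPCemu's, and 224h/225h (the mixer on later cards) simply decode to nothing here.

```c
#include <stdint.h>

typedef enum {
    SB_NONE,        /* not decoded by this sketch (e.g. 220h-223h) */
    SB_DSP_RESET,   /* 226h, 227h aliases */
    SB_OPL2_ADDR,   /* 228h */
    SB_OPL2_DATA,   /* 229h */
    SB_DSP_READ,    /* 22Ah, 22Bh aliases */
    SB_DSP_WRITE,   /* 22Ch, 22Dh aliases */
    SB_DSP_STATUS   /* 22Eh, 22Fh aliases */
} sb_reg;

/* Bits 3:1 of the offset select the group; only OPL2 also uses bit 0. */
static sb_reg sb_decode(uint16_t port, uint16_t base)
{
    uint16_t offset = (uint16_t)(port - base);   /* wraps if port < base */
    if (offset < 0x4 || offset > 0xF)
        return SB_NONE;
    switch (offset & 0xE) {                      /* ignore bit 0 */
    case 0x6: return SB_DSP_RESET;
    case 0x8: return (offset & 1) ? SB_OPL2_DATA : SB_OPL2_ADDR;
    case 0xA: return SB_DSP_READ;
    case 0xC: return SB_DSP_WRITE;
    case 0xE: return SB_DSP_STATUS;
    default:  return SB_NONE;                    /* 224h/225h */
    }
}
```

So, for example, sb_decode(0x22F, 0x220) lands on the same register as 22Eh, which is the odd-address aliasing mentioned above.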
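A minimal sketch of the fixed behaviour: in loopback mode (MCR bit 4 on 8250/16550-style UARTs) the transmitted byte goes straight back into the receiver, without ever consulting the attached device. The struct layout and names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCR_LOOPBACK 0x10  /* MCR bit 4: internal loopback */

typedef struct {
    uint8_t mcr;
    bool    rx_full;
    uint8_t rx_data;
    bool  (*device_send)(uint8_t byte); /* NULL when nothing is plugged in */
} uart;

/* Returns true when the byte left the transmitter. */
static bool uart_transmit(uart *u, uint8_t byte)
{
    if (u->mcr & MCR_LOOPBACK) {
        u->rx_data = byte;   /* loop back internally, bypassing the device */
        u->rx_full = true;
        return true;
    }
    if (u->device_send == NULL)
        return false;        /* no device attached */
    return u->device_send(byte);
}
```

The important part is the order of the checks: loopback is handled before looking at the device at all, so a missing or busy device can't block it.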
Just found something big that's now optimized away with my latest commits: the sound buffer rendering and pre-filtering algorithms.
When there were sound channels in the 'new' state (meaning they don't have any samples to parse and are disabled in their handling, effectively rendering silence only), it was still processing all the silent audio that's expected to be returned by the sound handler. That includes: retrieving the sample, converting it to a standard floating point value to parse (different kinds of samples are supported, like floating point, signed, unsigned etc.), applying volume to it, low-pass filtering it, optionally high-pass filtering it as well, limiting the sample to the expected range, and writing it to the filtered samples.
But since the filtered samples aren't used for channels when the audio renderer reports it has no audio to play, all those heavy processing steps can be ignored and not performed at all!
That pretty much reduces the entire audio rendering from a ~20% CPU usage to only ~2% when not rendering any MIDI channels (which are some 24 channels of audio being ignored now)!
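The skip can be sketched as a mixer loop that checks each channel's "has audio" report before doing any per-sample work. This is a simplified illustration with hypothetical names; only volume is applied where the real pipeline would also convert and filter.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool         has_audio;  /* false: channel is idle/'new', renders silence */
    const float *samples;    /* n samples, only read when has_audio is true */
    float        volume;
} channel;

static float clamp_sample(float v)
{
    return v < -1.0f ? -1.0f : (v > 1.0f ? 1.0f : v);
}

/* Mixes all channels into mix[0..n) and returns how many were processed. */
static size_t mix_channels(const channel *ch, size_t count, float *mix, size_t n)
{
    size_t processed = 0;
    for (size_t i = 0; i < n; ++i)
        mix[i] = 0.0f;
    for (size_t c = 0; c < count; ++c) {
        if (!ch[c].has_audio)
            continue;        /* silent channel: skip all per-sample work */
        ++processed;
        for (size_t i = 0; i < n; ++i)   /* conversion/filters would go here */
            mix[i] += ch[c].samples[i] * ch[c].volume;
    }
    for (size_t i = 0; i < n; ++i)
        mix[i] = clamp_sample(mix[i]);
    return processed;
}
```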
Also made a few other changes in the emulator (in source control, for the next release):
- The IRQ for the emulated Sound Blasters changed to IRQ 7 (this seems to fix Windows detection somehow; it doesn't seem to like detecting it on IRQ 5?).
- When Windows dials an empty phone number through the modem (performing "ATDT;"), the modem now reports "OK" and essentially does a NOP.
- Improved speed further by optimizing DMA transfer clocks that don't start any transfer: when ticked multiple times, only the first clock is parsed, while the remaining clocks are still counted (the first tick checks, and the remaining ticks are NOPs because they would do exactly the same).
- Saved some memory by shrinking the text label stored in each memory allocation registration from 256 bytes to only 18 bytes per pointer, saving about 2MB (rounded) of wasted text space. Also changed the various structures that use an allocation name (or label) for their memory blocks to fit the shorter labels (only a few pointers used this, mainly audio precalcs, text surfaces and their precalcs, GPU display memory and the main SDL wrappers for the display surfaces). ~2MB is quite a lot of savings on small-memory devices (like the normal PSP-1000, which only has about 20MB in total for the app to use).
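The DMA clock shortcut from the list above can be sketched like this: if the first clock of a batch finds no unmasked channel requesting a transfer, the rest of the batch would do exactly the same check, so those clocks are only counted. The controller model here is deliberately minimal and hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  dreq;       /* bit per channel: DMA request pending */
    uint8_t  mask;       /* bit per channel: channel masked */
    uint64_t clocks;     /* total clocks elapsed */
    uint64_t transfers;  /* transfers performed (for illustration) */
} dma_ctrl;

/* One clock: returns false when nothing was transferred. */
static bool dma_tick_once(dma_ctrl *d)
{
    uint8_t ready = d->dreq & (uint8_t)~d->mask;
    ++d->clocks;
    if (!ready)
        return false;
    ++d->transfers;      /* a real transfer would happen here */
    return true;
}

/* n clocks: once a clock turns out idle, the rest are just accounted for. */
static void dma_tick(dma_ctrl *d, uint32_t n)
{
    while (n) {
        --n;
        if (!dma_tick_once(d)) {
            d->clocks += n;  /* remaining idle clocks: NOPs */
            return;
        }
    }
}
```

This is only valid because nothing else can change the request/mask state in the middle of a batch; if that could happen, the shortcut would have to be dropped.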
I've changed a bit in my latest commits:
- The disk image read-only setting no longer gets cleared when swapping disk images (unless the disk image is unmounted completely by pressing triangle to remove it).
- Improved MIDI active sense handling with locks.
- Optimized directory listing for file lists.
- Improved the debugger with a virtual memory viewer and renamed the old memory viewer to physical memory viewer. Virtual memory can be viewed with either kernel privilege or the debugged instruction's privilege.
- Fixed the EIP display in the on-screen debugger to no longer show part of the lower word on the next row (showing 32 bits properly when required, otherwise 16 bits) for both the current and previous EIP addresses.
- Fixed the debugger thread to properly shut down when the application is terminated.
- Properly stop logging while the debugger is active (not triggering memory access logs when the debugger is reading memory).
- Reduced the default CPU IPS clocking speed to 3 MIPS for the 80386 and up.
- Optimized TSC/APIC timing.
- Optimized directory listing to stop scanning the folder when it can't store any more items (list is full). Also don't check for dynamic/static disk images when not listing disk images.
- Don't reload the music file list when returning from a played song (only reload said list when choosing the option from the sound options menu).
- Generalized PSP-specific code fixes in various modules so they're now done in the common emulator framework.
Also, the Windows builds using MinGW now compile properly with new compiler flags, no longer needing the window fix program to do it manually.
The PSP builds have also been fixed and no longer crash when terminating the application. This was caused by SDLmain's main() handler conflicting with UniPCemu's exit thread handling, which implements a proper timeout and lets the main thread close itself, instead of SDL's direct call to sceKernelExitGame without cleaning up. That was causing crashes (due to half-deallocated memory allocations and multithreading race conditions) or premature termination of the application.
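The directory listing change from the list above (stop scanning once the list is full) can be sketched with the POSIX opendir()/readdir() calls. This is an assumption for illustration only; the PSP and Windows builds would use their own directory APIs, and the names and fixed name length are hypothetical.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 256

/* Reads at most `capacity` entries, then stops scanning the directory. */
static size_t list_dir(const char *path, char (*names)[NAME_MAX_LEN], size_t capacity)
{
    size_t count = 0;
    DIR *dir = opendir(path);
    if (!dir)
        return 0;
    struct dirent *ent;
    /* capacity is checked first, so no readdir() call happens once full */
    while (count < capacity && (ent = readdir(dir)) != NULL) {
        strncpy(names[count], ent->d_name, NAME_MAX_LEN - 1);
        names[count][NAME_MAX_LEN - 1] = '\0';
        ++count;
    }
    closedir(dir);
    return count;
}
```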
I've just changed the memory viewers in the debugger so they're all started with a single button: square.
It will first ask for the address to view.
After that it will ask for the mode (which is new).
There are four options at this point:
- Triangle: View physical memory
- Square: View kernel privilege virtual memory
- Cross: View debugged instruction virtual memory
- Circle: Cancel and return to the debugged instruction.
Any of these (except circle) will open the memory viewer at the selected address in the chosen mode. Triangle is the old mode that was used before.
All memory is read directly, so effectively all ROM and memory-mapped I/O is disabled for this viewer (to avoid disturbing any state on said memory-mapped I/O devices). Areas that RAM doesn't respond to, or that are unmapped (including unmapped virtual memory), are displayed accordingly: the bytes read as all ones and are grayed out, and the cursor turns dark orange instead of green when it's on such unmapped memory.
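A minimal sketch of such a debugger read path. The mode names map to the buttons above (triangle = physical, square = kernel-privilege virtual, cross = debugged-instruction virtual); the types, the translate callback and its semantics are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    VIEW_PHYSICAL,          /* triangle */
    VIEW_VIRTUAL_KERNEL,    /* square */
    VIEW_VIRTUAL_DEBUGGED   /* cross */
} view_mode;

typedef struct {
    const uint8_t *ram;
    size_t         ram_size;
    /* returns false when unmapped; kernel selects supervisor privilege */
    bool (*translate)(uint32_t linear, bool kernel, uint32_t *phys);
} debug_mem;

/* Reads RAM directly (never ROM or MMIO). Unmapped bytes read as 0xFF and
   set *unmapped, so the viewer can gray them out and recolor the cursor. */
static uint8_t debug_read(const debug_mem *m, view_mode mode, uint32_t addr, bool *unmapped)
{
    uint32_t phys = addr;
    if (mode != VIEW_PHYSICAL &&
        !m->translate(addr, mode == VIEW_VIRTUAL_KERNEL, &phys)) {
        *unmapped = true;
        return 0xFF;
    }
    if (phys >= m->ram_size) {
        *unmapped = true;
        return 0xFF;
    }
    *unmapped = false;
    return m->ram[phys];
}
```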
Just divided the CPU settings menu up a bit.
All debugger settings are now in a separate submenu of the CPU Settings menu.
I also made a few additions to the settings in the settings menu:
- The Analog minimum range is now configurable during runtime instead of just through the settings file.
- The Modem listen port is now configurable during runtime instead of just through the settings file (it does need a reboot to apply the port though, since that requires a re-initialization of the serial modem emulation).
- The modem settings won't be visible in the settings menu anymore unless the modem is actually emulated (this applies to the listen port, null modem setting and passthrough options).
Gaming mode has been improved as well. It now supports pressing the face buttons to select one of five gaming mode mappings (no face button being pressed selects the default, old, mapping). Each of said mappings can either have its input mapped through its specified keys (the usual gaming mode) or through the emulated joystick. Whether the selected gaming mode inputs to the emulated joystick or not now depends on a separate setting for each gaming mode face button.
So you can now, for example, map keys to one of the gaming mode face button modes, while the others or only some of them map to the joystick instead.
So this is now possible:
- Default mode: joystick mode
- Triangle mode: directional buttons mapped to the directional keys, analog stick to the mouse, two buttons mapped to mouse buttons and the two remaining buttons mapped to some chosen keys.
- Square mode: mapped to the joystick as well.
Then you can input to a full joystick (set up for all the joystick modes) using the default mode in the above example, while the triangle mode has joystick input disabled and acts as a normal gaming mode. The square gaming mode maps to the joystick as well. You can also unmap a mode's inputs entirely by unmapping all of them and disabling the joystick for that mode (making it a no-input mode). From there, you can only return to the normal keyboard/mouse/direct input mode, without inputting anything to the emulated machine.
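The per-face-button setup can be pictured as an array of five mappings, each with its own joystick flag. Everything below is hypothetical (including the priority order when several face buttons are held); it only illustrates how the flag decides where an input goes.

```c
#include <stdbool.h>

enum { GM_DEFAULT, GM_TRIANGLE, GM_SQUARE, GM_CROSS, GM_CIRCLE, GM_COUNT };

typedef struct {
    bool use_joystick;   /* route this mode's input to the emulated joystick */
    int  keymap[16];     /* emulated key per physical input, -1 = unmapped */
} gaming_mode;

typedef enum { ROUTE_NONE, ROUTE_KEYS, ROUTE_JOYSTICK } input_route;

/* Picks the mapping from the held face button (first match wins here). */
static int gm_select(bool tri, bool sq, bool cross, bool circle)
{
    if (tri)    return GM_TRIANGLE;
    if (sq)     return GM_SQUARE;
    if (cross)  return GM_CROSS;
    if (circle) return GM_CIRCLE;
    return GM_DEFAULT;
}

/* A mode with the joystick disabled and nothing mapped is a no-input mode. */
static input_route gm_route(const gaming_mode *m, int input)
{
    if (m->use_joystick)
        return ROUTE_JOYSTICK;
    return m->keymap[input] >= 0 ? ROUTE_KEYS : ROUTE_NONE;
}
```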
I've just modified the ROM, RAM and BIU cache for larger memory reads so they can cache a whole 64 bits of data ahead of the memory being read (whether an access gets split up still depends on alignment, though).
So when the BIU or EU tries to read a memory address, said address and the following bytes, up to 8 bytes in total, will be cached into the CPU.
Thus every access of 8 bytes or smaller will only perform 1 memory read, as long as it doesn't cross an 8-byte boundary.
Looking at https://www.strchr.com/x86_machine_code_statistics , most x86 instructions fall in that category. So for sequential instructions (without PIQ flushing), almost all x86 instructions can be fetched from RAM/ROM in 1 or sometimes 2 reads (if they cross an 8-byte area), provided they are aligned to qword addresses.
So if an 8-byte data structure is read from RAM, it's read in one go. But if it isn't 8-byte aligned and thus extends into the next 8-byte block, the read function will be called 2 times (which should happen relatively rarely).
It also means that segment descriptors are read from memory in one go usually, unless they aren't qword-aligned.
Since instruction lengths can go up to 15 bytes, it will do only 2 reads from memory instead of the previous 4 when refilling a flushed PIQ (4x4 bytes read = 16 bytes, which is now 2x8 bytes read = 16 bytes, thus faster). That pretty much cuts things like loop timing in half (2 reads instead of 4 reads every instruction), like with a "LOOP $-2" instruction.
Of course, when using a PIQ with fewer than 8 bytes (pretty much anything 286 or lower, which have a PIQ of 6 bytes or less), every instruction will take only 1 fetch, never more.
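The read counts above come down to how many aligned blocks a byte range touches. A small helper (hypothetical name, block size assumed to be a power of two) makes the arithmetic explicit.

```c
#include <stdint.h>

/* Number of aligned `block`-byte reads needed to fetch `len` bytes starting
   at `addr`. block must be a power of two and len at least 1. */
static unsigned reads_needed(uint32_t addr, unsigned len, unsigned block)
{
    uint32_t mask  = ~(uint32_t)(block - 1);
    uint32_t first = addr & mask;
    uint32_t last  = (addr + len - 1) & mask;
    return (unsigned)((last - first) / block) + 1;
}
```

For example, reads_needed(0, 15, 4) is 4 and reads_needed(0, 15, 8) is 2, matching the 4-versus-2 reads above. An unaligned start changes things: reads_needed(0x1004, 8, 8) is 2, and reads_needed(0x1003, 15, 8) is 3, since those bytes touch three 8-byte blocks.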
Hmmm... The i440fx running Windows 9x (95 and 98) seems to act weird when trying to reboot (after updating the drivers for PCI devices, namely the HDD controllers)? It somehow gets the CPU into a permanently reset state (a CLI HLT combination, it seems)?
Edit: I also see Windows 95 doing weird stuff when shutting down for a reboot. It's executing a LODSB instruction with the trap flag set, and the stack is set to segment F000 in real mode! So it enters the single-step handler, which IRETs to an address that's just constant data from the BIOS ROM! That's definitely not supposed to happen?
Edit: Perhaps slightly related: does Windows 95/98 properly handle swapping an i430fx motherboard for an i440fx motherboard without issues? Or is that some weird case where the hardware needs to be removed first (due to conflicts with the old motherboard drivers)?
Edit: It might be a weird issue that's introduced by the new method of reading 128-bit RAM blocks somehow? Perhaps a 128-bit shifting issue or another memory-related issue? Trying again with the last release build...
Edit: The last release build seems to run fine (it reboots and boots properly without issues, including the i430fx to i440fx motherboard update). So something in the improved memory caching is causing issues somehow. Or perhaps it's the slight paging unit optimization I made, which makes it skip checking linear page boundaries when not checking for paging (which is usually the case when performing segmentation checks only, since those have priority over paging).
Last edited by superfury on 2021-10-03, 23:35. Edited 2 times in total.
Hmmm... My compiled Minix 3.3.0 CD-ROM gives different results with the latest bugfix commits of UniPCemu:
1vm(8): panic: vm: boot process load of process rs (ep=2) failed
It's followed by a stack trace etc.
I'm not sure if this is related to the issue you just mentioned, but it might be (seems RAM-related).
I've now revisited this Minix3 issue you noticed a while back (the Minix3 version I'm using is git-4db99f401, this is from 2018).
While this is not a solution, I got it narrowed down somewhat:
The reason for the boot failure is that sys_physcopy() returns a buffer that is filled with zeroes, instead of the ELF module that is expected.
The modules are loaded into memory just fine, but this copying is not performed correctly for some reason.
The code in question is in Minix3 source file src/minix/servers/vm/main.c, function exec_bootproc().
Note that the first call to sys_physcopy() always succeeds, but the next one will fail (because of something done in handle_memory_once()).
I managed to track this further to a single syscall, SYS_MEMSET, called from alloc_pages().
The code portion of interest is in src/minix/kernel/arch/i386/memory.c, in the function vm_memset().
The value of the "who" variable identifies the caller; on these erroneous calls it's always 31743 or 0x7bff. Idk which process this value refers to (for the first two kmodules it's 6 or 2?), but it seems to be correct (it's the same in qemu).
So, the bottom line: UniPCemu does *something* differently in that while loop in vm_memset() than, for example, qemu does.
Haven't been able to pinpoint what exactly that something is, though.
EDIT: Another possibility is that the problem resides somewhere within sys_physcopy().
EDIT2: OK, I now have a strong candidate for this: reload_cr3(). It's a very short piece of assembly code that just reloads CR3 (which flushes the TLB), see here (the link points to an older version, but it's the same).
I checked this by reading address 0xf08df000 (= start of ds/mod01) both before and after that call, and then compared the results with qemu.
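For context on why reload_cr3() matters to an emulator: writing CR3, even with its current value, must invalidate every TLB entry that isn't a global page (global entries only survive when CR4.PGE is set). The sketch below shows that requirement with a hypothetical, simplified TLB; it doesn't diagnose the Minix issue itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64
#define CR4_PGE (1u << 7)   /* CR4 bit 7: page global enable */

typedef struct {
    bool     valid;
    bool     global;        /* PTE G bit, honoured only with CR4.PGE set */
    uint32_t linear_page;
    uint32_t phys_page;
} tlb_entry;

typedef struct {
    uint32_t  cr3, cr4;
    tlb_entry tlb[TLB_ENTRIES];
} cpu_state;

/* A MOV to CR3 flushes all non-global translations, even if the page
   directory base is unchanged (which is exactly what reload_cr3() relies on). */
static void write_cr3(cpu_state *cpu, uint32_t value)
{
    cpu->cr3 = value;
    for (int i = 0; i < TLB_ENTRIES; ++i) {
        tlb_entry *e = &cpu->tlb[i];
        if (e->valid && !(e->global && (cpu->cr4 & CR4_PGE)))
            e->valid = false;   /* must be re-walked from the page tables */
    }
}
```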
Last edited by mr.cat on 2021-10-05, 18:30. Edited 2 times in total.
Slightly related to the last blocks of commits I've made since the last release of the app on itch.io:
- I've moved the commits for the new memory caching (with 128-bit memory caches in both the MMU and BIU) to their own branch (memory-speedup) for now.
- I've moved the new changes for the O(1) optimization of the Paging Unit to their own branch as well (paging-speedup).
Those are two new things I'm working on right now. There have also been some general bugfixes on the default branch (a paging 32-bit check optimization), which also apply to the current paging-speedup branch (since it requires them).
So the old commits from the default branch are now moved to the memory-speedup branch, since that code isn't quite working properly for some reason and needs more work to fix (I notice some issues when trying to reboot Windows 98 while those changes are present).
And I've started working on making the Paging Unit much faster by removing loops as much as possible. Only two of those loops still remain in the common running code: removing entries of the opposite size (as this can be multiple 4KB entries for a single 2/4MB entry) and cache invalidation (for INVLPG), which needs to examine various entries in all TLB caches, not just one entry, and thus can't be optimized that easily. Those changes so far crash Windows even sooner, after just a few cache writes (writeTLB) it seems? It should theoretically work properly, but somehow it doesn't yet?
I've probably just managed to get at least the O(1) optimization for the Paging Unit working. I'm still running a final check on the optimized clearing of the entire TLB index structure (1MB+2K of data) when performing full CPU cache flushes (which are done where the CPU documentation requires them, like when toggling paging on and off).
Edit: Confirmed working properly. That's one done, one to go (the memory read optimization and split prefetch cache).
I've placed the new optimized paging unit back on the default branch (merging the branch) and removed the now finished TLB optimization O(1) branch.
The downside of the optimization, though, is that it adds 1026KB of RAM usage for each emulated CPU, so for the two CPUs 2052KB in total is used for the tables needed to keep track of the TLB. That leaves the PSP version of the app with that much less memory. It already only had 12MB available with all hardware and MIDI emulation enabled (using the 1MB AweROMGM soundfont), so now only about 10MB is available for that build to use as emulated RAM, at least on a PSP-1000.
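One way such an O(1) lookup can work, sketched under assumptions: next to the TLB slots, keep a direct index with one byte per 4KB linear page (2^20 pages = 1MB), holding the slot number + 1, or 0 for "not cached". A lookup is then a single table read instead of a loop over all entries. The 2KB part mentioned above (presumably for the larger pages) is left out, and all names are hypothetical.

```c
#include <stdint.h>

#define TLB_SLOTS  64
#define PAGE_COUNT (1u << 20)      /* 4KB pages in a 32-bit linear space */

typedef struct {
    uint8_t  valid[TLB_SLOTS];
    uint32_t linear_page[TLB_SLOTS];
    uint32_t phys_page[TLB_SLOTS];
    uint8_t  index[PAGE_COUNT];    /* 1MB: linear page -> slot + 1, 0 = miss */
} tlb;

static int tlb_lookup(const tlb *t, uint32_t linear, uint32_t *phys)
{
    uint8_t s = t->index[linear >> 12];
    if (!s)
        return 0;                                        /* miss */
    *phys = (t->phys_page[s - 1] << 12) | (linear & 0xFFF);
    return 1;
}

static void tlb_write(tlb *t, unsigned slot, uint32_t linear, uint32_t phys)
{
    /* unlink the page this slot held, if the index still points at it */
    if (t->valid[slot] && t->index[t->linear_page[slot]] == slot + 1)
        t->index[t->linear_page[slot]] = 0;
    t->valid[slot] = 1;
    t->linear_page[slot] = linear >> 12;
    t->phys_page[slot] = phys >> 12;
    t->index[linear >> 12] = (uint8_t)(slot + 1);
}

/* Full flush: clear only the index entries that are in use, instead of
   wiping the whole 1MB table. */
static void tlb_flush(tlb *t)
{
    for (unsigned s = 0; s < TLB_SLOTS; ++s) {
        if (t->valid[s]) {
            t->index[t->linear_page[s]] = 0;
            t->valid[s] = 0;
        }
    }
}
```

The trade-off is the one described above: constant-time lookups in exchange for a fixed 1MB+ table per emulated CPU.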
Edit: The paging unit optimization is running fine now.
Now only the new memory optimization needs to be fixed. It might have something to do with the 128/256-bit shifts being performed. Still checking that one.
Hmmm... Disabling the memory cache entirely also makes Windows 9x crash in exactly the same way as before (with the cache enabled).
So the issue is actually not in the cache itself, but probably in one of the other affected areas, probably the BIU itself?
Or perhaps something else that's changed? Hmmm...
Edit: Hmmm... Reverting the BIU to no longer split the memory accesses for prefetching (always using the same cache instead of double caching) and disabling the caching of multiple bytes at once seem to have no effect. It still ends up at the invalid instruction with the trap and sign flags set (which isn't supposed to happen)...
So the search for the error continues...
Edit: OK. So the cause seems to have been the physical memory block repeated-read optimization, which remembers, for the most recently used 4K block of physical memory, whether accesses to it can skip the MMIO device mapping.
When disabling said optimization, Windows 98 at least seems to reboot correctly?
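For illustration, here is a sketch of how a "last 4K block is RAM-only" shortcut can be kept safe: only cache a block when the whole block is free of MMIO, and drop the cached block whenever the mapping changes. This is a hypothetical, single-MMIO-window model; it doesn't claim to show where UniPCemu's actual bug was.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_BLOCK 0xFFFFFFFFu

typedef struct {
    uint32_t mmio_start, mmio_end;  /* one MMIO window, [start, end) */
    uint32_t ram_only_block;        /* cached 4KB block known to be RAM-only */
    unsigned lookups;               /* slow-path lookups, for illustration */
} phys_bus;

static bool bus_is_mmio(phys_bus *b, uint32_t addr)
{
    uint32_t block = addr >> 12;
    if (block == b->ram_only_block)
        return false;                             /* fast path */
    ++b->lookups;                                 /* slow path: full check */
    uint32_t lo = block << 12, last = lo | 0xFFF;
    if (last < b->mmio_start || lo >= b->mmio_end)
        b->ram_only_block = block;                /* whole block is RAM-only */
    return addr >= b->mmio_start && addr < b->mmio_end;
}

/* Any mapping change must drop the cached block, or stale answers leak. */
static void bus_remap(phys_bus *b, uint32_t start, uint32_t end)
{
    b->mmio_start = start;
    b->mmio_end = end;
    b->ram_only_block = NO_BLOCK;
}
```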