VOGONS


Getting UniPCemu up and running

Reply 60 of 81, by superfury

mr.cat wrote on 2021-07-30, 14:22:
Commit 65ffa33f seems fine also (similar behavior to e5c540ef).
About that i8042 "hang" this code clip from Tilck's modules/kb8042/generic_x86/i8042.c may be illuminating:

static NO_INLINE void i8042_io_wait(void)
{
    if (in_hypervisor())
        return;

    delay_us(1);
}

So no, I wouldn't say it's that weird 😁

Isn't that delay way too short? Sending or receiving is done at 16.7kHz intervals, so each tick is about 60 microseconds, far longer than 1us.
Also, the status check on port 64h (poll until bit 0 is set before receiving, or until bit 1 is cleared before sending, from the CPU's perspective) would probably need to be wrapped around that delay?
Which of those bits needs to be tested depends on where the function is called from.

Last edited by superfury on 2021-07-31, 06:28. Edited 1 time in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 61 of 81, by mr.cat

Yeah, bottom line is that the timing is off and needs to be skipped (for qemu too).
There's an attempt at adjusting the timing by way of a variable called "loops_per_us". This is then used to do some nop looping.
The value of that variable is shown during boot as "Tilck bogoMips" and it seems to be zero.
So the adjustment has failed one way or another, but perhaps it can be fixed.
In fact Vlad has very recently made some timing-related changes, so maybe I should just try with the latest git.
EDIT: Tried with the latest git and it's the same.

Last edited by mr.cat on 2021-07-31, 10:43. Edited 1 time in total.

Reply 62 of 81, by superfury

Just calculated it: at 16.7kHz, a delay of at least 60us (depending on the CPU clock relative to the 16.7kHz signal) would be required when using the same approach.
Otherwise, use a for loop that polls port 64h for bit 0 or bit 1 (wait while bit 0 is cleared for a read, or while bit 1 is set for a write) with a 1us delay between polls, which is even more foolproof (and accounts for bogomips issues as well).
Even better would be adding a timeout to that loop as well, after two 16.7kHz ticks (more than 120us having been looped)?
The BIOS does that as well.
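
For illustration, the poll-with-timeout idea could be sketched in C roughly like this. It is not code from Tilck or UniPCemu: the function names, the callback types and the small simulated controller are all made up for the example, and a real kernel would read the status with inb(0x64) and use its own delay routine. The 120us limit is the "two 16.7kHz ticks" figure from this post.

```c
#include <stdbool.h>
#include <stdint.h>

#define I8042_STATUS_OBF 0x01u  /* bit 0: output buffer full, data can be read */
#define I8042_STATUS_IBF 0x02u  /* bit 1: input buffer full, controller busy */
#define I8042_TIMEOUT_US 120u   /* about two 16.7kHz ticks */

/* Hypothetical hooks so the sketch runs without real hardware. */
typedef uint8_t (*status_reader)(void);
typedef void (*delay_1us)(void);

/* Poll until (status & mask) == wanted, waiting 1us between polls.
 * Returns false if the condition is not met before the timeout. */
static bool i8042_wait(status_reader read_status, delay_1us delay,
                       uint8_t mask, uint8_t wanted)
{
    for (unsigned waited = 0; waited <= I8042_TIMEOUT_US; waited++) {
        if ((read_status() & mask) == wanted)
            return true;
        delay();
    }
    return false;
}

/* Before reading port 60h: wait while bit 0 is still cleared. */
static bool i8042_wait_can_read(status_reader r, delay_1us d)
{
    return i8042_wait(r, d, I8042_STATUS_OBF, I8042_STATUS_OBF);
}

/* Before writing to port 60h/64h: wait while bit 1 is still set. */
static bool i8042_wait_can_write(status_reader r, delay_1us d)
{
    return i8042_wait(r, d, I8042_STATUS_IBF, 0);
}

/* Simulated controller (demo only): reports sim_busy for sim_countdown
 * polls, then sim_ready from that point on. */
static unsigned sim_countdown;
static uint8_t sim_busy, sim_ready;

static uint8_t sim_status(void)
{
    if (sim_countdown > 0) {
        sim_countdown--;
        return sim_busy;
    }
    return sim_ready;
}

static void sim_delay(void) { /* a real kernel would burn about 1us here */ }
```

Because the loop checks the status bit rather than trusting the delay length, a miscalibrated 1us delay only changes how long the timeout really is, not whether the wait works.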

Reply 63 of 81, by mr.cat

I found out the cause for the halt/freeze:
If loops_per_us is zero, that means the "nop loop" counter (ecx) flips over to its max value. So it's not really frozen, just taking its time doing nop 😁
I changed the variable to 1 and now it boots all the way.

EDIT: This issue has now been properly fixed as of commit 2d2cf5a1 (thanks, Vlad!).
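
The wrap-around can be shown with a small C model of a decrement-then-test loop of the "nop; sub ecx, 1; jne" kind. The helper names are hypothetical and the pass count is computed in closed form instead of actually spinning; this is only a sketch of the arithmetic, not Tilck's code.

```c
#include <stdint.h>

/* Value of the counter after the first "sub ecx, 1".
 * Unsigned arithmetic wraps modulo 2^32, so 0 becomes 0xFFFFFFFF. */
static uint32_t ecx_after_first_sub(uint32_t ecx)
{
    return ecx - 1u;
}

/* Number of passes a decrement-then-test loop makes for a given start
 * value. Because the decrement happens before the test, a start value
 * of 0 does not mean "zero passes" but 2^32 passes. */
static uint64_t nop_loop_passes(uint32_t ecx)
{
    return ecx == 0 ? (UINT64_C(1) << 32) : ecx;
}
```

Assuming three instructions per pass at 2400 KIPS, 2^32 passes would take somewhere around an hour and a half, which would look exactly like a freeze.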

Reply 64 of 81, by superfury

About the 0 bogomips... At what speed is the emulated CPU running? (Default is 3 MIPS on 386+, which is slightly faster than a 286@12MHz and Dosbox/Bochs default as well afaik)

Also, 0 bogomips is weird? Even an 8088@4.77MHz should be at least 0.019 BogoMIPS?

Reply 65 of 81, by mr.cat

The actual value was something like 0.8, it just got truncated to zero (for a longer explanation, see Tilck issue #74).
In SETTINGS.INI I have clockingmode=1 and cpuspeed=2400. Tilck currently needs P2, so cpu must be set to 7.

Reply 66 of 81, by superfury

mr.cat wrote on 2021-08-06, 09:51:

The actual value was something like 0.8, it just got truncated to zero (for a longer explanation, see Tilck issue #74).
In SETTINGS.INI I have clockingmode=1 and cpuspeed=2400. Tilck currently needs P2, so cpu must be set to 7.

Looking again at the explanation of how it's calculated, the same issue would happen on the other emulators. 1/100000th of a second (which would be 10us) would only be enough to execute about 30 instructions at a speed of 3 MIPS. So using such a base for microsecond delays (only 3 instructions are executed each microsecond) would be way too small a delay to be usable in that way (just returning from kernel to user mode probably already exceeds that?).
Or is the timer supposed to be set to 0.1 second instead? That would loop around 300000 NOPs, if given the chance.
Logically, using the setup, loops_per_us should end up at 3 when at 3 MIPS in IPS clocking mode (2.4 in your case for 2400KIPS).
Unless there's a calculation error?

Reply 67 of 81, by mr.cat

Yes, the bug could happen on real hardware too, provided it's slow enough.
Here are some of the variable values. These were gathered running the patched version of Tilck (it probably doesn't matter whether it's patched or not; at least __bogo_loops is the same as before the patch).

TIMER_HZ = 100
__tick_duration = 9999312
__bogo_loops = 8
loops_per_tick = 8000

So if I understood this correctly, one tick is approximately 0.01s, and the bogomips timing test goes through 8000 loops in that time frame.
The code for the measurement is in kernel/timer.c, and the nop loop itself can be found in kernel/arch/i386/misc.S.
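
To make the truncation concrete, here is a minimal C sketch using the values listed above. The function names are invented for the example and this is not how kernel/timer.c is actually written; it only shows why 8000 loops per 10000us tick ends up as zero in integer arithmetic.

```c
#include <stdint.h>

#define TIMER_HZ    100u
#define US_PER_TICK (1000000u / TIMER_HZ)   /* 10000us per tick */

/* Integer division: anything below one loop per microsecond becomes 0. */
static uint32_t loops_per_us_int(uint32_t loops_per_tick)
{
    return loops_per_tick / US_PER_TICK;
}

/* The same ratio in floating point, for comparison only. */
static double loops_per_us_real(uint32_t loops_per_tick)
{
    return (double)loops_per_tick / US_PER_TICK;
}
```

With loops_per_tick = 8000 the real ratio is 0.8, but the integer version yields 0, which matches the "bogoMips" value shown during boot.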

EDIT: Hmm it seems I've chosen the cpuspeed value 2400 simply because it's the one that works. I now tried with 2700 and Bochs BIOS fails to boot with an error message:
FATAL: ata-detect: Failed to detect ATAPI device
With 2600 it boots and bogoMips goes up to a whopping 0.9...
EDIT2: It's some kind of timing issue. But I did manage to boot with some higher cpuspeed values. For instance, with cpuspeed=15000 it does boot, with bogoMips value of 5.
The boot doesn't "feel" any snappier though (probably because my host is too slow).
EDIT3: This issue seems to be affected by clockingmode. With clockingmode=0, higher values can be used.

Last edited by mr.cat on 2021-11-22, 11:23. Edited 2 times in total.

Reply 68 of 81, by superfury

mr.cat wrote on 2021-08-06, 16:55:

Yes, the bug could happen on real hardware too, provided it's slow enough.
Here are some of the variable values. These were gathered running the patched version of Tilck (it probably doesn't matter whether it's patched or not; at least __bogo_loops is the same as before the patch).

TIMER_HZ = 100
__tick_duration = 9999312
__bogo_loops = 8
loops_per_tick = 8000

So if I understood this correctly, one tick is approximately 0.01s, and the bogomips timing test goes through 8000 loops in that time frame.
The code for the measurement is in kernel/timer.c, and the nop loop itself can be found in kernel/arch/i386/misc.S.

EDIT: Hmm it seems I've chosen the cpuspeed value 2400 simply because it's the one that works. I now tried with 2700 and Bochs BIOS fails to boot with an error message:
FATAL: ata-detect: Failed to detect ATAPI device
With 2600 it boots and bogoMips goes up to a whopping 0.9...

Just calculated it at a speed of 2600000 IPS (2600KIPS). It matches the instruction timing in IPS clocking mode. So indeed 8000 loops are executed each 0.01 second. So it should be able to provide a delay down to 1us with that (if used correctly), provided the delay is correctly translated to the measured BogoMIPS (one loop being 1.15us at a 2600KIPS CPU clock speed).
Or does that mean that kernel requires at least one loop per microsecond to properly work and delay? So that would need to be a speed of at least 3MIPS(where it should reach 1 bogoMIPS) with those 3 instructions in the loop being exactly 1us?

Edit: Also, it's weird that the CD-ROM/ATAPI driver of the BIOS would start to fail when getting faster? Would that be a CPU bug or something related to Bochs itself?
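
The arithmetic behind these numbers can be written out as a short C sketch. It assumes IPS clocking mode (every instruction takes the same time) and three instructions per loop pass (nop, sub, jne); the function names are made up for the example.

```c
#include <stdint.h>

#define TIMER_HZ              100u
#define INSTRUCTIONS_PER_PASS 3u    /* nop + sub + jne */

/* Passes of the 3-instruction loop that fit into one timer tick. */
static uint32_t expected_loops_per_tick(uint32_t instructions_per_second)
{
    return instructions_per_second / TIMER_HZ / INSTRUCTIONS_PER_PASS;
}

/* Duration of a single pass in nanoseconds (rounded down). */
static uint32_t pass_duration_ns(uint32_t instructions_per_second)
{
    return (uint32_t)((uint64_t)INSTRUCTIONS_PER_PASS * 1000000000u
                      / instructions_per_second);
}
```

Under these assumptions 2400 KIPS gives exactly 8000 passes per tick (the reported loops_per_tick), 2600 KIPS gives about 8666 passes at roughly 1.15us each, and 3 MIPS is the point where one pass takes exactly 1us.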

Reply 69 of 81, by vvaltchev

superfury wrote on 2021-08-06, 20:48:

Or does that mean that kernel requires at least one loop per microsecond to properly work and delay? So that would need to be a speed of at least 3MIPS(where it should reach 1 bogoMIPS) with those 3 instructions in the loop being exactly 1us?

Hi 😀
I'm the Tilck guy.

So, with the newest patch the kernel does not require one loop per microsecond in order to work. It's just that, if the vCPU is slower than that and delay = 1us, no iteration will be performed. If the delay is big enough, some loop iterations will be performed instead.

About your second question, MIPS in my case does not strictly mean "millions of instructions per second". As you can see, one loop iteration requires more than one instruction. So here MIPS means "millions of iterations per second". I know it might sound confusing, but as you know physical CPUs take a different amount of time for each instruction depending also on the context, so talking about #instructions/second does not really make sense. That's why "bogoMips" is a "bogus" way to measure time. It's defined simply by a reference loop, no matter how it's implemented, and then we count the iterations of that specific loop both when calculating "bogoMIPS" and when trying to busy-wait for a very short time. As long as the loop code is the same, the time it will take to complete will be roughly what we expect it to be.

BTW, for reference, I've noticed that Tilck's bogoMIPS =~ Linux's bogoMIPS / 2. That's simply because on Linux, the loop is implemented in a different way.

I hope I've helped a little clarifying this.
Vlad
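
A rough C sketch of the behaviour described in this post: the requested delay is scaled by the measured loops per tick before dividing, so a 1us delay on a slow vCPU simply yields zero passes while longer delays still get a sensible count. The names and the exact formula are assumptions made for illustration; the actual patch is not shown in this thread and may be implemented differently.

```c
#include <stdint.h>

#define US_PER_TICK 10000u   /* one tick at TIMER_HZ = 100 */

/* Multiply first, divide last, so sub-1 loops/us rates are not lost. */
static uint32_t delay_passes(uint32_t us, uint32_t loops_per_tick)
{
    return (uint32_t)((uint64_t)us * loops_per_tick / US_PER_TICK);
}

/* Hypothetical busy-wait; returns how many passes were executed.
 * The counter is tested before it is decremented, so zero passes
 * really means that the loop body never runs. */
static uint32_t busy_wait_us(uint32_t us, uint32_t loops_per_tick)
{
    uint32_t passes = delay_passes(us, loops_per_tick);
    for (volatile uint32_t i = passes; i != 0; i--) {
        /* stands in for the nop loop */
    }
    return passes;
}
```

With the 8000 loops per tick reported earlier, a 1us delay gives 0 passes and a 10us delay gives 8.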

Reply 70 of 81, by superfury

vvaltchev wrote on 2021-08-07, 01:21:

Hi 😀
I'm the Tilck guy.

So, with the newest patch the kernel does not require one loop per microsecond in order to work. It's just that, if the vCPU is slower than that and delay = 1us, no iteration will be performed. If the delay is big enough, some loop iterations will be performed instead.

About your second question, MIPS in my case does not strictly mean "millions of instructions per second". As you can see, one loop iteration requires more than one instruction. So here MIPS means "millions of iterations per second". I know it might sound confusing, but as you know physical CPUs take a different amount of time for each instruction depending also on the context, so talking about #instructions/second does not really make sense. That's why "bogoMips" is a "bogus" way to measure time. It's defined simply by a reference loop, no matter how it's implemented, and then we count the iterations of that specific loop both when calculating "bogoMIPS" and when trying to busy-wait for a very short time. As long as the loop code is the same, the time it will take to complete will be roughly what we expect it to be.

BTW, for reference, I've noticed that Tilck's bogoMIPS =~ Linux's bogoMIPS / 2. That's simply because on Linux, the loop is implemented in a different way.

I hope I've helped a little clarifying this.
Vlad

As with physical CPU instruction timing, UniPCemu supports both modes. It can run either in IPS clocking mode (each instruction taking the same amount of time) or in cycle-accurate mode (each instruction taking a specific number of clocks depending on the instruction and the BIU phase relative to the other hardware's clocking). LOCK can still delay the CPU, though (making the instruction take multiple IPS clocks instead), if another device (such as DMA, the video card or another CPU) has control of the bus. The CPU will effectively do NOPs until it can get control of the bus. Also, LOCK currently locks the entire bus and not just the space used by the current instruction (the exact address and operand size, as the documentation says for the 386+). And the lock lasts until the end of any instruction executed (even if triggered by the paging unit, for example).

Reply 71 of 81, by vvaltchev

superfury wrote on 2021-08-07, 10:20:

As with physical CPU instruction timing, UniPCemu supports both modes. It can run either in IPS clocking mode (each instruction taking the same amount of time) or in cycle-accurate mode (each instruction taking a specific number of clocks depending on the instruction and the BIU phase relative to the other hardware's clocking). LOCK can still delay the CPU, though (making the instruction take multiple IPS clocks instead), if another device (such as DMA, the video card or another CPU) has control of the bus. The CPU will effectively do NOPs until it can get control of the bus. Also, LOCK currently locks the entire bus and not just the space used by the current instruction (the exact address and operand size, as the documentation says for the 386+). And the lock lasts until the end of any instruction executed (even if triggered by the paging unit, for example).

Not sure why you're mentioning the LOCK prefix. Just in case, I'll copy-paste the code from misc.S:

FUNC(asm_nop_loop):

.loop:
   nop
   sub ecx, 1
   jne .loop

   ret

END_FUNC(asm_nop_loop)

FUNC(asm_do_bogomips_loop):

   # Note: these NOPs are important to align the instructions
   # in the inner loop. Trying removing them and looping.
   # On my machine, the loop count drops in half!

   nop
   nop
   nop

.outer_loop:
   mov eax, BOGOMIPS_CONST

.inner_loop:
   nop
   sub eax, 1
   jne .inner_loop

   mov eax, 1
   lock xadd DWORD PTR __bogo_loops, eax
   test eax, eax
   jns .outer_loop

   ret

END_FUNC(asm_do_bogomips_loop)

As you can see, the asm_do_bogomips_loop() function does not use LOCK on each iteration, but only once every BOGOMIPS_CONST (= 10,000) iterations, exactly because LOCK has a significant overhead. In other words, we're counting the iterations in a register (EAX) and incrementing __bogo_loops by 1 once every 10,000 iterations. Later in the C code, __bogo_loops is multiplied by 10,000, so the overhead of LOCK is ignored, but it's negligible because it happens extremely rarely.

When we have to delay, the loop used is: asm_nop_loop(), which contains no LOCK prefixes and doesn't need any form of synchronization.

Vlad
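
For readers who prefer C to assembly, the counting scheme of asm_do_bogomips_loop() can be modelled roughly as below, with C11 atomic_fetch_add standing in for "lock xadd". How the measurement is stopped is not shown in this thread; the sketch assumes that something (a timer handler, modelled here by a hypothetical hook) makes the counter negative, since the loop exits once the old value is no longer non-negative. All names other than BOGOMIPS_CONST are invented.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

#define BOGOMIPS_CONST 10000   /* inner passes per counter update */

/* Hypothetical hook standing in for whatever ends the measurement
 * (assumed here: a timer handler that makes the counter negative). */
typedef void (*tick_hook)(atomic_int *counter, int outer_passes_done);

/* Returns the total number of inner passes that were counted. */
static long measure_bogo_passes(atomic_int *counter, tick_hook hook)
{
    long counted = 0;
    int outer = 0;

    for (;;) {
        /* .inner_loop: nop; sub eax, 1; jne .inner_loop */
        for (volatile int32_t i = BOGOMIPS_CONST; i != 0; i--) {
        }

        /* lock xadd: a single atomic update per BOGOMIPS_CONST passes */
        int old = atomic_fetch_add(counter, 1);
        if (old < 0)            /* test eax, eax; jns .outer_loop */
            break;
        counted++;
        hook(counter, ++outer);
    }
    return counted * BOGOMIPS_CONST;
}

/* Demo hook: end the measurement after three outer passes. */
static void stop_after_three(atomic_int *counter, int outer_passes_done)
{
    if (outer_passes_done == 3)
        atomic_store(counter, INT_MIN);
}
```

The point of the structure is the same as in the assembly: the expensive atomic operation is amortized over 10,000 cheap passes, so its cost barely affects the result.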

Reply 72 of 81, by mr.cat

Hi Vlad, welcome! I had a feeling you might show up 😁

The Linux folks were probably familiar with the BogoMIPS term already, but not everyone is on Linux.
I find that many times reading the actual source code is the best way to remove guesswork.
But in practice that's also a bit difficult to do sometimes (finding enough time to do it, poorly commented/obscure code, a huge codebase, etc. etc. - there's no shortage of excuses 😀).
It's a great shortcut to have the actual author commenting.

That's a good point about context btw. I see you had some comments in that nop loop about alignment playing a big role in performance for instance.

Reply 73 of 81, by vvaltchev

mr.cat wrote on 2021-08-08, 22:28:

Hi Vlad, welcome! I had a feeling you might show up 😁

Thanks! I'm happy to help when I can 😀

mr.cat wrote on 2021-08-08, 22:28:
The Linux folks were probably familiar with the BogoMIPS term already, but not everyone is on Linux.
I find that many times reading the actual source code is the best way to remove guesswork.
But in practice that's also a bit difficult to do sometimes (finding enough time to do it, poorly commented/obscure code, a huge codebase, etc. etc. - there's no shortage of excuses 😀).
It's a great shortcut to have the actual author commenting.

That's a good point about context btw. I see you had some comments in that nop loop about alignment playing a big role in performance for instance.

I totally understand that. The source code is what the CPU does, but sometimes it requires almost a sort of reverse engineering effort to understand why.
So, don't hesitate on asking questions about Tilck's code: I'll be happy to answer. Also, by asking questions, we all might realize there is a subtle bug somewhere.
There is always room for improvement.

BTW, I'd be curious to build and run UniPCemu myself. Can I build it on Linux?

Reply 74 of 81, by superfury

vvaltchev wrote on 2021-08-09, 11:14:

Thanks! I'm happy to help when I can 😀

I totally understand that. The source code is what the CPU does, but sometimes it requires almost a sort of reverse engineering effort to understand why.
So, don't hesitate on asking questions about Tilck's code: I'll be happy to answer. Also, by asking questions, we all might realize there is a subtle bug somewhere.
There is always room for improvement.

BTW, I'd be curious to build and run UniPCemu myself. Can I build it on Linux?

It should build on Linux; the last time I verified it was on Ubuntu 20.
Simply follow the build steps (with SDL or SDL2 installed of course, and optionally SDL_net/SDL2_net and libpcap).

Reply 75 of 81, by superfury

The fixes to UniPCemu are now in the latest release of the app for Android/Windows/PSP.

Reply 76 of 81, by mr.cat

Something I came across on Hacker News...

Here's a Linux image with a modern kernel that actually manages to boot:
https://ocawesome101.github.io/486-linux.html

Didn't have much luck getting Linux going in UniPCemu until I came across this one.

Reply 77 of 81, by mr.cat

OK, here's how to make SeaBIOS execute the VGA optrom, so that we actually get to see something on VGA.
I noticed that in the old SeaBIOS versions there's this switch called CONFIG_OPTIONROMS_DEPLOYED, that's needed to run optroms in old qemu versions (and Bochs).
The switch was removed in commit 9d691ace (2016-04-01), so for this test I used the last commit that still has it (8ef686f6).
There's a small adjustment needed for new compilers: stacks.c fails to compile with new gcc versions unless -fno-pie is added to COMMONCFLAGS (in Makefile).

Then launch "make menuconfig" and enable the config option mentioned above in the "BIOS interfaces" section (it's called "Option roms are already at 0xc0000-0xf0000" there).
It seems 256kB images don't work(?) so some stuff needs to be disabled to get the resulting rom image size down to 128kB.
When that's done, save and exit, and run make. If all goes well the romfile will then end up in out/bios.bin.

Note that any OPTROM*.BIN files in UniPCemu's ROM directory will all be run automatically (and will also show up in the boot menu).
This can be controlled via pci-optionrom-exec option (see docs/Runtime_config.md).
It might be best to rename or remove them while doing the testing.

EDIT: ...and here's a patch for SeaBIOS to undo commit 9d691ace. This is against the latest git.

Reply 78 of 81, by superfury

mr.cat wrote on 2022-04-04, 16:17:
OK, here's how to make SeaBIOS execute the VGA optrom, so that we actually get to see something on VGA.
I noticed that in the old SeaBIOS versions there's this switch called CONFIG_OPTIONROMS_DEPLOYED, that's needed to run optroms in old qemu versions (and Bochs).
The switch was removed in commit 9d691ace (2016-04-01), so for this test I used the last commit that still has it (8ef686f6).
There's a small adjustment needed for new compilers: stacks.c fails to compile with new gcc versions unless -fno-pie is added to COMMONCFLAGS (in Makefile).

Then launch "make menuconfig" and enable the config option mentioned above in the "BIOS interfaces" section (it's called "Option roms are already at 0xc0000-0xf0000" there).
It seems 256kB images don't work(?) so some stuff needs to be disabled to get the resulting rom image size down to 128kB.
When that's done, save and exit, and run make. If all goes well the romfile will then end up in out/bios.bin.

Note that any OPTROM*.BIN files in UniPCemu's ROM directory will all be run automatically (and will also show up in the boot menu).
This can be controlled via pci-optionrom-exec option (see docs/Runtime_config.md).
It might be best to rename or remove them while doing the testing.

EDIT: ...and here's a patch for SeaBIOS to undo commit 9d691ace. This is against the latest git.

OK.
Also, if you're deleting all ROMs for different architectures to remove their functionality, you can instead specify a ROM with a size of 0 bytes (an empty file) to effectively remove them for the specified architecture only (when UniPCemu hits it in the priority chain of filenames, it will count the ROM as nonexistent and stop parsing the chain). This is now mentioned in the manual. That's also the method I use with architectures past the Compaq Deskpro 386, to make the option ROMs for the Compaq disappear on newer models, if needed (depending on the ROMs used for said architecture and its chain, of course).
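
The lookup rule can be sketched in C as follows. This is only an illustration of the described behaviour, not UniPCemu's actual code: the function names, the size callback and the file names in the demo are all invented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: file size in bytes, or -1 if the file is missing. */
typedef long (*rom_size_fn)(const char *filename);

/* Walk the candidates from most to least specific. The first file that
 * exists decides the outcome: a non-empty file is used, an empty one
 * means "no ROM here" and ends the search without trying the fallbacks.
 * Returns NULL when no ROM should be loaded. */
static const char *resolve_rom(const char *const *chain, size_t count,
                               rom_size_fn rom_size)
{
    for (size_t i = 0; i < count; i++) {
        long size = rom_size(chain[i]);
        if (size < 0)
            continue;               /* missing: try the next fallback */
        return size > 0 ? chain[i] : NULL;
    }
    return NULL;
}

/* Demo "ROM directory" with made-up names: the empty architecture-specific
 * file masks the generic ROM further down the chain. */
static long demo_rom_size(const char *filename)
{
    if (strcmp(filename, "OPTROM.newer.1.BIN") == 0)
        return 0;
    if (strcmp(filename, "OPTROM.1.BIN") == 0)
        return 32768;
    return -1;
}
```

In the demo, a chain of "OPTROM.newer.1.BIN" followed by "OPTROM.1.BIN" resolves to no ROM at all, while a chain whose first entry is missing falls through to "OPTROM.1.BIN".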

The current commits also support some new motherboards, namely i450gx, which is i450gx/i440fx>i430fx>PS2 etc. and 85C496/7 > PS2 etc.
Edit: Just changed it up a bit. Now i450gx/i440fx/i430fx/85C496/7 don't have each other as a priority anymore. Missing ROMs for those have been added. The i4x0 ROM has been added for i430fx/i440fx/i450gx instead of the old i430fx priority below it to provide generic ROM replacements for those BIOSes. 85C496 doesn't use such a generic ROM, as it's the only architecture of its kind that's implemented.

Also, the current commits now have an 85C496 northbridge only (without southbridge) on the Compaq/PS2 architectures. This is mainly to support the PCI IRQ requirements on said architectures (a small hack for improved PCI support). Parts such as the southbridge of the 85C496 (the 85C497) aren't emulated on those architectures (currently only the BIOS/option ROM PCI/ISA/RAM mapping, the ESC chip and its related architecture-specific components), because those are already implemented the Compaq way.

The 85C496/7 architecture is mostly implemented, except for its specific ESC functionality (including SMM, of course) other than register storage (its I/O ports are emulated though). PCI is now fully functional, like on the Compaq chipsets, as are the BIOS ROM support and the required basic motherboard functionality (like CPU resets, PCIRST#, INIT support and extended memory DMA support).

One nice thing about PCI now being fully functional is that stuff like IRQs and (unused) memory is now fully mappable by the OS.
One thing to note is that the "i450gx as i440fx" setting doesn't affect the BIOS ROM loading. It just adds an i440fx compatibility layer to the i450gx chipset. That way it can use an i440fx ROM loaded as an i450gx ROM, while keeping the i450gx motherboard functionality active (allowing up to 8GB RAM and some of its more advanced features to be used, through a translation layer between the two chipsets).

Reply 79 of 81, by superfury

You might want to try the latest commits again. The i430fx has been reverted to its old behaviour and the other PCI motherboards have full PCI configuration support now.
For the Compaq boards (Compaq and PS/2), though, the SiS 85C496 northbridge has been added (without its BIOS ROM remapping support, as that's Compaq-incompatible) to provide PCI IRQ mapping.
