VOGONS


Getting UniPCemu up and running


Reply 60 of 75, by superfury

Rank l33t
mr.cat wrote on 2021-07-30, 14:22:

Commit 65ffa33f seems fine also (similar behavior to e5c540ef).
About that i8042 "hang" this code clip from Tilck's modules/kb8042/generic_x86/i8042.c may be illuminating:

static NO_INLINE void i8042_io_wait(void)
{
    if (in_hypervisor())
        return;

    delay_us(1);
}

So no, I wouldn't say it's that weird 😁

That delay is way too short? Sending or receiving is clocked at 16.7kHz intervals, so each step takes tens of microseconds, far more than 1us.
Also, the poll of port 64h (until bit 0 is set for receiving, or until bit 1 is cleared for sending, from the CPU's perspective) would probably need to be wrapped around that delay?
Which of those bits needs to be tested depends on where the function is called.

Last edited by superfury on 2021-07-31, 06:28. Edited 1 time in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows and PSP on itch.io

Reply 61 of 75, by mr.cat

Rank Member

Yeah, bottom line is that the timing is off and needs to be skipped (for qemu too).
There's an attempt at adjusting the timing by way of a variable called "loops_per_us". This is then used to do some nop looping.
The value of that variable is shown during boot as "Tilck bogoMips" and it seems to be zero.
So the adjustment has failed one way or another, but perhaps it can be fixed.
In fact Vlad has very recently made some timing-related changes, so maybe I should just try with the latest git.
EDIT: Tried with the latest git and it's the same.

Last edited by mr.cat on 2021-07-31, 10:43. Edited 1 time in total.

Reply 62 of 75, by superfury

Rank l33t

Just calculated it: at 16.7kHz, a delay of at least 60us would be required (depending on the CPU clock relative to the 16.7kHz signal) if using the same approach.
Otherwise, it's a combination of polling port 64h for bit 0 or bit 1 (looping while bit 0 is cleared for a read, or while bit 1 is set for a write) with 1us delays between polls, written as a for loop. That is even more foolproof (and accounts for bogomips issues as well).
Even better would be adding a timeout to that loop as well, after two 16.7kHz ticks (more than 120us having been looped)?
The BIOS does that as well.
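
A minimal sketch of the polling-with-timeout idea, in C. The status port is modelled by caller-supplied functions so the example can run on a host; `wait_status`, `read_status_fn`, `delay_1us_fn` and the fake controller are illustrative names, not code from Tilck or UniPCemu. On real hardware the two hooks would be a read of port 64h and a 1us delay.

```c
#include <stdbool.h>
#include <stdint.h>

#define I8042_STATUS_OBF 0x01u /* bit 0: output buffer full, data ready to read */
#define I8042_STATUS_IBF 0x02u /* bit 1: input buffer full, controller still busy */
#define POLL_TIMEOUT_US  120u  /* about two 16.7kHz ticks */

/* Hypothetical hooks standing in for "read port 64h" and "delay 1us". */
typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*delay_1us_fn)(void *ctx);

/* Poll until (status & mask) == wanted; give up after POLL_TIMEOUT_US polls. */
static bool wait_status(read_status_fn rd, delay_1us_fn dly, void *ctx,
                        uint8_t mask, uint8_t wanted)
{
    for (unsigned waited = 0; waited < POLL_TIMEOUT_US; waited++) {
        if ((rd(ctx) & mask) == wanted)
            return true;
        dly(ctx);
    }
    return false; /* timed out: the caller decides how to recover */
}

/* Fake controller for host-side testing: busy until ready_at microseconds. */
struct fake_kbc {
    unsigned now_us;
    unsigned ready_at;
};

static uint8_t fake_read(void *ctx)
{
    struct fake_kbc *k = ctx;
    return k->now_us >= k->ready_at ? I8042_STATUS_OBF : I8042_STATUS_IBF;
}

static void fake_delay(void *ctx)
{
    struct fake_kbc *k = ctx;
    k->now_us++; /* pretend exactly 1us passed */
}
```

For a read, the caller would pass `I8042_STATUS_OBF` as both mask and wanted value; for a write, `I8042_STATUS_IBF` as the mask and 0 as the wanted value. Because the exit condition is the status bit and not the length of the delay, a badly calibrated delay only changes how long the timeout really is.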

Reply 63 of 75, by mr.cat

Rank Member

I found out the cause for the halt/freeze:
If loops_per_us is zero, the "nop loop" counter (ecx) wraps around to its maximum value. So it's not really frozen, just taking its time doing nop 😁
I changed the variable to 1 and now it boots all the way.

EDIT: This issue has now been properly fixed as of commit 2d2cf5a1 (thanks, Vlad!).
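
The wrap-around can be reproduced with unsigned 32-bit arithmetic in C. The sketch below only counts how many passes a decrement-then-test loop makes for a given starting count; `nop_loop_iterations` and `guarded_iterations` are illustrative names, and the guard shown is just one possible fix, not necessarily what commit 2d2cf5a1 does.

```c
#include <stdint.h>

/* Passes made by a loop that runs its body, subtracts 1, then tests for zero. */
static uint64_t nop_loop_iterations(uint32_t count)
{
    uint32_t ecx = count;
    uint64_t iterations = 1; /* the first pass always runs before any test */

    ecx -= 1;                /* with count == 0 this wraps to 0xFFFFFFFF */
    iterations += ecx;       /* remaining passes until ecx reaches zero */
    return iterations;
}

/* One possible guard: skip the loop entirely when the computed count is zero. */
static uint64_t guarded_iterations(uint32_t us, uint32_t loops_per_us)
{
    uint32_t count = us * loops_per_us;
    return count == 0 ? 0 : nop_loop_iterations(count);
}
```

With a count of zero the loop makes 2^32 passes. At the roughly 800,000 iterations per second reported elsewhere in this thread (8000 per 0.01s tick), that is on the order of an hour and a half, which looks like a hang.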

Reply 64 of 75, by superfury

Rank l33t

About the 0 bogomips... At what speed is the emulated CPU running? (Default is 3 MIPS on 386+, which is slightly faster than a 286@12MHz and Dosbox/Bochs default as well afaik)

Also, 0 bogomips is weird? Even an 8088@4.77MHz should be at least 0.019 BogoMIPS?

Reply 65 of 75, by mr.cat

Rank Member

The actual value was something like 0.8, it just got truncated to zero (for a longer explanation, see Tilck issue #74).
In SETTINGS.INI I have clockingmode=1 and cpuspeed=2400. Tilck currently needs P2, so cpu must be set to 7.

Reply 66 of 75, by superfury

Rank l33t
mr.cat wrote on 2021-08-06, 09:51:

The actual value was something like 0.8, it just got truncated to zero (for a longer explanation, see Tilck issue #74).
In SETTINGS.INI I have clockingmode=1 and cpuspeed=2400. Tilck currently needs P2, so cpu must be set to 7.

Looking again at the explanation of how it's calculated, the same issue would happen on the other emulators. 1/100000th of a second (which would be 10us) would only be enough to execute about 30 instructions at a speed of 3 MIPS. So using such a base for microsecond delays (only 3 instructions are executed each microsecond) would give way too small a delay to be usable in that way (just returning from kernel to user mode probably already exceeds that?).
Or is the timer supposed to be set to 0.1 second instead? That would loop around 300000 NOPs, if given the chance.
Logically, with this setup, loops_per_us should end up at 3 when running at 3 MIPS in IPS clocking mode (2.4 in your case for 2400KIPS).
Unless there's a calculation error?
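
As a quick check of that arithmetic, a small C helper (hypothetical name `instructions_in`) gives the instruction budget for an interval, assuming IPS clocking mode where every instruction costs the same:

```c
#include <stdint.h>

/* Instructions executed in `us` microseconds at `ips` instructions per second. */
static uint64_t instructions_in(uint64_t ips, uint64_t us)
{
    return ips * us / 1000000u;
}
```

At 3,000,000 IPS this gives 3 instructions for 1us, 30 for 10us and 300,000 for 0.1s; at 2,400,000 IPS a 10us window holds 24 instructions.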

Reply 67 of 75, by mr.cat

Rank Member

Yes the bug could happen on real hardware too, provided it's slow enough.
Here's some of the variable values. These were gathered running the patched version of Tilck (probably doesn't matter whether patched or not, at least the __bogo_loops is the same as before the patch).

TIMER_HZ is 100
__tick_duration 9999312
__bogo_loops 8
loops_per_tick = 8000

So if I understood this correctly, one tick is approximately 0.01s, and the bogomips timing test goes through 8000 loops in that time frame.
The code for the measurement is in kernel/timer.c, and the nop loop itself can be found in kernel/arch/i386/misc.S.
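
The truncation to zero is easy to reproduce with the values above. This is a sketch, not Tilck's code: it assumes __tick_duration is in nanoseconds (consistent with a tick of about 0.01s) and that the per-microsecond figure comes from an integer division. Both function names are made up for the example.

```c
#include <stdint.h>

/* Integer version: anything below one loop per microsecond becomes 0. */
static uint32_t loops_per_us_int(uint32_t loops_per_tick, uint64_t tick_ns)
{
    uint64_t tick_us = tick_ns / 1000u; /* 9999312 ns -> 9999 us */
    return (uint32_t)(loops_per_tick / tick_us);
}

/* Floating-point version, only to show the value that was lost. */
static double loops_per_us_real(uint32_t loops_per_tick, uint64_t tick_ns)
{
    return (double)loops_per_tick * 1000.0 / (double)tick_ns;
}
```

With loops_per_tick = 8000 the real value is about 0.8, while the integer version returns 0.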

EDIT: Hmm it seems I've chosen the cpuspeed value 2400 simply because it's the one that works. I now tried with 2700 and Bochs BIOS fails to boot with an error message:
FATAL: ata-detect: Failed to detect ATAPI device
With 2600 it boots and bogoMips goes up to a whopping 0.9...
EDIT2: It's some kind of timing issue. But I did manage to boot with some higher cpuspeed values. For instance, with cpuspeed=15000 it does boot, with bogoMips value of 5.
The boot doesn't "feel" any snappier though (probably because my host is too slow).

Last edited by mr.cat on 2021-08-06, 23:23. Edited 1 time in total.

Reply 68 of 75, by superfury

Rank l33t
mr.cat wrote on 2021-08-06, 16:55:

Yes the bug could happen on real hardware too, provided it's slow enough.
Here's some of the variable values. These were gathered running the patched version of Tilck (probably doesn't matter whether patched or not, at least the __bogo_loops is the same as before the patch).

TIMER_HZ is 100
__tick_duration 9999312
__bogo_loops 8
loops_per_tick = 8000

So if I understood this correctly, one tick is approximately 0.01s, and the bogomips timing test goes through 8000 loops in that time frame.
The code for the measurement is in kernel/timer.c, and the nop loop itself can be found in kernel/arch/i386/misc.S.

EDIT: Hmm it seems I've chosen the cpuspeed value 2400 simply because it's the one that works. I now tried with 2700 and Bochs BIOS fails to boot with an error message:
FATAL: ata-detect: Failed to detect ATAPI device
With 2600 it boots and bogoMips goes up to a whopping 0.9...

Just calculated it at a speed of 2600000 IPS (2600KIPS). It matches the instruction timing in IPS clocking mode, so indeed about 8000 loops are executed every 0.01 second. With that it should be possible (if used correctly) to provide delays down to about 1us, provided the delay is correctly translated using the measured BogoMIPS (one loop = 1.15us at a 2600KIPS CPU clock speed).
Or does that mean that kernel requires at least one loop per microsecond to properly work and delay? So that would need to be a speed of at least 3MIPS(where it should reach 1 bogoMIPS) with those 3 instructions in the loop being exactly 1us?
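
These numbers can be checked with a short C sketch. It assumes IPS clocking mode, three instructions per loop iteration and a 100Hz tick; the function names are invented for the example.

```c
#include <stdint.h>

#define INSTRUCTIONS_PER_LOOP 3u /* assumed: three instructions per pass */

/* Duration of one loop pass in nanoseconds (rounded down). */
static uint64_t loop_ns(uint64_t ips)
{
    return INSTRUCTIONS_PER_LOOP * 1000000000ull / ips;
}

/* Loop passes that fit into one timer tick (rounded down). */
static uint64_t loops_per_tick(uint64_t ips, uint64_t timer_hz)
{
    return ips / INSTRUCTIONS_PER_LOOP / timer_hz;
}
```

At 2,600,000 IPS one pass takes about 1153ns and about 8666 passes fit in a tick; at 2,400,000 IPS exactly 8000 fit; at 3,000,000 IPS one pass takes exactly 1000ns, which is the point where one loop per microsecond is reached.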

Edit: Also, it's weird that the CD-ROM/ATAPI driver of the BIOS would start to fail when the CPU gets faster? Would that be a CPU bug or something related to Bochs itself?

Reply 69 of 75, by vvaltchev

Rank Newbie
superfury wrote on 2021-08-06, 20:48:

Or does that mean that kernel requires at least one loop per microsecond to properly work and delay? So that would need to be a speed of at least 3MIPS(where it should reach 1 bogoMIPS) with those 3 instructions in the loop being exactly 1us?

Hi 😀
I'm the Tilck guy.

So, with the newest patch the kernel does not require one loop per microsecond in order to work. It's just that, if the vCPU is slower than that and delay = 1us, no iteration will be performed. If the delay is big enough, some loop iterations will be performed instead. About your second question: MIPS in my case does not strictly mean "millions of instructions per second". As you can see, one loop iteration requires more than one instruction. So here MIPS means "millions of iterations per second". I know it might sound confusing, but as you know, physical CPUs take a different amount of time for each instruction depending also on the context, so talking about #instructions/second does not really make sense. That's why "bogoMips" is a "bogus" way to measure time. It's defined simply by a reference loop, no matter how it's implemented, and then we count the iterations of that specific loop both when calculating "bogoMIPS" and when trying to busy-wait for a very short time. As long as the loop code is the same, the time it takes to complete will be roughly what we expect it to be.
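
To illustrate the behaviour described here, a sketch in C. The scaling formula and the names `delay_to_iterations`, `run_delay`, `loops_per_tick` and `tick_us` are assumptions for the example; the actual Tilck code may compute this differently.

```c
#include <stdint.h>

/* Convert a requested delay into loop passes, rounding down. */
static uint32_t delay_to_iterations(uint32_t us, uint32_t loops_per_tick,
                                    uint32_t tick_us)
{
    return (uint32_t)((uint64_t)us * loops_per_tick / tick_us);
}

/* Run the delay and report how many passes were made. */
static uint32_t run_delay(uint32_t us, uint32_t loops_per_tick,
                          uint32_t tick_us)
{
    uint32_t n = delay_to_iterations(us, loops_per_tick, tick_us);
    uint32_t done = 0;

    while (n != 0) { /* tested before the body, so n == 0 runs nothing */
        n--;
        done++;
    }
    return done;
}
```

With 8000 loops per 10000us tick, a 1us request yields zero passes, a 2us request yields one, and a 100us request yields 80. Very short delays are simply skipped on a slow vCPU, while longer ones still scale with the measured loop rate.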

BTW, for reference, I've noticed that Tilck's bogoMIPS =~ Linux's bogoMIPS / 2. That's simply because on Linux, the loop is implemented in a different way.

I hope I've helped a little clarifying this.
Vlad

Reply 70 of 75, by superfury

Rank l33t
vvaltchev wrote on 2021-08-07, 01:21:
superfury wrote on 2021-08-06, 20:48:

Or does that mean that kernel requires at least one loop per microsecond to properly work and delay? So that would need to be a speed of at least 3MIPS(where it should reach 1 bogoMIPS) with those 3 instructions in the loop being exactly 1us?

Hi 😀
I'm the Tilck guy.

So, with the newest patch the kernel does not require one loop per microsecond in order to work. It's just that, if the vCPU is slower than that and delay = 1us, no iteration will be performed. If the delay is big enough, some loop iterations will be performed instead. About your second question: MIPS in my case does not strictly mean "millions of instructions per second". As you can see, one loop iteration requires more than one instruction. So here MIPS means "millions of iterations per second". I know it might sound confusing, but as you know, physical CPUs take a different amount of time for each instruction depending also on the context, so talking about #instructions/second does not really make sense. That's why "bogoMips" is a "bogus" way to measure time. It's defined simply by a reference loop, no matter how it's implemented, and then we count the iterations of that specific loop both when calculating "bogoMIPS" and when trying to busy-wait for a very short time. As long as the loop code is the same, the time it takes to complete will be roughly what we expect it to be.

BTW, for reference, I've noticed that Tilck's bogoMIPS =~ Linux's bogoMIPS / 2. That's simply because on Linux, the loop is implemented in a different way.

I hope I've helped a little clarifying this.
Vlad

As with physical CPU instruction timing, UniPCemu supports both modes. It can run either in IPS clocking mode (each instruction taking the same amount of time) or in cycle-accurate mode (each instruction taking a specific number of clocks depending on the instruction and the BIU phase relative to the other hardware's clocking). LOCK can still delay the CPU, though (making the instruction take multiple IPS clocks instead), if another device (such as DMA, the video card or another CPU) has control of the bus. The CPU will effectively do NOPs until it can get control of the bus. Also, LOCK currently locks the entire bus and not just the range used by the current instruction (the exact address and operand size, as the documentation says for the 386+). And the lock lasts until the end of any instruction executed (even if triggered by the paging unit, for example).

Reply 71 of 75, by vvaltchev

Rank Newbie
superfury wrote on 2021-08-07, 10:20:

As with physical CPU instruction timing, UniPCemu supports both modes. It can run either in IPS clocking mode (each instruction taking the same amount of time) or in cycle-accurate mode (each instruction taking a specific number of clocks depending on the instruction and the BIU phase relative to the other hardware's clocking). LOCK can still delay the CPU, though (making the instruction take multiple IPS clocks instead), if another device (such as DMA, the video card or another CPU) has control of the bus. The CPU will effectively do NOPs until it can get control of the bus. Also, LOCK currently locks the entire bus and not just the range used by the current instruction (the exact address and operand size, as the documentation says for the 386+). And the lock lasts until the end of any instruction executed (even if triggered by the paging unit, for example).

Not sure why you're mentioning the LOCK prefix. Just in case, I'll copy-paste the code from misc.S:

FUNC(asm_nop_loop):

.loop:
    nop
    sub ecx, 1
    jne .loop

    ret

END_FUNC(asm_nop_loop)

FUNC(asm_do_bogomips_loop):

# Note: these NOPs are important to align the instructions
# in the inner loop. Trying removing them and looping.
# On my machine, the loop count drops in half!

    nop
    nop
    nop

.outer_loop:
    mov eax, BOGOMIPS_CONST

.inner_loop:
    nop
    sub eax, 1
    jne .inner_loop

    mov eax, 1
    lock xadd DWORD PTR __bogo_loops, eax
    test eax, eax
    jns .outer_loop

    ret

END_FUNC(asm_do_bogomips_loop)

As you can see, the asm_do_bogomips_loop() function does not use LOCK on each iteration, but only once every BOGOMIPS_CONST (= 10,000) iterations, exactly because LOCK has a significant overhead. In other words, we count the iterations in a register (EAX) and increment __bogo_loops by 1 once every 10,000 iterations. Later, in the C code, __bogo_loops is multiplied by 10,000. So the overhead of LOCK is ignored, but it's negligible because it happens extremely rarely.
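
A rough way to see how rare the locked instruction is, counted straight from the listing above: each outer pass executes one `mov`, then BOGOMIPS_CONST inner passes of three instructions each, then four more instructions, of which one is the `lock xadd`. The C sketch below just does that count; the helper names are illustrative and not part of Tilck.

```c
#include <stdint.h>

#define BOGOMIPS_CONST 10000u

/* Instructions executed per pass of .outer_loop in the listing. */
static uint64_t instructions_per_outer_pass(void)
{
    return 1u                                /* mov eax, BOGOMIPS_CONST */
         + 3u * (uint64_t)BOGOMIPS_CONST     /* nop, sub, jne per inner pass */
         + 4u;                               /* mov, lock xadd, test, jns */
}

/* Inner-loop passes represented by a given __bogo_loops count. */
static uint64_t total_inner_iterations(uint32_t bogo_loops)
{
    return (uint64_t)bogo_loops * BOGOMIPS_CONST;
}
```

That is one locked instruction in every 30,005 executed. In an emulator where every instruction costs the same, the bookkeeping around the inner loop adds well under 0.02% to the instruction count.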

When we have to delay, the loop used is: asm_nop_loop(), which contains no LOCK prefixes and doesn't need any form of synchronization.

Vlad

Reply 72 of 75, by mr.cat

Rank Member

Hi Vlad, welcome! I had a feeling you might show up 😁

The Linux folks were probably familiar with the BogoMIPS term already, but not everyone is on Linux.
I find that many times reading the actual source code is the best way to remove guesswork.
But in practice that's also a bit difficult to do sometimes (finding enough time to do it, poorly commented/obscure code, a huge codebase, etc. etc. - there's no shortage of excuses 😀).
It's a great shortcut to have the actual author commenting.

That's a good point about context btw. I see you had some comments in that nop loop about alignment playing a big role in performance for instance.

Reply 73 of 75, by vvaltchev

Rank Newbie
mr.cat wrote on 2021-08-08, 22:28:

Hi Vlad, welcome! I had a feeling you might show up 😁

Thanks! I'm happy to help when I can 😀

mr.cat wrote on 2021-08-08, 22:28:

The Linux folks were probably familiar with the BogoMIPS term already, but not everyone is on Linux.
I find that many times reading the actual source code is the best way to remove guesswork.
But in practice that's also a bit difficult to do sometimes (finding enough time to do it, poorly commented/obscure code, a huge codebase, etc. etc. - there's no shortage of excuses 😀).
It's a great shortcut to have the actual author commenting.

That's a good point about context btw. I see you had some comments in that nop loop about alignment playing a big role in performance for instance.

I totally understand that. The source code is what the CPU does, but sometimes it requires almost a sort of reverse engineering effort to understand why.
So, don't hesitate to ask questions about Tilck's code: I'll be happy to answer. Also, by asking questions, we all might realize there is a subtle bug somewhere.
There is always room for improvement.

BTW, I'd be curious to build and run UniPCemu myself. Can I build it on Linux?

Reply 74 of 75, by superfury

Rank l33t
vvaltchev wrote on 2021-08-09, 11:14:
mr.cat wrote on 2021-08-08, 22:28:

Hi Vlad, welcome! I had a feeling you might show up 😁

Thanks! I'm happy to help when I can 😀

mr.cat wrote on 2021-08-08, 22:28:

The Linux folks were probably familiar with the BogoMIPS term already, but not everyone is on Linux.
I find that many times reading the actual source code is the best way to remove guesswork.
But in practice that's also a bit difficult to do sometimes (finding enough time to do it, poorly commented/obscure code, a huge codebase, etc. etc. - there's no shortage of excuses 😀).
It's a great shortcut to have the actual author commenting.

That's a good point about context btw. I see you had some comments in that nop loop about alignment playing a big role in performance for instance.

I totally understand that. The source code is what the CPU does, but sometimes it requires almost a sort of reverse engineering effort to understand why.
So, don't hesitate to ask questions about Tilck's code: I'll be happy to answer. Also, by asking questions, we all might realize there is a subtle bug somewhere.
There is always room for improvement.

BTW, I'd be curious to build and run UniPCemu myself. Can I build it on Linux?

It should build on Linux; I last verified it on Ubuntu 20.
Simply follow the build steps (with SDL or SDL2 installed, of course, and optionally SDL_net/SDL2_net and libpcap).

Reply 75 of 75, by superfury

Rank l33t

The fixes to UniPCemu are now in the latest release of the app for Android/Windows/PSP.
