VOGONS


Reply 40 of 63, by superfury

Rank l33t++

Well, the protection is handled by the MMU during all memory reads/writes (including CPU instruction reads). So when an instruction's read from memory (any instruction opcode/parameter) violates protection, an exception is triggered and executed, after which the CPU core itself aborts executing the instruction when it gets to the 'execution' part. Anything before that simply parses the data as if it were valid data. In fact, the only parts of the full CPU emulation that actually parse such data are the CPU core's REP/REPZ/REPNZ check and the 'execution' phase in opcodes_80*86.c (which executes the instruction itself and decodes modr/m when the instruction uses it). All other parts will try to handle it if possible, or raise an exception if one hasn't been triggered already during the 'decoding' phase by the CPU core (cpu.c) or the modr/m core (modrm.c's modrm_decode8/16/32). modrm_decode8/16/32 is called by modrm_readparams, which is in turn called by the instruction handler itself (opcodes_80*86.c). The instruction handler is called by cpu.c, the 'core' of the CPU handling, which contains all basic CPU functionality and special actions (stack, REP, and the related initialisation calls (MMU, MODR/M, instruction fetching) for every instruction).

Currently, without prefetching, all instruction fetching is handled by CPU_readOP(w/dw). This function in turn calls MMU_r(b/w/dw), which handles protected mode in the MMU (including all checks and exceptions) and accesses memory and MMI/O hardware.
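
The fetch path described above can be sketched roughly as follows. This is only an illustration: ToyCPU, toy_mmu_rb and toy_readOP are hypothetical stand-ins for the real structures, MMU_rb and CPU_readOP, and the check covers the segment limit only (no paging, no MMI/O).

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, simplified state: not UniPCemu's real structures. */
typedef struct {
    uint32_t cs_base;
    uint32_t cs_limit; /* last valid offset in the code segment */
    uint32_t eip;
    bool     fault;    /* set where the emulator would raise #GP */
} ToyCPU;

static uint8_t toy_ram[0x1000];

/* Stands in for MMU_rb: the protection check and the read live together. */
static uint8_t toy_mmu_rb(ToyCPU *cpu, uint32_t offset)
{
    if (offset > cpu->cs_limit) {
        cpu->fault = true;
        return 0xFF;
    }
    return toy_ram[(cpu->cs_base + offset) % sizeof toy_ram];
}

/* Stands in for CPU_readOP: fetch one opcode byte and advance EIP. */
static uint8_t toy_readOP(ToyCPU *cpu)
{
    uint8_t b = toy_mmu_rb(cpu, cpu->eip);
    if (!cpu->fault)
        cpu->eip++;
    return b;
}
```

Because the check sits inside the read, every caller of the read function gets protection for free, which is also why reusing it for prefetching triggers faults early.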

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 41 of 63, by Scali

Rank l33t

I don't see the problem.
Your protected mode is only dependent on values of CS:EIP inside the instruction fetching, right?
And you still emulate the values of CS:EIP inside the instruction fetching, right?
So what does it matter that you don't *actually* read the instruction bytes directly from CS:EIP, but from a buffer that you have fetched earlier?
Literally *everything* should still work the same as before, except for when you actually pass the instruction byte from memory to your CPU fetch/decode routine.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 42 of 63, by superfury

Rank l33t++

When 16 bytes of memory data are prefetched in protected mode and I fill the prefetch buffer from memory with fewer than 16 bytes left in the code segment, it will raise an exception, even when fewer than 16 bytes are actually executed (since the fill will try to read past the end of the code segment).

So it will either execute unfilled data from the buffer without protection when it gets to that point (when using a buffer only), or raise an exception as soon as execution reaches the last 15/16 bytes of the code segment (because loading the buffer through CPU_readOP reads past the end of the code segment).

The complete CPU emulation currently depends on CPU_readOP>MMU_rb raising an exception when reading wrong memory (CS limit, executing data segment, paging etc.).
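
The mismatch can be made concrete with a small sketch. It assumes the limit is the last valid offset of the code segment and a 16-byte queue; both function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define PREFETCH_SIZE 16u /* assumed queue size */

/* True if filling the whole queue from `ip` through a limit-checked read
   would fault. The fill touches offsets ip .. ip + PREFETCH_SIZE - 1. */
static bool checked_prefetch_faults(uint32_t ip, uint32_t limit)
{
    return (uint64_t)ip + PREFETCH_SIZE - 1u > limit;
}

/* True if actually executing an instruction of `length` bytes at `ip`
   would run past the limit. */
static bool execution_faults(uint32_t ip, uint32_t length, uint32_t limit)
{
    return length > 0u && (uint64_t)ip + length - 1u > limit;
}
```

With a limit of 0xFFFF, a 2-byte instruction at 0xFFF8 is perfectly legal to execute, yet a checked 16-byte fill starting there would already fault.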

Reply 43 of 63, by idspispopd

Rank Oldbie
superfury wrote:

This shouldn't be a problem during real mode. But what happens when prefetching in protected mode? How is memory protection and paging handled with prefetching? Does the prefetch not read memory past the segment limit? Will it error out when less than 15 bytes before the end of the code segment? What does it contain in that case?

8086/8088 don't have protected mode. Later CPUs have different prefetch queues (a short search gives 6 bytes for 286, 16 bytes for 386). I don't know if a cycle-exact emulation for anything but a 4.77 MHz 8088 is actually useful.

Reply 44 of 63, by Scali

Rank l33t
superfury wrote:

When 16 bytes of memory data are prefetched in protected mode and I fill the prefetch buffer from memory with fewer than 16 bytes left in the code segment, it will raise an exception, even when fewer than 16 bytes are actually executed (since the fill will try to read past the end of the code segment).

Erm, why?
Are you telling me that you actually use the physical CPU's MMU for your emulation?
That is, you expect your emulated code segments to map exactly to protected memory segments on the host OS, and expect the host OS to trigger a native exception for you?

Because I assume you don't do that (which you shouldn't), in which case you can easily read 16 bytes from anywhere.
At the end of the emulated memory space, you wrap around, which can be done with a simple masking operation. Just like the real hardware would do (somehow this sentence comes up far too often in a discussion about emulators... An emulator should do *exactly* what the real hardware does, by definition).
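
A minimal sketch of such a masked read, assuming a power-of-two emulated RAM size (1 MiB here); ram and raw_read are made-up names for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define RAM_SIZE 0x100000u        /* 1 MiB; must be a power of two */
#define RAM_MASK (RAM_SIZE - 1u)

static uint8_t ram[RAM_SIZE];

/* Copy `count` bytes starting at `addr`, wrapping at the end of RAM.
   No protection checks and no exceptions: this is a plain copy. */
static void raw_read(uint32_t addr, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = ram[(addr + (uint32_t)i) & RAM_MASK];
}
```

A read that starts at the last byte of RAM simply continues at address 0 instead of running off the end of the host array.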

How hard can it be, really, to fetch data *exactly* as you did before, but not *during* the instruction decoding, instead *before* the instruction decoding? Fetching and decoding of an instruction is a linear memory operation by definition.
The exceptions don't have to occur when you fetch into the prefetch buffer, just when you perform the instruction fetching/decoding. Which is where they always were I assume.

I really don't see the problem.
I mean, if you go for the total no-brainer approach, you just make a copy of all your functions that are called during the emulation of an instruction.
With this new set of 'prefetch' functions, you remove any of the actual CPU-state emulation, keeping only the functionality that fetches the bytes from memory.

You modify the original functions to fetch their data from the buffer instead of from memory.
For each instruction, you first run the new 'prefetch' functions until the buffer is full. Then you run the original functions to emulate the instruction.

That's it.

Reply 45 of 63, by superfury

Rank l33t++

The problem is that if I remove the call to MMU_rb (which is also used during normal memory accesses by the CPU) and replace it with a direct memory approach (needing a complete copy of the entire memory module), the CPU emulation won't have ANY protection left anymore. No exceptions will be triggered for ANY illegal memory execution (like the ones I explained before, limits etc.). ALL memory accesses go through the MMU_r/w(b/w/dw) functions to handle memory protection and access to MMI/O devices atm. If I remove the protection from that part, the CPU emulation won't handle any illegal cases anymore. The CS segment won't have ANY protection on it. It could be executing data or whatever it finds at that memory location without checks, just like real mode, even in protected mode.

Or I would have to make a copy of MMU_rb, remove all segment & paging checks from it (keeping the mapping to emulated RAM, of course), use that function to fill the buffer, and move the removed segment & paging protection calls to CPU_readOP to keep the protection on any memory (including page faults).

Last edited by superfury on 2015-11-16, 15:31. Edited 1 time in total.

Reply 46 of 63, by Scali

Rank l33t

Why would you remove the call?
I really don't see your problem.
How hard is it to do:
1) Perform MMU_rb() to handle protection
2) Read the byte you're actually going to feed to your CPU emulation routine from the prefetch buffer

Reply 47 of 63, by superfury

Rank l33t++

If I call MMU_rb (which performs both the paging and segment protection emulation as well as reading the data from emulated memory) for any illegal segment:offset from the prefetch buffer filling routine, it will always raise exceptions once execution gets within 16 bytes of the end of the code segment:

IP+prefetchsize>=CS.limit raises #GP, no matter how many bytes in the buffer are actually executed.

Thus MMU_rb cannot be used for prefetching, only for normal memory operations(execution phase).

Reply 48 of 63, by Scali

Rank l33t
superfury wrote:

If I call MMU_rb (which performs both the paging and segment protection emulation as well as reading the data from emulated memory) for any illegal segment:offset from the prefetch buffer filling routine, it will always raise exceptions once execution gets within 16 bytes of the end of the code segment:

Obviously you shouldn't do that for prefetching 😀
You should create a 'dummy' function, as I said, which only reads the bytes into the buffer, but doesn't raise any exceptions, since these checks will be done during execution (where you use MMU_rb to read the bytes, but replace them with the bytes from the prefetch buffer before decoding the actual instruction).
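
The split described here might look something like the sketch below: an unchecked fill routine, plus a checked fetch that performs the protection test at execution time and then takes the byte from the queue. All names (Toy, prefetch_fill, fetch_byte) are hypothetical, and the check is reduced to a segment limit test.

```c
#include <stdint.h>
#include <stdbool.h>

#define QUEUE_SIZE 16u
#define RAM_MASK   0xFFFFu

typedef struct {
    uint8_t  ram[RAM_MASK + 1u];
    uint32_t cs_limit;           /* last valid offset */
    uint32_t ip;
    uint8_t  queue[QUEUE_SIZE];
    uint32_t queue_ip;           /* offset that queue[0] was read from */
    uint32_t queue_len;
    bool     fault;              /* set where #GP would be raised */
} Toy;

/* 'Dummy' fill: copies bytes into the queue and never checks the limit. */
static void prefetch_fill(Toy *t)
{
    t->queue_ip = t->ip;
    for (t->queue_len = 0; t->queue_len < QUEUE_SIZE; t->queue_len++)
        t->queue[t->queue_len] = t->ram[(t->ip + t->queue_len) & RAM_MASK];
}

/* Checked fetch: the protection test happens here, per executed byte.
   The byte itself comes from the queue when the queue covers it. */
static uint8_t fetch_byte(Toy *t)
{
    if (t->ip > t->cs_limit) {
        t->fault = true;
        return 0xFF;
    }
    uint32_t idx = t->ip - t->queue_ip;
    uint8_t b = (idx < t->queue_len) ? t->queue[idx]
                                     : t->ram[t->ip & RAM_MASK];
    t->ip++;
    return b;
}
```

Filling the queue two bytes before the limit raises nothing; the fault only appears when execution itself steps past the limit.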

Reply 49 of 63, by Scali

Rank l33t
idspispopd wrote:

8086/8088 don't have protected mode. Later CPUs have different prefetch queues (a short search gives 6 bytes for 286, 16 bytes for 386). I don't know if a cycle-exact emulation for anything but a 4.77 MHz 8088 is actually useful.

This isn't about cycle-exactness.
This is about self-modifying code.
This trick is also often used for anti-debugging, because a debugger will interfere with the prefetch buffer.
This also works in protected mode.
See here, under "Hardware tricks": http://pferrie.tripod.com/papers/unpackers.pdf
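
To illustrate why the queue matters for self-modifying code, here is a toy model (all names hypothetical, 4-byte queue as on the 8088): a write to an address that has already been prefetched changes memory, but the fetch still sees the old queued byte.

```c
#include <stdint.h>

#define QUEUE_SIZE 4u /* the 8088 queue holds 4 bytes */

typedef struct {
    uint8_t ram[256];
    uint8_t queue[QUEUE_SIZE];
    uint8_t queue_start;  /* address that queue[0] was read from */
} ToyBus;

/* Fill the queue with the bytes at addr .. addr + QUEUE_SIZE - 1. */
static void prefetch(ToyBus *b, uint8_t addr)
{
    b->queue_start = addr;
    for (unsigned i = 0; i < QUEUE_SIZE; i++)
        b->queue[i] = b->ram[(uint8_t)(addr + i)];
}

/* A data write updates memory only; the queue is not snooped. */
static void write_byte(ToyBus *b, uint8_t addr, uint8_t value)
{
    b->ram[addr] = value;
}

/* Instruction fetch: queued bytes win over the current memory contents. */
static uint8_t fetch(const ToyBus *b, uint8_t addr)
{
    uint8_t idx = (uint8_t)(addr - b->queue_start);
    return (idx < QUEUE_SIZE) ? b->queue[idx] : b->ram[addr];
}
```

Code that patches an instruction a byte or two ahead of itself therefore behaves differently depending on whether the queue was flushed in between, for example by a debugger's single-step.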

Reply 50 of 63, by superfury

Rank l33t++

What about MMI/O, like VGA VRAM? When the prefetch reads VRAM that is protected (but the protection is ignored, or the VRAM is simply too far past the code to be executed), it might cause the VGA memory latches to be loaded when they actually shouldn't be? Like REP STOSW; RET just before VRAM, at address 9000:FFFD (real mode)?

Reply 51 of 63, by Jepael

Rank Oldbie

Interesting corner case. But in real mode, and on an 8088/8086, the IP is limited to 16 bits anyway, so it would not fetch bytes beyond 9000:FFFF. I think most likely an 8088/8086 would wrap to 9000:0000. I mean, what would touch the segment register to increase it? But actually, I have never tried that, although it is a relatively easy case to test on a real CPU: fill a 64k segment with NOPs, put a RETF at offset 0, and then call far segment:FF00, for example.
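
Under that assumption (a 16-bit IP that wraps while the segment register stays untouched, which is not verified on hardware here), the address arithmetic would look like this; the helper names are made up.

```c
#include <stdint.h>

/* 20-bit physical address of segment:ip in real mode. */
static uint32_t real_mode_address(uint16_t segment, uint16_t ip)
{
    return (((uint32_t)segment << 4) + ip) & 0xFFFFFu;
}

/* The next fetch offset: IP is 16 bits wide, so FFFF wraps to 0000. */
static uint16_t next_ip(uint16_t ip)
{
    return (uint16_t)(ip + 1u);
}
```

So after 9000:FFFF (physical 9FFFF) the next fetch would land at 9000:0000 (physical 90000), not at A0000 where the VRAM starts.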

Reply 52 of 63, by Scali

Rank l33t
superfury wrote:

What about MMI/O, like VGA VRAM? When the prefetch reads VRAM that is protected (but the protection is ignored, or the VRAM is simply too far past the code to be executed), it might cause the VGA memory latches to be loaded when they actually shouldn't be? Like REP STOSW; RET just before VRAM, at address 9000:FFFD (real mode)?

I suggest you try it on real hardware. I wouldn't be surprised if a real CPU would actually prefetch into the VRAM. Because to me it seems unlikely that they handled such corner-cases in hardware as well.

Reply 53 of 63, by superfury

Rank l33t++

I don't have real hardware to test it with, unfortunately.

I've implemented the FIFO buffer in my latest build.

CPU_readOP now tries to read from the FIFO if possible (it is fully filled for every instruction executed). When the FIFO is empty, the FIFO's EIP (where the fill for the next instruction starts) is updated with the current incremented EIP. After that, the byte is read from memory (this also happens without the FIFO).
Instructions updating CS or EIP clear the FIFO.
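
A rough sketch of that scheme, not the actual UniPCemu code: a small ring buffer, a fetch that falls back to memory when the buffer is empty, and a flush for instructions that change CS or EIP. Fifo, read_op and the fixed 16-byte size are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define FIFO_SIZE 16u

typedef struct {
    uint8_t  data[FIFO_SIZE];
    unsigned head;   /* index of the oldest byte */
    unsigned count;  /* number of bytes currently queued */
} Fifo;

static bool fifo_push(Fifo *f, uint8_t b)
{
    if (f->count == FIFO_SIZE)
        return false;                     /* full: caller retries later */
    f->data[(f->head + f->count) % FIFO_SIZE] = b;
    f->count++;
    return true;
}

static bool fifo_pop(Fifo *f, uint8_t *out)
{
    if (f->count == 0)
        return false;
    *out = f->data[f->head];
    f->head = (f->head + 1u) % FIFO_SIZE;
    f->count--;
    return true;
}

/* Called by anything that loads CS or EIP (jumps, calls, returns...). */
static void fifo_flush(Fifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Fetch one opcode byte: FIFO first, emulated memory as the fallback. */
static uint8_t read_op(Fifo *f, const uint8_t *mem, uint32_t *eip)
{
    uint8_t b;
    if (!fifo_pop(f, &b))
        b = mem[*eip];
    (*eip)++;
    return b;
}
```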

8088MPH now runs fully. The music at the end sounds beepy though. This is probably because 512 buffer samples are filled/sampled at once, depending on the set frequency. Should I convert this to a FIFO that is filled one sample at a time at a 44100Hz rate in the CPU thread, and that is only read by the rendering callback (reading from a converted 16-bit FIFO)? Of course the rendering callback falls back to the normal rendering when not enough samples are provided.

Reply 54 of 63, by superfury

Rank l33t++

I've moved PC Speaker buffering to the CPU thread (tickSpeakers function). Somehow I hear nasty pops in the sound? Does anyone know why?

https://bitbucket.org/superfury/x86emu/src/dd … ker.c?at=master

Reply 55 of 63, by superfury

Rank l33t++

I've just tested the new PC speaker emulation against 8088 MPH. The music seems fine for the most part (except for some CPU-heavy parts, where the sound changes very slowly).
It seems the FIFO buffer emulation slows the CPU emulation down a lot (it runs at about 0.2-0.3 MIPS; integer instructions and register-to-register operations score ~0.50 according to MIPS 1.10).

When it comes to the 8088 MPH credits, all I hear is a solid tone (some low C tone, I think) instead of the MOD music? Does anyone know what's going wrong in my PC speaker emulation?

Also when it finishes, the VGA cursor position seems to be around the center of the ">" character instead of at the bottom? This is probably because the CRTC of the VGA is incompatible?

Can anyone explain to me how the MOD playback is done from a hardware point of view (PIT -> PC speaker output)? I already know it counts down from the set value to 0, after which it does something with the output signal (a transition from high to low, from low to high, or both at the same time)? So this square wave is sent to the PC speaker. What happens to it, when you look at the square wave sent and the resulting sampled output? What is the output in this case? Any information on how to implement it in my emulation? I calculate samples at a rate of 44.1kHz. Each sample is calculated using currentfunction(), frequency and time.
The audio thread is stopped while generating samples in tickSpeakers(). This function is called after every CPU instruction executed. It should take care of updating the sound buffers for the PC speaker.
The speakerCallback function simply reads the buffer and gives the renderer output.

Does anyone know how to fix this? Does anyone have some formulas on how to handle the output correctly?

Reply 56 of 63, by Jepael

Rank Oldbie
superfury wrote:

When it comes to the 8088 MPH credits, all I hear is a solid tone (some low C tone, I think) instead of the MOD music? Does anyone know what's going wrong in my PC speaker emulation?

Do you support the PIT mode used by 8088 MPH to generate pulse width modulation?
Even if you do, do you just simply skip timer output bits and take the output every 27 timer ticks to get 44100 Hz output rate? If so, you throw away audio data.

It may not be worth it to emulate it like a real timer chip would, it would be enough to understand the pulse width values loaded into timer as PCM data.
But then the hardest part is to determine the range of pulse widths, which depends on the playback sampling rate, but I think assuming the range is 8-bit is fine.
Also the playback sampling rate needs to be determined, as it may be arbitrary. It can be measured by time between loading the PIT counter register I suppose.
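
As a sketch of this shortcut, the counter value written to the PIT can be mapped straight to a signed 16-bit PCM sample. The function name, the linear mapping and the polarity (longer pulse means higher level) are assumptions; max_count would be 255 for the 8-bit range suggested above.

```c
#include <stdint.h>

/* Map a pulse width (PIT count) in 0..max_count to -32767..+32767.
   Assumes a linear relation and that a longer pulse is a higher level. */
static int16_t pulse_width_to_pcm(uint16_t count, uint16_t max_count)
{
    if (max_count == 0)
        return 0;                 /* no usable range: output silence */
    if (count > max_count)
        count = max_count;        /* clamp out-of-range widths */
    return (int16_t)(((int32_t)count * 65534) / max_count - 32767);
}
```

Each converted value would then be held until the next counter load, at whatever playback rate was measured between loads.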

Ultimately the most theoretically correct way of emulating the speaker would be to downsample the 1193180 Hz timer output bitstream with a proper anti-alias lowpass filter to get 44100 Hz playback rate. Again this would consume too much CPU time so maybe some neat tricks like band-limited steps could be used.
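
For comparison, a much cruder downsampler than the proper anti-alias filter described above is a plain box filter: average the speaker bit over each output sample period (27 or 28 PIT ticks). The sketch below assumes one byte per PIT tick holding the output level (0 or 1); downsample_box is a made-up name, and a box filter will still alias more than a real lowpass.

```c
#include <stdint.h>
#include <stddef.h>

#define PIT_HZ    1193180u
#define OUTPUT_HZ 44100u

/* Average `nbits` speaker levels (one per PIT tick) into signed 16-bit
   samples at OUTPUT_HZ. Returns the number of samples written. */
static size_t downsample_box(const uint8_t *bits, size_t nbits,
                             int16_t *out, size_t max_out)
{
    size_t   n_out = 0;
    uint32_t phase = 0;           /* accumulates OUTPUT_HZ per PIT tick */
    uint32_t high = 0, total = 0; /* ticks at level 1 / ticks seen */

    for (size_t i = 0; i < nbits; i++) {
        high += bits[i] ? 1u : 0u;
        total++;
        phase += OUTPUT_HZ;
        if (phase >= PIT_HZ) {    /* one output period has elapsed */
            phase -= PIT_HZ;
            if (n_out == max_out)
                break;
            /* Duty cycle 0..1 mapped to -32767..+32767. */
            out[n_out++] = (int16_t)((2 * (int64_t)high - total)
                                     * 32767 / total);
            high = 0;
            total = 0;
        }
    }
    return n_out;
}
```

The phase accumulator keeps the long-term rate at exactly 44100 samples per 1193180 ticks, rather than rounding every period to 27 ticks.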

Reply 57 of 63, by Scali

Rank l33t

The MOD player is cycle-exact code.
That is, it doesn't use the timer to time the samples. The code is written so that each sample takes exactly 288 cycles, resulting in a replay rate of 16.5 kHz on a 4.77 MHz 8088. If your CPU emulation isn't cycle-exact, it means that the 'samples' it outputs (PWM values to PIT) will not be correct for the effective sample rate (the carrier frequency), which means the whole idea of PWM modulation falls apart, and it will sound distorted.
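
The numbers can be checked with a few lines. This assumes the standard PC clock tree: a 14.31818 MHz crystal divided by 3 for the CPU and by 12 for the PIT, so one PIT tick is 4 CPU cycles. The helper names are made up.

```c
/* Assumed PC/XT clocks: 14.31818 MHz crystal, CPU = /3, PIT = /12. */
#define CRYSTAL_HZ 14318180.0
#define CPU_HZ     (CRYSTAL_HZ / 3.0)

/* Replay rate for a player that spends a fixed cycle count per sample. */
static double sample_rate_hz(double cycles_per_sample)
{
    return CPU_HZ / cycles_per_sample;
}

/* PIT ticks available per sample: 4 CPU cycles per PIT tick. */
static double pit_ticks_per_sample(double cycles_per_sample)
{
    return cycles_per_sample / 4.0;
}
```

For 288 cycles this gives about 16.57 kHz, in line with the roughly 16.5 kHz quoted above, and 72 PIT ticks per sample, which bounds the usable pulse width range.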

The MOD player is probably the hardest effect in 8088 MPH to get right in an emulator. Note that it also relies on emulating DMA refresh and CGA waitstates correctly (for the scroller updates), in order to get the code to be 288 cycles in all cases.
Although, to be fair, we've done a spectrum analysis, and it appeared that not all possible variations of the sample code were exactly 288 cycles. So the mod player doesn't run at exactly 16.5 kHz, but there is a tiny bit of fluctuation here and there.

Reply 58 of 63, by superfury

Rank l33t++

Do you know how to build my PC speaker emulation so that playback by software gives the correct PCM output (assuming the software outputs samples at the correct speed, like using PIT0 for timing)?

Edit: I've tried running the current version (build 2015/11/23 8:53) on my Intel i7@4.0GHz. The CPU doesn't run at full speed (according to MIPS 1.10 it's running at 0.77MIPS). 8088MPH says it's about 1XX% difference between what it's expecting and what it needs to run correctly.

The end credits music seems to give partly 'correct' output: I definitely hear some kind of 'drums' together with what sounds like some bubbly sounds (some kind of instrument with tones)? Can anyone tell me more about this?

Attachment: x86EMU_20151123_0853.zip (347.12 KiB, 45 downloads)
File comment: x86EMU build 2015/11/23 8:53
File license: Fair use/fair dealing exception

Reply 59 of 63, by superfury

Rank l33t++

There's one thing I'm still wondering: if the CPU reads (and thus removes) a byte from the prefetch input queue (which is replaced with the next byte from memory using a circular FIFO), how can the CPU execute REP(Z/NZ) instructions while looping this way? The CPU will first read the prefix (REP/REPNZ), after which a new byte could be read by the prefetch unit. So while the CPU is executing a 2-byte instruction like REP MOVSB, those two bytes are removed from the FIFO. In a circular FIFO, their slots could already have been refilled with bytes of the following instruction(s). So when the CPU tries to execute the instruction again, it executes bytes of the following instruction(s) from memory instead. So how does the CPU handle this? Does it even use a circular FIFO (my emulation does)?

All instructions are fetched from the FIFO, and during the fetch their bytes are removed from it, to be overwritten either during the following instruction or on overflow (a buffer refill for instructions larger than the queue).

If possible, my emulation reverts the FIFO back to the start of the instruction (instruction length up to the FIFO size) or reloads the FIFO from memory (instruction larger than the FIFO size).
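
The rewind described in the last paragraph could be sketched like this. It assumes the consumed slots are not refilled until the instruction has completed, and all names (Queue, begin_instruction, rewind_instruction) are hypothetical. F3 A4 is the encoding of REP MOVSB.

```c
#include <stdint.h>

#define FIFO_SIZE 16u

typedef struct {
    uint8_t  data[FIFO_SIZE];
    unsigned read_pos;    /* index of the next byte to consume */
    unsigned insn_start;  /* read_pos when the current instruction began */
    unsigned consumed;    /* bytes consumed by the current instruction */
} Queue;

/* Remember where the instruction starts so it can be replayed. */
static void begin_instruction(Queue *q)
{
    q->insn_start = q->read_pos;
    q->consumed = 0;
}

static uint8_t consume(Queue *q)
{
    uint8_t b = q->data[q->read_pos];
    q->read_pos = (q->read_pos + 1u) % FIFO_SIZE;
    q->consumed++;
    return b;
}

/* Returns 1 if the queue was rewound to the start of the instruction,
   0 if the instruction was longer than the queue, in which case the
   caller has to reload the queue from memory instead. */
static int rewind_instruction(Queue *q)
{
    if (q->consumed > FIFO_SIZE)
        return 0;
    q->read_pos = q->insn_start;
    q->consumed = 0;
    return 1;
}
```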
