VOGONS


Reply 20 of 63, by Scali

Rank l33t

This is the code in the sprite part that synchronizes the drawing of the sprite with the beam position.
It does this by setting up the timer interrupt at roughly 60 Hz (19912 ticks, which is 262 scanlines × 76 ticks, or about 59.92 Hz), synchronized to the vertical blank interval.
This means that the counter starts at 19912 ticks for a frame, and counts down to 0 as the beam travels over the screen. Each scanline takes exactly 76 ticks.
We know the sprite's position and its height, so we can determine at which scanline the beam will draw the sprite, and from that we can calculate the counter value at that scanline.
Then we wait for the counter to reach that value.
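The arithmetic above can be sketched in C. This is a minimal illustration, not the demo's actual code: it assumes the counter is reloaded with 19912 at scanline 0 and counts down one tick at a time, and the names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical constants: 76 PIT ticks per CGA scanline, 262 scanlines per frame. */
#define TICKS_PER_SCANLINE  76u
#define SCANLINES_PER_FRAME 262u
#define TICKS_PER_FRAME     (TICKS_PER_SCANLINE * SCANLINES_PER_FRAME) /* 19912 */

/* Counter value the PIT should show when the beam reaches 'scanline',
   counting scanlines from the moment the counter was reloaded.
   The counter counts down, so later scanlines give smaller values.
   Valid for scanline < SCANLINES_PER_FRAME. */
static uint16_t expected_count(unsigned scanline)
{
    return (uint16_t)(TICKS_PER_FRAME - scanline * TICKS_PER_SCANLINE);
}
```

For example, for a sprite starting on scanline 100 the loop would wait until the counter drops to 12312 or below.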

This is the code; the expected counter value is in dx:

@@waitRaster:
mov al, 00000000b ; al = channel in bits 6 and 7, remaining bits clear
out 043h, al ; Send the latch command

in al, 040h ; al = low byte of count
mov ah, al ; ah = low byte of count
in al, 040h ; al = high byte of count
xchg al, ah ; al = low byte, ah = high byte (ax = current count)

cmp ax, dx
ja @@waitRaster
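On the emulator side, this loop relies on the 8253 counter-latch command: a control word written to port 43h with bits 5-4 = 00 freezes a snapshot of the selected counter, and the next two reads from that counter's port return the low byte and then the high byte of the snapshot, even while the live counter keeps running. A minimal C model, with hypothetical names and assuming the "LSB then MSB" access mode the BIOS uses for channel 0, might look like this:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one 8253 channel's read-back path. */
typedef struct {
    uint16_t count;      /* live counter value, updated by the timer emulation */
    uint16_t latched;    /* snapshot taken by the latch command */
    bool     is_latched; /* true until both latched bytes have been read */
    bool     read_high;  /* false: next read returns the low byte */
} pit_channel;

/* Control word written to port 43h. Bits 7-6 select the channel,
   bits 5-4 = 00 means "latch counter". Other commands are ignored here,
   and a second latch before the first one is read is ignored too. */
static void pit_write_control(pit_channel ch[3], uint8_t value)
{
    unsigned sel = value >> 6;
    if (sel < 3 && ((value >> 4) & 3) == 0 && !ch[sel].is_latched) {
        ch[sel].latched = ch[sel].count;
        ch[sel].is_latched = true;
        ch[sel].read_high = false;
    }
}

/* Read from port 40h + n: low byte first, then high byte. */
static uint8_t pit_read_counter(pit_channel *c)
{
    uint16_t v = c->is_latched ? c->latched : c->count;
    uint8_t byte = c->read_high ? (uint8_t)(v >> 8) : (uint8_t)(v & 0xFF);
    if (c->read_high && c->is_latched)
        c->is_latched = false; /* both bytes consumed: release the latch */
    c->read_high = !c->read_high;
    return byte;
}
```

If the latch is not honoured (for example, if each read returns a freshly interpolated value), the two bytes can come from different moments, which is one way such a loop can misbehave.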

It looks like your counter is somehow not working properly, so the code never exits the loop. (The counter also triggers an interrupt at 60 Hz, and the music player runs in that interrupt, so the music keeps playing during this wait loop.)

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 21 of 63, by superfury

Rank l33t++

The counters are interpolated when they are read this way, through ports 0x40-0x42:

https://bitbucket.org/superfury/x86emu/src/e6 … pit.c?at=master

Look at the functions out8253 and in8253 for the calls from the CPU. updatePITstate updates the current value in the counter based on the high-resolution timer, updatePIT0 (its name is still incorrect) updates the timers before every CPU instruction, and cleanPIT0 discards pending time. The latter is for when the emulator has been paused and resumed: otherwise the timers would have a huge amount of time to process, instead of no time at all (it was paused, after all).
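As a generic illustration of deriving a counter value from elapsed time (a sketch under stated assumptions, not the code in pit.c), the value of a periodic down-counter can be computed directly from the time since it was loaded. This assumes the 1193182 Hz PIT input clock, a mode-2-style counter that counts reload, reload-1, ..., 1 and then reloads, and elapsed time in nanoseconds; the function name is hypothetical.

```c
#include <stdint.h>

#define PIT_HZ 1193182ull /* PIT input clock: 14.31818 MHz / 12 */

/* Current 16-bit value of a down-counter loaded with 'reload'
   (0 means 65536) 'elapsed_ns' nanoseconds ago, decrementing once per
   PIT clock. Note: elapsed_ns * PIT_HZ overflows 64 bits after roughly
   4 hours of emulated time; a real emulator would accumulate ticks
   incrementally instead. */
static uint16_t pit_count_at(uint16_t reload, uint64_t elapsed_ns)
{
    uint64_t period = reload ? reload : 65536u;
    uint64_t ticks  = elapsed_ns * PIT_HZ / 1000000000ull;
    uint64_t value  = period - (ticks % period); /* in 1..period */
    return (uint16_t)(value & 0xFFFF);           /* 65536 reads as 0 */
}
```

Because the value is recomputed from a single time base, successive reads are always consistent with each other, which a polling loop like the one in the demo depends on.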

Can you see what's going wrong here?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 22 of 63, by Scali

Rank l33t

The code looks correct at first sight.
Another issue could be speed-related. I have a safeguard value of 1 scanline (I believe), because that is about the time it takes to poll the counter. So if my sprite is too low on the screen (a very small value in dx), I skip the loop, to avoid polling for a value that I can never reach because the counter would wrap around every time.
If your emulator runs slower than a real 8088 at 4.77 MHz, perhaps the safeguard of 76 ticks is not enough, and you still get deadlocked when the sprite is at the bottom of the screen (which is where it starts in that effect).
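The deadlock can be reproduced with a small simulation. This is a hypothetical model, not the demo's code: each poll of the counter is assumed to cost a fixed number of PIT ticks, and the counter runs from the reload value down to 1 and then wraps. If the target is smaller than the cost of one poll, the loop can keep sampling just above the target and then just after the wrap, never seeing a value at or below it.

```c
#include <stdint.h>

/* Simulate the polling loop: start at 'start', sample every 'poll_cost'
   ticks, and stop once a sample is <= 'target' (the loop only continues
   while count > target, like "cmp ax, dx / ja"). Returns the number of
   polls taken, or -1 if it gave up after 'max_polls' (a deadlock). */
static long polls_until_exit(uint32_t reload, uint32_t start, uint32_t target,
                             uint32_t poll_cost, long max_polls)
{
    uint32_t count = start;
    for (long polls = 1; polls <= max_polls; polls++) {
        if (count <= target)
            return polls;
        /* Advance by poll_cost ticks; counts stay in 1..reload. */
        uint32_t step = poll_cost % reload;
        count = (count > step) ? count - step : count + reload - step;
    }
    return -1;
}
```

With a reload of 19912 and a poll cost of 76 ticks, a target of 5 is never reached, while a target of 100 is reached on the 262nd poll. A slower emulated CPU means a larger poll cost, so a fixed safeguard can become too small.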

Reply 23 of 63, by superfury

Rank l33t++

I've fixed the timing a bit: https://bitbucket.org/superfury/x86emu/src/24 … pit.c?at=master.

The application tells me it detects the CPU running at 0% of the speed of an 8086. According to MIPS 1.10, however, it's actually running at up to 2.56x 8086 speed:

General instructions: 1.50
Integer instructions: 2.56
Memory to memory: 1.76
Register to register: 2.35
Register to memory: 1.37

Performance rating: 1.74

So it's actually running really fast (1.74-2.56x 8086 speed). Is this a problem?

Edit: I changed the timings using the ms-cycle (DOSBox-style) setting in the BIOS menu. I'm now getting:
General instructions: 0.84
Integer instructions: 1.48
Memory to memory: 1.02
Register to register: 1.35
Register to memory: 0.78

Performance rating: 1.00

Edit2:
It still hangs on the static (unchanging) white-noise screen with the above timings, but the 4K graphics mode is more readable now: it's even recognizable, and I can read "4K colors" and other pieces of text with such a slow CPU. The same goes for the coloured text part with 16 colours (256 at most, with hacks), although it's still blocky, like text mode.

Reply 24 of 63, by vladstamate

Rank Oldbie

Do you run your 8253 in a separate thread from your CPU? I used to do that, but it never really worked properly: when booting the XT BIOS (or the IBM AT BIOS), it would sometimes error out, because those BIOSes require the number of CPU cycles over a given number of 8253 ticks to be accurate to within +/- 10%. So instead, for an XT machine, I take advantage of the 4:1 ratio between the CPU clock and the 8253 clock: every 4 CPU cycles I run one tick of the 8253, in the same thread, with no complications. When I turn on the mode where each instruction takes roughly the correct number of cycles (from here: http://www.eecs.wsu.edu/~aofallon/ee234/hando … ts/x86times.pdf and here http://ece425web.groups.et.byu.net/stable/lab … tructionSet.pdf), Topbench correctly recognizes it as a 4.77 MHz IBM PC XT.

Depending on how your CPU run loop is designed, this might be easier or harder to implement (if you don't already do that).
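The single-threaded 4:1 scheme can be sketched as a cycle accumulator. On the XT the CPU runs at 14.318 MHz / 3 and the PIT at 14.318 MHz / 12, so the PIT advances once per four CPU clocks. The struct and function names here are hypothetical, and the PIT work is reduced to a tick counter.

```c
#include <stdint.h>

/* Hypothetical glue between the CPU core and the PIT, same thread. */
typedef struct {
    uint32_t leftover;  /* CPU clocks not yet converted into PIT ticks */
    uint64_t pit_ticks; /* total PIT ticks issued (stand-in for real PIT work) */
} clock_glue;

static void pit_tick(clock_glue *g)
{
    g->pit_ticks++;     /* a real emulator would decrement the counters here */
}

/* Call after each instruction with the number of CPU clocks it took. */
static void advance_cpu_cycles(clock_glue *g, uint32_t cpu_cycles)
{
    g->leftover += cpu_cycles;
    while (g->leftover >= 4) { /* 4 CPU clocks per PIT clock on the XT */
        g->leftover -= 4;
        pit_tick(g);
    }
}
```

Keeping the remainder in leftover means instructions whose cycle counts are not multiples of 4 still add up to the correct number of PIT ticks over time.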

I need to run MIPS too so I can compare with your results.

Regards,
Vlad.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 25 of 63, by Scali

Rank l33t
superfury wrote:

The application does tell me it's detecting the CPU running at 0% speed of a 8086 (according to MIPS 1.10, it's actually running at up to 2.56x 8086 speed.

Well, whatever happens, it should never report 0%, especially not if it's actually faster.
I think that is another indication that something is wrong with your counter somewhere.

Edit:
I believe this line is not required:

if (calculatedpitstate[channel] == 65536) calculatedpitstate[channel] = 0; //We start counting from 0 instead of 65536!

Because this line cancels it out:

calculatedpitstate[channel] &= 0xFFFF; //Convert it to 16-bits value of the PIT!

Besides, I don't think you should actually be counting from 0. Semantically, the timer counts down from 65536, but you can only read 16 bits from it, so that value appears as 0. Your AND with 0xFFFF already takes care of that, though.
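The point can be shown in a few lines of C. A reload value written as 0 behaves as 65536 internally, and because the visible value is masked to 16 bits, an internal 65536 already reads as 0, so a separate special case is redundant. The helper names are hypothetical.

```c
#include <stdint.h>

/* The period the counter actually counts: writing 0 means 65536. */
static uint32_t pit_period(uint16_t written_reload)
{
    return written_reload ? written_reload : 65536u;
}

/* What a program reads back: only the low 16 bits of the internal state. */
static uint16_t pit_visible(uint32_t internal_count)
{
    return (uint16_t)(internal_count & 0xFFFF);
}
```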

Reply 26 of 63, by superfury

Rank l33t++

I've updated the PIT:
https://bitbucket.org/superfury/x86emu/src/87 … pit.c?at=master

It seems to keep running now. It lags a bit at times, but continues as it should. I get some screens with static noise. Eventually the Hornet etc. part appears, with the pyramid moving as it should (but in b/w, of course). After that come a glowing (pulsing stronger and weaker) rotating cube and pentagon. Those are all in b/w and tripled, like the 'tunnel' effect that starts after the first b/w screen (I can't recall its name at the moment). Next comes a big bar of noise. Then half of the screen is coloured (it looks like blinking text), with "Hornet, CRTC, and DESiRE" in a black area at the bottom left.

The music hangs at this point. The text-mode cursor is blinking at about position 5,24 (text-mode character position).

The log of instructions executing at that point:

Attachment: 8088mph_log2.zip (101.43 KiB), "Final screen debugger log." (License: fair use/fair dealing exception)

Reply 27 of 63, by Scali

Rank l33t
superfury wrote:

Music hangs at this point. The text mode cursor is blinking at about position 5,24(textmode character cursor position).

If you mean the end-part... It contains some self-modifying code where an instruction is overwritten which is already in the prefetch buffer. This means that the unmodified instruction gets executed, and the modified instruction doesn't get executed until the next iteration.
If you execute the modified instruction right away, it will lock up. (PCem and DOSBox do that anyway; all real CPUs support this.)

Reply 28 of 63, by superfury

Rank l33t++

Any tips on how to achieve this 'prefetch buffer' emulation? Currently it fetches all instructions directly from memory using CS and (E)IP.

Reply 29 of 63, by Scali

Rank l33t
superfury wrote:

Any tips on how to achieve this 'prefetch buffer' emulation? Currently it fetches all instructions directly from memory using CS and (E)IP.

Well, I don't think you need perfect emulation for our demo to work. Some amount of prefetching, so that rewritten instruction bytes don't take effect immediately, should be enough.
Perhaps this will work:
- Create a buffer that can hold the longest possible instruction (I think 15 bytes will do?). This means you don't have to worry about handling multiple fetches during a single instruction, keeping it simple.
- At startup, fetch a full buffer.

Then, for each instruction:
- Decode each instruction from the buffer instead of directly from CS:(E)IP.
- Once the instruction is decoded and its length is known, shift the data in the buffer forward by that length, and fill the freed space with new bytes from memory (the bytes that follow the ones already in the buffer).
- For every jump, flush and fetch a whole new buffer from the new CS:(E)IP
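The steps above might be sketched in C as follows, under a few assumptions: a flat 1 MB memory array addressed linearly (segment × 16 + offset, wrapping at 20 bits), a 16-byte buffer (a little more than the 15 bytes mentioned), and hypothetical function names. It only illustrates the quick hack, not cycle-accurate prefetching.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE (1u << 20) /* 1 MB real-mode address space */
#define PFQ_SIZE 16u        /* >= the longest instruction (15 bytes) */

typedef struct {
    uint8_t  bytes[PFQ_SIZE];
    uint16_t cs, ip;        /* address of bytes[0] */
} prefetch_buf;

static uint32_t linear(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & (MEM_SIZE - 1);
}

/* Flush and refill the whole buffer from a new CS:IP (startup, jumps). */
static void pfb_flush(prefetch_buf *b, const uint8_t *mem, uint16_t cs, uint16_t ip)
{
    b->cs = cs;
    b->ip = ip;
    for (unsigned i = 0; i < PFQ_SIZE; i++)
        b->bytes[i] = mem[linear(cs, (uint16_t)(ip + i))];
}

/* Decoder-side read of byte i (i < PFQ_SIZE) of the current instruction. */
static uint8_t pfb_peek(const prefetch_buf *b, unsigned i)
{
    return b->bytes[i];
}

/* After an instruction of 'len' bytes (len <= PFQ_SIZE): shift forward
   and top up the tail with the bytes that follow from memory. */
static void pfb_consume(prefetch_buf *b, const uint8_t *mem, unsigned len)
{
    memmove(b->bytes, b->bytes + len, PFQ_SIZE - len);
    b->ip = (uint16_t)(b->ip + len);
    for (unsigned i = PFQ_SIZE - len; i < PFQ_SIZE; i++)
        b->bytes[i] = mem[linear(b->cs, (uint16_t)(b->ip + i))];
}
```

A write into memory that is already buffered does not change the buffered copy, so the old instruction runs; only a flush (a jump back) picks up the new bytes.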

If you want accurate emulation, you should make it 4 bytes (the length of the buffer on the 8088), and you'll need to emulate the bus cycles as well. For every idle bus cycle, you fetch 1 byte into the buffer.
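A rough model of the accurate variant's fill side: a 4-byte queue, and on every bus cycle that the Execution Unit does not need for a memory or I/O access, the Bus Interface Unit fetches one byte at its prefetch address. This is a simplified sketch with hypothetical names; it ignores exact bus-cycle timing, wait states and DMA refresh, which a cycle-exact emulator would also have to handle.

```c
#include <stdint.h>
#include <stdbool.h>

#define MEM_SIZE   (1u << 20)
#define QUEUE_SIZE 4u       /* 8088 prefetch queue length (8086: 6) */

typedef struct {
    uint8_t  q[QUEUE_SIZE];
    unsigned head, len;     /* ring buffer: oldest byte at q[head] */
    uint16_t cs, fetch_ip;  /* where the BIU fetches the next byte */
} biu;

/* One bus cycle. If the EU uses the bus, nothing is prefetched;
   otherwise one byte is fetched if the queue has room.
   Returns true if a byte was fetched. */
static bool biu_bus_cycle(biu *b, const uint8_t *mem, bool eu_uses_bus)
{
    if (eu_uses_bus || b->len == QUEUE_SIZE)
        return false;
    uint32_t addr = (((uint32_t)b->cs << 4) + b->fetch_ip) & (MEM_SIZE - 1);
    b->q[(b->head + b->len) % QUEUE_SIZE] = mem[addr];
    b->len++;
    b->fetch_ip++;
    return true;
}
```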

Reply 30 of 63, by superfury

Rank l33t++

The problem is that my CPU emulation decodes the data from memory while executing it:
First is a little loop:
- Fetch byte from CS:(E)IP.
- If prefix, set flag and loop, else run opcode handler from lookup table.

The opcode handler from the lookup table:
- Fetch ModR/M byte if needed.
- Fetch SIB byte if needed.
- Fetch parameters one byte at a time if needed.
- Execute instruction itself (Make calculations, change registers, send data to/from hardware(IN/INS/OUT/OUTS) or memory etc.)
- Return to caller (The CPU core itself).

Finally, the CPU core itself checks for repeating instructions (REP prefixes) and resets CS:EIP to the start of the instruction, so that the same instruction is executed again.

How can I convert this process to use a prefetch instead without having to change the whole structure of my CPU emulation dramatically?

Btw EIP is incremented for every instruction byte read from memory.

Reply 31 of 63, by Jepael

Rank Oldbie

I don't know how to best emulate the prefetch queue in your case, but here's one suggestion.

Fetch bytes from CS:IP to a 4-byte FIFO (or ring buffer), and execute instructions from there. Fetch more bytes into FIFO when it's not full.

And whenever an opcode changes IP, like CALL, JMP, Jxx, INT, RET, RETF or IRET, clear the FIFO so that bytes from the new CS:IP are fetched into the FIFO and then executed.

That's actually how it's drawn in the block diagrams: the CPU's Execution Unit (EU) executes opcodes from the queue, which is filled by the Bus Interface Unit (BIU). In a way, the EU doesn't even hold IP or the segment registers; at least the segment registers are in the BIU. The programmer's model of the CPU just combines everything, and the prefetch queue is very rarely mentioned, as it doesn't generally affect anything except self-modifying code: bytes already in the queue can't be changed, because they have already been fetched from memory.
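Jepael's suggestion as a C sketch: a 4-byte ring buffer filled from CS:IP whenever it has room, the EU taking bytes from the front, and a flush whenever an instruction changes CS or IP. The struct and function names are hypothetical, and when to call the fill function (on every idle bus cycle, or simply whenever there is room) is left to the emulator.

```c
#include <stdint.h>

#define MEM_SIZE  (1u << 20)
#define FIFO_SIZE 4u

typedef struct {
    uint8_t  data[FIFO_SIZE];
    unsigned head, count;  /* ring buffer state */
    uint16_t cs, next_ip;  /* address of the next byte to fetch */
} fifo_queue;

/* Fetch bytes from CS:next_ip until the FIFO is full. */
static void fifo_fill(fifo_queue *f, const uint8_t *mem)
{
    while (f->count < FIFO_SIZE) {
        uint32_t addr = (((uint32_t)f->cs << 4) + f->next_ip) & (MEM_SIZE - 1);
        f->data[(f->head + f->count) % FIFO_SIZE] = mem[addr];
        f->count++;
        f->next_ip++;
    }
}

/* EU side: take the next instruction byte, refilling first if empty. */
static uint8_t fifo_pop(fifo_queue *f, const uint8_t *mem)
{
    if (f->count == 0)
        fifo_fill(f, mem);
    uint8_t byte = f->data[f->head];
    f->head = (f->head + 1) % FIFO_SIZE;
    f->count--;
    return byte;
}

/* Call on JMP, CALL, Jcc, INT, RET, RETF, IRET and anything else that
   loads CS or IP: discard queued bytes and restart at the new address. */
static void fifo_flush(fifo_queue *f, uint16_t cs, uint16_t ip)
{
    f->head = 0;
    f->count = 0;
    f->cs = cs;
    f->next_ip = ip;
}
```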

Reply 32 of 63, by Scali

Rank l33t
superfury wrote:

How can I convert this process to use a prefetch instead without having to change the whole structure of my CPU emulation dramatically?

Well, the way I described it is the easiest.
You only have to modify the instruction decoder to fetch instructions from the buffer instead of from memory (because as I say, the buffer is always larger than an instruction, so you don't have to take special care of it). Then you have to track CS:(E)IP before and after processing an instruction, so that you know the length. And then you update the buffer by that amount.
Then take care of any instructions that modify CS and/or (E)IP by forcing a flush of the buffer.
That's it.
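Applied to a fetch-as-you-decode core where IP advances per fetched byte, this recipe could look roughly like the following. Everything here is hypothetical: the struct, the function names, and the choice to detect a control transfer by checking whether CS:IP ended up anywhere other than just past the fetched bytes. That check misses a jump to the very next instruction, which an explicit flush from the jump handlers would catch, and it treats a REP restart (IP reset to the instruction start) as a jump and flushes, which is a simplification.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE (1u << 20)
#define BUF_SIZE 16u         /* >= longest instruction */

/* Hypothetical CPU state: only the fields this sketch needs. */
typedef struct {
    uint8_t *mem;            /* 1 MB of emulated memory */
    uint16_t cs, ip;         /* architectural CS:IP, advanced per byte as before */
    uint8_t  buf[BUF_SIZE];  /* prefetched code bytes */
    uint16_t buf_cs, buf_ip; /* CS:IP of buf[0] (start of current instruction) */
    unsigned fetched;        /* bytes fetched by the current instruction */
} cpu;

static uint8_t code_at(const cpu *c, uint16_t cs, uint16_t ip)
{
    return c->mem[(((uint32_t)cs << 4) + ip) & (MEM_SIZE - 1)];
}

/* Refill the whole buffer from the current CS:IP (startup and after jumps). */
static void prefetch_flush(cpu *c)
{
    c->buf_cs = c->cs;
    c->buf_ip = c->ip;
    c->fetched = 0;
    for (unsigned i = 0; i < BUF_SIZE; i++)
        c->buf[i] = code_at(c, c->cs, (uint16_t)(c->ip + i));
}

/* Replacement for "fetch byte from CS:IP": IP still advances per byte,
   but the byte comes from the buffer. Assumes an instruction never
   fetches more than BUF_SIZE bytes. */
static uint8_t fetch_code_byte(cpu *c)
{
    uint8_t byte = c->buf[c->fetched++];
    c->ip++;
    return byte;
}

/* Call from the CPU core after each instruction (after REP handling). */
static void prefetch_after_instruction(cpu *c)
{
    unsigned len = c->fetched;
    if (c->cs != c->buf_cs || c->ip != (uint16_t)(c->buf_ip + len)) {
        prefetch_flush(c);   /* CS:IP was changed by the instruction */
        return;
    }
    memmove(c->buf, c->buf + len, BUF_SIZE - len);
    for (unsigned i = BUF_SIZE - len; i < BUF_SIZE; i++)
        c->buf[i] = code_at(c, c->cs, (uint16_t)(c->ip + i));
    c->buf_ip = c->ip;
    c->fetched = 0;
}
```

The opcode handlers keep calling a single byte-fetch routine as before, so the rest of the decoder structure does not need to change.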

Reply 33 of 63, by Scali

Rank l33t
Jepael wrote:

That's actually how it's drawn in the block diagrams: the CPU's Execution Unit (EU) executes opcodes from the queue, which is filled by the Bus Interface Unit (BIU). In a way, the EU doesn't even hold IP or the segment registers; at least the segment registers are in the BIU. The programmer's model of the CPU just combines everything, and the prefetch queue is very rarely mentioned, as it doesn't generally affect anything except self-modifying code: bytes already in the queue can't be changed, because they have already been fetched from memory.

Yes, it surprises me that although x86 is by far the most popular CPU in the world, there doesn't seem to be any accurate emulator for it.
People just write dumb interpreters for it, with a very simplified model of the CPU (let alone the rest of the hardware).

If you look at an emulator for a C64 or an Atari VCS for example, the whole CPU is emulated down to the cycle-level. It not only spends the right amount of cycles on every single instruction, but it also makes sure to perform all memory read and write operations on the correct cycle as well. This is important for memory-mapped hardware registers, for example. In the case of the C64, the VIC-II will also steal the bus from the CPU in some cases (loading a new 'bad line', or accessing sprites), and that is all emulated down to the cycle as well.

What we need is a PC emulator that does all this as well. Emulate the prefetch buffer, emulate the DMA memory refresh, emulate the wait states generated by CGA, that sort of thing.

Reply 34 of 63, by superfury

Rank l33t++

What about repeating instructions like REP MOVSB/MOVSW? Do I need to do the buffer loading after (E)IP has been reset at the end of the instruction, to catch those? And what about multiple instructions that need to be pipelined, such as a REP MOVSB followed by a second instruction that has to be executed from the same pipeline buffer, which then re-executes the REP MOVSB before it? Is the pipeline always cleared when the instruction is re-executed, or is it kept?

again: REP MOVSB
JCXZ again

So the JCXZ might not jump to 'again' with a 4-byte prefetch? (The total is 5 bytes, with 4 buffered when reaching 'again', so the high byte of the 'again' jump target might be altered?)

Reply 35 of 63, by Scali

Rank l33t

Pipelining is not the same as the prefetch-buffer. The prefetch-buffer is just a primitive cache. An 8088 is far too primitive to have any pipelining.
What I described was just a quick hack to make the self-modifying code in 8088 MPH work. For a proper emulation you need to emulate all bus cycles, as already mentioned earlier. This probably involves a complete rewrite of your emulation routines. Once you do that, it becomes trivial to emulate the prefetch buffer accurately. The CPU just tries to fetch bytes when the bus is idle.

Reply 36 of 63, by superfury

Rank l33t++

There's still another problem: with a 4-byte FIFO, how do I handle instructions longer than 4 bytes? Should I fetch directly from memory instead of from the FIFO past the 4th byte? E.g. MOV modr/m addr16 imm16 is a 6-byte instruction, but only 4 bytes can be loaded into the prefetch buffer. Should my emulation fetch bytes 5 and 6 directly from RAM?

Reply 37 of 63, by Scali

Rank l33t
superfury wrote:

There's still another problem: with a 4-byte FIFO, how do I handle instructions with a length of more than 4 bytes?

That's why I said you should make a buffer that can contain the largest possible x86 instruction (15+ bytes) 😀
It's not perfect, but should be good enough to run 8088 MPH, and much easier than true emulation.

If you want to properly emulate the prefetch buffer, then you have to implement that buffer the way it actually works, as said before.
So all instruction fetches go via the prefetch buffer. You have to emulate all bus cycles, and fetch instruction bytes into the prefetch buffer on idle cycles (if you don't emulate it properly, you may not have fetched enough, and the self-modifying code will be executed *before* you fetch the data, so you won't solve the problem).
Your instruction decoder will fetch one byte at a time (again, during fetching you still emulate bus cycles, so new bytes should be prefetched if possible). If the prefetch buffer runs out before your instruction is done, then you have to perform fetches into the buffer before each next decoding step.
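A rough C sketch of the decoder side of this accurate variant, showing how bytes beyond the 4th still arrive through the queue. The names are hypothetical and the timing is simplified: every bus cycle is assumed to take 4 clocks with no wait states, and the EU simply waits while the BIU runs fetch cycles when the queue is empty, without modelling how idle cycles during execution overlap with decoding.

```c
#include <stdint.h>

#define MEM_SIZE   (1u << 20)
#define QUEUE_SIZE 4u
#define CLOCKS_PER_BUS_CYCLE 4u /* T1-T4, no wait states assumed */

typedef struct {
    const uint8_t *mem;
    uint8_t  q[QUEUE_SIZE];
    unsigned head, len;
    uint16_t cs, fetch_ip;
    uint64_t clocks;            /* elapsed CPU clocks */
} cpu8088;

/* One prefetch bus cycle: fetch a byte if the queue has room. */
static void prefetch_cycle(cpu8088 *c)
{
    if (c->len < QUEUE_SIZE) {
        uint32_t a = (((uint32_t)c->cs << 4) + c->fetch_ip) & (MEM_SIZE - 1);
        c->q[(c->head + c->len) % QUEUE_SIZE] = c->mem[a];
        c->len++;
        c->fetch_ip++;
    }
    c->clocks += CLOCKS_PER_BUS_CYCLE;
}

/* The decoder asks for the next instruction byte. If the queue is empty,
   the EU waits while the BIU runs fetch cycles, so every byte of a long
   instruction still comes through the queue, never straight from RAM. */
static uint8_t next_instruction_byte(cpu8088 *c)
{
    while (c->len == 0)
        prefetch_cycle(c);
    uint8_t b = c->q[c->head];
    c->head = (c->head + 1) % QUEUE_SIZE;
    c->len--;
    return b;
}
```

For a six-byte instruction such as mov word [1234h], 5678h (C7 06 34 12 78 56), a full queue supplies the first four bytes, and bytes 5 and 6 each cost an extra bus cycle in this model.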

Reply 38 of 63, by superfury

Rank l33t++

This shouldn't be a problem in real mode. But what happens when prefetching in protected mode? How are memory protection and paging handled with prefetching? Does the prefetch avoid reading memory past the segment limit? Will it error out when it is less than 15 bytes from the end of the code segment? What does the buffer contain in that case?

Reply 39 of 63, by Scali

Rank l33t
superfury wrote:

This shouldn't be a problem during real mode. But what happens when prefetching in protected mode? How is memory protection and paging handled with prefetching? Does the prefetch not read memory past the segment limit? Will it error out when less than 15 bytes before the end of the code segment? What does it contain in that case?

Why would you care?
You're just prefetching in an emulator. You can do the actual checking for memory protection during the decoding of instructions as you always did. It's not a problem for the emulator if the prefetch-buffer contains bytes that it shouldn't have read, as long as it still consistently checks for protection as it always did.

The only function of the buffer is to read bytes BEFORE they can be overwritten by self-modifying code, so that you still have the old instruction bytes. Everything else should remain exactly the same as it was.
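In other words, the buffer can be filled blindly, and the limit check stays where it was: at the point where the decoder consumes a byte. A minimal sketch with hypothetical names, showing only the segment-limit check (in protected mode a code fetch past the CS limit would raise #GP; paging faults would likewise be raised at this point rather than during prefetch):

```c
#include <stdint.h>
#include <stdbool.h>

#define BUF_SIZE 16u

/* Hypothetical: buf may contain bytes past the segment limit, because
   it was filled without any checks. */
typedef struct {
    uint8_t  buf[BUF_SIZE];
    uint32_t buf_eip;  /* offset of buf[0] within the code segment */
    uint32_t cs_limit; /* highest valid offset in CS */
} code_fetch;

/* Returns false (the caller raises its usual fault) if the byte at 'eip'
   lies beyond the segment limit; otherwise stores the byte in *out.
   Assumes buf_eip <= eip < buf_eip + BUF_SIZE. */
static bool fetch_checked(const code_fetch *f, uint32_t eip, uint8_t *out)
{
    if (eip > f->cs_limit)
        return false;
    *out = f->buf[eip - f->buf_eip];
    return true;
}
```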
