First post, by vladstamate

Rank: Oldbie

No post about 8088MPH can go without saying how awesome that demo is!

However I do have some questions regarding some innards to better run it in my emulator.

1) What mechanism does 8088MPH use to determine processor speed in the beginning?

2) During the bobbing up and down scrolling effect (the one in between each part of the demo) it seems the scrolling is achieved by modifying the start address. However, this is done in 320x200 graphics mode but the start address changes in steps of 40 (and is always a multiple of 40). What is the start address in units of? It cannot be pixels, can it?

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 1 of 18, by Alegend45

Rank: Newbie
vladstamate wrote:

No post about 8088MPH can go without saying how awesome that demo is!

However I do have some questions regarding some innards to better run it in my emulator.

1) What mechanism does 8088MPH use to determine processor speed in the beginning?

2) During the bobbing up and down scrolling effect (the one in between each part of the demo) it seems the scrolling is achieved by modifying the start address. However, this is done in 320x200 graphics mode but the start address changes in steps of 40 (and is always a multiple of 40). What is the start address in units of? It cannot be pixels, can it?

I don't know about question 1, but question 2 is fairly simple. The start address on a CGA is in units of bytes.

Reply 2 of 18, by vladstamate

Rank: Oldbie

That is what I thought too, but I have to multiply that by 2 to get properly aligned scrolling, because a line of pixels in 320x200 graphics mode is 80 bytes, not 40. This could be related to this bit that Trixter mentions on his page:

"Because the 6845 displays two onscreen rows per “row”, the text could only move to even lines, which is why the movement isn’t as smooth as it could be."

However, I would like to better understand why 40 is the value to add to (or subtract from) the start address to scroll smoothly up and down.

Reply 3 of 18, by Jorpho

Rank: l33t++
vladstamate wrote:

1) What mechanism does 8088MPH use to determine processor speed in the beginning?

I would have assumed that it doesn't. Is it not intended to run on exactly one very specific architecture?

Reply 4 of 18, by vladstamate

Rank: Oldbie
Jorpho wrote:

I would have assumed that it doesn't. Is it not intended to run on exactly one very specific architecture?

Oh it is. It only runs properly on a 4.77 MHz original IBM PC/XT machine with a CGA card. It uses cycle counting for almost all effects and specific timers. That is why emulators have so much trouble with it.

In my emulator it reports 5000% of the speed of an 8088, and I need to understand what mechanism they use so I can find my bug.

Reply 5 of 18, by reenigne

Rank: Oldbie
vladstamate wrote:

No post about 8088MPH can go without saying how awesome that demo is!

Thanks! Glad you like it!

vladstamate wrote:

1) What mechanism does 8088MPH use to determine processor speed in the beginning?

It runs a piece of code that contains lots of different instructions, reads the timer before and after, and subtracts the two values. Pretty standard stuff.
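
Roughly, that pattern looks like this (a minimal C sketch; inb/outb are hypothetical port-I/O helpers, and the benchmark body is just a placeholder, not the demo's actual routine):

    #include <stdint.h>

    /* Hypothetical port I/O helpers (on a real DOS compiler these would be
       inp()/outp() or inline assembly). */
    extern uint8_t inb(uint16_t port);
    extern void outb(uint16_t port, uint8_t value);

    /* Latch and read the 16-bit down-counter of PIT channel 0. */
    static uint16_t read_pit0(void)
    {
        outb(0x43, 0x00);                /* counter latch command, channel 0 */
        uint8_t lo = inb(0x40);          /* low byte of latched count */
        uint8_t hi = inb(0x40);          /* high byte of latched count */
        return (uint16_t)((hi << 8) | lo);
    }

    /* Time a benchmark routine in PIT ticks (the PIT input clock is
       1.193182 MHz; how ticks map to elapsed time depends on the programmed
       mode).  Valid as long as the routine finishes within one timer period. */
    uint16_t time_benchmark(void (*benchmark)(void))
    {
        uint16_t before = read_pit0();
        benchmark();                     /* block of instructions with known 8088 timings */
        uint16_t after = read_pit0();
        return (uint16_t)(before - after);   /* the counter counts down */
    }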

vladstamate wrote:

2) During the bobbing up and down scrolling effect (the one in between each part of the demo) it seems the scrolling is achieved by modifying the start address. However, this is done in 320x200 graphics mode but the start address changes in steps of 40 (and is always a multiple of 40). What is the start address in units of? It cannot be pixels, can it?

The start address is in units of CRTC characters. Each CRTC character is two bytes, so to scroll by one character row, the start address must be changed by the number of CRTC characters in the active part of a scanline: 40 in standard graphics and 40-column text modes, and 80 in standard 80-column text modes. Each row is 2 scanlines in standard graphics modes and 8 scanlines in standard text modes. It is possible (though much more difficult) to scroll vertically with scanline granularity, but the bobbing text effect in 8088MPH doesn't do this.
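
So scrolling by one character row in 320x200 mode means adding or subtracting 40: 40 CRTC characters = 80 bytes = one row of two scanlines. A minimal sketch of the register write (3D4h/3D5h are the CGA CRTC index/data ports, R12/R13 the Start Address High/Low registers; outb is again a hypothetical port-I/O helper):

    #include <stdint.h>

    extern void outb(uint16_t port, uint8_t value);   /* hypothetical port-I/O helper */

    #define CRTC_INDEX 0x3D4
    #define CRTC_DATA  0x3D5

    /* Program the 6845 start address (in CRTC character units, i.e. 2-byte
       words on the CGA). */
    static void set_crtc_start_address(uint16_t start)
    {
        outb(CRTC_INDEX, 0x0C);                  /* R12: Start Address High */
        outb(CRTC_DATA, (uint8_t)(start >> 8));
        outb(CRTC_INDEX, 0x0D);                  /* R13: Start Address Low */
        outb(CRTC_DATA, (uint8_t)(start & 0xFF));
    }

    /* Scroll down by one character row in a 40-character-wide mode:
       one row = 40 CRTC characters = 80 bytes of video memory. */
    void scroll_one_row_down(uint16_t *start)
    {
        *start += 40;
        set_crtc_start_address(*start);
    }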

Reply 6 of 18, by Alegend45

Rank: Newbie
vladstamate wrote:
Jorpho wrote:

I would have assumed that it doesn't. Is it not intended to run on exactly one very specific architecture?

Oh it is. It only runs properly on a 4.77 MHz original IBM PC/XT machine with a CGA card. It uses cycle counting for almost all effects and specific timers. That is why emulators have so much trouble with it.

In my emulator it reports 5000% of the speed of an 8088, and I need to understand what mechanism they use so I can find my bug.

That... sounds like it might be a PIT timing bug, honestly.

Reply 7 of 18, by superfury

Rank: l33t++

Well, I guess you will need around 100% cycle accuracy in order to get everything, including the Kefrens Bars effect, running properly. Even UniPCemu, with its near-100% CPU accuracy (still 4% off), doesn't get the Kefrens Bars effect fully correct, although the rest of the demo runs a lot better with the accuracy currently implemented. The only problems left are mostly the (I)DIV timings (unknown) and DMA cycle accuracy (currently ten 14 MHz cycles per memory access). Has anyone figured out the exact timings of (I)DIV yet? I can't find anything more than the documentation's min-max cycles.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 8 of 18, by reenigne

Rank: Oldbie
superfury wrote:

Well, I guess you will need around 100% cycle accuracy in order to get everything, including the Kefrens Bars effect, running properly.

Agreed.

superfury wrote:

The only problems left are mostly the (I)DIV timings (unknown) and DMA cycle accuracy (currently ten 14 MHz cycles per memory access).

I've observed in bus sniffer timings that it can be as many as 18, as there seem to be at least 4 CPU cycles from the S4 state of the DMA transfer to the (resumed) T4 state of the interrupted CPU transfer.

superfury wrote:

Has anyone figured out the exact timings of (I)DIV yet? I can't find anything more than the documentation's min-max cycles.

I haven't done this yet, but consider how division is implemented on a CPU using only subtraction and comparison. To compute a/b, calculate N such that b<<N is less than or equal to a but b<<(N+1) is greater than a. Then subtract b<<N from a and increase the quotient by 1<<N. Repeat until a is less than b, at which point a will be the remainder. Write down this algorithm and think of all the places in it that are likely to take some time when implemented in silicon/microcode. Then tune the cycle counts for all these places until the totals match the observed numbers.
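
A rough C sketch of that algorithm, with a hypothetical cycle counter bumped at the places where a microcoded divider would plausibly spend time (the constants are placeholders to tune against measurements, not the 8088's real timings):

    #include <stdint.h>

    /* Shift-and-subtract division as described above.  Assumes divisor != 0
       (the real CPU raises a divide-error interrupt in that case).  'cycles'
       is a hypothetical counter; the increments are placeholders to tune. */
    uint16_t div_sketch(uint16_t dividend, uint16_t divisor,
                        uint16_t *remainder, unsigned *cycles)
    {
        uint32_t a = dividend;           /* widen so the shifts below can't overflow */
        uint16_t quotient = 0;
        *cycles += 10;                   /* fixed setup cost (placeholder) */

        while (a >= divisor) {
            /* Find N such that (divisor << N) <= a but (divisor << (N+1)) > a. */
            unsigned n = 0;
            while (((uint32_t)divisor << (n + 1)) <= a) {
                n++;
                *cycles += 2;            /* one 1-bit shift + one compare (placeholder) */
            }
            a -= (uint32_t)divisor << n;     /* subtract the shifted divisor */
            quotient += (uint16_t)(1u << n); /* add the corresponding quotient bit */
            *cycles += 4;                    /* subtract + quotient update (placeholder) */
        }

        *remainder = (uint16_t)a;        /* whatever is left is the remainder */
        return quotient;
    }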

However, DIV and IDIV aren't used in the Kefrens effect. So, even if that is implemented I think there's still a big piece of the puzzle missing - specifically, the data about when (within the execution of an instruction) each byte of the instruction is removed from the prefetch queue (and how, if applicable, that depends on the state of the prefetch queue and the bus). Unless you've already got a model for that derived from the sniffer logs. But any such model needs to be validated against a lot of different pieces of code (covering all the instructions and including lots of edge cases involving DMAs and wait states) before we can have any confidence in it.

Reply 9 of 18, by superfury

Rank: l33t++

Easier said than done: I understand a bit about the timings used for x86 instructions (base cycles, added cycles for shifts or delays, etc.), but I don't know anything about how it works at the silicon level, unfortunately (beyond knowing that transistors pass or block current depending on the voltage, etc.). I just take the basic formulas I find and apply them in my code (until it works, basically). It's mostly trial and error, trying to make the software match what I can find out about the CPU.

Reply 10 of 18, by vladstamate

Rank: Oldbie

Another question. For a few effects (like the 1024 colors one) the demo sets the VERTICAL_DISPLAYED register to 2, forcing the entire screen to have 2 character rows, with a MAX_SCANLINE value of 0 (i.e. 1 scanline per row). I do not understand how this translates to what the CGA monitor (not the adapter) is doing. In other words, how do I emulate that?

What is the resolution of the framebuffer that I am emulating into (the one that will be displayed on my modern PC, the equivalent of the CGA monitor itself)? 80x25 characters?

1) How does the CGA adapter deal with only rendering 2 character rows, each only 1 scanline high? Does it go back to reading from start address+offset?
2) How does the CGA monitor deal with that? Does the beam keep going down to the next scanline, or does it do a vertical retrace?

Reply 11 of 18, by superfury

Rank: l33t++

The vertical retrace only happens once the next scanline is loaded while the vertical retrace start is lower than or equal to the current idea of the rendered scanline. The effect works by keeping it out of range, preventing retrace from happening (due to the modulo wrap), until it's time for it to happen, at which point it's moved inside the range, it matches the buffer scanline and retrace happens. Directly after that, it's moved back out of range to render a new frame.

Reply 12 of 18, by reenigne

Rank: Oldbie

To understand what's going on here you really need to understand how the CGA card tells the monitor the size of the active image. The answer is: it doesn't. The only information the CGA card sends the monitor is the signal itself (either composite or RGBI) and the sync pulses. So as long as the sync pulses are in the right places, the monitor doesn't care if the CRTC's idea of the image size is 2 scanlines or 262. On receiving a vertical sync pulse, a real CRT won't necessarily immediately move the beam to the top of the screen - it only has an effect if the beam is close to the place where the vertical retrace would be happening anyway. So the monitor has its own idea about the image size. In the NTSC standard, this is defined by the sync frequencies - 15.734kHz horizontal, 59.94Hz vertical (with a certain tolerance). This works out at 910 hdots by 262.5 scanlines.

The CGA card doesn't generate a signal with precisely these timings, though - it generates a signal that is 912 hdots by 262 scanlines. However, all real NTSC CRTs have sufficient tolerance in their sync frequencies to display such an image correctly. So, to emulate the machine properly you need another set of horizontal and vertical counters (separate from the ones in the CRTC). These counters form part of the emulated CRT. If you get a horizontal sync pulse between hdots 902 and 918 then act on it, and likewise for a vertical sync pulse between scanlines 248 and 276. If the beam gets to the end of either region then do a flyback anyway (resetting the counter to zero). These tolerances were measured from an IBM 5153.
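
Something along these lines, as a minimal sketch (the struct and function names are made up for illustration; the constants are the tolerance windows above):

    #include <stdbool.h>

    /* Emulated monitor beam position, independent of the CRTC's own counters.
       The counters may run slightly past the nominal 912x262 frame when a
       sync pulse arrives late within its tolerance window. */
    typedef struct {
        int hdot;       /* horizontal position within the scanline */
        int scanline;   /* vertical position within the frame */
    } MonitorBeam;

    /* Advance the emulated monitor by one hdot.  hsync/vsync are the sync
       signals the CGA card is outputting at this moment. */
    void monitor_tick(MonitorBeam *m, bool hsync, bool vsync)
    {
        m->hdot++;

        /* Honour horizontal sync only inside the 902-918 hdot window,
           and force a horizontal flyback at the end of the window. */
        if ((hsync && m->hdot >= 902) || m->hdot > 918) {
            m->hdot = 0;
            m->scanline++;

            /* Likewise for vertical sync: honour it between scanlines
               248 and 276, or force a vertical flyback at scanline 276. */
            if ((vsync && m->scanline >= 248) || m->scanline > 276) {
                m->scanline = 0;
            }
        }
    }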

For the 1K colour mode specifically, the 2 scanlines-per-CRTC-frame mode works like superfury said - the end-of-frame happens before the vertical sync position so the CRTC restarts its "frame" (including resetting the memory counter to the start address) but the monitor doesn't. On the last row the CRTC timings are changed so that a real sync pulse happens in the right place.

Reply 13 of 18, by reenigne

Rank: Oldbie
superfury wrote:

Easier said than done: I understand a bit about the timings used for x86 instructions (base cycles, added cycles for shifts or delays, etc.), but I don't know anything about how it works at the silicon level, unfortunately (beyond knowing that transistors pass or block current depending on the voltage, etc.). I just take the basic formulas I find and apply them in my code (until it works, basically). It's mostly trial and error, trying to make the software match what I can find out about the CPU.

You don't really need to know much about the low-level circuitry of the CPU to figure this out, I think. You just need to know what operations the CPU can do in a fixed number of cycles. We know the CPU can't do a division in a fixed number of cycles (because the timing depends on the divisor and dividend). We know the CPU can do a subtraction, a compare and a 1-bit shift in a fixed number of cycles (because the timings for SUB, CMP, SHL and SHR are fixed). So we need to write down an algorithm for division in terms of these things. I outlined one above which I think is the one the 8086/8088 uses. Then it's just a matter of tweaking the timings in it until it fits (or trying to come up with a different algorithm if it doesn't).

Reply 14 of 18, by vladstamate

Rank: Oldbie

Thank you reenigne and superfury. That makes a lot more sense. One more thing I am a bit confused about: when I am in, say, 80x25 text mode, what do I set the resolution of my emulated framebuffer to (the one that SDL displays on the screen)? I had it at HORIZONTAL_DISPLAYED * VERTICAL_DISPLAYED (taking into account the correct units, characters), but that does not work when the demo does the trick with VERTICAL_DISPLAYED=2. It does work for all the other effects in the demo, though.

Reply 15 of 18, by superfury

Rank: l33t++

You will at least have to process the vertical timing in order for the effect to work. Horizontal timing might be needed as well (for retrace detection to work properly). In the case of the Kefrens Bars effect, nothing less than 100% accuracy will do, afaik.

Reply 16 of 18, by reenigne

Rank: Oldbie
vladstamate wrote:

Thank you reenigne and superfury. That makes a lot more sense. One more thing I am a bit confused about: when I am in, say, 80x25 text mode, what do I set the resolution of my emulated framebuffer to (the one that SDL displays on the screen)? I had it at HORIZONTAL_DISPLAYED * VERTICAL_DISPLAYED (taking into account the correct units, characters), but that does not work when the demo does the trick with VERTICAL_DISPLAYED=2. It does work for all the other effects in the demo, though.

Ideally your output buffer should not depend on the CRTC values (after all, reprogramming the CRTC in my 5160 does not cause the physical size of my 5153 to change!). So your output buffer should be 640 hdots wide by 200 scanlines high, plus (say) 10% for overscan on each side giving 768*240. Then you want to scale this so that the standard active image has aspect ratio 4:3 (i.e. by 2.4 times more vertically than horizontally), perhaps to 768*576 (1:1 scaling horizontally) or 640*480 (2:1 scaling vertically). And just tune the position so that with the standard CRTC register values, the active image is centered in the output buffer.
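
In other words, something like this (the numbers follow the figures above; the scaling at the end is just one of the options mentioned):

    /* Output buffer dimensions, independent of CRTC programming:
       640x200 active area plus roughly 10% overscan on each side. */
    #define ACTIVE_W  640
    #define ACTIVE_H  200
    #define BUFFER_W  768                           /* 640 + 2 * 64 */
    #define BUFFER_H  240                           /* 200 + 2 * 20 */

    /* With standard CRTC register values, center the active image. */
    #define ACTIVE_X  ((BUFFER_W - ACTIVE_W) / 2)   /* = 64 */
    #define ACTIVE_Y  ((BUFFER_H - ACTIVE_H) / 2)   /* = 20 */

    /* Present at 4:3 by scaling 2.4x more vertically than horizontally,
       e.g. 768x576 (1:1 horizontally) or 640x480 (2:1 vertically). */
    #define WINDOW_W  768
    #define WINDOW_H  576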

Reply 17 of 18, by Scali

Rank: l33t
reenigne wrote:
vladstamate wrote:

Thank you reenigne and superfury. That makes a lot more sense. One more thing I am a bit confused about: when I am in, say, 80x25 text mode, what do I set the resolution of my emulated framebuffer to (the one that SDL displays on the screen)? I had it at HORIZONTAL_DISPLAYED * VERTICAL_DISPLAYED (taking into account the correct units, characters), but that does not work when the demo does the trick with VERTICAL_DISPLAYED=2. It does work for all the other effects in the demo, though.

Ideally your output buffer should not depend on the CRTC values (after all, reprogramming the CRTC in my 5160 does not cause the physical size of my 5153 to change!). So your output buffer should be 640 hdots wide by 200 scanlines high, plus (say) 10% for overscan on each side giving 768*240. Then you want to scale this so that the standard active image has aspect ratio 4:3 (i.e. by 2.4 times more vertically than horizontally), perhaps to 768*576 (1:1 scaling horizontally) or 640*480 (2:1 scaling vertically). And just tune the position so that with the standard CRTC register values, the active image is centered in the output buffer.

I'd like to add that there may not be a single answer here.
If I look at VICE, the popular C64 emulator, it allows you to select 4 different border sizes:
- Normal borders
- Full borders
- Debug borders
- No borders

They all make sense, in some way.
'Normal' is what you'd usually get on a regular TV/monitor. Say 10% overscan area.
'Full' borders is basically 'maximum' overscan, which some people may have tuned in on their monitors back in the day (the popular 1084S monitors and related models had the controls on the outside), so that they would not miss any of that border-sprite action on the C64.
'Debug' borders is for software development: it basically shows everything, even the part of the signal that would not be physically visible on a CRT, since that's where it does the actual moving of the beam from right to left.
If you have a cycle-exact emulator (like VICE is), it makes sense to have this option: it allows you to visualize cycle-exact code very accurately.
'No borders' will just show the regular graphics area, which may have its uses, eg when you're trying to make screenshots of something. Basically what DOSBox and many other emulators do (I can't get over how wrong this is, since CGA/EGA/VGA very much do have borders which you can set to a given colour, much like the C64... which is used for effects in some games/demos).

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 18 of 18, by superfury

Rank: l33t++

I've noticed something strange in UniPCemu: when using reenigne's NTSC decoding routine (old-style NTSC mode), updating text (like printing the initial text of 8088MPH showing the emulation being 4% off compared to the real CPU) makes it look like it's toggling between NTSC and RGB mode (or switching into b/w mode), even though the setting shouldn't be changing. Is that a bug in the NTSC rendering, or some fault in my emulator? It only happens when scrolling the text window up/down at the start of 8088 MPH.

Edit: Looking at the way the rendering is updated, I notice that only bit 2 (value 4) of the CGA Mode Control Register toggles on and off while the display is updated (the display is shifted up by the MS-DOS/BIOS video routine). Is that supposed to happen in MS-DOS 5.0 (Turbo XT BIOS v3.0)?
