VOGONS


First post, by superfury

Rank l33t++

When running typical software or demos, what is the average number of cycles spent per instruction on an 8086/8088? (I mean a DOSBox-style fixed cost per instruction, but with the actual ~4.77M cycles per second being counted, so that dividing those cycles by this number gives the exact instructions per second.)

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 49, by vladstamate

Rank Oldbie

This is very hard to say with any measure of exactness, but I would say somewhere between 4 and 6. According to Wikipedia: "the average performance for the Intel 8088 ranged from approximately 0.33–1 million instructions per second." At 4.77 MHz, that would amount to something between 4 and 14 cycles per instruction. My guess is that using 6 is probably safe for the bulk of instructions. That would give you about 0.8 MIPS. But this is all very hand-wavy.
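
The conversion between a MIPS rating and an average cycle count can be sketched as below. This is only the arithmetic behind the estimate, assuming the standard IBM PC/XT clock of 14.31818 MHz / 3; the function names are made up for illustration.

```python
# Assumed clock: IBM PC/XT 8088 at 14.31818 MHz / 3 (about 4.77 MHz).
CPU_HZ = 4_772_727

def cpi_from_mips(mips_rating, cpu_hz=CPU_HZ):
    """Average cycles per instruction implied by a MIPS rating."""
    return cpu_hz / (mips_rating * 1_000_000)

def mips_from_cpi(cpi, cpu_hz=CPU_HZ):
    """MIPS rating implied by a fixed average cycles-per-instruction."""
    return cpu_hz / (cpi * 1_000_000)

if __name__ == "__main__":
    for rating in (0.33, 1.0):
        print(f"{rating:.2f} MIPS -> {cpi_from_mips(rating):.1f} cycles/instruction")
    print(f"6 cycles/instruction -> {mips_from_cpi(6):.2f} MIPS")
```

With these numbers the quoted 0.33 to 1 MIPS range works out to roughly 14.5 down to 4.8 cycles per instruction, and a flat 6 cycles lands at about 0.80 MIPS.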

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 2 of 49, by Scali

Rank l33t

It will be far more than 4 on average. It takes 4 cycles just to load a single byte from memory.
Since most instructions are more than 1 byte, you tend to spend more than 4 cycles on just loading the instructions from memory.
And that's not including the access to memory that the instructions themselves might do... or the address calculation they may perform.

Problem is, it's a fundamental mistake to take an 'average' amount of cycles. Some instructions are WAY slower than others. The whole reason why all PC emulators suck so badly is because they are not representative of actual performance at all.
A single mul or div instruction can take 100 to 160+ cycles, while the fastest possible instruction would be a mov that runs from the prefetch buffer, at 2 cycles. Problem is, all instructions that can execute in 2 cycles are 2 bytes or more, so you won't fit a whole lot of them in your prefetch buffer. And as soon as you need to fetch a 2-byte instruction from memory, it adds 8 extra cycles.
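
A rough sketch of that fetch arithmetic, using only the 4 cycles per byte figure from this post. The `lower_bound_clocks` helper is a hypothetical simplification: it treats a long run of identical instructions as limited by whichever is slower, fetching or executing, and ignores data accesses, wait states and refresh.

```python
# 4 clocks per byte fetched from memory (figure taken from the post above).
BUS_CYCLE = 4

def fetch_clocks(instruction_bytes):
    """Clocks the bus needs just to fetch an instruction of this length."""
    return BUS_CYCLE * instruction_bytes

def lower_bound_clocks(instruction_bytes, exec_clocks):
    """Hypothetical floor for a long run of identical instructions:
    the slower of fetching and executing sets the pace."""
    return max(fetch_clocks(instruction_bytes), exec_clocks)

if __name__ == "__main__":
    # A 2-byte, 2-cycle mov is bound by its fetch, not by its execution.
    print(lower_bound_clocks(2, 2))
    # A slow instruction hides its own fetch time completely.
    print(lower_bound_clocks(2, 118))
```

The point is that the documented execution time of a fast instruction says little about how long it takes once the bus has to keep up.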

So, there's no such thing as 'average', or 'one size fits all'.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 3 of 49, by vladstamate

Rank Oldbie

Here is a histogram of executed instructions in my emulator. I got this after booting to DOS, then running Checkit, then playing ACE for a bit, then starting AutoCAD. It seems that between 0x8b (MOV Gv, Ev), 0x75 (JNZ) and 0x74 (JZ) you get about a third of all real instructions (0x26 is not an instruction as such, it is just the ES: prefix).

I agree that there is no "one size fits all", but if you want to give the "feel" of correct speed without being accurate, making every instruction take 6 or 8 cycles is still better than 1.

NOTE: I forgot to actually add the histogram:

0x0 441 
0x1 17798
0x2 3256
0x3 227913 ==========
0x4 693
0x5 512
0x6 103407 ====
0x7 94001 ====
0x8 23448 =
0x9 213
0xa 158556 =======
0xb 140769 ======
0xc 279
0xd 21
0xe 76275 ===
0xf 0
0x10 0
0x11 105
0x12 0
0x13 20183
0x14 0
0x15 32
0x16 68503 ===
0x17 3
0x18 0
0x19 0
0x1a 5
0x1b 2702
0x1c 0
0x1d 555
0x1e 160119 =======
0x1f 187197 ========
0x20 140
0x21 0
0x22 20469
0x23 18365
0x24 6901
0x25 187
0x26 890998 ==========================================
0x27 0
0x28 0
0x29 300
0x2a 6697
0x2b 71390 ===
0x2c 2588
0x2d 1934
0x2e 386933 ==================
0x2f 0
0x30 48
0x31 2280
0x32 72091 ===
0x33 124501 =====
0x34 135
0x35 0
0x36 638522 ==============================
0x37 0
0x38 7138
0x39 16847
0x3a 205007 =========
0x3b 223241 ==========
0x3c 41608 =
0x3d 32874 =
0x3e 328
0x3f 0
0x40 11923
0x41 2338
0x42 7682
0x43 27350 =
0x44 0
0x45 6
0x46 233915 ===========
0x47 18144
0x48 2085
0x49 27500 =
0x4a 6900
0x4b 14234
0x4c 0
0x4d 72
0x4e 7905
0x4f 5041
0x50 338655 ================
0x51 159597 =======
0x52 157111 =======
0x53 171697 ========
0x54 348
0x55 154891 =======
0x56 107747 =====
0x57 94746 ====
0x58 156773 =======
0x59 158545 =======
0x5a 158753 =======
0x5b 167913 ========
0x5c 0
0x5d 154763 =======
0x5e 101667 ====
0x5f 94908 ====
0x60 0
0x61 0
0x62 0
0x63 0
0x64 0
0x65 0
0x66 0
0x67 0
0x68 0
0x69 0
0x6a 0
0x6b 0
0x6c 0
0x6d 0
0x6e 0
0x6f 0
0x70 0
0x71 0
0x72 393840 ==================
0x73 218150 ==========
0x74 741379 ===================================
0x75 849663 ========================================
0x76 4358
0x77 25650 =
0x78 24542 =
0x79 247
0x7a 0
0x7b 0
0x7c 14012
0x7d 3457
0x7e 5529
0x7f 5667
0x80 453451 =====================
0x81 284544 =============
0x82 0
0x83 465642 ======================
0x84 175245 ========
0x85 0
0x86 24798 =
0x87 6677
0x88 287030 =============
0x89 320770 ===============
0x8a 398172 ===================
0x8b 1256735 ============================================================
0x8c 236163 ===========
0x8d 43473 ==
0x8e 473047 ======================
0x8f 47616 ==
0x90 399
0x91 3206
0x92 685
0x93 3593
0x94 0
0x95 22
0x96 284
0x97 55886 ==
0x98 53509 ==
0x99 1996
0x9a 103923 ====
0x9b 0
0x9c 94029 ====
0x9d 28846 =
0x9e 259
0x9f 259
0xa0 84938 ====
0xa1 61972 ==
0xa2 63742 ===
0xa3 116257 =====
0xa4 9853
0xa5 294542 ==============
0xa6 11337
0xa7 1032
0xa8 36189 =
0xa9 693
0xaa 26518 =
0xab 106002 =====
0xac 10261
0xad 65728 ===
0xae 40382 =
0xaf 0
0xb0 8333
0xb1 3755
0xb2 3788
0xb3 195
0xb4 115172 =====
0xb5 658
0xb6 8
0xb7 10182
0xb8 68429 ===
0xb9 30007 =
0xba 54500 ==
0xbb 97007 ====
0xbc 2606
0xbd 664
0xbe 23945 =
0xbf 7341
0xc0 0
0xc1 0
0xc2 0
0xc3 399575 ===================
0xc4 178661 ========
0xc5 108033 =====
0xc6 48042 ==
0xc7 106361 =====
0xc8 0
0xc9 0
0xca 35385 =
0xcb 178968 ========
0xcc 0
0xcd 61479 ==
0xce 0
0xcf 86300 ====
0xd0 181057 ========
0xd1 471987 ======================
0xd2 2113
0xd3 2382
0xd4 0
0xd5 0
0xd6 0
0xd7 1587
0xd8 0
0xd9 2
0xda 0
0xdb 2
0xdc 0
0xdd 0
0xde 0
0xdf 0
0xe0 15383
0xe1 0
0xe2 268488 ============
0xe3 3735
0xe4 427
0xe5 0
0xe6 2427
0xe7 0
0xe8 488254 =======================
0xe9 30352 =
0xea 991
0xeb 494524 =======================
0xec 185888 ========
0xed 0
0xee 6524
0xef 0
0xf0 0
0xf1 0
0xf2 7363
0xf3 11568
0xf4 0
0xf5 1347
0xf6 180661 ========
0xf7 24726 =
0xf8 55977 ==
0xf9 29468 =
0xfa 88591 ====
0xfb 125964 ======
0xfc 38955 =
0xfd 12
0xfe 120806 =====
0xff 310209 ==============
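
A histogram like this can be turned into a weighted average cycle count once each opcode has a cost attached. The sketch below uses three counts from the histogram; the cycle costs are placeholder assumptions for illustration only, not measured or documented timings.

```python
def weighted_cpi(counts, cycles):
    """Average cycles per instruction, weighted by how often each opcode ran."""
    total = sum(counts.values())
    return sum(n * cycles[op] for op, n in counts.items()) / total

# Execution counts copied from the histogram above.
counts = {0x8B: 1256735, 0x75: 849663, 0x74: 741379}

# Placeholder per-opcode costs (assumptions, NOT real 8088 timings).
assumed_cycles = {0x8B: 12, 0x75: 10, 0x74: 10}

if __name__ == "__main__":
    print(f"{weighted_cpi(counts, assumed_cycles):.2f} cycles/instruction")
```

Filling in the full 256-entry table would give a single fixed cost tuned to this particular workload, which is still only an average.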

Reply 4 of 49, by Stiletto

Rank l33t++

reenigne was working on a PCB that could be used for measuring clocks per instruction on an 8088 for a cycle-exact XT emulator.
http://www.reenigne.org/blog/isa-bus-sniffer-update/
http://www.reenigne.org/blog/isa-bus-sniffer/
http://www.reenigne.org/blog/i-bought-an-xt/

"I see a little silhouette-o of a man, Scaramouche, Scaramouche, will you
do the Fandango!" - Queen

Stiletto

Reply 5 of 49, by superfury

Rank l33t++

I don't have an XT (nor do I think I'll ever have one), so all I can rely on for now is the information I can find on the internet.

I've just set it up to use 8 cycles/instruction on 80(1)86/88 CPUs and the old 4 cycles/instruction on 286+ CPUs. 8088 MPH worked somewhat (end credits music) with 4 cycles; I still need to test with 8 cycles.

Edit: Just tested running 8088 MPH. The music sounds roughly correct (recognisable), but it still stutters. According to the CPU indicator (when enabled), the CPU is running at ~71% speed, which explains the missing audio chunks: every second, only ~70% of the 44100 samples (the sample rate) get filled instead of all that are needed. All audio output generation (except MIDI SF2 emulation) is tied to the CPU core emulation, i.e. to the clock cycles the CPU reports to have run for the current instruction.
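
The relation between emulated cycles and generated samples can be sketched as follows. This is a minimal illustration, assuming the 44100 Hz sample rate from the post and a 4.772727 MHz CPU clock; the function names are hypothetical and not taken from the emulator's source.

```python
CPU_HZ = 4_772_727      # assumed emulated CPU clock
SAMPLE_RATE = 44_100    # output sample rate mentioned in the post

def samples_generated(emulated_cycles, cpu_hz=CPU_HZ, sample_rate=SAMPLE_RATE):
    """Samples produced when audio is clocked purely by emulated CPU cycles."""
    return emulated_cycles * sample_rate // cpu_hz

def samples_per_real_second(speed_fraction):
    """Samples produced in one second of real time at a given emulation speed."""
    return samples_generated(int(CPU_HZ * speed_fraction))

if __name__ == "__main__":
    for speed in (1.0, 0.71):
        print(f"{speed:.0%} speed -> {samples_per_real_second(speed)} samples/s")
```

At 71% speed only about 31300 of the 44100 samples needed per second exist, so the output device runs dry and the sound stutters.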

Test results on an Intel i7 4790K @ 4.0GHz (not overclocked due to stock cooling, using the x86EMU commit of 2016/02/19 02:25):

MIPS 1.10 is reporting:
                        Default   250 cycles
General Instructions:   2.07      0.84
Integer Instructions:   3.62      1.48
Memory to Memory:       2.49      1.02
Register to Register:   3.31      1.35
Register to Memory:     1.92      0.79
Performance rating:     2.44      1.00

This is with 8 cycles applied to the Default setting (newly hardcoded in the emulator for pre-286 CPUs; 286+ CPUs still use 4 cycles).

According to 8088 MPH, the speed deviates by 6% with the 250 cycles setting. Results are better when the cycles setting is at ~595 cycles (DOSBox-style cycles, i.e. 1 cycle/instruction; set while playing, though that shouldn't make a difference to the software?). Although there's still a high pitched noise between the music.
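
For reference, the ~595 figure is close to what a fixed 8 cycles per instruction implies. A small sketch, assuming a DOSBox-style "cycles" value means instructions per millisecond and a CPU clock of 4.772727 MHz; the function name is made up for illustration.

```python
def dosbox_style_cycles(cpi, cpu_hz=4_772_727):
    """Instructions per millisecond for a fixed cycles-per-instruction cost
    (assumes a DOSBox-style 'cycles' value counts instructions per ms)."""
    return cpu_hz / cpi / 1000

if __name__ == "__main__":
    for cpi in (4, 8):
        print(f"{cpi} cycles/instruction -> {dosbox_style_cycles(cpi):.1f} cycles setting")
```

With 8 cycles per instruction this gives about 596.6, so the Default setting and a manual setting of ~595 should behave almost identically.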

Edit: I've also decreased EMS to 2MB to match the emulated board, and MMU (RAM) memory on 80(1)86/88 to 1MB. This makes the app use less memory; the removed memory was either unaddressable or, in the case of the EMS memory board, simply unused (128/256 pages used by the driver).

Reply 6 of 49, by Scali

Rank l33t
superfury wrote:

Although there's still a high pitched noise between the music.

That is the nature of the PWM technique of playing samples.
Since you generate pulses at a fixed rate, this automatically creates a 'carrier' frequency for every start of the pulse.
On real hardware, the end tune of 8088 MPH runs at ~16.5 KHz, so you should hear a carrier frequency of 16.5 KHz in the music.
Unlike most PWM routines, it is not synchronized to the PIT, but the code is 'free-running'. That is, the time between samples is padded with nops (or nop-like instructions) so that the time between the out-instructions of playing a sample is exactly 288 cycles (since 4.77 MHz / 288 == ~16.5 KHz).
Which means your emulator needs to execute these instructions at exactly the same speed as well, to get proper results.

If your emulation runs slower, then the frequency will also be lower, and it will be more noticeable (16.5 KHz is near the limit of hearing of a grown human... it's also near the limit of a real PC speaker).
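
The effect of emulation speed on the carrier can be sketched with the numbers from this post (288 cycles per sample at about 4.77 MHz). The `speed` parameter is an assumption for illustration: the fraction of real-time speed the emulated CPU actually reaches.

```python
def carrier_hz(cycles_per_sample=288, cpu_hz=4_772_727, speed=1.0):
    """PWM carrier frequency when one sample is emitted every cycles_per_sample clocks.
    speed is the fraction of full speed the (emulated) CPU reaches."""
    return cpu_hz * speed / cycles_per_sample

if __name__ == "__main__":
    print(f"real hardware: {carrier_hz():.0f} Hz")
    print(f"71% speed:     {carrier_hz(speed=0.71):.0f} Hz")
```

At full speed this gives roughly 16.57 kHz; at 71% speed the carrier drops to roughly 11.8 kHz, well inside the range most people hear clearly.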

Reply 7 of 49, by superfury

Rank l33t++

One little question: if the carrier is at ~16.5kHz, won't the samples be distorted with high tones(above 16.5 / 2 = 8.25kHz frequencies in the song, when sampled at 16.5kHz?)

Also, I'm still thinking about how to apply the different combinations of cycles in instructions universally to my instruction execution.

Reply 8 of 49, by Scali

Rank l33t
superfury wrote:

One little question: if the carrier is at ~16.5kHz, won't the samples be distorted with high tones(above 16.5 / 2 = 8.25kHz frequencies in the song, when sampled at 16.5kHz?)

Well, because of Nyquist, you can't have tones above 8.25 KHz anyway. And indeed, it all sounds rather grainy and distorted, especially in the high frequencies (there's also considerable quantization noise, since we only have ~6 bits of precision, and samples are quantized before mixing, so you effectively have only 17 unique values of amplitude per channel, if I'm not mistaken. And no interpolation for any resampling either, of course. In a way it's amazing it sounds as good as it does, given how limited the precision is 😀).
The videos we made of 8088 MPH on YouTube were tapped directly from the PC speaker output of a PC. So that is what it sounds like exactly (although it seems the YouTube encoding process has filtered off most of the carrier whine. I could extract a raw .wav file from my capture, if you like).

Reply 9 of 49, by superfury

Rank l33t++

That .wav file would come in handy for comparing with the output of my emulator. It might be giving 'correct' output with 8088 MPH after all? (Recording the emulated output is also possible, using the sound menu option in its BIOS settings menu.) It's giving roughly correct results with the new Default setting (8 CPI @ 4.77MHz). At least it sounds correct, apart from the output lag (the audio output draining the buffer faster than the emulated CPU fills it?).

Btw CPI=Cycles per instruction.

Reply 10 of 49, by superfury

Rank l33t++

I implemented 9 cycles with 8088 (to get 8088 MPH to properly produce sound) and the original 8 cycles for the 8086. 8088MPH credits now seem to work fine, with a low LFO-like sound going through it.

I tried it with 8 cycles, but 9 cycles results in better sound quality with 8088 MPH. 10 cycles makes it barely hearable (falling into the background of (white) noise).

Check out my latest x86EMU release for this version (x86EMU build 2016/02/22 14:52).

(Although according to a simple calculation, 288 cycles divides evenly by both 8 and 9 cycles, so both could theoretically work. It doesn't divide cleanly by 10, so that would explain the quality loss? I don't know yet why 9 cycles gives better audio quality than 8 cycles. Maybe the 8088 MPH demo takes a bit longer than 8 cycles per instruction to perform its PWM loop? Perhaps 8 cycles most of the time and sometimes 9 cycles, on average? This would explain why 9 cycles gives better audio.)
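
The divisibility argument can be checked with a few lines. One caveat, stated as an assumption: with a fixed cost per instruction, the interval the emulator actually produces is the number of instructions in the loop times that cost, and the instruction count of the demo's loop is not given in this thread, so `n_instructions` below is purely hypothetical.

```python
TARGET_CLOCKS = 288  # clocks between samples on real hardware (from Scali's post)

def instructions_for_interval(cpi, target=TARGET_CLOCKS):
    """Whole instructions that fit in the target interval, plus leftover clocks."""
    return divmod(target, cpi)

def emulated_interval(n_instructions, cpi):
    """Clocks a loop of n_instructions takes under a fixed per-instruction cost."""
    return n_instructions * cpi

if __name__ == "__main__":
    for cpi in (8, 9, 10):
        count, leftover = instructions_for_interval(cpi)
        print(f"{cpi} cycles: {count} instructions, {leftover} clocks left over")
```

So 8 cycles only reproduces the 288-clock interval if the loop happens to contain 36 instructions, and 9 cycles only if it contains 32; which fixed cost sounds best depends on the real instruction count, not on divisibility alone.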

Scali, recorded playback of 8088 MPH: http://www.filedropper.com/recording10
Is this what you meant by the carrier that's filtered out on YouTube (PWM starts at 10:40)? Or is this just my high pass filter messing up?

Reply 11 of 49, by SoftPCMuseum_

Rank Newbie
Scali wrote:

It will be far more than 4 on average. It takes 4 cycles just to load a single byte from memory.
Since most instructions are more than 1 byte, you tend to spend more than 4 cycles on just loading the instructions from memory.
And that's not including the access to memory that the instructions themselves might do... or the address calculation they may perform.

Problem is, it's a fundamental mistake to take an 'average' amount of cycles. Some instructions are WAY slower than others. The whole reason why all PC emulators suck so badly is because they are not representative of actual performance at all.
A single mul or div instruction can take 100 to 160+ cycles, while the fastest possible instruction would be a mov that runs from the prefetch buffer, at 2 cycles. Problem is, all instructions that can execute in 2 cycles are 2 bytes or more, so you won't fit a whole lot of them in your prefetch buffer. And as soon as you need to fetch a 2-byte instruction from memory, it adds 8 extra cycles.

So, there's no such thing as 'average', or 'one size fits all'.

Quite correct - certain instructions take more clock cycles, while others can execute in fewer cycles. It also depends on the CPU model and mode of operation - in writing my own 80386 CPU emulator, and from reading the Intel 80386 Programmer's Reference in the process, I have found that instructions can just as easily have different clock cycles depending upon whether they happen to be running in Real Mode, Protected Mode, or Virtual 8086 Mode.

PCE, the emulator that I am basing my own emulator on, is what is called a cycle-accurate emulator. That is, it implements the correct clock cycles for the 8086 and 80186 CPUs, and in just the same way, my own projects will likewise implement the correct clock cycles for the 80286, 80386, and higher CPUs. PCjs is also cycle-accurate for the 8086, 80286, and 80386 CPUs.

Reply 12 of 49, by Scali

Rank l33t
SoftPCMuseum_ wrote:

PCE, the emulator that I am basing my own emulator on, is what is called a cycle-accurate emulator. That is, it implements the correct clock cycles for the 8086 and 80186 CPUs, and in just the same way, my own projects will likewise implement the correct clock cycles for the 80286, 80386, and higher CPUs. PCjs is also cycle-accurate for the 8086, 80286, and 80386 CPUs.

I hate to break it to you, but, no, they aren't cycle-accurate. Namely:
1) Although these emulators make some effort in incorporating the cost of each instruction, they are just ballpark figures. They do not account for all the variables that affect the speed of execution of each instruction.
2) The only thing they try to emulate in terms of clock-cycles is the CPU itself, and only a fixed cost per executed instruction. They do not emulate things like the prefetch buffer, the data bus, or other hardware which may steal cycles or add wait states (such as DMA refresh or CGA memory).

A proper cycle-accurate emulator emulates the system as a whole. For good cycle-accurate emulators, take a look at VICE or UAE.
No such thing exists for PC yet (and it can't exist yet, because the exact behaviour of the 8088 CPU and the interaction with the supporting hardware at the cycle-level has never been fully documented so far).
Which is why 8088 MPH breaks all emulators.

Reply 13 of 49, by Scali

Rank l33t
superfury wrote:

I tried it with 8 cycles, but 9 cycles results in better sound quality with 8088 MPH. 10 cycles makes it barely hearable (falling into the background of (white) noise).

The thing with PWM is that the amplitude is a function of the replay speed (unlike PCM). You program the duty cycle of your wave for every PWM sample you emit.
Now, the duty cycle of each sample here is fixed... But the replay rate is not. The result is that when your CPU runs too quickly, it will reprogram the timer before the previous sample has finished playing. This can lead to all sorts of weird aliasing.
If the CPU runs too slowly, then the timer will sit idle for longer than you anticipated when you converted your samples to duty cycle values. As a result, the amplitude is lower than expected.
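
A crude model of this dependence, as a sketch only: treat each sample as a pulse of fixed length inside a period set by the replay rate, and take the average output level as the fraction of the period the pulse is high. The truncation of an over-long pulse is a simplifying assumption; the real behaviour when the timer is reprogrammed early is messier.

```python
def average_level(pulse_clocks, period_clocks):
    """Fraction of the sample period during which the speaker output is high.
    A pulse longer than the period is cut off (simplifying assumption)."""
    return min(pulse_clocks, period_clocks) / period_clocks

if __name__ == "__main__":
    pulse = 72  # hypothetical pulse length in clocks, prepared for a 288-clock period
    for period in (288, 576, 144):
        print(f"period {period}: level {average_level(pulse, period):.3f}")
```

The same pulse gives half the level when the period doubles (CPU too slow), and a pulse that no longer fits in the period gets clipped (CPU too fast).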

superfury wrote:

Scali, recorded playback of 8088 MPH: http://www.filedropper.com/recording10

I can't seem to download this file.

Reply 14 of 49, by superfury

Rank l33t++

It was still there when I uploaded it. It seems to have been removed for inactivity or something like that? Unfortunately, the cooling fan of the laptop I'm using this weekend seems to be failing: it makes metallic rattling noises, like some loose metal has gotten into the fan, with a little smoke coming out of it. It smelled a bit like solder vaporizing (which is poisonous). I immediately shut down the laptop; maybe my father can look at it next week in order to repair it. The laptop's getting old, though (a 6-8 year old Intel Core Duo x64 @ 2.0GHz), so it might be better to start looking for a new one? This first happened about 2 weeks ago, when the laptop fell from about 0.5m onto its right side (the fan is on the left side). It's an Acer Aspire 7741ZG laptop.

Reply 15 of 49, by SoftPCMuseum_

Rank Newbie
Scali wrote:
SoftPCMuseum_ wrote:

PCE, the emulator that I am basing my own emulator on, is what is called a cycle-accurate emulator. That is, it implements the correct clock cycles for the 8086 and 80186 CPUs, and in just the same way, my own projects will likewise implement the correct clock cycles for the 80286, 80386, and higher CPUs. PCjs is also cycle-accurate for the 8086, 80286, and 80386 CPUs.

I hate to break it to you, but, no, they aren't cycle-accurate. Namely:
1) Although these emulators make some effort in incorporating the cost of each instruction, they are just ballpark figures. They do not account for all the variables that affect the speed of execution of each instruction.
2) The only thing they try to emulate in terms of clock-cycles is the CPU itself, and only a fixed cost per executed instruction. They do not emulate things like the prefetch buffer, the data bus, or other hardware which may steal cycles or add wait states (such as DMA refresh or CGA memory).

A proper cycle-accurate emulator emulates the system as a whole. For good cycle-accurate emulators, take a look at VICE or UAE.
No such thing exists for PC yet (and it can't exist yet, because the exact behaviour of the 8088 CPU and the interaction with the supporting hardware at the cycle-level has never been fully documented so far).
Which is why 8088 MPH breaks all emulators.

First, you realize of course that when developing a "cycle-accurate" emulator, the clock cycles are normally taken directly from the CPU reference manuals for each instruction, which at least is what I do when developing my own emulated Intel 80386 CPU from the PCE source code.

Second of all, PCE actually does emulate the prefetch queue (and so does IBMulator too for that matter), among other things, so you should really check the source code out yourself before making such statements.

And finally, at least for the 80386, you can easily find the individual clock cycles in the Intel 80386 Programmer's Reference Manual (1987).

Reply 16 of 49, by vladstamate

Rank Oldbie
SoftPCMuseum_ wrote:

First, you realize of course that when developing a "cycle-accurate" emulator, the clock cycles are normally taken directly from the CPU reference manuals for each instruction, which at least is what I do when developing my own emulated Intel 80386 CPU from the PCE source code.

Second of all, PCE actually does emulate the prefetch queue (and so does IBMulator too for that matter), among other things, so you should really check the source code out yourself before making such statements.

And finally, at least for the 80386, you can easily find the individual clock cycles in the Intel 80386 Programmer's Reference Manual (1987).

That is not what I see in the source code of your release (from cpu/e8086/pqueue.c):

/*
* Before an instruction is executed the prefetch buffer (pq_buf) is
* filled up to pq_fill bytes. After the instruction is executed
* the instruction bytes are discarded and the remaining bytes up
* to pq_size are copied to the front of the queue. Bytes between
* pq_size and pq_fill are discarded (and possibly read again from
* RAM).
*
* The prefetch buffer is filled with pq_fill instead of pq_size bytes
* so that there is always at least one entire instruction in the
* prefetch buffer. Yes, this is ugly.
*/

That is not how the prefetch buffer works at all. Like Scali said, I also do not know of any cycle-accurate emulator (maybe MESS; does MESS do a better job?).

The other problem is that on the 8088 (especially the IBM PC and XT) it is somewhat easier to develop an almost cycle-accurate emulator, but when it comes to the 80386 all bets are off. By that point the number of motherboards was pretty large, each with its own peculiarities (with and without cache, different memory types, ISA/EISA bus, etc.), such that a proper cycle-accurate emulator is nigh-impossible.

Reply 17 of 49, by Scali

Rank l33t
SoftPCMuseum_ wrote:

Second of all, PCE actually does emulate the prefetch queue (and so does IBMulator too for that matter), among other things, so you should really check the source code out yourself before making such statements.

I checked the code, and didn't see anything that actually emulates the data bus. The prefetch queue does its prefetching in cycles where the data bus is idle. If you do not emulate the data bus, you do not emulate idle cycles, and as such, you do not actually emulate the prefetch queue.
Much like how you don't actually do cycle-exact emulation if you just use hardcoded values for each instruction (the manuals do explain how data access and calculation of EA's etc affect timing, so most of what I said is already in the CPU manuals. Not that those manuals are any kind of excuse not to emulate the rest of the system's effect on the CPU, such as DMA refresh, wait states from memory or other devices etc).
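
To make the point about idle bus cycles concrete, here is a toy model of a prefetch queue that only makes progress on clocks where the bus is free. It is a sketch under stated assumptions (4-byte queue as on the 8088, 4 clocks per fetched byte, a fetch that simply pauses while the bus is busy) and is not how PCE or any other emulator implements it; all names are hypothetical.

```python
class PrefetchQueue:
    """Toy prefetch queue: fills one byte per bus cycle, only on an idle bus."""

    def __init__(self, size=4, bus_cycle=4):
        self.size = size            # queue capacity in bytes (4 on the 8088)
        self.bus_cycle = bus_cycle  # clocks needed to fetch one byte
        self.fill = 0               # bytes currently in the queue
        self.progress = 0           # clocks spent on the fetch in flight

    def tick(self, bus_busy):
        """Advance one clock; bus_busy means something else owns the bus."""
        if bus_busy or self.fill == self.size:
            return
        self.progress += 1
        if self.progress == self.bus_cycle:
            self.progress = 0
            self.fill += 1

    def consume(self, n):
        """Remove n instruction bytes; return how many were NOT available."""
        taken = min(n, self.fill)
        self.fill -= taken
        return n - taken


def run(queue, busy_pattern):
    """Feed a sequence of per-clock bus_busy flags; return the final fill."""
    for busy in busy_pattern:
        queue.tick(busy)
    return queue.fill


if __name__ == "__main__":
    idle, hogged = PrefetchQueue(), PrefetchQueue()
    print(run(idle, [False] * 16))   # bus free for 16 clocks
    print(run(hogged, [True] * 16))  # bus busy for 16 clocks
    print(hogged.consume(2))         # bytes the next instruction must wait for
```

Without a per-clock notion of who owns the bus, there is nothing to drive `tick`, which is why a queue that is topped up once per instruction does not reproduce the timing.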

Come back when your emulator can run 8088 MPH, *then* we're talking about cycle-exact emulation.

Reply 18 of 49, by Jepael

User metadata
Rank Oldbie
Rank
Oldbie

Of course a perfectly written CPU manual is the absolute truth for the CPU. But only for the CPU.

From what you say in your first point, it appears you think cycle accurate means only the CPU. To me and many others, cycle accurate emulation means the whole computer system (PC, arcade cabinet, console, handheld) as a whole, including other peripherals as well, like video and sound generation, that have to be in sync with CPU execution, or the screen drawing or sound generation could fail spectacularly.

Different memory areas can have different wait states (fetching from BIOS ROM could be slower than RAM, writing to RAM could be faster than video memory), so accessing different memory areas can cause different execution times with the same instruction.
The CPU is also not the only thing that uses the bus: other things occasionally prevent the CPU from accessing the bus, which adds extra cycles to instructions. Examples are periodic DRAM refresh cycles, and DMA cycles if a sound card is using DMA to play something.
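
As a sketch of how such effects change the cost of one and the same access: the wait-state numbers and the refresh penalty below are hypothetical placeholders chosen for illustration, not measured values for any real machine.

```python
BASE_BUS_CYCLE = 4  # clocks for a bus access without wait states

# Hypothetical wait states per memory region (placeholders, not real figures).
WAIT_STATES = {"ram": 0, "rom": 1, "video": 3}

def access_clocks(region, refresh_pending=False, refresh_clocks=4):
    """Clocks for one memory access, including region wait states and an
    optional stall while a DRAM refresh or DMA cycle holds the bus."""
    clocks = BASE_BUS_CYCLE + WAIT_STATES[region]
    if refresh_pending:
        clocks += refresh_clocks  # assumed stall length, placeholder value
    return clocks

if __name__ == "__main__":
    for region in WAIT_STATES:
        print(region, access_clocks(region), access_clocks(region, refresh_pending=True))
```

The same instruction therefore has no single cycle count: it depends on where its operands live and on what else is using the bus at that moment.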

I bet the manuals contain no exact information on how a given CPU behaves if, for an opcode X on its Yth cycle, the bus is busy for Z cycles. Normally these things of course don't have to be emulated, but to fully run 8088 MPH, just emulating the CPU in a cycle-accurate way is not enough; the emulator needs to emulate at least the timer hardware (for memory refresh timing and sound generation) and the video adapter timing as well. Why? Because the demo is built to run on a specific hardware configuration, so that, for example, a given code block executes at a given point of drawing the screen.

edit: I took too long a break while writing; many people have said approximately the same things.

Reply 19 of 49, by SoftPCMuseum_

Rank Newbie

Well for one thing, this topic DID say "8086 average clocks per instruction", which implies that it was talking only about the CPU alone (by implying the Intel 8086 specifically by name). That was why I replied the way that I did. If the title made it clear that it was talking about the entire machine, then I would have replied differently.

Second of all, nowhere was I denying that the PCE code had issues that needed to be addressed. And nowhere did I say that its emulation of the prefetch queue was necessarily perfect either. What I simply said, was that it emulates the prefetch queue (as opposed to simply not having one at all).

Third, I'm already considering adding support for the Intel 8288 bus controller once I get to that area of the IBM PC Technical Reference manual, but right now I'm still at the ROM BIOS listing.