How to make my PC speaker emulation support PCM audio? \ VOGONS

Reply 1 of 57, by Scali

Posted on 2016-01-08, 14:13

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

PWM can be done on a PC speaker by using the single-shot timer mode on channel 2.
So if you see channel 2 on the 8253 being set to single-shot, you should assume that your PC speaker simulation needs to go into 'PWM' mode.

The PWM works as follows:
In single-shot mode, the signal to the PC speaker starts low (or was it high?), and when the counter reaches 0, it flips.
This allows you to generate a pulse of a given length.
By starting a new single-shot pulse at a fixed frequency, you can perform PWM. The fixed-rate retriggering is usually driven from the timer interrupt. However, in the case of 8088 MPH, it is done by cycle-counted code.

But you don't have to worry about that part really. You just need to simulate the pulse, so you need to flip your PC speaker output when the PIT reaches 0, and again when a new counter is written.
This will give you a 1-bit sample stream at 1.19 MHz. You need to downsample this to e.g. a 16-bit sample stream at 44.1 kHz to play it back on the host audio device (basically you need to count the samples of value 1 out of every N samples, in other words, take the average signal). Technically it shouldn't be different from what you do for regular PC speaker emulation (which should also emulate the actual sample stream generated by the PIT at 1.19 MHz, to avoid aliasing). But looking at your code, it seems you took some shortcuts there.
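
A minimal sketch of that averaging step, assuming the emulator has already counted how many of the last N PIT ticks were high. The function name `duty_to_pcm16` and the mapping of 0% duty to -32768 and 100% duty to 32767 are illustrative assumptions, not taken from any emulator's source:

```c
#include <assert.h>
#include <stdint.h>

/* Convert "ones" high ticks out of "total" PIT ticks into a signed
   16-bit sample: 0% duty -> -32768, 100% duty -> 32767. */
int16_t duty_to_pcm16(uint32_t ones, uint32_t total)
{
    assert(total > 0 && ones <= total);
    /* Scale the average to 0..65535 in 64-bit arithmetic, then re-center. */
    uint32_t level = (uint32_t)(((uint64_t)ones * 65535u) / total);
    return (int16_t)((int32_t)level - 32768);
}
```

With a window of 27 ticks, for example, 0 high ticks would map to -32768 and 27 high ticks to 32767.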

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 2 of 57, by superfury

Posted on 2016-01-08, 14:24

Rank: l33t++
Posts: 5823
Joined: 2014-03-08, 11:25
Location: Netherlands

There's one problem generating it at 1.19MHz: The function for rendering the samples (at 44.1kHz) is called from the same thread as the CPU emulation, before every instruction is executed.

On the 4.0 GHz PC I'm using, it runs at about 0.6-0.7 MIPS (with CPU prefetching enabled), so the signal would be generated at about 700 kHz instead of 1.19 MHz. Thus it won't generate samples fast enough.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 3 of 57, by Scali

Posted on 2016-01-08, 14:30

You don't have to run the emulation function at 1.19 MHz.
You just have to generate a signal of 1.19 MHz.
Which implies that the emulation function needs to process multiple samples at a time.
This shouldn't be too difficult, since the PIT only generates a 0 or a 1, and you know from the count-value when it switches from 0 to 1 or vice versa.

You wouldn't need to generate every sample individually... You could do a form of run-length compression by simply recording the number of 0s and 1s (basically the number of timer ticks between each flip).
Then you can process that with a simple downsampling filter to make it 44.1 kHz.
The source is 1.19 MHz, so you could do a naive filter where you average every 11932 / 441 ≈ 27 ticks.
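
A rough sketch of the run-length idea with the naive 27-tick averaging window. The `pit_run` and `box_filter` types and the function name are hypothetical, and a fixed 27-tick window drifts slightly from a true 44.1 kHz rate, so treat this as an illustration rather than a finished resampler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICKS_PER_SAMPLE 27u /* naive window: 1.19 MHz / 44.1 kHz, rounded */

/* One run of identical output bits, as recorded by the PIT emulation. */
typedef struct {
    uint32_t ticks; /* run length in PIT ticks */
    uint8_t  level; /* speaker output during the run: 0 or 1 */
} pit_run;

/* Partially filled averaging window, carried over between calls. */
typedef struct {
    uint32_t ones;   /* high ticks collected so far */
    uint32_t filled; /* total ticks collected so far */
} box_filter;

/* Consume runs and write one 16-bit sample per completed window.
   The caller must provide room for every completed window.
   Returns the number of samples written. */
size_t box_filter_runs(box_filter *f, const pit_run *runs, size_t nruns,
                       int16_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nruns; i++) {
        uint32_t left = runs[i].ticks;
        while (left > 0) {
            uint32_t room = TICKS_PER_SAMPLE - f->filled;
            uint32_t take = left < room ? left : room;
            if (runs[i].level)
                f->ones += take;
            f->filled += take;
            left -= take;
            if (f->filled == TICKS_PER_SAMPLE) {
                assert(n < max_out);
                /* Average of the window, rescaled to signed 16-bit. */
                out[n++] = (int16_t)((int32_t)(f->ones * 65535u / TICKS_PER_SAMPLE) - 32768);
                f->ones = 0;
                f->filled = 0;
            }
        }
    }
    return n;
}
```

Because the work is done per run rather than per tick, the cost depends on how often the output flips, not on the 1.19 MHz tick rate.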

Reply 4 of 57, by superfury

Posted on 2016-01-08, 21:30

I implemented a basic signal generation for mode 1.
https://bitbucket.org/superfury/x86emu/src/b7 … ker.c?at=master

It generates a 1.19 MHz output signal, averages it to get the duty cycle, and uses that duty cycle as the input for the 44.1 kHz samples.

Is this correct?

Reply 5 of 57, by Jepael

Posted on 2016-01-08, 23:22

Rank: Oldbie
Posts: 1195
Joined: 2005-06-15, 19:28
Location: Finland

That's the theoretically correct way to do it: emulate the PIT output waveform at 1.19 MHz, lowpass filter it to remove everything over 20 kHz, and decimate to a 44100 Hz sampling rate.

The theoretically incorrect way to do it would be to just check if the PIT is in single-shot mode and, if it is, do the same thing as with Sound Source emulation: keep outputting the last value loaded into the PIT at 44.1 kHz until a new value is loaded. In the case of the Sound Source, you know it works at 7 kHz, but the speaker PWM could use any rate. I recall DOSBox does this.

Reply 6 of 57, by superfury

Posted on 2016-01-09, 00:43

It currently generates a 1-bit 1.19 MHz output stream in mode 1, then converts the sample rate by calculating the duty cycle of each 44100 Hz sample (the number of bits set in its equivalent stretch of the previously generated 1.19 MHz input stream, divided by the number of input bits). That is then converted to 16-bit PCM and added to the rendering buffer.

The other mode (Square Wave) still uses the old float calculation method until I convert it.

Reply 7 of 57, by Scali

Posted on 2016-01-09, 09:56

Perhaps a better place for this is the PIT simulation? You need to simulate the PIT at full 1.19 MHz accuracy anyway, which I suppose you trigger for every instruction you emulate, to check for timer interrupts or whatnot.
You could just make it update the PC speaker buffer in there. Then the PC speaker emulation doesn't have to do all the PIT-emulation over again, but just process the collected samples every now and then.

Reply 8 of 57, by superfury

Posted on 2016-01-09, 22:06

I've updated the PC Speaker emulation with full mode 1&3(Square Wave) rendering.
https://bitbucket.org/superfury/x86emu/src/7a … ker.c?at=master

Is this correct? I made the Square Wave a simple flipflop and simplified the FIFO Buffer protection to increase speed.

Reply 9 of 57, by superfury

Posted on 2016-01-11, 02:19

I've improved my emulation:
https://bitbucket.org/superfury/x86emu/src/b8 … pit.c?at=master

Now modes 0, 1, 3 & 7 should be supported. Modes 3 & 7 seem to work correctly according to my ears (sound output). PCM output by 8088 MPH (with the CPU a bit below 8088 speed) sounds like ticks only. Does anyone know what's going wrong here? It seems to be using mode 0 (Interrupt on Terminal Count)?

Reply 10 of 57, by superfury

Posted on 2016-01-11, 12:49

I've managed to fix most of the PIT emulation:
https://bitbucket.org/superfury/x86emu/src/a7 … le-view-default

The 8088 MPH credits at the end seem to give some bleepy sound instead of the correct music. Does anyone know why?

Reply 11 of 57, by Scali

Posted on 2016-01-11, 13:31

I just took a quick look at your PIT code, but I'm not sure if I understand correctly.
You seem to only record a single status for the PIT. So you have one status, and one reload point.
That doesn't work if tickPIT() is not called often enough.
Worst case, the PIT is set to a count of 1, which means it resets every 4 CPU cycles on a 4.77 MHz 8088 system, so the PIT can actually change state multiple times during the execution of a single instruction.
In the case of 8088 MPH, the PIT is reinitialized every 288 CPU cycles, so you should at least have that much accuracy in your emulation to get the PWM sound to play properly.

Of course, this doesn't really work if your CPU core isn't cycle-exact anyway, but we've found that DOSBox, which is notoriously non cycle-exact, still plays back the music reasonably well, as long as you set the cycle-count correctly. There will be jitter because the instruction timing will be off compared to a real machine, but it doesn't mess up the music too badly.

In theory anyway, tickPIT() should be called at 1.19 MHz realtime, and you should remove the loops.
In practice you can predict after how many cycles a state change will occur in the PIT, so you know when to call it. Either when one of its counters goes 0, or when a CPU command resets the PIT state.

Alternatively, you can record a log of PIT commands with time stamps from the instruction stream, and 'play back' this log afterwards.
Because if you know when the PIT is reprogrammed, you always know in which mode it is, so you can always generate the proper states afterwards, generating a cycle-accurate stream of 1-bit 1.19 MHz sample data.
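
One way the log-and-replay approach could look, as a sketch under simplifying assumptions: timestamps are already expressed in PIT ticks, and the channel behaves like a simple one-shot whose output goes low when a count is written and goes high `count` ticks later (real 8253 modes differ in details such as polarity and off-by-one timing). The names `pit_write` and `high_ticks` are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One logged reprogramming of the channel (hypothetical record layout). */
typedef struct {
    uint64_t tick;  /* PIT tick at which the CPU wrote the new count */
    uint16_t count; /* the value that was written */
} pit_write;

/* Replay the log and return how many ticks in [from, to) the output
   was high. The log must be sorted by tick. Before the first write
   the output is treated as low. */
uint64_t high_ticks(const pit_write *writes, size_t n,
                    uint64_t from, uint64_t to)
{
    assert(from <= to);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* This write is in effect until the next one (or the window end). */
        uint64_t seg_end = (i + 1 < n) ? writes[i + 1].tick : to;
        /* Assumed model: low for 'count' ticks, then high. */
        uint64_t rise = writes[i].tick + writes[i].count;
        uint64_t lo = rise > from ? rise : from;
        uint64_t hi = seg_end < to ? seg_end : to;
        if (lo < hi)
            total += hi - lo;
    }
    return total;
}
```

Calling it once per output sample window gives the number of high ticks in that window, which is all the downsampling step needs, regardless of how many times the PIT was reprogrammed within a single instruction.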

Reply 12 of 57, by superfury

Posted on 2016-01-11, 15:06

Actually it's 3 states: one state for each channel (port 0x40 for channel 0, 0x41 for channel 1 and 0x42 for channel 2).

Channel 1 is discarded. Channel 0 is used to generate interrupts and channel 2 is downsampled (using duty cycle calculation) and added to the output buffer of the sound renderer.

Reply 13 of 57, by Scali

Posted on 2016-01-11, 15:29

superfury wrote:
Actually it's 3 states: one state for each channel (port 0x40 for channel 0, 0x41 for channel 1 and 0x42 for channel 2).

Yea, but that's what I mean. Only one state per counter per timespan. Read what I said again, in that context. I was only looking at a single counter, because it's the same problem for all counters.

Reply 14 of 57, by superfury

Posted on 2016-01-11, 23:36

Doesn't the current code already generate a correct 1.19MHz output signal for all three channels? I've based it on the OSDev wiki article on the PIT.

Does it make a difference whether I run a loop that calls tickPIT 1.19 million times per second or do the looping within the function itself? Wouldn't the former only add function call overhead compared to the current method (the function being called ~1190000 times a second)?

About CPU synchronization: I just need to add cycle counting to all 8086/8088 instructions and synchronize the instructions using the high resolution clock by delaying until the cycles match or precede the current time (about the same way the DOSBox-style 'cycles' setting is applied in my emulator). Only now the clock cycles used by the instruction are converted to time elapsed (ns passed) and added to the last instruction time (which starts at 0 for the first emulated instruction executed); then it delays until the current time (in ns) matches or goes past the new timestamp. Then you've got cycle-accurate PIT and 8086/8088 timing that match (cycle-wise, although at 1.19 MHz vs 4.77 MHz).

Currently, the clock is based on a high resolution clock counting nanoseconds from the start of emulation (the initEMU function). The time elapsed since the last tickPIT call is added to the accumulated time. The accumulated time is then divided by the 1.19 MHz tick time (~1/1190000 second) to get the number of PIT ticks passed since the last call. Those ticks are then processed and added to the rawbuffer data stream.
The PIT0 functionality reads the channel 0 rawbuffer and raises an IRQ0 flag if it goes from 0 to 1 at any point in the FIFO. This flag is used at the current instruction to raise the correct IRQ (as dictated by the PIC chip), which in turn executes an interrupt (INT 08h by default?). Then it proceeds to execute the instructions at the interrupt address and onwards until the IRET (depending on software). Just plain IRQ handling.
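
The nanosecond-to-tick conversion described above can be done in integer arithmetic so that no fraction of a tick is lost between calls. This is only a sketch: `tick_clock` and `ticks_elapsed` are hypothetical names, and 1193182 Hz is the commonly quoted PIT input frequency (14.31818 MHz / 12) rather than a value from the emulator's source:

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ     1193182ull    /* assumed PIT input clock */
#define NS_PER_SEC 1000000000ull

/* Leftover time that did not yet add up to a whole PIT tick,
   kept in units of (ns * PIT_HZ) so that nothing is rounded away. */
typedef struct {
    uint64_t rem;
} tick_clock;

/* Returns the number of whole PIT ticks that have elapsed after
   another ns_passed nanoseconds of host time. */
uint64_t ticks_elapsed(tick_clock *c, uint64_t ns_passed)
{
    /* Guard against 64-bit overflow for very long gaps between calls. */
    assert(ns_passed <= (UINT64_MAX - NS_PER_SEC) / PIT_HZ);
    uint64_t acc = c->rem + ns_passed * PIT_HZ;
    c->rem = acc % NS_PER_SEC; /* carry the fraction to the next call */
    return acc / NS_PER_SEC;
}
```

Keeping the remainder as an integer also sidesteps the precision loss that a steadily growing floating-point timestamp would eventually suffer.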

The PIT1 output buffer is always cleared, since it's only connected to DRAM refresh (which doesn't need to be emulated, afaik).

The PIT2 output buffer is downsampled to 44.1 kHz by calculating the duty cycle from the raw output stream of PIT2 for its equivalent samples: each 44.1 kHz sample uses 11900/441 (about 27) raw buffer samples as input, which are added together and then divided by the number of raw buffer samples to get the average duty cycle. This is multiplied and offset to convert from the range 0.0-1.0 to the range -32768 to 32767 (a 16-bit PCM sample).
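
Since 1.19 MHz is not an integer multiple of 44.1 kHz, one option is to let the averaging window alternate between 27 and 28 ticks so that the output rate does not drift. A minimal sketch, assuming a PIT rate of 1193182 Hz and using the hypothetical names `window_stepper` and `next_window_ticks`:

```c
#include <assert.h>
#include <stdint.h>

#define PIT_HZ 1193182u /* assumed PIT input clock */
#define OUT_HZ 44100u   /* host sample rate */

/* Fractional part of the input position, in units of 1/OUT_HZ tick. */
typedef struct {
    uint32_t err;
} window_stepper;

/* Returns how many PIT ticks belong to the next output sample.
   Over OUT_HZ calls the results add up to exactly PIT_HZ ticks. */
uint32_t next_window_ticks(window_stepper *s)
{
    assert(s->err < OUT_HZ);
    uint32_t acc = s->err + PIT_HZ;
    s->err = acc % OUT_HZ; /* carry the remainder forward */
    return acc / OUT_HZ;   /* 27 or 28 with these rates */
}
```

The returned tick count would then serve as the divisor when averaging the raw buffer for that sample.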

Is this correct? I'm not sure about converting the 1-bit stream with duty cycle to 44.1 kHz 16-bit PCM samples, though.

Edit: Does the problem of the PIT being faster than the processor can handle interrupts (at 1.19 million interrupts per second) also exist on real PCs, since the interrupt handler takes longer than a PIT tick (with IF==0)?

Reply 15 of 57, by Scali

Posted on 2016-01-12, 08:05

superfury wrote:
Does it make a difference whether I run a loop that calls tickPIT 1.19 million times per second or do the looping within the function itself? Wouldn't the former only add function call overhead compared to the current method (the function being called ~1190000 times a second)?

It does if you only record one PIT state per channel between calls while the PIT state can be changed more than once during that time.
As I say, in 8088 MPH the PIT one-shot is reprogrammed every 288 CPU cycles, so that is every 72 PIT ticks.

superfury wrote:
About CPU synchronization: I just need to add cycle counting to all 8086/8088 instructions and synchronize the instructions using the high resolution clock by delaying until the cycles match or precede the current time (about the same way the DOSBox-style 'cycles' setting is applied in my emulator). Only now the clock cycles used by the instruction are converted to time elapsed (ns passed) and added to the last instruction time (which starts at 0 for the first emulated instruction executed); then it delays until the current time (in ns) matches or goes past the new timestamp. Then you've got cycle-accurate PIT and 8086/8088 timing that match (cycle-wise, although at 1.19 MHz vs 4.77 MHz).

It's most important to synchronize the different components of your emulator to each other, so the CPU, the PIT, the CRTC and possibly other hardware.
You also need to emulate bus cycles properly, so that the CGA hardware will generate wait states for the CPU. Which also includes emulating the bus cycles for instruction fetches (so that the prefetch buffer is only filled when there are bus cycles available).

That is what is required for accurate emulation of the hardware, so that code like 8088 MPH works as expected. Synchronizing it with the high resolution timer is not important for emulation accuracy itself, but only for the user, so the system runs more or less at the proper speed.

On a real machine, you have a 14.318 MHz clock to which everything is synchronized.
It is divided by 3 for the 4.77 MHz CPU core, and by 12 for the 1.19 MHz PIT.
I think for an emulator it would be enough to have a 4.77 MHz base clock and divide it by 4 for the PIT.
You also need to feed 4.77 MHz to the CRTC I believe (in a real system it also takes this from the clock signal on the ISA bus, so it is synchronized. MDA/Hercules, EGA and VGA are not synchronized to the bus, but have their own crystal, so they work asynchronously. This means that you cannot write cycle-accurate code on them, because different machines have different crystals, and you get slight deviations).
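
The divider relationships above can be modelled with a single master tick counter. This is a sketch only; `machine_clock` and `advance_cpu` are illustrative names, and a real emulator would also need to account for wait states and the other devices on the bus:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPU_DIV 3u  /* 14.318 MHz / 3  = 4.77 MHz CPU clock */
#define PIT_DIV 12u /* 14.318 MHz / 12 = 1.19 MHz PIT clock */

/* Elapsed time of the emulated machine in master clock ticks. */
typedef struct {
    uint64_t master;
} machine_clock;

/* Advance the machine by cpu_cycles CPU cycles and return how many
   PIT ticks fall due in that span. */
uint64_t advance_cpu(machine_clock *m, uint32_t cpu_cycles)
{
    assert(m != NULL);
    uint64_t pit_before = m->master / PIT_DIV;
    m->master += (uint64_t)cpu_cycles * CPU_DIV;
    return m->master / PIT_DIV - pit_before;
}
```

With these dividers, advancing by 288 CPU cycles from a fresh clock yields 72 PIT ticks, which matches the 8088 MPH figures mentioned earlier in the thread.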

superfury wrote:
Edit: Does the problem of the PIT being faster than the processor can handle interrupts (at 1.19 million interrupts per second) also exist on real PCs, since the interrupt handler takes longer than a PIT tick (with IF==0)?

Yes, basically if you set your timer interrupt too fast (or make your interrupt handler too long), you will hang the machine.

Edit: it might be useful to study the source of a C64 or Amiga emulator sometime.
As far as I know, nobody ever wrote a proper PC emulator. All emulators I've seen just emulate the different parts of a PC to a certain extent, but they make no attempt to emulate a *machine*, as in, how the different parts interact. There is no attempt to perform overall synchronization.
On a C64 or Amiga, you NEED to have synchronization, else a lot of things simply will not work. A lot of software depends on the video chip performing a certain operation at a certain CPU cycle, so your video chip emulation and CPU emulation have to be in perfect sync at all times in order to run this software.
8088 MPH might well be the only software in existence that does the same thing on the PC. It puts the same demands on emulation as C64/Amiga do (as well as various other home computers and consoles).
So looking at other PC emulators won't help you much, but looking at eg WinVice or WinUAE should give you some good ideas how they implemented the whole system (which is more than the sum of its parts). They emulate the individual chips, and also the internal buses, so the memory accesses of one chip can cause wait states for another chip.

The interesting thing is that if your PC emulator can do that, you can even emulate CGA snow accurately. CGA snow is basically CPU data that 'leaks' to the output circuitry in 80-column mode, because the access to memory is mutually exclusive, and the CPU overrides the data bus. The CGA card just sees whatever byte the CPU is trying to access at that moment, rather than the byte at the address it was trying to generate for its output.

Reply 16 of 57, by superfury

Posted on 2016-01-13, 11:14

I'll make the 8086/8088 cycle accurate someday (but not for now). I think I will change the CPU to set its ticks passed to 1 tick per instruction for now and modify the speed correction routines to use the count of executed ticks and the current running time to synchronize its instructions 'cycle'-exact. Although with the CPU always reporting 1 tick passed, its cycle setting will just become instructions per millisecond instead of cycles per millisecond for now.

I do notice that while it's counting EMS memory (the official drivers for the Lo-tech EMS 2MB board), the PC speaker seems to make beeps when I press ESC during counting (skipping the memory test). Why would this happen? (Bit 1 of port 0x61 shouldn't be set? Or does the driver actually set port 0x61 bit 1?)

Reply 17 of 57, by Scali

Posted on 2016-01-13, 11:31

superfury wrote:
I'll make the 8086/8088 cycle accurate someday (but not for now). I think I will change the CPU to set its ticks passed to 1 tick per instruction for now and modify the speed correction routines to use the count of executed ticks and the current running time to synchronize its instructions 'cycle'-exact. Although with the CPU always reporting 1 tick passed, its cycle setting will just become instructions per millisecond instead of cycles per millisecond for now.

I think the framerate may be a good synchronization point? The emulator doesn't have to run in 'realtime' at more accuracy than the framerate, because the user can't see the difference anyway.
So that's how I would do it: run the emulation loop until your CRTC simulation has reached the end of the screen. Generate the output image for the host OS and wait until enough 'real time' has passed to display it at approximately the right framerate. Then start the next emulation loop.
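
As a sketch of that pacing scheme, the wait time after each emulated frame can be computed from the number of frames completed and the host time elapsed. The function name `frame_wait_ns` and the idea of pacing against an absolute schedule (rather than sleeping a fixed amount per frame) are assumptions made for this example:

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/* How long the host should wait after finishing emulated frame number
   frames_done (counted from 1), given the host time elapsed since the
   emulation started. Returns 0 when the emulation is running late. */
uint64_t frame_wait_ns(uint64_t frames_done, uint32_t fps,
                       uint64_t host_elapsed_ns)
{
    assert(fps > 0);
    /* Point in host time at which this frame is due on screen. */
    uint64_t due = frames_done * NS_PER_SEC / fps;
    return due > host_elapsed_ns ? due - host_elapsed_ns : 0;
}
```

The emulation loop would run at full speed until the CRTC simulation reaches the end of the screen, present the image, sleep for the returned duration, and then start the next frame. Skipping the sleep entirely gives a 'fast forward' mode without changing anything inside the emulated machine.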

superfury wrote:
I do notice that while it's counting EMS memory (the official drivers for the Lo-tech EMS 2MB board), the PC speaker seems to make beeps when I press ESC during counting (skipping the memory test). Why would this happen? (Bit 1 of port 0x61 shouldn't be set? Or does the driver actually set port 0x61 bit 1?)

I don't have experience with that board, but that may be deliberate. I know various BIOSes had audible feedback for the POST memory test. You'd hear a 'click' at every 64K that it tested, for example. My 286 also does this.
When you pressed ESC, it would skip the rest of the test, which was implemented as a sort of 'fast forward', so you saw the counter on the screen go up to the max memory quickly, and also heard very quick clicks, so a 'bzzzzz' sound.

Reply 18 of 57, by superfury

Posted on 2016-01-13, 13:01

One problem: if I synchronize at the framerate (60 Hz or 70 Hz, depending on the VGA screen resolution and pixel clock (25/28 MHz)), it would only execute 60 blocks of instructions each second. So it would execute lots of instructions in 1/60th of a second, then wait. Since the PC speaker refreshes at a much higher rate, wouldn't it have rendered much more than the CPU instructions expect (in the case of 8088 MPH)?

I adjusted the latest code online to synchronize the CPU clocks to realtime(using the high resolution clock, just as the PIT and other hardware act in their "tickXXX" functions (tickPIT, tickDMA etc.)).
The only problem left is what happens when the counted CPU time exceeds the precision of a double (e.g. goes from DOUBLE_MAX to DOUBLE_MIN by addition), although getnspassed_k might also have overflowed by then (64-bit value overflow).

Reply 19 of 57, by Scali

Posted on 2016-01-13, 13:12

superfury wrote:
One problem: if I synchronize at the framerate (60 Hz or 70 Hz, depending on the VGA screen resolution and pixel clock (25/28 MHz)), it would only execute 60 blocks of instructions each second. So it would execute lots of instructions in 1/60th of a second, then wait. Since the PC speaker refreshes at a much higher rate, wouldn't it have rendered much more than the CPU instructions expect (in the case of 8088 MPH)?

There are two types of synchronization in an emulator:
1) Synchronizing the different components in the machine to a common clock signal
2) Synchronizing the emulated machine to 'realtime'

I was talking about 2).
For 1), you don't need to work in realtime, you just need to have some concept of clock ticks, and your emulator core needs to process all the machine states relative to this clock. You can execute this code as fast as the system allows, until you reach a point where you want to perform 2).
Various emulators also include a 'warp' setting, where they do not sync to realtime, but just run the emulation as quickly as possible. This is handy for 'fast-forwarding'. Eg, with a C64 emulator, you don't want to wait for the floppy to load and decrunch in realtime. Just warp it to the point where the software is ready for use. As far as the software running in the emulator is concerned, everything is still running cycle-exact to a real machine.

So my idea was to run the emulation at maximum speed ('warp') until a frame is rendered. Then synchronize to 'realtime'.
