Thread experimentation, threaded audio emulation and mixing. \ VOGONS

First post, by awgamer

Posted on 2020-08-19, 06:26

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

Thought others might find this experiment of interest. A few games are currently not happy with it (MSFS 5.x, Stunt Island, X-Wing), but it works without a hitch for many others. Synthesized audio (FM etc.), DAC processing, and mixer mixing should all be offloaded to the thread. Threading overhead can be dropped to nil, though the initial methods were heavy. Read the comments in the diff. From the testing I've done, FM playback would be error free if it weren't for X-Wing, which will start mute but occasionally plays fine; the other two have an issue with DAC playback.

Audio interfacing is done through the mixer: each audio device installs a callback handler with the mixer, the PIC runs mixer_mix, and mixer_mix runs the handlers after peeling away a couple of layers. The callbacks all conform to the same pattern: generate the synthesized data, then call the mixer's addsamples. The exception to the rule is mpu401, which does things differently; ignore that guy.

adlib
  mixerchan.install
    opl_callback
      generate
        opl2
          adlib_getsample
          addsamples_m16
        opl3
          adlib_getsample
          addsamples_s16
        mameopl2
          ym3812_update_one
          addsamples_m16
        mameopl3
          ymf262_update_one
          addsamples_s16
dbopl  (DOSBox OPL?)
  generate
    !opl3
      generateblock2
      addsamples_m32
    else
      generateblock3
      addsamples_s32
disney
  mixerchan.install
    disney_callback
      stereo
        disney_playstereo
          buffer=
          addsamples_s8
      mono
        buffer=
        addsamples_m8
gameblaster
  mixerchan.install
    cms_callback
      sound_stream_update
      addsamples_s32
gus
  GUS_DMA_Callback
    r/w between system and gus ram
  mixerchan.install
    gus_callback
      generatesamples
      addsamples_s32
mpu401
  operates differently from the rest
opl
  adlib_getsample
pcspeaker
  mixerchan.install
    pcspeaker_callback
      stream(mixtemp buffer)=
      addsamples_m16
sblaster
  mixerchan.install
    sblaster_callback
      none/pause/masked
        mixer addsilence
      dac
        mixer addstretched
      dma
        generatedmasound
          dsp_dma_2
            decode_adpcm_2_sample
            addsamples_m8
          dsp_dma_3
            decode_adpcm_3_sample
            addsamples_m8
          dsp_dma_4
            decode_adpcm_4_sample
            addsamples_m8
          dsp_dma_8
            stereo
              dma.chan->read
              !signed
                addsamples_s8
              signed
                addsamples_s8s
            mono
              dma.chan->read
              !signed
                addsamples_m8
              signed
                addsamples_m8s
          dsp_dma_16
          dsp_dma_16_aliased
            stereo
              dma.chan->read
              signed
                addsamples_s16
              !signed
                addsamples_s16u
            mono
              dma.chan->read
              signed
                addsamples_m16
              !signed
                addsamples_m16u
tandy
  mixerchan.install
    SN76496Update
      sound_stream_update
      addsamples_m16
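
To make the pattern concrete, here is a minimal sketch of the install / generate / addsamples flow. MixerChannel, Mix, Buffer, and SquareWaveCallback are hypothetical stand-ins written for illustration, not DOSBox's actual classes; only the shape (a handler installed on a channel, the handler generating samples and handing them to an AddSamples call) follows the tree above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mixer channel: it owns the installed handler
// and a buffer that the handler fills through AddSamples_m16().
class MixerChannel {
public:
    using Handler = std::function<void(MixerChannel &, std::size_t)>;

    // "mixerchan.install": the device registers its callback here.
    explicit MixerChannel(Handler handler) : handler_(std::move(handler)) {}

    // Called by the device callback with freshly generated mono samples.
    void AddSamples_m16(std::size_t count, const int16_t *data) {
        buffer_.insert(buffer_.end(), data, data + count);
    }

    // Called by the mixer when it needs `needed` more samples.
    void Mix(std::size_t needed) { handler_(*this, needed); }

    const std::vector<int16_t> &Buffer() const { return buffer_; }

private:
    Handler handler_;
    std::vector<int16_t> buffer_;
};

// Hypothetical device callback: generate the data, then add it to the mixer.
// The square wave is a placeholder for adlib_getsample and friends.
inline void SquareWaveCallback(MixerChannel &chan, std::size_t len) {
    std::vector<int16_t> tmp(len);
    for (std::size_t i = 0; i < len; ++i)
        tmp[i] = static_cast<int16_t>((i / 4) % 2 ? 8000 : -8000);
    chan.AddSamples_m16(len, tmp.data());
}
```

Usage would be along the lines of `MixerChannel chan(SquareWaveCallback); chan.Mix(16);`, after which the channel buffer holds 16 samples. In the threaded experiment it is the Mix() step that moves off the main thread.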

Reply 1 of 30, by awgamer

Posted on 2020-08-20, 03:32

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

No comments? Not as much interest after all. FM overhead used to be a topic of discussion, but I guess not so much anymore. A solution to X-Wing being mute was to reduce cycles, as if it were an issue with sound card detection, perhaps the PIC cycling too fast with audio in a separate thread. As for DAC playback, I'm thinking DMA might need some mutexes: when I added a LOG_MSG to the addsamples call MSFS 5.1 was using, it added enough delay to keep MSFS from wigging out, so I just need to figure out where to put another hold so things don't stomp over each other in that state. Edit: another way to prevent the mute issue in X-Wing without reducing cycles is the dosidle utility: https://archive.org/details/dosidle

Reply 2 of 30, by jmarsh

Posted on 2020-08-20, 04:06

Rank: Oldbie
Posts: 1806
Joined: 2014-01-04, 09:17

I gave it a quick test but it didn't seem to raise cpu usage beyond the limit of one core / execute any work in parallel.

Reply 3 of 30, by awgamer

Posted on 2020-08-20, 04:12

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

Yes, that's what I note in the diff comments. I need to work out why: whether I did the threading wrong, missed something linked that keeps the work from being processed simultaneously, am somehow still processing in main, or whether processing audio just isn't significant anymore.

Reply 4 of 30, by jmarsh

Posted on 2020-08-20, 04:29

Rank: Oldbie
Posts: 1806
Joined: 2014-01-04, 09:17

It could definitely blow up video recording because MIXER_MixData() sends audio via CAPTURE_AddWave(), which would be bad if it happened when the main thread was inside CAPTURE_AddImage().

Reply 5 of 30, by awgamer

Posted on 2020-08-20, 04:36

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

A mutex lock or wait on the CAPTURE_AddWave end should take care of that potential issue, assuming my threading understanding is up to speed; the MixData side is already mutex locked.

Reply 6 of 30, by awgamer

Posted on 2020-08-20, 11:16

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

Checking mixer_mix (main thread), addsamples, DMA read, pic_runqueue, normal_loop, and the callbacks (e.g. LOG_MSG("mythreadID sblaster GenerateDMASound %d",SDL_GetThreadID(NULL));) shows the audio handling is running in mixer_mix_thread as desired, with pic_runqueue and normal_loop in main shown to be running asynchronously.

Reply 7 of 30, by krcroft

Posted on 2020-08-21, 02:48

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

awgamer,

I think your approach is the right direction given actual hardware behavior:

[  Hardware CPU -> IO BUS Event =-= HW CPU continues ... ]
                                 `-[  .. ISA hardware processes audio in parallel ... ]

vs.

[  DOSBox CPU -> IO BUS Event | | DOSBox CPU unblocked ... ]
                               ^
                               `- Emulating the ISA hardware (OPL/GUS/SB/...)
                                  blocks the DOSBox core to generate 1 ms of audio. This
                                  won't interrupt the frame rate or cause audio hiccups
                                  provided the host CPU has enough spare headroom available.

If you're running DOSBox on a CPU that doesn't have enough headroom, then the serial time to generate the audio will introduce a gap in the stream, stuttering the audio.

All of us seasoned users know that means, "I've given this game too many cycles..time to back it off a bit." Eventually you find the sweet spot where there's enough headroom to absorb the audio-generation bursts without breaking the audio stream. This is a pretty rare situation on run of the mill x86 hardware, but becomes very common on the Pi3 and 4 running some of the more demanding 1995+ era games where framerates can be borderline.

If that audio generation could be performed asynchronously on another core, then you could run your cycle count much closer to the CPU's maximum, while the second core would be more than sufficient to handle the audio generation.

(Sorry for repeating what you already intuitively understand; just wanted to drop my basis for adding my thumbs up to the effort!)

Reply 8 of 30, by krcroft

Posted on 2020-08-21, 04:46

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

awgamer,

The existing loop spins pretty hard; here are a couple of changes that try to mitigate that.
Also moved the domix bool to a counter, mix_queue, so the mix queue can pile up when we need to run the mixer back-to-back without another wait cycle.

This can be the case with PC Speaker PIT-mode audio such as the intro music to "Space Racer"; unfortunately even the queue doesn't fix it. It still sounds muddy with micro-dropouts.

MIXER: mixing queue 1
MIXER: mixing queue 1
MIXER: mixing queue 2
MIXER: mixing queue 1
MIXER: mixing queue 1
MIXER: mixing queue 6
MIXER: mixing queue 5
MIXER: mixing queue 4
MIXER: mixing queue 3
MIXER: mixing queue 2
MIXER: mixing queue 1

#include <chrono>
#include <cstdint>
#include <thread>
#include <SDL.h>

static SDL_mutex *queue_lock = SDL_CreateMutex();
static uint16_t mix_queue = 0; // number of mixer passes waiting to run

static int MIXER_Mix_Thread(void *)
{
	while (1) {
		SDL_LockMutex(queue_lock);
		if (!mix_queue) {
			// Nothing queued: drop the lock and nap briefly instead of spinning.
			SDL_UnlockMutex(queue_lock);
			std::this_thread::sleep_for(std::chrono::microseconds(30));
		}
		else {
			// LOG_MSG("MIXER: mixing queue %d", mix_queue);
			--mix_queue;
			SDL_UnlockMutex(queue_lock);
			// Run one mixer pass outside the queue lock.
			MIXER_MixData(mixer.needed);
			mixer.tick_counter += mixer.tick_add;
			mixer.needed += (mixer.tick_counter >> TICK_SHIFT);
			mixer.tick_counter &= TICK_MASK;
		}
	}
	return 0;
}

SDL_Thread *threadID = SDL_CreateThread(MIXER_Mix_Thread, "Mixer", (void *)NULL);

// Called from the main thread: only queues another pass for the mix thread.
static void MIXER_Mix()
{
	SDL_LockMutex(queue_lock);
	mix_queue++;
	SDL_UnlockMutex(queue_lock);
}

Feel free to take and remix as desired 😀

Reply 9 of 30, by awgamer

Posted on 2020-08-21, 06:36

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

Polling/domix was intended as a temp test and workaround to remove thread overhead as much as possible, but it's not really nice blasting a core at full speed, hilariously so for a light load, just to get around SDL's poorly running routine. Ideally would come up with something that does the same as condwait, just not implemented as badly, or put more nicely, more sensitive, quicker to react that doesn't cause the same performance drop seen in the frame rate. Maybe SDL's solution is as good as it gets but I doubt it.
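
For reference, a sketch of what a condition-variable wait looks like with the C++11 standard library rather than SDL's condwait. MixSignal and RunMixDemo are hypothetical names, and whether this reacts any faster than SDL's version, or avoids the frame rate drop, is untested here.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: the main thread posts mix requests and the worker
// sleeps (no polling) until a request arrives or Stop() is called.
class MixSignal {
public:
    void Post() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            ++pending_;
        }
        cond_.notify_one();
    }

    void Stop() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            stop_ = true;
        }
        cond_.notify_one();
    }

    // Blocks until a request is pending or Stop() was called. Returns true
    // after claiming one request, false once stopped with nothing left.
    bool Wait() {
        std::unique_lock<std::mutex> guard(lock_);
        cond_.wait(guard, [this] { return pending_ > 0 || stop_; });
        if (pending_ == 0) return false;
        --pending_;
        return true;
    }

private:
    std::mutex lock_;
    std::condition_variable cond_;
    unsigned pending_ = 0;
    bool stop_ = false;
};

// Posts `ticks` requests from the calling thread and returns how many
// passes the worker thread ran; `++mixed` stands in for MIXER_MixData().
inline int RunMixDemo(int ticks) {
    MixSignal signal;
    int mixed = 0;
    std::thread worker([&] {
        while (signal.Wait()) ++mixed;
    });
    for (int i = 0; i < ticks; ++i) signal.Post();
    signal.Stop();
    worker.join();
    return mixed;
}
```

Because requests are counted rather than flagged, a burst of Post() calls is worked off one pass at a time instead of collapsing into a single wake-up.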

What I've been testing with doesn't seem to be affected one way or another by your additions, so no harm no foul, and the thought of buffering the mixing for such a pile-up did cross my mind. I may not be hearing what you do since I have tinnitus, so in my case I may have to take your word for it, though others may be able to confirm. Any other examples that highlight what it solves?

Any particular reason to change the mutex to a pointer? Way I wrote it I just pulled from sdl examples and tutorials.

The low end would more likely benefit, and threading devices matches real-world system behavior, but DOSBox audio handling is looking pretty light. Rather than just threading the mixer, threading the PIC would match more closely, though there's more to wrap the head around and more dependencies to track down, and the other events the PIC handles are lighter still: keyboard, joystick, RTC, timer, ... Putting pic_runqueue and everything that hangs off it in one thread, running like a southbridge, might give an improvement, but the recompiler, draw, rendering (if not already taken care of by one of the various offloading OpenGL/DirectX implementations? I'm not sure; my eyes glaze over from so many solutions and I don't have my bearings on what's what), and compiled code are where it's at.

FYI, I noticed dosidle keeps speaker playback from being warbly and stuttery in Test Drive (it shows up when cycling through the car selection using the up/down arrows) with cycles at max. It can also be resolved by setting cycles to a fixed amount, but it's something to look at to improve auto cycle handling, since the dosidle source is included.

Reply 10 of 30, by jmarsh

Posted on 2020-08-21, 07:19

Rank: Oldbie
Posts: 1806
Joined: 2014-01-04, 09:17

You don't hear the glitches? I assume it's because SDL_LockAudio/UnlockAudio got tossed out and the mixer is writing to the audio buffers at the same time as SDL calls MIXER_CallBack, although it could also be caused by the mixing thread and the main thread touching pic stuff at the same time.

Reply 11 of 30, by awgamer

Posted on 2020-08-21, 08:24

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

If you look at the SDL source, SDL_LockAudio/UnlockAudio are just mutex locks, redundant as far as I understand. I had removed the audio locks when adding the locks for condwait handling, and just kept those with polling. I had played around with removing the locks, and when they were removed and the threads were trampling each other it wasn't subtle. As for hearing glitches, I'm having a hard time distinguishing between actual glitching and just the low quality playback and samples that I'm hearing.

P.S. going by system monitor, max cycles is currently configured rather conservatively, "max" being far from true max by a large margin, could update it taking threaded mixing into account.

Reply 12 of 30, by jmarsh

Posted on 2020-08-21, 10:38

Rank: Oldbie
Posts: 1806
Joined: 2014-01-04, 09:17

They're not redundant because SDL runs audio in a separate thread that calls MIXER_CallBack() whenever more audio data is needed.

Reply 13 of 30, by krcroft

Posted on 2020-08-21, 17:05

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

... ideally would come up with something that does the same as condwait, just not implemented as badly, or put more nicely, more sensitive, quicker to react that doesn't cause the same performance drop seen in the frame rate.

Yup; that's what I attempted at the top of the loop: read the condition, and if it's false, release the lock on the condition and wait a tiny bit of time (at which point the condition is unlocked, so the main thread is free to adjust it). I'm using a relatively fine-grained timer (admittedly C++11, which won't fly) to keep it quick and reactive but still almost entirely idle CPU-wise.

	SDL_LockMutex(queue_lock);
	if (!mix_queue) {
		SDL_UnlockMutex(queue_lock);
		std::this_thread::sleep_for(std::chrono::microseconds(30));
	}

Regarding the queue vs. bool:

The queue prevents under-generating audio, which would otherwise be lost (more specifically, N - 1 milliseconds of it).

In Space Racer, the main thread can rapidly make 6 mix calls practically back-to-back, all while the mix thread is in the middle of a single pass. When I use a bool (instead of a queue), the mix thread finishes up its pass, toggles the bool back to false, and then thinks its job is done for this round (meanwhile it actually lost 5 ms of audio). So I feel some form of queue or backlog is needed to prevent this (perhaps there are smarter ways to catch/manage this!).
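
A simplified, single-threaded model of that difference. PassesWithBool and PassesWithCounter are hypothetical helper names, and `burst` is the number of mix calls that arrive while the mix thread is busy; the real code races across threads, which this model deliberately leaves out.

```cpp
#include <cstdint>

// With a flag, any number of requests made while the mix thread is busy
// collapse into a single "please mix" bit.
inline int PassesWithBool(int burst) {
    bool domix = false;
    for (int i = 0; i < burst; ++i) domix = true;  // main thread: burst of requests
    int passes = 0;
    while (domix) {                                // mix thread: serve what it sees
        domix = false;
        ++passes;
    }
    return passes;
}

// With a counter, every request is remembered and served later.
inline int PassesWithCounter(int burst) {
    uint16_t mix_queue = 0;
    for (int i = 0; i < burst; ++i) ++mix_queue;   // main thread: burst of requests
    int passes = 0;
    while (mix_queue) {                            // mix thread: drain the backlog
        --mix_queue;
        ++passes;
    }
    return passes;
}
```

For a burst of 6, the bool version runs 1 pass and the counter version runs 6, which matches the 5 ms of lost audio (N - 1) described above.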

I think the critical work lies in the areas mentioned by jmarsh, as this is no doubt where chunks of audio are falling off the truck.

I understand you may not hear the degradation that jmarsh and I mentioned (I've also been told I'm unable to hear some dynamic range differences between recordings.. mid-40s is no fun; close up vision is also starting to change! argh).

Highly suggest using headphones or in-ear buds, and doing side-by-side testing with "RealSound" (PIC-timer music/effects) PC-speaker games: Mean Streets, Count Down, Digger (music), Karateka, and Space Racer. At least to me it is extremely apparent, and I hope you can reproduce it.

Reply 14 of 30, by awgamer

Posted on 2020-08-21, 17:58

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

It looked to me like SDL_LockAudio/UnlockAudio were being used as a generic mutex on the internal mixer struct rather than on SDL audio's obtained struct. In a sense they were, but internal mixer struct variables are also used in the SDL audio callback handler. Not a problem adding them back; it gives the mix thread something to do. But now I want to thread the SDL audio callback handler to see what that does, as at the moment it should be processed in the main thread. Will it notably lighten the main thread load? One way to find out (well, could profile to find out that way, but I digress).

Reply 15 of 30, by jmarsh

Posted on 2020-08-21, 23:17

Rank: Oldbie
Posts: 1806
Joined: 2014-01-04, 09:17

It's already run on a separate thread by SDL, that's the reason why SDL_LockAudio/UnlockAudio exist (the callback function always runs with the audio lock owned). If you hand it off to yet another thread the audio hardware will play garbage because SDL expects the samples to be ready for playback when the callback returns.
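
A sketch of the contract described here, without SDL itself: a mutex stands in for the audio lock, and the pull side must fill the whole stream before it returns. OutputQueue, Push, and Pull are hypothetical names for illustration, not DOSBox or SDL API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the mixer's output buffer. The mutex plays the
// role of the audio lock: the mixing side and the audio callback side
// never touch the samples at the same time.
class OutputQueue {
public:
    // Mixing side (main or mix thread): append finished samples.
    void Push(const int16_t *data, std::size_t count) {
        std::lock_guard<std::mutex> guard(lock_);
        samples_.insert(samples_.end(), data, data + count);
    }

    // Callback side (audio thread): `stream` must be completely filled
    // before returning, so any shortfall is padded with silence rather
    // than handed off to another thread to finish later.
    std::size_t Pull(int16_t *stream, std::size_t count) {
        std::lock_guard<std::mutex> guard(lock_);
        const std::size_t n = std::min(count, samples_.size());
        const auto end = samples_.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(samples_.begin(), end, stream);
        samples_.erase(samples_.begin(), end);
        std::fill(stream + n, stream + count, int16_t{0});
        return n;  // number of real (non-silence) samples delivered
    }

private:
    std::mutex lock_;
    std::deque<int16_t> samples_;
};
```

If Pull() returned before the copy finished, the caller would play whatever happened to be in `stream`, which is the garbage playback warned about above.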

Reply 16 of 30, by awgamer

Posted on 2020-08-22, 04:11

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

For whatever reason this was giving me improved fps in Quake. It's not threaded, basically stock. Margin of error?

Reply 17 of 30, by latalante

Posted on 2020-08-22, 09:58

Rank: Newbie
Posts: 50
Joined: 2018-11-01, 22:47

I did a few tests and didn't notice the difference.
Then I repeated them using linux perf.

220    0.05%  dosbox [.] MIXER_CallBack
261    0.04%  dosbox [.] MIXER_CallBack

Yes, with this fix MIXER_CallBack is function number 261 in the performance race. Without it, number 220.

The change from 0.05% to 0.04% cannot have a noticeable effect on the benchmark. Absolutely.

Edit:
Maybe on fast equipment with low resolution and high FPS, this difference becomes more visible. For me, everything oscillates around statistical error.

Edit2:
More accurate measurement.
perf record -e cycles:pp --call-graph dwarf src/dosbox -c 'quake.exe +timedemo demo1 +exec mode13.cfg -noipx -nolan -nocdaudio' -c 'exit' #800x600

dosbox-r4356

perf report --no-inline
Children  Self   Command  Shared Object  Symbol
0,17%     0,00%  dosbox   dosbox         [.] MIXER_Mix
0,15%     0,02%  dosbox   dosbox         [.] MIXER_MixData
0,05%     0,05%  dosbox   dosbox         [.] MIXER_CallBack
0,00%     0,00%  dosbox   dosbox         [.] MIXER_Init

dosbox-r4356 + mixer2.diff

Children  Self   Command  Shared Object  Symbol
0,14%     0,00%  dosbox   dosbox         [.] MIXER_Mix
0,12%     0,02%  dosbox   dosbox         [.] MIXER_MixData
0,04%     0,04%  dosbox   dosbox         [.] MIXER_CallBack
0,00%     0,00%  dosbox   dosbox         [.] MIXER_Init
0,00%     0,00%  dosbox   dosbox         [.] MIXER_CallBack

Reply 18 of 30, by awgamer

Posted on 2020-08-22, 18:11

Rank: Oldbie
Posts: 805
Joined: 2014-07-26, 07:42

krcroft, I've found that setting microseconds to 1000 is the magic number with your alternative to condwait: system monitor shows the core running the mix thread going from full tilt to using only what it needs, like a sane process, with no fps hit in benchmarks like there was with condwait, and no lag or dropped samples that I perceived. You know the drill on that; for you guys the difference may make your ears bleed, but come on, we're talking 0.001 of a second here. Check to confirm it's not just me seeing this, but otherwise, congrats, you've built a better mousetrap.

Reply 19 of 30, by krcroft

Posted on 2020-08-22, 18:46

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

Good stuff all round awgamer; is there a combined patch you can post that includes your performance improvement that latalante confirmed?
Regarding 1000 microseconds, in theory you could then swap in SDL_Delay(1) and avoid the C++11'isms, as both delay one millisecond.
