DOSBox-X branch

Reply 2180 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 04:49

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

Given that the overflow does not occur, would this work:

    if (sizeof(Type) == 1) {
        const Bit8u xr = signeddata ? 0x00 : 0x80;
+       Bit8s d;

        len--;
-       current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
+       d = (Bit8s)((*data++) ^ xr);
+       current[0] = d > xr ? (d << 8) | (2 * d + 1) : d << 8;
-       if (stereo)
+       if (stereo) {
-           current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+           d = (Bit8s)((*data++) ^ xr);
+           current[1] = d > xr ? (d << 8) | (2 * d + 1) : d << 8;
+       }
        else
            current[1] = current[0];

This is based on kcgen's work at https://github.com/dosbox-staging/dosbox-staging/pull/1005.
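
For readers following along, here is a minimal C++ sketch of the scaling rule discussed in this post. The function names are hypothetical and are not taken from DOSBox-X or dosbox-staging. One thing to note: once the sample has been XORed and cast to a signed byte it is already centred on zero, so this sketch tests for `> 0` in both the signed and unsigned cases. Comparing a `Bit8s` against `xr = 0x80` can never be true, because a signed byte never exceeds 127.

```cpp
#include <cstdint>

// Hypothetical helper: expand a zero-centred sample (-128..127) to 16 bits.
// Positive samples get extra low bits so that 127 reaches 32767; zero and
// negative samples are a plain multiply by 256, so -128 maps to -32768.
int16_t scale_to_s16(int v)
{
    if (v > 0)
        return static_cast<int16_t>((v << 8) | (2 * v + 1));
    return static_cast<int16_t>(v * 256); // same result as v << 8
}

// Signed 8-bit input is already centred on zero.
int16_t scale_s8_to_s16(int8_t s)
{
    return scale_to_s16(s);
}

// Unsigned 8-bit input: subtracting 128 gives the same value as
// XOR 0x80 followed by reinterpreting the byte as signed.
int16_t scale_u8_to_s16(uint8_t u)
{
    return scale_to_s16(static_cast<int>(u) - 128);
}
```

With these definitions the end points line up with the full 16-bit range, for example `scale_s8_to_s16(127)` gives 32767 and `scale_u8_to_s16(0)` gives -32768.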

Tested with modified test code from kcgen:

// gcc convert-8bit-16bit.c -o convert-8bit-16bit.exe
#include <stdio.h>

/* signed char is used explicitly because plain char may be unsigned on some platforms */
short scale_uto16(unsigned char val) {
    if (val > 128)
        return (((signed char)(val ^ 0x80)) << 8) | (2 * ((signed char)(val ^ 0x80)) + 1);
    else
        return (signed char)(val ^ 0x80) << 8;
}

short scale_sto16(signed char val) {
    if (val > 0)
        return (val << 8) | (2 * val + 1);
    else
        return (val << 8);
}

int main(void)
{
    int x;

    printf("%8s %12s\n", "value", "int-shift");

    unsigned char unsigned_val[] = { 0, 15, 16, 31, 32, 33, 96, 127, 128, 129, 168, 254, 255 };

    for (x = 0; x < 13; x++)
        printf("%8u %12d\n", unsigned_val[x], scale_uto16(unsigned_val[x]));

    printf("\n%8s %12s\n", "value", "int-shift");

    signed char signed_val[] = { -128, -96, -63, -64, -65, -33, -32, -1, 0, 1, 32, 33, 63, 64, 65, 96, 126, 127 };

    for (x = 0; x < 18; x++)
        printf("%8d %12d\n", signed_val[x], scale_sto16(signed_val[x]));

    return 0;
}

Reply 2181 of 2419, by krcroft

Posted on 2021-05-13, 06:33

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

Thanks, hail-to-the-ryzen.
Here's the side-by-side in GodBolt: https://godbolt.org/z/5onrGxWPe

Last edited by krcroft on 2021-05-13, 07:13. Edited 1 time in total.

Reply 2182 of 2419, by jmarsh

Posted on 2021-05-13, 07:10

Rank: Oldbie
Posts: 1699
Joined: 2014-01-04, 09:17

Given the input is only 8-bit it's probably best to use a look-up table.
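
A minimal sketch of what such a look-up table could look like, assuming a 256-entry array indexed by the raw unsigned byte. The names `sample_table`, `init_sample_table` and `convert_u8` are made up for illustration and do not come from the DOSBox-X source.

```cpp
#include <cstdint>

// Hypothetical 256-entry table, indexed by the raw unsigned 8-bit sample.
static int16_t sample_table[256];

// Fill the table once, for example during mixer start-up.
void init_sample_table()
{
    for (int i = 0; i < 256; ++i) {
        const int v = i - 128; // zero-centred sample, -128..127
        sample_table[i] = static_cast<int16_t>(
            v > 0 ? ((v << 8) | (2 * v + 1)) : v * 256);
    }
}

// Per-sample conversion is then a single array read.
int16_t convert_u8(uint8_t raw)
{
    return sample_table[raw];
}
```

The per-sample work drops to one indexed load, at the cost of 512 bytes of table and a one-time initialization.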

Reply 2183 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 09:21

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

That is a powerful tool, krcroft. Thank you for the instructions, too.

A look-up table is a great idea. Below is a modification of krcroft's code that outputs the look-up table in human-readable format.

// gcc convert-8bit-16bit.c -o convert-8bit-16bit.exe
#include <stdio.h>

/* signed char is used explicitly because plain char may be unsigned on some platforms */
short scale_uto16(unsigned char val) {
    if (val > 128)
        return (((signed char)(val ^ 0x80)) << 8) | (2 * ((signed char)(val ^ 0x80)) + 1);
    else
        return (signed char)(val ^ 0x80) << 8;
}

short scale_sto16(signed char val) {
    if (val > 0)
        return (val << 8) | (2 * val + 1);
    else
        return (val << 8);
}

int main(void)
{
    int x;

    printf("0 to 255\n");

    for (x = 0; x < 256; x++)
        printf("%u=%d\n", x, scale_uto16(x));

    printf("\n-128 to 127\n");

    for (x = -128; x < 128; x++)
        printf("%d=%d\n", x, scale_sto16(x));

    return 0;
}

Reply 2184 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 10:17

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

This code is untested, but it seems like it is structured correctly for inclusion in the mixer.cpp file:

static Bit16s Sample_16_Table[256];

// call function during mixer init
void Mixer_SetSample16Table(void) {
    for (Bitu i=0;i<256;i++) {
        if (i > 128)
            Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
        else
            Sample_16_Table[i]=(i-128) << 8;
    }
}

    if (sizeof(Type) == 1) {
        const Bit8u xr = signeddata ? 0x00 : 0x80;
+       Bit8s d;

        len--;
-       current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
+       d = (Bit8s)((*data++) ^ xr);
+       current[0] = Sample_16_Table[d+128-xr];
-       if (stereo)
+       if (stereo) {
-           current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+           d = (Bit8s)((*data++) ^ xr);
+           current[1] = Sample_16_Table[d+128-xr];
+       }
        else
            current[1] = current[0];

Could save a calculation for the stereo case by setting a variable to d+128-xr.
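
A small sketch of the index calculation, with a hypothetical helper name. It assumes the 256-entry table above, where entry 128 holds zero. Because the XOR with `xr` already re-centres unsigned data, the signed value `d` appears to need the same `+ 128` offset in both modes. With `d + 128 - xr`, unsigned data (`xr = 0x80`) would index with `d` itself, which is negative for the lower half of the range.

```cpp
#include <cstdint>

// Hypothetical helper: index into a 256-entry table whose entry 128 is zero.
// raw is the byte as stored in the sample buffer;
// xr is 0x00 for signed data and 0x80 for unsigned data.
int table_index(uint8_t raw, uint8_t xr)
{
    const int b = raw ^ xr;                   // 0..255 after re-centring
    const int d = (b < 128) ? b : b - 256;    // same value as (int8_t)(raw ^ xr)
    return d + 128;                           // always within 0..255
}
```

For example, an unsigned sample of 0 and a signed sample of -128 (byte 0x80) both land on index 0, and the maximum of either format lands on index 255.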

Reply 2185 of 2419, by jmarsh

Posted on 2021-05-13, 10:49

Rank: Oldbie
Posts: 1699
Joined: 2014-01-04, 09:17

I think you could make the table have 384 entries with the first 128 entries being duplicated in the last 128, then access it through a pointer to either element 0 or element 128 depending if the source is signed/unsigned.
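
One way to read this suggestion, sketched with made-up names (`wide_table`, `table_base`, `convert`). The table is indexed by the raw byte with no XOR and no cast. Unsigned data uses a base pointer at element 0, and signed data uses a base pointer at element 128, so that bytes 0x80..0xFF (the negative samples) fall into the duplicated tail.

```cpp
#include <cstdint>

// Hypothetical 384-entry table: entries 256..383 repeat entries 0..127.
static int16_t wide_table[384];

void init_wide_table()
{
    for (int i = 0; i < 384; ++i) {
        const int v = (i % 256) - 128; // zero-centred sample for this slot
        wide_table[i] = static_cast<int16_t>(
            v > 0 ? ((v << 8) | (2 * v + 1)) : v * 256);
    }
}

// Pick the base pointer once per stream format.
const int16_t* table_base(bool signed_data)
{
    return signed_data ? wide_table + 128 : wide_table;
}

// The raw byte is used directly as the index (0..255).
int16_t convert(const int16_t* base, uint8_t raw)
{
    return base[raw];
}
```

The signed/unsigned decision moves out of the per-sample path, since the base pointer only needs to change when the stream format changes.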

Reply 2186 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 11:47

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

That is a better idea. Thanks. I will try that tomorrow if someone else does not implement it first.

Reply 2187 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 23:07

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

This is not yet compiled, but I followed your suggestion as best I could and tried to fix the calculations. Sample_16_Table[] holds the "scaled" 16-bit signed integer values, sorted from low to high.

static Bit16s Sample_16_Table[256];

// call function during mixer init
void Mixer_SetSample16Table(void) {
    for (Bitu i=0;i<256;i++) {
        if (i > 128)
            Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
        else
            Sample_16_Table[i]=(i-128) << 8;
    }
}

    if (sizeof(Type) == 1) {
+       Bit8s d;
+
        const Bit8u xr = signeddata ? 0x00 : 0x80;
+       Bit16s *tablePtr;
+       tablePtr = &Sample_16_Table[128];

        len--;
-       current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
+       d = (Bit8s)((*data++) ^ xr);
+       current[0] = *tablePtr[d];
-       if (stereo)
+       if (stereo) {
-           current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+           d = (Bit8s)((*data++) ^ xr);
+           current[1] = *tablePtr[d];
+       }
        else
            current[1] = current[0];

If this seems correct, then I guess I could extend the array as suggested? That would save a couple more instructions during runtime?
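
A short sketch of indexing through a pointer to the middle of the table, using hypothetical names. Note that `tablePtr[d]` already yields a 16-bit value, so a further `*` in front of it would not compile. The equivalent forms are `tablePtr[d]` and `*(tablePtr + d)`.

```cpp
#include <cstdint>

// Hypothetical table and a pointer to its centre (the zero entry).
static int16_t centred_table[256];
static int16_t* const centre = centred_table + 128;

void init_centred_table()
{
    for (int i = 0; i < 256; ++i) {
        const int v = i - 128;
        centred_table[i] = static_cast<int16_t>(
            v > 0 ? ((v << 8) | (2 * v + 1)) : v * 256);
    }
}

// d is the zero-centred sample; negative indices reach back into the
// first half of the array, which is valid because centre points at slot 128.
int16_t lookup(int8_t d)
{
    return centre[d]; // identical to *(centre + d)
}
```

Negative indexing is well defined here because every index from -128 to 127 stays inside the 256-element array.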

Last edited by hail-to-the-ryzen on 2021-05-13, 23:38. Edited 1 time in total.

Reply 2188 of 2419, by Wengier

Posted on 2021-05-13, 23:36

Rank: Member
Posts: 249
Joined: 2014-09-03, 19:56

K.A.R.R.: I assume you used the SDL1 version. Try the SDL2 version instead in your case which will likely solve the problem you encountered.

As for the gustype, the DOSBox SVN version of gustype is never fixed, i.e. there does not exist a single "dosboxsvn" gustype I think.

Reply 2189 of 2419, by Wengier

Posted on 2021-05-13, 23:43

Rank: Member
Posts: 249
Joined: 2014-09-03, 19:56

hail-to-the-ryzen: Thank you (and kcgen) for working on the value-shifting code. I see the code may still be improved, but when it is ready please feel free to implement it in the source code.

Reply 2190 of 2419, by hail-to-the-ryzen

Posted on 2021-05-13, 23:47

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

Thank you. I think that the code could be verified and possibly extended to save a few more instructions at the machine level. It is simpler now to read, but I think that xr can be removed and another set of table entries added (as per jmarsh's recommendation), but I think that it requires an independent table of entries. The xor operation in the unsigned operation creates values that I can't confirm as fitting with the current above table. I think that it is incompatible. I should actually test it with sample code first.

Reply 2191 of 2419, by krcroft

Posted on 2021-05-14, 01:55

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

jmarsh, hail-to-the-ryzen - the lookup table suggestion is flying and generating unbiased results to boot. Thanks all round!

Code Explorer: https://godbolt.org/z/f6xddKarf

Google Benchmarks: https://quick-bench.com/q/FO-kOkUG-atr8vmaoJ6KMyW9srM

The attachment 2021-05-13_18-12.png is no longer available

The attachment 2021-05-13_18-15.png is no longer available

Last edited by krcroft on 2021-05-16, 01:12. Edited 5 times in total.

Reply 2192 of 2419, by hail-to-the-ryzen

Posted on 2021-05-14, 02:16

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

That is great news, krcroft. Thank you for your work. I think there is a possible way, as I think jmarsh suggested, to avoid the xor operation on the data values. Something like this:

tablePtr = signeddata ? &Sample_16_Table[128] : &Sample_16_Table[0];

I verified that the values are correct, but I believe the Bit8s cast will truncate the unsigned values in the code, unless there are additional changes to the code. Here is another possibility, but I don't know if the change to the cast will cause any issue:

static Bit16s Sample_16_Table[256];

// call function during mixer init
void Mixer_SetSample16Table(void) {
    for (Bitu i=0;i<256;i++) {
        if (i > 128)
            Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
        else
            Sample_16_Table[i]=(i-128) << 8;
    }
}

    if (sizeof(Type) == 1) {
+       Bit16s d;
+
-       const Bit8u xr = signeddata ? 0x00 : 0x80;
+       Bit16s *tablePtr;
+       tablePtr = signeddata ? &Sample_16_Table[128] : &Sample_16_Table[0];

        len--;
-       current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
+       d = (Bit16s)(*data++);
+       current[0] = *tablePtr[d];
-       if (stereo)
+       if (stereo) {
-           current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+           d = (Bit16s)(*data++);
+           current[1] = *tablePtr[d];
+       }
        else
            current[1] = current[0];

Reply 2193 of 2419, by krcroft

Posted on 2021-05-14, 03:59

Rank: Oldbie
Posts: 589
Joined: 2017-04-29, 15:07

hail-to-the-ryzen, right, with two lookup tables defined (one for unsigned and the other signed), the need to adjust the 8-bit values (with xor or offsetting by 128) can be avoided and directly looked up as jmarsh suggested.
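
A rough sketch of the two-table idea, under the assumption that both tables are indexed by the raw byte as stored in the buffer. This is not the code from the linked dosbox-staging commit, and all names here are hypothetical.

```cpp
#include <cstdint>

// Hypothetical tables, both indexed by the raw byte (0..255).
static int16_t unsigned_table[256];
static int16_t signed_table[256];

static int16_t scale(int v) // v is the zero-centred sample, -128..127
{
    return static_cast<int16_t>(v > 0 ? ((v << 8) | (2 * v + 1)) : v * 256);
}

void init_tables()
{
    for (int b = 0; b < 256; ++b) {
        unsigned_table[b] = scale(b - 128);                // 0..255 centred on 128
        signed_table[b]   = scale(b < 128 ? b : b - 256);  // two's complement byte
    }
}

// No XOR and no offset at conversion time: only the table choice differs.
int16_t convert(uint8_t raw, bool signed_data)
{
    return (signed_data ? signed_table : unsigned_table)[raw];
}
```

Compared with a single table plus XOR, this trades another 512 bytes for one less arithmetic step per sample. Whether that is measurably faster would need benchmarking.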

Ref: lines 369 and 370 https://github.com/dosbox-staging/dosbox-stag … f965b7ae43a590d#

Reply 2194 of 2419, by hail-to-the-ryzen

Posted on 2021-05-14, 04:08

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

Thank you, krcroft, for the help with the use of a second array to handle the 8-bit values. I wonder whether that would show better performance than the use of the xor operations that are used in the version below.

I think that this code is now functional. I had to fix the dereferencing of the pointer since the previous versions were untested and incorrect. This code shows the unbiased values for the conversion from 8-bit to signed 16-bit, but I have not yet verified the boundary cases for the values inside the emulator (-128 and 127). I don't see why the boundary cases would not work. Incorporation in DOSBox-X should just require a change of variable type names, such as from Bit32u to uint32_t. Also, the array is initialized by Mixer_SetSample16Table() in MIXER_Init, but that call may be moved (use extern if the table is accessed from outside the mixer.cpp file). I credit jmarsh and krcroft for the calculations and references, and I can only claim credit for any errors or mistakes in the code. 😀

diff -rupN dosbox-original/src/hardware/mixer.cpp dosbox/src/hardware/mixer.cpp
--- dosbox-original/src/hardware/mixer.cpp
+++ dosbox/src/hardware/mixer.cpp
@@ -54,6 +54,8 @@
 #define MIXER_SSIZE 4
 #define MIXER_VOLSHIFT 13
 
+static Bit16s Sample_16_Table[256];
+
 static INLINE Bit16s MIXER_CLIP(Bits SAMP) {
 	if (SAMP < MAX_AUDIO) {
 		if (SAMP > MIN_AUDIO)
@@ -379,12 +381,20 @@ inline void MixerChannel::loadCurrentSam
 	last[1] = current[1];
 
 	if (sizeof(Type) == 1) {
+		Bit8s d;
+
+		Bit16s *tablePtr = &Sample_16_Table[128];
 		const Bit8u xr = signeddata ? 0x00 : 0x80;
 
 		len--;
-		current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
-		if (stereo)
-			current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+
+		d = (Bit8s)((*data++) ^ xr);
+		current[0] = *(tablePtr + d);
+
+		if (stereo) {
+			d = (Bit8s)((*data++) ^ xr);
+			current[1] = *(tablePtr + d);
+		}
 		else
 			current[1] = current[0];
 	}
@@ -888,6 +898,15 @@ void MIXER_Controls_Init() {
 	MAPPER_AddHandler(MAPPER_RecVolumeDown,MK_kpminus,MMOD1|MMOD2,"recvoldown","RecVolDn");
 }
 
+void Mixer_SetSample16Table(void) {
+    for (Bitu i=0;i<256;i++) {
+        if (i > 128)
+            Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
+        else
+            Sample_16_Table[i]=(i-128) << 8;
+    }
+}
+
 void MIXER_Init(Section* sec) {
 	sec->AddDestroyFunction(&MIXER_Stop);
 
@@ -913,6 +932,8 @@ void MIXER_Init(Section* sec) {
 	mixer.recordvol[0]=1.0f;
 	mixer.recordvol[1]=1.0f;
 
+	Mixer_SetSample16Table();
+
 	/* Start the Mixer using SDL Sound at 22 khz */
 	SDL_AudioSpec spec;
 	SDL_AudioSpec obtained;

Edit: alternatively, it should be possible to declare the pointer outside the function:

@@ -54,6 +54,9 @@
 #define MIXER_SSIZE 4
 #define MIXER_VOLSHIFT 13
 
+static Bit16s Sample_16_Table[256];
+Bit16s *tablePtr = &Sample_16_Table[128];
+
 static INLINE Bit16s MIXER_CLIP(Bits SAMP) {
 	if (SAMP < MAX_AUDIO) {
 		if (SAMP > MIN_AUDIO)

Reply 2195 of 2419, by hail-to-the-ryzen

Posted on 2021-05-15, 22:11

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

Here is an updated version because the previous version of the code seemed to have lower performance than expected. I noticed this when patching PCem with the same change. This version was additionally tested against the boundary values of -128 and 127, and all testing was for the signed 8-bit mono sound format.

--- dosbox-original/src/hardware/mixer.cpp
+++ dosbox/src/hardware/mixer.cpp
@@ -54,6 +54,9 @@
 #define MIXER_SSIZE 4
 #define MIXER_VOLSHIFT 13
 
+static Bit16s Sample_16_Table[256];
+Bit16s *tablePtr = Sample_16_Table + 128;
+
 static INLINE Bit16s MIXER_CLIP(Bits SAMP) {
 	if (SAMP < MAX_AUDIO) {
 		if (SAMP > MIN_AUDIO)
@@ -382,9 +385,9 @@ inline void MixerChannel::loadCurrentSam
 		const Bit8u xr = signeddata ? 0x00 : 0x80;
 
 		len--;
-		current[0] = ((Bit8s)((*data++) ^ xr)) << 8;
+		current[0] = *(tablePtr + (Bit8s)((*data++) ^ xr));
 		if (stereo)
-			current[1] = ((Bit8s)((*data++) ^ xr)) << 8;
+			current[1] = *(tablePtr + (Bit8s)((*data++) ^ xr));
 		else
 			current[1] = current[0];
 	}
@@ -888,6 +891,15 @@ void MIXER_Controls_Init() {
 	MAPPER_AddHandler(MAPPER_RecVolumeDown,MK_kpminus,MMOD1|MMOD2,"recvoldown","RecVolDn");
 }
 
+void Mixer_SetSample16Table(void) {
+    for (Bitu i=0;i<256;i++) {
+        if (i > 128)
+            Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
+        else
+            Sample_16_Table[i]=(i-128) << 8;
+    }
+}
+
 void MIXER_Init(Section* sec) {
 	sec->AddDestroyFunction(&MIXER_Stop);
 
@@ -913,6 +925,8 @@ void MIXER_Init(Section* sec) {
 	mixer.recordvol[0]=1.0f;
 	mixer.recordvol[1]=1.0f;
 
+	Mixer_SetSample16Table();
+
 	/* Start the Mixer using SDL Sound at 22 khz */
 	SDL_AudioSpec spec;
 	SDL_AudioSpec obtained;

Reply 2196 of 2419, by awgamer

Posted on 2021-05-15, 22:26

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

Curious, would the compiler optimize the if statement out of the for loop, or would the following change be needed to do so? Also, the original > 128 comparison makes the split lopsided by 1 (0-128, 129-255); is that intentional?

void Mixer_SetSample16Table(void) {
  for (Bitu i=0;i<=128;i++)
    Sample_16_Table[i]=(i-128) << 8;

  for (Bitu i=129;i<256;i++)
    Sample_16_Table[i]=((i-128) << 8) | (2 * (i-128) + 1);
}

Reply 2197 of 2419, by hail-to-the-ryzen

Posted on 2021-05-15, 23:17

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

That is a good analysis. The table function runs at the start of emulation only, so it is not as crucial to optimize there. And as you said, the compiler may help with optimizing. I had also verified that function for accuracy in its calculation and then moved on to the other part. That avoids additional uncertainty when verifying the calculations.
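
If the start-up cost were ever a concern, one alternative (not something proposed in this thread, just a sketch assuming a C++14 or newer compiler) is to build the table at compile time so that no init call is needed. The `SampleTable`, `make_sample_table` and `kSampleTable` names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical wrapper so the array can be returned from a constexpr function.
struct SampleTable {
    int16_t value[256];
};

// Evaluated by the compiler; the branch inside the loop costs nothing at run time.
constexpr SampleTable make_sample_table()
{
    SampleTable t{};
    for (int i = 0; i < 256; ++i) {
        const int v = i - 128;
        t.value[i] = static_cast<int16_t>(
            v > 0 ? ((v << 8) | (2 * v + 1)) : v * 256);
    }
    return t;
}

constexpr SampleTable kSampleTable = make_sample_table();

// The midpoint must stay exactly zero.
static_assert(kSampleTable.value[128] == 0, "midpoint should map to zero");
```

The `static_assert` also doubles as a compile-time check of the boundary values, which avoids the separate verification step.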

On the other question, in the for loop, where i=128, then the table holds a value of 0. From 0 to 127 are the first set of non-zero values. And then from 129 to 255 are the second set of non-zero values. That is the 8-bit range here, from 0 to 255. Those end up as -128 to -1, and 1 to 127 (and 0). At least from memory that seems correct.

Reply 2198 of 2419, by awgamer

Posted on 2021-05-16, 00:00

Rank: Oldbie
Posts: 808
Joined: 2014-07-26, 07:42

For a 0-127, 128-255 split the if should be > 127, not > 128 I would think.

Reply 2199 of 2419, by hail-to-the-ryzen

Posted on 2021-05-16, 00:10

Rank: Member
Posts: 441
Joined: 2017-03-09, 01:34

If that were set to > 127 instead of > 128, then i = 128 would take the path that corrects the bias in the conversion. The way it is currently written, i = 128 results in a table value of 0, which is expected.
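
The difference between the two thresholds can be checked with a small sketch. The helper name and its `threshold` parameter are hypothetical, and only the values 127 and 128 are meaningful here.

```cpp
#include <cstdint>

// Hypothetical helper: the table entry for index i, with the comparison
// threshold made a parameter so both variants can be compared.
int16_t entry_with_threshold(int i, int threshold)
{
    const int v = i - 128; // zero-centred sample
    if (i > threshold)
        return static_cast<int16_t>((v * 256) | (2 * v + 1)); // bias-corrected path
    return static_cast<int16_t>(v * 256);                     // plain scaling
}
```

With a threshold of 128, index 128 gives 0. With a threshold of 127, index 128 would take the corrected path and give (0 * 256) | (2 * 0 + 1) = 1, so silence would no longer map to exactly zero.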
