In other news, I sat down with one of the new ARM64-based M1 Apple Macbooks and got DOSBox-X to run on it as well. There are a few considerations for those machines, which I thought I'd share so SVN can compile for them too.
One is how to compile with SDL2. You're going to need to modify configure.ac and then run autogen.sh, because SDL2's configure script assumes that Darwin plus ARM means iOS (that you're compiling for the iPad or iPhone), which is wrong. Remove the part of the case statement that tries to match *arm*darwin*, leaving only the *ios* part.
The other is how to get dynamic core (dynrec) to work on ARM64. In its default state, dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute", meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting the read+write+execute first (as SVN does), and if that fails, doing two mprotect() tests to determine whether the failure is because of "write-xor-execute" or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the Macbook just fine.
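Roughly, the fallback described above looks something like this. This is only a sketch with made-up names (cache, cache_size), not the actual DOSBox-X code, and it assumes the cache is an mmap()'d, page-aligned region:

#include <sys/mman.h>
#include <cstddef>

// Hypothetical sketch of the W^X detection and toggling described above.
static bool wxe_in_effect = false;

static bool cache_init_protection(void *cache, size_t cache_size) {
    // First try read+write+execute in one mapping, as SVN does.
    if (mprotect(cache, cache_size, PROT_READ | PROT_WRITE | PROT_EXEC) == 0)
        return true;

    // RWX refused: if RW and RX each succeed on their own, the refusal
    // is a write-xor-execute policy rather than a hard "no exec" failure.
    if (mprotect(cache, cache_size, PROT_READ | PROT_WRITE) == 0 &&
        mprotect(cache, cache_size, PROT_READ | PROT_EXEC) == 0) {
        wxe_in_effect = true;
        // Start out read+write for the initial dynamic core work.
        return mprotect(cache, cache_size, PROT_READ | PROT_WRITE) == 0;
    }
    return false; // executable memory is simply not available
}

// Wrapped around every later cache block modification:
static void cache_open_for_write(void *cache, size_t cache_size) {
    if (wxe_in_effect) mprotect(cache, cache_size, PROT_READ | PROT_WRITE);
}
static void cache_close_for_write(void *cache, size_t cache_size) {
    if (wxe_in_effect) mprotect(cache, cache_size, PROT_READ | PROT_EXEC);
}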
I recall, though I can't confirm, that ARM64-based Linux distributions for the Raspberry Pi have the same W^X policy.
Hopefully this information will help DOSBox SVN improve itself for these new environments.
I'm well aware x86 builds also run on the M1 Macbooks (as demonstrated by LGR on Twitter), but still, there's better performance to be had as native ARM OS X code.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
Quick 'thank you' for this, Jon and the other contributors.
Neither VirtualBox nor VMPlayer would let me install Win98SE in a virtual machine on my Ryzen PC - both were rather faster, but both hit illegal operation / invalid page fault failures in Regsvr32 (and something else in VirtualBox), so the install never actually completed.
I've found the only way to install and run Windows 95 and Windows 98 in VirtualBox without crashing is to turn OFF the CPU-based virtualization extensions, forcing VirtualBox to use software emulation. VT-x, VirtualBox, and Windows 95/98 don't mix for some reason.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
Which compiler do you use for building? If it works on a Pentium 4, it’s likely built for SSE2 (-mfpmath=sse in gcc, might be set by default in recent mingw releases).
If you compile with build-mingw-lowend, does it help?
That script was originally designed to enable compiling on lower end systems by disabling the MT32 emulation (which tended to use SSE instructions).
There is code in src/gui/render.cpp to conditionally use SSE2 to speed up previous/current frame comparisons, but that should be conditional on CPUID reporting SSE2.
Can you use a debugger to point at the code that is faulting on Pentium III systems?
I've never really tested DOSBox-X on anything below a Pentium 4 though.
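For reference, a runtime gate of the kind mentioned above (only taking the SSE2 path when CPUID reports it) could look roughly like this; an illustrative sketch, not the actual code in src/gui/render.cpp:

#include <cstdio>
#if defined(_MSC_VER)
# include <intrin.h>
#else
# include <cpuid.h>
#endif

// SSE2 support is reported in CPUID leaf 1, EDX bit 26.
static bool cpu_has_sse2() {
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (edx & (1u << 26)) != 0;
#endif
}

int main() {
    std::printf("SSE2: %s\n", cpu_has_sse2() ? "yes" : "no");
    return 0;
}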
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code.
Here is a hint about x86_64 builds running in Apple M1:
"The system prevents you from mixing arm64 code and x86_64 code in the same process. Rosetta translation applies to an entire process, including all code modules that the process loads dynamically."
This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code. DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.
Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to set up the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.
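To illustrate the two styles being discussed, here is a hypothetical stand-in for the real check (deciding whether a pointer fits in the low 32 bits), not the actual dynrec code:

#include <cstdint>

// Cast-based form of such a test; per the posts above, a cast-based
// check was mis-optimized by VS2019 in Release builds.
static bool fits_in_32bits_cast(const void *p) {
    return (uintptr_t)(uint32_t)(uintptr_t)p == (uintptr_t)p;
}

// Mask-based form: the explicit 0xFFFFFFFF mask described above.
static bool fits_in_32bits_mask(const void *p) {
    return ((uintptr_t)p & 0xFFFFFFFFu) == (uintptr_t)p;
}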
The other is how to get dynamic core (dynrec) to work on ARM64. In its default state, dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute", meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting the read+write+execute first (as SVN does), and if that fails, doing two mprotect() tests to determine whether the failure is because of "write-xor-execute" or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the Macbook just fine.
Constantly updating the mapping every time a block is written is slow.
Using two mappings, one using RW protection and the other using RX like this patch, is more efficient. QEMU recently adopted the same method referencing a presentation from Apple's head of security engineering.
The current ARM64 dynrec isn't really worth bothering with anyway, running the dyn_x86 core using emulation will give better performance.
(There is still a glaring bug regardless, being that the cache memory is allocated with malloc - meaning it is undefined behavior to use mprotect on it.)
How do you set up DOSBox-X to show integer scaled window in fullscreen? I can do that easily in original DOSBox but I can't get it to work here.
To be perfectly clear what I want to accomplish: I want to scale 320x200 game to 1280x800 so it's pixel perfect with even pixels and I want to display that 1280x800 window in 1920x1080 fullscreen with black borders around it. So far no matter what fullscreen resolution I set in the config file, or whether I use autofit or aspect ratio correction it never scales that way.
The official DOSBox lets me do it by using OpenGLnb and setting fullresolution=1280x800 in dosboxTex1.conf. If I do the same in X it just displays a 1280x800 window.
How do you set up DOSBox-X to show integer scaled window in fullscreen? I can do that easily in original DOSBox but I can't get it to work here.
To be perfectly clear what I want to accomplish: I want to scale 320x200 game to 1280x800 so it's pixel perfect with even pixels and I want to display that 1280x800 window in 1920x1080 fullscreen with black borders around it. So far no matter what fullscreen resolution I set in the config file, or whether I use autofit or aspect ratio correction it never scales that way.
The official DOSBox lets me do it by using OpenGLnb and setting fullresolution=1280x800 in dosboxTex1.conf. If I do the same in X it just displays a 1280x800 window.
To be fair, DOSBox-X disabled changing the video mode in fullscreen because modern displays seem to take their time to re-display the desktop on mode changes. They aren't the CRTs of old.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
jmarsh wrote on 2020-11-28, 12:38:
This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code. DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.
Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to set up the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.
The other is how to get dynamic core (dynrec) to work on ARM64. In its default state, dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute", meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting the read+write+execute first (as SVN does), and if that fails, doing two mprotect() tests to determine whether the failure is because of "write-xor-execute" or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the Macbook just fine.
Constantly updating the mapping every time a block is written is slow.
Using two mappings, one using RW protection and the other using RX like this patch, is more efficient. QEMU recently adopted the same method referencing a presentation from Apple's head of security engineering.
The current ARM64 dynrec isn't really worth bothering with anyway, running the dyn_x86 core using emulation will give better performance.
(There is still a glaring bug regardless, being that the cache memory is allocated with malloc - meaning it is undefined behavior to use mprotect on it.)
Constantly updating the map is fast enough on the Macbook so far. I seem to get a 5-10% CPU load reduction according to top.
It looks like there is a Darwin/mach-specific task remap function to make exactly that kind of split mapping, because Apple themselves use it on iOS to JIT compile JavaScript. However, portability is a concern and there are Linux systems with the same restriction, so I may have to either implement both or just use the shmget() shared memory file handle mmap trick that Linux supports.
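A rough sketch of that shared-memory double-mapping trick, using POSIX shm_open rather than SysV shmget for brevity (names here are made up, and as noted later in the thread, macOS and some hardened Linux setups refuse PROT_EXEC on such shared-memory mappings, which is where the mach remap comes in):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Map the same cache backing store twice: one writable view for the
// code generator, one executable view for running the generated code.
struct DualMapping { void *rw; void *rx; };

static bool dual_map_cache(size_t size, DualMapping *out) {
    // Backing object: POSIX shared memory, unlinked immediately.
    int fd = shm_open("/dosbox-x-cache-demo", O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) return false;
    shm_unlink("/dosbox-x-cache-demo");
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return false; }

    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
    close(fd);
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return false;
    }

    out->rw = rw;  // emit/patch code here...
    out->rx = rx;  // ...and execute it from here, at the same offsets
    return true;
}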
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code. DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.
Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to set up the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.
Masking should be optimized properly by the compiler if the mask accomplishes the same thing; at least GCC seems to be smart enough about it.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
Here is a hint about x86_64 builds running in Apple M1:
"The system prevents you from mixing arm64 code and x86_64 code in the same process. Rosetta translation applies to an entire process, including all code modules that the process loads dynamically."
Yes, obviously. I doubt anyone would expect to be able to do something of this kind. You also can't mix i686 and x86_64 code inside one process.
As jmarsh and I have been banging our heads against the ARM64 code, there is also a big problem with Apple's tightened security on the M1. While the compiled binary will work nicely on your own machine, an actual app bundle needs entitlements and codesigning with a developer account to make it work on other people's machines. This was not fun to test again...
Based on Re: dynrec vs. secure platforms - opinions wanted we've been testing https://github.com/DominusExult/buildbot/blob … dosbox_wx.patch (with the entitlements at https://github.com/DominusExult/buildbot/blob … itlements.plist). In the end it seems the way the patch handles the cache tempfile is never going to work on Apple Silicon from an app bundle.
Because the Apple M1's emulation will not mix and match architectures, it may be worthwhile to improve the Voodoo emulation, perhaps even add Banshee. Here is also a 3dfx Voodoo patch against previous code in dosbox-x for hints. There was a recent update to that code in dosbox-x that reintroduced the faulty mipmapping emulation for the non-OpenGL path. It may be worthwhile to pursue other patches given the above comments.
I already modified SDL1.2 in-tree to do almost exactly that for Big Sur to fix the lack of audio. It took a bit of browsing around Apple's developer site, but I figured it out in about half an hour.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.
In that case, it seems they deprecated the old API in favor of a new one by renaming the functions. 😀 It seems the native Big Sur build has no major practical advantage yet over the Intel build running under the emulation layer, although your native build is important to maintain even if that is the case.
It looks like there is a Darwin/mach-specific task remap function to make exactly that kind of split mapping, because Apple themselves use it on iOS to JIT compile JavaScript. However, portability is a concern and there are Linux systems with the same restriction, so I may have to either implement both or just use the shmget() shared memory file handle mmap trick that Linux supports.
The vm_remap call is only required on macOS because it pretty much disallows mmap'ing anything (shared memory, a tempfile, etc.) as executable; mmap'ing a shared file at two different addresses is otherwise the portable *nix way to map the same memory at two locations.
Shared memory doesn't work because Apple mounts it noexec (and several Linux distributions are doing the same).
The patch from the other thread has been tested and works on SELinux, although it has a few cosmetic issues that need fixing (and could be made more secure by eliminating the offset variable).
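For the macOS side, here is a sketch of roughly how the mach remap being discussed can alias a region, using the mach_vm_remap variant. This is illustrative only (hypothetical names, minimal error handling), and whether the RX protection is actually permitted also depends on the entitlements/codesigning situation described above:

#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <sys/mman.h>
#include <cstddef>

// Create a read+execute alias of an existing read+write JIT cache
// region within our own task. 'rw_base' and 'size' are hypothetical.
static void *make_rx_alias(void *rw_base, size_t size) {
    mach_vm_address_t rx_addr = 0;             // let the kernel pick the address
    vm_prot_t cur = VM_PROT_NONE, max = VM_PROT_NONE;

    kern_return_t kr = mach_vm_remap(mach_task_self(), &rx_addr, size, 0,
                                     VM_FLAGS_ANYWHERE,
                                     mach_task_self(),
                                     (mach_vm_address_t)rw_base,
                                     FALSE /* share, don't copy */,
                                     &cur, &max, VM_INHERIT_NONE);
    if (kr != KERN_SUCCESS) return NULL;

    // Restrict the alias to read+execute; the original view stays read+write.
    if (mprotect((void *)rx_addr, size, PROT_READ | PROT_EXEC) != 0) return NULL;
    return (void *)rx_addr;
}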
You also can't mix i686 and x86_64 code inside one process.
You can on windows. All 32-bit apps get loaded with a 64-bit code segment accessible to them, for communicating with the kernel (it's how WoW64 works). Since the segment value is hardcoded and x86 has unprivileged instructions to get a segment's base and limit, it's trivial to find it and map new code into it...
You also can't mix i686 and x86_64 code inside one process.
You can on windows. All 32-bit apps get loaded with a 64-bit code segment accessible to them, for communicating with the kernel (it's how WoW64 works). Since the segment value is hardcoded and x86 has unprivileged instructions to get a segment's base and limit, it's trivial to find it and map new code into it...
Back in the Windows 95/98/ME days there were hidden ordinal entry points in KERNEL32.DLL that allowed Win32 code to call Win16 functions. Not everything was 32-bit at the time, so the API was there to let Win32 call down to the Win16 underworld when needed. So at least under Windows 9x/ME, you *could* mix 16-bit and 32-bit code, or at least call 16-bit code.
On the Windows NT kernel side of things, a 16-bit Windows 3.x application running under NTVDM.EXE could easily make calls to Win32 code using the WOW interface.
DOSBox-X project: more emulation better accuracy. DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.