VOGONS

DOSBox-X branch

Reply 2020 of 2397, by TheGreatCodeholio

Rank: Oldbie

In other news, I sat down with one of the new ARM64-based M1 Apple Macbooks and got DOSBox-X to run on those too. There are some considerations to make for those though, which I thought I'd share so SVN can compile for them too.

One is how to compile with SDL2. You're going to need to modify configure.ac and then re-run autogen.sh, because SDL2's configure script assumes that Darwin plus ARM means iOS (that is, that you're compiling for the iPad or iPhone), which is wrong. Remove the part of the case statement that tries to match *arm*darwin*, leaving only the *ios* part.

The other is how to get the dynamic core (dynrec) to work on ARM64. In its default state, the dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute" (W^X), meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting read+write+execute first (as SVN does) and, if that fails, doing two mprotect() tests to determine whether the failure is due to W^X or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the MacBook just fine.
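
As a rough illustration of the probe-and-toggle scheme described above, here is a minimal C sketch. It is not the DOSBox-X code: the names code_cache, probe_wx, cache_init and cache_write are made up for the example, and it assumes the cache comes from mmap() (page-aligned) rather than malloc().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum wx_policy { WX_RWX_ALLOWED, WX_ENFORCED, WX_NO_EXEC };

/* Hypothetical cache descriptor, for illustration only. */
struct code_cache {
    unsigned char *base;
    size_t size;
    enum wx_policy policy;
};

/* Try RWX first; if refused, test RX and RW separately to tell
 * "W^X is enforced" apart from "executable memory is not allowed". */
static enum wx_policy probe_wx(void *mem, size_t size)
{
    if (mprotect(mem, size, PROT_READ | PROT_WRITE | PROT_EXEC) == 0)
        return WX_RWX_ALLOWED;
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0)
        return WX_NO_EXEC;
    if (mprotect(mem, size, PROT_READ | PROT_WRITE) != 0)
        return WX_NO_EXEC;
    return WX_ENFORCED;              /* region is left read+write */
}

static int cache_init(struct code_cache *c, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    c->base = NULL;
    c->size = 0;
    c->policy = WX_NO_EXEC;
    size = (size + page - 1) / page * page;   /* round up to whole pages */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    c->policy = probe_wx(p, size);
    if (c->policy == WX_NO_EXEC) {
        munmap(p, size);
        return -1;
    }
    c->base = p;
    c->size = size;
    return 0;
}

/* Copy generated code into the cache, flipping protections around the
 * write only when W^X was detected. */
static int cache_write(struct code_cache *c, size_t off,
                       const void *src, size_t len)
{
    if (off > c->size || len > c->size - off)
        return -1;
    if (c->policy == WX_ENFORCED &&
        mprotect(c->base, c->size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(c->base + off, src, len);
    if (c->policy == WX_ENFORCED &&
        mprotect(c->base, c->size, PROT_READ | PROT_EXEC) != 0)
        return -1;
    /* GCC/Clang builtin; matters on ARM, effectively a no-op on x86. */
    __builtin___clear_cache((char *)c->base + off,
                            (char *)c->base + off + len);
    return 0;
}
```

The flag is probed once at startup, so platforms that allow read+write+execute never pay for the extra mprotect() calls.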

I recall, though I can't confirm, that ARM64-based Linux distributions for the Raspberry Pi have the same W^X policy.

Hopefully this information will help DOSBox SVN improve itself for these new environments.

I'm well aware x86 builds also run on the M1 Macbooks (as demonstrated by LGR on Twitter), but still, there's better performance to be had as native ARM OS X code.

DOSBox-X project: more emulation better accuracy.
DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.

Reply 2021 of 2397, by TheGreatCodeholio

Rank: Oldbie
ThankYou wrote on 2020-11-13, 20:44:

Quick 'thank you' for this, Jon and the other contributors.

Neither VirtualBox nor VMPlayer would let me install Win98SE in a virtual machine on my Ryzen PC - both were rather faster but both had illegal operation / invalid page fault failures in Regsvr32 (and something else in VirtualBox) so didn't actually complete the install.

I've found the only way to install and run Windows 95 and Windows 98 in VirtualBox without crashing is to turn OFF the CPU-based virtualization extensions, forcing VirtualBox to use software emulation. VT-x, VirtualBox, and Windows 95/98 don't mix for some reason.


Reply 2022 of 2397, by TheGreatCodeholio

Rank: Oldbie
Ringding wrote on 2020-11-11, 18:55:

Which compiler do you use for building? If it works on a Pentium 4, it’s likely built for SSE2 (-mfpmath=sse in gcc, might be set by default in recent mingw releases).

If you compile with build-mingw-lowend, does it help?

That script was originally designed to enable compiling on lower end systems by disabling the MT32 emulation (which tended to use SSE instructions).

There is code in src/gui/render.cpp to conditionally use SSE2 to speed up previous/current frame comparisons, but that should be conditional on CPUID reporting SSE2.
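
For anyone chasing this, the general pattern looks roughly like the sketch below. It is not the code from render.cpp: cpu_has_sse2, line_changed_sse2 and line_changed are names invented for the example, and it assumes GCC or Clang (for cpuid.h and the target attribute).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* GCC/Clang wrapper around the CPUID instruction */
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
static int cpu_has_sse2(void)
{
#if HAVE_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 26) & 1u);
#else
    return 0;
#endif
}

#if HAVE_X86
/* Only this function is compiled with SSE2 enabled, so the rest of the
 * binary stays safe for CPUs without it. */
__attribute__((target("sse2")))
static int line_changed_sse2(const uint8_t *prev, const uint8_t *cur, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(prev + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(cur + i));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) != 0xFFFF)
            return 1;                       /* some byte differs */
    }
    return memcmp(prev + i, cur + i, len - i) != 0;  /* tail bytes */
}
#endif

/* Compare the previous and current scanline, using SSE2 only when the
 * CPU reports it at run time. */
static int line_changed(const uint8_t *prev, const uint8_t *cur, size_t len)
{
#if HAVE_X86
    if (cpu_has_sse2())
        return line_changed_sse2(prev, cur, len);
#endif
    return memcmp(prev, cur, len) != 0;
}
```

If a build faults on a Pentium III despite a check like this, the usual suspect is compiler flags that let SSE2 instructions leak into code outside the guarded function.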

Can you use a debugger to point at the code that is faulting on Pentium III systems?

I've never really tested DOSBox-X on anything below a Pentium 4 though.


Reply 2023 of 2397, by hail-to-the-ryzen

Rank: Member
TheGreatCodeholio wrote on 2020-11-28, 08:40:

This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code.

Here is another vs2019 issue with type conversion (probably also optimization related):
https://developercommunity.visualstudio.com/c … ting-to-21.html

Here is a hint about x86_64 builds running in Apple M1:
"The system prevents you from mixing arm64 code and x86_64 code in the same process. Rosetta translation applies to an entire process, including all code modules that the process loads dynamically."

Reply 2024 of 2397, by jmarsh

Rank: Oldbie
TheGreatCodeholio wrote on 2020-11-28, 08:40:

This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code.
DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.

Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to setup the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.

TheGreatCodeholio wrote on 2020-11-28, 08:47:

The other is how to get the dynamic core (dynrec) to work on ARM64. In its default state, the dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute" (W^X), meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting read+write+execute first (as SVN does) and, if that fails, doing two mprotect() tests to determine whether the failure is due to W^X or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the MacBook just fine.

Constantly updating the mapping every time a block is written is slow.
Using two mappings, one using RW protection and the other using RX like this patch, is more efficient. QEMU recently adopted the same method referencing a presentation from Apple's head of security engineering.
The current ARM64 dynrec isn't really worth bothering with anyway, running the dyn_x86 core using emulation will give better performance.

(There is still a glaring bug regardless, being that the cache memory is allocated with malloc - meaning it is undefined behavior to use mprotect on it.)
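
To make the two-mapping idea concrete, here is a minimal Linux-only sketch. It is not the patch referred to above and may differ from it in every detail: struct dual_map and the dual_map_* functions are invented names, and it assumes memfd_create() is available (Linux 3.17+, glibc 2.27+) and that the system permits an executable mapping of the memfd.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pair of views onto the same physical pages. */
struct dual_map {
    unsigned char *rw;   /* writer's view: read + write   */
    unsigned char *rx;   /* executor's view: read + exec  */
    size_t size;
};

/* Back the cache with an anonymous in-memory file and map it twice,
 * so no single mapping is ever writable and executable at once. */
static int dual_map_create(struct dual_map *m, size_t size)
{
    void *rw, *rx;
    int fd = memfd_create("dyn-cache", MFD_CLOEXEC);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);                     /* the mappings keep the pages alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->size = size;
    return 0;
}

/* Translate an address in the writable view to the matching address
 * in the executable view. */
static const unsigned char *dual_map_exec_addr(const struct dual_map *m,
                                               const unsigned char *rw_ptr)
{
    return m->rx + (rw_ptr - m->rw);
}

static void dual_map_destroy(struct dual_map *m)
{
    munmap(m->rw, m->size);
    munmap(m->rx, m->size);
}
```

Code is emitted through the rw pointer and jumped to through the rx pointer, so no mprotect() call is needed per block. Because the memory comes from mmap() rather than malloc(), it is also page-aligned by construction.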

Reply 2025 of 2397, by Arthandas

Rank: Newbie

How do you set up DOSBox-X to show an integer-scaled window in fullscreen? I can do that easily in original DOSBox but I can't get it to work here.
To be perfectly clear about what I want to accomplish: I want to scale a 320x200 game to 1280x800 so it's pixel perfect with even pixels, and I want to display that 1280x800 window in 1920x1080 fullscreen with black borders around it. So far, no matter what fullscreen resolution I set in the config file, or whether I use autofit or aspect ratio correction, it never scales that way.

The official DOSBox lets me do it by using OpenGLnb and setting fullresolution=1280x800 in dosboxTex1.conf. If I do the same in X it just displays a 1280x800 window.
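
For reference, the arithmetic behind the layout being asked for can be sketched in a few lines of C. This is only the geometry, not anything taken from DOSBox or DOSBox-X; struct viewport, max_integer_scale and centered_viewport are names made up for the example.

```c
/* Hypothetical description of where the scaled image lands on screen. */
struct viewport {
    int scale;   /* whole-number scale factor       */
    int x, y;    /* top-left corner on the display  */
    int w, h;    /* scaled image size               */
};

/* Largest whole-number scale at which the source still fits. */
static int max_integer_scale(int src_w, int src_h, int disp_w, int disp_h)
{
    int sx = disp_w / src_w;
    int sy = disp_h / src_h;
    return sx < sy ? sx : sy;
}

/* Centre a source scaled by 'scale' on the display; the remaining
 * area on each side is the black border. */
static struct viewport centered_viewport(int src_w, int src_h,
                                         int disp_w, int disp_h, int scale)
{
    struct viewport v;
    v.scale = scale;
    v.w = src_w * scale;
    v.h = src_h * scale;
    v.x = (disp_w - v.w) / 2;
    v.y = (disp_h - v.h) / 2;
    return v;
}
```

For 320x200 on a 1920x1080 display, a 4x scale gives 1280x800 with borders of 320 pixels left and right and 140 pixels top and bottom; the largest scale that still fits is 5x (1600x1000).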

Reply 2026 of 2397, by latalante

Rank: Newbie
jmarsh wrote on 2020-11-28, 12:38:

Using two mappings, one using RW protection and the other using RX like this patch, is more efficient. QEMU recently adopted the same method referencing a presentation from Apple's head of security engineering.

Unfortunately not yet. I was running qemu-5.2.0-rc1 on Linux 10 days ago, and it exits with the following message.

systemd-run --user --pty -p MemoryDenyWriteExecute=yes /opt/qemu-git/bin/qemu-system-x86_64 -machine pc,accel=tcg -net none

Could not allocate dynamic translator buffer

Same as OpenBSD with W^X active by default.
TCG requires WX access.
Disable W^X on OpenBSD. https://git.qemu.org/?p=qemu.git;a=commit;h=7 … ed8a6d94bd73db8

Edit:
There are actually patches for qemu for iOS/Darwin and W^X.
https://lists.nongnu.org/archive/html/qemu-de … 1/msg01766.html

Last edited by latalante on 2020-11-28, 20:12. Edited 1 time in total.

Reply 2027 of 2397, by TheGreatCodeholio

Rank: Oldbie
Arthandas wrote on 2020-11-28, 13:36:

How do you set up DOSBox-X to show an integer-scaled window in fullscreen? I can do that easily in original DOSBox but I can't get it to work here.
To be perfectly clear about what I want to accomplish: I want to scale a 320x200 game to 1280x800 so it's pixel perfect with even pixels, and I want to display that 1280x800 window in 1920x1080 fullscreen with black borders around it. So far, no matter what fullscreen resolution I set in the config file, or whether I use autofit or aspect ratio correction, it never scales that way.

The official DOSBox lets me do it by using OpenGLnb and setting fullresolution=1280x800 in dosboxTex1.conf. If I do the same in X it just displays a 1280x800 window.

To be fair, DOSBox-X disabled changing the video mode in fullscreen because modern displays seem to take their time to re-display the desktop on mode changes. They aren't the CRTs of old.


Reply 2028 of 2397, by TheGreatCodeholio

Rank: Oldbie
jmarsh wrote on 2020-11-28, 12:38:
TheGreatCodeholio wrote on 2020-11-28, 08:40:

This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code.
DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.

Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to setup the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.

TheGreatCodeholio wrote on 2020-11-28, 08:47:

The other is how to get the dynamic core (dynrec) to work on ARM64. In its default state, the dynamic core will crash because ARM-based Mac OS X enforces a policy known as "write-xor-execute" (W^X), meaning that mprotect() will allow you to map read + execute, or read + write, but not read + write + execute. DOSBox-X was able to work around this by attempting read+write+execute first (as SVN does) and, if that fails, doing two mprotect() tests to determine whether the failure is due to W^X or because executable mappings are flat out not allowed. If it's W^X, then it sets a flag and maps the cache read+write for the initial dynamic core work before then mapping it read+execute. From that point on, whenever it needs to add a cache block, it remembers that W^X is in effect and mprotects the cache back to read+write during the modification before mapping it back to read+execute. With that modification, dynrec works on the MacBook just fine.

Constantly updating the mapping every time a block is written is slow.
Using two mappings, one using RW protection and the other using RX like this patch, is more efficient. QEMU recently adopted the same method referencing a presentation from Apple's head of security engineering.
The current ARM64 dynrec isn't really worth bothering with anyway, running the dyn_x86 core using emulation will give better performance.

(There is still a glaring bug regardless, being that the cache memory is allocated with malloc - meaning it is undefined behavior to use mprotect on it.)

Constantly updating the map is fast enough on the MacBook so far. I seem to get a 5-10% CPU load reduction according to top.

It looks like there is a Darwin/Mach-specific task remap function to make exactly that kind of split mapping, because Apple itself uses it on iOS to JIT compile JavaScript. However, portability is a concern and there are Linux systems with the same restriction, so I may have to either implement both or just use the shmget() shared memory file handle mmap trick that Linux supports.


Reply 2029 of 2397, by TheGreatCodeholio

Rank: Oldbie
jmarsh wrote on 2020-11-28, 12:38:
TheGreatCodeholio wrote on 2020-11-28, 08:40:

This typecast is optimized out by VS2019's compiler in Release builds, causing only the low 32 bits to be written in any case and a crash when executing the dynamically generated code.
DOSBox-X fixes the issue by explicitly masking the pointer by 0xFFFFFFFF for that test instead of relying on typecasting.

Casting to use sign/zero-extension is always preferable over masking because it puts the result in a different register (eliminating a move) to setup the comparison.
I've checked that code with values > 4GB before (by forcing the base address of the binary) and it worked, so it must be a VS regression.

Masking should be optimized properly by the compiler if the mask accomplishes the same thing; at least GCC seems to be smart enough about it.
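
To show what is being compared, here are the two forms side by side as a small C sketch. It assumes the test in question checks whether a 64-bit address fits in 32 bits (zero-extended); the function names are invented for the example and this is not the actual dynrec source.

```c
#include <stdint.h>

/* Cast form: truncate to 32 bits, zero-extend back, and compare. */
static int fits_in_32_cast(uint64_t addr)
{
    return addr == (uint64_t)(uint32_t)addr;
}

/* Mask form: keep only the low 32 bits explicitly, then compare. */
static int fits_in_32_mask(uint64_t addr)
{
    return (addr & UINT64_C(0xFFFFFFFF)) == addr;
}
```

Both functions express the same predicate, so a compiler is free to generate identical code for them. The mask form simply leaves no narrowing cast for an optimizer to mishandle.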


Reply 2030 of 2397, by Ringding

Rank: Member
hail-to-the-ryzen wrote on 2020-11-28, 11:19:

Here is a hint about x86_64 builds running in Apple M1:
"The system prevents you from mixing arm64 code and x86_64 code in the same process. Rosetta translation applies to an entire process, including all code modules that the process loads dynamically."

Yes, obviously. I doubt anyone would expect to be able to do something of this kind. You can also not mix i686 and x86_64 inside one process.

Reply 2031 of 2397, by Dominus

Rank: DOSBox Moderator

As I've been banging heads with jmarsh against the ARM64 code, there is also a big problem with Apple's tightened security on the M1. While the compiled binary will work nicely on your own machine, an actual app bundle needs entitlements and codesigning with a developer account to make it work on others' machines. This was not fun to test again...
Based on the thread "Re: dynrec vs. secure platforms - opinions wanted", we've been testing https://github.com/DominusExult/buildbot/blob … dosbox_wx.patch (with the entitlements at https://github.com/DominusExult/buildbot/blob … itlements.plist). In the end, it seems the way the patch handles the cache tempfile is never going to work on Apple Silicon from an app bundle.

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 2032 of 2397, by hail-to-the-ryzen

Rank: Member

Because the Apple M1 emulation will not mix and match between architectures, it may be worthwhile to improve the Voodoo emulation, perhaps even add Banshee. Here is also a 3dfx Voodoo patch against previous code in DOSBox-X for hints. There was a recent update to that code in DOSBox-X that reintroduced the faulty mipmapping emulation for the non-OpenGL path. It may be worthwhile to pursue other patches given the above comments.

diff -rupN 3dfx/voodoo_data.h 3dfx-2/voodoo_data.h
--- 3dfx/voodoo_data.h
+++ 3dfx-2/voodoo_data.h
@@ -115,6 +115,7 @@ enum
#define RECIP_OUTPUT_PREC 15
#define LOG_OUTPUT_PREC 8

+#define UNEXPECTED(exp) __builtin_expect(!!(exp), 0)


/*************************************
@@ -133,10 +134,10 @@ static const UINT8 dither_matrix_4x4[16]

static const UINT8 dither_matrix_2x2[16] =
{
- 2, 10, 2, 10,
- 14, 6, 14, 6,
- 2, 10, 2, 10,
- 14, 6, 14, 6
+ 8, 10, 8, 10,
+ 11, 9, 11, 9,
+ 8, 10, 8, 10,
+ 11, 9, 11, 9
};


@@ -859,25 +860,24 @@ INLINE UINT8 count_leading_zeros(UINT32
*
*************************************/

-INLINE INT64 fast_reciplog(INT64 value, INT32 *log2)
+INLINE INT32 fast_reciplog(INT64 value)
{
extern UINT32 voodoo_reciplog[];
- UINT32 temp, rlog;
+ UINT32 temp, recip;
UINT32 interp;
UINT32 *table;
- UINT64 recip;
- bool neg = false;
+ int neg = FALSE;
int lz, exp = 0;

/* always work with unsigned numbers */
if (value < 0)
{
value = -value;
- neg = true;
+ neg = TRUE;
}

/* if we've spilled out of 32 bits, push it down under 32 */
- if (value & LONGTYPE(0xffff00000000))
+ if (value & 0xffff00000000ull)
{
temp = (UINT32)(value >> 16);
exp -= 16;
@@ -886,9 +886,8 @@ INLINE INT64 fast_reciplog(INT64 value,
temp = (UINT32)value;

 	/* if the resulting value is 0, the reciprocal is infinite */
- if (GCC_UNLIKELY(temp == 0))
+ if (UNEXPECTED(temp == 0))
{
- *log2 = 1000 << LOG_OUTPUT_PREC;
return neg ? 0x80000000 : 0x7fffffff;
}

@@ -907,15 +906,8 @@ INLINE INT64 fast_reciplog(INT64 value,

/* do a linear interpolatation between the two nearest table values */
/* for both the log and the reciprocal */
- rlog = (table[1] * (0x100 - interp) + table[3] * interp) >> 8;
recip = (table[0] * (0x100 - interp) + table[2] * interp) >> 8;

- /* the log result is the fractional part of the log; round it to the output precision */
- rlog = (rlog + (1 << (RECIPLOG_LOOKUP_PREC - LOG_OUTPUT_PREC - 1))) >> (RECIPLOG_LOOKUP_PREC - LOG_OUTPUT_PREC);
-
- /* the exponent is the non-fractional part of the log; normally, we would subtract it from rlog */
- /* but since we want the log(1/value) = -log(value), we subtract rlog from the exponent */
- *log2 = (INT32)((((int)exp - ((int)31 - (int)RECIPLOG_INPUT_PREC)) << (int)LOG_OUTPUT_PREC) - (int)rlog);

/* adjust the exponent to account for all the reciprocal-related parameters to arrive at a final shift amount */
exp += (RECIP_OUTPUT_PREC - RECIPLOG_LOOKUP_PREC) - (31 - RECIPLOG_INPUT_PREC);
@@ -927,7 +919,7 @@ INLINE INT64 fast_reciplog(INT64 value,
recip <<= exp;

/* on the way out, apply the original sign to the reciprocal */
- return neg ? -(INT64)recip : (INT64)recip;
+ return neg ? -recip : recip;
}


@@ -1725,8 +1717,7 @@ do \
{ \
INT32 blendr, blendg, blendb, blenda; \
INT32 tr, tg, tb, ta; \
- INT32 s, t, lod, ilod; \
- INT64 oow; \
+ INT32 oow, s, t, lod, ilod; \
INT32 smax, tmax; \
UINT32 texbase; \
rgb_union c_local; \
@@ -1734,15 +1725,15 @@ do \
/* determine the S/T/LOD values for this texture */ \
if (TEXMODE_ENABLE_PERSPECTIVE(TEXMODE)) \
{ \
- oow = fast_reciplog((ITERW), &lod); \
- s = (INT32)((oow * (ITERS)) >> 29); \
- t = (INT32)((oow * (ITERT)) >> 29); \
- lod = (LODBASE); \
+ oow = fast_reciplog((ITERW)); \
+ s = ((INT64)oow * (ITERS)) >> 29; \
+ t = ((INT64)oow * (ITERT)) >> 29; \
+ lod = (LODBASE); \
} \
else \
{ \
- s = (INT32)((ITERS) >> 14); \
- t = (INT32)((ITERT) >> 14); \
+ s = (ITERS) >> 14; \
+ t = (ITERT) >> 14; \
lod = (LODBASE); \
} \
\
diff -rupN 3dfx/voodoo_emu.cpp 3dfx-2/voodoo_emu.cpp
--- 3dfx/voodoo_emu.cpp 2020-07-25 22:34:11 -0400
+++ 3dfx-2/voodoo_emu.cpp 2020-07-25 22:34:15 -0400
@@ -1098,7 +1108,6 @@ void recompute_texture_params(tmu_state
INLINE INT32 prepare_tmu(tmu_state *t)
{
INT64 texdx, texdy;
- INT32 lodbase;

/* if the texture parameters are dirty, update them */
if (t->regdirty)
@@ -1114,23 +1123,7 @@ INLINE INT32 prepare_tmu(tmu_state *t)
ncc_table_update(n);
}
}
-
- /* compute (ds^2 + dt^2) in both X and Y as 28.36 numbers */
- texdx = (INT64)(t->dsdx >> 14) * (INT64)(t->dsdx >> 14) + (INT64)(t->dtdx >> 14) * (INT64)(t->dtdx >> 14);
- texdy = (INT64)(t->dsdy >> 14) * (INT64)(t->dsdy >> 14) + (INT64)(t->dtdy >> 14) * (INT64)(t->dtdy >> 14);
-
- /* pick whichever is larger and shift off some high bits -> 28.20 */
- if (texdx < texdy)
- texdx = texdy;
- texdx >>= 16;
-
- /* use our fast reciprocal/log on this value; it expects input as a */
- /* 16.32 number, and returns the log of the reciprocal, so we have to */
- /* adjust the result: negative to get the log of the original value */
- /* plus 12 to account for the extra exponent, and divided by 2 to */
- /* get the log of the square root of texdx */
- (void)fast_reciplog(texdx, &lodbase);
- return (-lodbase + (12 << 8)) / 2;
+ return 0;
}


@@ -1307,7 +1302,7 @@ static void update_statistics(voodoo_sta

void register_w(UINT32 offset, UINT32 data) {
// voodoo_reg reg;
- UINT32 regnum;
+ UINT32 regnum = (offset) & 0xff;
UINT32 chips = (offset>>8) & 0xf;
// reg.u = data;

@@ -2779,14 +2788,18 @@ UINT32 register_r(UINT32 offset)
//result |= v->fbi.vblank << 6;
result |= (Voodoo_GetRetrace() ? 0x40u : 0u);

- if (v->pci.op_pending) {
- /* bit 7 is FBI graphics engine busy */
+
+ /* bit 7 is FBI graphics engine busy */
+ if (v->pci.op_pending)
result |= 1 << 7;
- /* bit 8 is TREX busy */
+
+ /* bit 8 is TREX busy */
+ if (v->pci.op_pending)
result |= 1 << 8;
- /* bit 9 is overall busy */
+
+ /* bit 9 is overall busy */
+ if (v->pci.op_pending)
result |= 1 << 9;
- }

/* bits 11:10 specifies which buffer is visible */
result |= (UINT32)(v->fbi.frontbuf << 10);
@@ -3011,7 +3025,6 @@ void voodoo_init(int type) {
{
UINT32 value = (1 << RECIPLOG_LOOKUP_BITS) + val;
voodoo_reciplog[val*2 + 0] = (1u << (RECIPLOG_LOOKUP_PREC + RECIPLOG_LOOKUP_BITS)) / value;
- voodoo_reciplog[val*2 + 1] = (UINT32)(LOGB2((double)value / (double)(1u << RECIPLOG_LOOKUP_BITS)) * (double)(1u << RECIPLOG_LOOKUP_PREC));
}

for (UINT32 val = 0; val < RASTER_HASH_SIZE; val++)

Reply 2033 of 2397, by hail-to-the-ryzen

Rank: Member

These are fixes from recent SDL12 commits to provide audio compatibility with builds for macOS 11.0 ("Big Sur"):

--- a/src/audio/macosx/SDL_coreaudio.c	Sun Jul 05 22:11:10 2020 +0300
+++ b/src/audio/macosx/SDL_coreaudio.c Fri Jul 17 17:44:34 2020 -0400
@@ -193,7 +208,7 @@
return;
}

- result = CloseComponent(outputAudioUnit);
+ result = AudioComponentInstanceDispose_fn (outputAudioUnit);
if (result != noErr) {
SDL_SetError("Core_CloseAudio: CloseComponent");
return;
@@ -212,8 +227,8 @@
int Core_OpenAudio(_THIS, SDL_AudioSpec *spec)
{
OSStatus result = noErr;
- Component comp;
- ComponentDescription desc;
+ AudioComponent_t comp;
+ AudioComponentDesc_t desc;
struct AURenderCallbackStruct callback;
AudioStreamBasicDescription requestedDesc;

@@ -233,23 +248,23 @@
requestedDesc.mBytesPerFrame = requestedDesc.mBitsPerChannel * requestedDesc.mChannelsPerFrame / 8;
requestedDesc.mBytesPerPacket = requestedDesc.mBytesPerFrame * requestedDesc.mFramesPerPacket;

-
/* Locate the default output audio unit */
+ SDL_memset(&desc, '\0', sizeof (desc));
desc.componentType = kAudioUnitType_Output;
desc.componentSubType = kAudioUnitSubType_DefaultOutput;
desc.componentManufacturer = kAudioUnitManufacturer_Apple;
desc.componentFlags = 0;
desc.componentFlagsMask = 0;

- comp = FindNextComponent (NULL, &desc);
+ comp = AudioComponentFindNext_fn (NULL, &desc);
if (comp == NULL) {
- SDL_SetError ("Failed to start CoreAudio: FindNextComponent returned NULL");
+ SDL_SetError ("Failed to start CoreAudio: AudioComponentFindNext returned NULL");
return -1;
}

/* Open & initialize the default output audio unit */
- result = OpenAComponent (comp, &outputAudioUnit);
- CHECK_RESULT("OpenAComponent")
+ result = AudioComponentInstanceNew_fn (comp, &outputAudioUnit);
+ CHECK_RESULT("AudioComponentInstanceNew")

result = AudioUnitInitialize (outputAudioUnit);
CHECK_RESULT("AudioUnitInitialize")

--- a/src/audio/macosx/SDL_coreaudio.h Sat Nov 14 14:02:50 2020 +0300
+++ b/src/audio/macosx/SDL_coreaudio.h Sun Nov 15 03:51:10 2020 +0300
@@ -29,8 +29,25 @@
/* Hidden "this" pointer for the video functions */
#define _THIS SDL_AudioDevice *this

+#if (MAC_OS_X_VERSION_MIN_REQUIRED < 1060) || \
+ (!defined(AUDIO_UNIT_VERSION) || ((AUDIO_UNIT_VERSION + 0) < 1060))
+typedef struct ComponentDescription	AudioComponentDesc_t;
+typedef Component AudioComponent_t;
+typedef AudioUnit AudioComponentInstance_t;
+#define AudioComponentInstanceNew_fn OpenAComponent
+#define AudioComponentInstanceDispose_fn CloseComponent
+#define AudioComponentFindNext_fn FindNextComponent
+#else
+typedef AudioComponentDescription AudioComponentDesc_t;
+typedef AudioComponent AudioComponent_t;
+typedef AudioComponentInstance AudioComponentInstance_t;
+#define AudioComponentInstanceNew_fn AudioComponentInstanceNew
+#define AudioComponentInstanceDispose_fn AudioComponentInstanceDispose
+#define AudioComponentFindNext_fn AudioComponentFindNext
+#endif
+
struct SDL_PrivateAudioData {
- AudioUnit outputAudioUnit;
+ AudioComponentInstance_t outputAudioUnit;
void *buffer;
UInt32 bufferOffset;
UInt32 bufferSize;

Reply 2034 of 2397, by hail-to-the-ryzen

Rank: Member

And a bug concerning recent SDL12 ARM assembly code for the blit and fill routines:
https://bugzilla.libsdl.org/show_bug.cgi?id=4365

Maybe this will be enough patches for now.

Reply 2035 of 2397, by TheGreatCodeholio

Rank: Oldbie

I already modified SDL1.2 in-tree to do almost exactly that for Big Sur to fix the lack of audio. It took a bit of browsing around Apple's developer site, but I figured it out in about half an hour.


Reply 2036 of 2397, by hail-to-the-ryzen

Rank: Member

In that case, it seems that they deprecated the old API for a new API by renaming the functions. 😀 Seems that the native build for Big Sur has no major practical advantage yet over the Intel build in its emulation layer, although your native build is important to maintain even if that is the case.

Reply 2037 of 2397, by jmarsh

Rank: Oldbie
TheGreatCodeholio wrote on 2020-11-28, 16:52:

It looks like there is a Darwin/Mach-specific task remap function to make exactly that kind of split mapping, because Apple itself uses it on iOS to JIT compile JavaScript. However, portability is a concern and there are Linux systems with the same restriction, so I may have to either implement both or just use the shmget() shared memory file handle mmap trick that Linux supports.

The vm_remap call is only required on macOS because macOS pretty much disallows mmap'ing anything (shared memory, tempfile, etc.) as executable, and mmap'ing such an object twice is the portable *nix way to map the same memory at two different locations.
Shared memory doesn't work because Apple mounts it no-exec (and several Linux distributions are doing the same).
The patch from the other thread has been tested and works on SELinux, although it has a few cosmetic issues that need fixing (and could be made more secure by eliminating the offset variable).

Reply 2038 of 2397, by jmarsh

Rank: Oldbie
Ringding wrote on 2020-11-28, 17:10:

You can also not mix i686 and x86_64 inside one process.

You can on Windows. All 32-bit apps get loaded with a 64-bit code segment accessible to them, for communicating with the kernel (it's how WoW64 works). Since the segment value is hardcoded and x86 has unprivileged instructions to get a segment's base and limit, it's trivial to find it and map new code into it...

Reply 2039 of 2397, by TheGreatCodeholio

Rank: Oldbie
jmarsh wrote on 2020-11-28, 22:55:
Ringding wrote on 2020-11-28, 17:10:

You can also not mix i686 and x86_64 inside one process.

You can on Windows. All 32-bit apps get loaded with a 64-bit code segment accessible to them, for communicating with the kernel (it's how WoW64 works). Since the segment value is hardcoded and x86 has unprivileged instructions to get a segment's base and limit, it's trivial to find it and map new code into it...

Back in the Windows 95/98/ME days there were hidden ordinal entry points in KERNEL32.DLL that allowed Win32 code to call Win16 functions. Not everything was 32-bit at the time, so the API was there to let Win32 call down to the Win16 underworld when needed. So at least under Windows 9x/ME, you *could* mix 16-bit and 32-bit code, or at least call 16-bit code.

On the Windows NT kernel side of things, a 16-bit Windows 3.x application running under NTVDM.EXE could easily make calls to Win32 code using the WOW interface.
