VOGONS


MMX


First post, by Zorbid

Rank Member

I may actually take a crack at implementing MMX, at least for the normal core.

The instruction set is self-contained: the only flags affected by MMX instructions are those in the FPU tag register. Any MMX opcode sets it to 0x0000, while the EMMS instruction sets it to 0xFFFF. At first sight, I think I may be able to do it, for Intel hosts to begin with (using SSE2 intrinsics).

My (probably naive) plan is the following:

1) create a union type for the 8 mm registers.
2) memory fetch/write functions for 64 bit words
3) implement the various opcodes (prefix 0x0f); it should be straightforward with the intrinsics for most operations. I'll simulate the register allocation manually because, as far as I can tell, it's not possible to do register allocation from C++ intrinsics (though it's probably possible using inline ASM, but I prefer not to do that ATM).
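
Editor's sketch of the three steps above, under stated assumptions: it uses portable C++ rather than the SSE2 intrinsics mentioned in the post, so it builds on any host. All names (MMX_reg, MMX_state, mem_readq, mem_writeq, paddb) are hypothetical and are not DOSBox functions. Guest memory is modelled as a plain little-endian byte array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Step 1: hypothetical union for one 64-bit MMX register. The lane arrays
// follow host byte order, so the helpers below only touch the 64-bit view.
union MMX_reg {
    uint64_t q;
    uint32_t d[2];
    uint16_t w[4];
    uint8_t  b[8];
};

struct MMX_state {
    MMX_reg mm[8];  // mm0..mm7
};

// Step 2: 64-bit fetch/store on little-endian guest memory, built from
// byte accesses so the result is the same on any host endianness.
uint64_t mem_readq(const uint8_t* guest_mem, std::size_t addr) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | guest_mem[addr + i];
    return v;
}

void mem_writeq(uint8_t* guest_mem, std::size_t addr, uint64_t v) {
    for (int i = 0; i < 8; ++i) guest_mem[addr + i] = uint8_t(v >> (8 * i));
}

// Step 3: one opcode as an example. PADDB adds eight bytes with wraparound.
// The low 7 bits of each byte are added normally; the top bit of each byte
// is then fixed up with XOR so no carry leaks into the neighbouring byte.
uint64_t paddb(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8080808080808080ULL;
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

// Register-to-memory form: mm[dst] = PADDB(mm[dst], [addr]).
void op_paddb_mem(MMX_state& s, unsigned dst, const uint8_t* guest_mem,
                  std::size_t addr) {
    assert(dst < 8);
    s.mm[dst].q = paddb(s.mm[dst].q, mem_readq(guest_mem, addr));
}
```

On an x86 host the body of paddb could instead be a single SSE2 byte-add intrinsic; the surrounding register and memory plumbing would stay the same.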

I have 2 questions ATM:

1) I haven't tried (yet) to decipher the logic of the dynamic core. Can I add the instructions to the normal core without changing the dynamic one, or do I have to modify it too? I know that the dynamic core switches back to the normal one in some circumstances, but I don't know if it would be automatic in this case.

2) Is it possible to compile DOSBox for an x64 target? There's no support for 64 bit SSE2 intrinsics in MSVC++ on that target (an arbitrary decision from Microsoft), so I'd have to use the emulated instructions from the MMX library instead.

If I manage to get it working on the x86 host, I'll try to make a cross-platform version too, and perhaps support other CPUs later on. I'm afraid that this MMX library is Intel-only though... asm/sigcontext.h, which is included somewhere, is architecture-specific. There are alternatives, I'll have to check.

P.S. I expect this to be slow...

Reply 1 of 20, by gulikoza

Rank Oldbie

Yes, the dynamic core will fall back to the normal core on any unimplemented instruction. So that's safe.

x64 targets use dynrec core, which is somewhat different from dynamic. Dynrec core calls C functions to do the stuff so any mmx support would need to have C-alternatives.

Only dynamic core should be x86 specific, normal core should be made with some C routines or similar...

I can take a look (if time permits) at the code, I have some experience with dynrec core writing (actually wd wrote most of it 😜) x86_64 backend 😀

http://www.si-gamer.net/gulikoza

Reply 2 of 20, by ih8registrations

Rank Oldbie

Switch asm/sigcontext.h to glibc's signal.h, you change it in inc/context.h, to just #include <signal.h>

Whittled down the error messages to this so far.

$ make
make -C src
make[1]: Entering directory `/home/Administrator/mmx-emu-0.6/src'
gcc -g -O2 -Wall -funroll-loops -fomit-frame-pointer -I../include -c sigill_handler.c
sigill_handler.c:22: warning: parameter has incomplete type
sigill_handler.c:25: error: parameter `sigcontext' has incomplete type
sigill_handler.c: In function `detect_emmx':
sigill_handler.c:33: error: storage size of 'act' isn't known
sigill_handler.c:35: warning: implicit declaration of function `sigemptyset'
sigill_handler.c:38: warning: implicit declaration of function `sigaction'
sigill_handler.c:33: warning: unused variable `act'
sigill_handler.c: In function `mmx_emu_init':
sigill_handler.c:86: error: storage size of 'act' isn't known
sigill_handler.c:86: warning: unused variable `act'
sigill_handler.c: At top level:
sigill_handler.c:142: error: parameter `sigcontext' has incomplete type
sigill_handler.c: In function `mmx_ill_handler':
sigill_handler.c:166: error: storage size of 'act' isn't known
sigill_handler.c:166: warning: unused variable `act'
make[1]: *** [sigill_handler.o] Error 1
make[1]: Leaving directory `/home/Administrator/mmx-emu-0.6/src'
make: *** [all] Error 2

Reply 3 of 20, by wd

Rank DOSBox Author

Zorbid wrote:

P.S. I expect this to be slow...

Me too 😀
But forget about the normal core: it uses the host FPU, so MMX would turn into a mess for sure.

Both recompilers use the normal core for unhandled opcodes, see core_dynrec.cpp:

case BR_Opcode:
	// some instruction has been encountered that could not be translated
	// (thus it is not part of the code block), the normal core will
	// handle this instruction
	CPU_CycleLeft+=CPU_Cycles;
	CPU_Cycles=1;
	return CPU_Core_Normal_Run();

Reply 4 of 20, by Zorbid

Rank Member

Thanks for the feedback/support.

gulikoza:

This is more a proof of concept than anything else. I'll first write a 32bit x86 solution (the easiest for me), and, if it works, will try to go cross platform (generic C, endian aware code, etc...).

ih8registrations:

Thank you for the cleanup. Keep me updated if you make more progress (or please post what you've done so far if you ever get bored).

wd:

SSE2 is a strict superset of MMX, but it uses different registers (xmm, which are 128 bits wide although they can be used for MMX-like 64 bit code as well), so I don't think it will interact with the FPU emulation, unless the latter directly uses the TAG word and registers of the host without checking the emulated ones first, in which case it would indeed be problematic. Since I'm not fluent in ASM, I can't tell if that's the case ATM (I'd have to check each routine against an instruction reference for most steps, and I'd still probably miss some subtleties).

Any MMX instruction will set all FPU TAG flags to TAG_Valid, except EMMS, which will set them to TAG_Empty. I think I remember that MMX instructions set the 16 unused bits of the FPU registers to 1.(*) I'll have to do some RE to be sure that these instructions don't have other side effects, but, from what I've read so far, they don't.

The only tricky thing would be to properly map the MMX/FPU registers, with endian-safe accessors. Otherwise, hackish code that relies on interwoven MMX/FPU instructions would fail.

Initially, I will implement the MMX registers separately (I don't want to modify the existing code too heavily), and will think about overlaying them on the FPU ones if problems arise.

(*) I'll probably use a function pointer to implement this, to get better performance. Each MMX instruction will call it before doing anything else. The first function will initialize the registers/Tag word and set the pointer to an empty function. Calling EMMS will do the cleanup and restore the pointer to the original function.
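
Editor's sketch of the function-pointer idea in the footnote above, assuming a hypothetical state struct (FpuMmxState, enter_mmx_first, enter_mmx_noop, op_mmx and op_emms are illustrative names, not DOSBox code). The init_count field exists only to make the one-shot behaviour visible.

```cpp
#include <cassert>
#include <cstdint>

struct FpuMmxState;
using MmxHook = void (*)(FpuMmxState&);

void enter_mmx_first(FpuMmxState& s);   // does the real initialisation
void enter_mmx_noop(FpuMmxState&) {}    // installed once initialisation is done

struct FpuMmxState {
    uint16_t tag_word = 0xFFFF;          // all registers empty
    unsigned init_count = 0;             // how often the slow path ran
    MmxHook  enter_mmx = enter_mmx_first;
};

// Slow path: mark every register valid, then swap in the empty function so
// later MMX instructions pay only for an indirect call.
void enter_mmx_first(FpuMmxState& s) {
    s.tag_word = 0x0000;
    ++s.init_count;
    s.enter_mmx = enter_mmx_noop;
}

// Every MMX opcode handler would start like this.
void op_mmx(FpuMmxState& s) {
    assert(s.enter_mmx != nullptr);
    s.enter_mmx(s);
    // ... the actual packed operation would follow here ...
}

// EMMS empties the tag word and re-arms the slow path.
void op_emms(FpuMmxState& s) {
    s.tag_word = 0xFFFF;
    s.enter_mmx = enter_mmx_first;
}
```

One caveat with this design: any emulated x87 instruction that changes the tag word between two MMX instructions would also have to re-arm the pointer, otherwise the empty function would skip a needed update.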

Reply 5 of 20, by wd

Rank DOSBox Author

Zorbid wrote:

SSE2 is a strict superset of MMX, but it uses different registers (xmm, which are 128bit wide although they can be used for MMX-like 64bit code as well), so I don't think it will interact with the FPU emulation

On the host-side, yes (emulating the mmx instructions in sse(2)).

On the emulation side, you'd have to (talking about the dynamic core) move
the contents of the emulated mmx registers into the host fpu ones as the
registers are shared and the recompiler uses the host fpu.
If not talking about the recompiler, this part should be easier, though still
fully needed as the OS (emulated part ie. win9x or whatever) uses either
of the instruction sets to save the fpu/mmx state.

Reply 6 of 20, by Zorbid

Rank Member

Ok, I hadn't thought about multi-threaded code running in the host... Couldn't I get away with just dealing with it in the state saving/setting instructions (for the interpreter, at least)?

-- edit: I think it would indeed be better to overlay everything, in the interpreter at least.

For the dynamic x86 code, wouldn't it make sense to use the host's MMX functions as well?

Reply 7 of 20, by wd

Rank DOSBox Author

Zorbid wrote:

Ok, I hadn't thought about multi-threaded code running in the host...

Well that shouldn't matter if you're using some C-library or whatever,
and even SSEx should be fine in asm blocks.
But for the emulation part you'll have to define a unified register set,
as afaik the registers are shared between the fpu and mmx (maybe check
bochs or so to be sure).
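
Editor's sketch of what a unified register set could look like, assuming the architectural aliasing as I understand it: each MMX register occupies the low 64 bits (the mantissa) of one physical x87 register, and an MMX write fills the upper 16 bits with ones, marks all tags valid and resets the stack top. Reg80, UnifiedFpu, mmx_read, mmx_write and emms are hypothetical names, not DOSBox or Bochs code.

```cpp
#include <cassert>
#include <cstdint>

// One 80-bit x87 register split into its two fields.
struct Reg80 {
    uint64_t mantissa;   // bits 0..63, aliased by the MMX register
    uint16_t sign_exp;   // bits 64..79 (sign + exponent)
};

struct UnifiedFpu {
    Reg80    r[8] = {};          // physical registers R0..R7 (not ST(i))
    uint16_t tag_word = 0xFFFF;  // 2 bits per register, 11 = empty
    unsigned top = 0;            // x87 stack-top pointer
};

// MMn reads the mantissa of physical register Rn directly.
uint64_t mmx_read(const UnifiedFpu& f, unsigned n) {
    assert(n < 8);
    return f.r[n].mantissa;
}

// An MMX write also touches the shared x87 state.
void mmx_write(UnifiedFpu& f, unsigned n, uint64_t v) {
    assert(n < 8);
    f.r[n].mantissa = v;
    f.r[n].sign_exp = 0xFFFF;    // upper 16 bits forced to ones
    f.tag_word = 0x0000;         // all registers valid
    f.top = 0;                   // stack top reset
}

// EMMS only empties the tags; register contents are left alone.
void emms(UnifiedFpu& f) {
    f.tag_word = 0xFFFF;
}
```

With a single structure like this, state save/restore instructions used by the guest OS see the same bytes whether the guest last ran FPU or MMX code.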

Zorbid wrote:

For the dynamic x86 code, wouldn't it make sense to use the host's MMX functions as well?

Yes, but that's not easy, promised 😉 Especially the state changing stuff might
be troublesome, but don't really know.

Reply 8 of 20, by DOSGuy

Rank Newbie

I know that the policy is not to add anything to DOSBox unless a DOS game needs it, and I had always thought that there were no MMX DOS games. I was going through my recent acquisitions today and found a "Designed for Intel MMX" logo with the text "Enhanced for MMX equiped computers" on JetFighter Full Burn. Check out the specs on this game!

Required:

Pentium 133 MHz
16 MB RAM
4x CD-ROM drive
DOS 5.0 or above
1 MB SVGA VESA-compatible video card
50 MB hard drive space
Mouse

Recommended:

Pentium 200 MHz
20 MB RAM
3Dfx video card
IPX/LAN
Joystick

So, I hate to be one of those people who asks for DOSBox to do all kinds of things that it was never intended to do, but there does seem to be a DOS game justification for including MMX support in the official version, should someone be able to implement it. JF:FB doesn't require MMX, but it can make use of it.

"Today entirely the maniac there is no excuse with the article." Get free BeOS, DOS, OS/2, and Windows games at RGB Classic Games

Reply 10 of 20, by DOSGuy

Rank Newbie

The Need for Speed won't run in SVGA mode unless it detects a Pentium processor. NFS doesn't require a Pentium processor, but DOSBox's ability to report a Pentium CPUID allows you to take full advantage of NFS (thank you for adding that feature).

I don't know if MMX improves the graphics/unlocks any gameplay modes in JF:FB, or simply speeds up gameplay. Even if it's only the latter, it's a tall order for DOSBox to perform like a Pentium 200. Maybe emulating MMX instructions would speed up the emulation for smoother gameplay.


Reply 12 of 20, by DOSGuy

Rank Newbie

Yes, it would be counterproductive to emulate instructions that aren't needed in the name of speed, since emulating them would actually be slower. I'm not sure if MMX does anything that can't be done without MMX on JF:FB, so that may not be a great example. I'll see if I can find an example of a DOS game that uses MMX to improve the graphics or unlock gameplay options.

Extreme Assault also carries the MMX logo on the box. Extreme Assault apparently uses MMX to do bilinear filtering on the explosion graphics.


Reply 13 of 20, by robertmo

Rank l33t++

MMX is required to play Rebel Moon Rising, but the game is for Windows and works on modern computers.

A different game, "Rebel Moon", is a DOS game but requires (depending on version) either a 3D Blaster (based on Gaming Glint) or a Rendition Verite V1000, so it won't work in DOSBox either. But there was also a PlayStation version, so nothing is lost I guess.

Reply 14 of 20, by ripsaw8080

Rank DOSBox Author

The 3D Blaster version of Rebel Moon does work in DOSBox if you change the -3dblaster parameter in REB3D.BAT to -vga; although some might still insist that hardware acceleration is "required" because of abysmal framerate without it.

Reply 15 of 20, by cfoesch

Rank Newbie
DOSGuy wrote:

Yes, it would be counterproductive to emulate instructions that aren't needed in the name of speed, since emulating them would actually be slower.

Emulating vector instructions actually does typically result in a speed increase compared to simple scalar instructions emulated at the same level (i.e. interpreter vs. interpreter, and dynamic recompiler vs dynamic recompiler.) The reason is that typically the vectorized instructions don't track flags, and they usually result in fewer loads/stores (thus, typically reducing the amount of address calculation which is a significant overhead to all memory accesses in emulation.)

When I wrote the Altivec emulation for PearPC, performance increased notably, not just from parallel ALU instructions, but also because a lot of library-based memcpy and memset instructions used Altivec as well.

However, implementing the MMX instructions with SSE2 will end up being twice as slow as implementing it with MMX for some processors, because on those processors the SSE2 versions actually perform two MMX ALU operations sequentially rather than doing them both in parallel. (Optimization guides recommend using MMX unless you're going to use the whole XMM register.)

As an example of my points, PXOR run as an MMX instruction does not affect flags at all, while XOR does. And since most MMX instructions perform worse on unaligned memory accesses, almost all programs are written to ensure that they work on aligned memory, which guarantees that no access straddles a page boundary. So you can simply perform one address translation for the PXOR memory access, rather than the two 32-bit memory accesses of a scalar equivalent, which would naively require two address calculations.
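
Editor's sketch of the PXOR versus XOR comparison above, using a toy guest model. Guest, translate, xor32_mem and pxor_mem are hypothetical names; the counters exist only to make the per-instruction overhead visible, and memory is read in host byte order, which is harmless here because XOR is bytewise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy guest: a small RAM plus counters for the two overheads discussed.
struct Guest {
    uint8_t  ram[64] = {};
    unsigned translations = 0;   // address translations performed
    unsigned flag_updates = 0;   // times the flags were recomputed
    bool     zf = false, sf = false;

    // Stand-in for a paging/segment lookup on every memory access.
    uint8_t* translate(uint32_t linear) {
        assert(linear < sizeof ram);
        ++translations;
        return ram + linear;
    }
};

// Scalar XOR r32, [mem]: one translation and one flag update per 32 bits.
uint32_t xor32_mem(Guest& g, uint32_t reg, uint32_t addr) {
    uint32_t m;
    std::memcpy(&m, g.translate(addr), sizeof m);
    uint32_t r = reg ^ m;
    g.zf = (r == 0);
    g.sf = (r >> 31) != 0;
    ++g.flag_updates;
    return r;
}

// PXOR mm, [mem]: one translation for all 64 bits and no flag work,
// assuming the aligned operand cannot straddle a page boundary.
uint64_t pxor_mem(Guest& g, uint64_t mm, uint32_t addr) {
    uint64_t m;
    std::memcpy(&m, g.translate(addr), sizeof m);
    return mm ^ m;
}
```

Processing the same eight bytes takes two translations and two flag updates through xor32_mem, but one translation and no flag update through pxor_mem.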

Now, I want to be clear, I'm not saying it WILL be faster, what I'm saying is that it would be incorrect to assume that it will be slower.

Reply 17 of 20, by DOSGuy

Rank Newbie

Impressive analysis, cfoesch! If someone coded MMX emulation for DOSBox, I'm sure it would end up in the mega builds, but I wanted to point out that there are DOS games that use MMX for one reason or another, which might justify inclusion in the official build.

At any rate, thanks for the educational analysis.


Reply 19 of 20, by cfoesch

Rank Newbie
DOSGuy wrote:

Impressive analysis, cfoesch! ... At any rate, thanks for the educational analysis.

Well, I did "cheat" by having done all of it years ago... 😀