Boxedwine (Wine on multiple platforms)

Reply 140 of 184, by kjliew

Posted on 2021-03-02, 20:47

Rank: Oldbie
Posts: 1304
Joined: 2004-01-08, 03:03

danoon wrote on 2021-03-02, 20:29:

(Requiem) ... but the D3D version always complains about DirectX 6.1.

You can update the game with the 1.3 patch to remove the DirectX version check, or check out here to patch out the DirectX version check for the more stable 1.2 version or the free playable demo. I recommend version 1.2 for the Direct3D renderer. The Direct3D version can render at any native resolution, including 16:10 and 16:9 widescreen, which makes it a better choice than 3Dfx Glide. I have tried 1440x900 and 1280x960.

The Direct3D version should work better with Boxedwine as it takes out the additional layer of translation from nGlide->Wine.

Reply 141 of 184, by danoon

Posted on 2021-03-02, 21:54

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

Thanks for that, the hex edits worked well. The demo seems a bit touchy: after exiting, if I go back in, it gives me a path error. Maybe I need to implement snapshots in Boxedwine. All it would do is zip up the container folder. 😀

https://github.com/danoon2/Boxedwine

Reply 142 of 184, by alberthamik

Posted on 2021-03-03, 02:35

Rank: Newbie
Posts: 40
Joined: 2017-07-09, 13:08

Yeah, I did in fact install the nGlide wrapper, and the Glide renderer still performed terribly. I have heard that the D3D renderer is buggy, even from a former developer, but I was unaware that people find the 1.2 version less buggy. I might try that (potential differences in game functionality aside, as I speedrun the game). One issue will crop up regardless, though: the mouse loses focus when the game creates a new window. I remember this happening in the past when testing the game with dxwnd, and also on Linux with regular Wine. It happens regardless of the renderer used, which is quite annoying. I vaguely remember an option in regular Wine that tells it not to release the mouse when the window state changes, as long as the window itself is still in focus. I think I'm remembering that correctly, anyway.

Oh right, and on the note of things I can do in regular Wine: for games like Requiem I use a winmm wrapper to play ripped CD audio, and in regular Wine I can change the dynamic library settings to tell Wine to use my native winmm.dll. Is there a way of accessing winecfg in Boxedwine?

Reply 143 of 184, by danoon

Posted on 2021-03-03, 16:57

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

Boxedwine is really a Linux kernel emulator that happens to run Wine. For the most part Wine is just a normal 32-bit build. I wrote my own winex11.drv replacement, but other than that, it's just a stock version of Wine. So everything you can do with Wine on Linux, hopefully you can do with Wine inside of Boxedwine.

With that said, playing CD audio won't work because I haven't implemented support for hardware CDs or ISOs. But if the winmm wrapper loads the ISO directly, then that could work; the ISO would just need to be inside the emulated file system. Each "Container" in the UI is its own Linux file system, which contains the Wine prefix in "root/home/username/.wine". You are free to modify this file system when Boxedwine is not running. To run winecfg from the UI, go to the container tab, select the container on the left, then hit "Run Wine App" on the right.

The screenshots originally attached to this post (wine-app.jpg and chose-app.jpg) are no longer available.

Reply 144 of 184, by alberthamik

Posted on 2021-03-08, 01:11

Rank: Newbie
Posts: 40
Joined: 2017-07-09, 13:08

Forgot to reply, but I did finally test winecfg as you described. It had been a while since I'd last set up the dynamic library override, and at the time I hadn't figured out which DLLs were needed to get music working, if it can work at all. It could be that Boxedwine needs me to override more than just one DLL, perhaps the Ogg Vorbis ones too. Regardless, I didn't test further as I was still having performance issues with Requiem. I have hopes that your planned tweak to the 64-bit executable might help me with Gunmetal.

Reply 145 of 184, by danoon

Posted on 2021-04-20, 17:15

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

After a few hiccups with gog.com testing on Windows 7 (SDL audio didn't like Win 7), Boxedwine is now being used on gog.com and Steam to bring an old game back to life: The Voodoo Kid. I'm pretty happy to see this; software preservation was always a goal of Boxedwine.

Steam: https://store.steampowered.com/app/1378260/Voodoo_Kid/
GOG: https://www.gog.com/game/voodoo_kid

Reply 146 of 184, by jmarsh

Posted on 2021-04-20, 22:25

Rank: Oldbie
Posts: 1694
Joined: 2014-01-04, 09:17

Am I blind, or does your website (boxedwine.org) not link to your repo? It mentions being GPL but there's no sign of code anywhere.
The reason I was looking for it is that DOSBox is bugged for the bit test opcodes (bt/bts/btr/btc) when the bit base is a 16-bit memory address and the (register) bit offset points outside of the segment. It looks like you may have the same problem.

Reply 147 of 184, by danoon

Posted on 2021-04-20, 22:41

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

https://sourceforge.net/p/boxedwine/source/ci/master/tree/

https://github.com/danoon2/Boxedwine

Yeah, I should make that more clear. Thanks for that.

I don't do any segment bounds checking; I just blindly add the address. I have seen a few Win16 games that I just can't seem to get to work. I wonder if that has anything to do with it.

The code for calculating the address is
https://github.com/danoon2/Boxedwine/blob/mas … n/cpu/decoder.h

cpu->seg[op->base].address + (U16)(cpu->reg[op->rm].u16 + (S16)cpu->reg[op->sibIndex].u16 + op->disp)

The bit instructions then just blindly use the address
https://github.com/danoon2/Boxedwine/blob/mas … /common_bit.cpp

void common_bte16r16(CPU* cpu, U32 address, U32 reg) {
    U16 mask=1 << (cpu->reg[reg].u16 & 15);
    U16 value;
    cpu->fillFlagsNoCF();
    address+=(((S16)cpu->reg[reg].u16)>>4)*2;
    value = readw(address);
    cpu->setCF(value & mask);
}

Last edited by danoon on 2021-04-21, 01:00. Edited 1 time in total.

Reply 148 of 184, by jmarsh

Posted on 2021-04-21, 01:00

Rank: Oldbie
Posts: 1694
Joined: 2014-01-04, 09:17

It's not protected mode segment bounds/limit related, but rather the cases when BaseOffset+(BitOffset/8) is negative or exceeds 64k. So yeah, you have the same problem as DOSBox; the BaseOffset is added to the segment base followed separately by the BitOffset, when they need to be combined first and masked to 16 bits.

Looking at the rest of that file, another issue might be that you're setting the carry flag before the result is written to memory. If the store triggers a page fault it would need to be reverted to the original value to match real hardware.
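The ordering concern can be illustrated with a minimal sketch. It assumes a toy CPU model in which a faulting store is represented by a C++ exception; `TinyCpu`, `PageFault` and `btsCommitAfterStore` are hypothetical names and do not come from Boxedwine or DOSBox. The idea is only that the carry flag is computed early but committed after the store has succeeded.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical miniature machine; a thrown PageFault stands in for a real fault.
struct PageFault : std::runtime_error {
    PageFault() : std::runtime_error("page fault") {}
};

struct TinyCpu {
    bool carry = false;
    std::unordered_map<uint32_t, uint16_t> memory; // word-addressed toy memory
    uint32_t readOnlyAddress = 0xFFFFFFFFu;        // writes to this address fault

    uint16_t readw(uint32_t address) { return memory[address]; }
    void writew(uint32_t address, uint16_t value) {
        if (address == readOnlyAddress) throw PageFault();
        memory[address] = value;
    }
};

// BTS-style update: test the bit, store the modified word, then commit CF.
void btsCommitAfterStore(TinyCpu& cpu, uint32_t address, unsigned bit) {
    const uint16_t mask = static_cast<uint16_t>(1u << (bit & 15));
    const uint16_t value = cpu.readw(address);
    const bool newCarry = (value & mask) != 0;
    cpu.writew(address, static_cast<uint16_t>(value | mask)); // may throw
    cpu.carry = newCarry; // reached only if the store succeeded
}
```

If the store throws, `cpu.carry` still holds its previous value, so the instruction can be restarted after the fault is handled without having to revert any flags.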

Last edited by jmarsh on 2021-04-21, 01:09. Edited 1 time in total.

Reply 149 of 184, by danoon

Posted on 2021-04-21, 01:04

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

jmarsh wrote on 2021-04-21, 01:00:

It's not protected mode segment bounds/limit related, but rather the cases when BaseOffset+(BitOffset/8) is negative or exceeds 64k.

So are you saying something like

address+=(((S16)cpu->reg[reg].u16)>>4)*2;

should be masked off like

address+= (S16) ( (((S16)cpu->reg[reg].u16)>>4)*2);

Reply 150 of 184, by jmarsh

Posted on 2021-04-21, 01:11

Rank: Oldbie
Posts: 1694
Joined: 2014-01-04, 09:17

No, it needs to be combined with the base offset and the whole thing limited to 16 bits.

address = cpu->seg[op->base].address + (U16)(cpu->reg[op->rm].u16 + (S16)cpu->reg[op->sibIndex].u16 + op->disp + ((((S16)cpu->reg[reg].u16)>>4)*2));

Those opcodes basically let you use three registers for memory addressing instead of the usual two, which nothing else does (until you reach AVX opcodes...)
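To make the difference concrete, here is a small self-contained sketch that compares the two calculations. The `BitOperand` struct and the function names are made up for illustration; the fields simply stand in for the Boxedwine expressions noted in the comments. It assumes an arithmetic right shift for negative values, which C++20 guarantees and mainstream compilers already provide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for values read from the emulated CPU state.
struct BitOperand {
    uint32_t segBase;    // cpu->seg[op->base].address
    uint16_t baseOffset; // (U16)(rm + index + disp), already wrapped to 16 bits
    uint16_t bitReg;     // cpu->reg[reg].u16, the bit offset register
};

// Byte displacement selected by the bit offset: signed word index times 2.
int32_t wordDisplacement(uint16_t bitReg) {
    return (static_cast<int16_t>(bitReg) >> 4) * 2;
}

// Separate form: segment base + 16-bit offset first, then the displacement
// is added with no 16-bit wrap, so the result can leave the 64K window.
uint32_t addressSeparate(const BitOperand& o) {
    uint32_t address = o.segBase + o.baseOffset;
    address += static_cast<uint32_t>(wordDisplacement(o.bitReg));
    return address;
}

// Casting only the displacement to S16 changes nothing: the displacement
// always lies in -4096..4094, which already fits in 16 bits.
uint32_t addressCastOnly(const BitOperand& o) {
    uint32_t address = o.segBase + o.baseOffset;
    address += static_cast<uint32_t>(
        static_cast<int16_t>(wordDisplacement(o.bitReg)));
    return address;
}

// Combined form: fold the displacement into the offset, wrap the sum to
// 16 bits, and only then add the segment base.
uint32_t addressCombined(const BitOperand& o) {
    const uint16_t offset =
        static_cast<uint16_t>(o.baseOffset + wordDisplacement(o.bitReg));
    return o.segBase + offset;
}
```

With a segment base of 0x10000, a base offset of 0xFFFE and a bit offset of 32 (two words, so 4 bytes), the separate form yields 0x20002 while the combined form wraps around to 0x10002. The two forms only disagree when the sum crosses either end of the 64K window.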

Reply 151 of 184, by danoon

Posted on 2021-04-21, 01:57

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

Thank you for that. As soon as I read that I knew that made sense. Now I will have to refactor my code to not precalculate the addresses for these bit instructions.

Reply 152 of 184, by danoon

Posted on 2021-04-21, 03:18

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

jmarsh, have you seen any Windows games that call these instructions? I was disappointed this didn't just magically fix CivNet for me. 😀

Reply 153 of 184, by jmarsh

Posted on 2021-04-21, 05:38

Rank: Oldbie
Posts: 1694
Joined: 2014-01-04, 09:17

No. I was told there is a decompressor that uses them but I had to write my own test app to actually confirm correct behaviour on real hardware.

I'm not sure how Wine/Linux do things, but Win95/98 uses lazy FPU saving/restoring via the TS flag in CR0 (control register 0). Running any app in Win95 under DOSBox that uses the FPU on multiple threads causes all sorts of weird problems because DOSBox ignores the TS flag.
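As a rough illustration of the mechanism, the sketch below models lazy FPU switching in a few lines. It is a simplified toy, not Windows or DOSBox code: `LazyFpuKernel`, `Task` and `FpuState` are hypothetical names, and the x87 state is reduced to a single value. The parts taken from the architecture are that TS is bit 3 of CR0, that an FPU instruction executed with TS set raises the device-not-available (#NM) exception, and that the handler clears TS.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t CR0_TS = 1u << 3; // Task Switched flag, bit 3 of CR0

struct FpuState { double st0 = 0.0; }; // stand-in for the full x87 state
struct Task { FpuState savedFpu; };

struct LazyFpuKernel {
    uint32_t cr0 = 0;
    FpuState hardwareFpu;     // the single physical FPU register file
    Task* fpuOwner = nullptr; // task whose state currently lives in hardwareFpu
    Task* current = nullptr;
    int trapCount = 0;

    // A task switch does not touch the FPU; it only sets TS.
    void switchTo(Task& next) {
        current = &next;
        cr0 |= CR0_TS;
    }

    // Models the #NM handler: swap FPU state only when it is actually needed.
    void deviceNotAvailable() {
        assert(current != nullptr);
        ++trapCount;
        if (fpuOwner != current) {
            if (fpuOwner != nullptr) fpuOwner->savedFpu = hardwareFpu;
            hardwareFpu = current->savedFpu;
            fpuOwner = current;
        }
        cr0 &= ~CR0_TS; // equivalent of CLTS
    }

    // An emulator that honours TS checks it before every FPU instruction.
    void fpuStore(double value, bool honourTs = true) {
        if (honourTs && (cr0 & CR0_TS)) deviceNotAvailable();
        hardwareFpu.st0 = value;
    }

    double fpuLoad(bool honourTs = true) {
        if (honourTs && (cr0 & CR0_TS)) deviceNotAvailable();
        return hardwareFpu.st0;
    }
};
```

Passing `honourTs = false` mimics an emulator that ignores the flag: the trap never fires, no state is ever swapped, and two tasks end up sharing one set of FPU registers, which is the kind of corruption described above.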

Reply 154 of 184, by danoon

Posted on 2021-04-21, 15:22

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

Boxedwine doesn't implement any of the control registers because it only emulates the app; Boxedwine can't be used to run an OS.

I found one use of the bit instructions, but it was a 32-bit version, and as far as I understand this issue only matters for the 16-bit instructions (or at least those using 16-bit addressing). If you are curious, it was in the RollerCoaster Tycoon demo, which isn't surprising since I think that is all hand-coded ASM.

eip = 00554547

Bt DWORD PTR [DS:EBX<<2+7D71F8],ECX

EAX=0000001A ECX=0000001A EDX=0063CE08 EBX=00000000 ESP=0032FD7C EBP=00420280 ESI=0063CE5C EDI=00000000 SS=00000000 DS=00000000
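For reference, the register dump above can be worked through with a short sketch of how a 32-bit BT with a register bit offset picks its dword and bit. `BitLocation` and `locateBit32` are illustrative names only, and the sketch assumes flat 32-bit addressing with a zero segment base and an arithmetic right shift for negative values.

```cpp
#include <cassert>
#include <cstdint>

struct BitLocation {
    uint32_t address; // address of the dword that holds the tested bit
    unsigned bit;     // bit index within that dword, 0..31
};

// BT m32, r32: the register is a signed bit offset, so the dword index is
// floor(offset / 32) and the bit within the dword is offset mod 32.
BitLocation locateBit32(uint32_t effectiveAddress, uint32_t bitOffsetReg) {
    const int32_t dwordIndex = static_cast<int32_t>(bitOffsetReg) >> 5;
    const uint32_t address =
        effectiveAddress + static_cast<uint32_t>(dwordIndex) * 4u;
    return {address, static_cast<unsigned>(bitOffsetReg & 31u)};
}
```

With EBX = 0 the effective address is 0x7D71F8, and ECX = 0x1A selects bit 26 of the dword at that same address. Unsigned 32-bit addition wraps modulo 2^32 no matter how the terms are grouped, so adding the displacement separately or as part of the offset gives the same result here, which is consistent with the problem being specific to 16-bit addressing.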

Reply 155 of 184, by kjliew

Posted on 2021-05-15, 22:30

Rank: Oldbie
Posts: 1304
Joined: 2004-01-08, 03:03

danoon wrote on 2021-02-23, 17:15:

I'm also curious how QEMU TCG ARM -> x86 will perform. I looked into TCG when starting Boxedwine. I'm sure its come a long ways since I last looked at it 10 years ago. It was pretty simple then, but the benefit of that was it was quick when generating the code. Since my ARMv8 code always translates every instruction (not JIT), I also try to keep the decoder/encoder as fast as possible. But even so, there is a lot of code to run when starting up Wine so it takes several seconds just to start the game/app on the Pi. Maybe in the future I could cache the generated code to speed up start times.

There is definitely a lot to work with on ARMv8 now. After the Pi, I plan to get Boxedwine running on the M1 chip. I can't wait to see how the M1 performs compared to my x64 core on my desktop machine.

Some Apple M1 results from the latest QEMU TCG for your reference.
Re: DOSBox SVN - Qemu on M1 benchmark comparison discussion

Reply 156 of 184, by danoon

Posted on 2021-05-16, 15:45

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

221 for MDK is pretty good; my x64 build running via Rosetta on the M1 gives about 130. My simple native ARM JIT (which basically just inlines normal emulation) gets about 50. The interesting thing is that on the Raspberry Pi 4, my newer ARM binary translator gets 170, which is about 6 times faster than the ARM JIT, which got 27, so maybe I will get better performance than QEMU. But right now the ARM translator doesn't work on the M1 because of a page size issue: my code assumes 4K pages, and the M1 uses 16K pages by default. I saw that the M1 supports total store ordering (TSO), which Rosetta uses to help with x86 emulation, and as part of that it allows 4K pages. I will have to investigate whether I can use TSO.
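The page size mismatch can be sketched in a few lines. This is not Boxedwine code: the helper names are hypothetical, the 4096-byte fallback is an assumption, and the only real API used is POSIX `sysconf(_SC_PAGESIZE)`, so it will not build on Windows as written. It shows querying the host page size at runtime and working out how many host pages cover a run of emulated 4K pages.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h> // POSIX sysconf

// Page size reported by the host OS instead of a hard-coded 4096.
std::size_t hostPageSize() {
    const long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? static_cast<std::size_t>(size) : 4096; // assumed fallback
}

// Round value up to a multiple of pageSize (pageSize must be a power of two).
std::size_t alignUp(std::size_t value, std::size_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return (value + pageSize - 1) & ~(pageSize - 1);
}

// Number of host pages needed to cover guestPages emulated 4K pages.
std::size_t hostPagesFor(std::size_t guestPages, std::size_t pageSize) {
    constexpr std::size_t kGuestPageSize = 4096;
    return alignUp(guestPages * kGuestPageSize, pageSize) / pageSize;
}
```

On a 16K host, four emulated 4K pages share one host page, so anything that relies on changing protection for a single 4K page cannot map one-to-one onto host pages.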

Reply 157 of 184, by kjliew

Posted on 2021-05-16, 19:11

Rank: Oldbie
Posts: 1304
Joined: 2004-01-08, 03:03

I think you're right. The MDK DOS score of 170 on the RPi4 is very good indeed. The M1, at a 5nm process node and 3.2GHz, has at least 2X the performance of the RPi4, without even accounting for other enhancements such as LPDDR/on-package unified memory. If your new ARMv8 binary translator worked on the M1, it could definitely match or exceed QEMU TCG performance.

Reply 158 of 184, by danoon

Posted on 2021-05-19, 19:54

Rank: Member
Posts: 226
Joined: 2011-01-04, 19:12

I recently discovered something on my MacBook Pro (2019, 2.6 GHz Core i7).

I was seeing around 240 (plus or minus) in the Windows MDK performance test using Boxedwine, even after changing some things that should have made a performance difference. This made me think that it could be limited by how fast it updates the screen. I have the ability to skip frame updates in Boxedwine, and when I turned this on, I saw 1100 on my Intel Mac. Since your numbers for QEMU were in the low 200s too, I wonder if you are running into this.

Reply 159 of 184, by kjliew

Posted on 2021-05-19, 22:12

Rank: Oldbie
Posts: 1304
Joined: 2004-01-08, 03:03

I don't know much about this area. Displaying the frame buffer is one complicated part of QEMU that I do not understand very well. It does seem to use synchronous updates based on a one-shot display timer. This implementation very much favors virtualization performance, keeping the VM CPU busy without taking a VMEXIT for display updates until the timer expires. This can be observed with PCPBENCH, where the insane FPS from QEMU does not even produce any tearing. On the other hand, with DOSBox there is noticeable screen tearing once the FPS exceeds the panel refresh rate. Some would call this an "optimization", but it makes the comparison "not quite fair", as DOSBox pays the price of updating the frame buffer and displaying it, while QEMU updates the frame buffer but only displays it at specific intervals.

If you skip frames, then the benchmark timing loops will run faster and the score will improve, I guess. I have seen inflated 3DMark2000 scores when my GL pass-through was mishandling some vertex calls, so the calls completed faster than they should have. Fortunately, there were rendering artifacts that were easily noticed. When those were fixed, the 3DMark2000 scores tanked.

Modern display software architecture requires predictable V-SYNC. It is a waste of both power and bandwidth to produce more frames than can be displayed in time.
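The effect of skipped presents on a frame-rate score can be put into a back-of-the-envelope model. Everything here is hypothetical: `modelFps` is an illustrative function, and the millisecond costs used below are invented for the arithmetic, not measurements from Boxedwine, QEMU or DOSBox.

```cpp
#include <cassert>

// Frames per second for a loop that renders every frame but presents only
// one in every presentEvery frames. Both costs are in milliseconds.
double modelFps(double renderMs, double presentMs, int presentEvery) {
    assert(presentEvery >= 1);
    const double averageFrameMs = renderMs + presentMs / presentEvery;
    return 1000.0 / averageFrameMs;
}
```

With an assumed 1 ms of emulation work and 3 ms per screen update, presenting every frame gives 250 frames per second, while presenting one frame in ten gives roughly 769. The score rises several times over even though the emulation work per frame is unchanged, which is why results with and without frame skipping are hard to compare directly.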
