VOGONS


Windows 95, 3Dnow, and SSE


Reply 40 of 47, by Falcosoft

Rank l33t

Hi,
I have made a video about using SSE/3DNow under the first edition of Win95. For testing I used my MandelX fractal generator and SoftIce:
https://youtu.be/ivJxALS7JyA
Conclusion:
If you start sse.com in autoexec.bat (without loading EMM386 in config.sys), then you can use SSE/SSE2/SSE3 under Win95, but only with one program at a time. Running more than one SSE-capable program simultaneously can cause problems or crashes, because Win95 does not save and restore the SSE (XMM) registers when it switches between tasks. 3DNow can be used without any restrictions, and you do not need special tools like sse.com: if the OS saves and restores the FPU registers (Win95 does), then it automatically also supports the MMX registers that 3DNow uses, since the MMX registers are aliased onto the x87 FPU registers.
To confirm whether a program really uses a given instruction set under Win95, I think SoftIce is the best tool.

Last edited by Falcosoft on 2019-05-03, 07:56. Edited 1 time in total.

Website, Facebook, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper

Reply 41 of 47, by BinaryDemon

Rank Oldbie

Wow, I didn’t realize Falcosoft had a Vogons account and was still active. What a cool turn of events.

Check out DOSBox Distro:

https://sites.google.com/site/dosboxdistro/

a lightweight Linux distro (tinycore) that boots from a USB flash drive and goes straight to DOSBox.

Make your DOS retrogaming experience portable!

Reply 43 of 47, by Falcosoft

Rank l33t
Scali wrote:

Thanks Falcosoft! Saves me the trouble of making a proof-of-concept myself 😀

You're welcome! The pioneer helps where he can and volunteers for the community. (It's a real slogan from communist-era Hungary: the 5th of the 12 points of the Pioneers.)
https://translate.google.hu/translate?hl=en&t … C3%25A9t_pontja
😀

Orkay wrote:

Either way, thanks for helping with clearing up all doubts about SSE handling in Windows 95.

I'm glad it helped. I noticed you have a 450 MHz K6-2, so just for fun, try MandelX on it in both FPU and 3DNow mode. I think you will be as surprised as I was when I finished the code and tried it. It's expected that 3DNow can be faster, but not that optimized 3DNow code can be more than 5x faster than equally well optimized FPU code on a K6-2/3. The SIMD nature of 3DNow cannot explain this difference: a 3DNow register holds only two single-precision floats, so packing alone gives at most about 2x. So I think it's rather the dual pipelined 3DNow execution units vs. the non-pipelined FPU of the K6-2. On an Athlon, 3DNow execution is proportionally slower and the FPU is much faster, so the difference is not even ~2x.
One can only imagine how the K6-2/3 would have performed with float-intensive software if more hand-optimized 3DNow code had existed at the time.

BinaryDemon wrote:

Wow, I didn’t realize Falcosoft had a Vogons account and was still active.

Yep, FSMP and related software are still actively developed and the 'support forum' is here on Vogons:
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi


Reply 44 of 47, by TheGreatCodeholio

Rank Oldbie

One more thing about SSE. Even if the CPU supports it, the OS has to enable it (by setting the OSFXSR bit in control register CR4) for the instructions to work; otherwise they are treated as invalid opcodes and raise #UD.

https://github.com/joncampbell123/doslib/blob … u/cpusse.c#L489

DOSBox-X project: more emulation better accuracy.
DOSLIB and DOSLIB2: Learn how to tinker and hack hardware and software from DOS.

Reply 45 of 47, by Falcosoft

Rank l33t
TheGreatCodeholio wrote on 2023-04-18, 15:31:

One more thing about SSE. Even if the CPU supports it, the OS has to enable it for the instructions to work, otherwise it's treated as a #UD.

https://github.com/joncampbell123/doslib/blob … u/cpusse.c#L489

Or you can enable SSE before the non-SSE-aware OS is loaded, as demonstrated in the posts above.
In the case of DOS, you can even enable SSE while the OS is running.


Reply 46 of 47, by TheGreatCodeholio

Rank Oldbie
Falcosoft wrote on 2023-04-18, 17:17:
TheGreatCodeholio wrote on 2023-04-18, 15:31:

One more thing about SSE. Even if the CPU supports it, the OS has to enable it for the instructions to work, otherwise it's treated as a #UD.

https://github.com/joncampbell123/doslib/blob … u/cpusse.c#L489

Or you can enable SSE before the non-SSE aware OS is loaded as demonstrated in the above posts.
In case of DOS you can even enable SSE while the OS is running.

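Exactly, just like the source code I linked to, which can enable SSE from DOS, assuming EMM386.EXE doesn't prevent it from writing to control register CR4. (With EMM386 loaded, DOS runs in virtual-8086 mode, where a MOV to CR4 faults into the memory manager, which may refuse to perform it.)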


Reply 47 of 47, by Scali

Rank l33t
Falcosoft wrote on 2019-05-03, 16:17:

I'm glad it helped. I noticed you have a 450 MHz K6-2, so just for fun, try MandelX on it in both FPU and 3DNow mode. I think you will be as surprised as I was when I finished the code and tried it. It's expected that 3DNow can be faster, but not that optimized 3DNow code can be more than 5x faster than equally well optimized FPU code on a K6-2/3. The SIMD nature of 3DNow cannot explain this difference: a 3DNow register holds only two single-precision floats, so packing alone gives at most about 2x. So I think it's rather the dual pipelined 3DNow execution units vs. the non-pipelined FPU of the K6-2. On an Athlon, 3DNow execution is proportionally slower and the FPU is much faster, so the difference is not even ~2x.
One can only imagine how the K6-2/3 would have performed with float-intensive software if more hand-optimized 3DNow code had existed at the time.

One thing that 3DNow! has is fast approximations of 1/x and 1/sqrt(x), plus instructions for Newton-Raphson refinement of those approximations. You can turn 1/sqrt(x) into a proper sqrt(x) using the identity x*(1/sqrt(x)) == sqrt(x) (valid for x > 0).
If you use those, they are way faster than the x87 equivalents, as fdiv and fsqrt are some of the slowest instructions.
SSE offers similar approximation instructions (RCPPS/RSQRTPS), though without dedicated refinement instructions.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/