VOGONS


MMX/3DNow!/SSE usage

Reply 40 of 59, by 386SX

Rank l33t
noshutdown wrote:

i have tried that 3dnow version of quake2, and i would say that it only works with glide or software renderer, but not for opengl. with 3dnow opengl there is only about 5% improvement in speed, and the colored lightmaps became very ugly. also the sounds became distorted, and i dunno why.

quake3 is listed as with 3dnow support, however a k6-3+550 still got smoked by celeron300a oc 450, at 6.6fps vs 9.6fps. it managed to slightly edge out the pentium mmx though.

I remember that the 3DNow! beta version of Quake2, with or without 3dfx support, increased the frame rate quite a lot; I was impressed by some of the benchmark results.

Reply 41 of 59, by villeneuve

Rank Member

Two retail non-game software titles with MMX support were the pretty famous Kai's Supergoo and Kai's Photo Soap.
Stuff like WinDVD also made good use of MMX, and I remember some input plugins for Winamp did, too.
MMX also gave a good performance boost, once implemented, when descrambling analog Pay-TV with software such as MoreTV, FreeTV, Pubs3, etc.

Reply 42 of 59, by senrew

Rank Oldbie

The box for Starfleet Academy has the MMX logo in the corner. Or at least the copy I got for Christmas 97 did. I remember back when I didn't know shit about any of this stuff that I was thrilled that my p200mmx was able to play the game.

Halcyon: PC Chips M525, P100, 64MB, Millenium 1, Voodoo1, AWE64, DVD, Win95B

Reply 43 of 59, by xjas

Rank l33t

Impulse Tracker uses MMX for real-time resonant filters that can be applied to any track or instrument independently. A nice DAW-like feature for such an oldschool tracker. I use the hell out of those filters. 😀

twitch.tv/oldskooljay - playing the obscure, forgotten & weird - most Tuesdays & Thursdays @ 6:30 PM PDT. Bonus streams elsewhen!

Reply 45 of 59, by kool kitty89

Rank Member

I'm still trying to work out whether the Unreal Engine relies on MMX for the software renderer or not, or rather, how much it relies on it. Since it does make heavy use of MMX for the sound processing, that at least means it's unappealing to use FPU operations in parallel (especially a Quake-like pipelined FP-heavy, register-swap heavy mechanism) and could simply instead be an ALU-optimized renderer (possibly pentium-specific, or maybe more generalized for the range of AMD, Cyrix, and multi-generational Intel CPUs out there: though mostly the P55C and PII/Celeron given their MMX target).
They could use MMX for 3D matrix math and such while the actual pixel/texel rasterizer stays on the straight ALU end, but I'd think the hardware SIMD functionality of MMX would be appealing for the rasterizer too, given MMX's pack and unpack instructions suited to 16bpp RGB and the fact that 16-bit (or 15-bit) RGB doesn't align with bytes or nybbles, so you need a lot of 5- and 6-bit shifts to pack and unpack RGB elements for shading, lighting, reflection, and alpha blending effects. (Only some alpha is dithered by default, namely 3D alpha/translucency effects, and that can be disabled from the command line with little or no performance loss when I tried it.)
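
To make the 5- and 6-bit shifting concrete, here is a minimal sketch of unpacking, shading, and repacking one 16-bit RGB565 pixel on the plain ALU. It is only an illustration of the bit layout, not code from Unreal; the function names and the 0..256 light scale are assumptions made for the example.

```python
# Illustrative only: scalar RGB565 handling, as a non-SIMD renderer might do it.
# Names and the 0..256 light scale are hypothetical, not taken from any engine.

def unpack_rgb565(pixel):
    """Split a 16-bit 5:6:5 pixel into its red, green, and blue fields."""
    r = (pixel >> 11) & 0x1F   # top 5 bits
    g = (pixel >> 5) & 0x3F    # middle 6 bits
    b = pixel & 0x1F           # low 5 bits
    return r, g, b

def pack_rgb565(r, g, b):
    """Recombine 5-bit red, 6-bit green, 5-bit blue into one 16-bit pixel."""
    return ((r & 0x1F) << 11) | ((g & 0x3F) << 5) | (b & 0x1F)

def shade_rgb565(pixel, light):
    """Scale a pixel by light/256, where light = 256 means full brightness."""
    r, g, b = unpack_rgb565(pixel)
    return pack_rgb565((r * light) >> 8, (g * light) >> 8, (b * light) >> 8)
```

Every shaded pixel costs three shifts and masks on the way in and three more on the way out, which is the per-pixel overhead that packed SIMD instructions can spread across several pixels at once.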

Plus, the software renderer appears to disable the dithered texture filtering effect on non-MMX CPUs and seems to have minor rendering errors (my attempt with a Cyrix 6x86L had mipmap-error-ish looking random red/green/blue pixel dot shimmering sorts of artifacts). Though that may just have been an overall attempt to boost performance on presumed-slow processors. It may also resort to 16-bit integer math to try and speed things up. (I'm not sure on the P5 vs P6 vs Cyrix vs AMD speed advantages for actual 16-bit integer math over 32-bit, but if nothing else you'd be able to make use of 16-bit split registers and some speed gains with the larger number of physical registers available ... also a bigger deal for the P5 family vs CPUs with extended register sets with renaming ... also a big deal for the 486 class CPUs, though I don't think actual DX4/x5 CPUs were ever intended to be realistic platforms for Unreal's software renderer ... maybe they had the Winchip C6 or Winchip 2 in mind, especially since it at least supported MMX and the Winchip 2 had pretty decent MMX performance)

I haven't done Unreal-based framerate/performance comparisons between CPUs and haven't seen that come up in other testing/benchmark compilations so far, so I can't really glean anything beyond anecdotal testing I did a few years back with a P55C, K6 (3.2V), and Cyrix MII, all at 2.5x100 MHz. Performance seemed pretty similar across all three in both the software renderer and in Glide with a Voodoo 3 (2500 I think), and a K6-2 seemed pretty close as well, but I didn't have a frame counter/indicator enabled for any of that. (Given the huge MMX performance boost seen in the K6-2 vs K6, there should at least be some difference there.)

Additionally, I don't think there'd be much or any incentive to actually make use of 3DNow! functionality in the K6-2 if the engine was already ALU+MMX optimized around 16/32-bit integer ops, since the K6-2's MMX integer performance seems to be consistently better than its FP 3DNow! performance and MMX was a more widely supported feature at the time. (honestly, it would've made way more sense for Quake/Quake II to have an MMX patch than 3DNow! for the same reason, even if just for the OpenGL renderers, but I assume they didn't want to bother with changing other code to work around integers in their GL renderer implementation ... though OpenGL itself supports integer and floating point format vertex data, so that wouldn't have been the reason; besides that, they could've supported integer math in GLQuake/Quake II from the start as an option for better performance on CPUs with faster integer performance, or potentially even tweaked the software renderer to offload certain computations to the ALU: like using fixed point vertex math while not changing the span rasterizer itself, and associated FP perspective computation: something that would even help the K5 and 6x86 a good deal as both supported FPU pipelines or prefetch queues, as did the Cyrix 5x86 I believe, so allowing parallel execution like the P5 ... just with very different bias towards actual execution times, while 486 class CPUs and the K6 family didn't support that same sort of parallelism AFAIK, even though the K6 FPU had significantly faster execution than the Cyrix one for many operations and even faster or lower latency than the P5 or P6 for some operations, particularly multiplication)
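
For readers unfamiliar with the "fixed point vertex math" idea, the following is a small sketch of 16.16 fixed-point arithmetic, where all work is done with integer multiplies and shifts. The 16.16 format, the helper names, and the 3x3 matrix layout are assumptions chosen for the illustration; nothing here is taken from the Quake or Unreal sources.

```python
# Illustrative 16.16 fixed-point helpers (hypothetical names and format).
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the value 1.0 in 16.16 fixed point

def to_fixed(x):
    """Convert a float to 16.16 fixed point."""
    return int(round(x * ONE))

def to_float(f):
    """Convert a 16.16 fixed-point value back to a float."""
    return f / ONE

def fixed_mul(a, b):
    """Multiply two 16.16 values; the product has 32 fractional bits,
    so shift right by 16 to return to 16.16."""
    return (a * b) >> FRAC_BITS

def transform_vertex(matrix, vertex):
    """Apply a 3x3 fixed-point matrix (list of rows) to a fixed-point vertex."""
    return tuple(
        sum(fixed_mul(m, v) for m, v in zip(row, vertex))
        for row in matrix
    )
```

Usage would look like `transform_vertex(rotation, (to_fixed(1.0), to_fixed(2.0), to_fixed(3.0)))`, with only the final perspective divide left to the FPU if desired.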

Plus, Unreal could be potentially quite playable on a fast 486 class CPU in Glide (or maybe MeTAL) modes, maybe Direct3D, so using integer math would help a good deal there, too, and much more so on K5 and 6x86 based systems. (the K5 had particularly fast integer mul/div performance, which I'm pretty sure is why it also scores so high in Sandra's integer multimedia test without MMX, with good DSP-like multiply-accumulate performance without use of MMX, which also means it should be among the best non-MMX CPUs to run the sound driver in high quality mode, even if Epic didn't specifically consider that or at least didn't auto-detect it)
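
As a rough picture of what "DSP-like multiply-accumulate" means for sound mixing, here is a sketch of mixing one output sample from several channels with integer math only. The 8.8 volume format (256 = unity gain) and the function name are assumptions for the example; this is not how Unreal's audio code is known to be written.

```python
# Illustrative integer mixer: one multiply-accumulate per channel,
# then a shift and a clamp to the signed 16-bit range.
# The 8.8 volume format (256 = unity gain) is an assumption.

def mix_sample(samples, volumes):
    """Mix signed 16-bit samples, each scaled by an 8.8 fixed-point volume."""
    acc = 0
    for s, v in zip(samples, volumes):
        acc += s * v                       # multiply-accumulate step
    acc >>= 8                              # drop the 8 fractional volume bits
    return max(-32768, min(32767, acc))    # saturate instead of wrapping
```

The inner loop is one integer multiply and one add per channel per sample, so a CPU with a fast integer multiplier handles it well even without MMX.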

8-bit and 24/32-bit color modes are much more ALU rendering friendly as they pack/unpack easily along byte boundaries and pixels can be drawn as single 8-bit or 32-bit words (assuming unpacked 24-bit format), though 8bpp doesn't really pack/unpack at all, but relies on tables for indexed shading/lighting/blending effects (usually, though technically a packed 4+4 bit format could be used with logical 16x16 color/shade array for logical shading, or 16x16 color array for logical color blending with shading done via look-up). But Unreal uses 16-bit hicolor by default, doesn't support 8-bit rendering, and doesn't seem to have an affinity for the 32-bit color software rendering mode (I should compare framerates with dithered blending enabled and disabled at 16 and 32-bit color to be more sure on this, but haven't). I assume it makes significant use of pack/unpack SIMD MMX functionality at least for certain lighting and blending effects to make for fast 16-bit rendering. And even with blending aside, a pixel-by-pixel rasterizer would have little speed loss going from 16 to 32-bit color depth since it wouldn't be packing pixels into an output buffer for faster burst writes.
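
The table-based shading that 8bpp renderers rely on can be sketched as follows: a table is precomputed once so that shading a pixel at run time is a single lookup with no arithmetic. The palette, the number of light levels, and the function names are all hypothetical; this shows the general technique, not any particular game's table format.

```python
# Illustrative 8bpp shade table (hypothetical names and layout).
# table[level][index] gives the palette index closest to palette[index]
# darkened to level / (levels - 1) of its brightness. Requires levels >= 2.

def build_shade_table(palette, levels):
    def nearest(target):
        # Index of the palette entry with the smallest squared RGB distance.
        return min(
            range(len(palette)),
            key=lambda i: sum((p - t) ** 2 for p, t in zip(palette[i], target)),
        )

    table = []
    for level in range(levels):
        row = []
        for color in palette:
            scaled = tuple(c * level // (levels - 1) for c in color)
            row.append(nearest(scaled))
        table.append(row)
    return table

def shade_pixel(table, index, level):
    """Run-time shading: one table lookup per pixel."""
    return table[level][index]
```

The cost moves from per-pixel math to table memory and cache pressure, which is why the approach suits indexed color but has no direct equivalent for 16-bit pixels.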

The exception might still be for non-MMX CPU rendering, depending on whether they set up some sort of software-SIMD routine for packing pixels or a more direct/brute-force method of pixel-by-pixel bit-shifting to pack/unpack RGB elements. (In the latter case, there should be a real speed gain in 32-bit rendering mode: bandwidth usage would be higher, but that would be largely irrelevant for a pixel-by-pixel rasterizer since it's still going to be doing individual texel reads and pixel writes and not packing/buffering lines of pixels. The performance boost from cached read/write operations would be the main advantage for 16-bit data there, plus a smaller framebuffer.)
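
One well-known form of "software SIMD" on a plain 32-bit ALU is a 50/50 blend of two RGB565 pixels at a time, done by masking off the low bit of every color field so the fields cannot bleed into each other. The sketch below illustrates that trick only; whether Unreal's non-MMX path does anything like it is not established in this thread, and the function name is made up.

```python
# Illustrative SWAR ("SIMD within a register") average of RGB565 pixels.
# Each 32-bit word holds two 16-bit pixels. 0xF7DE clears the lowest bit
# of the red, green, and blue fields of one pixel.
MASK_565_PAIR = 0xF7DEF7DE

def blend50_rgb565_pair(a, b):
    """Average two pairs of RGB565 pixels packed in 32-bit words.

    Clearing each field's low bit before shifting right by one halves
    every field without a neighbour's bit leaking in, and the two halves
    then add without overflowing into the next field.
    """
    return ((a & MASK_565_PAIR) >> 1) + ((b & MASK_565_PAIR) >> 1)
```

The price is one bit of precision per field (blending white with white gives 0xF7DE rather than 0xFFFF), traded for processing two pixels with three ALU operations and no unpacking.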

Reply 47 of 59, by The Serpent Rider

Rank l33t++

Somebody tried that with Unreal Tournament. And as I recall, on really small maps with 1-2 bots you can squeeze around 10fps.

I must be some kind of standard: the anonymous gangbanger of the 21st century.

Reply 48 of 59, by kool kitty89

Rank Member

The current lowest spec set-up I've tried playing Unreal on is a Cyrix 6x86 at 1x60 MHz in software renderer mode ... but I didn't try it very long, so I forget how unplayable it was. I did actually play through the intro section of the game at 1x75 MHz and the same with a 5k86 at 320x240. (that's another point I'll have to address in another thread: some 5k86 CPUs have an undocumented 1x multiplier setting, maybe even most/all of them do, but not K5s)
I didn't have a Voodoo installed at the time, so I'm not sure how well Glide performance would fare. (though I'd think at those speeds, you're still in the ballpark of an overclocked 5x86/DX5 or DX4)

The 5k/K5 testing was partially due to the fast integer performance they have and my hunch that the Unreal engine is very integer-math intensive, even in non-MMX mode. (more relevant is the much faster IPC of the PR166 I have, but I've only done superficial play-testing there ... I'll have to do proper framerate tests eventually) Plus if nothing else, the K5's DSP-like integer performance should minimize the overhead on the sound processing.

In any case, all the software renderer cases I've tried without MMX-capable CPUs lead to weird green/blue pixel/line seam artifacts showing up for some reason. (not as weird or annoying as many DirectX artifacts at least, for what that's worth)

Also, I'm pretty sure it runs faster/better on a 6x86 at 2x75 than K5 at 2x66 and definitely runs better with my 6x86L at 3x68 MHz than the K5 at 2x75, but the latter also isn't stable at 3.5V and even 2x68 didn't fare too well stability wise. (the two IBM 6x86L PR-200s I have seem happy at 3.3 to 3.5V with decent cooling and 3x66 or 3x68 MHz, but are unreliable at 2x100 MHz in my P5A-B)

I've done less testing with P54 pentiums, plenty of play time on P55Cs, though. (that goes back to around 10 years ago when I was comparing a 3x100 MHz overclocked Cyrix MII with a 3x100 Pentium MMX along with a K6-2 300)

Reply 49 of 59, by leileilol

Rank l33t++

On its initial release Unreal would not start at all on 486s; a slightly later patch fixed that (though more likely to get it running on early 6x86s). Also, when I tried an early enough Unreal on my 486 about a decade back, I was getting around 5fps with my Voodoo2 on an empty DM map.

Also, using MMX in a software renderer wouldn't be practical at all because of the cycles lost switching between MMX and FPU code. It's definitely a thing Intel didn't want you to know when they pushed it for 3D-hardware-free colorful blur graphics in 1997.

long live PCem

Reply 50 of 59, by silikone

Rank Member
leileilol wrote on 2020-03-01, 00:40:

On its initial release Unreal would not start at all on 486s; a slightly later patch fixed that (though more likely to get it running on early 6x86s). Also, when I tried an early enough Unreal on my 486 about a decade back, I was getting around 5fps with my Voodoo2 on an empty DM map.

Also, using MMX in a software renderer wouldn't be practical at all because of the cycles lost switching between MMX and FPU code. It's definitely a thing Intel didn't want you to know when they pushed it for 3D-hardware-free colorful blur graphics in 1997.

Does that really matter in the grand scheme of things? A few (hopefully) switches here and there would be nothing compared to the potential performance gain. I of course assume that great care was taken to avoid the pitfalls of MMX.

Do not refrain from refusing to stop hindering yourself from the opposite of watching nothing other than that which is by no means porn.

Reply 53 of 59, by marxveix

Rank Member

There is a GLQuake port that uses 3DNow!, but where can one get PPQuake now?
Info from the Russian site iXBT:
https://translate.google.com/translate?sl=aut … gi?id%3D25:3024

PPQuake v0.41 (c) 2000 Peter Pawlowski <piotrpw@polbox.com>
Based on glQuake source code by id Software.

Most important changes from original glQuake:
- KickAss 3DNow! (tm) support - PPQuake will fail to run on a non-3DNow! CPU. If you use an emulator, run with -nocpuid to ignore CPUID results.
- vid_restart command - for now it just restarts whole program and tries to save / reload game if needed - pretty lame but better than nothing
- video menu works (see also: vid_restart)
- some variables are stored in windows registry because they're needed before config.cfg is loaded
- default console width is equal to screen width
- primary sound buffer is on by default, you can disable it by -noprimarysound
- gl_no8bit variable; nonzero value causes all textures to be loaded as 16-bit (takes effect after restarting)
- cool Quake2-like smooth model animation (3DNow! powered) - you can disable / enable it with gl_interpolate; models weren't designed for it and some of them (like nailguns) look crappy. Movement interpolation still doesn't work in multiplayer mode.
- weapon_cur command - prints current weapon number
- weapon_int command - syntax: "weapon_int x y" where x is weapon number returned by weapon_cur and y is interpolation switch; these settings aren't stored anywhere so put it in some autoexec.cfg to see what you want; type "weapon_int x" to see current interpolation state of given weapon.
- gl_fog - 0 is off, positive values control density; based on id's experimental code with a few modifications (GL_EXP instead of GL_LINEAR); probably won't work with most MiniGLs (but works with 3Dfx's one); use gl_fog_r, gl_fog_g and gl_fog_b to change color
- gl_waterwarp variable - nonzero value enables water warping effects; changes take effect after reloading a level.
- $% ^ * @! dedicated server works
- gl_foggywater (experimental) - disables water transparency and adds some new special effects underwater.
- Winamp control commands - wa_play, wa_enqueue, wa_open, wa_next, wa_prev, wa_stop, wa_pause; Winamp needs to be run before ppQuake.
- ModPlug player controls - mp_play, mp_enqueue, mp_stop, etc.
- Tons of tweaks in particle system - use r_particles_density to change the amount of particles.
- A piece of hack that gets rid of lame jump tables in opengl32.dll - use gl_hack command to activate it. Be careful, it might cause very bad crashes. gl_hack command always displays number of functions fixed unless it fails to fix anything.

Known problems:
- windowed mode isn't in the menu yet and might have some lame side effects
- bad things happen when PPQuake loses focus - don't try to alt-esc or alt-tab
- Weapon model conversion occasionally messes up; if after installing a new mod weapons look crappy in it, delete glquake directories in id1 directory and in mod's main directory. (this one comes from the original glQuake)

I took it here http://pp666.cjb.net/
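
For context on the readme's "GL_EXP instead of GL_LINEAR" note: these are the two standard OpenGL fog equations, and only the exponential one has a density parameter, which is why gl_fog values control density. The sketch below evaluates both formulas as the OpenGL specification defines them; the function names are made up for illustration and this is not PPQuake code (its source is not available here).

```python
import math

# Illustrative evaluation of the standard OpenGL fog equations.
# f = 1 means no fog, f = 0 means fully fogged; z is the eye distance.

def fog_factor_exp(density, z):
    """GL_EXP fog: f = exp(-density * z), clamped to [0, 1]."""
    return min(1.0, max(0.0, math.exp(-density * z)))

def fog_factor_linear(start, end, z):
    """GL_LINEAR fog: f = (end - z) / (end - start), clamped to [0, 1]."""
    return min(1.0, max(0.0, (end - z) / (end - start)))

def apply_fog(color, fog_color, f):
    """Blend a fragment color toward the fog color by fog factor f."""
    return tuple(f * c + (1.0 - f) * fc for c, fc in zip(color, fog_color))
```

With GL_EXP a single density value shapes the whole falloff, whereas GL_LINEAR needs explicit start and end distances, so a one-variable console setting maps more naturally onto the exponential mode.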

31 different MiniGL/OpenGL Win9x files for all Rage 3 cards: Re: ATi RagePro OpenGL files

Reply 54 of 59, by dr.zeissler

Rank l33t

Very interesting thread guys! thx!

I wondered why my K6-2/450 is relatively slow. I use Win95b with DX5. According to the posters above, 3DNow! goes unused because it's not supported in DX5.
I did not use DX6 because games that require it will not run very well on my setup: my little machine is equipped with a RageIIc and a PCX1. I could put in a Voodoo1 (the short one from Orchid fits in that machine), but then the image quality is far worse and I lose the PCX1, and I love to check out other cards/APIs rather than always putting a 3dfx in every machine.

Retro-Gamer 😀 ...on different machines

Reply 55 of 59, by Chadti99

Rank Oldbie

Expendable has a 3DNow! splash at startup and ads in-game, so I assume it’s optimized for it, but it could also just be marketing. I’ll get some pics later.

Reply 56 of 59, by silikone

Rank Member

https://github.com/TheBearProject/UnrealEngin … Surf.cpp#L14034
https://github.com/TheBearProject/UnrealEngin … nLight.cpp#L582

Absolutely beautiful usage.

Reply 57 of 59, by Radeux

Rank Newbie
marxveix wrote on 2021-03-19, 18:03:
There is GLQuake port that uses 3Dnow, but where to get PPQuake now? Info from russian, ixbt site: https://translate.google.com/ […]

I got in touch with Peter Pawlowski; he said he wiped his sources years ago. He also said they might be hiding on some old drives, though the chance isn't very good. Let's keep our fingers crossed and maybe he will deliver some PPQuake to us.

Reply 58 of 59, by marxveix

Rank Member
Radeux wrote on 2023-04-27, 15:45:
marxveix wrote on 2021-03-19, 18:03:
There is GLQuake port that uses 3Dnow, but where to get PPQuake now? Info from russian, ixbt site: https://translate.google.com/ […]

I got in touch with Peter Pawlowski; he said he wiped his sources years ago. He also said they might be hiding on some old drives, though the chance isn't very good. Let's keep our fingers crossed and maybe he will deliver some PPQuake to us.

Good to know. I hope PPQuake still turns up somehow, or that someone else who used it still has it and shares it. Fingers crossed indeed; I did try to send Peter an email about PPQuake. I hope a 3DNow! Quake 1 build exists so I can test it with my Socket 7 AMD K6-2+ or Slot A AMD K7 machine.