VOGONS


First post, by 386SX

Rank l33t

Hello,

Lately I've been testing different mini-ITX systems (compared with some AM2/AM3 DDR2 configurations), such as the AMD E-350 with its Radeon HD 6310 iGPU (on Linux for now, Windows later). The iGPU seems fine; what didn't impress me at all is the CPU. I expected it to be faster than the old dual-core Atom, but in some tests it actually feels slow (and it runs hot, too).
So I was wondering whether, to get more speed out of it, some applications could be compiled manually on Linux with different compiler flags to support SSE4a and the last remaining 3DNow! instructions. Maybe that would help in some FPU-intensive modern tasks like encoding/decoding.
Thanks

Edit: grammar corrections and simpler question

Last edited by 386SX on 2022-07-18, 10:38. Edited 4 times in total.
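Before trying any custom build, it helps to confirm which of these extensions the CPU actually reports. The sketch below assumes Linux and reads the flag names the kernel prints in /proc/cpuinfo (where SSE3 appears as "pni"); the function names are illustrative, and the flag list is limited to the extensions discussed in this thread.

```python
from pathlib import Path

# SIMD-related flag names exactly as the Linux kernel prints them in /proc/cpuinfo.
# The kernel reports SSE3 as "pni" (Prescott New Instructions).
SIMD_FLAGS = ["sse", "sse2", "pni", "ssse3", "sse4a", "sse4_1", "sse4_2",
              "3dnow", "3dnowext", "3dnowprefetch"]


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line (non-x86 kernels use other field names)


def simd_report(flags: set[str]) -> dict[str, bool]:
    """Map each SIMD extension of interest to whether the CPU reports it."""
    return {name: name in flags for name in SIMD_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        report = simd_report(parse_cpu_flags(cpuinfo.read_text()))
        for name, present in report.items():
            print(f"{name:14} {'yes' if present else 'no'}")
```

On a CPU that lists `sse4a` and `3dnowprefetch` but not `sse4_1` or `3dnow`, the report shows at a glance which of the extensions mentioned above are really available.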

Reply 2 of 17, by TrashPanda

Rank l33t

https://en.wikipedia.org/wiki/3DNow!

AMD dropped 3DNow! around 2010, except for the two instructions listed there, and even before that 3DNow! was rarely used: developers preferred SSE instructions because both Intel and AMD supported them at the time. You would have to go back to Socket 7 and Slot A/Socket A CPUs to find full 3DNow! support if you really wanted to try compiling software that uses it fully.

Reply 3 of 17, by 386SX

Rank l33t

I mean that most applications, on Linux for example, are probably compiled for SSE3, SSE4.1, SSE4.2 or whatever. I imagine SSE4a and the last 3DNow! features were not used, so I was wondering whether a manual compilation enabling those instructions, where supported, might be useful. The E-350 should have the later 3DNowPrefetch instructions, and I'm just wondering whether, on such an old CPU, only SSE3 was realistically used.
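Whether a generic distribution build can use more than SSE3 on these chips depends on the baseline it targets. As a rough sketch, the check below compares a CPU's flags against the x86-64-v2 micro-architecture level from the x86-64 psABI (the level recent GCC and Clang accept as `-march=x86-64-v2`), spelled with /proc/cpuinfo flag names. The two flag sets are illustrative subsets, not captured dumps from a real E-350 or Phenom II.

```python
# Feature set required by the x86-64-v2 micro-architecture level (x86-64 psABI),
# spelled with the flag names Linux uses in /proc/cpuinfo ("pni" = SSE3).
X86_64_V2 = frozenset({"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"})


def missing_for_v2(cpu_flags) -> list[str]:
    """Return the x86-64-v2 features this CPU lacks, sorted by name."""
    return sorted(X86_64_V2 - set(cpu_flags))


# Illustrative flag subsets, not real /proc/cpuinfo output.
bobcat_like = {"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4a"}  # E-350 class
k10_like = {"cx16", "lahf_lm", "popcnt", "pni", "sse4a"}              # Phenom II class

print(missing_for_v2(bobcat_like))  # ['sse4_1', 'sse4_2']
print(missing_for_v2(k10_like))     # ['sse4_1', 'sse4_2', 'ssse3']
```

Both families stop short of v2 because of SSE4.1/SSE4.2, so a build that requires that level would refuse to run on them, while their extra SSE4a support plays no part in the check.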

Reply 6 of 17, by 386SX

User metadata
Rank l33t
Rank
l33t

Thanks. The A8 CPU still supported the last 3DNow! feature; I wonder whether, together with SSE4a (I imagine few programs supported it at all, with SSE4.1 already existing), it might help somewhere to get better-performing software.
For example, once when I tried an FPU-intensive application on ARM, the different NEON version flags used to configure the compilation made a very visible speed difference.
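On x86 the counterpart of those NEON flags is GCC's `-m` options. The sketch below turns reported /proc/cpuinfo flags into the matching GCC options and builds (without running) a command line; `decoder.c` and the helper names are hypothetical. GCC can also do the detection itself with `-march=native`, and it has named targets for these cores: `-march=btver1` (Bobcat, e.g. the E-350) and `-march=amdfam10` (K10, e.g. Phenom II). Enabling `-msse4a` mainly allows the compiler and the SSE4a intrinsics to use those instructions; it does not guarantee the optimizer finds a use for them, and the resulting binary will not run on CPUs that lack the extensions.

```python
import shlex

# /proc/cpuinfo flag -> GCC option that lets the compiler use that extension.
GCC_OPTION_FOR_FLAG = {
    "ssse3": "-mssse3",
    "sse4a": "-msse4a",
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "popcnt": "-mpopcnt",
    "3dnowprefetch": "-mprfchw",  # enables PREFETCHW, one of the 3DNow! prefetches AMD kept
}


def gcc_isa_options(cpu_flags) -> list[str]:
    """GCC -m options for each mapped extension the CPU reports, in table order."""
    flags = set(cpu_flags)
    return [opt for flag, opt in GCC_OPTION_FOR_FLAG.items() if flag in flags]


def gcc_command(source: str, output: str, cpu_flags, opt_level: str = "-O2") -> list[str]:
    """Build (but do not run) a gcc command line for a hypothetical source file."""
    return ["gcc", opt_level, *gcc_isa_options(cpu_flags), source, "-o", output]


if __name__ == "__main__":
    e350_like = {"pni", "ssse3", "sse4a", "popcnt", "3dnowprefetch"}  # illustrative
    print(shlex.join(gcc_command("decoder.c", "decoder", e350_like)))
    # gcc -O2 -mssse3 -msse4a -mpopcnt -mprfchw decoder.c -o decoder
```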

Reply 7 of 17, by TrashPanda

Rank l33t
386SX wrote on 2022-07-17, 10:24:

Thanks. The A8 CPU still supported the last 3DNow! feature; I wonder whether, together with SSE4a (I imagine few programs supported it at all, with SSE4.1 already existing), it might help somewhere to get better-performing software.
For example, once when I tried an FPU-intensive application on ARM, the different NEON version flags used to configure the compilation made a very visible speed difference.

Not even sure it would be worth the effort in the long run.

Reply 8 of 17, by 386SX

Rank l33t
TrashPanda wrote on 2022-07-17, 10:27:
386SX wrote on 2022-07-17, 10:24:

Thanks. The A8 CPU still supported the last 3DNow! feature; I wonder whether, together with SSE4a (I imagine few programs supported it at all, with SSE4.1 already existing), it might help somewhere to get better-performing software.
For example, once when I tried an FPU-intensive application on ARM, the different NEON version flags used to configure the compilation made a very visible speed difference.

Not even sure it would be worth the effort in the long run.

I imagine the difference might be small, but it's always a shame when available features aren't supported by software. If I search for those AMD-specific instructions, not much is said about them or about the software that supported them. It seems it was all about SSE4.x in those days.

Reply 9 of 17, by Hoping

Rank Oldbie

I had to replace my beloved Phenom II 1100T as my main gaming computer precisely because of the missing instructions. Until around 2016 at the latest, video games did not require SSE4.1 or other instructions the Phenom lacks, and precisely when six cores began to matter, everything was lost because of those missing instructions. AMD made the biggest mistakes in its history during those years. No AMD CPU from that period is useful for current tasks outside office and multimedia work, because on top of the missing instructions there is the lower IPC.
But I think they're great for retro rigs, since their good XP support goes a bit further. And Windows 7 is also good.
Currently I only use the Phenom II 1100T for office tasks and online errands; I will never get rid of it. 😀

Reply 10 of 17, by 386SX

Rank l33t
Hoping wrote on 2022-07-17, 10:36:

I had to replace my beloved Phenom II 1100T as my main gaming computer precisely because of the missing instructions. Until around 2016 at the latest, video games did not require SSE4.1 or other instructions the Phenom lacks, and precisely when six cores began to matter, everything was lost because of those missing instructions. AMD made the biggest mistakes in its history during those years. No AMD CPU from that period is useful for current tasks outside office and multimedia work, because on top of the missing instructions there is the lower IPC.
But I think they're great for retro rigs, since their good XP support goes a bit further. And Windows 7 is also good.
Currently I only use the Phenom II 1100T for office tasks and online errands; I will never get rid of it. 😀

I also find those Phenom II processors very interesting; I was even thinking of looking for a cheap one, but they're not that cheap nowadays. Still, the lack of these instructions really seems to weigh on these "low"-power CPUs (just like the Atom, but with a different speed experience: not faster, but not always slower either).
So maybe some applications had support for it but required manual compilation to enable it (a theory), maybe with no different results, or maybe even unstable, who knows. I'm just wondering whether anyone ran tests comparing SSSE3 vs SSE4a vs SSE4.1, and whether it would be worth compiling a few programs for it.

Reply 11 of 17, by Hoping

Rank Oldbie

An example that the instruction issue is sometimes more a matter of carelessness: look into what happened with Resident Evil 7 and Final Fantasy XV, which at launch did not work on "old" processors and later received a patch to fix the problem. I played Resident Evil 7 on the Phenom II with an HD 7970 without problems. Final Fantasy XV was playable, but the processor's low IPC was noticeable since the game did not use all six cores; I tested it on an i7 920 and it ran much better.

Reply 12 of 17, by 386SX

Rank l33t
Hoping wrote on 2022-07-17, 11:09:

An example that the instruction issue is sometimes more a matter of carelessness: look into what happened with Resident Evil 7 and Final Fantasy XV, which at launch did not work on "old" processors and later received a patch to fix the problem. I played Resident Evil 7 on the Phenom II with an HD 7970 without problems. Final Fantasy XV was playable, but the processor's low IPC was noticeable since the game did not use all six cores; I tested it on an i7 920 and it ran much better.

That's why I often wonder whether (just like the early 3DNow! set I mentioned, which did make a difference when and where it was used) some of these unused instructions, like the K10's SSE4a, which as far as I understand adds only four instructions on top of the common, supported SSE3 set (nobody seems to mention SSSE3 much, though it is widely supported too), could make any difference. Maybe I expect too much, but as I said, I once tried an x86 emulator for ARM compiled specifically for the highest NEON/VFP version of an ARMv7 quad-core SoC, and the speed difference compared with a build for the previous supported version was very interesting. Something like the 3DNow! version of Quake II, for example, applied to other FPU-intensive open source software that can be compiled with different features on or off.

Of course, that's not comparable anyway to the 47 additional instructions of SSE4.1. 😉
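To make the "only four instructions" point concrete: SSE4a adds EXTRQ and INSERTQ (bit-field extract/insert in the low 64 bits of an XMM register) plus MOVNTSD and MOVNTSS (non-temporal scalar stores), so none of them is new floating-point arithmetic. The Python below is only a behavioural model of the immediate forms of EXTRQ and INSERTQ on the low quadword (the hardware leaves the upper quadword undefined); from C one would instead use the `_mm_extracti_si64` / `_mm_inserti_si64` intrinsics from `ammintrin.h` with `-msse4a`.

```python
MASK64 = (1 << 64) - 1


def _field(length: int, index: int) -> tuple[int, int]:
    """Decode the 6-bit length/index fields the way EXTRQ/INSERTQ do."""
    length &= 0x3F
    index &= 0x3F
    if length == 0:  # a length of 0 means 64 bits
        length = 64
    if length + index > 64:
        raise ValueError("length + index > 64: result is undefined on hardware")
    return length, index


def extrq(src: int, length: int, index: int) -> int:
    """EXTRQ model: extract `length` bits starting at bit `index`, zero-extended."""
    length, index = _field(length, index)
    return ((src & MASK64) >> index) & ((1 << length) - 1)


def insertq(dest: int, src: int, length: int, index: int) -> int:
    """INSERTQ model: put the low `length` bits of `src` into `dest` at bit `index`."""
    length, index = _field(length, index)
    mask = ((1 << length) - 1) << index
    return ((dest & ~mask) | ((src << index) & mask)) & MASK64


print(hex(extrq(0x12345678, 8, 8)))          # 0x56
print(hex(insertq(0xFFFFFFFF, 0xAB, 8, 8)))  # 0xffffabff
```

Bit-field shuffling like this can help some packing/unpacking code, but it is a long way from the blend, dot-product and rounding instructions SSE4.1 brought.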

Reply 13 of 17, by Hoping

Rank Oldbie

Some time ago I also created a thread similar to this one, because it frustrated me that when having more than four cores suddenly became important, the Phenom II could not take advantage of it because of the missing instructions. A six-core processor, released when some programs and games still had problems with processors with more than two cores.
But I think we all see how software works today: optimization is a thing of the past, possibly because of the great increase in the complexity of software and hardware.
I'm not a programmer, so I'm not an expert, but I do remember writing BASIC programs on my CPC 6128 with parts in machine code, because Locomotive BASIC was an interpreted language and therefore much slower.

Reply 14 of 17, by 386SX

Rank l33t

Old software (and I imagine modern software too) supported multiple CPU instruction sets at the same time, which could improve (maybe only in some scenarios, not always) or maintain the final speed (like old MPEG-2 software decoders). Still, it's interesting when the whole list of features is supported. The last 3DNow! additions were maybe already too old, but SSE4a wasn't, and even if it isn't much, why not use it where possible? 😀
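The "several instruction sets in one program" approach is usually runtime dispatch: the program checks the CPU once at startup and picks the best routine it was built with. The sketch below shows only the selection logic; the labels and the single stand-in kernel are hypothetical, since in a real decoder each path would be a separately compiled C or assembly routine, and the best-first ordering is an assumption of this example.

```python
from typing import Callable, Iterable

Kernel = Callable[[list[float]], float]


def _sum_of_squares(values: list[float]) -> float:
    # Stand-in for a hand-tuned routine; every path shares it in this sketch.
    return sum(x * x for x in values)


# Ordered best-first: (required CPU flags, label, implementation). Labels are hypothetical.
CANDIDATES: list[tuple[frozenset[str], str, Kernel]] = [
    (frozenset({"sse4_1"}), "sse4.1 path", _sum_of_squares),
    (frozenset({"ssse3"}), "ssse3 path", _sum_of_squares),
    (frozenset({"sse4a"}), "sse4a path", _sum_of_squares),
    (frozenset({"sse2"}), "sse2 path", _sum_of_squares),
    (frozenset(), "generic path", _sum_of_squares),
]


def pick_kernel(cpu_flags: Iterable[str]) -> tuple[str, Kernel]:
    """Return the first candidate whose required flags are all present."""
    flags = set(cpu_flags)
    for required, label, kernel in CANDIDATES:
        if required <= flags:
            return label, kernel
    raise RuntimeError("unreachable: the generic path has no requirements")


label, kernel = pick_kernel({"sse2", "pni", "sse4a"})
print(label, kernel([1.0, 2.0, 3.0]))  # sse4a path 14.0
```

The upside of this design is that one binary runs everywhere and still uses extra extensions where present; the cost is that someone has to write and maintain each extra path, which is exactly why rarely seen extensions such as SSE4a tend not to get one.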

Reply 15 of 17, by RetroGamer4Ever

Rank Oldbie

Obviously, English isn't your first language, so I'll just try to figure out what you're saying. It seems you're wondering what you can do with old hardware that supports 3DNow! functionality. The answer is "pretty much nothing". The instruction set and its functionality were removed from most of the game titles that used it (there weren't many to begin with), but one you can still use it with is SHOGO: Mobile Armor Division, which had an experimental renderer that used it. 3DNow! support was stripped out during XP-era patching and removed entirely from Linux some years back. It isn't supported at all by Windows 10, as far as I know.

Reply 16 of 17, by brian105

Rank Member

Effectively no compiler includes options for SSE4a optimizations, because of how small an improvement it would make. Compare the number of instructions added by SSE4.1 with SSE4a. And 3DNow! is far too old and has been superseded by SSE, which is better in every way.
https://en.wikipedia.org/wiki/SSE4

Presario 5284: K6-2+ 550 ACZ @ 600 2v, 256MB PC133, GeForce4 MX 440SE 64MB, MVP3, Maxtor SATA/150 PCI card, 16GB Sandisk U100 SATA SSD
2007 Desktop: Athlon 64 X2 6000+, Asus M2v-MX SE, Foxconn 7950GT 512mb, 4GB DDR2 800, Audigy 2 ZS, WinME/XP

Reply 17 of 17, by 386SX

Rank l33t

English isn't my first language, and sometimes it takes more time and words to explain the question properly, sorry. Anyway, the answer is clear. I was basically wondering whether compiling some open source software to specifically use SSE4a (and possibly also the last 3DNow! instructions, probably never used), where the compiler and the software support it, might make better use of these old CPUs/FPUs, which weren't very fast even in their day.

But I had forgotten how few instructions SSE4a adds on top of SSSE3; I thought it was more. Of course I wasn't expecting the last 3DNow! instructions to change much; maybe a build compiled specifically to use all those instructions might be a bit faster, not only in games but in modern software too. Real-time codec decoding/encoding would be an example. SSE4.1 really makes a big difference there.