VOGONS


Fun with ICL8

First post, by Ictoagh

Rank Newbie

I've been fooling around with the Intel compiler lately, and ...

I'm fairly impressed. It seems to add a nice speed boost to the emulation. Basically, I'm just trying out the compiler's features on DOSBox to get a feel for what it's capable of. So far, I've made a build for the P4 that uses SSE2 in vectorizable loops (there are a few in the MT-32 synth sources) and is also set up for parallelization.
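For anyone wondering what a "vectorizable loop" looks like, here is a minimal sketch. The function `mix_scaled` is hypothetical and not taken from the MT-32 synth sources; it only shows the general shape (independent iterations, no branches, unit-stride access) that auto-vectorizers tend to turn into SSE/SSE2 code.

```cpp
#include <cstddef>

// Hypothetical mixing routine, for illustration only (not from the MT-32 sources).
// Each iteration touches only out[i] and in[i], so iterations do not depend on
// one another and a vectorizing compiler can process several floats at once.
// Assumes out and in do not overlap.
void mix_scaled(float* out, const float* in, float gain, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] += in[i] * gain;
    }
}
```

A loop whose iteration needs the result of the previous one (a running filter state, say) usually can't be vectorized this way.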

I'm thinking seriously about rebuilding the underlying libraries with the Intel compiler and using those.

It's not the fastest compiler in the world, but it produces fast code ... 😉

Reply 2 of 8, by Ictoagh

Rank Newbie
Tristan wrote:

Have you tried using the profile-directed compilation option? Reports are that it can make quite a difference to a program's performance (a 30% increase is not so uncommon).

Haven't tried it with DOSBox yet, but I was meddling with profile-driven optimization and parallelism in POV-Ray and didn't really get much of a gain. (The benchmark may have been 5-10 min better, but that's about it.)

Reply 3 of 8, by Ictoagh

Rank Newbie

Just tried the profile-guided optimization....

Compiling for P4 only, using parallelism, turning off /Op and adding /O3 on my P4 2.6 HT, I can get up to 10,000 instructions per loop, with no skipping in the various Sierra games that have MT-32 support! 😁

Niiiice...

Reply 4 of 8, by Tristan

Rank Newbie

It is quite amazing how sensitive virtual machines (like DOSBox) are to compiler optimisation settings. I wrote a virtual machine a while ago and discovered that by changing one small option on the compiler (GCC) I could get an average improvement of 300% in performance. It seems that there is not much instruction-level parallelism in an average virtual machine, hence the sensitivity to compiler settings. I was thinking of releasing binaries built with the Intel compiler; I am just not sure if that is legal with the free version.
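To illustrate the point about instruction-level parallelism, here is a minimal sketch of a fetch-decode-execute loop. The three-instruction bytecode (`OP_ADD`, `OP_MUL`, `OP_HALT`) is made up for this example and has nothing to do with DOSBox's actual CPU core or the virtual machine mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bytecode, for illustration only.
enum Op : std::uint8_t { OP_ADD, OP_MUL, OP_HALT };

struct Instr {
    Op op;
    std::int32_t operand;
};

// Each iteration needs the accumulator and program counter produced by the
// previous one, and the switch is a data-dependent branch. That leaves the
// CPU little independent work to overlap, so how the compiler lays out this
// one loop tends to dominate overall speed.
std::int32_t run(const std::vector<Instr>& program, std::int32_t acc) {
    std::size_t pc = 0;
    while (pc < program.size()) {
        const Instr& ins = program[pc++];
        switch (ins.op) {
            case OP_ADD:  acc += ins.operand; break;
            case OP_MUL:  acc *= ins.operand; break;
            case OP_HALT: return acc;
        }
    }
    return acc;
}
```

Because nearly all the time is spent in a loop like this, a single option that changes inlining or register allocation there can plausibly swing the result a lot.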

Reply 5 of 8, by canadacow

Rank Member

Ictoagh wrote:

Compiling for P4 only, using parallelism, turning off /Op and adding /O3 on my P4 2.6 HT, I can get up to 10,000 instructions per loop, with no skipping in the various Sierra games that have MT-32 support! 😁

With optimization like that I'm going to have to try the Intel compiler for use with the driver code! I'm going to give it a shot this weekend. I'll let you know how it goes.

Reply 6 of 8, by Ictoagh

Rank Newbie
canadacow wrote:

With optimization like that I'm going to have to try the Intel compiler for use with the driver code! I'm going to give it a shot this weekend. I'll let you know how it goes.

Just a small thought: don't build for straight P4 (/QxN); try /QaxN instead, otherwise it will not run on PIII and older systems. Also, the parallelism is only good for multiprocessor and hyperthreaded machines. On a normal uniprocessor P4 (pre-NetBurst), on earlier-generation processors, and on anything without hyperthreading, there will be some performance degradation, but it may be livable. Experiment.

When I ran the profile, I ran DOSBox through a few clips of music in Firehawk, then let a clip run for a bit with a high CPU count and 0 frameskip. It WILL run frighteningly slow. With Firehawk, there seems to be some trouble panning the drums, but other games seem to work just fine. (Maybe they have the panning set up a certain way for the game, but I dunno.)
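The difference between the two switches is roughly "one P4-only code path" versus "a P4 path plus a generic fallback, picked at run time". Below is a hand-written sketch of that dispatch idea. It is not what the Intel compiler actually emits; the names `scale_generic`, `scale_sse2`, `cpu_has_sse2` and `pick_scale` are hypothetical, and the feature check uses the GCC/Clang builtin `__builtin_cpu_supports` as an assumption, since it is not part of the Intel toolchain discussed here.

```cpp
#include <cstddef>

// Baseline path: plain code that any x86 CPU can run.
void scale_generic(float* data, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= gain;
}

// Stand-in for the processor-specific path. In a real build this version
// would be compiled with SSE2 enabled; it has the same body here so the
// sketch stays portable.
void scale_sse2(float* data, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= gain;
}

// Assumption: GCC/Clang builtin on x86. On other compilers or targets this
// sketch simply reports "no SSE2" and takes the generic path.
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

using ScaleFn = void (*)(float*, std::size_t, float);

// Choose once at startup; callers then go through the returned pointer.
ScaleFn pick_scale() {
    return cpu_has_sse2() ? scale_sse2 : scale_generic;
}
```

A binary with only the specialized path has nothing to fall back on, which is why it won't start on older CPUs.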

Overall, I plan to purchase the ICL compiler after experimenting with it some, but at $400, I'll have to wait a while.