VOGONS

First post, by Guest

The README.TXT file of DOSBox 0.63 mentions that the way to find the optimal cycles number is to open the Task Manager while the game is running and increase cycles until CPU usage is above 95% (more or less).
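
For reference, the setting being tuned lives in the [cpu] section of dosbox.conf. A minimal sketch (the value below is only an example, not a recommendation):

    [cpu]
    core=normal
    # raise cycles in steps while watching Task Manager;
    # Ctrl-F12 / Ctrl-F11 also adjust it while the game is running
    cycles=10000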

I have an MSI Neo3 board, a P4 3.0 GHz with an MSI FX5200 128 MB video card, and XP Service Pack 2, and I can only get up to about 51% CPU usage. When I keep increasing the cycles, the game just gets slower and the CPU usage doesn't go above 51%.

Any recommendations?

Reply 1 of 10, by avatar_58

Rank: Oldbie

Try this with a more CPU-intensive game like One Must Fall 2097 (that's the game I tested) or any late SVGA game. If the game ends up running perfectly at lower cycles, there is no need to increase them any further. Many games don't even need more than 9000 cycles to run perfectly.

Reply 2 of 10, by gulikoza

Rank: Oldbie

Your CPU supports Hyperthreading, which makes Windows think there are actually two processors in your system. Since DOSBox is only single threaded, it can only use one CPU at a time, so 50% is the maximum you will get.
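
For anyone curious, you can confirm what Windows reports with a few lines of Win32 C (nothing DOSBox-specific, just a plain API query):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        /* on a HyperThreaded P4 this prints 2 even though there is only
           one physical core, which is why a single-threaded process
           tops out around 50% in Task Manager */
        printf("logical processors: %lu\n", si.dwNumberOfProcessors);
        return 0;
    }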

Reply 4 of 10, by gulikoza

Rank: Oldbie

Heh... as long as HypersomethingTM makes for a good marketing trick 😁
But really, with the introduction of dual-core processors later this year, dosbox could really benefit from multithreading. I know the graphics code was threaded at one point and this was later removed because of some problems. I've also done some experiments threading my D3D code, but the results on a single (not hyperthreaded 🤣) CPU are roughly the same (probably due to the small 2-3 ms update per frame, which is somewhat inefficient for thread switching). But what about the cpu core? It is the most (?) calculation-intensive part. What about having one CPU run the cpu emulation and the other everything else? 😀
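
A minimal sketch of that split, assuming Win32 threads and a made-up single-producer/single-consumer queue (an illustration only, not DOSBox code):

    #include <windows.h>
    #include <stdio.h>

    #define QUEUE_SIZE 256

    /* single-producer / single-consumer queue; volatile is enough for
       a sketch on x86, real code would want proper atomics */
    static LONG queue[QUEUE_SIZE];
    static volatile LONG head = 0, tail = 0;
    static volatile LONG running = 1;
    static HANDLE work_ready;   /* signaled whenever new work is queued */

    /* consumer thread: stands in for "everything else" (video, audio, I/O) */
    static DWORD WINAPI other_thread(LPVOID arg)
    {
        (void)arg;
        while (running || tail != head) {
            WaitForSingleObject(work_ready, 10);
            while (tail != head) {           /* drain everything queued */
                LONG cmd = queue[tail % QUEUE_SIZE];
                tail++;
                printf("handling command %ld\n", cmd);
            }
        }
        return 0;
    }

    int main(void)
    {
        LONG i;
        HANDLE h;

        work_ready = CreateEvent(NULL, FALSE, FALSE, NULL);
        h = CreateThread(NULL, 0, other_thread, NULL, 0, NULL);

        /* producer loop: stands in for the cpu emulation core */
        for (i = 0; i < 100; i++) {
            while (head - tail >= QUEUE_SIZE)  /* crude backpressure */
                Sleep(0);
            queue[head % QUEUE_SIZE] = i;      /* "emulated" output */
            head++;
            SetEvent(work_ready);
        }

        running = 0;
        SetEvent(work_ready);
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
        CloseHandle(work_ready);
        return 0;
    }

The catch is the cost of pushing data between the two threads, which is exactly what the .plan quoted further down ran into.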

Reply 5 of 10, by avatar_58

Rank: Oldbie

Hey, I'm not here to debate the merits of hyperthreading... I'm just admitting it never crossed my mind. I have only used P4s occasionally. Personally I find my AMD faster, but then again I don't load it full of spyware like they do... 😜

Reply 6 of 10, by Reckless

Rank: Oldbie

Hyperthreading is indeed all marketing and next to no gain. However, as gulikoza pointed out, dual-core CPUs are not far away, which will bring real SMP systems to the masses. I've run a dual-CPU Xeon at work for a few years now and it's surprising to see how some modern software doesn't use it!

IIRC parts of Quake 3's engine were tested on different threads at some point, but I think JC said the gain wasn't worth it. I don't know what the final product shipped with, though.

avatar_58, it's easy to criticise or berate what you've never owned, but this really adds nothing to the discussion.

Edit: Here's the .plan entry from JC regarding threading and MP systems:

Name: John Carmack
Email: johnc@idsoftware.com
Description: Programmer
Project: Quake Arena
Last Updated: 09/10/1998 02:43:36 (Central Standard Time)
-------
9/10/98
-------

I recently set out to start implementing the dual-processor acceleration
for QA, which I have been planning for a while. The idea is to have one
processor doing all the game processing, database traversal, and lighting,
while the other processor does absolutely nothing but issue OpenGL calls.

This effectively treats the second processor as a dedicated geometry
accelerator for the 3D card. This can only improve performance if the
card isn't the bottleneck, but voodoo2 and TNT cards aren't hitting their
limits at 640*480 on even very fast processors right now.

For single player games where there is a lot of cpu time spent running the
server, there could conceivably be up to an 80% speed improvement, but for
network games and timedemos a more realistic goal is a 40% or so speed
increase. I will be very satisfied if I can make a dual pentium-pro 200
system perform like a pII-300.

I started on the specialized code in the renderer, but it struck me that
it might be possible to implement SMP acceleration with a generic OpenGL
driver, which would allow Quake2 / sin / halflife to take advantage of it
well before QuakeArena ships.

It took a day of hacking to get the basic framework set up: an smpgl.dll
that spawns another thread that loads the original opengl32.dll or
3dfxgl.dll, and watches a work queue for all the functions to call.

I get it basically working, then start doing some timings. It's 20%
slower than the single processor version.

I go in and optimize all the queueing and working functions, tune the
communications facilities, check for SMP cache collisions, etc.

After a day of optimizing, I finally squeak out some performance gains on
my tests, but they aren't very impressive: 3% to 15% on one test scene,
but still slower on the other one.

This was fairly depressing. I had always been able to get pretty much
linear speedups out of the multithreaded utilities I wrote, even up to
sixteen processors. The difference is that the utilities just split up
the work ahead of time, then don't talk to each other until they are done,
while here the two threads work in a high bandwidth producer / consumer
relationship.

I finally got around to timing the actual communication overhead, and I was
appalled: it was taking 12 msec to fill the queue, and 17 msec to read it out
on a single frame, even with nothing else going on. I'm surprised things
got faster at all with that much overhead.

The test scene I was using created about 1.5 megs of data to relay all the
function calls and vertex data for a frame. That data had to go to main
memory from one processor, then back out of main memory to the other.
Admittedly, it is a bitch of a scene, but that is where you want the
acceleration...

The write times could be made over twice as fast if I could turn on the
PII's write combining feature on a range of memory, but the reads (which
were the gating factor) can't really be helped much.

Streaming large amounts of data to and from main memory can be really grim.
The next write may force a cache writeback to make room for it, then the
read from memory to fill the cacheline (even if you are going to write over
the entire thing), then eventually the writeback from the cache to main
memory where you wanted it in the first place. You also tend to eat one
more read when your program wants to use the original data that got evicted
at the start.

What is really needed for this type of interface is a streaming read cache
protocol that performs similarly to the write combining: three dedicated
cachelines that let you read or write from a range without evicting other
things from the cache, and automatically prefetching the next cacheline as
you read.

Intel's write combining modes work great, but they can't be set directly
from user mode. All drivers that fill DMA buffers (like OpenGL ICDs...)
should definitely be using them, though.

Prefetch instructions can help with the stalls, but they still don't prevent
all the wasted cache evictions.

It might be possible to avoid main memory altogether by arranging things
so that the sending processor ping-pongs between buffers that fit in L2,
but I'm not sure if a cache coherent read on PIIs just goes from one L2
to the other, or if it becomes a forced memory transaction (or worse, two
memory transactions). It would also limit the maximum amount of overlap
in some situations. You would also get cache invalidation bus traffic.

I could probably trim 30% of my data by going to a byte level encoding of
all the function calls, instead of the explicit function pointer / parameter
count / all-parms-are-32-bits that I have now, but half of the data is just
raw vertex data, which isn't going to shrink unless I did evil things like
quantize floats to shorts.

Too much effort for what looks like a relatively minor speedup. I'm giving
up on this approach, and going back to explicit threading in the renderer so
I can make most of the communicated data implicit.

Oh well. It was amusing work, and I learned a few things along the way.
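
The queueing scheme JC describes (explicit function pointer / parameter count / all-parms-are-32-bits) boils down to something like the toy sketch below. The names are made up and it uses pointer-sized slots so it compiles on modern systems; it is an illustration, not id's code:

    #include <stdio.h>
    #include <stdint.h>

    typedef void (*gl_call)(const uintptr_t *parms);

    /* the shared work queue ("que", as the .plan spells it): each entry
       is a function pointer, a parameter count, then the parameters */
    static uintptr_t que[4096];
    static size_t que_end = 0;

    /* producer side: record a call instead of executing it */
    static void emit(gl_call fn, uintptr_t nparms, const uintptr_t *parms)
    {
        uintptr_t i;
        que[que_end++] = (uintptr_t)fn;   /* function-pointer cast is a
                                             common, non-ISO shortcut */
        que[que_end++] = nparms;
        for (i = 0; i < nparms; i++)
            que[que_end++] = parms[i];
    }

    /* consumer side: the driver thread replays the recorded calls */
    static void replay(void)
    {
        size_t pos = 0;
        while (pos < que_end) {
            gl_call fn = (gl_call)que[pos++];
            uintptr_t nparms = que[pos++];
            fn(&que[pos]);
            pos += nparms;
        }
    }

    /* stand-in for a real GL entry point */
    static void fake_glVertex3i(const uintptr_t *parms)
    {
        printf("glVertex3i(%lu, %lu, %lu)\n",
               (unsigned long)parms[0], (unsigned long)parms[1],
               (unsigned long)parms[2]);
    }

    int main(void)
    {
        uintptr_t v[3] = { 1, 2, 3 };
        emit(fake_glVertex3i, 3, v);
        replay();
        return 0;
    }

In the real thing one thread calls emit() while another runs replay(), and streaming that much call data through main memory is where the 12 + 17 msec per frame went.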

Reply 8 of 10, by DosFreak

Rank: l33t++

Indeed, most processor-intensive server applications actually recommend disabling HyperThreading for best performance: you end up with two processor-intensive threads duking it out for the same execution resources. HyperThreading is really only good for minimally intensive multitasking; real power comes from multiple cores, not from faking it.

How To Ask Questions The Smart Way
Make your games work offline

Reply 9 of 10, by Darkfalz

Rank: Member

I believe HyperTransport has to do with memory throughput, whereas HyperThreading is the multithreading thing. I.e., they're not the same thing by any stretch.

HyperThreading really helps when running multiple things at once, and doesn't hurt when running single-threaded stuff, so while it's a "gimmick" some of the time, it's also useful at other times.

Reply 10 of 10, by Freddo

Rank: Oldbie

gulikoza wrote:

Heh... as long as HypersomethingTM makes for a good marketing trick 😁
But really, with the introduction of dual-core processors later this year, dosbox could really benefit from multithreading. I know the graphics code was threaded at one point and this was later removed because of some problems. I've also done some experiments threading my D3D code, but the results on a single (not hyperthreaded 🤣) CPU are roughly the same (probably due to the small 2-3 ms update per frame, which is somewhat inefficient for thread switching). But what about the cpu core? It is the most (?) calculation-intensive part. What about having one CPU run the cpu emulation and the other everything else? 😀

I agree 😀 It would be very nice if DOSBox became good multithreaded software. My next computer will most likely be dual core, and it would be very cool if DOSBox could take advantage of it.

Not something I suspect will happen, at least not in the near future, though, as the threaded code was removed because of some kind of problem.