VOGONS


Pentium MMX vs Pentium


First post, by computergeek92

Rank Oldbie

How much faster is a Pentium MMX 166 vs a Classic Pentium 166? What about comparing 200MHz versions?

Dedicated Windows 95 Aficionado for good reasons:
http://toastytech.com/evil/setup.html

Reply 1 of 30, by GeorgeMan

Rank Oldbie

The doubled L1 cache of the MMX makes more difference than the MMX instructions, at least when we talk about the ~1997-1998 era.

A Pentium MMX 166MHz performs about the same as a Pentium 200MHz. It all depends on the application used though.

1. Athlon XP 3200+ | ASUS A7V600 | Radeon 9500 @ Pro | SB Audigy 2 ZS | 80GB IDE, 500GB SSD IDE2Sata, 2x1TB HDDs | Win 98SE, XP, Vista
2. Pentium MMX 266| Qdi Titanium IIIB | Hercules graphics & Amber monitor | 1 + 10GB HDDs | DOS 6.22, Win 3.1, 95C

Reply 3 of 30, by PhilsComputerLab

Rank l33t++

Some results for Voodoo 2 and SLI: http://www.philscomputerlab.com/voodoo-2-and- … ng-project.html

Depending on what you're running your mileage will vary.

YouTube, Facebook, Website

Reply 4 of 30, by Scali

Rank l33t
kixs wrote:

I remember it being a bit slower in some operations - but I could be wrong 😉

That is also what I recall. The reason for this is that they had to add an extra pipeline stage to incorporate the new MMX extensions.
As far as I recall, they were about the same in most cases, certainly not like a PMMX166 being as fast as a P200, when MMX wasn't used. But cache-heavy stuff may benefit the PMMX more.

Last edited by Scali on 2015-09-15, 08:44. Edited 1 time in total.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 5 of 30, by GeorgeMan

Rank Oldbie

Well, if you look at Phil's links above, in games, the MMX 166 actually performs a little better than a Classic P200.


Reply 6 of 30, by Scali

Rank l33t
GeorgeMan wrote:

Well, if you look at Phil's links above, in games, the MMX 166 actually performs a little better than a Classic P200.

Games are best-case here, they probably use a lot of hand-optimized MMX code, and also benefit most from cache.
I don't think this is representative of other applications, or perhaps even representative of running software-rendered games.


Reply 7 of 30, by GeorgeMan

Rank Oldbie

The L1 makes more difference than you think because it doesn't need super-specific optimization, in contrast to the MMX instruction support.

I agree that this is the best-case, BUT in this case the P166 MMX flies ABOVE the P200. I said that on average, it's about the same between those two.
Somewhere between 0% better and 20% better (a P166 to a P166MMX), and somewhere inside the +10% to +30% margin in more advanced software (like 3d games).

Keep in mind that a P200 is clocked 20.5% higher than a P166 MMX, and that most old PCs are used for games nowadays, and my first answer in this topic is fully explained. 😀
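The 20.5% figure uses the nominal 166 MHz. Both the P166 (MMX or not) and the P200 run off the same bus, which is really 66.67 MHz, with 2.5x and 3x multipliers, so the actual clock gap is 20%. The short sketch below just does that arithmetic and shows how much more work per clock a P166 MMX would need to tie a P200. The helper names are made up for illustration; no benchmark data goes into it.

```python
# Clock-ratio arithmetic for the P166 vs P200 comparison (illustrative sketch).
BUS_MHZ = 200 / 3  # the "66 MHz" Pentium bus actually runs at 66.67 MHz


def core_mhz(multiplier):
    """Core clock for a given bus multiplier."""
    return multiplier * BUS_MHZ


def clock_advantage_pct(fast_mhz, slow_mhz):
    """Percent by which fast_mhz exceeds slow_mhz."""
    return (fast_mhz / slow_mhz - 1.0) * 100.0


def per_clock_gain_needed(fast_mhz, slow_mhz, target=1.0):
    """Work-per-clock ratio the slower chip needs to reach `target`
    times the faster chip's throughput (1.0 means a tie)."""
    return target * fast_mhz / slow_mhz


p166 = core_mhz(2.5)  # P166 and P166 MMX: 2.5 x bus
p200 = core_mhz(3.0)  # P200: 3 x bus

print(f"nominal clock gap: {clock_advantage_pct(200, 166):.1f}%")
print(f"actual clock gap:  {clock_advantage_pct(p200, p166):.1f}%")
print(f"a tie needs {per_clock_gain_needed(p200, p166):.2f}x work per clock")
```

So "the MMX 166 performs about the same as a P200" amounts to claiming roughly 20% more work per clock from the larger cache and the other core changes.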


Reply 8 of 30, by Scali

Rank l33t
GeorgeMan wrote:

The L1 makes more difference than you think because it doesn't need super-specific optimization, in contrast to the MMX instruction support.

The thing is though, if you optimize your code tightly for a regular Pentium, your 16K L1 cache may be enough, most of the time.

GeorgeMan wrote:

Somewhere between 0% better and 20% better (a P166 to a P166MMX), and somewhere inside the +10% to +30% margin in more advanced software (like 3d games).

It can actually be slower, because, as I said, the longer pipeline adds latency to certain instructions.
I think you also have to make a big distinction between software 3d and hardware accelerated 3d.
Hardware acceleration probably benefits a lot from MMX to pack and stream the data to the video chip.
Software renderers are less 'streaming' in that sense. They could benefit from a hand-optimized MMX rasterizer (especially when rendering RGB, but Pentiums were too slow for that really), but if you run a regular rasterizer (e.g. Doom/Quake in software mode), I bet the difference isn't that large.
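To make "MMX rasterizer for RGB" concrete: a single MMX instruction such as PADDUSB (packed add of unsigned bytes with saturation) processes eight bytes at once, which is two 32-bit pixels. The Python below only emulates what that one instruction computes; it says nothing about speed. The 0x00RRGGBB pixel layout and the "brighten" use case are assumptions for illustration.

```python
def paddusb(a: int, b: int) -> int:
    """Emulate MMX PADDUSB on two 64-bit values: add each of the eight
    byte lanes independently and clamp to 0xFF instead of wrapping."""
    result = 0
    for lane in range(8):
        shift = lane * 8
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 0xFF) << shift
    return result


# Two 0x00RRGGBB pixels packed into one 64-bit "register" (assumed layout).
pixels = 0x00F08010_00102030
brighten = 0x00202020_00202020  # add 0x20 to R, G and B of both pixels

print(hex(paddusb(pixels, brighten)))
```

The red channel of the upper pixel (0xF0 + 0x20) clamps at 0xFF rather than wrapping around to a dark value, which is exactly what a scalar rasterizer would otherwise need extra compares and branches for.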


Reply 9 of 30, by kixs

Rank l33t

Good old Tom's Hardware review of the Pentium MMX:
http://www.tomshardware.com/reviews/pentium-m … tations,19.html

The new Pentium MMX hardly shows any improvement for DOS Gamers. An increase of 2.5% is hardly worth mentioning.

Requests are also possible... /msg kixs

Reply 10 of 30, by Scali

Rank l33t
kixs wrote:

Good old Toms Hardware review of Pentium MMX:
http://www.tomshardware.com/reviews/pentium-m … tations,19.html

Ah yes, they tested Quake in software mode, and indeed, little difference.
I guess it depends very much on how you used your Pentium/PMMX. I was writing optimized software renderers in assembly back in the day, slightly before the big bang of 3D hardware acceleration. I don't think I ever actually paired a Pentium or Pentium MMX with a 3D accelerator; I had moved to a PII by then.


Reply 11 of 30, by Thandor

Rank Member

I also did a few benchmarks: thandor.net - Pentium MMX 200. Red is the MMX CPU, and grey is the regular Pentium. Both run at 200MHz, with little difference between them.

As mentioned by others: back in its day MMX wasn't widely used, so the L1 cache made the (little) difference.

thandor.net - hardware
And the rest of us would be carousing the aisles, stuffing baloney.

Reply 12 of 30, by sunaiac

Rank Oldbie

In Phil's benchmark suite:

[Image: benchmark results table]

R9 3900X/X470 Taichi/32GB 3600CL15/5700XT AE/Marantz PM7005
i7 980X/R9 290X/X-Fi titanium | FX-57/X1950XTX/Audigy 2ZS
Athlon 1000T Slot A/GeForce 3/AWE64G | K5 PR 200/ET6000/AWE32
Ppro 200 1M/Voodoo 3 2000/AWE 32 | iDX4 100/S3 864 VLB/SB16

Reply 15 of 30, by PhilsComputerLab

Rank l33t++

Funny 🤣

http://www.philscomputerlab.com/486-benchmark-suite.html

It doesn't include speedsys though. This is the suite used for this project if you want to compare other people's systems: Phil's Ultimate VGA Benchmark Database Project


Reply 16 of 30, by carlostex

Rank l33t
Scali wrote:

That is also what I recall. The reason for this is that they had to add an extra pipeline stage to incorporate the new MMX extensions.
As far as I recall, they were about the same in most cases, certainly not like a PMMX166 being as fast as a P200, when MMX wasn't used. But cache-heavy stuff may benefit the PMMX more.

^ This.

Reply 17 of 30, by PhilsComputerLab

Rank l33t++

At least Voodoo II cards under Windows perform better on the MMX. Not sure if it's a driver optimization. Here the MMX166 can beat the P200. Not sure about other cards, like Nvidia / ATI.

The other benefit is that the MMX draws less power and runs a bit cooler, but fewer boards work with it.


Reply 18 of 30, by Tertz

Rank Oldbie
philscomputerlab wrote:

It doesn't include speedsys though.

Neither the 486 benchmark suite nor the Ultimate VGA Benchmark includes speedsys or the table above, so the mystery remains.

awgamer wrote:

Phil. (I couldn't resist:)

Your wisdom is beyond your years.

DOSBox CPU Benchmark
Yamaha YMF7x4 Guide

Reply 19 of 30, by idspispopd

Rank Oldbie

Common knowledge would be that PMMX 166 roughly compares to P1 200 on average.
At that speed the bus bandwidth starts to become a bottleneck, so a P1 200 is less than 20% faster than a P1 166.
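The bus-bottleneck argument can be put into numbers with a simple Amdahl-style model: the P166 and P200 share the same ~66.67 MHz bus, so only the part of the runtime that is core-bound shrinks with the higher multiplier, while time spent waiting on the bus stays the same. The `bus_fraction` values below are hypothetical, chosen only to show the trend; they are not measurements of any real workload.

```python
BUS_MHZ = 200 / 3  # both chips share the same ~66.67 MHz bus


def estimated_speedup(old_mhz, new_mhz, bus_fraction):
    """Amdahl-style estimate: only the core-bound share of runtime scales
    with core clock; the bus-bound share (bus_fraction) stays constant."""
    if not 0.0 <= bus_fraction <= 1.0:
        raise ValueError("bus_fraction must be between 0 and 1")
    core_fraction = 1.0 - bus_fraction
    new_time = core_fraction * (old_mhz / new_mhz) + bus_fraction
    return 1.0 / new_time


p166 = 2.5 * BUS_MHZ
p200 = 3.0 * BUS_MHZ

# Hypothetical bus-bound shares of runtime, for illustration only.
for frac in (0.0, 0.1, 0.3):
    gain = (estimated_speedup(p166, p200, frac) - 1.0) * 100.0
    print(f"bus-bound share {frac:.0%}: P200 is {gain:.1f}% faster")
```

With those made-up shares the P200's lead drops from the full 20% to about 17.6% and 13.2%, which is the "less than 20% faster" effect described above.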
Aside from the larger cache (16kB code + 16kB data instead of 8kB code + 8kB data), the MMX chip

  • uses the more advanced branch prediction from the PPro
  • has 4 instead of 2 write buffers
  • has one more pipeline stage (already mentioned); this doesn't mean higher performance, quite the contrary
  • has a return-stack (4 levels)

The larger cache probably has the biggest influence, though.

Quake is heavily optimized for the original Pentium. I suppose that includes the cache size, so the larger cache shouldn't have too big an influence here.
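Both points (the bigger cache matters most, but not for code already tuned to 8kB) can be illustrated with a toy cache simulation. The sketch below models a data cache as 8kB 2-way (classic Pentium) versus 16kB 4-way (Pentium MMX) with 32-byte lines, and repeatedly sweeps a working set. It assumes plain LRU replacement (the real MMX chip uses a pseudo-LRU scheme) and a hypothetical sequential-sweep workload, so treat the numbers as an illustration, not a prediction.

```python
from collections import OrderedDict

LINE_BYTES = 32  # cache line size assumed for both chips


def hit_rate(cache_bytes, ways, working_set_bytes, passes=10):
    """Simulate an LRU set-associative cache while sweeping a working set
    `passes` times, touching each cache line once per pass."""
    num_sets = cache_bytes // (LINE_BYTES * ways)
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> None, oldest first
    hits = accesses = 0
    for _ in range(passes):
        for addr in range(0, working_set_bytes, LINE_BYTES):
            line = addr // LINE_BYTES
            cache_set = sets[line % num_sets]
            tag = line // num_sets
            accesses += 1
            if tag in cache_set:
                hits += 1
                cache_set.move_to_end(tag)  # mark as most recently used
            else:
                if len(cache_set) >= ways:
                    cache_set.popitem(last=False)  # evict least recently used
                cache_set[tag] = None
    return hits / accesses


KB = 1024
for ws in (6 * KB, 12 * KB):
    old = hit_rate(8 * KB, 2, ws)   # classic Pentium-style data cache
    new = hit_rate(16 * KB, 4, ws)  # Pentium MMX-style data cache
    print(f"{ws // KB}kB working set: 8kB/2-way {old:.0%}, 16kB/4-way {new:.0%}")
```

A 6kB working set (code tuned to fit the old cache) hits equally well on both, while a 12kB one thrashes the 8kB cache on every sweep but fits entirely in the 16kB cache.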

Scali wrote:

Hardware acceleration probably benefits a lot from MMX to pack and stream the data to the videochip.

I thought that 3D hardware acceleration means that the program is doing lots of floating-point calculations. Since it is somewhat expensive to switch between floating-point and MMX modes (that's one of the improvements in SSE), MMX shouldn't be much help here. I thought that MMX is much more useful for sound processing (e.g. Unreal), or maybe software-textured 3D (there is a special version of POD, IIRC). But since Scali does lots of 3D programming, he might know what he's talking about.
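Some background on both points: the MMX registers are aliased onto the x87 floating-point registers, which is why code has to execute EMMS before going back to x87 math, the switching cost mentioned above. Sound mixing avoids floating point entirely and fits MMX well: PADDSW adds four signed 16-bit samples at once and clamps instead of wrapping, so loud passages clip rather than flip sign. The Python below only emulates the arithmetic of that instruction; the `mix` helper and the test samples are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767
LANES = 4  # one 64-bit MMX register holds four 16-bit words


def paddsw(a, b):
    """Emulate one PADDSW: lane-wise signed 16-bit add with saturation."""
    return [max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)]


def mix(track_a, track_b):
    """Mix two equal-length 16-bit PCM tracks, four samples per 'instruction'."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must have the same length")
    out = []
    for i in range(0, len(track_a), LANES):
        out.extend(paddsw(track_a[i:i + LANES], track_b[i:i + LANES]))
    return out


print(mix([30000, -30000, 100, 0], [10000, -10000, 200, -5]))
```

The first two samples saturate at the 16-bit limits instead of wrapping around; with plain integer adds on 16-bit values they would produce a loud click of the opposite sign.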