VOGONS


First post, by Standard Def Steve

Rank: Oldbie

Now with 4K DirectShow results!

Ever wonder how well a particular CPU would handle decoding a specific video format without any help from the GPU or decoder card? If you have, THIS THREAD IS FOR YOU!
Video playback on computers has fascinated me since the early 90s. Over the years I've made notes on how well different CPUs handled video playback on their own. These notes are finally coming together in handy dandy charts!

A few notes:
-All of the 16:9 video streams have a 1.85:1 aspect ratio. I did not use any 2.40:1 AR video because, while it is encoded at the same 1280x720 or 1920x1080 resolution as the 1.85:1 material, the black bars take quite a bit of work off the decoder (see the quick pixel-count sketch after this list).
-This is an ongoing project. As I find more hardware (e.g. Cyrix) I'll be adding them.
-Different CPUs will be used in different charts. For example, PIIs will be left out of VP9 decode tests, and mobile i7s will be left out of Xvid tests.
-If parts of the charts appear to be cut off, use ctrl+mouse wheel to zoom out.
-For reference, I've also added some hardware-accelerated entries.
-In the charts, S/W indicates software decoding. All hardware features were disabled in the player/decoder.
-In the charts, OC indicates that the processor was overclocked. Oops!
-In the charts, SC indicates single-channel memory. DC is dual-channel memory. QC is quad-channel memory.
-I own all of the DVDs and Blu-rays used for this test. No piracy here!
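
Just to put a rough number on the aspect-ratio note above, here's a quick back-of-the-envelope calculation (a sketch only; real encodes round the bar sizes to codec block boundaries):

```python
# Rough estimate of how many pixels actually carry picture data when
# wide-aspect material is letterboxed inside a 1920x1080 frame.
FRAME_W, FRAME_H = 1920, 1080

for aspect in (1.85, 2.40):
    active_h = min(FRAME_H, round(FRAME_W / aspect))   # picture height inside the frame
    active_px = FRAME_W * active_h
    share = active_px / (FRAME_W * FRAME_H)
    print(f"{aspect:.2f}:1 -> {active_h} active lines, "
          f"{active_px / 1e6:.2f} Mpx/frame ({share:.0%} of the frame)")
```

A 2.40:1 encode ends up with roughly a quarter fewer "live" pixels per frame than a 1.85:1 one, which is why only 1.85:1 clips were used.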

I will be adding charts for the following video formats in the very near future. I already have most of the data jotted down; I just need to create tables for them.
Xvid at 640x480, 1.7 Mb/s, ripped from NTSC DVD
WMV at 640x480, 1.7 Mb/s, ripped from NTSC DVD
Xvid at 1280x720, 5Mb/s. Ripped from Blu-ray, downscaled to 720p, and transcoded to Xvid

H.264 at 852x480, 2.5Mb/s. Ripped from Blu-ray, downscaled to 480p, and re-encoded with x264
H.264 at 1280x720, 4.5Mb/s. Ripped from Blu-ray, downscaled to 720p, and re-encoded with x264

MPEG2 at 1920x1080, 34Mb/s. Ripped from Blu-ray without any re-encoding
VC-1 at 1920x1080, 28Mb/s. Ripped from Blu-ray without any re-encoding
H.264 at 1920x1080, 30Mb/s. Ripped from Blu-ray without any re-encoding
H.264 at 1080i/60, 35Mb/s. Software deinterlacing and decoding. Ripped from Blu-ray without any re-encoding.

VP9 at 480p, 720p(60), 1080p(60), 1440p, and 2160p. HTML5 YouTube player in Chrome.
Netflix at 1080p, using the HTML5 player in Chrome.

And if I ever get more 486, Cyrix, and early Pentium systems, I may add an MPEG-1 table.

Here's the first chart.
Here, I'm doing one of the more basic video tests: playing a regular NTSC DVD on many different configurations. The disc had an average video bit rate of 6.5Mb/s with spikes up to around 8.2Mb/s. Audio was Dolby AC3 5.1 at 448Kb/s. The source was progressive (24p).
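
If you want to double-check those bit-rate numbers on your own rip, here's a minimal sketch of how the average and peak could be measured with ffprobe (not the tool I used for the original notes; "dvd_title1.vob" is just a placeholder file name):

```python
import json
import subprocess
from collections import defaultdict

SRC = "dvd_title1.vob"  # placeholder; any demuxable rip works the same way

# Ask ffprobe for the timestamp and size of every video packet.
cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
       "-show_entries", "packet=pts_time,size", "-of", "json", SRC]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
packets = json.loads(out)["packets"]

buckets = defaultdict(int)  # bytes of video per one-second bucket
for p in packets:
    if "pts_time" in p:
        buckets[int(float(p["pts_time"]))] += int(p["size"])

rates = [b * 8 / 1e6 for b in buckets.values()]  # Mb/s per second of video
print(f"average: {sum(rates) / len(rates):.1f} Mb/s, peak: {max(rates):.1f} Mb/s")
```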

[chart]

Xvid 480p
-Xvid in AVI, 640x480p, 30fps, ~1.7Mb/s. Dolby AC3 2.0 192Kb/s
-Original source: DVD-Video, MPEG2 720x480i 4x3 NTSC
-Using the same xvid 1.2.2 decoder across all operating systems. Xvid (or, at least this version of it) does not support hardware acceleration, so all test streams are decoded in S/W.
[chart]

WMV 480p
-WMV, 640x480p, 30fps, ~1.7 Mb/s. WMA 192 Kb/s
-Original source: DVD-Video, MPEG2 720x480i 4x3 NTSC
[chart]

Xvid 720p
-Xvid in AVI, 1280x720p, 24fps, ~5Mb/s. Dolby AC3 5.1 384Kb/s
-Original source: Blu-ray, H.264 1920x1080p 32Mb/s, DTS MA 5.1
-Using the same xvid 1.2.2 decoder across all operating systems. Xvid (or, at least this version of it) does not support hardware acceleration, so all test streams are decoded in S/W.
[chart]

H.264 480p
-H.264 Level 3.1, 852x480p, 24fps, ~2.5Mb/s. Dolby AC3 5.1 384Kb/s
-Original source: Blu-Ray, H.264 1920x1080p 32Mb/s, DTS MA 5.1
-Using CoreAVC 1.85 on XP/2K, a very fast software decoder.
-Using LAV Video Decoder 0.58.2 for software decoding on Win7. This is the version that comes with MPC-HC 1.7
[chart]

H.264 720p
-H.264 Level 4.1, 1280x720p at 24fps. 5Mb/s average video bit rate. Dolby AC3 5.1 384Kb/s
-Original source: Blu-Ray, H.264 1920x1080p 32Mb/s, DTS MA 5.1
-Using a combination of CoreAVC and LAV video decoders on the XP machines to compare speed. LAV on Win7.
[chart]

H.264 1080p
-Ripped from Blu-ray without re-encoding the video stream.
-H.264 Level 4.1, 1920x1080p at 24fps. 30Mb/s average video bit rate. DTS HD MA 7.1
[chart]


VC-1 1080p

-Ripped from Blu-ray. No re-encoding.
-VC-1, 1920x1080p at 24fps. 28Mbps average bit rate. Dolby TrueHD 5.1 audio.

I always thought that VC-1, based on WMV-HD, would be a little easier to decode than H.264, but that doesn't appear to be the case. It's either much harder to decode, or the software decoders just aren't as well-optimized as software H.264 decoders. It's an utter mess on single-core processors; even some of the slower dual-core CPUs had trouble with it!
[chart]

MPEG2 1080p
-Ripped from Blu-Ray. No re-encoding.
-MPEG2, 1920x1080p at 24fps. 34Mbps average video bit rate. PCM 5.1 audio at 4608 Kb/s.

MPEG2 is much easier to decode than VC-1 and H.264. Most of the single-core processors can even handle it.
[chart]

H.264 1080i60
-S/W=Software deinterlacing and decoding. H/W=hardware deinterlacing and decoding.
-Ripped from Blu-ray. No re-encoding.
-H.264, 1080i60, 35Mbps. Dolby TrueHD 5.1

This is kind of an ultimate 1080 DirectShow test for these CPUs. Far more punishing than the regular 1080p24 stuff. It features very high bit rates (35Mb/s), software deinterlacing and decoding, higher frame rates, and of course lossless audio decoding.
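
If you want to approximate this load outside of a DirectShow player (a sketch only; the chart below was measured with the players described earlier), you can time a pure software decode plus yadif deinterlace pass with ffmpeg. "movie_1080i60.m2ts" is a placeholder name:

```python
import subprocess
import time

SRC = "movie_1080i60.m2ts"  # placeholder; any interlaced H.264 stream will do

# Decode and deinterlace (yadif mode 1: one frame per field) entirely in
# software, discarding the output so only the CPU cost is measured.
cmd = ["ffmpeg", "-v", "error", "-i", SRC, "-vf", "yadif=1", "-an", "-f", "null", "-"]

start = time.time()
subprocess.run(cmd, check=True)
print(f"decoded + deinterlaced in {time.time() - start:.1f} s")
```

If the run takes longer than the clip itself, that CPU wouldn't keep up in real time.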
[chart]

Google VP9 480p - 4k60
-VP9 encoded YouTube clips, using HTML5 in Chrome 47. I only used 30 and 60 fps videos. 24 fps is too easy. 😀
-None of the video cards I used feature VP9 acceleration, so everything is being decoded by the CPU.
-I used Windows 7 on all of the machines--even the old S478 systems. Why? Because Chrome's performance--particularly video playback--sucks under WinXP. I believe Chrome's graphics routine simply isn't optimized for WinXP any more.
-Why no P3 or Athlon XP CPUs? Chrome doesn't run on processors without SSE2. IE and Firefox will still work on older CPUs, but YouTube streams H.264 to those browsers.
-I have a couple of AMD APU based laptops I'm in the process of testing out. I'll be adding them to the list very soon.

One thing I noticed with the slower systems is that the initial "surge" of data that occurs when a video starts playing burns quite a few CPU cycles. This stresses many of the single core and slower dual-core CPUs, causing them to drop frames. I've measured downstream activity as high as 40-50Mb/s during the first 30 seconds of playback. So, to give the slower systems a better chance, I paused each video for 30 seconds before beginning the playback test. To keep the benchmarks consistent, I did this on every machine.
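
If you're curious about that surge, it's easy to watch with a small sampler like the one below (a sketch using the third-party psutil package, which wasn't part of the original testing). Start it, then hit play:

```python
import time
import psutil  # third-party package: pip install psutil

# Sample total downstream throughput once per second for 30 seconds.
prev = psutil.net_io_counters().bytes_recv
for second in range(30):
    time.sleep(1)
    now = psutil.net_io_counters().bytes_recv
    print(f"t={second + 1:2d}s  {(now - prev) * 8 / 1e6:5.1f} Mb/s down")
    prev = now
```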

NOTE: The initial tests were performed with Chrome 47. The recently added CPUs were benchmarked using Chrome 48 and 50. However, HTML5 video performance between 47 and 50 seems to be identical.
If you can't see all of the results, hold down CTRL and use your scroll wheel to zoom out the page.
[chart]

MPEG-1 at 240p
-VCD compliant MPEG-1 video at 352x240, 30fps, ~1.2 Mb/s. Audio track is stereo MPEG audio at 224Kb/s
-Original source: DVD-Video, MPEG2 720x480i 4x3 NTSC, AC3 2.0
-I couldn't get PowerDVD to run on the 486 (probably uses Pentium instructions) so I used a tiny (under 1MB) program called Roxio VCD Player on both of the Win95 machines. The newer systems used PowerDVD 4.0 with hardware video acceleration disabled.
[chart]

DirectShow 4K Decoding Performance Part 1: H.264 at 24 fps
-H.264 Level 5.1, 3840x2160/24p. 58 Mb/s average video bit rate. Dolby Digital 5.1 @ 640kb/s
-Original source: Me playing GTA V at 4K. 😊
[chart]

DirectShow 4K Decoding Performance Part 2: HEVC at 60 fps
-HEVC 10-bit, 3840x2160@60fps. Average bit rate: 62.5 Mb/s. Dolby Digital 5.1 @ 640kb/s
This is by far the most brutal DirectShow test I'll be doing. HEVC is much harder to decode than H.264, and at 2160p/60fps only one of the CPUs could handle it. Even the 4GHz Phenom II X6 had trouble here. Software HEVC decoding appears to benefit hugely from SSE4 support. I wish I had some LGA1366 equipment I could benchmark. I remember the first-gen i7s being not much faster than Penryn clock-for-clock, at least when they were first released. However, they did have 8 threads and full SSE4 support. It would be interesting to see how well they'd perform here.
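
Since SSE4 keeps coming up with software HEVC decoding, here's a quick way to see which extensions a given machine advertises (a sketch using the third-party py-cpuinfo package; this wasn't part of the benchmarking itself):

```python
# Requires the third-party "py-cpuinfo" package: pip install py-cpuinfo
from cpuinfo import get_cpu_info

flags = set(get_cpu_info().get("flags", []))
for ext in ("sse2", "sse3", "ssse3", "sse4_1", "sse4_2", "avx", "avx2"):
    print(f"{ext:7s} {'yes' if ext in flags else 'no'}")
```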

A newer version of MPC-HC was used for this test. Interestingly, MPC's memory usage ballooned to 1.4GB during playback!
[chart]

DirectShow 4K Decoding Performance Part 3: H.264 at 60 fps and 105Mb/s
-H.264, 3840x2160@60fps. Average bit rate: 105 Mb/s. Dolby Digital 5.1 @ 640kb/s
H.264 is so much easier to decode than HEVC that even the Core 2 Quad can handle it fairly well, despite the massive bit rate. And unlike the HEVC decoder, H.264 doesn't seem to need SSE4 for decent performance.
[chart]

Questions, comments and suggestions are welcome!

Last edited by Standard Def Steve on 2016-07-04, 04:51. Edited 27 times in total.

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 2 of 48, by Tiger433

Rank: Member

What driver version are you using with your Radeon 9800 Pro?

W7 "retro" PC: ASUS P8H77-V, Intel i3 3240, 8 GB DDR3 1333, HD6850, 2 x 500 GB HDD
Retro 98SE PC: MSI MS-6511, AMD Athlon XP 2000+, 512 MB RAM, ATI Rage 128, 80GB HDD
My Youtube channel

Reply 3 of 48, by elianda

Rank: l33t

I got some questions:
Which features exactly are accelerated by the Radeons in combination with PowerDVD 4?

How do you measure the CPU usage?
How do you measure the fps?
Are the videos played from HDD? (ATA or SCSI? DMA?)
Does the graphics mode you use have full overlay+scaling support? And is the same renderer output used in Win2K/XP/7?
Why no direct comparison PDVD4 to MPC-HC in XP to get an overlapping data point?
Which decoding filters (name+version) are used?
Is sound output format set to stereo (2.0)?

Retronn.de - Vintage Hardware Gallery, Drivers, Guides, Videos. Now with file search
Youtube Channel
FTP Server - Driver Archive and more
DVI2PCIe alignment and 2D image quality measurement tool

Reply 5 of 48, by BSA Starfire

Rank: Oldbie

Interesting thread. I did try a few software DVD players on a Cyrix MII 333 (ALi Aladdin V, onboard RAGE PRO Turbo 8MB, 512MB SDRAM, Windows ME); it was mostly a failure. The Cyrix is not a good choice for video playback.

286 20MHz,1MB RAM,Trident 8900B 1MB, Conner CFA-170A.SB 1350B
386SX 33MHz,ULSI 387,4MB Ram,OAK OTI077 1MB. Seagate ST1144A, MS WSS audio
Amstrad PC 9486i, DX/2 66, 16 MB RAM, Cirrus SVGA,Win 95,SB 16
Cyrix MII 333,128MB,SiS 6326 H0 rev,ESS 1869,Win ME

Reply 6 of 48, by Tiger433

Rank: Member

On my Sony Vaio PCG-F707 laptop with a Pentium III 600 and a NeoMagic 256AV+ (3 MB VRAM, AGP bus), I play videos on Win98SE with VLC 0.8.6. First I installed ffdshow rev 1936, then unpacked VLC from the zip, and I can play videos fully smoothly, even with subtitles; movies from DVD are no problem either. But for that I have to switch the desktop to 16-bit color; in 24-bit I can't play anything smoothly. Even in 16-bit color the movies look good to me.

W7 "retro" PC: ASUS P8H77-V, Intel i3 3240, 8 GB DDR3 1333, HD6850, 2 x 500 GB HDD
Retro 98SE PC: MSI MS-6511, AMD Athlon XP 2000+, 512 MB RAM, ATI Rage 128, 80GB HDD
My Youtube channel

Reply 7 of 48, by swaaye

Rank: l33t++

With the K6 you should make sure write allocation is enabled for the video card. I saw a big improvement on a Voodoo3 + Aladdin V + K6-3 setup by enabling it in the AGP GART with ALi's utility. Software playback became smooth.

I'm pretty sure on Intel this is handled automatically by some driver or the BIOS. I had a 300 MHz Katmai playing DVDs in software.

SSE is useful for MPEG2 iDCT and I think that's why it is so helpful.

Reply 8 of 48, by Standard Def Steve

Rank: Oldbie
Tiger433 wrote:

What version of driver you use with your Radeon 9800 Pro ?

To keep things as lean as possible, I used an older driver: Catalyst 4.12.

elianda wrote:

I got some questions:
Which features exactly are accelerated by the Radeons in combination with PowerDVD 4?
Motion compensation and IDCT are handled by the Radeons in PowerDVD. The processor still performs bitstream decoding, even in HW-accelerated mode.

How do you measure the CPU usage?
Not in the most scientific way. 😊 But I believe the following method is Good Enough(tm).
-First, I wait a few minutes for Windows to settle down after booting.
-I then pick a scene with plenty of action and/or fine detail and play 2 minutes of that clip on each setup.
-As the clip plays, I keep track of minimum and maximum CPU usage numbers as displayed by Windows Task Manager. I monitor total CPU usage, not dvdplayer.exe.
-I discard any extreme highs and lows. For example, the A64 3700+ entry in the chart had an actual CPU usage range of 0-21%. However, the system very rarely went below 7% or above 16%.
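
If anyone wants a slightly more repeatable version of this, something like the sketch below would log total CPU usage once a second for two minutes and report a trimmed range (it uses the third-party psutil package; the numbers in the charts were read off Task Manager as described above):

```python
import psutil  # third-party package: pip install psutil

SAMPLES = 120  # two minutes at one reading per second
TRIM = 5       # readings to discard at each extreme

readings = []
for _ in range(SAMPLES):
    # cpu_percent(interval=1) blocks for one second and returns total CPU usage
    readings.append(psutil.cpu_percent(interval=1))

readings.sort()
trimmed = readings[TRIM:-TRIM]
print(f"CPU usage: {trimmed[0]:.0f}%-{trimmed[-1]:.0f}% "
      f"(raw range: {readings[0]:.0f}%-{readings[-1]:.0f}%)")
```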

How do you measure the fps?
Those were really just rough estimates. I've played around with enough benchmarks that I know what 15fps looks like. But the main point of those notes was that the system was too slow to decode the video stream at its full frame rate.

Are the videos played from HDD? (ATA or SCSI? DMA?)
Movies were played from the DVD drive. All of the Win2K and XP systems had PATA drives with DMA enabled. The Win7 systems had SATA DVD drives.

Does the graphics mode you use have full overlay+scaling support? And is the same renderer output used in Win2K/XP/7?
Yes.
Win2K and XP were using the same renderer--PowerDVD doesn't let me choose. MPC-HC in Win7 used EVR CP. Also, DWM and transparent glass effects were enabled on all Win7 machines.

Why no direct comparison PDVD4 to MPC-HC in XP to get an overlapping data point?
Yeah, I should've done that. I probably will retest the faster XP machines with MPC-HC and add results to the chart later on. I originally picked PowerDVD for the 2K/XP machines to give the K6 and PII machines a bit of a boost. The LAV software decoder in MPC-HC outputs sharper looking video than the ancient version of PowerDVD, but it does burn more CPU cycles in the process. I believe it also requires SSE. So, to keep things consistent in NT5 land, I just used PowerDVD on all of them.

Which decoding filters (name+version) are used?
The XP/PowerDVD machines just used the standard Cyberlink MPEG2 decoder built into that version of PowerDVD.
Win7/MPC-HC used LAV video decoder 0.58.2, which is a slightly modified ffmpeg.

Is sound output format set to stereo (2.0)?
Yes

GL1zdA wrote:

Why are you using PowerDVD for the older ones?

PowerDVD 4 is very lean and works well with older hardware. Cyberlink's old MPEG2 decoder doesn't require SSE and is much faster than the relatively recent version of LAV/ffmpeg I used on the newer machines. Also, modern decoders don't appear to support older hardware acceleration methods used by cards like the Radeon VE and 9800 Pro; they just drop back to software mode.
While I could've used MPC-HC on the PIII and newer machines, I decided to keep using PowerDVD so that the newer CPUs could be directly compared to the older ones.

BSA Starfire wrote:

Interesting thread. I did try a few software DVD players on a Cyrix MII 333 (ALi Aladdin V, onboard RAGE PRO Turbo 8MB, 512MB SDRAM, Windows ME); it was mostly a failure. The Cyrix is not a good choice for video playback.

I'd love to get my hands on some of the less common S7 chips as well as the 100+MHz 486 processors and do an MPEG-1/VCD comparison.

swaaye wrote:

With the K6 you should make sure write allocation is enabled for the video card. I saw a big improvement on a Voodoo3 + Aladdin V + K6-3 setup by enabling it in the AGP GART with ALi's utility. Software playback became smooth.

Hmm, I might have to look into that. None of the K6 machines are assembled at the moment, but if either of the S7 boards had that setting in the BIOS, it's definitely something I would've enabled.

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 9 of 48, by swaaye

Rank: l33t++
Standard Def Steve wrote:

Hmm, I might have to look into that. None of the K6 machines are assembled at the moment, but if either of the S7 boards had that setting in the BIOS, it's definitely something I would've enabled.

Yeah on my ASUS P5A, it had to be enabled with either registry tweaks or the ALI AGP utility (basically a GUI for registry tweaks to their driver).

I believe you can use a utility like Central Tweaking Unit to enable write combining / write allocation with any video card.

It's a lot of effort for something that they should've gotten working automatically like Intel did.

Reply 10 of 48, by elianda

Rank: l33t

CTCM allows you to check whether write allocation, write combining and the cacheable area are set correctly.

Retronn.de - Vintage Hardware Gallery, Drivers, Guides, Videos. Now with file search
Youtube Channel
FTP Server - Driver Archive and more
DVI2PCIe alignment and 2D image quality measurement tool

Reply 11 of 48, by Scali

Rank: l33t

Video encoding and decoding is an interesting problem.
There are many variables to play with. For example, the MPEG format can be seen as a stream of a 'JPG image' (I-frame) followed by some delta-encoded information (motion blocks/vectors, mostly P-frame).
Now, different CPUs have different strengths and weaknesses.
So, one CPU may be faster if you encode the video with relatively many I-frames, while another CPU may be faster with more P-frames.
And then there is the issue that using more I-frames will result in lower compression, so you will require more bandwidth to read the video data.
Of course you could reduce the quality settings so the I-frames will become smaller again, reducing bandwidth... etc.

So what I'm saying is: it would be interesting to encode the same video with various different settings, to see how the different versions of the video perform on the same CPU.
Of course the encoder and decoder used can also affect things. Not all codecs perform the same, both in terms of speed and in terms of quality.
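
If someone wanted to actually run that experiment, a minimal sketch would be to re-encode one clip at a fixed quality with different I-frame intervals and then compare file size and decode time (assumes ffmpeg with libx264; the file names are placeholders):

```python
import os
import subprocess
import time

SRC = "test_clip.mkv"  # placeholder source clip

# Same source, same quality target (CRF), different I-frame intervals (-g).
for gop in (12, 48, 240):
    out = f"clip_gop{gop}.mkv"
    subprocess.run(["ffmpeg", "-v", "error", "-y", "-i", SRC,
                    "-c:v", "libx264", "-crf", "20", "-g", str(gop),
                    "-an", out], check=True)
    start = time.time()
    # Decode-only pass, output discarded, to compare playback cost.
    subprocess.run(["ffmpeg", "-v", "error", "-i", out,
                    "-f", "null", "-"], check=True)
    print(f"GOP {gop:3d}: {os.path.getsize(out) / 1e6:6.1f} MB, "
          f"decoded in {time.time() - start:.1f} s")
```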

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 12 of 48, by Standard Def Steve

Rank: Oldbie

Added a few more charts!

-WMV at 480p
-xvid at 480p
-xvid at 720p

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 14 of 48, by alexanrs

Rank: l33t

I wonder how a dual PIII-S system would fare. Those CPUs are beastly when you remember how old they are. One would not think they are capable of HD video playback.

Reply 15 of 48, by Standard Def Steve

Rank: Oldbie
F2bnp wrote:

Loving this thread! It'd be neat if you had some K6-2+/K6-III+ CPUs for comparison, I wonder if on-die L2 cache helps in any meaningful way.

I've been toying with the idea of setting up a K6 Super System. Something like a K6-3+ overclocked to around 600MHz. I wonder if it would catch up to the 300A @ 450 in video decoding.

alexanrs wrote:

I wonder how a dual PIII-S system would fare. Those CPUs are beastly when you remember how old they are. One would not think they are capable of HD video playback.

Dual PIII-S on an overclockable DDR board is something I want in a bad, bad way. Prices for such boards are insane here in Canada, so I'll probably never own one.

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 16 of 48, by Standard Def Steve

Rank: Oldbie

Added new charts!
-H.264 at 852x480
-H.264 at 1280x720

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 17 of 48, by Standard Def Steve

Rank: Oldbie

Added high bit rate H.264 at 1080p.

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190

Reply 18 of 48, by Standard Def Steve

Rank: Oldbie

Added 1080p VC-1. Software decoders for this format don't seem to be as efficient as software AVC decoders.

P6 chip. Triple the speed of the Pentium.
Tualatin: PIII-S @ 1628MHz | QDI Advance 12T | 2GB DDR-310 | 6800GT | X-Fi | 500GB HDD | 3DMark01: 14,059
Dothan: PM @ 2.9GHz | MSI Speedster FA4 | 2GB DDR2-580 | GTX 750Ti | X-Fi | 500GB SSD | 3DMark01: 43,190