VOGONS


Reply 40 of 48, by rasz_pl

Rank l33t

The "VGA minimal acceleration" is a trick where you can force the VGA to write 2 or 4 bytes for every byte you write. It's clunky and difficult to handle: it forces unchained mode, where the framebuffer is no longer a linear stream of bytes, and it works only for consecutive strips of bytes. See http://www.brackeen.com/vga/unchain.html#4

Doom's "low detail mode" actually takes advantage of this by writing 160x200 = 32,000 bytes for every frame and letting the VGA chip automagically expand that into a 320x200 picture. Sounds great in theory; in practice this hurt Doom performance on CPUs faster than a 486 running at ~40-50 MHz, so yes, this is similar to the S3 decelerator situation 😀
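
To make the byte-doubling trick concrete, here is a minimal sketch in Python that only simulates the idea; it does not touch real hardware. It assumes the usual unchained 320x200 layout (pixel x in plane x % 4, 80 addresses per scanline) and a map mask with one bit per plane. The function names are made up for illustration, and I have not checked this against Doom's actual source.

```python
# Toy model of unchained ("Mode X") VGA memory: four planes that share one
# address space. Simulation only; names here are hypothetical.
PLANE_SIZE = 65536
ROW_BYTES = 80  # 320 pixels / 4 planes = 80 addresses per scanline


def make_planes():
    return [bytearray(PLANE_SIZE) for _ in range(4)]


def vga_write(planes, offset, value, map_mask):
    """One CPU byte write; it lands in every plane whose map-mask bit is set."""
    for plane in range(4):
        if map_mask & (1 << plane):
            planes[plane][offset] = value


def read_pixel(planes, x, y):
    """Pixel x lives in plane x % 4 at address y * 80 + x // 4."""
    return planes[x % 4][y * ROW_BYTES + x // 4]


def put_low_detail_pixel(planes, col, y, color):
    """Draw one pixel of a 160-wide image as two adjacent 320-wide pixels."""
    x = col * 2                   # left pixel of the doubled pair
    mask = 0b0011 << (x % 4)      # planes 0+1 or planes 2+3
    vga_write(planes, y * ROW_BYTES + x // 4, color, mask)
```

One CPU write per low-detail pixel ends up as two screen pixels, which is where the halved write count comes from. Setting all four mask bits would give the 4-bytes-per-write case.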

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 41 of 48, by och

Rank Newbie

It's no surprise that software back then didn't take advantage of graphics acceleration, FPU, MMX, and other such technologies until 3D GPUs became popular and OpenGL/DirectX started to mature. Explicitly supporting so much complexity and variety of hardware simply wasn't practical. I remember back in the 90s I didn't understand how consoles, with comparatively slow CPUs and tiny amounts of memory, were able to outperform PCs. I did not know that most PC software simply did not utilize a lot of the hardware's capabilities.

I remember working and saving the entire summer to purchase my first Pentium PC. Still a teenager, I finally bought one at a computer fair: a Pentium 166 with a 1MB Cirrus Logic graphics adaptor. The PC easily shredded through older 2D games and handled 2.5D games such as Doom and Duke3D pretty well, but struggled with some early "true 3D" games. In particular I remember getting pretty low FPS in the original Quake, NFS2 and Fatal Racing. I asked around, and someone suggested I upgrade my video card to a Matrox Millennium, which was around $400 back then, a huge amount for a high school student. Nevertheless, I saved up and purchased a 4MB Millennium. With all the fancy claims on the box, I was expecting it to be twice as fast as my old Cirrus Logic, but in games there was no noticeable difference. The Windows 95 interface was a lot snappier on the Millennium, but at that time I was not particularly concerned with that. I wanted smooth frame rates in 3D games, and once I got my first 3DFX Voodoo card my jaw dropped seeing Glide-enabled games for the first time.

I was also recently watching a podcast. It is a bit too technical for me, but it turns out that Windows always had 2D GUI acceleration, supported to a certain degree by various graphics adaptors. For the last 10 years or so, though, that hardware acceleration has been dropped, and GPU drivers fully render the Windows GUI in software.

https://youtu.be/rdjmtFExuC4

Reply 42 of 48, by och

Rank Newbie

Another question: what about multicore CPUs or even multi-CPU systems? Do games and other software have to be coded to use multiple cores/CPUs, or do they benefit automatically if the OS supports such configurations?

Reply 43 of 48, by Jo22

Rank l33t++

^It depends, I think. If a Windows program/game uses threading, it will benefit more from that.

Otherwise, Windows will simply spread its own workload across all available cores,
which in turn will allow the program/game to hog the first core more.
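
As a rough illustration of what "uses threading" means in practice, here is a small Python sketch. The per-band work function is a made-up stand-in, not anything from a real game. The point is only the structure: the program itself has to cut the work into pieces before an OS scheduler has anything to place on other cores. Note that in CPython the global interpreter lock means this particular pure-Python workload won't actually run faster; a native game would do the same split with OS threads.

```python
from concurrent.futures import ThreadPoolExecutor


def shade_rows(rows):
    """Stand-in for per-frame work on a band of scanlines (hypothetical)."""
    return [sum(row) % 256 for row in rows]


def render_single(frame):
    # One thread of execution: only one core can work on this at a time.
    return shade_rows(frame)


def render_threaded(frame, workers=4):
    # The program splits the frame into bands, one task per band.
    band = max(1, (len(frame) + workers - 1) // workers)
    bands = [frame[i:i + band] for i in range(0, len(frame), band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(shade_rows, bands))  # results keep band order
    return [value for part in results for value in part]
```

Both functions return the same result; only the second one gives the scheduler independent pieces of work.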

Personally, I think that this started with Windows 2000/XP and the Pentium 4.
Some versions of the Pentium 4 had Hyper-Threading, which simulated a second core/processor (which is different from a real second processor; two physical processors were the max).

Edit: There were earlier dual-CPU PCs from the 90s that Windows NT supported.
But that was in the Windows NT 3.x/4 days, before 3D games and DirectX on Windows NT had taken off.
Windows NT 3.1 supported a rare dual 80386 PC, for example.

Anyway, I just mention it for the sake of fairness.
Some of these early dual-CPU PCs don't matter here, unless they get retrofitted with an upgrade to Windows 2k/XP.

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 44 of 48, by Deano

Rank Newbie
rasz_pl wrote on 2023-12-26, 21:59:

Descent 1/2 don't use the FPU in software rendering mode.

Maybe it was Forsaken, one of them did.

rasz_pl wrote on 2023-12-26, 21:59:
Deano wrote on 2023-12-26, 14:10:

In early 3D accelerated games we still didn't use floats that much because 3D cards didn't consume floats; triangle setup engines only appeared in 2nd gen cards (Voodoo II and RIVA for example).

3D cards do consume floats, that's how you get subpixel precision. 3dfx cards accept either single precision floats or fixed point for triangles, color, alpha, and texture coordinates.
The "triangle setup" unit of the Voodoo 2:
1. culls triangles facing away from the camera.
2. allows drawing triangle strips and fans (less data to transfer because coordinates of the previous triangle are being reused).
It has nothing to do with the triangle data format.

You are correct that it could consume floats; my memory has faded. Internally, though, it used fixed point and converted the input floats into its fixed-point format. Subpixel precision was available in the fixed-point vertex format, and that is what the chip actually used.
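
For anyone unfamiliar with how fixed point carries subpixel precision, here is a tiny Python sketch. The choice of 4 fractional bits (1/16 pixel steps) is an assumption for illustration; check the chip documentation for the real vertex format.

```python
FRAC_BITS = 4           # assumed: 4 fractional bits, i.e. 1/16 pixel steps
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(coord):
    """Convert a float screen coordinate to fixed point (round to nearest)."""
    return int(round(coord * ONE))


def from_fixed(value):
    """Convert a fixed-point value back to a float."""
    return value / ONE


def split_fixed(value):
    """Return (whole pixel, subpixel sixteenths) of a non-negative fixed value."""
    return value >> FRAC_BITS, value & (ONE - 1)
```

A coordinate like 10.3 becomes the integer 165, which reads back as pixel 10 plus 5/16. The fraction survives the conversion to integers, so edges can still be placed between pixel centres.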

Back then (at least in the circles I was in) "Triangle Setup" referred to the calculation of the interpolants. At the HW level (below Glide) the GPU doesn't calculate interpolants, so it's the CPU that calculates them.
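
A rough sketch of what "calculating the interpolants" involves, in Python for readability. This is the generic plane-equation approach for one per-vertex value (a colour channel, depth, or a texture coordinate), not a description of what any particular driver or chip did; the function names are hypothetical.

```python
def setup_gradients(v0, v1, v2):
    """Each vertex is (x, y, value). Returns (d_value/dx, d_value/dy)."""
    x0, y0, c0 = v0
    x1, y1, c1 = v1
    x2, y2, c2 = v2
    # Twice the signed area of the triangle; zero means it is degenerate.
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area2 == 0:
        raise ValueError("degenerate triangle")
    dcdx = ((c1 - c0) * (y2 - y0) - (c2 - c0) * (y1 - y0)) / area2
    dcdy = ((c2 - c0) * (x1 - x0) - (c1 - c0) * (x2 - x0)) / area2
    return dcdx, dcdy


def value_at(v0, gradients, x, y):
    """Evaluate the interpolated value at (x, y), starting from vertex v0."""
    dcdx, dcdy = gradients
    return v0[2] + dcdx * (x - v0[0]) + dcdy * (y - v0[1])
```

The divide and the handful of multiplies have to be repeated for every interpolated value of every triangle, which is why it mattered whether the CPU or the card did this work.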

I only interacted with that level of HW when working on a Voodoo 1-based arcade renderer, so I assumed that the advertised "Triangle Setup" included calculating the interpolants, but perhaps it didn't.

Last edited by Deano on 2023-12-27, 06:48. Edited 1 time in total.

Game dev since last century

Reply 45 of 48, by Deano

Rank Newbie
rasz_pl wrote on 2023-12-27, 03:21:

The "VGA minimal acceleration" is a trick where you can force the VGA to write 2 or 4 bytes for every byte you write. It's clunky and difficult to handle: it forces unchained mode, where the framebuffer is no longer a linear stream of bytes, and it works only for consecutive strips of bytes. See http://www.brackeen.com/vga/unchain.html#4

Doom's "low detail mode" actually takes advantage of this by writing 160x200 = 32,000 bytes for every frame and letting the VGA chip automagically expand that into a 320x200 picture. Sounds great in theory; in practice this hurt Doom performance on CPUs faster than a 486 running at ~40-50 MHz, so yes, this is similar to the S3 decelerator situation 😀

The VGA also has a single free logic op, which can be useful (the classic example is arranging the palette with a dark set of colours, so you can change one bit and get the darker colours). But as you said, once CPUs got fast enough that it was better to work in local memory, it stopped being useful.
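
A small Python sketch of that palette arrangement, under the assumption that bit 7 of the pixel index selects between a bright colour and its dark twin. The halving of each channel and the function names are just for illustration.

```python
def build_palette(bright):
    """bright: 128 (r, g, b) tuples. Returns 256 entries: dark half, then bright."""
    if len(bright) != 128:
        raise ValueError("expected 128 base colours")
    dark = [(r // 2, g // 2, b // 2) for r, g, b in bright]
    return dark + list(bright)   # index i is the dark twin of index i | 0x80


def darken(pixels):
    """Clear bit 7 of every pixel index, which is what an AND with 0x7F does."""
    return bytes(p & 0x7F for p in pixels)
```

With the palette laid out like this, darkening a region needs no table lookup or colour maths: a single bitwise AND per byte moves every pixel to its dark twin.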

A lot of optimisations only work for a limited time (in HW terms), so in the 286 era using the internal 32-bit VGA bus is great; on a 386+ it's now a loss.


Reply 46 of 48, by Jo22

Rank l33t++

It's not really a feature of VGA, but rather of its RAMDAC, but palette cycling was kind of an intelligent feature, too.
Animation could be simulated without the need for transferring lots of data.
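
A minimal Python sketch of the idea: the pixel data never changes, only a range of palette entries gets rotated each frame. The function name and the inclusive first/last range are my own choices for illustration.

```python
def cycle_palette(palette, first, last, step=1):
    """Return a copy of palette with entries first..last (inclusive) rotated."""
    span = list(palette[first:last + 1])
    if not span:
        return list(palette)
    step %= len(span)
    # Move the last `step` entries of the span to its front.
    rotated = span[-step:] + span[:-step] if step else span
    return list(palette[:first]) + rotated + list(palette[last + 1:])
```

Rotating, say, 16 entries means uploading a few dozen bytes of palette data instead of rewriting up to 64,000 bytes of pixels, yet everything drawn in those colours appears to move.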

Here are some examples from the Amiga, which had the same capability.

[Attachment: SAK_NightFlight.gif, "Night Flight" by Sheryl Knowles, done with Graphicraft; https://amiga.lychesis.net/applications/Graphicraft.html]

On DOS, the intro/title screen of Descent 1 might be a good counterpart.
The fading effect ("Descent") was done with palette rotation/cycling.


Reply 47 of 48, by Deano

Rank Newbie
Jo22 wrote on 2023-12-27, 07:18:

It's not really a feature of VGA, but rather of its RAMDAC, but palette cycling was kind of an intelligent feature, too.

No, the RAMDAC also has a palette mask field, but the VGA write modes (2+3 IIRC) can apply an 8-bit logic op (AND/OR/XOR) to the incoming data. It was originally designed to ease 4-bit pixel modes (EGA/VGA 16-colour) but was also useful in a few ways in 256-colour modes. It meant you could make small changes to the data without the CPU having to change the data manually, potentially saving a few instructions per pixel.
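
Here is a toy Python model of that logic op for a single plane, to show where the saving comes from. The function codes follow the Function Select field of the Graphics Controller's Data Rotate register as I remember it (0 = replace, 1 = AND, 2 = OR, 3 = XOR); the class itself is purely a simulation and ignores the bit mask, rotate and set/reset stages.

```python
FUNC_REPLACE, FUNC_AND, FUNC_OR, FUNC_XOR = 0, 1, 2, 3


class ToyPlane:
    """One byte-wide VGA plane with its read latch (simulation only)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.latch = 0
        self.function = FUNC_REPLACE

    def read(self, offset):
        # A CPU read loads the latch as a side effect.
        self.latch = self.mem[offset]
        return self.latch

    def write(self, offset, cpu_byte):
        # The selected logic op combines the CPU byte with the latched byte.
        if self.function == FUNC_AND:
            cpu_byte &= self.latch
        elif self.function == FUNC_OR:
            cpu_byte |= self.latch
        elif self.function == FUNC_XOR:
            cpu_byte ^= self.latch
        self.mem[offset] = cpu_byte & 0xFF
```

The CPU only issues a read and a write to the same address; the AND/OR/XOR itself happens inside the card, so no register-side arithmetic is needed.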

Very minimal but useful free acceleration.


Reply 48 of 48, by DrAnthony

Rank Newbie
Deano wrote on 2023-12-26, 14:16:
Gmlb256 wrote on 2023-12-26, 13:47:

MMX is a general-purpose SIMD instruction set and software needs to have explicit support for it. Some games such as Rebel Moon Rising, Extreme Assault and POD used it for software rendering, but performance pales in comparison to real 3D hardware accelerators.

MMX is integer only; it wasn't until SSE with the Pentium 3 that we got float SIMD, which was useful for accelerating geometry transforms if you needed high vertex throughput, but most games didn't.

That's true on the Intel side at least, but AMD brought 3DNow! with the K6-2 a bit earlier. It definitely made a huge difference when implemented on both the driver side (like 3DFX) and the game engine side (Quake and Quake 2, I believe).