VOGONS


Reply 40 of 43, by kool kitty89

User metadata
Rank: Member
Scali wrote:
kool kitty89 wrote:

That said, it's interesting to note that integer multiplication performed worse than FPU multiplication on all the 1995 CPUs except Intel's own 486.

For the Pentium there is a simple explanation for that:
Intel did not implement a dedicated integer multiply circuit. Instead, a multiply was performed on the FPU. So it was the same unit doing both mul and fmul, but the integer version had some extra overhead.
I am not sure if others such as AMD and Cyrix followed Intel's approach here.

The integer and floating point computation times in the 686 trials seemed to show several CPUs with proportionally similar speeds for FP and integer execution, though usually with significantly faster integer results. I don't think literally handing the operation over to the FPU and coercing the result back to integer afterwards (or vice versa) would be the only explanation there, given that partially shared circuitry for similar integer and floating point operations might be used instead (shared or partially shared functional units, but not literal microcoded coercion/swap operations).
Designing things that way would also eliminate a lot of the value of even considering an FPU-less variant of the CPU, but it would certainly have limited CPU/FPU parallelism. (And in most cases far more logic and pipelining was dedicated to keeping the integer operations moving as quickly as possible than the floating point ones.)

There are several references to the Cyrix 5x86 and 6x86 FPU being derived from their previous 387 and 486 FPUs (faster execution than the Intel or AMD 486, with added register optimization that sped things up a bit more on the 6x86, but nowhere near Pentium class), but that doesn't mean those CPUs weren't engineered to minimize redundant circuitry. (Still, the FPU FIFO used, and the parallel execution it implies, really points to entirely discrete functional units.)

Cyrix didn't actually have very fast integer or FPU computation compared to anything other than the 486; they got their speed from cache efficiency and fast execution of certain logical operations. They did have exceptionally fast integer and floating point division (especially integer), but that really wouldn't improve performance much on the whole, and I can only guess it's a fluke of other circuitry design that allowed it as a bonus feature. (There's no rational reason for so much bias towards division; on the contrary, AMD's bias against division performance and in favor of everything else on the K5 made perfect sense. The K5 also has so much integer parallelism on the RISC execution end that I could see them sacrificing some of it to allow parallel FPU operations to be fed into shared functional units; the same goes for the K6.)

The fast integer divide could have been a quirk favoring better perspective-correct (integer-oriented) texture mapping on Cyrix CPUs (the K6 and presumably the NX586 had fast integer divide too, very close to as fast as the P6), but that sort of thing never materialized.
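To illustrate why divide throughput matters for that: a toy perspective-correct inner loop (not any shipped engine's code; the 16.16 fixed-point format, 64x64 texture and per-pixel divide are just assumptions for the sketch) puts a reciprocal divide right in the middle of the loop, so the divider's latency sits on the critical path for every pixel.

#include <stdint.h>

/* Toy perspective-correct span: u/z, v/z and 1/z are stepped linearly in
 * screen space, and recovering u and v needs a divide per pixel. */
void persp_span(uint8_t *dst, const uint8_t *tex, int len,
                int32_t uz, int32_t vz, int32_t iz,       /* u/z, v/z, 1/z in 16.16 */
                int32_t duz, int32_t dvz, int32_t diz)    /* per-pixel steps, 16.16 */
{
    for (int i = 0; i < len; i++) {
        int32_t z = (int32_t)((1LL << 32) / iz);          /* the divide; iz assumed > 0 */
        int u = (int)(((int64_t)uz * z) >> 32) & 63;      /* 64x64 texture assumed */
        int v = (int)(((int64_t)vz * z) >> 32) & 63;
        dst[i] = tex[v * 64 + u];
        uz += duz; vz += dvz; iz += diz;
    }
}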

I have no idea how any of that might apply to the 486s and their own FPU vs ALU performance oddities (or why Intel 486s seem to have exceptionally fast integer multiply).

Scali wrote:
kool kitty89 wrote:

(and likely ray-casting to organize that list and minimize overdraw)

I don't see how ray-casting would work for this case actually. Ray-casting a generic 'polygon soup' list of triangles will be very expensive.
Z-sorting is very efficient, and with a simple affine texturemapper, it is basically faster to draw a pixel than to perform a z-test. So you don't care that much about overdraw.

I recall references to some polygon renderers using ray-casting for scene visibility in some sort of sector arrangement. I forget if some of those references were to Quake, but the context would be different there, given it renders rather differently from most other software renderers up to that point.
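For reference, a minimal sketch of the kind of back-to-front Z-sort being discussed, with hypothetical names and each polygon reduced to a single representative depth:

#include <stdlib.h>

typedef struct {
    float avg_z;    /* representative depth, e.g. average of the vertex z values */
    int   id;       /* index into the scene's polygon list */
} PolyRef;

static int cmp_back_to_front(const void *a, const void *b)
{
    float za = ((const PolyRef *)a)->avg_z;
    float zb = ((const PolyRef *)b)->avg_z;
    return (za < zb) - (za > zb);       /* larger z (farther) sorts first */
}

void draw_scene(PolyRef *polys, int count, void (*draw_poly)(int id))
{
    qsort(polys, count, sizeof(PolyRef), cmp_back_to_front);
    for (int i = 0; i < count; i++)
        draw_poly(polys[i].id);         /* nearer polygons simply overwrite farther ones */
}

Overdraw is just accepted, which is the trade-off described above: drawing a pixel again is cheaper than testing it.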

I think the Yeti3D demo on the GBA (and 32X) uses a square grid of screen subdivisions for deciding whether a portion of a polygon is visible, sort of a low-resolution depth buffer (I'd assume 16-bit Z values, but the screen resolution is very low). It potentially reduces overdraw somewhat, as well as polygon-sorting Z errors. (Z errors end up as square cookie-cutter artifacts rather than entire polygons overlapping in the wrong areas. Tomb Raider II's software Z-sorting algorithm does this really badly in places; DOS, PSX, and Saturn Tomb Raider might be as bad, but it seems less noticeable.) Yeti3D uses something like 8x8 pixel cells, I think.
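A rough sketch of that coarse-grid idea, with the 8x8 cell size, GBA-like 240x160 resolution, and names assumed rather than taken from Yeti3D:

#include <stdint.h>
#include <string.h>

#define SCREEN_W 240
#define SCREEN_H 160
#define CELL     8
#define CELLS_X  (SCREEN_W / CELL)
#define CELLS_Y  (SCREEN_H / CELL)

/* One 16-bit depth value per 8x8 cell: a very low resolution Z-buffer.
 * Sorting errors then show up as 8x8 "cookie-cutter" squares rather than
 * whole polygons popping through each other. */
static uint16_t cell_z[CELLS_Y][CELLS_X];

void clear_cell_z(void)
{
    memset(cell_z, 0xFF, sizeof(cell_z));       /* 0xFFFF = farthest */
}

/* Returns 1 if the polygon fragment covering this pixel's cell should be drawn. */
int cell_visible(int px, int py, uint16_t frag_z)
{
    int cx = px / CELL, cy = py / CELL;
    if (frag_z < cell_z[cy][cx]) {              /* nearer than what the cell holds */
        cell_z[cy][cx] = frag_z;
        return 1;
    }
    return 0;
}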

kool kitty89 wrote:

The column renderer method might favor the 486 a bit more than polygon renderers of the same period (with or without texture mapping) due to making better use of the 32-bit wide bus/registers

What do you mean by that? The columns are just 1 pixel wide, which amounts to 1 byte in 256-colour mode. Since you draw vertically, you can't do 16-bit or 32-bit writes. So there isn't much difference between polygon rendering and column rendering, bus-wise. In both cases you do one byte at a time.
I would say that a polygon renderer can actually make better use of the 32-bit registers, because you can do more accurate/efficient fixedpoint interpolation with 32-bit registers. For column rendering that doesn't apply.

I think I phrased that oddly or wrongly. I also wrote contradictory claims elsewhere in my comments.

32-bit optimization for horizontal line/span renderers (including line-fill oriented polygon renderers) would make more efficient use of 32-bit registers, while one-pixel-wide column renderers would not at all (aside from some optimization for horizontal column scaling to 2/3/4 bytes wide, and rendering 2 bytes at a time for the low-detail mode, or 2 or 4 bytes at a time for scaled columns).
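A small sketch of that difference, assuming a 320-byte-wide mode-13h-style linear framebuffer (the names are mine): a horizontal span can pack four pixels into each 32-bit store, while a one-pixel-wide column steps by the screen pitch and writes single bytes no matter how wide the registers are.

#include <stdint.h>

#define SCREEN_W 320    /* mode 13h style linear framebuffer assumed */

/* Horizontal flat span: pack 4 pixels into one register and write dwords,
 * so a 32-bit bus and 32-bit registers are actually used. */
void fill_span32(uint8_t *dst, uint8_t colour, int len)
{
    uint32_t c4 = colour * 0x01010101u;
    while (((uintptr_t)dst & 3) && len) { *dst++ = colour; len--; }   /* align */
    while (len >= 4) { *(uint32_t *)dst = c4; dst += 4; len -= 4; }
    while (len--) *dst++ = colour;
}

/* One-pixel-wide column: every write lands SCREEN_W bytes apart, so each
 * pixel is a separate byte store regardless of register width. */
void fill_column8(uint8_t *dst, uint8_t colour, int height)
{
    while (height--) { *dst = colour; dst += SCREEN_W; }
}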

The only advantages the 486 would have over the 386 for those 8-bit operations would be (I think) faster/preferential pipelined execution for some of those operations and some benefit from the on-die cache (or from any cache at all, in the case of some 386s and most 386SXs). Rendering the horizontal floor/ceiling spans would also be faster from the cache, and presumably has better 32-bit register optimization as well.

Though on the topic of fast 8-bit operations, I'd think that would give some 286-optimized games an edge (per clock) over the 386 and 486, or at the very least over cacheless 386s. (All would be running in real mode too, so no speed gains from linear addressing on the 386.) I've played Wolfenstein 3D on a 20 MHz 286 system, but haven't seen it played on a 20 MHz 386DX or SX (or one of those oddball 486SX-20s) ... or any 386SX for that matter.

kool kitty89 wrote:

Quake could probably get away with setting perspective only once per line

I doubt it, since Quake generally renders very large polygons for walls and floors. They would distort heavily over a long run of pixels.

Hmm ... it would probably look like Tomb Raider set to low detail (but without the sub-pixel and sorting errors), lots of warping in the horizontal plane in large spans. It didn't look nice, but it was still the most playable option for a lot of systems unless you wanted a postage stamp sized window. (well ... unless you try running it on a 386 ... in which case it's not really playable even then, but I know someone who did that in an exercise of curiosity ... or futility 🤣 )

I'm pretty sure the PlayStation renders very similarly to that, actually: not full 2-axis affine mapping (simple X/Y linear interpolation) but interpolation only in the horizontal, with perspective calculated once per raster line of a polygon (so very similar to affine line rasterizers in some PC games). Texture warping in DOS Tomb Raider (and the software renderer for Tomb Raider II) looks a lot like that of PSX games with limited/poor polygon subdivision, whereas disabling perspective correction entirely in a hardware-accelerated engine results in a really weird, totally zero-perspective, swimming zig-zag polygon soup. (No bowing fish-eye texture effect, just a full-on ugly, intolerable mess. ATI's Rage II series seems to use some sort of coarser subdivision when perspective correction is enabled that still shows some distortion at times; I'm not sure if it's line- or polygon-subdivided, but some utilities offered perspective correction detail levels too -I haven't messed with any of them myself though. S3's ViRGE shows no such perspective errors when correction is enabled.)

I think this is basically a sort of constant-Z span rendering, or functionally similar (just accepting the errors in favor of speed, rather than customizing an entire game design around the limited parameters of perspective-correct rendering). Of course, the PSX GPU also restricts rendering to triangles and strips/fans composed of triangles.
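Something like the following is what I mean by "perspective once per line" (a rough sketch with assumed names, a 16.16 format and a 64x64 texture, not the PlayStation's or Tomb Raider's actual code): perspective-correct u,v are recovered only at the two ends of each scanline, and everything in between is plain affine stepping, which is exactly what produces the horizontal warping on long spans near the camera.

#include <stdint.h>

void textured_line(uint8_t *dst, const uint8_t *tex, int len,
                   float uz0, float vz0, float iz0,    /* u/z, v/z, 1/z at the left edge  */
                   float uz1, float vz1, float iz1)    /* u/z, v/z, 1/z at the right edge */
{
    if (len <= 0)
        return;

    /* Two divides per line instead of one per pixel. */
    float u0 = uz0 / iz0, v0 = vz0 / iz0;
    float u1 = uz1 / iz1, v1 = vz1 / iz1;

    int32_t u  = (int32_t)(u0 * 65536.0f);             /* 16.16 fixed point */
    int32_t v  = (int32_t)(v0 * 65536.0f);
    int32_t du = (int32_t)((u1 - u0) * 65536.0f / len);
    int32_t dv = (int32_t)((v1 - v0) * 65536.0f / len);

    for (int i = 0; i < len; i++) {
        dst[i] = tex[((v >> 16) & 63) * 64 + ((u >> 16) & 63)];   /* 64x64 texture */
        u += du; v += dv;
    }
}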

kool kitty89 wrote:

But it appears to be a generic convex N-gon renderer, just like Quake. So it can do triangles, quads and more. There is a constant MAX_POINTS_IN_POLY, set to 100: https://github.com/videogamepreservation/desc … aster/3D/3D.INC

Ah ... somehow Quake's non-triangle-bound nature never came up in discussions or literature I've come across before. Thanks. (though it does make a lot of my other rambling commentary from earlier rather pointless)

This is mainly interesting when you use (approximate) perspective texturemapping. Namely, with proper perspective, the interpolation of gradients is the same across the entire N-gon's surface. If you were to render with affine texturemapping, it would distort too much, and you'd want to treat the N-gon as a triangle-fan, where you render each triangle separately, performing setup for interpolation for every triangle in the N-gon.
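A small sketch of that property (structure and field names are mine): because u/z, v/z and 1/z vary linearly in screen space across a planar polygon, the gradients computed from any three non-collinear vertices hold for the whole N-gon, so this setup runs once instead of once per triangle of a fan.

typedef struct { float x, y; float uz, vz, iz; } Vtx;    /* screen x,y plus u/z, v/z, 1/z */

typedef struct { float dudx, dudy, dvdx, dvdy, dzdx, dzdy; } Gradients;

Gradients ngon_gradients(const Vtx *a, const Vtx *b, const Vtx *c)
{
    float det = (b->x - a->x) * (c->y - a->y) - (c->x - a->x) * (b->y - a->y);
    float inv = 1.0f / det;     /* assumes the three vertices are not collinear */
    Gradients g;
    g.dudx = ((b->uz - a->uz) * (c->y - a->y) - (c->uz - a->uz) * (b->y - a->y)) * inv;
    g.dudy = ((c->uz - a->uz) * (b->x - a->x) - (b->uz - a->uz) * (c->x - a->x)) * inv;
    g.dvdx = ((b->vz - a->vz) * (c->y - a->y) - (c->vz - a->vz) * (b->y - a->y)) * inv;
    g.dvdy = ((c->vz - a->vz) * (b->x - a->x) - (b->vz - a->vz) * (c->x - a->x)) * inv;
    g.dzdx = ((b->iz - a->iz) * (c->y - a->y) - (c->iz - a->iz) * (b->y - a->y)) * inv;
    g.dzdy = ((c->iz - a->iz) * (b->x - a->x) - (b->iz - a->iz) * (c->x - a->x)) * inv;
    return g;
}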

kool kitty89 wrote:

Also odd that seams seem to show up quite often, but overlapping polygon edges don't (polygons clipping through each other where they meet). If it's a matter of rounding vertex data causing THAT instead of open seams, they made a bad decision, given that slight single-pixel clipping/overlapping like that is far less noticeable than the seams. (Unless both happen and only the seams are noticeable.)


The problem on the PlayStation is that texture coordinates had limited precision.
This was fine when you rendered triangles as-is, because you could model things in such a way that your textures always fit to the proper coordinates on vertices.
But when you introduce clipping, you run into a problem: You are cutting off a part of the triangle, and introducing new vertices. You have to cut off the texture accordingly. But if you do not have enough precision for your texture coordinates, you cannot fit the texture properly, and it will move around a bit.
The same goes for that other problem: lack of perspective. In order to get some perspective-correction, PS1 renderers would attempt to subdivide large polygons into sets of smaller ones. But again, you have the problem that you can't fit the texture properly on the new vertices.
A combination of these two factors is probably what you're seeing when the textures are 'shifting'/'wobbling' on screen.

Clipping errors aren't an issue if you just accept lots of overdraw and do little to no clipping at all, but you still get texture warping errors from linear interpolation along texture lines. (Clipping errors might be the cause of the twitching/shifting issues, but not the more glaring bowing or fish-eye effect on large vertical wall-type surfaces near the camera.)
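A toy illustration of the precision problem described above (not SDK code; the vertex formats are only assumed to be PS1-like): clipping an edge produces a new vertex whose exact texture coordinates almost never land on a whole texel, so they get snapped to integers, and the snap changes as the clip point moves, which reads on screen as shifting/wobbling.

#include <stdint.h>
#include <math.h>

typedef struct { float x, y; float u, v; } ClipVtx;       /* exact, unrounded coordinates */
typedef struct { int16_t x, y; uint8_t u, v; } GpuVtx;    /* what the hardware accepts */

/* New vertex where the edge a->b crosses the clip boundary at parameter t (0..1). */
GpuVtx clip_edge(ClipVtx a, ClipVtx b, float t)
{
    GpuVtx n;
    n.x = (int16_t)lroundf(a.x + (b.x - a.x) * t);
    n.y = (int16_t)lroundf(a.y + (b.y - a.y) * t);
    /* The interpolated u,v are fractional, but only whole texel coordinates can
     * be submitted, so the texture slides by up to half a texel at the cut. */
    n.u = (uint8_t)lroundf(a.u + (b.u - a.u) * t);
    n.v = (uint8_t)lroundf(a.v + (b.v - a.v) * t);
    return n;
}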

There's a different sort of twitching going on with DOS Tomb Raider's medium detail setting, but that just seems like near polygons getting perspective-correct subdivision of some sort while ones past a certain distance are left uncorrected (like ALL the polygons in low-detail mode).

I also misspoke in much of those trapezoid rendering comments, I believe. The fast/simple advantage of trapezoid rendering comes from exclusively rendering primitives with horizontal top and bottom edges (either of which could collapse to a zero-width point, giving a flat-topped or flat-bottomed triangle, while quads would be both flat-topped and flat-bottomed). This limits practical use for actual quads to mostly special-case models/environments and flat terrain/floor/ceiling map tiles/panels (the sort often compared to the SNES's Mode 7 tiles, or a texture-mapped flat floor and ceiling for a Wolfenstein 3D type game ... or a Doom type with lots of additional segmentation, but see below).

That basic rendering mechanism can be chained into polygons of various/arbitrary shapes so long as only two end points exist per horizontal line, building polygons out of vertically stacked horizontal segments (a tower of sorts). The old S3 ViRGE 325 documentation in the VOGONS archive actually describes this sort of rendering for 2D polygon fills (not in its S3D engine section, though I suspect the 3D engine's commands parse those operations into 2D drawing commands using most of the same hardware). It makes a lot of sense to use given how simple an extension it is over existing 2D line-drawing algorithms (in effect only drawing two 2D diagonal line segments -or triangle legs- per operation/segment, with each successive step generating the end points for the next raster-line fill).
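A compact sketch of that edge-walking idea, with assumed names and a 16.16 fixed-point step: walk the two "legs" one raster line at a time and fill between the endpoints; a convex polygon is just a vertical stack of these segments, each feeding its end coordinates into the next.

#include <stdint.h>

#define SCREEN_W 320

/* Fill one flat-topped/flat-bottomed segment (a trapezoid): step the left and
 * right edges with fixed-point slopes and fill each raster line between them. */
void fill_trapezoid(uint8_t *fb, uint8_t colour,
                    int y_top, int y_bot,
                    int32_t xl, int32_t dxl,    /* left edge: x and slope, 16.16  */
                    int32_t xr, int32_t dxr)    /* right edge: x and slope, 16.16 */
{
    for (int y = y_top; y < y_bot; y++) {
        int x0 = xl >> 16, x1 = xr >> 16;
        uint8_t *p = fb + y * SCREEN_W + x0;
        for (int x = x0; x < x1; x++)
            *p++ = colour;                      /* horizontal span fill */
        xl += dxl;                              /* step both legs to the next line */
        xr += dxr;
    }
}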

On that thought of horizontal lines, though (and especially floor and ceiling spans or tiles), Quake and Doom both restrict the point of view: even with Quake's mouse look the camera still doesn't tilt (or roll, in flight-sim terms), something Descent definitely has to deal with quite a lot.

Reply 42 of 43, by amon26

User metadata
Rank: Newbie

Might be posting in a dead thread here, but I just got a 486DX and I have a lot of fond memories of doing everything in my mortal power to get Quake running smoother on it. Leileilol, I'd love to see what more you come up with if you ever have the urge to mess with it more.