Scali wrote:
This is what I developed for 486 back in the day: https://youtu.be/xE9iifKXvY4
I wanted to keep the subpixel/texel correction in a 486 renderer, because I feel it adds to the perceived quality and smoothness (Tomb Raider doesn't have that, and looks rather shaky). But I did not use the FPU for most maths, because fixedpoint was faster on 486. The matrix calculations, polygon clipping etc are also done entirely in integer.
It also uses z-sorting rather than z-buffering, because of the much lower memory bandwidth on a 486 compared to Pentium, which makes z-buffering very expensive.
Per-polygon Z-sorting using a painter's algorithm (and likely ray-casting to organize that list and minimize overdraw) would tend to be the most typical and efficient arrangement. Per-pixel z-buffering can limit overdraw even more, but at the expense of eating up a lot more bandwidth just handling the Z-buffer. (also probably would've been nicer for hardware accelerated detail options to include Z-buffer vs software Z-sorting given how many early accelerators did slow Z-buffering and/or lacked the RAM to afford it ... or drivers where z-fighting is so prominent that painter's algorithm type sorting would look a hell of a lot nicer; Tomb Raider II had a really nice variety of settings for its Direct3D renderer, but it doesn't seem to have been common at all)
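To make the per-polygon sorting concrete, here's a minimal sketch of painter's-algorithm ordering. The polygon records, the names and the choice of average vertex depth as the sort key are all assumptions for illustration, not how any particular engine did it:

```python
# Hypothetical polygon record: (name, [(x, y, z), ...]) in camera space,
# where a larger z means farther from the viewer.
def painter_order(polygons):
    """Return polygons sorted far-to-near by average vertex depth."""
    def average_depth(polygon):
        _, vertices = polygon
        return sum(v[2] for v in vertices) / len(vertices)
    # Farthest first, so nearer polygons are drawn later and overwrite them.
    return sorted(polygons, key=average_depth, reverse=True)

scene = [
    ("near_wall", [(0, 0, 2.0), (1, 0, 2.0), (1, 1, 2.5)]),
    ("far_wall", [(0, 0, 9.0), (1, 0, 9.0), (1, 1, 8.0)]),
    ("mid_crate", [(0, 0, 5.0), (1, 0, 5.5), (1, 1, 5.0)]),
]
draw_order = [name for name, _ in painter_order(scene)]
```

The cost is one sort per frame over the polygon list instead of a depth read and compare per pixel, which is the bandwidth trade-off in question. Average depth is only one possible key and it can mis-order long or intersecting polygons.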
Neat that you managed good sub-pixel accuracy there. I've had a few discussions with PlayStation homebrew developers that mention the GTE (16-bit fixed-point geometry DSP) is capable of pretty decent sub-pixel accuracy as long as you make sure to use the right rounding rules and output the correctly adjusted vertex data to the GPU. Otherwise you get seams, and this is a problem that can persist even at 32-bit precision and with floating point calculations too. (a lot of it seems to be about understanding the behavior of the GPU -an issue that also crops up on a lot of Direct3D drivers for early 3D accelerators ... the Rage Pro still has noticeable seaming and jittering/sliding issues in some games)
Some PlayStation ports also seem to suffer from software Z-culling/clipping optimized more for TV style overscan, leaving visible polygon drop-out in the PC version. (or at least it seems more visible than on the PlayStation itself)
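For anyone wondering what the rounding rules amount to in practice, here's a rough sketch of sub-pixel correction on a polygon edge using 16.16 fixed point. The format, the function names and the ceiling rule are my assumptions for illustration, not Scali's code or the GTE's actual behavior:

```python
FRAC_BITS = 16            # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS

def to_fixed(value):
    return int(round(value * ONE))

def fixed_ceil(value):
    """Smallest integer >= value, for a 16.16 fixed-point input."""
    return (value + ONE - 1) >> FRAC_BITS

def fixed_mul(a, b):
    return (a * b) >> FRAC_BITS

def edge_start(x0, y0, x1, y1):
    """Set up an edge from (x0, y0) to (x1, y1), all 16.16, with y1 > y0.

    Returns (first_row, x_at_first_row, dx_per_row). The x values stay in
    16.16; first_row is a plain integer scanline.
    """
    dx_per_row = ((x1 - x0) << FRAC_BITS) // (y1 - y0)
    first_row = fixed_ceil(y0)                  # first scanline at or below y0
    prestep = (first_row << FRAC_BITS) - y0     # fractional distance skipped
    x_start = x0 + fixed_mul(dx_per_row, prestep)
    return first_row, x_start, dx_per_row

first_row, x_start, dx_per_row = edge_start(
    to_fixed(10.0), to_fixed(2.25), to_fixed(14.0), to_fixed(6.25))
```

The prestep is the whole trick: instead of starting the edge at the truncated vertex, x is advanced by the fraction of a row between the real vertex and the first scanline actually drawn. If every polygon sharing an edge uses the same rule, the shared edge lands on the same pixels for both, which is what keeps seams from opening up. Texture coordinates would get the same prestep treatment.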
Scali wrote:
386SX wrote:
Off topic: and what about optimizing Doom itself for the 386? Did anyone try different coding, as the various ports did (such as the Jaguar)?
I suppose Doom is already optimized for 386.
It doesn't require an FPU, and otherwise the code doesn't really have anything specifically bad for 386 or particularly good for 486. It runs very well on both, a 486 is just faster.
The column renderer method might favor the 486 a bit more than polygon renderers of the same period (with or without texture mapping) due to making better use of the 32-bit wide bus/registers (or even a 386SX's 16-bit bus to some extent -given you still have 32-bit registers to buffer into and 16 bits is still double the width of what Doom's pixel columns use -lots and lots of 8-bit writes as far as I understand). The polygon renderers I have in mind are X-Wing, Wing Commander III, and Descent (and a few 3D RPGs like Elder Scrolls and Ultima Underworld).
I'd think cacheless 386 systems would show a more dramatic bias here given the bandwidth and latency issues working in DRAM alone. (and gains from making page-mode reads/writes) And a fair number of 386DX40 boards lacked cache or used only 32 kB of cache and most 386SX boards lacked cache. (while an SX33 or 40 would be fast enough to at least handle X-Wing playably and maybe even Descent with the right settings)
Scali wrote:
kanecvr wrote:
So... it's theoretically possible to optimize dos_quake and gl_quake for a 486?
Software rendered Quake more so than GLQuake.
The problem with GLQuake is that it runs on top of MiniGL, so you can't really escape using the FPU for geometry processing, because floating point is what MiniGL expects.
Wouldn't it work if the MiniGL had been written with integer math in mind in the first place? Otherwise there might be some fixed-point computation schemes that convert relatively quickly to 32-bit float format with less overhead than using Fxch. (aside from the K5 and late model K6-2)
Scali wrote:Yes, Doom uses the same trick as Wolf3D for the walls: you just raycast them, and draw scaled columns. The perspective is solved by the raycasting part.
One added problem with that is you're stuck with rendering one pixel per framebuffer write rather than potentially buffering spans on a 32-bit register basis or even longer spans still small enough to fit reliably into the L1 cache. Anything nonlinear (ie columns) written to DRAM would be particularly slow given you'd ruin any speed gains from page-mode. (granted, an entire render buffer will often fit into the board-level cache and avoid that problem anyway, particularly at 320x200x8bpp)
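A small sketch of the access pattern difference, assuming a linear 320x200 buffer at one byte per pixel. The function names are made up, and grouping four bytes per store is only a stand-in for a 32-bit write:

```python
WIDTH, HEIGHT = 320, 200   # assumed linear buffer, 1 byte per pixel

def column_offsets(x, y_top, count):
    """Byte offsets touched by a vertical run: one store per pixel,
    each a full row pitch away from the previous one."""
    return [(y_top + i) * WIDTH + x for i in range(count)]

def write_span_packed(framebuffer, x, y, pixels):
    """Write a horizontal run, grouping 4 pixels per store where possible.

    Returns the number of stores issued. Alignment is ignored for brevity.
    """
    offset = y * WIDTH + x
    stores = 0
    i = 0
    while i + 4 <= len(pixels):
        framebuffer[offset + i:offset + i + 4] = bytes(pixels[i:i + 4])
        stores += 1
        i += 4
    while i < len(pixels):          # leftover pixels go out one at a time
        framebuffer[offset + i] = pixels[i]
        stores += 1
        i += 1
    return stores

framebuffer = bytearray(WIDTH * HEIGHT)
span_stores = write_span_packed(framebuffer, 8, 1, list(range(1, 10)))
column_addresses = column_offsets(10, 0, 3)
```

Nine pixels along a row go out in three stores to consecutive addresses, while three pixels down a column are three separate stores 320 bytes apart.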
For the ceilings and floors iirc they have precalced the perspective in a table.
So they take advantage of the limited degrees of freedom in Doom to speed up the rendering (the rendering accuracy is pretty much 'perfect', like Quake).
They might be, but similar results could be achieved by rendering the floor as a rotated square (or multiple square segments) with scaling factor set only once per line. (with a Wolf3D style game with textured floors/ceiling added, a single square plane could be used given the lack of elevation) Using a pre-scaled table would save time though given the PoV angle is fixed and look-up is faster than hardware computation on the 486 and 386. (I'd assume tables are used for that in games like Wacky Wheels too, and for handling perspective in Mode 7 on the SNES -given how slow the CPU is, especially since the 65816 has fast/low latency memory access and interrupts -so good for both the table optimization and raster-interrupt handling)
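Roughly what I mean by a pre-scaled table, as a sketch only. The parameter names, the values and the flat-floor, no-pitch camera are assumptions, and I'm not claiming this is how Doom or any of those other games lays out its tables:

```python
def build_row_distance_table(screen_height, horizon, eye_height, focal_length):
    """Distance along the floor seen by each screen row below the horizon.

    With a fixed eye height and no camera pitch, this depends only on the
    row, so it can be computed once and reused every frame.
    """
    table = [0.0] * screen_height
    for row in range(horizon + 1, screen_height):
        table[row] = eye_height * focal_length / (row - horizon)
    return table

def texel_step_for_row(table, row, focal_length):
    """Texture units advanced per screen pixel along this row."""
    return table[row] / focal_length

# Hypothetical values: 200-line screen, horizon at mid-height.
row_distance = build_row_distance_table(200, 100, 32.0, 160.0)
step_at_132 = texel_step_for_row(row_distance, 132, 160.0)
```

The per-row divide moves out of the inner loop entirely: each floor line looks up its distance, derives one scale factor, and then steps through the texture affinely, which is the "scaling factor set only once per line" case.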
Quake could probably get away with setting perspective only once per line (rather than using span subdivision) for quads (ie most/all of the level map) and leave triangles just for the enemies/weapon models (and fire and such) which are non-corrected affine mapped anyway. (I misspoke previously in my post a few months back on the gun-texture-warping, I realize those models omit the perspective correction)
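For reference, span subdivision boils down to something like the sketch below: do the true perspective divide only every so many pixels and interpolate affinely in between. The function name, the float math and the default step of 16 are my assumptions for illustration, not Quake's actual inner loop:

```python
def perspective_span(u_over_z0, inv_z0, u_over_z1, inv_z1, width, step=16):
    """Texture u for each of `width` pixels across one span.

    u/z and 1/z are linear in screen space, so the exact u at pixel i is
    (u/z) / (1/z). That divide is done only at sub-span boundaries.
    """
    def exact_u(i):
        t = i / width
        u_over_z = u_over_z0 + (u_over_z1 - u_over_z0) * t
        inv_z = inv_z0 + (inv_z1 - inv_z0) * t
        return u_over_z / inv_z

    result = []
    start = 0
    u_start = exact_u(0)
    while start < width:
        end = min(start + step, width)
        u_end = exact_u(end)                      # one divide per sub-span
        du = (u_end - u_start) / (end - start)    # affine step inside it
        for i in range(end - start):
            result.append(u_start + du * i)
        start, u_start = end, u_end
    return result

# A 32-pixel span running from z=1 (u=0) to z=4 (u=64).
span_args = (0.0, 1.0, 16.0, 0.25, 32)
exact = perspective_span(*span_args, step=1)
subdivided = perspective_span(*span_args, step=16)
affine = perspective_span(*span_args, step=32)
```

With the step equal to the span width this degenerates to plain affine mapping, and with a step of 1 it divides at every pixel, so the same routine shows both ends of the trade-off.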
But then again, the biggest change from Doom to Quake is that you got full 3d.
Doom pushes a 486dx2-66 to the max, so if you want full 3d as well on such a system, you'll have to trade in performance and/or quality. Which gives you something like Descent.
Or all the space/flight sims that have relatively little on-screen compared to 3D maze/dungeon type stuff. (texture mapped dungeon crawlers like Ultima Underworld would be closer to Descent there than the likes of X-Wing, Tie Fighter, or Wing Commander III -and the full-screen polygonal stuff on those latter three are all untextured, with the exception of the carrier's flight deck in WCIII)
Descent does manage to look a lot nicer in texture perspective and sub-pixel alignment than Tomb Raider and without the limited PoV of Doom. (it might actually be more playable on a 386 than Doom is, especially comparing Doom in high detail mode) Tomb Raider is way more CPU intensive, obviously, but there's a hell of a lot more being drawn on-screen there. (both in terms of vertex count and pixel/texel drawing)
Tomb Raider has a horribly inaccurate software renderer though. It's neither very fast nor very stable. It does about the same as Descent does, but I think Descent does it somewhat better.
It has just about the same problems as the PlayStation version of Tomb Raider does. (Tomb Raider 2 got a bit better on the PlayStation, though the PC software renderer doesn't support perspective correction at all and the palette works far worse for shading)
Descent seems to be a better example though, and uses much more quake-like level design. (Saturn Quake seems to be a good example of what a software quad+triangle rasterizer could have done without any span subdivision -just relying on affine mapping being naturally more correct on single-piece quads than 2-piece triangle strips/fans -not sure which term is applicable to 2-triangle primitives like that)
I wonder if Descent actually used a quad renderer ... obviously you need to use triangles in the geometry calculations (and all quads will end up as 2-triangle strips on the point-plotting end, but treated as single quadrilateral primitives by the rasterizer). Doing fully warped quads is more complicated too (Saturn/3DO style) but I think there are some simplifications there too, like limiting quads to squares/rectangles/rhomboids/trapezoids where at least 2 of the lines are parallel. (and any model primitives that don't fit the trapezoid limit can just fall back to 3-point polygons anyway, or even use the same rendering code and 'attach' two of the quad points to fold it into a triangle -like Saturn games do, but without the overdraw issue) That's simpler on a hardware design end too and is actually what the (unreleased) Jaguar II chipset does for its 3D primitive rasterizer. (the blitter can render trapezoids, but not free 4-point quads -I think it better matched the existing line-drawing algorithms the Jaguar Blitter supported and still offered more flexibility than a fixed-function triangle rasterizer)
Honestly, 3D projected/rotated rhomboids/trapezoids cover pretty much all the common instances where quads are used in 3D models anyway and where they're more useful than triangles. (the most obvious distortion in Tomb Raider -and other affine mapped triangle renderers- is on large rectangular surfaces anyway -which when projected in 3D, still end up as trapezoids anyway; unless my grasp on 3D perspective is totally off base here, or at least approximated 3D perspective using vanishing point alignment) Or ... maybe it only works with a fixed PoV like Doom (which Descent does to an extent as well -I don't recall being able to roll) so everything has parallel vertical wall-edge alignment. More skewed perspectives would make simple vanishing point style perspective non-functional or at least full of errors, so you'd need to resort to 4-point fully warped quads or triangles. (with the Jaguar II, triangles would be the only option and quads like that would need 2-triangle strips, or resort to GPU-assisted line-list style rasterization using quads, which would probably still be faster than software/GPU sub-divided texture spans -and note 'GPU' refers to the embedded RISC MPU, not the blitter or object processor)
Mind you, Tomb Raider does not perform any skinning, so you get very 'blocky' movement of limbs. Quake is smoothed out. I think Half-Life may have been the first to perform realtime skinning, giving you the best of both worlds. But by then we were well into Pentium territory.
Ironically, you end up with animated models close to the camera a lot more in Tomb Raider vs Quake, so the trade-offs might have better matched the two games if they'd swapped their animation styles. (OTOH you really need a high framerate running/walking animation for the player model at the very least, this ended up looking quite a bit weirder than Quake when Bethesda used it in Redguard Adventures -reserving enough memory to have a more fluid animation was probably a better solution than articulated models given it would only be for the player and not other animated models; plus the camera has a stiffer/tight chase view on the player compared to Tomb Raider, so the choppy walking -or step climbing- is even more obvious ... especially for a 1998 game with 3DFX support that otherwise looks rather nice ... well aside from the flat shaded lighting in the software renderer)
Well, I would say that they already passed the 486-station with Doom. Apparently that's what they did on a 486. Preferring image quality and smooth framerate over 'true' 3D.
In that case, Quake likely would've ended up designed around Duke Nukem 3D style portal limitations to work around the height-map level design. But column renderers are inherently slower than line renderers due to the inability to make use of multi-pixel writes (unless you render multiple columns simultaneously, which gets messy quickly). So a true 3D-space vertex plotted polygon engine COULD be faster than a Doom style column/span renderer if other trade-offs were made. (Doom on the Saturn and 3DO probably would've been a hell of a lot faster as a 3D engine than a ray-cast height map, and the Playstation port converted it to just that ... the Jaguar port might have been faster as a polygon engine for that matter -likewise Descent would have likely run faster on the Jaguar than Doom did)
Plus, 3D game aesthetics can be optimized to use textures more selectively rather than texture mapping every single thing, speeding up rendering quite a bit by avoiding the need to fetch texels. (using ray casting to depth-sort detail levels, and rendering untextured, shaded models -possibly with lower polygon count- in the far distance could certainly have been one way to speed things up significantly, especially on more bandwidth bound hardware) Tomb Raider probably should have done that to speed up rendering. (it does do fade-to-black distance fogging and limited draw distance, but I don't think it cuts out texture mapping entirely)
So for 486 you'd probably only want a 32-pixel span loop unrolled. It's probably going to be quite a bit slower if you have multiple types of spans, and the logic to pick the right one.
I suppose all or nothing perspective correction would be somewhat more reasonable (just doing affine rendering on everything in a low-detail mode) but aside from the K5, I'm not sure how much that would help Quake's renderer given the other floating point operations are so much slower on a 486, 5x86, or 6x86 than a Pentium, pipelining or no. (on top of the 32-bit bus limit on 486 systems -even if you push the bus/L2 up to 66 MHz, the FPU bandwidth isn't remotely close enough ... albeit probably more of a slow FPU limit than 32-bit bus) Multiply and divide are the operations the Cyrix FPU really does better than a 486DX (and divide is as fast or faster than a Pentium). The slow Fxch doesn't even make int/float swaps a viable workaround for speeding up non-Pentiums, with the exception of the K5. (the K5 would probably be pretty damn fast at Quake if it used the FPU registers as buffers and did all the multiplies and divides on the ALU)
Even the K6 (and probably Nx586 FPU) had a 2-cycle Fxch up until the CXT revision K6-2. (which came around after Quake 2's release so there was no incentive to even consider offloading floating point operations to the integer end and swapping back and forth due to Fxch overhead ... aside from the very limited marketshare of K5 users)
Either that, or saying 'to hell with perspective!', and just do affine texturemapping, with triangle subdivision to somewhat limit the distortion. But you should do this with subpixel/subtexel accuracy to maintain stability (that was my aim). That is something that Tomb Raider does not do (wasn't possible on PlayStation, but they could have done better than that on PC).
The sub-pixel accuracy issue seems independent of texture mapping entirely and also doesn't seem that common on the whole (you don't see gaps forming in big ship models in X-Wing or Tie Fighter ... aside from the camera clipping through models entirely in 3rd person mode or during collisions) and it seems an odd coincidence that it cropped up around the time the PlayStation became popular (and was a common -but not absolutely necessary- problem on that platform as well)
Also odd that seams seem to show up quite often, but overlapping polygon edges don't (polygons clipping through each other where they meet). If it's a matter of rounding vertex data causing THAT instead of open seams, they made a bad decision given slight single-pixel clipping/overlapping like that is far less noticeable than the seams. (unless both happen and only the seams are noticeable)