Thanks for the write-up in that thread. I had run into that performance problem with the PPro, but I didn't know the cause. I did previously compare a few of the older gcc compilers, and it was noted that the oldest of the few performed as well as any in a Pentium/DOS environment, notwithstanding compatibility trade-offs (as in the case of Quake2, where the exe and dll are compiled by different gcc versions).
From my limited reading, it seems that J. Carmack was writing 3D engines on a nearly monthly basis, and he must have acquired great insight into all the problem spots, maybe not through theory, but just through practice. The Quake renderer may not be elegant underneath the surface, as some people have said, but I think it would be difficult to improve upon, as you noted, without losing too much visual detail.
Another avenue would open up if the vQuake code were available 😀, although the extra video processing requires the hardware.