How was multitexturing on the GF256?
In synthetic tests I found that multitexturing increased the texel rate by about 80%. The GeForce2 GTS only brought this up to 89%, a surprisingly small improvement despite its move to two TMUs per pipeline.
NVIDIA's next architecture with a single TMU per pipeline, NV40, increased the fillrate in my multi-texturing tests by 77%. NVIDIA was very good about leveraging multitexturing; by comparison, ATI's R300 architecture only saw around a 46% increase (R420 and up seemed to fix this).
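For clarity on what "texel rate increase" means here: a multitexture fillrate test applies N textures per pixel, so the texel rate is the pixel fillrate times N. A minimal sketch of the arithmetic — the Mpixel figures are made-up illustrations to show how an ~80% gain can arise, not measurements from the tests above:

```python
# Texel rate = pixel fillrate * textures applied per pixel.
# The example numbers below are hypothetical, for illustration only.

def texel_gain_pct(single_tex_mpix, multi_tex_mpix, textures_per_pixel=2):
    """Percentage texel-rate gain of the multitexture test over the
    single-texture test."""
    single_texel_rate = single_tex_mpix * 1  # one texel per pixel
    multi_texel_rate = multi_tex_mpix * textures_per_pixel
    return (multi_texel_rate / single_texel_rate - 1.0) * 100.0

# Hypothetical: 300 Mpixels/s single-textured; dual texturing drops the
# pixel rate only to 270 Mpixels/s (plausible if the single-texture case
# was already bandwidth-bound), i.e. 540 Mtexels/s:
print(texel_gain_pct(300, 270))  # 80.0
```

The interesting part is that a gain near 100% means the chip pays almost nothing for the second texture, while a low figure like R300's ~46% means the pixel rate drops substantially when a second texture is applied.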
I think 3dfx Rampage was built around whatever "pixel shader 1.0" is.
The original Radeon had something like NVIDIA's shading rasterizer / register combiners (NV1x), but I read an article years ago arguing that NVIDIA's solution turned out to be more useful in the end. I wish I could find that article again, but I have no idea what site it was on. On the other hand, the Radeon could perform EMBM, which on NVIDIA hardware only NV20 onward could do.
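For context on what those combiners actually did: each NV1x general combiner stage computes per-channel products of its inputs and optionally their sum, which is what made multitexture effects expressible without a "real" shader. A rough sketch of the RGB math only — this is my simplification, ignoring input mappings, scale/bias, dot-product modes, the alpha portion, and the final combiner:

```python
# Conceptual model of one NV1x-style general combiner stage: four RGB
# inputs A, B, C, D produce A*B, C*D, and A*B + C*D per channel.
# Simplified sketch -- not the full NV_register_combiners behavior.

def combiner_stage(a, b, c, d):
    ab = [x * y for x, y in zip(a, b)]
    cd = [x * y for x, y in zip(c, d)]
    ab_plus_cd = [x + y for x, y in zip(ab, cd)]
    return ab, cd, ab_plus_cd

# e.g. texture0 modulated by the diffuse color, plus texture1 modulated
# by a constant, all in one stage:
ab, cd, out = combiner_stage([1.0, 0.0, 0.0], [0.5, 0.5, 0.5],
                             [0.0, 1.0, 0.0], [0.5, 0.5, 0.5])
```

Chaining two such stages (plus the final combiner) is what let NV1x fake a lot of "shading" in fixed hardware.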
EMBM seems to be another case of the infamous "cap bits": support for it goes back to DX6, but I'm unsure if or when it ever became a minimum requirement. Clearly NVIDIA could omit support for it and still claim full DX6 and DX7 compliance, while Matrox and ATI went beyond those requirements.
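For reference, the EMBM operation itself is simple fixed-function math: the (du, dv) pair sampled from the bump texture is transformed by a 2x2 matrix (the D3DTSS_BUMPENVMAT00..11 states in D3D) and added to the next stage's texture coordinate before the environment map is sampled. A minimal sketch of the per-pixel perturbation:

```python
# EMBM (DX6 D3DTOP_BUMPENVMAP style) coordinate perturbation: the
# (du, dv) sampled from the bump map is run through a 2x2 matrix and
# added to the environment map's texture coordinate.

def embm_perturb(u, v, du, dv, m):
    """m is ((m00, m01), (m10, m11)), the bump-environment matrix."""
    return (u + m[0][0] * du + m[0][1] * dv,
            v + m[1][0] * du + m[1][1] * dv)

# With the identity matrix, the env-map lookup is shifted directly by
# the bump values:
u2, v2 = embm_perturb(0.5, 0.5, 0.1, -0.2, ((1.0, 0.0), (0.0, 1.0)))
```

Because the perturbation feeds a dependent texture read, it needs hardware support in the texture units — which is exactly why it could be gated behind a cap bit rather than emulated.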
But obviously EMBM doesn't imply a programmable shader, even if it seems to be more commonly associated with DX8 and up. The extent to which the R100's shaders were actually "programmable" will probably remain a mystery; per Scali's quote, it seems to be something early drivers dabbled in but have since buried.