I used to run a 98lite/NT4 dual boot, single CPU, and under NT4 OpenGL was faster than 9x on my TNT. I replaced the dual boot with Win2k once I got a copy (thanks, The Tower FTP, I miss you. She was my first love, er, first dodgy FTP server account).
I also have, in storage and probably now dead, a dual P200 on a Tyan board. It started as dual P100s, and with some searching I slowly got it up to the fastest CPUs the board could take. That machine ran NT4 and a software firewall, with a PCI ISDN card, as my home internet gateway over 20 years ago. The amount I spent to get something better than dial-up! 64k each way, or 128k at twice the phone call costs, was stunning for the time.
I was Low Ping Bastard, I'm ashamed to admit 😀
I did try 2000 on the dual Pentium, and what I remember was the install being staggeringly slow. I guess that is single threaded? I think the text-mode part of the install doesn't use the SMP HAL. 2K didn't last on the machine; it ended up running NT4 Terminal Server Edition and being the file server for MP3s. Ripping station too, it had SCSI Plextor drives in it at some point. I did find a copy of LAME compiled for SMP, too.
With either NT4 or 2k, that dual machine was very snappy for a Pentium-based system. 128 MB of RAM helped.
Games back then mostly did not use SMP; Quake 3 was a famous exception. No chance of getting any performance out of that on a Pentium, though. And I think Q3's SMP implementation wasn't great.
Probably the best gaming use for the machine is as a server. Running many dedicated servers on it at once would make the most of both CPUs. Some 9x-only games with a dedicated server mode might work on NT, too. Don't put them on the internet, but I hope you already knew that 😀