Netburst was unusually sensitive to optimization, so I think it's plausible that a carefully programmed game would narrow the "per-clock" performance gap between it and the P3. I have no idea which games, if any, make a good demonstration of this, though. I presume the P4's relative performance should have improved when newer compilers were released, or when programmers took the time to write in assembly.
I think that higher-priced professional software (CAD and whatnot) was where programmers would invest the most time and effort into this, and they were targeting the P4 pretty specifically.
Philosophically, I have some sympathy for what Intel was trying to do in designing a CPU that had the potential to be very fast if the code for it was written carefully. In that sense I disagree with the popular notion of Netburst being a "brute force" approach. I see it as meant to be elegantly powerful, but it relied heavily on software being written to work with it smoothly, so it could stay on the throttle and not have to flush the pipeline too often.
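To make the "flush the pipeline" point concrete: on a long-pipeline core like the P4, a mispredicted branch throws away a lot of in-flight work, so "careful" code meant avoiding hard-to-predict branches. A minimal sketch in C (function names and data are mine, not from any real codebase) of the classic rewrite from a branchy loop to branchless arithmetic:

```c
#include <stdint.h>

/* Branchy version: on unpredictable data, the conditional jump is
 * mispredicted often, and each misprediction forces the deep
 * NetBurst pipeline to refill from scratch. */
int64_t sum_over_threshold_branchy(const int32_t *a, int n, int32_t t) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > t)
            sum += a[i];
    }
    return sum;
}

/* Branchless version: the comparison becomes a 0/-1 mask, so there
 * is no data-dependent branch left to mispredict. */
int64_t sum_over_threshold_branchless(const int32_t *a, int n, int32_t t) {
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(a[i] > t); /* all-ones if taken, else 0 */
        sum += a[i] & mask;                  /* a[i] sign-extends to 64-bit */
    }
    return sum;
}
```

Both compute the same sum; the second just trades a jump for a compare-and-mask, which is exactly the kind of transformation a programmer (or a P4-aware compiler) would apply to keep the pipeline full.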
I think by the 2000s not many programmers (or the companies that employed them) wanted to invest as much time into optimization as they would have, say, 5-10 years prior.
Another issue, though, is that by the 2000s there was a lot of enhanced craziness going on in superscalar x86 CPUs that the programmer can't directly control, even in assembly. It's more difficult (and I imagine frustrating) to optimize when the CPU's performance depends on features that can't be taken out of autopilot. It was as if assembly had been deprecated to the point where it no longer served its original purpose of giving the programmer full control over the CPU.
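The "autopilot" point can be demonstrated with a sketch like this (my own illustration, not from any benchmark suite): the exact same instructions over the exact same values run at very different speeds depending only on data order, because the branch predictor's state is invisible and unsteerable from software. Compile with low optimization (e.g. -O1), since modern compilers may make the loop branchless and erase the effect:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Identical machine code either way: whether this loop runs fast is
 * decided by predictor state no assembly instruction can set. */
long long count_big(const int *a, int n) {
    long long c = 0;
    for (int i = 0; i < n; i++)
        if (a[i] >= 128) /* ~50% taken, unpredictable on shuffled bytes */
            c++;
    return c;
}

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Run the same function on the same values, shuffled vs sorted,
 * and print the wall-clock-ish difference. */
void demo_predictor(void) {
    enum { N = 1 << 22 };
    int *shuffled = malloc(N * sizeof *shuffled);
    int *sorted = malloc(N * sizeof *sorted);
    for (int i = 0; i < N; i++)
        shuffled[i] = rand() % 256;
    memcpy(sorted, shuffled, N * sizeof *sorted);
    qsort(sorted, N, sizeof *sorted, cmp_int);

    clock_t t0 = clock();
    long long a = count_big(shuffled, N);
    clock_t t1 = clock();
    long long b = count_big(sorted, N);
    clock_t t2 = clock();

    printf("shuffled: %lld hits, %ld ticks\n", a, (long)(t1 - t0));
    printf("sorted:   %lld hits, %ld ticks\n", b, (long)(t2 - t1));
    free(shuffled);
    free(sorted);
}
```

Sorting makes the branch almost perfectly predictable, so the second call typically runs much faster, and there's nothing in the instruction stream a programmer could change to close that gap by hand.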
I think Itanium attempted to address that, but it went nowhere.
Now when you get to Prescott, yeah I call that brute force.