I'm maybe being a bit extreme here, but in order to qualify stability at a given speed, I prefer to test at a slightly higher speed. So to qualify a BX133 configuration, I would be in favor of testing at 138-140FSB.
The rationale is that it somewhat compensates for the imperfection of stress testing, and also establishes that a reasonable margin exists, so you know you're not sitting on the edge. I've used this approach when overclocking (not with this application though) and it has always given safe results. I was bitten once when I didn't do this: the person's PC started having BSODs about 3 months later.
One problem with doing this though is that it may just as easily introduce problems that have nothing to do with the video card. BX133 builds are already well above the originally intended speed of a 440BX chipset, so adding a slight bit more isn't inconsequential to it.
I suppose a test that stresses memory transfers across the AGP bus would be good, but I don't know which one is best for this. A few loops of 3DMark at 138FSB/92AGP, run at peak operating temperature, would be pretty convincing IMO (but time consuming). POSTing several times could be another test, but if the occasional failure to POST doesn't bother the user, then that one doesn't necessarily matter.
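For reference, here's the arithmetic behind the 138FSB/92AGP figure as a quick Python sketch. It assumes the AGP clock is on the 2/3 divider (which is what the 138/92 pair implies), and the function names are just made up for illustration.

```python
from fractions import Fraction

# Assumption: AGP runs at 2/3 of the FSB, inferred from the 138FSB/92AGP pair.
AGP_DIVIDER = Fraction(2, 3)


def agp_clock_mhz(fsb_mhz):
    """AGP clock in MHz for a given FSB in MHz, using the assumed divider."""
    return float(fsb_mhz * AGP_DIVIDER)


def margin_percent(test_fsb_mhz, target_fsb_mhz):
    """How far above the target FSB the test FSB sits, in percent."""
    return (test_fsb_mhz - target_fsb_mhz) / target_fsb_mhz * 100.0


if __name__ == "__main__":
    target = 133
    for fsb in (133, 138, 140):
        print(
            f"{fsb} FSB -> {agp_clock_mhz(fsb):.1f} AGP, "
            f"{margin_percent(fsb, target):.1f}% over {target}"
        )
```

So testing at 138-140 works out to roughly a 4-5% margin over 133, with the AGP bus pushed up by the same proportion.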
It would be interesting to see a large population of test results from people's different cards all in one place. I have some cards I'd be interested to test, but I don't have a suitable system assembled right now.