VOGONS


Reply 41 of 57, by kool kitty89

User metadata
Rank Member
swaaye wrote:
Putas wrote:

my experience is opposite: http://www.vintage3d.org/virge.php

Nice work here Putas.

Did you do your video card tests and recordings with ViRGE cards at default speeds or overclocked like Putas did? (I've also seen some references to the ViRGE 325 actually being rated for 80 MHz operation, so lower clock rates may have been selected for other specific tolerances of the cards using the ViRGE)

Last edited by kool kitty89 on 2012-03-20, 06:16. Edited 2 times in total.

Reply 43 of 57, by kool kitty89

User metadata
Rank Member

Another note: on Putas's page, he lists the 55 MHz ViRGE's bandwidth as 440 MB/s, but that's not right.
The ViRGE in question is using 50 ns EDO DRAM (as seen in the photo -and is also standard for the Stealth 3D 2000). In this specific case Etron Tech EM614153A-50 chips are used, and those are rated for 20 ns EDO page-mode read/write cycles, which translates to a 50 MHz maximum access speed (with ideal timing for all other pulses needed per access -which is unlikely/impossible given the 55 MHz clock rate limitation), so the ViRGE is almost certainly limited to 2-cycle peak access times (and thus 220 MB/s and not 440).
However, it should be noted that, at 80 MHz, at 2-cycle long accesses, that same RAM should still fall within acceptable tolerances, so the successful overclocking would make sense.
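
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic above. It assumes a 64-bit memory bus (which is what 440 MB/s at 55 MHz implies) and treats the chip's 20 ns page-mode cycle as a hard limit; the function names are just illustrative.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits=64, cycles_per_access=1):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes), ignoring page misses."""
    bytes_per_access = bus_bits // 8
    return clock_mhz * bytes_per_access / cycles_per_access


def max_access_rate_mhz(page_cycle_ns):
    """Fastest access rate a DRAM with the given page-mode cycle time allows."""
    return 1000.0 / page_cycle_ns


def within_rating(clock_mhz, cycles_per_access, page_cycle_ns):
    """True if the resulting access rate stays at or below the rated limit."""
    return clock_mhz / cycles_per_access <= max_access_rate_mhz(page_cycle_ns)


print(peak_bandwidth_mb_s(55, cycles_per_access=1))  # 440.0 (the figure on the page)
print(peak_bandwidth_mb_s(55, cycles_per_access=2))  # 220.0 (2-cycle accesses)
print(max_access_rate_mhz(20))                       # 50.0 MHz for 20 ns page cycles
print(within_rating(55, 1, 20))                      # False: 55 MHz > 50 MHz
print(within_rating(80, 2, 20))                      # True: 40 MHz <= 50 MHz
```

The last line is the overclocking case: at 80 MHz with 2-cycle accesses the DRAM only sees 40 MHz page cycles, which is inside the 50 MHz rating.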

And on that note of RAM speeds, I have a Stealth 3D 2000 with HYM324025S-50 chips: also 50 ns DRAM, with very similar timing requirements to the above (as most 50 ns EDO DRAM is).

Reply 44 of 57, by Putas

User metadata
Rank Oldbie

Oh boy, the work is never over. I ignored it since many memory chips can significantly surpass their declared timings. Or that is what I gathered in my post-EDO days. The OC seems to go over the top though. I also got a 40 ns board which is limited by the chip at ~86 MHz.

Reply 45 of 57, by kool kitty89

User metadata
Rank Member
Putas wrote:

Oh boy, the work is never over. I ignored it since many memory chips can significantly surpass their declared timings. Or that is what I gathered in my post-EDO days. The OC seems to go over the top though. I also got a 40 ns board which is limited by the chip at ~86 MHz.

Yes, 50 ns EDO RAM working fine at the 55 MHz speed would be believable (though unlikely as most manufacturers stick to the conservative side of such tolerances), but working at 80 MHz would be way too far past that to be realistically stable.

On top of that, I've seen some references to the ViRGE 325 being a 2-cycle renderer.

OTOH, with pixel/texel fillrates of 1/2 pixel per clock peak (unfiltered), that obviously wouldn't be taking advantage of the 64-bit bus (which would allow 4 16-bit pixels per read/write for a potential peak of 1 pixel per clock -or 2 16-bit pixels per clock peak for cached textures or for fill operations -and obviously slower real-world performance due to additional latency from DRAM page changes). That's something several older game consoles were already doing: even the Atari Jaguar had a peak of 4 pixels/clock for solid fills and Gouraud shading, and the PlayStation managed 2 pixels/clock for cached textures or fills.

Albeit, this issue is also true for the performance specs of most contemporary GPUs (the peak fillrate being listed as far less than the RAM bandwidth would allow), which would imply that the high maximum bandwidth possible on many of those early cards is only being used for framebuffer scanning and 2D blitter operations. (or only partially taken advantage of for 3D to reduce average latency -buffering for 64/128-bit DMA for pixel reads/writes and cache fills even though the peak fillrate would be limited by the GPU core as well -but maintaining much closer to theoretical peak GPU throughput -I'd assume the latter given the level of technology in question and the overall cost investments in implementing 64/128-bit buses)
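
The bus-limited peak figures discussed above can be sketched like this. It is only a rough model under stated assumptions: a 64-bit bus, 16-bit pixels, 2-cycle accesses, and an uncached textured pixel group costing one texture read plus one framebuffer write; none of this is taken from S3's documentation.

```python
def peak_pixels_per_clock(bus_bits=64, pixel_bits=16,
                          cycles_per_access=2, accesses_per_group=1):
    """Bus-limited peak pixels per clock, ignoring DRAM page changes.

    accesses_per_group is how many bus accesses one group of pixels costs:
    1 for a plain fill (write only) or a cached-texture draw,
    2 for an uncached textured draw (texture read + framebuffer write).
    """
    pixels_per_access = bus_bits // pixel_bits
    return pixels_per_access / (cycles_per_access * accesses_per_group)


print(peak_pixels_per_clock(accesses_per_group=2))  # 1.0 (uncached textured)
print(peak_pixels_per_clock(accesses_per_group=1))  # 2.0 (fill or cached texture)
print(peak_pixels_per_clock(cycles_per_access=1))   # 4.0 (fill, 1-cycle accesses)
```

Compare those with the quoted 1/2 pixel per clock texturing rate: the bus would not be the limiting factor in that case.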

Reply 46 of 57, by Putas

User metadata
Rank Oldbie

Stupid me for blindly using SDRAM math. After re-learning EDO I think it will be best to not state any bandwidth numbers at all, since I don't see a way to determine when burst mode can be used. Rather than 2-cycle operation, I am more likely to believe actual memory clocks are lower than reported.

Reply 47 of 57, by swaaye

User metadata
Rank l33t++

Are you guys aware of the Virge documentation PDF available on VOGONSdrivers? Full register info, etc.

http://www.vogonsdrivers.com/getfile.php?fileid=326

Reply 48 of 57, by kool kitty89

User metadata
Rank Member
Putas wrote:

Stupid me for blindly using SDRAM math. After re-learning EDO I think it will be best to not state any bandwidth numbers at all, since I don't see a way to determine when burst mode can be used. Rather than 2-cycle operation, I am more likely to believe actual memory clocks are lower than reported.

swaaye wrote:

Are you guys aware of the Virge documentation PDF available on VOGONSdrivers? Full register info, etc.

http://www.vogonsdrivers.com/getfile.php?fileid=326

OK, interesting, it lists support for both 1 and 2-cycle EDO DRAM timing (presumably all cards using RAM in the 50/60 ns range are using 2-cycle accesses, while 40/35/30 are probably 1-cycle)

As for EDO vs SDRAM timing/bandwidth, SDRAM still has to deal with most/all of the same bottlenecks as EDO, but the main difference is that more of that is handled by logic/buffers on-chip by the RAM itself while EDO is more reliant on the external DRAM controller (a suitably fast and capable async DRAM controller should manage similar performance with EDO DRAM as SDRAM of similar speeds -ie 30 ns PC66 EDO should do about the same as PC66 SDR).

FPM DRAM is another story though . . . and even more complex to properly gauge than EDO due to the lack of overlapping cycles and latching. (meaning real-world performance is even more reliant on the DRAM controller, especially the clock speed -to allow close to perfect pulse widths for all parameters. For example, you'd need a 100 MHz controller to reach max performance of most 80 ns FPM DRAM -you'd still be limited to 50 ns page-mode cycle times, but that's much better than the 80 ns page-cycle time you'd get from a 25 MHz controller and moderately better than the 60 ns you'd get from a 50 MHz controller. FPM page cycles need 2 pulses to complete, but one can be much shorter than the other, so 40+10 ns in the case of a 100 MHz clock and 80 ns FPM DRAM. EDO avoids that by latching/overlapping those 2 cycles, meaning slow controllers aren't a detriment as the 2nd cycle is hidden anyway, so similarly timed 80 ns EDO DRAM would allow 40 ns page-mode cycles with a 25, 50, or 100 MHz controller -and, obviously, comparing other RAM speeds and bus/DRAM controller speeds would have other variables too)
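
The controller-clock effect described above can be reproduced with a small model. This is a simplification and an assumption on my part: each pulse is rounded up to a whole number of controller clocks, using the 40 ns + 10 ns split from the example, with EDO hiding the short pulse entirely.

```python
from fractions import Fraction
import math


def quantized_ns(pulse_ns, clock_mhz):
    """Round a required pulse width up to a whole number of controller clocks."""
    period_ns = Fraction(1000) / Fraction(clock_mhz)
    clocks = math.ceil(Fraction(pulse_ns) / period_ns)
    return float(clocks * period_ns)


def fpm_page_cycle_ns(clock_mhz, long_pulse_ns=40, short_pulse_ns=10):
    """FPM: both pulses are visible, each rounded up to whole clocks."""
    return (quantized_ns(long_pulse_ns, clock_mhz)
            + quantized_ns(short_pulse_ns, clock_mhz))


def edo_page_cycle_ns(clock_mhz, long_pulse_ns=40):
    """EDO: the short pulse is overlapped, so only the long one counts."""
    return quantized_ns(long_pulse_ns, clock_mhz)


for mhz in (25, 50, 100):
    print(mhz, "MHz controller: FPM", fpm_page_cycle_ns(mhz),
          "ns, EDO", edo_page_cycle_ns(mhz), "ns")
```

With these inputs the model gives 80, 60 and 50 ns for FPM at 25, 50 and 100 MHz, and a flat 40 ns for EDO, matching the figures in the post.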

Reply 49 of 57, by Putas

User metadata
Rank Oldbie

The datasheet is nice, the formula is clear, but I cannot get any definitive answer from it. I don't know how to read the values from the BIOS, or *gasp* set them.

My 40 ns board performs the same, clock for clock, as the 50 ns Diamond.

Reply 51 of 57, by kool kitty89

User metadata
Rank Member
Putas wrote:

Maybe easiest way to confirm edo operation would be a game with optional hardware cursor, though I do not know any which could run on the Virge.

Fillrate benchmarks might be useful too. If the card performed even slightly better than 2 (16-bit) pixels per clock, it would definitely be doing single-cycle accesses, since 2-cycle accesses on the 64-bit bus top out at 4 pixels per 2 clocks. (albeit, you'd need a test that specifically did plain 2D line/polygon fills, not copy/fill with textures -which would throw things off with source texture fetching)

The ViRGE does support both 1 and 2-cycle EDO timing, so it should vary depending on the specific card in question.

On another note, it would be nice to confirm the 2-cycle texture mapping performance spec (peak throughput of 1 pixel/texel per 2 GPU clocks without filtering -or 27.5 Mtex/s at 55 MHz).
Apparently, filtered textures require 8 passes, but I'm not sure if this means 8 cycles or 8 2-cycle passes -I believe it's 8 cycles though. (I haven't found information in the documentation to confirm or contradict this)
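
As a quick sanity check of those rates, here is a tiny sketch covering both readings of "8 passes". The 8- and 16-clock figures are just the two interpretations mentioned above, not confirmed values.

```python
def texel_rate_mtex_s(clock_mhz, clocks_per_texel):
    """Peak texel rate in Mtex/s for a given core clock and clocks per texel."""
    return clock_mhz / clocks_per_texel


print(texel_rate_mtex_s(55, 2))   # 27.5   unfiltered, 2 clocks per texel
print(texel_rate_mtex_s(55, 8))   # 6.875  filtered, if 8 passes = 8 clocks
print(texel_rate_mtex_s(55, 16))  # 3.4375 filtered, if 8 passes = 8 x 2 clocks
```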

Looking at the documentation some more, I also noticed there's some very nice Alpha blending support in the ViRGE with textures of 32-bits (8888 ARGB), 16-bits (4444 ARGB), 16-bits (1555 ARGB), 8-bits (A4-blend-4), plus 8 and 4-bit textures without alpha. (it seems 8-bit paletted textures are supported, but 4-bit is limited to the blend4 format -which seems to generate colors by taking RGB values stored in 2 registers and interpolating between them -limiting 4-bit textures to interpolated gradients between those 2 colors, the same is true for 8-bit alpha+blend4 textures)
Blend4 4-bit and 8-bit paletted textures have no alpha channel, so no per-pixel alpha effects would be possible. (though per-line and per-polygon blending effects should still apply)

Edit: I believe S3's terminology uses "decal" blending to refer to per-polygon or per-texture blending effects. (ie making the entire texture or polygon translucent rather than using per-texel alpha -it also explicitly mentions that 8-bit paletted textures can only use "decal" blending)
It also mentions that paletted textures cannot be filtered.
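
To make the 16-bit alpha formats above a bit more concrete, here is a sketch of how 1555 and 4444 ARGB texels split into channels. The bit order (alpha in the most significant bits, blue in the least) is the usual convention for these names and is an assumption here, not something checked against the ViRGE documentation.

```python
def unpack_argb1555(value):
    """Split a 16-bit 1555 texel into (a, r, g, b); alpha is a single on/off bit."""
    a = (value >> 15) & 0x1
    r = (value >> 10) & 0x1F
    g = (value >> 5) & 0x1F
    b = value & 0x1F
    return a, r, g, b


def unpack_argb4444(value):
    """Split a 16-bit 4444 texel into (a, r, g, b); 16 alpha levels per texel."""
    a = (value >> 12) & 0xF
    r = (value >> 8) & 0xF
    g = (value >> 4) & 0xF
    b = value & 0xF
    return a, r, g, b


print(unpack_argb1555(0x8000))  # (1, 0, 0, 0)     opaque black
print(unpack_argb1555(0x7FFF))  # (0, 31, 31, 31)  transparent white
print(unpack_argb4444(0x8F00))  # (8, 15, 0, 0)    roughly half-opaque red
```

The 1555 format trades alpha precision for color depth (5 bits per channel against 4), which is why it only suits cut-out style transparency.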

Reply 52 of 57, by bytesaber

User metadata
Rank Member
MaxWar wrote:

I still cannot be assured it will run older s3 titles to my expectations...

Is there a thread or place that can show me the past on "s3 titles" and hardware needed? I am interested in DOS games that can talk directly to 3D cards without drivers.

Just like how Tomb Raider works directly with a Voodoo 1 under DOS. I also have an S3 325 (Diamond Stealth 4000) and it works directly with Terminal Velocity.

Are there other titles of this nature? I understand that Tomb Raider cannot speak to a Voodoo2 or Voodoo3 under DOS. Just like how Terminal Velocity cannot talk to an S3 DX or other non-325 chips (aside from patching).

I guess my question is, how did this story pan out for S3 titles and what are those titles? Do they all require 325 only? Sorry if this question is poorly written.

Reply 53 of 57, by Putas

User metadata
Rank Oldbie

Think I've found the game with hardware cursor- good old Hardwar. There was no performance difference with 325 or DX. Then again some cards like ELSA are specified explicitly as single cycle. Still confused.

bytesaber wrote:

I guess my question is, how did this story pan out for S3 titles and what are those titles? Do they all require 325 only? Sorry if this question is poorly written.

S3d titles were produced until 1997, even for Windows, so the later ones support newer Virges.

Looking/searching for s3d (S3 Virge specific) games

Reply 54 of 57, by kool kitty89

User metadata
Rank Member

Update to this old topic: I discovered the mclk.exe overclocking utility here http://www.geocities.ws/liaor2/myutil/mclk.html

Which allows direct control over the clock synthesizer PLL configuration as well as being able to change (and display) some register settings and display the BIOS defaults.

Both of the Stealth 2000 boards I have end up showing as 1-cycle EDO cards and 55 or 56 MHz (I'll have to recheck at some point to be sure on the exact values there). Mode switching to 2-cycle showed a noticeable performance drop (though mostly for fillrate and 2D performance) though using FPM timing mode seemed to be the same as 2-cycle EDO. (there's also the undocumented 4th register setting for memory timing, but that seemed to be similar to 1-cycle EDO though maybe not as stable with overclocks, I didn't really stress test too much)

Neither card was solidly stable at 80 MHz for me (especially if changing resolutions or modes -using Mclk is worse for this given it executes in fullscreen DOS mode before reporting the results in a shell window) though I did notice some clock synthesizer combinations that were more stable than others (same resulting clock, but different settings to get there -there are 3 clock parameters that can be set, the instructions for using mclk aren't the most clear but there's pointers and help info provided in the program itself with a formula displayed for the clock synthesis parameters so with a bit of fiddling you can mostly work out what does what).
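
A rough sketch of how different settings can land on (nearly) the same clock. Everything specific here is an assumption and not taken from this thread or from mclk itself: the formula is the (M+2)/((N+2)·2^R) form commonly given for S3 clock synthesizers of that era, the reference is taken as 14.31818 MHz, and the field ranges are guesses. Check it against the formula mclk displays before relying on it.

```python
F_REF_MHZ = 14.31818  # assumed reference crystal frequency


def pll_mhz(m, n, r):
    """Output clock for the assumed S3-style PLL formula."""
    return F_REF_MHZ * (m + 2) / ((n + 2) * (2 ** r))


def find_settings(target_mhz, tolerance_mhz=0.5):
    """List (M, N, R, MHz) combinations that land near the target clock.

    The ranges below (7-bit M, 5-bit N, 2-bit R) are assumed, not verified.
    """
    hits = []
    for r in range(0, 4):
        for n in range(1, 32):
            for m in range(1, 128):
                f = pll_mhz(m, n, r)
                if abs(f - target_mhz) <= tolerance_mhz:
                    hits.append((m, n, r, round(f, 3)))
    return hits


# Show a handful of parameter sets that all come out close to 80 MHz.
for m, n, r, f in find_settings(80.0)[:5]:
    print(f"M={m} N={n} R={r} -> {f} MHz")
```

If the real synthesizer behaves like this, combinations with the same output clock still run the internal oscillator and dividers at different rates, which would be one plausible reason some are more stable than others.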

Stability in the 80 MHz range didn't seem hurt or helped by enabling 2-cycle EDO mode either, so it doesn't seem burst-mode timing was the breaking point for stability.

The second card I have, using Etrontech 50 ns DRAM similar to Putas' board overclocked slightly better than the other, but I wasn't totally confident it was happy at 80 MHz. I did several runs in the mid 70s that seemed OK and flipped through a range of speeds from 65 to 79 MHz (65, 71, 73, 76, 79, and 80 -the latter completed benchmarks but artifacted and 79 did at times too, though sometimes would disappear with a screen refresh). 73 seemed the fastest speed that was totally stable and oddly did better than higher clocked tests in Final Reality for at least some of the categories. (I'll need to go in and compare all results again to be sure there, but I think the city scene was one that continued to gain noticeably as clocks kept going up while others seemed bottlenecked by other things, 2D tests didn't seem to gain much or at all from the highest overclock range either)

2 MB fails to fit all textures in the Robots test in Final Reality, so they all scored pretty low there being PCI DMA bound. (though there was still noticeable difference in scores at different clock speeds)

I'm unsure if messing with the clock synthesis also alters the video DAC clock rate at all, or if it's entirely asynchronous. (in a synchronous context, I'd assume the commonly cited 135 MHz would apply to 2.5x 54 MHz or 2x 67.5 MHz for less conservatively clocked cards; at one point I was wondering if a minimum DAC multiplier arrangement might add one other area for stability issues ... a 1.667 (5/3) ratio would match exactly with an 81 MHz core clock, which is exactly where both of my cards hit really weird artifacting different from other video errors at lower clocks, but re-reading this thread and the mention of 86/87 MHz speeds with boards with faster RAM makes me think this is just a memory limit on my end)
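
The ratios in that parenthetical are easy to check. This is purely arithmetic on the hypothetical synchronous case, taking the commonly cited 135 MHz DAC figure as given; it says nothing about how the hardware actually derives its clocks.

```python
from fractions import Fraction

DAC_MHZ = Fraction(135)  # commonly cited RAMDAC rating, taken as given


def dac_ratio(core_mhz):
    """Exact DAC-to-core clock ratio for a hypothetical synchronous setup."""
    return DAC_MHZ / Fraction(core_mhz)


for core in (Fraction(54), Fraction(135, 2), Fraction(81)):
    print(float(core), "MHz core -> ratio", dac_ratio(core))
```

This prints ratios of 5/2, 2 and 5/3 for 54, 67.5 and 81 MHz respectively.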

In any case, it appears I was mistaken with my comments regarding graphics card manufacturers being that conservative with DRAM speed and staying within official specs. I'm still surprised 80 MHz is even close to stable on those 50 ns chips though, and more so that 1-cycle mode doesn't really seem to fare any worse than 2-cycle.

It does make me wonder why S3 went with the ViRGE DX and SDRAM when the 325 was capable of good high clock rates with EDO DRAM (at least faster rated EDO). Or why video board manufacturers didn't do so. DX added some features and was faster per clock, but not to a huge degree and the 2 MB limit on most common 325 cards was the bigger limit than anything, especially with the persisting lack of additive (saturation) alpha blending modes and errors on zero opacity portions of 1-5-5-5 and 4-4-4-4 alpha textures alike in 16-bit rendering mode plus apparent inability to disable dithering. (4 MB 35 ns cards should have done pretty well at 80 MHz or so) Focusing on 325 longer might have led to more driver fixes than happened with modified/updated 1997/98 ViRGE variants too. (if forced dithering was a driver level issue, that would have been a good one to fix, though given alpha errors show up even on what appear to be 1-5-5-5 textures where rounding errors should be impossible -simple on or off opacity- I'm not really sure what to think, unless those were wrongly converted to 4-4-4-4 with low-opacity rather than zero, in which case programming/API/driver errors) Hmm, at least the ViRGE could filter alpha texels unlike Rage. (I think vertex alpha is the exception for Rage, per-texel alpha is stuck unfiltered, but of course Rage II/Pro can do additive alpha, ViRGE can't even do that on vertex alpha modes -hence the opaque 3D block text at the end of Final Reality)

Come to think of it, does Terminal Velocity actually have those alpha texture border (rectangle) artifacts around explosions and such?

Reply 55 of 57, by Putas

User metadata
Rank Oldbie
kool kitty89 wrote:

I'm unsure if messing with the clock synthesis also alters the video DAC clock rate at all, or if it's entirely asynchronous.

Of course it is asynchronous.

kool kitty89 wrote:

I'm still surprised 80 MHz is even close to stable on those 50 ns chips though, and more so that 1-cycle mode doesn't really seem to fare any worse than 2-cycle.

That is quite in the norm for 50ns EDO, usually there is some reserve, so for example 50ns chips can work at 40ns timings.

kool kitty89 wrote:

It does make me wonder why S3 went with the ViRGE DX and SDRAM when the 325 was capable of good high clock rates with EDO DRAM (at least faster rated EDO). Or why video board manufacturers didn't do so. DX added some features and was faster per clock, but not to a huge degree ...

EDO was memory of choice also for DX, many have high ceiling as well. Depending on context 50% improvement can be huge. Why S3 did not want high clock boards (overall, not only with Virge) is indeed a mystery.

kool kitty89 wrote:

Focusing on 325 longer might have led to more driver fixes than happened with modified/updated 1997/98 ViRGE variants too. (if forced dithering was a driver level issue, that would have been a good one to fix...

The architectures were still pretty similar, so it is unlikely there was a lot more to do for the first Virge. Do I understand you right that you expect a better image without texture dithering?

kool kitty89 wrote:

Come to think of it, does Terminal Velocity actually have those alpha texture border (rectangle) artifacts around explosions and such?

Not border specifically, but they are visible because of the color inaccuracies.

Reply 56 of 57, by vutt

User metadata
Rank Member

Got my hands on "Generic" Virge DX, decided to test and give my thoughts/hopefully helpful hints.

Actually it runs OK-ish with a few tweaks on my P3-933 MHz/Asus P3B-F rig. Yes, textures look low-res, but that is in line with the typical mid-90s 3D game look.
I couldn't find any footage with an FPS counter on screen on YT, so I recorded my own crappy off-screen one with my phone: https://www.youtube.com/watch?v=RxACmeoUnWA
It's mainly hovering between 15 and 30 FPS, but I have to admit the controls don't feel like it's 15 FPS. Very playable.
Recommendations:
1) Turn sky textures OFF - will increase performance
2) Pick Menu->Graphics Options ->Poly: Perspect - it will fix rather ugly texture morphing
3) I had trouble finding Virge DX version wrapper. For those who struggle like me files can be found here: https://ctrl-alt-rees.com/2020-12-06-s3-virge … -downloads.html

One more thing - the game has a built-in bench tool with a few performance stats. While in the Main Menu press Ctrl-Z. It indeed runs slower in S3 3D mode. In S3 3D mode the first line, "Memory write BW", clocked only 428 MB/sec compared to regular 2D software mode, as seen on the attached screenshot below.
I tried to emulate a slower PC by disabling the P3 L1 cache in the hope that on a slower machine the Virge would act as a 3D accelerator - no, it did not. In both modes it ran at approx. 3-4 FPS.
Also, even in 2D mode the S3 Virge is slower (45 FPS) than, for example, my regular GeForce3 Ti200 (fully maxed out at 60 FPS). Although the GF3 is an AGP card, so it's unfair to compare.

Last but not least in other news - new remastered version seems to be coming up soon: https://www.gog.com/news/coming_soon_btermina … oosted_editionb

Update: Played around with the MCLK tool in DOS today. Well, it looks like I got hit by the shitty end of the silicon lottery. My Virge DX can't go over 56 MHz without getting the DOS text scrambled...

Attachments

  • TVBench.jpg (225 KiB, public domain)