VOGONS


Real hardware DOS gaming console for HDMI displays


Reply 100 of 113, by LightStruk

ragefury32 wrote on 2021-01-07, 04:46:

Wait, did I read their block diagrams incorrectly? I went through their YouTube channel and they had an almost 2-hour deep dive on the EX2. It's in Mandarin Chinese, but the slides are in English.
https://www.youtube.com/watch?v=AB9paYWC5Qo

The way I interpret the architecture is that the crossbar I/O setup can be mapped to either core at any time, and the master core can re-map it if needed, so they share the entire I/O fabric - the idea is that it was supposed to run 2 segregated OSes on the same hardware, with the more powerful one moderating I/O access for the weaker one. You "should" be able to put a PCIe video card (like the Vortex86 VGA card) onto the EX2, map it to the slave, map the GPIO pins for the ISA sound chip to it as well, and then use USB (which is mapped to the master) for Wi-Fi and/or Bluetooth for the controller, then maybe have the master moderate keystrokes/controller positions/whatever to the slave. Then the master can write to that LCD framebuffer to implement menus. The awkward thing about the Vortex86EX2 deep dive is that they never really talked about which CPU instructions the core supports.

I've also seen that video, and I agree with everything you wrote here, except for the bit about mapping ISA to the slave core. The block diagram on the product page clearly only shows ISA on the Master core, and the fact sheet for the Slave core doesn't list ISA as an option either, while the fact sheet for the Master core definitely does list ISA.

ragefury32 wrote on 2021-01-07, 04:46:

Okay, so I am going to make 2 assumptions:
a) The slave core in the EX2 is basically the same core as the one on the EX (which is used in the 86duino) - hmmm, /proc/cpuinfo for it exists here, and we can see if it's an i586 or an i686 if we look at the feature flags:

http://www.86duino.com/?page_id=85/installati … g-1/linux-image

The ones to note:
fpu tsc cx8 sep cmov mmx
FPU/TSC is pretty standard fare for anything p5 and above.
Note that it has cx8 (common on p5), mmx (that's on the p55c and later), cmov (that's a P6 class instruction) but mtrr is not there. So it's probably i686 kosher, but you can't mess with the MTRR to combine block transfers like you can on a K6-2 or a Pentium 2+, although you'll probably need to play around with one to determine that.

b) The master core is an EX core but with more stuff inside, and here's the /proc/cpuinfo with its feature flags.

https://tortoiseit.blogspot.com/2020/09/infor … 86-ex2-soc.html

The ones to note:
fpu pse tsc msr pae cx8 apic sep pge cmov pat mmx fxsr sse sse2 nx cpuid pni ssse3
FPU/tsc is assumed, PSE is a p5/p6 option, it does have the SSE-SSSE3 stuff, PAE/NX (usually found together in anything newer than a Banias Pentium-M).
So basically, it claims about the same features as a Yonah Pentium-M (well, Core Solo but with much less L2)

So no, the EX2 is probably more powerful than you think....but D&MP's technical information is rather vague and doesn't answer the tough questions.

Thank you for finding the /proc/cpuinfo for the master core. With SSE2 and SSE3, it would be fun to benchmark it against a P3-600, just to see how it compares clock-for-clock.
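
For anyone who wants to repeat the flag comparison above on their own board, here is a minimal Python sketch. The two flag strings are copied from the posts quoted above (and those were only "the ones to note", so they may not be the complete lists). The i586/i686 baseline sets and the function names `classify` and `missing` are my own rule-of-thumb assumptions, not an official definition.

```python
# Flag strings as quoted in the thread; they may be incomplete excerpts.
SLAVE_FLAGS = "fpu tsc cx8 sep cmov mmx"
MASTER_FLAGS = ("fpu pse tsc msr pae cx8 apic sep pge cmov pat mmx "
                "fxsr sse sse2 nx cpuid pni ssse3")

# Assumption: a rough rule of thumb for the two classes, not a formal spec.
I586_BASELINE = {"fpu", "tsc", "cx8"}
I686_BASELINE = I586_BASELINE | {"cmov"}


def classify(flags):
    """Return a coarse CPU class for a space-separated flags string."""
    have = set(flags.split())
    if I686_BASELINE <= have:
        return "i686"
    if I586_BASELINE <= have:
        return "i586"
    return "older"


def missing(flags, wanted):
    """Return the subset of `wanted` features absent from the flags string."""
    return set(wanted) - set(flags.split())


if __name__ == "__main__":
    for name, flags in (("slave", SLAVE_FLAGS), ("master", MASTER_FLAGS)):
        print(name, classify(flags), sorted(missing(flags, {"mtrr", "pae", "nx"})))
```

On a live system you would feed it the `flags` line from /proc/cpuinfo instead of the hard-coded strings.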

Reply 101 of 113, by ragefury32

LightStruk wrote on 2021-01-07, 14:54:

I've also seen that video, and I agree with everything you wrote here, except for the bit about mapping ISA to the slave core. The block diagram on the product page clearly only shows ISA on the Master core, and the fact sheet for the Slave core doesn't list ISA as an option either, while the fact sheet for the Master core definitely does list ISA.

Thank you for finding the /proc/cpuinfo for the master core. With SSE2 and SSE3, it would be fun to benchmark it against a P3-600, just to see how it compares clock-for-clock.

Oh god. That frakking site - you have to wonder who wrote their documentation and why it's so monumentally bad. Well, bad as in being pedantic about useless info and skipping over the stuff that's actually important. For a really good example of that, download their fact sheets for the master and slave cores. The slave core documentation is a hilarious read:
http://www.vortex86.com/downloads/Vortex86EX2

So a quick summation -
- 30 pages, but the first non-index page starts at 14, so it's probably copy-pasted verbatim from a bigger document. Pages 1-3 give me a feature listing, which also shows ISA as being supported.
- Pages 4-6 are just a general 386-style register listing. If it's claimed to be x86 compatible, this is already assumed and tells me nothing I don't know.
- Pages 7-21 talk about Northbridge/Southbridge config registers. Great - but this is not an I/O reference. I also got the feeling that some of this is duplicated in the master core fact sheet.
- Pages 22-30 talk about memory addressing modes (direct, indirect, etc.)...which anyone who has taken an undergrad CS class on x86 assembly language would already know, and they don't point out anything specific about the capabilities of the core used in the Vortex86 - for example, is there an MTRR, does it support cmov or any of the P6 class stuff, why is there no L2 for the slave (even though it's just an EX core - what, did they blow their silicon budget and have to pare it back), and why did they make one core PAE/NX and not the other one?

Anyways, I think the point of confusion lies in this diagram, where they separate the I/O into Rich I/O Group 0 and Group 1. Group 0 seems to be the low latency, low throughput stuff like HDA, I2C, SD and whatnot, while Group 1 looks like the higher latency, higher throughput stuff like the LCD framebuffer and the ISA bus.

[Attachment: Vortex86EX SoC.jpeg - SoC I/O Layout]

I didn't see anything that suggests that one particular group has an affinity to a particular core. My reading of this EX2 block diagram suggests that they should both have equal access to the I/O crossbar.

[Attachment: Vortex86EX2 Shared IO .jpeg - SoC Shared I/O]

Their own sales literature points to the ability of one core to take over the I/O if the other one fails, so it wouldn't make sense if only one core were allowed to access the I/O and not the other.

To be honest, it's really hard to get a good read on the performance of the EX2's master core. If it's functionally like a Yonah (or a Diamondville N270 Atom - seems to have the same feature flags), but only clocked to 600 MHz with a 128 KB unified L2 connected to DDR3-400...is that reasonably performant? And how would it compare to the granddaddy Bonnell Atom itself? My guess is that either it'll be somewhat close, or, since the Atom runs at a 1.6 GHz base clock, the Atom is likely faster.

Last edited by ragefury32 on 2021-01-09, 07:23. Edited 1 time in total.

Reply 102 of 113, by LightStruk

ragefury32 wrote on 2021-01-07, 15:43:

Anyways, I think the point of confusion lies in this diagram, where they separate the I/O into Rich I/O Group 0 and Group 1. Group 0 seems to be the low latency, low throughput stuff like HDA, I2C, SD and whatnot, while Group 1 looks like the higher latency, higher throughput stuff like the LCD framebuffer and the ISA bus.

I didn't see anything that suggests that one particular group has an affinity to a particular core. My reading of this EX2 block diagram suggests that they should both have equal access to the I/O crossbar.

That diagram puts an asterisk next to ISA, LCD, and KB / MS, with the asterisk clearly labeled, "dedicated to the Master Core."

ragefury32 wrote on 2021-01-07, 15:43:

To be honest, it's really hard to get a good read on the performance of the EX2's master core. If it's functionally like a Yonah (or a Diamondville N270 Atom - seems to have the same feature flags), but only with a 128 KB unified L2 connected to DDR3-400...is that reasonably performant? And how would it compare to the granddaddy Bonnell Atom itself? My guess is that either it'll be somewhat close, or, since the Atom runs at a 1.6 GHz base clock, the Atom is likely faster.

It would probably compare decently well clock-for-clock to the first Atom, which was a descendant of the Pentium microarchitecture. The P6 cores are still probably faster.

Ever notice that, after the P6 architecture came out, all of the x86 clone makers aside from AMD gave up on trying to compete on performance? The P6 is vastly more advanced than the P5, with internal micro-ops, out-of-order execution, superpipelining, superscalar ALU, strong branch prediction, and high clocks. Cyrix made a brief push to keep up, then went for the niche Internet appliance market, then got bought out. AMD couldn't execute on its ambitious K5, so they bought NexGen and valiantly chased Intel with the K6 until they poached enough Alpha engineers to make the K7. Rise and IDT were just crushed by the P6.

The Vortex86 cores all descend from the Rise mP6, which was pipelined, superscalar, and had a branch predictor, but it's still an in-order design, and it was slower than all of its contemporary competitors with more advanced architectures. Fortunately for me, I don't need P6-level performance to run DOS games!

Reply 103 of 113, by ragefury32

LightStruk wrote on 2021-01-07, 21:00:

That diagram puts an asterisk next to ISA, LCD, and KB / MS, with the asterisk clearly labeled, "dedicated to the Master Core."

Yeah, but the asterisk looks like it is meant for the 6 PCI pins - otherwise the slave core documentation would not include ISA access. Normal PCI is not on the top level diagram as "shared I/O", while PCIe and ISA are.
Like I've said - their documentation? Poorly written when it comes to clarity and concision.

LightStruk wrote on 2021-01-07, 21:00:

It would probably compare decently well clock-for-clock to the first Atom, which was a descendant of the Pentium microarchitecture. The P6 cores are still probably faster.

Ever notice that, after the P6 architecture came out, all of the x86 clone makers aside from AMD gave up on trying to compete on performance? The P6 is vastly more advanced than the P5, with internal micro-ops, out-of-order execution, superpipelining, superscalar ALU, strong branch prediction, and high clocks. Cyrix made a brief push to keep up, then went for the niche Internet appliance market, then got bought out. AMD couldn't execute on its ambitious K5, so they bought NexGen and valiantly chased Intel with the K6 until they poached enough Alpha engineers to make the K7. Rise and IDT were just crushed by the P6.

The Vortex86 cores all descend from the Rise mP6, which was pipelined, superscalar, and had a branch predictor, but it's still an in-order design, and it was slower than all of its contemporary competitors with more advanced architectures. Fortunately for me, I don't need P6-level performance to run DOS games!

Yeah, the Atoms never got out-of-order execution until Bay Trail/Silvermont, which was, what, 2 major revisions after Bonnell, and it was a great little chip. I remember that they matched up fairly well against Arrandale (Nehalem mobiles). It's too bad that their system controllers weren't too great, to the point where Linux had broken drivers for them from, say, 2015-16 onward, after Intel committed a bunch of code that made them unstable. Seems like a bit of a pattern for Chipzilla - they were also known for some half-ass drivers for the Coffee Lake SoCs. The P6 was basically the Pentium Pro (which is at its heart very RISC-like) but with 16-bit support backported in. Of course, I am not sure if that means that the P6 was much better, or that the P5 is just a 486-and-a-half with that U/V pipe semi-superscalar ridiculousness. Via did do the smart thing: they stood aside and chased the embedded market with the C3 (which is essentially the IDT Winchip design), which served them well for a while. Then the Atoms, the Pentium-M and its derivatives ate their lunch. After the Nano (Isaiah) came out they kinda dropped off the radar.

The EX-based "small" core would be able to handle gaming (I pointed to some videos of it running Quake 2 using an 86Duino Vortex86VGA Z9s board). But if your "big" core is running the Linux housekeeping, you would conceivably need more firepower to handle the Bluetooth/Wi-Fi stuff.

Reply 104 of 113, by LightStruk


Hey look, a Vortex86DX3 PC/104 board on an ISA backplane!

[Attachment: IMG_5551.jpg]

I don't have the CMI8330 card yet, so let's do some benchmarking and video compatibility testing first. First up - don't hate me for my 16:10 LCD monitor; it's the only one I have on hand with a VGA input. HWInfo says that it has VBE 3.0, which is nice...

[Attachment: IMG_5521.jpg]

Let's see how it performs on PC Player Benchmark. 320x200x8bpp looks good, scoring 117.1. Not bad, that's better than my Geode LX 800 (500 MHz), although that runs at literally half the clock speed. Now to try 640x480x8bpp with a linear frame buffer:

[Attachment: IMG_5522.jpg]

Uh oh. That's not good. I like that it scores higher at 640x480 than at 320x200 (145.5!), but that's some gnarly corruption. Maybe it's just that benchmark. Let's try a game where people care about 640x480, like Duke Nukem 3D:

[Attachment: IMG_5525.jpg]

Awesome, no problems, and it's buttery smooth! Waaaaait a second, why is only half of the pistol drawn?

There are clearly some issues with the high resolution VESA modes. I did try PC Player Benchmark at 16bpp and 32bpp, and all of the Linear Framebuffer modes had screen corruption of some kind except for 1024x768x32bpp. If I forced it not to use a linear mode, then the screen corruption disappeared, but performance dropped considerably - 640x480x8bpp /NOLINEAR scored only 46.0.

I will reach out to DMP, but I'm not holding my breath that they will investigate, much less fix the issue. Has anyone here seen similar issues with LFB VESA modes and know of a fix? I have not tried UNIVBE yet, but it probably doesn't support this integrated video hardware anyway.
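
As a rough illustration of why dropping the linear framebuffer costs so much, here is a minimal Python sketch that counts how many bank switches a full-screen redraw would need per mode. It assumes the classic 64 KB VGA window at segment A000h and no padding between scanlines; whether PC Player Benchmark's /NOLINEAR path really uses that window size on this VBE implementation is my assumption. It says nothing about the cause of the corruption itself.

```python
# Assumption: banked VESA modes are accessed through a 64 KB window.
BANK_WINDOW_BYTES = 64 * 1024


def framebuffer_bytes(width, height, bpp):
    """Bytes for one full frame, assuming tightly packed scanlines."""
    return width * height * (bpp // 8)


def banks_needed(width, height, bpp):
    """Number of 64 KB windows a full frame spans (ceiling division)."""
    size = framebuffer_bytes(width, height, bpp)
    return -(-size // BANK_WINDOW_BYTES)


if __name__ == "__main__":
    for mode in ((320, 200, 8), (640, 480, 8), (640, 480, 16), (1024, 768, 32)):
        print(mode, framebuffer_bytes(*mode), "bytes,", banks_needed(*mode), "bank(s)")
```

With these assumptions, 320x200x8bpp fits in a single window, while 640x480x8bpp already spans five, so every frame in banked mode involves several bank switches on top of the pixel writes.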

Reply 105 of 113, by jmarsh


Those are both normal issues with those apps at high frame rates, which they never reached on typical hardware of their time. The same thing even occurs with DOSBox on current machines if you don't limit the cycles.

Reply 106 of 113, by LightStruk

jmarsh wrote on 2021-01-19, 14:56:

Those are both normal issues with those apps at high frame rates, which they never reached on typical hardware of their time. The same thing even occurs with DOSBox on current machines if you don't limit the cycles.

I will give NOLFBLIM a try later to see if it helps. According to my notes, my Geode LX @ 500 MHz scores 41.3 on this benchmark, while my Via Nehemiah @ 800 MHz scores 69.5. Perhaps those are both slow enough that the corruption and tearing don't occur. The VGA core in the Vortex86DX3 runs on PCIe if I'm not mistaken, which would make it much faster than the PCI or AGP video cores integrated into those other chips, even if the CPU itself isn't vastly faster.
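
To put the scores on a clock-for-clock footing, here is a minimal Python sketch that divides each score by its clock speed. The Geode and Nehemiah numbers come from the notes above; the DX3 scores are the ones reported earlier in the thread. The 1000 MHz figure for the DX3 is inferred from the earlier "literally half the clock speed" remark, and the video mode behind the Geode and Nehemiah scores is not stated, so treat the comparison as illustrative only.

```python
# (label, clock in MHz, PC Player Benchmark score)
# Assumption: DX3 clock of 1000 MHz is inferred, not stated outright.
RESULTS = [
    ("Geode LX", 500, 41.3),
    ("Via Nehemiah", 800, 69.5),
    ("Vortex86DX3 320x200x8", 1000, 117.1),
    ("Vortex86DX3 640x480x8 LFB", 1000, 145.5),
    ("Vortex86DX3 640x480x8 /NOLINEAR", 1000, 46.0),
]


def score_per_mhz(score, mhz):
    """Benchmark score normalised by clock speed."""
    return score / mhz


if __name__ == "__main__":
    for label, mhz, score in RESULTS:
        print(f"{label:34s} {score_per_mhz(score, mhz):.4f} points/MHz")
```

Keep in mind that a video-bound benchmark mostly measures the path to the framebuffer, so points/MHz here is not a pure CPU comparison.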

Reply 107 of 113, by jmarsh


Look at the numbers shown for "typical" machines when PCPBench exits - 20fps is around the top. Certainly any speed higher than the refresh rate (60/70Hz depending on vbios) is guaranteed to introduce artifacts.

You're going to have to significantly slow things down if you want this machine to run DOS software properly.
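
A quick way to see how far over the refresh rate these results are, as a minimal Python sketch. It assumes the benchmark score can be read as frames per second (which the "20fps" remark above implies) and uses the 60/70 Hz refresh rates mentioned above; the function names are made up for illustration.

```python
def frames_per_refresh(fps, refresh_hz):
    """How many rendered frames land inside one display refresh."""
    return fps / refresh_hz


def exceeds_refresh(fps, refresh_hz):
    """True when the renderer outruns the display, so tearing becomes likely."""
    return fps > refresh_hz


if __name__ == "__main__":
    # Scores reported earlier in the thread, treated as fps (assumption).
    for label, fps in (("320x200x8", 117.1),
                       ("640x480x8 LFB", 145.5),
                       ("640x480x8 /NOLINEAR", 46.0)):
        for hz in (60, 70):
            print(f"{label:22s} @ {hz} Hz: "
                  f"{frames_per_refresh(fps, hz):.2f} frames/refresh, "
                  f"over refresh: {exceeds_refresh(fps, hz)}")
```

Under that reading, both linear-framebuffer results render more than one frame per refresh, while the /NOLINEAR result stays below either refresh rate, which lines up with the corruption disappearing in that mode.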

Reply 108 of 113, by LightStruk

User metadata
Rank Member
Rank
Member

Remember that CMI8330 I ordered from a sketchy .top website? Yep, either a scam or a highly dysfunctional operation. The PayPal receipt had a China Post tracking number of a package that had already been shipped and delivered before the order date. Thankfully, PayPal agreed and quickly refunded my money.

So, I've now ordered a CMI8330 from an eBay seller in the Czech Republic! This at least looks more legit. The seller's been on eBay for over 7 years with no negative reviews. CMI8330 cards seem really rare here in the US...

FWIW, if a CMI8330 ends up being the chip I use, I'm not worried about supply. It should be much easier to get a batch of bare chips from a distributor via digipart, where there are tens of thousands of units listed, than it is to get the vintage card.

Reply 109 of 113, by LightStruk


I had to order it from an eBay seller in Europe, but here's the CMI8330!

[Attachment: IMG_5618.jpg]

For a quick test, I tried 16-bit/22 kHz audio + FM with Duke Nukem 3D, and other than a little bit of noise on the analog out, it seems to work well. I haven't tried the S/PDIF out yet; I need to source an adapter to convert it to optical. If only I hadn't thrown away my SB Live! + LiveDrive combo years ago...

Once I do have that adapter, the next step is to try and settle the OPL3 accuracy question once and for all.

What music should I test? I'm thinking some Adlib Tracker II songs, since those are more likely to exercise the OPL3 to its full potential, and a few full OPL3 games like Warcraft 2, Tetris Classic, and SimCity 2000. Anything else I should consider?

What should I compare it to? It's important to not introduce any analog stages for this comparison, so classic ISA sound cards with discrete OPL3 chips are not an option. I have a genuine YMF744 card, although the OPL3 block in that chip is based on the YMF289 instead of the YMF262, and the pitch is slightly lower with the 289. What if I compared the CMI8330 to Nuked OPL3? Is there any evidence that it's not 100% accurate?
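
One way such an all-digital comparison could be scored, as a minimal Python sketch: line up the S/PDIF capture against a reference render and measure the sample-by-sample difference. It assumes both recordings are already decoded to lists of signed integer PCM samples at the same sample rate and that the capture merely starts a few samples late; the function names `best_offset`, `rms_error` and `bit_exact` are made up for this example. Any resampling on the S/PDIF path would need to be handled before this step.

```python
import math


def best_offset(ref, test, max_shift):
    """Return the shift (in samples) of `test` that best matches `ref`.

    Every candidate shift is scored over the full length of `ref`,
    so `test` must be at least len(ref) + max_shift samples long.
    """
    if len(test) < len(ref) + max_shift:
        raise ValueError("test capture is too short for the requested search")
    best_shift, best_err = 0, float("inf")
    for shift in range(max_shift + 1):
        err = sum((r - test[i + shift]) ** 2 for i, r in enumerate(ref))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift


def rms_error(ref, test, shift):
    """Root-mean-square difference between `ref` and the shifted `test`."""
    total = sum((r - test[i + shift]) ** 2 for i, r in enumerate(ref))
    return math.sqrt(total / len(ref))


def bit_exact(ref, test, shift):
    """True when the aligned capture matches the reference sample for sample."""
    return all(r == test[i + shift] for i, r in enumerate(ref))


if __name__ == "__main__":
    reference = [0, 100, -100, 50, -50, 25]
    capture = [7, 7] + reference + [0]  # same data, delayed by two samples
    shift = best_offset(reference, capture, max_shift=3)
    print(shift, rms_error(reference, capture, shift), bit_exact(reference, capture, shift))
```

A brute-force search like this is only practical for short excerpts; for whole songs a cross-correlation from a numerical library would be the more sensible tool.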

Reply 111 of 113, by LightStruk

chrismeyer6 wrote on 2021-04-07, 19:32:

This has been a very interesting project to follow. Just curious to see if you have made any progress on it recently?

Yes, I have made some progress! I have finished the first draft of a PCB layout for an important feature of the console. This render shows this mystery feature on a 16-bit ISA card (for development and testing), but once I have the bugs worked out, the same components will go on the console PCB.

[Attachment: mystery-isa-card.jpg]

Also, I haven't ruled out producing this card as-is for others to enjoy... assuming I can get it to work, and that others find it interesting.