VOGONS



Reply 120 of 162, by appiah4

Rank l33t++
doublebuffer wrote on 2023-08-03, 10:05:
rasz_pl wrote on 2023-08-03, 09:50:

That's not how this works. You aren't recreating the Pentium architecture, but building something that runs its instruction set sufficiently fast.

Right, but that would mean the original Pentium wasn't optimized for die area/port count, which I find hard to believe. The fastest implementation of the Pentium architecture is the Pentium; everything else is imitation (the dreaded word "emulation"). If we accept imitation, is there any reason not to implement it in software instead of on the much more cumbersome FPGA? Parallelism and better power efficiency come to mind, but as I see it, if we're ready to sacrifice accuracy, then software would do just as well.

This is a wrong assumption. AMD K5 is more efficient than Pentium per clock cycle by a LARGE amount.

Retronautics: A digital gallery of my retro computers, hardware and projects.

Reply 121 of 162, by doublebuffer

Rank Member
vstrakh wrote on 2023-08-03, 10:18:

But didn't you say that "Pentium on DE10 is impossible because of 110000 Y vs 3000000 Z"?

It was a ballpark figure to gauge how practical it is to fit a Pentium core on the DE10.

vstrakh wrote on 2023-08-03, 10:18:

Do you have any practical experience on this?

A few courses at university; I'm by no means an expert, my bread and butter is software.

vstrakh wrote on 2023-08-03, 10:18:

A single LE is not a boolean logic primitive; it implements an arbitrary function of up to 6 inputs. A single LE literally stores up to 64 outputs for all possible combinations of its inputs.

That's why I originally wrote the weasel clause about the impracticality of directly comparing LEs to plain transistor count. It has been, gosh, something like 15 years since I actually programmed in VHDL, but if memory serves, a single OR operation would waste an entire LE despite only using 2 inputs, due to the hardwired nature of the FPGA. This is why I said 1 LE can fit approximately 3 transistors in the worst case (basic boolean logic, not counting latches and such, which would obviously need more transistors but would still fit in a single LE). Correct me if I remember wrong.
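Both points can be seen in a minimal Python model of an idealized 6-input LUT: the LE stores a 64-entry truth table, so a 2-input OR and an arbitrary 6-input function cost the same single LUT. This is a sketch under assumptions: the `Lut6` class is hypothetical, and real FPGA logic elements also carry registers, carry chains and other features that it ignores.

```python
from typing import Callable


class Lut6:
    """Idealized 6-input LUT: 64 stored output bits, one per input combination."""

    def __init__(self, func: Callable[..., int]) -> None:
        # Precompute the truth table: bit i holds func() evaluated on input pattern i.
        self.table = 0
        for i in range(64):
            bits = [(i >> k) & 1 for k in range(6)]
            if func(*bits) & 1:
                self.table |= 1 << i

    def __call__(self, *inputs: int) -> int:
        # Pack the input bits into an index and read back the stored output bit.
        index = 0
        for k, bit in enumerate(inputs):
            index |= (bit & 1) << k
        return (self.table >> index) & 1


# A 2-input OR still occupies a whole LUT; the other four inputs are don't-cares.
or2 = Lut6(lambda a, b, c, d, e, f: a | b)

# An arbitrary 6-input function costs exactly the same single LUT.
messy = Lut6(lambda a, b, c, d, e, f: (a & b) ^ (c | (d & (1 - e))) ^ f)

print(or2(1, 0, 0, 0, 0, 0))       # 1
print(bin(or2.table).count("1"))   # 48 of the 64 entries are 1
```

In this idealized view the "cost" of a function is one LUT whether it is a single gate or a tangle of them, as long as it has at most six inputs.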

Reply 122 of 162, by rasz_pl

Rank l33t
doublebuffer wrote on 2023-08-03, 10:05:

Right, but that would mean the original Pentium wasn't optimized for die area/port count, which I find hard to believe.

It was optimized as well as was possible in ~1989-91.

doublebuffer wrote on 2023-08-03, 10:05:

The fastest implementation of the Pentium architecture is the Pentium; everything else is imitation (the dreaded word "emulation").

Only if by architecture you mean the exact floorplan and execution pipeline; everything else is a better implementation of the Pentium instruction set. The fastest currently is some i9, although nowadays it might be the AMD Ryzen 9 7950X3D.

doublebuffer wrote on 2023-08-03, 10:05:

If we accept imitation

Imitation of what, and compared to what? Is running DOS on a K5 an imitation of the real thing? The K5 is a 29K-derived RISC inside.

doublebuffer wrote on 2023-08-03, 10:05:

, is there any reason not to implement it in software instead of on the much more cumbersome FPGA?

I don't understand what you mean. We already did, multiple times; that's what DOSBox and many others do.

doublebuffer wrote on 2023-08-03, 10:05:

Parallelism and better power efficiency come to mind, but as I see it, if we're ready to sacrifice accuracy, then software would do just as well.

There is no loss of accuracy in a properly implemented software-emulated CPU core, nor in an FPGA one. There is, however, a huge loss of performance in directly converting a software implementation into Verilog, and AFAIK that's what ao486 is.
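For readers coming from the software side: at its core a software CPU emulator is a fetch-decode-execute loop over the instruction set. A minimal sketch, assuming only four real 16-bit x86 opcodes (B8 = MOV AX,imm16, 05 = ADD AX,imm16, 40 = INC AX, F4 = HLT) and ignoring flags, segments, interrupts and timing, all of which a real core such as DOSBox's has to model; `run` is a hypothetical name.

```python
def run(code: bytes) -> int:
    """Interpret a tiny subset of 16-bit x86 machine code and return AX at HLT."""
    ax = 0
    ip = 0  # instruction pointer into `code`
    while True:
        opcode = code[ip]                                   # fetch
        if opcode == 0xB8:                                  # MOV AX, imm16
            ax = int.from_bytes(code[ip + 1:ip + 3], "little")
            ip += 3
        elif opcode == 0x05:                                # ADD AX, imm16 (flags ignored)
            ax = (ax + int.from_bytes(code[ip + 1:ip + 3], "little")) & 0xFFFF
            ip += 3
        elif opcode == 0x40:                                # INC AX (16-bit mode)
            ax = (ax + 1) & 0xFFFF
            ip += 1
        elif opcode == 0xF4:                                # HLT: stop the loop
            return ax
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} at {ip}")


# MOV AX,0x1234 / ADD AX,0x0100 / INC AX / HLT
program = bytes([0xB8, 0x34, 0x12, 0x05, 0x00, 0x01, 0x40, 0xF4])
print(hex(run(program)))  # 0x1335
```

In a loop like this, accuracy comes down to how faithfully each opcode (including its flags, faults and timing) is modeled.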

vstrakh wrote on 2023-08-03, 10:20:

The latency is the answer. An FPGA delivers well-defined signals, ideally equal to the real hw, though internally the mechanics might differ.
Software not only can't do low latency, it can't even keep that latency consistent; it jumps all over the place, and it's noticeable and very irritating.

Latency would only matter if you interfaced a software-emulated part with real hardware, and even then it's not a problem with sufficiently fast emulation, as evidenced by MCL65+ and MCL86+.

doublebuffer wrote on 2023-08-03, 10:33:
vstrakh wrote on 2023-08-03, 10:18:

A single LE is not a boolean logic primitive; it implements an arbitrary function of up to 6 inputs. A single LE literally stores up to 64 outputs for all possible combinations of its inputs.

That's why I originally wrote the weasel clause about the impracticality of directly comparing LEs to plain transistor count. It has been, gosh, something like 15 years since I actually programmed in VHDL, but if memory serves, a single OR operation would waste an entire LE despite only using 2 inputs, due to the hardwired nature of the FPGA. This is why I said 1 LE can fit approximately 3 transistors in the worst case (basic boolean logic, not counting latches and such, which would obviously need more transistors but would still fit in a single LE). Correct me if I remember wrong.

Irrelevant; we are not interested in recreating the exact Pentium floor plan. We don't need a http://www.visual6502.org/ or http://visual6502.org/sim/varm/armgl.html

Open Source AT&T Globalyst/NCR/FIC 486-GAC-2 proprietary Cache Module reproduction

Reply 123 of 162, by doublebuffer

Rank Member
rasz_pl wrote on 2023-08-03, 10:50:

It was optimized as well as was possible in ~1989-91.

Yup, that's what I thought as well.

rasz_pl wrote on 2023-08-03, 10:50:

I don't understand what you mean. We already did, multiple times; that's what DOSBox and many others do.

Yes, and if we have DOSBox running on a PC, why do we need an implementation on a much slower FPGA?

rasz_pl wrote on 2023-08-03, 10:50:

There is no loss of accuracy in a properly implemented software-emulated CPU core, nor in an FPGA one. There is, however, a huge loss of performance in directly converting a software implementation into Verilog, and AFAIK that's what ao486 is.

DOSBox and especially ao486 aren't accurate. If you develop software only against them, you cannot trust that it runs on real hardware. I would be all in for an accurate implementation of the x86 instruction set, either in software or hardware, but neither approach seems to produce correct results.

rasz_pl wrote on 2023-08-03, 10:50:

Irrelevant; we are not interested in recreating the exact Pentium floor plan. We don't need a http://www.visual6502.org/ or http://visual6502.org/sim/varm/armgl.html

Well yes, it seems we have different goals in mind.

Reply 124 of 162, by rasz_pl

Rank l33t
doublebuffer wrote on 2023-08-03, 11:00:

DOSBox and especially ao486 aren't accurate. If you develop software only against them, you cannot trust that it runs on real hardware. I would be all in for an accurate implementation of the x86 instruction set, either in software or hardware, but neither approach seems to produce correct results.

Can you provide one example of code that fails to run in DOSBox/ao486 because of a CPU fault but runs on real hardware?

doublebuffer wrote on 2023-08-03, 11:00:

Well yes, it seems we have different goals in mind.

What would that goal be?


Reply 125 of 162, by vstrakh

Rank Member
doublebuffer wrote on 2023-08-03, 10:33:

It was a ballpark figure to gauge how practical it is to fit a Pentium core on the DE10.

But you must have had some picture in your mind to justify this comparison, some hard numbers on how transistor count maps to logic elements?

Let's look at it from a different angle. Can you give any numbers for Yamaha's YM2149 sound generator? How many transistors could there be?
Can you imagine such a chip being implemented in ~90*3 transistors (your "three transistors per boolean primitive")?
Because right now I'm working on a compact YM2149 implementation for a 6-LUT architecture, and in those 90 LEs I have 2 (two) such generators.
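To see why such a generator packs into so few LEs: per channel, a YM2149-style tone generator is essentially a 12-bit counter, a comparator and a single toggling flip-flop. A minimal Python sketch, assuming the commonly quoted AY-3-8910/YM2149 relation f = clock / (16 × TP) and modeling it as one prescaled tick per 8 master clocks with the output toggling every TP ticks; the class and function names, the 2 MHz example clock and the handling of TP = 0 are illustrative assumptions, not vstrakh's implementation.

```python
class ToneChannel:
    """Hypothetical model of one YM2149-style square-wave tone channel."""

    def __init__(self, period: int) -> None:
        # 12-bit tone period register; 0 is treated like 1 here (an assumption).
        self.period = max(1, period & 0xFFF)
        self.counter = 0
        self.output = 0

    def tick(self) -> int:
        """Advance one prescaled tick (assumed: one tick per 8 master clocks)."""
        self.counter += 1
        if self.counter >= self.period:  # comparator
            self.counter = 0
            self.output ^= 1             # flip-flop toggles the square wave
        return self.output


def tone_frequency(clock_hz: float, period: int) -> float:
    # Commonly quoted AY-3-8910/YM2149 relation: f = clock / (16 * TP).
    return clock_hz / (16 * period)


ch = ToneChannel(period=4)
print([ch.tick() for _ in range(16)])
# [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(tone_frequency(2_000_000, 4))  # 31250.0 (2 MHz is an example clock)
```

A full output period here is 8 ticks, i.e. 64 master clocks, which matches 16 × TP for TP = 4.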

Reply 126 of 162, by vstrakh

Rank Member
rasz_pl wrote on 2023-08-03, 10:50:

Latency would only matter if you interfaced a software-emulated part with real hardware, and even then it's not a problem with sufficiently fast emulation, as evidenced by MCL65+ and MCL86+.

Nah, just playing old 286-era games: DOSBox slows down when moving the mouse over the screen, the PIT changes tone when something happens outside of DOSBox, etc. Raising cycles makes it more tolerable, but it still doesn't feel right.
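DOSBox documents its `cycles` setting as roughly the number of instructions it tries to emulate each millisecond. A simplified pacing model shows why host activity can slow a fixed budget down: if the host cannot finish the budget inside the 1 ms tick, emulated time falls behind real time. This is a sketch under stated assumptions, not DOSBox's actual scheduler; the function name, the per-instruction host cost and the "stolen" host time are hypothetical parameters.

```python
def emulated_speed(cycles_per_ms: int, ns_per_instr: float, stolen_us: float) -> float:
    """Fraction of real-time speed in a simplified fixed-budget pacing model.

    Assumed model: each 1 ms host tick the emulator must run `cycles_per_ms`
    instructions at `ns_per_instr` nanoseconds of host time each, while other
    host activity (mouse redraw, background tasks) steals `stolen_us` of it.
    """
    needed_us = cycles_per_ms * ns_per_instr / 1000.0
    available_us = 1000.0 - stolen_us
    if needed_us <= 0:
        return 1.0
    if available_us <= 0:
        return 0.0
    return min(1.0, available_us / needed_us)


print(emulated_speed(20_000, 40.0, 0.0))    # 1.0: the budget fits in the tick
print(emulated_speed(20_000, 40.0, 400.0))  # 0.75: emulated time falls behind
```

With these example numbers, anything that steals more than 200 µs per millisecond pushes the emulated machine below real-time speed.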

Reply 127 of 162, by doublebuffer

Rank Member
rasz_pl wrote on 2023-08-03, 11:23:

Can you provide one example of code that fails to run in DOSBox/ao486 because of a CPU fault but runs on real hardware?

When I did initial research on this topic, I read a lot about ao486 because at first glance it seemed a perfect fit for my needs (retro programming in DOS), but as I dug deeper, people were complaining it was basically useless for anything other than running old software. I'll see if I can find the discussion about this.

rasz_pl wrote on 2023-08-03, 11:23:
doublebuffer wrote on 2023-08-03, 11:00:

Well yes, it seems we have different goals in mind.

What would that goal be?

Accurate emulation of real hardware beyond gaming.

Reply 128 of 162, by doublebuffer

Rank Member
vstrakh wrote on 2023-08-03, 11:47:

But you must have had some picture in your mind to justify this comparison, some hard numbers on how transistor count maps to logic elements?

Yes, as I previously wrote, one logic block corresponds to one logical operation on real hardware. If you need an OR gate, it takes one block; if you want a latch, it takes one block, etc. In the example I assumed that a boolean gate would be equal to 3 transistors (I think an AND gate takes 3 transistors, again from memory; it might be more or less, but the ballpark is right).

Reply 129 of 162, by vstrakh

Rank Member
doublebuffer wrote on 2023-08-03, 12:11:

assumed that a boolean gate would be equal to 3 transistors (I think an AND gate takes 3 transistors, again from memory; it might be more or less, but the ballpark is right).

It's not about how many transistors are needed per boolean gate.
LEs are not boolean gates, and your "ballpark estimation" has no basis in reality. You can't compare anything with your current understanding of the tech.
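The mismatch can be illustrated by counting gates against LUTs. A 6-input parity function needs a chain (or tree) of five 2-input XOR gates at the gate level, yet it is a single 64-entry truth table, i.e. one 6-input LUT, while a lone 2-input OR also costs one LUT. A minimal sketch, assuming idealized 6-input LUTs; the function and variable names are hypothetical.

```python
from functools import reduce
from itertools import product
from operator import xor


def parity(*bits: int) -> int:
    # Gate-level view: a chain of 2-input XOR gates, one fewer than the inputs.
    return reduce(xor, bits)


xor2_gates = 6 - 1  # gates needed for 6-input parity built from 2-input XORs

# LUT view: the whole function is one 64-entry truth table, i.e. one 6-input LUT.
parity_table = [parity(*bits) for bits in product((0, 1), repeat=6)]

# A lone 2-input OR: one gate, and still one LUT (inputs c..f unused).
or_table = [a | b for a, b, *_ in product((0, 1), repeat=6)]

print(xor2_gates, len(parity_table))  # 5 64
print(len(or_table))                  # 64
```

So even for these two tiny functions the "gates per LE" ratio ranges from 1 to 5, before counting transistors per gate at all.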

Reply 130 of 162, by doublebuffer

Rank Member
rasz_pl wrote on 2023-08-03, 11:23:

Can you provide one example of code that fails to run in DOSBox/ao486 because of a CPU fault but runs on real hardware?

https://misterfpga.org/viewtopic.php?t=2833

Reply 131 of 162, by doublebuffer

Rank Member
vstrakh wrote on 2023-08-03, 12:19:

your "ballpark estimation" has no basis in reality.

It has. Say the original Pentium had an OR gate (3 transistors). So you write an OR gate in the Pentium VHDL. When you compile that OR gate into an FPGA configuration, it takes 1 logic block. Hence 3 transistors are equivalent to 1 logic block in this case. But if the original Pentium has, say, a D latch, it will take something like 30 transistors (again a ballpark) but gets configured as a single logic block, hence 30 transistors are equivalent to 1 logic block. So obviously it's not a 1:1 correspondence, and I never claimed it to be; it was just to give some figure for why it's impossible to fit the entire Pentium layout into the given FPGA without taking any shortcuts.

EDIT: I'm drunk again
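Plugging the thread's own numbers into this heuristic shows how sensitive the conclusion is to the assumed ratio. A minimal sketch, reading the earlier "110000 Y vs 3000000 Z" as 110,000 LEs versus 3,000,000 transistors and using the 3 and 30 transistors-per-block ballparks from this post; the names are hypothetical and no other figures are assumed.

```python
LOGIC_ELEMENTS = 110_000          # the "110000" figure quoted in the thread for the DE10
PENTIUM_TRANSISTORS = 3_000_000   # the "3000000" figure quoted in the thread


def transistor_capacity(transistors_per_le: int) -> int:
    # Original transistors that fit if every LE absorbs this many of them.
    return LOGIC_ELEMENTS * transistors_per_le


# Ratio at which the whole transistor count would just fit.
break_even = PENTIUM_TRANSISTORS / LOGIC_ELEMENTS

for ratio in (3, 30):  # the OR-gate and D-latch ballparks from the post
    capacity = transistor_capacity(ratio)
    print(ratio, capacity, capacity >= PENTIUM_TRANSISTORS)
print(round(break_even, 1))
# 3 330000 False
# 30 3300000 True
# 27.3
```

At the OR-gate ratio the core does not fit; at the D-latch ratio it does, so the verdict hinges entirely on which ratio dominates the design.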

Reply 132 of 162, by rasz_pl

Rank l33t
vstrakh wrote on 2023-08-03, 11:50:
rasz_pl wrote on 2023-08-03, 10:50:

Latency would only matter if you interfaced a software-emulated part with real hardware, and even then it's not a problem with sufficiently fast emulation, as evidenced by MCL65+ and MCL86+.

Nah, just playing old 286-era games: DOSBox slows down when moving the mouse over the screen, the PIT changes tone when something happens outside of DOSBox, etc. Raising cycles makes it more tolerable, but it still doesn't feel right.

Yeah. DOSBox never promised to be cycle-accurate. Set cycles=max and you won't have any latency problems 😀
As for a cycle-accurate emu, here's MCL86+ running 8088 MPH: https://www.youtube.com/watch?v=AAgtQljp0Tc

vstrakh wrote on 2023-08-03, 11:47:

Because right now I'm working on the compact YM2149 implementation

Spectrum/Amstrad/Atari ST sound is a crime against humanity; I'm sure the Geneva Convention mentions it.

doublebuffer wrote on 2023-08-03, 12:06:
rasz_pl wrote on 2023-08-03, 11:23:

Can you provide one example of code that fails to run in DOSBox/ao486 because of a CPU fault but runs on real hardware?

When I did initial research on this topic, I read a lot about ao486 because at first glance it seemed a perfect fit for my needs (retro programming in DOS), but as I dug deeper, people were complaining it was basically useless for anything other than running old software. I'll see if I can find the discussion about this.

Its speed is "useless" if you expect to run Quake, NFS or Diablo. It runs Windows 98 SE and NT 4.0 SP6 with no problem, so it's pretty compatible.

doublebuffer wrote on 2023-08-03, 12:25:
vstrakh wrote on 2023-08-03, 12:19:

It's not about how many transistors are needed per boolean gate.
LEs are not boolean gates, and your "ballpark estimation" has no basis in reality. You can't compare anything with your current understanding of the tech.

It is. Say the original Pentium had an OR gate (3 transistors). So you write an OR gate in the Pentium VHDL. When you compile that OR gate into an FPGA configuration, it takes 1 logic block. Hence 3 transistors are equivalent to 1 logic block in this case. But if the original Pentium has, say, a D latch, it will take something like 30 transistors (again a ballpark) but gets configured as a single logic block, hence 30 transistors are equivalent to 1 logic block. So obviously it's not a 1:1 correspondence, and I never claimed it to be; it was just to give some figure for why it's impossible to fit the entire Pentium layout into the given FPGA without taking any shortcuts.

- we don't have a Pentium diagram
- Intel didn't design the Pentium at the schematic level
- the Pentium, just like the 486, was fully synthesized, and no one at Intel was placing individual transistors/gates anywhere https://www.researchgate.net/publication/2680 … -_A_CAD_History


486 design:
- A fully automated translation from RTL to layout (we called it RLS: RTL to Layout Synthesis)
- No manual schematic design (direct synthesis of gate-level netlists from RTL, without graphical schematics of the circuits)
- Multi-level logic synthesis for the control functions
- Automated gate sizing and optimization
- Inclusion of parasitic elements estimation
- Full chip layout and floor planning tools

- the last CPU designed at the transistor/gate level was somewhere around 1985
- even if you did, it would then go through https://en.wikipedia.org/wiki/Logic_optimization https://www.youtube.com/watch?v=lJ3q9RHIatU
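The logic-optimization point in a nutshell: synthesis tools reason about the function a circuit computes, not the gates someone drew, so a redundant gate-level circuit and its minimized form are interchangeable. A minimal sketch with hypothetical function names that checks equivalence by exhaustive truth tables; real tools use far more scalable methods such as BDDs or SAT-based equivalence checking.

```python
from itertools import product


def truth_table(func, n_inputs):
    # Evaluate the function for every input combination, in a fixed order.
    return tuple(func(*bits) for bits in product((0, 1), repeat=n_inputs))


# Hand-drawn version: three AND gates, one inverter, one 3-input OR.
def as_drawn(a, b, c):
    return (a & b) | (a & (1 - b)) | (a & c)


# What logic minimization reduces it to: just the input wire `a`.
def optimized(a, b, c):
    return a


print(truth_table(as_drawn, 3) == truth_table(optimized, 3))  # True
```

Whatever transistors the hand-drawn version would have used, after optimization none of them survive into the implementation.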

doublebuffer wrote on 2023-08-03, 12:20:
rasz_pl wrote on 2023-08-03, 11:23:

Can you provide one example of code that fails to run in DOSBox/ao486 because of a CPU fault but runs on real hardware?

https://misterfpga.org/viewtopic.php?t=2833

"I got down to a simple “Hello World” program compiled with DJGPP only. Just stdio and no explicitly linked libraries. Freaking Hello World didn’t even run without hangcrashing."
Yeah, something tells me the FPGA CPU core is not the problem here 😀 Later someone mentions memory extenders; that would fall into the purview of the whole MiSTer PC SoC.


Reply 133 of 162, by doublebuffer

Rank Member
rasz_pl wrote on 2023-08-03, 12:49:

the Pentium, just like the 486, was fully synthesized

Doesn't matter; the end result is still x transistors implemented in silicon, whereas the FPGA has to emulate them. If you wrote Pentium VHDL, it would require approximately the same amount of logic and, by extension, logic blocks. I still stand behind my ballpark figure and my understanding of the technology.

rasz_pl wrote on 2023-08-03, 12:49:

Yeah, something tells me the FPGA CPU core is not the problem here 😀 Later someone mentions memory extenders; that would fall into the purview of the whole MiSTer PC SoC.

There were many people complaining about the same thing. If you own a MiSTer, could you provide a DJGPP-compiled hello world as proof that the people in that thread were wrong? If not, how can you be so sure? Basically we have their word (people who have actually tried to program on ao486) against yours (someone who seems to know everything better than everyone else from the get-go).

Reply 134 of 162, by vstrakh

Rank Member
rasz_pl wrote on 2023-08-03, 12:49:

Spectrum/Amstrad/Atari ST sound is a crime against humanity; I'm sure the Geneva Convention mentions it.

It's heritage. And anyway, it's more of a challenge to see how ridiculously small I can pack it 😀

Reply 135 of 162, by appiah4

Rank l33t++
vstrakh wrote on 2023-08-03, 13:20:
rasz_pl wrote on 2023-08-03, 12:49:

Spectrum/Amstrad/Atari ST sound is a crime against humanity; I'm sure the Geneva Convention mentions it.

It's heritage. And anyway, it's more of a challenge to see how ridiculously small I can pack it 😀

In the Atari ST's defense (I can't believe I'm doing this as an Amiga guy...), it had amazing built-in MIDI capabilities for its time, so that is one hell of a saving grace.

That said, the onboard sound is pretty bad considering how good the Atari 800XL's sound was for its time...


Reply 136 of 162, by HanSolo

Rank Member
vstrakh wrote on 2023-08-03, 12:19:
doublebuffer wrote on 2023-08-03, 12:11:

assumed that a boolean gate would be equal to 3 transistors (I think an AND gate takes 3 transistors, again from memory; it might be more or less, but the ballpark is right).

It's not about how many transistors are needed per boolean gate.
LEs are not boolean gates, and your "ballpark estimation" has no basis in reality. You can't compare anything with your current understanding of the tech.

The initial statement was:

HanSolo wrote on 2023-08-02, 15:40:
appiah4 wrote on 2023-08-02, 11:03:

MiSTer can emulate a PSX, so a low-end Pentium with a Voodoo card should be doable IMO. It's not presently done, though...

I think it's limited by the number of Logic Elements. A Pentium with Voodoo is probably too complex to be recreated on the FPGA chip of the DE10 Nano.

I think we all can agree that there is some limit to the complexity that can be recreated with a given number of LE. I doubt that anybody here can say for sure where that limit is, but there is one.
Maybe a Pentium+Voodoo is below that limit, but such a system is not comparable to a PSX. The transistor count is not a measure of complexity, but it is one indicator of it (unless the guys at Intel don't know anything about their job 😀 ).

Reply 137 of 162, by rasz_pl

Rank l33t
doublebuffer wrote on 2023-08-03, 13:01:
rasz_pl wrote on 2023-08-03, 12:49:

the Pentium, just like the 486, was fully synthesized

Doesn't matter; the end result is still x transistors implemented in silicon, whereas the FPGA has to emulate them. If you wrote Pentium VHDL, it would require approximately the same amount of logic and, by extension, logic blocks. I still stand behind my ballpark figure and my understanding of the technology.

Lack of understanding 🙁. Not a single core running on MiSTer (feel free to correct me) is implemented this way. PSX works just fine, while its GPU is 1 million transistors on 130 mm² at 600 nm (https://www.techpowerup.com/gpu-specs/sony-gte.g977), plus the CPU/GTE/MDEC at 50 mm² on 500 nm, another 500-800K. According to your counting method, that would never fit 😀
The N64 CPU is 4.6 million transistors plus 2.5 million in the RCP, yet here we are https://www.youtube.com/watch?v=C2EQp9DaZvc
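Applying the 3-transistors-per-LE heuristic to the counts quoted in this post, against the 110,000 LE figure used earlier in the thread, gives LE budgets far beyond what the board has, even though both cores run. A minimal sketch using only the thread's numbers (the lower end of the PSX range, to be conservative); the variable names are hypothetical.

```python
LOGIC_ELEMENTS = 110_000      # DE10 figure quoted earlier in the thread
TRANSISTORS_PER_LE = 3        # doublebuffer's worst-case ballpark

systems = {
    # Transistor counts as quoted in this post.
    "PSX": 1_000_000 + 500_000,   # GPU + CPU/GTE/MDEC (lower end of 500-800K)
    "N64": 4_600_000 + 2_500_000, # CPU + RCP
}

for name, transistors in systems.items():
    les_needed = transistors / TRANSISTORS_PER_LE
    print(name, round(les_needed), les_needed <= LOGIC_ELEMENTS)
# PSX 500000 False
# N64 2366667 False
```

By this heuristic neither system would fit, which is the contradiction rasz_pl is pointing at.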

doublebuffer wrote on 2023-08-03, 13:01:
rasz_pl wrote on 2023-08-03, 12:49:

Yeah, something tells me the FPGA CPU core is not the problem here 😀 Later someone mentions memory extenders; that would fall into the purview of the whole MiSTer PC SoC.

There were many people complaining about the same thing. If you own a MiSTer, could you provide a DJGPP-compiled hello world as proof that the people in that thread were wrong? If not, how can you be so sure? Basically we have their word (people who have actually tried to program on ao486) against yours (someone who seems to know everything better than everyone else from the get-go).

Same thing = "the ones that lock up are all bundled with CWSDPMI.EXE" https://misterfpga.org/viewtopic.php?t=2136
CWSDPMI.EXE hanging is not caused by the CPU, but most likely by the memory/chipset architecture implemented by MiSTer. ao486 is just a small part of https://github.com/MiSTer-devel/ao486_MiSTer. The motherboard/VGA/sound is implemented here: https://github.com/MiSTer-devel/ao486_MiSTer/ … /master/rtl/soc


Reply 138 of 162, by BitWrangler

Rank l33t++
HanSolo wrote on 2023-08-03, 13:40:

The transistor count is not a measure of complexity, but it is one indicator of it (unless the guys at Intel don't know anything about their job 😀 ).

In particular, the transistor count isn't the same as the number of transistors you'd use to build that piece of the circuit out of discrete components on a breadboard. Due to the economics and physics of wafer production, coating and layering, the cost of a particular component in a circuit might be another layer deposited on the wafer, which costs production time and raises the failure rate; each process step adds something like 1% more likelihood of a screw-up, or more bad parts when they're cut up. So in a lot of situations the solution is "MoAr transistors" on the layers they've already decided on, to work around the need for more layers. So basically, those 5 transistors over there might actually be a capacitor. On the other hand, some gates can very simply be single transistors and work fine in that design, because they don't need much drive or have to drive much themselves. Whereas a general-purpose logic IC for that gate has to be designed more robustly: it will demand a certain drive current and will supply a certain drive current. Thus a single logic element of a programmable logic device, be it a PAL/GAL or an FPGA, might either replace hundreds of redundant transistors, used because that was the best way to do it under "IC rules" for that process, time and budget, or replace only a mere handful of transistors, because that part of the circuit didn't have huge fanouts and loads.

So in practice, it's a bit like there being 3 different ways to make a machine: out of Lego, out of an Erector set/Meccano, or by 3D printing it. Despite aiming to do the same task with the same principles, no version is going to match the others in the exact details of operation and scale.

Unicorn herding operations are proceeding, but all the totes of hens teeth and barrels of rocking horse poop give them plenty of hiding spots.

Reply 139 of 162, by rasz_pl

Rank l33t
rasz_pl wrote on 2023-08-03, 14:15:
doublebuffer wrote on 2023-08-03, 13:01:

If you wrote Pentium VHDL, it would require approximately the same amount of logic

Not a single core (feel free to correct me) running on MiSTer is implemented this way.

I'll correct myself. It appears this is the first one: a crazy person's quest to run the whole Mega Drive by reverse engineering die shots into netlists and compiling that into VHDL https://github.com/nukeykt/Nuked-MD-FPGA But even that doesn't work the way you imagined. It's not implementing individual transistors in logic elements; the netlist is reverse engineered into logic blocks https://github.com/emu-russia/SEGAChips/tree/main/FC1004 and compiled using FPGA tools. The Mega Drive is >100K transistors between the VDP/68K/Z80/Yamaha and support chips.
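A toy version of the idea described here: gates from a netlist are not mapped one-to-one onto logic elements; a cone of logic can be collapsed into a single LUT truth table whenever it has few enough inputs. The netlist, the names and the exhaustive evaluation below are illustrative assumptions, not what Nuked-MD-FPGA or the vendor tools actually do internally.

```python
from itertools import product

# Hypothetical gate-level netlist fragment: signal -> (gate type, input signals).
NETLIST = {
    "n1": ("NAND", ("a", "b")),
    "n2": ("NOR", ("c", "d")),
    "n3": ("XOR", ("n1", "n2")),
    "out": ("AND", ("n3", "e")),
}

GATES = {
    "AND": lambda x, y: x & y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
    "XOR": lambda x, y: x ^ y,
}


def primary_inputs(signal):
    # Walk back through the netlist to the signals that no gate drives.
    if signal not in NETLIST:
        return {signal}
    _, ins = NETLIST[signal]
    return set().union(*(primary_inputs(s) for s in ins))


def evaluate(signal, values):
    # Recursively compute a signal's value from the primary input values.
    if signal in values:
        return values[signal]
    op, ins = NETLIST[signal]
    return GATES[op](*(evaluate(s, values) for s in ins))


def collapse_to_lut(signal, max_inputs=6):
    """Collapse the logic cone driving `signal` into one LUT truth table."""
    inputs = sorted(primary_inputs(signal))
    if len(inputs) > max_inputs:
        raise ValueError(f"{len(inputs)} inputs do not fit one {max_inputs}-input LUT")
    table = [evaluate(signal, dict(zip(inputs, bits)))
             for bits in product((0, 1), repeat=len(inputs))]
    return inputs, table


inputs, table = collapse_to_lut("out")
print(inputs, len(table))  # ['a', 'b', 'c', 'd', 'e'] 32
```

Here four gates end up as one 32-entry table, i.e. a single LUT, so the gate count of the original chip says little about the LE count.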
