VOGONS


First post, by Rikintosh

User metadata
Rank Member

Sorry if I'm sounding stupid, I was just wondering, with the advanced compilers that we have today, the advanced optimizations that we've developed over the last 30 years, if maybe recompiling some win9x files could squeeze out a little more juice?

I know that 9x is not open source, but there are some things that are, like files used by Wine and ReactOS.

Maybe make use of some technologies that appeared after the development of 9x, like MMX, SSE, 3dnow, VESA, openGL acceleration...

Take a look at my blog: http://rikintosh.blogspot.com
My Youtube channel: https://www.youtube.com/channel/UCfRUbxkBmEihBEkIK32Hilg

Reply 1 of 29, by leileilol

User metadata
Rank l33t++

No. There's already tight assembly everywhere that no amount of praying to the gcc gods will -flto over. The major speed bottleneck left is the IE4+ cruft.

Rikintosh wrote on 2023-06-08, 00:18:

Maybe make use of some technologies that appeared after the development of 9x, like MMX, SSE, 3dnow, VESA, openGL acceleration...

All of those technologies came during (and some before) Win9x's lifetime 😖 and using them won't make things faster. It's an operating system, not a duct-taped compositor.

long live PCem

Reply 2 of 29, by Disruptor

User metadata
Rank Oldbie

Yes, it would make it faster.
Win95 is compiled for 80386.
Imagine it would be optimized for 686 with CMOV, introduced in Pentium Pro and Pentium II.

Reply 3 of 29, by Jo22

User metadata
Rank l33t++
Disruptor wrote on 2023-06-08, 03:47:

Yes, it would make it faster.
Win95 is compiled for 80386.
Imagine it would be optimized for 686 with CMOV, introduced in Pentium Pro and Pentium II.

Win9x was a 16/32-Bit hybrid, though. Lots of thunking going on and calling of DOS/BIOS routines (through V86, but still).

Some code parts/DLLs (DOS or Win16 based) may still use 16-Bit registers only (AX rather than EAX etc).
Not sure how this works out with CMOV and other modern 386+ instructions. 🤷‍♂️

Visual Studio 6 compilers already had a Pentium Pro option, I remember, to optimize code for its pipeline.
The produced code was still 386/486 compatible, just slower on old CPUs.

Personally, I wonder if Win95 could be optimized to be better aware of cache memory.
Or something along these lines.

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 4 of 29, by Falcosoft

User metadata
Rank Oldbie
Jo22 wrote on 2023-06-08, 06:50:

...
Some code parts/DLLs (DOS or Win16 based) may still use 16-Bit registers only (AX rather than EAX etc).
Not sure how this works out with CMOV and other modern 386+ instructions. 🤷‍♂️
...

CMOV works perfectly in 16-bit code with 16-bit registers. But you can also use 32-bit registers in 16-bit code segments with the help of the 0x66 prefix (this works not only with CMOV).
Actually you can use 3dnow and even SSE/SSE2 instructions in 16-bit code. In case of Win95 and SSE the problem is the OS itself is not aware of XMM registers so it cannot save the register state.
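
To make the prefix point concrete, here is a minimal sketch of how a register-to-register CMOVZ is encoded for a 16-bit code segment. The opcode bytes (0F 44 /r) are the documented CMOVZ/CMOVE encoding; the function name `encode_cmovz_seg16` and the `enum reg` are made up for this illustration and are not from any real assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte (same numbers for AX/EAX etc.). */
enum reg { AX = 0, CX = 1, DX = 2, BX = 3, SP = 4, BP = 5, SI = 6, DI = 7 };

/*
 * Hypothetical helper: encode "CMOVZ dst, src" (register to register)
 * for execution in a 16-bit code segment.
 *   wide == 0: 16-bit operands (e.g. cmovz ax, bx)
 *   wide != 0: 32-bit operands, selected by the 0x66 operand-size prefix
 * buf must have room for 4 bytes. Returns the number of bytes written.
 */
size_t encode_cmovz_seg16(uint8_t *buf, enum reg dst, enum reg src, int wide)
{
    size_t n = 0;
    assert(buf != NULL);
    if (wide)
        buf[n++] = 0x66;   /* operand-size override prefix */
    buf[n++] = 0x0F;       /* two-byte opcode escape */
    buf[n++] = 0x44;       /* CMOVZ / CMOVE */
    /* ModRM: mod = 11 (register direct), reg = destination, rm = source */
    buf[n++] = (uint8_t)(0xC0u | ((unsigned)dst << 3) | (unsigned)src);
    return n;
}
```

With `wide` set to 0 and AX/BX as operands this yields 0F 44 C3; with `wide` set it yields 66 0F 44 C3, i.e. the same instruction bytes with only the prefix added.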

Website, Facebook, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper

Reply 5 of 29, by Scali

User metadata
Rank l33t
Rikintosh wrote on 2023-06-08, 00:18:

Maybe make use of some technologies that appeared after the development of 9x, like MMX, SSE, 3dnow, VESA, openGL acceleration...

I think you are conflating the OS itself with third-party drivers.
OpenGL drivers for Win9x do exist, and some of them include MMX and 3DNow!-optimizations (not sure about SSE-support under Win95, since the base OS does not support it, but it could be used anyway by drivers).
For example, here's a press release from NVIDIA from 1999, announcing their 3DNow!-optimized drivers for OpenGL and DirectX:
http://web.archive.org/web/19990830194911/htt … a_nvidia_1.html

Which also means that the base OS isn't all that relevant. There's not too much time spent in the regular OS-code. All the performance-heavy stuff is in things like networking, disk IO, graphics and such, for which you use third-party drivers, which can be optimized for any hardware you like.
So I don't think recompiling the code will make much of a dent in performance.
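
The argument above is essentially Amdahl's law. A rough sketch, where the function name and all numbers are assumptions for illustration only (nobody in this thread measured them): if only a fraction of the run time is spent in recompiled OS code, the overall gain is capped by that fraction.

```c
#include <assert.h>

/*
 * Overall speedup when only part of the run time gets faster.
 *   fraction:      share of total run time spent in the recompiled code (0..1)
 *   local_speedup: how much faster that part becomes (e.g. 1.5 = 50% faster)
 */
double overall_speedup(double fraction, double local_speedup)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

For example, with a hypothetical 10% of time in base OS code and a generous 1.5x gain from recompiling, `overall_speedup(0.10, 1.5)` comes out at roughly 1.03, so about a 3% improvement overall.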

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 6 of 29, by Jo22

User metadata
Rank l33t++
Falcosoft wrote on 2023-06-08, 07:55:
Jo22 wrote on 2023-06-08, 06:50:

...
Some code parts/DLLs (DOS or Win16 based) may still use 16-Bit registers only (AX rather than EAX etc).
Not sure how this works out with CMOV and other modern 386+ instructions. 🤷‍♂️
...

CMOV works perfectly in 16-bit code with 16-bit registers. But you can also use 32-bit registers in 16-bit code segments with the help of the 0x66 prefix (this works not only with CMOV).
Actually you can use 3dnow and even SSE/SSE2 instructions in 16-bit code. In case of Win95 and SSE the problem is the OS itself is not aware of XMM registers so it cannot save the register state.

Makes sense, thanks for pointing that out. What I was worrying about was the thunking mechanism that converts between Win16/Win32 API calls and different register sizes.
It's an extra step in between, kind of.
I was worried that it may cause overhead that might hamper the performance improvement that CMOV or other instructions could possibly provide.
I.e., a direct copy operation between registers.
If it was all 32-Bit (i.e., i386) and no thunking was needed, such an operation would be quick.
Not sure how that thunking mechanism is implemented in detail, though.


Reply 7 of 29, by mkarcher

User metadata
Rank l33t
Jo22 wrote on 2023-06-08, 09:44:
Falcosoft wrote on 2023-06-08, 07:55:

CMOV works perfectly in 16-bit code with 16-bit registers. But you can also use 32-bit registers in 16-bit code segments with the help of the 0x66 prefix (this works not only with CMOV).
Actually you can use 3dnow and even SSE/SSE2 instructions in 16-bit code. In case of Win95 and SSE the problem is the OS itself is not aware of XMM registers so it cannot save the register state.

Makes sense, thanks for pointing that out. What I was worrying about was the thunking mechanism that converts between Win16/Win32 API calls and different register sizes.
It's an extra step in between, kind of.
I was worried that it may cause overhead that might hamper the performance improvement that CMOV or other instructions could possibly provide.
I.e., a direct copy operation between registers.
If it was all 32-Bit (i.e., i386) and no thunking was needed, such an operation would be quick.
Not sure how that thunking mechanism is implemented in detail, though.

While thunking between 16-bit and 32-bit code does induce overhead (those calls/returns are inter-segment far jumps that incur a lot of overhead in protected mode, because the code segment descriptor cache needs to be reloaded from the descriptor table), this kind of thunking does not affect what kind of instructions and registers may be used between calls. Also, if a register (like ESI or EDI) is specified to be "preserved over a call", plain 16-bit code will just save and restore the low 16 bits, but at the same time it will also modify only the low 16 bits, so on return all 32 bits are unchanged. "Enhanced" 16-bit code (i.e. code that runs in a 16-bit segment but uses 32-bit registers) will save and restore the complete ESI/EDI registers.
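
The register-preservation argument can be modelled in a few lines of C. This is only a simulation under simplifying assumptions: a `uint32_t` stands in for ESI, and the function names are invented for the illustration. The last function is a counter-example I added to show what would go wrong if code clobbered all 32 bits but saved only 16.

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite only the low 16 bits of a 32-bit register value,
   as an instruction operating on SI (not ESI) would. */
uint32_t write_low16(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Plain 16-bit callee: "push si", clobber SI, "pop si". */
uint32_t plain16_callee(uint32_t esi)
{
    uint16_t saved_si = (uint16_t)(esi & 0xFFFFu); /* push si */
    esi = write_low16(esi, 0x1234u);               /* touches SI only */
    return write_low16(esi, saved_si);             /* pop si */
}

/* "Enhanced" 16-bit callee: "push esi", clobber ESI, "pop esi". */
uint32_t enhanced16_callee(uint32_t esi)
{
    uint32_t saved_esi = esi;  /* push esi */
    esi = 0xDEADBEEFu;         /* touches all 32 bits */
    esi = saved_esi;           /* pop esi */
    return esi;
}

/* Counter-example: clobbers ESI but saves only SI, so the upper half is lost. */
uint32_t mismatched_callee(uint32_t esi)
{
    uint16_t saved_si = (uint16_t)(esi & 0xFFFFu); /* push si */
    esi = 0xDEADBEEFu;                             /* touches all 32 bits */
    return write_low16(esi, saved_si);             /* pop si */
}

/* Self-check; call it from main() or a test harness. */
void check_preservation(void)
{
    assert(plain16_callee(0xCAFE0042u) == 0xCAFE0042u);
    assert(enhanced16_callee(0xCAFE0042u) == 0xCAFE0042u);
    assert(mismatched_callee(0xCAFE0042u) == 0xDEAD0042u);
}
```

Both well-behaved variants hand the caller back the same 32-bit value, which is why mixing plain and enhanced 16-bit code with 32-bit callers works out.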

CMOV especially is not at all impacted by the fact that some parts of Windows run in 16-bit segments, and other parts run in 32-bit segments, because CMOV has no "global" effect. CMOV works "as if" it were a conditional jump instruction that might jump over a subsequent move instruction - but it does it without upsetting the instruction decoding pipeline, no matter whether the move is executed or not. All kinds of conditional jumps and classic move instructions work perfectly in Windows 95, so CMOV would work perfectly too (assuming you have a processor that implements CMOV).
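
As a sketch of the "as if" equivalence, the two C functions below compute the same result: one written as a jump over a move, the other without any branch. The function names are hypothetical. Whether a compiler actually emits CMOV for either form depends on the compiler and target flags (GCC, for instance, is allowed to use CMOV when targeting -march=i686 or newer), so the mask trick here only models the behaviour; it is not what the CPU does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: a conditional jump skips the move when cond is false. */
int32_t select_branchy(int cond, int32_t dst, int32_t src)
{
    if (cond)
        dst = src;
    return dst;
}

/* Branchless form: same result with no jump, modelling what CMOVcc achieves. */
int32_t select_branchless(int cond, int32_t dst, int32_t src)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)src & mask) | ((uint32_t)dst & ~mask));
}

/* Self-check: both forms must agree for the given inputs. */
void check_select(int cond, int32_t dst, int32_t src)
{
    assert(select_branchy(cond, dst, src) == select_branchless(cond, dst, src));
}
```

Neither form touches anything but its own operands, which matches the point that CMOV has no effect outside the instruction itself.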

On the other hand, I'm not aware whether there are compilers that can generate 16-bit code and can be configured to use CMOV at the same time. Compilers that support CMOV and PPro-optimized instruction scheduling in 32-bit code are available and could be used to re-compile 32-bit C parts of Windows 95 system libraries.

Reply 8 of 29, by Rikintosh

User metadata
Rank Member
Scali wrote on 2023-06-08, 08:41:
Rikintosh wrote on 2023-06-08, 00:18:

Maybe make use of some technologies that appeared after the development of 9x, like MMX, SSE, 3dnow, VESA, openGL acceleration...

I think you are conflating the OS itself with third-party drivers.
OpenGL drivers for Win9x do exist, and some of them include MMX and 3DNow!-optimizations (not sure about SSE-support under Win95, since the base OS does not support it, but it could be used anyway by drivers).
For example, here's a press release from NVIDIA from 1999, announcing their 3DNow!-optimized drivers for OpenGL and DirectX:
http://web.archive.org/web/19990830194911/htt … a_nvidia_1.html

Which also means that the base OS isn't all that relevant. There's not too much time spent in the regular OS-code. All the performance-heavy stuff is in things like networking, disk IO, graphics and such, for which you use third-party drivers, which can be optimized for any hardware you like.
So I don't think recompiling the code will make much of a dent in performance.

Yes, I know, but I wasn't referring to that. I meant that, with these technologies in mind, it might be possible to recompile for example user32, gdi, and other graphics engine components to get accelerated animations (like the window opening and closing animations in Chicago), either through OGL calls or in software, using MMX to draw the animations on the screen.


Reply 9 of 29, by Scali

User metadata
Rank l33t

User32/GDI were always accelerated in hardware with the right drivers and hardware. That's why "Windows Accelerator" cards predated 3D accelerators.
Companies like S3 and Matrox built their empire on Windows acceleration.


Reply 10 of 29, by Rikintosh

User metadata
Rank Member
Scali wrote on 2023-06-08, 15:54:

User32/GDI were always accelerated in hardware with the right drivers and hardware. That's why "Windows Accelerator" cards predated 3D accelerators.
Companies like S3 and Matrox built their empire on Windows acceleration.

Yes, I know, but imagine a world where the Windows GUI has the same aero effects as later Windows, done using 3D acceleration, or advanced software processing.

I remember using a bunch of junk to spruce up my system between 2003 and 2006, and at some point I used a pretty cool 3D interface at the time.


Reply 11 of 29, by kolderman

User metadata
Rank l33t

Faster at what? Copying files around? Today Win9x is a shell that people like us use to launch games. The performance of the OS has only a minor bearing on the performance of a game, especially a 3D game, and to the extent Direct3D is part of "Win9x", the latest version of DX for Win98 was released in 2004 or something, so well and truly with P4 optimizations and the latest compiler tech of the time.

What would interest me? A Win9x that could easily install on a modern PC with driver support for everything.

Reply 12 of 29, by Rikintosh

User metadata
Rank Member
kolderman wrote on 2023-06-08, 22:35:

Faster at what? Copying files around? Today Win9x is a shell that people like us use to launch games. The performance of the OS has only a minor bearing on the performance of a game, especially a 3D game, and to the extent Direct3D is part of "Win9x", the latest version of DX for Win98 was released in 2004 or something, so well and truly with P4 optimizations and the latest compiler tech of the time.

What would interest me? A Win9x that could easily install on a modern PC with driver support for everything.

Just for fun.

9x will never be compatible with a modern PC, not without an emulation layer. I think the most rational thing would be to create a Linux distro that runs a hypervisor capable of emulating the main sound and video cards of the time. VirtualPC 2007 was the closest we got to that, before Microsoft decided to fuck it up. PowerPC Mac guys have the same dilemma with OS 9 and PPC OS X.


Reply 13 of 29, by Big Pink

User metadata
Rank Member
Rikintosh wrote on 2023-06-08, 21:51:

Yes, I know, but imagine a world where the Windows GUI has the same aero effects as later Windows, done using 3D acceleration, or advanced software processing.

Speeding up 9x so it can run aero bloat is giving with one hand only to take with the other. So right up Microsoft's alley.

I thought IBM was born with the world

Reply 14 of 29, by cyclone3d

User metadata
Rank l33t++

Win9x is so incredibly fast on newer hardware that there really isn't a point anyway unless you want to run it on "period correct" hardware.

That being said, I would love it if there was a Win9x kernel that supported SMP. That would be really sweet.

Yamaha modified setupds and drivers
Yamaha XG repository
YMF7x4 Guide
Aopen AW744L II SB-LINK

Reply 15 of 29, by Jo22

User metadata
Rank l33t++
Rikintosh wrote on 2023-06-08, 21:51:
Scali wrote on 2023-06-08, 15:54:

User32/GDI were always accelerated in hardware with the right drivers and hardware. That's why "Windows Accelerator" cards predated 3D accelerators.
Companies like S3 and Matrox built their empire on Windows acceleration.

Yes, I know, but imagine a world where the Windows GUI has the same aero effects as later Windows, done using 3D acceleration, or advanced software processing.

I remember using a bunch of junk to spruce up my system between 2003 and 2006, and at some point I used a pretty cool 3D interface at the time.

There was a time in which the Windows GUI was being offloaded into a graphics chip.
But it was with Windows 3.10, not Windows 9x.

Windows 3.10 still had a TIGA interface driver, which worked in tandem with the real TIGA driver supplied with TIGA boards.

The graphics drivers for the TIGA compatible boards essentially broke the Windows GUI into separate parts and stored them in the RAM on the TIGA board.

If I understand correctly, the software uploaded to the graphics board then would draw the complete Windows GUI on its own.
This wasn't necessarily fast, but it reduced overhead on the PC.

The difference was that the TMS chips were fully programmable graphics processors rather than just being a one trick pony.
We may think of them as the Roland LAPC-I, CM32L and MT-32 of the graphics card world.
They were both intelligent and flexible.

For performance, a fast and dumb framebuffer as found on a VLB graphics card was cheaper and more effective back in the day.
Fixed-function Windows accelerators, too.

Edit: The Matrox MGA cards also had drivers which stored certain GUI elements in the graphics memory, if memory serves.


Reply 17 of 29, by serialShinobi

User metadata
Rank Newbie

Well...what about the 80386 field programmable gate array? An FPGA with an 80386 core that needs access to 386/486 chipset ASICs. Add to the ASICs opensource BIOS to serve as software drivers. A well balanced FPGA and 80386 IP could offer a speed up at the gate level. But you basically need to get people on this website interested in order to make it happen.

A Virtual Machine that has its virtual hardware closely tied to the target OS (win3.11) with Component Object Model for communication channels like IRQ and DMA could be key.

https://blog.lse.epita.fr/2015/11/16/lsepc-intro.html

Reply 18 of 29, by Rikintosh

User metadata
Rank Member
serialShinobi wrote on 2023-06-09, 19:50:

Well...what about the 80386 field programmable gate array? An FPGA with an 80386 core that needs access to 386/486 chipset ASICs. Add to the ASICs opensource BIOS to serve as software drivers. A well balanced FPGA and 80386 IP could offer a speed up at the gate level. But you basically need to get people on this website interested in order to make it happen.

A Virtual Machine that has its virtual hardware closely tied to the target OS (win3.11) with Component Object Model for communication channels like IRQ and DMA could be key.

https://blog.lse.epita.fr/2015/11/16/lsepc-intro.html

This idea is great, I've always wanted something small and more practical than a huge 28 lbs case.

but 386sx is painful. I believe the sweet spot would be something like a 166 mmx (at least for my needs).

I once envisioned a PCI-Express card that contained a super socket 7 era chipset, and a processor, as long as the ram, sound and video card were emulated under software, it would greatly reduce the PCEm/86box requirements.


Reply 19 of 29, by serialShinobi

User metadata
Rank Newbie

This idea is great, I've always wanted something small and more practical than a huge 28 lbs case.

but 386sx is painful. I believe the sweet spot would be something like a 166 mmx (at least for my needs).

I once envisioned a PCI-Express card that contained a super socket 7 era chipset, and a processor, as long as the ram, sound and video card were emulated under software, it would greatly reduce the PCEm/86box requirements.

Well as many people may already know, I have had ideas that are, like yours, a large project.

But to try to break it down with your ideas in mind - you would still need your team of programmers to use an open-source BIOS. Otherwise you get a machine that has a proprietary BIOS. Either way it is a form of hardware drivers, mostly for the chipset. For the P166 w/ MMX and socket 7 (quoting retro web):

The Intel 430TX PCIset (430TX) consists of the 82439TX System Controller (MTXC) and the 82371AB PCI ISA IDE Xcelerator (PIIX4).

There's an order of magnitude difference in hardware level programming. You can see this by consulting data sheets for the above mentioned components. And then you need to have parts under test to get the proper input and output to various components, namely ASICs et al.

This is why I like the original IBM AT era so much, because I now have this huge collection of books from the market in those days. Cost me hundreds of dollars in out of print books.

With this time frame I can learn the basics of hardware without the "microprocessor course" feel where it's humbling to ask the instructor about embedded systems. You might know what I mean by this if you saw youtube videos. It's stuff you don't get to until much later in college.

Would rather do the real thing. Which lately, speaking of my 486 & 586 builds, has been more of a jumping off point.