2GB RAM - go X86 or x64? - Page 2 \ VOGONS

Reply 20 of 36, by Jo22

Posted on 2018-05-09, 15:58

Rank: l33t++
Posts: 11622
Joined: 2009-12-13, 07:06
Location: Europe

I second that. It's not that I'd like to see Win32/x86 support removed from future releases,
but it makes little sense to maintain two binaries, one for Win10 x86 and one for Win10 x64.
Back in the late 2000s, I thought Microsoft would quit making x86 releases after Windows XP.
I hoped they would release a modern x64 version with improved backwards compatibility (using the NTVDM emulator they used for Alpha/MIPS, etc.).
That way, people could have focused solely on 64-bit pointers without worrying about older API issues,
since there wouldn't have been any prior x64 releases (except XP, maybe).

Edit: Or long story short, I hoped Windows would make the move
from Win32 to Win64 in the same way it moved from Win16 to Win32. 😀

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 21 of 36, by Falcosoft

Posted on 2018-05-09, 23:18

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

I was able to get 3.25GiB of main memory on XP, because I kept the video memory small (between 64MiB and 128MiB).
The memory regions past 3.25GiB were reserved for the PCI address space (like the 640KiB to 1MiB region was for the ISA bus on DOS PCs).

Oh, that's cool. 😎 Looks like your chipset is a good one (doesn't litter that space with unnecessary stuff).
I knew it was something close to 3.25GiB, but wasn't 100% sure about it. I had the limit at 3.25GiB, as far as I remember. 😀

Usually the available memory under 32-bit Windows is determined by the starting address of the video card's linear frame buffer (LFB), not by the chipset.
Starting address of LFB:
0xC0000000 -> 3GB available
0xD0000000 -> 3.25GB available
0xE0000000 -> 3.5GB available
You can test this by checking the used memory resources of your VGA in device manager.
If you have an integrated VGA with an LFB address of 0xC0000000 and 256MB frame buffer you will have 2.75GB available memory.
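
The figures above are plain address arithmetic: everything below the frame buffer's start address is usable as RAM. A minimal Python sketch, assuming (as the post does) that nothing else is mapped below the LFB and that an integrated VGA takes its frame buffer out of system RAM. The function name `available_ram_gib` is made up for illustration:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def available_ram_gib(lfb_start, shared_framebuffer_bytes=0):
    """Estimate the RAM visible to a 32-bit OS, in GiB.

    lfb_start: physical start address of the linear frame buffer.
    shared_framebuffer_bytes: RAM claimed by an integrated GPU
        (0 for a discrete card with its own VRAM).
    Assumption: nothing else is mapped below the LFB.
    """
    return (lfb_start - shared_framebuffer_bytes) / GIB

for start in (0xC0000000, 0xD0000000, 0xE0000000):
    print(hex(start), "->", available_ram_gib(start), "GiB")

# Integrated VGA: LFB at 0xC0000000 with a 256 MB shared frame buffer
print(available_ram_gib(0xC0000000, 256 * MIB), "GiB")  # 2.75 GiB
```

On a real machine other devices can claim address space below the LFB as well, so this is an upper bound rather than an exact prediction.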

Scali wrote:
32-bit applications running under a 64-bit OS are basically running in the same way as PAE: because of the extra addressing space, each process can get the full 4 GB of virtual address space, whereas under a 32-bit OS, they only got 2 GB (or 3 GB with a special boot option), because the OS had to reserve the rest for kernel, drivers and other shared memory-mapped stuff.

This extension to 4GB address space for 32 bit applications is not automatic. You have to explicitly set the large address aware PE flag to get it. Most 32 bit compilers do not set this flag, so most older 32 bit applications do not see the benefit. Also setting the flag on arbitrary 32 bit applications can have unwelcome effects (they do not expect pointers bigger than 2GB).

Website, Youtube
Falcosoft Soundfont Midi Player + Munt VSTi + BassMidi VSTi
VST Midi Driver Midi Mapper
x86 microarchitecture benchmark (MandelX)

Reply 22 of 36, by Scali

Posted on 2018-05-09, 23:53

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Falcosoft wrote:
This extension to 4GB address space for 32 bit applications is not automatic. You have to explicitly set the large address aware PE flag to get it. Most 32 bit compilers do not set this flag, so most older 32 bit applications do not see the benefit. Also setting the flag on arbitrary 32 bit applications can have unwelcome effects (they do not expect pointers bigger than 2GB).

Compilers have no business generating binaries anyway.
That's the linker's job.
In fact, you can argue that since the compiler doesn't know what address space size you are compiling for, it doesn't matter. The compiler always uses a linear 4 GB address space internally.

Mind you, story still goes: the flag was originally introduced for PAE, and works the same under a 64-bit OS. They *can* use 4 GB address space, didn't say they automatically would.

Also, I don't see your point about pointers bigger than 2 GB. 2 GB would have 31 bits. There is no 31-bit datatype. Pointers always use a 32-bit unsigned integer datatype (uintptr_t) under the x86 model. And since it is 2s complement, it technically doesn't make a difference whether you use signed or unsigned values, unless you are doing some REALLY funky pointer arithmetic.
I think it's highly unlikely for most applications to be bothered by setting the flag. Most applications probably only use compiler-generated pointer arithmetic, which wouldn't be sensitive to the flag.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 23 of 36, by Joey_sw

Posted on 2018-05-10, 01:51

Rank: Oldbie
Posts: 550
Joined: 2011-08-17, 12:03

36-bit PAE: are there any graphics cards and drivers that utilize it?
I'm under the impression that graphics cards with 4GB+ VRAM require an x64 OS (can't be used on an x86 PAE-capable OS).

-fffuuu

Reply 24 of 36, by Falcosoft

Posted on 2018-05-10, 08:48

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

duplicated below...

Last edited by Falcosoft on 2018-05-10, 10:37. Edited 1 time in total.

Reply 25 of 36, by Falcosoft

Posted on 2018-05-10, 08:57

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

Scali wrote:

Also, I don't see your point about pointers bigger than 2 GB. 2 GB would have 31 bits. There is no 31-bit datatype. Pointers always use a 32-bit unsigned integer datatype (uintptr_t) under the x86 model. And since it is 2s complement, it technically doesn't make a difference whether you use signed or unsigned values, unless you are doing some REALLY funky pointer arithmetic.
I think it's highly unlikely for most applications to be bothered by setting the flag. Most applications probably only use compiler-generated pointer arithmetic, which wouldn't be sensitive to the flag.

Pointer arithmetic does not have to be that funky to cause problems. It's enough to cast pointers to signed int and compare them with greater-than/less-than to get false results. Also, storing the difference of two pointer addresses in a signed int is a practice I have already encountered. I have tested 32-bit FSMP hosting different 32-bit VST(i) plugins with the large address aware flag set and VirtualAlloc() called with the MEM_TOP_DOWN flag. Some really did not like it... Also, Creative's SB Live utilities do not like 2GB+ addresses.
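
The comparison problem is easy to reproduce with plain integers. A small Python sketch that emulates a 32-bit signed cast; the addresses are arbitrary examples and `as_int32` is an illustrative helper, not taken from any of the programs mentioned:

```python
def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

low_ptr = 0x7FFF0000    # just below the 2 GB line
high_ptr = 0x80001000   # just above it, only reachable with the flag set

print(low_ptr < high_ptr)                      # True: unsigned view is correct
print(as_int32(low_ptr) < as_int32(high_ptr))  # False: signed view flips the order
```

Any code that sorts or range-checks addresses through a signed type behaves correctly as long as every address stays below 0x80000000, which is exactly why such code can go unnoticed until the flag is set.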

Joey_sw wrote:
I'm under the impression that graphics cards with 4GB+ VRAM require an x64 OS (can't be used on an x86 PAE-capable OS).

I do not have 4GB+ VRAM to test under 32-bit but in theory addressing video RAM is a concern of the driver. On the CPU side there's usually only a 256MB area in the PCI address space. It's definitely true for 1/2 GB VRAM so I think 4GB+ is not different.
An analogy could be that under pure 16-bit real mode that has an address limit of only 1MB you can use the full VGA frame buffer (even 256MB) through VESA drivers/BIOS by using a 64KB area and bank switching.
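
The bank-switching idea can be sketched as splitting a frame-buffer address into a bank number and an offset inside the 64KB window. This Python sketch assumes the bank granularity equals the window size, which is a simplification (a real VESA mode reports its own granularity), and the 1024x768, 8-bit example mode is chosen arbitrarily:

```python
WINDOW_SIZE = 64 * 1024  # the 64 KB CPU-visible window

def bank_and_offset(linear_address, granularity=WINDOW_SIZE):
    """Split a frame-buffer address into (bank number, offset within window).

    Assumption: bank granularity equals the window size.
    """
    return divmod(linear_address, granularity)

# Pixel (x=100, y=300) in a 1024x768 mode with 1 byte per pixel
pitch = 1024
address = 300 * pitch + 100
bank, offset = bank_and_offset(address)
print(bank, offset)  # 4 45156
```

The program selects bank 4 through the VESA BIOS or driver, then writes at offset 45156 of the window; the CPU never needs an address beyond the 1MB real-mode limit.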

Last edited by Falcosoft on 2018-05-10, 09:36. Edited 1 time in total.

Reply 26 of 36, by Scali

Posted on 2018-05-10, 09:31

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Falcosoft wrote:
Pointer arithmetic does not have to be that funky to cause problems. It's enough to cast pointers to signed int and compare them with greater-than/less-than to get false results.

How does that not qualify as funky?
You'd have to be an extremely bad programmer to not understand that a linear address space requires unsigned arithmetic, while at the same time thinking you're smart enough to perform some kind of sorting based on pointer addresses.

Falcosoft wrote:
Also storing difference of subtraction of pointer addresses in signed int is a practice I have already met.

That shouldn't matter though. There is no difference between signed and unsigned subtraction in 2s complement notation.

Reply 27 of 36, by Falcosoft

Posted on 2018-05-10, 10:07

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

Scali wrote:

That shouldn't matter though. There is no difference between signed and unsigned subtraction in 2s complement notation.

It matters when the large pointer can overflow the type that holds the difference (signed int).
Subtracting a small pointer from a large pointer will produce a negative value in the signed int representing the pointer difference (e.g. 3GB - 1).
But I think we lost the point. There CAN be programs that have problems with large pointers even if bad programmers are required for this. Such bad programmers could be found even at Creative Labs...
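
The "3GB - 1" case can be worked through with plain integer arithmetic. A Python sketch that emulates storing the difference in a 32-bit signed int; the two addresses are made-up examples:

```python
def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

GIB = 1024 ** 3
small_ptr = 0x00000001
large_ptr = 3 * GIB                        # 0xC0000000

true_distance = large_ptr - small_ptr      # 3 GiB - 1
stored = as_int32(large_ptr - small_ptr)   # what a 32-bit signed int ends up holding

print(true_distance)  # 3221225471
print(stored)         # -1073741825
```

The bit pattern (0xBFFFFFFF) is the same in both cases; what changes is how later code interprets it, for example a check such as "is the distance greater than zero".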

Reply 28 of 36, by Scali

Posted on 2018-05-10, 10:43

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Falcosoft wrote:
It matters when the large pointer can overflow the type that holds the difference (signed int).

No, it doesn't, in 2s complement. That's my whole point.
'Overflow' is only a bit in a status register. If you don't check it (as in, conditional branch), the arithmetic is equivalent, the wraparound is the same. There *is* no difference between adding or subtracting signed or unsigned numbers. The CPU has only one set of add/sub opcodes.

Falcosoft wrote:
Subtracting a small pointer from a large pointer will produce a negative value in the signed int representing the pointer difference (e.g. 3GB - 1).

In 2s complement, the binary representation is exactly the same. 'Positive' or 'negative' values are simply different ways to view the same binary value.
They would only matter in specific branches, where a signed branch will look at different flags than an unsigned one. But we already covered that case in your first example.
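
The claim that the wraparound is identical can be checked numerically. A Python sketch that emulates a 32-bit SUB by masking the result; `sub32` and `as_int32` are illustrative helpers and the addresses are arbitrary:

```python
MASK32 = 0xFFFFFFFF

def sub32(a, b):
    """32-bit subtraction with wraparound, as a single SUB instruction behaves."""
    return (a - b) & MASK32

def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value

# The same operands, viewed as unsigned or as signed, give the same bit pattern.
a_u, b_u = 0xC0000000, 0x00000001
a_s, b_s = as_int32(a_u), as_int32(b_u)
print(hex(sub32(a_u, b_u)), hex(sub32(a_s, b_s)))  # 0xbfffffff 0xbfffffff

# An offset inside one buffer stays small and correct even above 2 GB.
base, element = 0xC0000000, 0xC0001000
print(as_int32(sub32(element, base)))  # 4096
```

So the stored bits never differ; only a later signed comparison, division or widening can make the interpretation matter.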

Reply 29 of 36, by agent_x007

Posted on 2018-05-10, 11:03

Rank: Oldbie
Posts: 1729
Joined: 2016-01-19, 11:06

Falcosoft wrote:

I'm under impression that graphic cards with 4GB+ Vram requires the x64 OS (can't be used on x86 PAE-capable OS).

I do not have 4GB+ VRAM to test under 32-bit but in theory addressing video RAM is a concern of the driver. On the CPU side there's usually only a 256MB area in the PCI address space. It's definitely true for 1/2 GB VRAM so I think 4GB+ is not different.
An analogy could be that under pure 16-bit real mode that has an address limit of only 1MB you can use the full VGA frame buffer (even 256MB) through VESA drivers/BIOS by using a 64KB area and bank switching.

They do not require a 64-bit OS.
However, there will be no newer* drivers for 32-bit OS from either AMD or Nvidia.
So you probably won't be able to find 32-bit drivers for Volta/Ampere based GPUs.
*Except critical security updates.

As for 8GB VRAM vs. a 32-bit OS:
Radeon R9 390X + Windows 7 x86 SP1: LINK
The Radeon driver works fine up to 4096MB of Dedicated (+ Dynamic?) memory usage.
Above that point the program will crash (tested only on Fire Strike at this point).

GeForce GTX 1080 + Windows 7 x86 SP1: LINK
Works as intended, however VRAM usage is always over 4096MB.
Not sure if it's to prevent crashes, but GPU-Z reports 4GB usage at the desktop.

No cheating here - I used Northwood-based CPUs (no x64 support) 😀
Available system RAM for Windows was 3200MB in both cases (out of 4096MB installed).

Reply 30 of 36, by Falcosoft

Posted on 2018-05-10, 11:51

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

Scali wrote:
No, it doesn't, in 2s complement. That's my whole point.
'Overflow' is only a bit in a status register. If you don't check it (as in, conditional branch), the arithmetic is equivalent, the wraparound is the same. There *is* no difference between adding or subtracting signed or unsigned numbers. The CPU has only one set of add/sub opcodes.
In 2s complement, the binary representation is exactly the same. 'Positive' or 'negative' values are simply different ways to view the same binary value.
They would only matter in specific branches, where a signed branch will look at different flags than an unsigned one. But we already covered that case in your first example.

Thanks, but I know how 2's complement and x86's add/sub work. 'It matters' for me means that even if the binary representation is the same, e.g. -1 and 4294967295 (0xFFFFFFFF) are semantically different. Although storing this value in a signed int is not a problem in itself, the code later makes calculations and decisions based on the 'semantics' of this value (decisions -> the first example, as you have written).
In contrast to add/sub, division and multiplication are sign-aware, and compilers generate different code for them (mul/div vs imul/idiv). So if the code later uses these instructions for calculations based on the signed int value, you can get wrong results (e.g. the result of -1/2 is different from 4294967295/2).
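
The -1/2 versus 4294967295/2 example can be worked through as follows. This is a Python sketch that emulates the two divisions on the same 32-bit pattern; `idiv32` is a hypothetical helper that truncates toward zero the way x86 IDIV does, since Python's own `//` rounds toward negative infinity:

```python
def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def idiv32(a, b):
    """Signed division truncating toward zero, like x86 IDIV."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

bits = 0xFFFFFFFF                  # one bit pattern, two readings

print(bits // 2)                   # unsigned (DIV): 2147483647
print(idiv32(as_int32(bits), 2))   # signed (IDIV): 0
```

Unlike add/sub, the two results do not even share a bit pattern, so the type the compiler sees decides the outcome.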

Reply 31 of 36, by Scali

Posted on 2018-05-10, 12:22

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Falcosoft wrote:
Contrary to add/sub, division and multiplication are sign-aware and compilers make different code for them (mul/div vs imul/idiv). So if the code later uses these instructions for calculations based on the signed int value you can get wrong results (e.g. the result of -1/2 is different from 4294967295/2).

Not entirely.
mul/imul only matters for specific cases.
If you are using 32-bit inputs and 32-bit outputs, then it doesn't matter (mul and div are just repeated addition/subtraction).
mul vs imul is only for 64-bit results, where the sign of the high dword needs to be adjusted. As long as you're only interested in the low dword, it doesn't matter.
This also explains why imul has two-operand and three-operand forms, but mul does not: the two/three operand forms are only for 32-bit multiplies, and in that case mul and imul are equivalent anyway.
http://www.felixcloutier.com/x86/IMUL.html

The two- and three-operand forms may also be used with unsigned operands because the lower half of the product is the same regardless if the operands are signed or unsigned.

Since pointers are never 64-bit in x86 code, they always get truncated, so there is no case where mul vs imul makes a difference in pointer arithmetic.
So are you sure you know how 2s complement works?
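
The multiplication part of this argument can be checked numerically. A Python sketch that emulates the full 64-bit product of two 32-bit operands under both readings; the operand values are arbitrary and `as_int32` is an illustrative helper:

```python
MASK32 = 0xFFFFFFFF

def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value

a_bits, b_bits = 0xC0000000, 0x00000003

unsigned_product = a_bits * b_bits                    # operands read as unsigned (MUL)
signed_product = as_int32(a_bits) * as_int32(b_bits)  # operands read as signed (IMUL)

low_u, low_s = unsigned_product & MASK32, signed_product & MASK32
high_u = (unsigned_product >> 32) & MASK32
high_s = (signed_product >> 32) & MASK32

print(hex(low_u), hex(low_s))    # 0x40000000 0x40000000: low dwords agree
print(hex(high_u), hex(high_s))  # 0x2 0xffffffff: only the high dwords differ
```

This covers multiplication only; it says nothing about division, where the quotient itself depends on the signedness of the operands.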

Reply 32 of 36, by Falcosoft

Posted on 2018-05-10, 14:29

Rank: l33t
Posts: 2568
Joined: 2016-05-21, 13:46
Location: Pécs, Hungary

Scali wrote:
So are you sure you know how 2s complement works?

Please, just forget this supercilious attitude...
But the answer is no, I'm only sure that you deliberately miss the point and have a tendency of completely ignoring parts of posts that do not confirm your claims.

Since pointers are never 64-bit in x86 code, they always get truncated, so there is no case where mul vs imul makes a difference in pointer arithmetic.

If I remember correctly, my only concrete example was a div/idiv (e.g. the result of -1/2 is different from 4294967295/2).
So is it also true that there is no case where div vs idiv makes a difference in pointer arithmetic?
I just mentioned mul/imul vs div/idiv as an example where the x86 has sign-aware instructions and where compilers can use different instructions based on the signed/unsigned type of variables. I have not claimed that using div/mul makes a difference on signed vs unsigned types. I claimed that using e.g. div on unsigned vs idiv on signed types can make a difference (that's what compilers do and that was my example).

But the whole conversation is about: can there be situations/programs where 2GB+ pointers can cause problems? I claim yes. The fact is there are 32-bit programs/libraries/drivers that have problems with 2GB+ pointers. They exist. That's not a counter argument that it's only because of bad programmers.

Reply 33 of 36, by Scali

Posted on 2018-05-10, 15:24

Rank: l33t
Posts: 4873
Joined: 2014-12-13, 14:24

Falcosoft wrote:
Please, just forget this supercilious attitude...

It might sound a bit facetious, but there is a difference between knowing 'something about 2s complement' (or perhaps having heard about mul vs imul etc), and actually understanding the subject in all relevant details.

Falcosoft wrote:
But the answer is no, I'm only sure that you deliberately miss the point and have a tendency of completely ignoring parts of posts that do not confirm your claims.

That's one way to look at it...
However, I don't think I have made any claims. I merely stated that you'd have to be doing some pretty funky pointer arithmetic for the 2 GB vs 4 GB to be an issue.
Addition, subtraction and multiplies are not affected, as we've covered so far.
Comparisons and divisions may or may not be affected, but only in specific cases, where the first mistake is to use signed integers for pointers, which counts as very 'funky' pointer arithmetic in my book.

Falcosoft wrote:
So is it also true that there is no case where div vs idiv makes a difference in pointer arithmetic?

No, but I never claimed otherwise.
In theory there are certainly cases where div vs idiv could matter.
However, I can't really think of any practical cases, since the issue is mainly when you divide pointers directly.
I can't think of any useful application of doing so.
In practice you would usually calculate an offset first, by taking the difference of two pointers.
However, that difference in 2s complement is not going to be affected by the signedness of any variables.

Any resulting divisions would only fail if that offset is larger than 2 GB. But is that even a practical situation? In the case where your entire process only has 2 GB, it is impossible to get such an offset in the first place, so your program would have failed anyway, if you needed more than 2 GB of memory.
And in the case where you enable the 4 GB flag, as long as the actual data stays below 2 GB, the offsets will not wrap around, and again you should not be affected.

But there is this 'uncanny valley' here... What kind of programmer would be smart enough to go and try to do super-fancy arithmetic with dividing pointers and whatnot, yet is clueless enough to not use unsigned datatypes?
I think that is a pretty rare combination. Most programmers are either too scared or ignorant of pointer arithmetic to even try, and compilers won't generate code that breaks. And (presumably) most programmers who are actually knowledgeable enough to do their own pointer arithmetic, will know what datatypes to use.

Falcosoft wrote:
But the whole conversation was about : Can there be situations/programs where 2GB+ pointers can cause problems? I claim yes.

And I agree there.

Falcosoft wrote:
The fact is there are 32-bit programs/libraries/drivers that have problems with 2GB+ pointers. They exist.

Certainly. In fact, the whole reason why PAE was never enabled on desktop versions of Windows was because of all the poorly written drivers that would break.
Then again, with drivers you would be more likely to perform all sorts of low-level operations, and pointer arithmetic would be more likely than in most applications and libraries. Heck, entire applications and libraries are written in languages that don't even support pointer arithmetic in the first place. There is an entire generation of programmers today that doesn't even know what pointers are.

Falcosoft wrote:
That's not a counter argument that it's only because of bad programmers.

It wasn't meant as one. Bad programmers do exist. In fact, "pointer arithmetic" is one of the points I have listed as things that separate the men from the boys when it comes to programming.

Heck, at work a few years ago, I saw a bunch of code that was made '64-bit compliant' by casting all pointers to 64-bit integers, regardless of the target platform. So yes, even poor 32-bit x86 CPUs would have to process the pointer arithmetic in 64-bit.
Okay, that 'solution' kinda sorta worked, but clearly the guy that did this had absolutely no clue what he was doing (and the code would just as easily break again for any architecture with pointers of more than 64 bits... The problem wasn't solved, merely moved around).

Reply 34 of 36, by Azarien

Posted on 2018-05-10, 19:57

Rank: Oldbie
Posts: 940
Joined: 2015-05-14, 07:14

Right now I'm using 32-bit Windows 10 on a machine with 4 GB RAM. I have PAE support hacked into the Windows kernel, so the full 4 GB can be utilized. There's also 2 GB on the HD7850.
It runs stably.
I know that there are some drivers that don't work with PAE (Intel graphics, as far as I know), but everything works for me.

Reply 35 of 36, by ynari

Posted on 2018-05-29, 23:57

Rank: Member
Posts: 430
Joined: 2014-05-29, 12:38
Location: Manchester, UK

I've used PAE enabled versions of Windows Server - they didn't work well with a number of network drivers that really should have supported PAE.

XP x64 is a real pain to use, as it's basically a cut-down Windows Server 2003, and little software knows how to work with it.

It is possible for 32 bit versions of Windows to support >3GB per process with the AWE functions, although this has various limitations.

Personally I wouldn't use x64 versions of Windows with less than 3GB unless you have a really specific use case.

Reply 36 of 36, by ATauenis

Posted on 2018-05-30, 14:33

Rank: Member
Posts: 223
Joined: 2018-05-22, 13:06
Location: Moscow, Russia

Win7 x32 would be a better OS for 2GB computers. It consumes about 400 MB of RAM at a blank desktop. Windows 8/8.1/10 consume about 0.6GB here, and when there are only 2 GB of RAM, these 256 megabytes might be critical for performance. E.g. a Win7 user can open more tabs in a web browser than a Win10 user without performance issues.

2×Soviet ZX-Speccy, 1×MacIIsi, 1×086, 1×286, 2×386DX, 1×386SX, 2×486, 1×P54C, 7×P55C, 6×Slot1, 4×S370, 1×SlotA, 2×S462, ∞×Modern.
