Mingw/Msys compile without need for mingwm10.dll?

Reply 20 of 23, by frobme

Posted on 2009-04-17, 17:28

Rank: Member
Posts: 137
Joined: 2007-01-09, 09:03

Sigh. It's just amazing how important it is here for some people to get the last word. Yes, I read the examples. Yes, I get the differences. Yes, I've been working with GCC for more than ten years in a lot of environments, and I have personally had -ffast-math break on me in code that isn't floating-point intensive.

One of the best places to see the problems exemplified is the Gentoo repositories and forums; since it is regularly built on many different architectures and generally from source, they see quite a lot of compilation and linking bugs that most other single projects don't. You can search their bugzilla if you are curious. Or you could search KDE's bug backlog, which hates this flag.

I am simply pointing out that it has risks and questionable benefits on Intel architecture processors, hoping to save some people some trouble. If you are on an architecture that isn't optimized specifically for floats, the gains are quite substantial and much more worth any potential usage considerations.
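To make the kind of risk concrete, here is a minimal sketch (not taken from DOSBox or from any of the bug reports mentioned above) of code whose correctness depends on strict IEEE evaluation order. Kahan summation recovers the rounding error of each addition; the value-changing reassociation that -ffast-math permits may let the compiler simplify the compensation term to zero, which quietly turns it back into a naive sum. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right sum: small terms can be rounded away entirely. */
double naive_sum(const double *xs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum;
}

/* Kahan (compensated) summation: carries the rounding error of each
 * addition forward so that it is not lost. */
double kahan_sum(const double *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    double sum = 0.0;
    double c = 0.0;             /* low-order bits lost so far */
    for (size_t i = 0; i < n; i++) {
        double y = xs[i] - c;   /* apply the pending correction */
        double t = sum + y;     /* low-order bits of y may be lost here */
        c = (t - sum) - y;      /* algebraically zero, numerically the error */
        sum = t;
    }
    return sum;
}
```

Built without -ffast-math on a target that evaluates double arithmetic in double precision (x86-64 with SSE2, for instance), summing 1.0 followed by ten values of 1e-16 gives exactly 1.0 from naive_sum and something slightly above 1.0 from kahan_sum. Whether a particular GCC version actually folds the compensation away under -ffast-math is something to check in the generated code rather than assume.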

The best rule for general optimization on GCC is to use -O2, and occasionally -O3, in combination with the appropriate -march flags, which pick an intelligent and well-proven set of sub-flags automatically. In general the best additional improvement from there is not picking various flags that sound good but rather doing a profile-guided optimization (PGO) build with representative data.
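As a rough sketch of what that workflow looks like: the build commands in the comment use the real GCC options -fprofile-generate and -fprofile-use, but the file names, the program name and the choice of -march=native are placeholders I picked for the example, so substitute whatever fits your project and target CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical build sequence (hot.c, main.c and app are placeholders):
 *
 *   gcc -O2 -march=native -fprofile-generate hot.c main.c -o app
 *   ./app <representative input>     (writes profile data files)
 *   gcc -O2 -march=native -fprofile-use hot.c main.c -o app
 */

/* Counts the values above a threshold. How often the branch is taken
 * depends entirely on the data, which is what the profiling run records
 * and the second build can then use. */
size_t count_above(const int *values, size_t n, int threshold)
{
    assert(values != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] > threshold)
            hits++;
    }
    return hits;
}
```

The point of the two-pass build is that the compiler stops guessing: branch layout and inlining decisions come from how the code behaved on your data, which is why the training input needs to resemble real usage.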

-Frob

Reply 21 of 23, by wd

Posted on 2009-04-17, 17:44

Rank: DOSBox Author
Posts: 10813
Joined: 2003-12-03, 21:23

Sigh.

It's just amazing how important it is here for some people to get the last word.

Don't know what you want with this statement.

Yes, I've been working with GCC for more than ten years in a lot of environments

I know what you want with this statement.

I am simply pointing out that it has risks and questionable benefits on Intel architecture processors

Up to now I've only seen a number of unconfirmed bug reports for older
gcc4 versions (those may be relevant if they happen for gcc3 as well or
for newer gcc4), confirmed+fixed bug reports, and the general statement
of "it's imprecise" which is not true.
I was just puzzled by the "famously broken" statement.

If you are on an architecture that isn't optimized specifically for floats, the gains are quite substantial and much more worth any potential usage considerations.

Yes you're right, though the target here (mingw/x86) is quite specific.

Btw. somewhere there was a note (with a claimed reference to the gcc docs) that not
using -ffast-math results in 40-bit precision calculation, whereas with -ffast-math
and the corresponding cpu extension specification (sseX?) a much higher
precision (yet non-conforming with IEEE BECAUSE of the higher precision)
is achieved. I could neither find a reference to that in the docs nor did I verify
it, but it partly makes sense for x86.

Reply 22 of 23, by frobme

Posted on 2009-04-17, 18:44

Rank: Member
Posts: 137
Joined: 2007-01-09, 09:03

Btw. somewhere there was a note (with a claimed reference to the gcc docs) that not
using -ffast-math results in 40-bit precision calculation, whereas with -ffast-math
and the corresponding cpu extension specification (sseX?) a much higher
precision (yet non-conforming with IEEE BECAUSE of the higher precision)
is achieved. I could neither find a reference to that in the docs nor did I verify
it, but it partly makes sense for x86.

It wouldn't surprise me at all - depending on the underlying chip type, on Intel you might use the 32/40bit computational method, the actual x87 float registers, or SSE2+ on P4s and later (which was faster than the x87 registers starting with the original P4, if I recall correctly). The SSE path can be super fast in comparison, especially on Core Duos and such that do mul/divs at the same time as an add. Its registers are 128 bits wide, but they hold packed 32-bit or 64-bit values, so each result is rounded to IEEE single or double precision rather than being carried at the x87's 80 bits.

IEEE 754-2008 (the new version) covers 32/64/128 bit interchange formats, so these days only the 80-bit Intel stuff should be non-conforming.
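If you'd rather check than guess what precision a given build computes in, the standard <float.h> macros report it. This is only a sketch: the struct and function names are mine, and FLT_EVAL_METHOD needs a C99 (or later) compiler mode.

```c
#include <assert.h>
#include <float.h>

/* Significand widths in bits, plus the evaluation method the compiler
 * reports for intermediate results. */
struct fp_widths {
    int float_bits;        /* 24 for IEEE single */
    int double_bits;       /* 53 for IEEE double */
    int long_double_bits;  /* 64 for x87 80-bit extended */
    int eval_method;       /* 0: each type's own precision,
                              1: float and double evaluated as double,
                              2: everything evaluated as long double,
                             -1: indeterminate */
};

struct fp_widths query_fp_widths(void)
{
    /* The *_MANT_DIG macros count digits in base FLT_RADIX, so they are
     * only bit counts when the radix is 2. */
    assert(FLT_RADIX == 2);

    struct fp_widths w;
    w.float_bits = FLT_MANT_DIG;
    w.double_bits = DBL_MANT_DIG;
    w.long_double_bits = LDBL_MANT_DIG;
    w.eval_method = FLT_EVAL_METHOD;
    return w;
}
```

On a 32-bit x86 build that does its math on the x87 unit you would typically expect eval_method to come back as 2, and 0 when the build uses SSE for scalar math (GCC's -mfpmath=sse together with -msse2), but that is worth printing on your own toolchain rather than taking from me.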

-Frob

Reply 23 of 23, by wd

Posted on 2009-04-17, 18:54

Rank: DOSBox Author
Posts: 10813
Joined: 2003-12-03, 21:23

Thanks for your explanation.
