VOGONS


First post, by keenmaster486

Rank: l33t

This is all in Open Watcom C++ 16 bit.

Examples of things I have done that have inexplicably inflated the size of the executable by inexcusable amounts:

  • Creating pointers to dynamically allocated instances of a class, inside of a function (can skyrocket the executable size, adding about 1-2K per instance, despite them all being allocated AT RUNTIME, NOT COMPILE TIME. If I do it in a for loop, the executable size does NOT increase, which is why I know this DOES NOT HAVE TO HAPPEN. Does the compiler literally create multiple copies of the class methods for each instance instead of using the same code and passing a different pointer for "this"? Dumbest thing I've ever seen. I hate it.)
  • Adding a single if block with a single function call and a return statement (adds 2K)
  • Multiplying a constant by 1.5 instead of performing ((constant/2) + constant) - adds 20K (!!!)

These are just the ones I've discovered. I'm sure my executable could be reduced by dozens of KB if INSANE compiler behaviors like "add 2K for no reason because you added a single trivial if statement" didn't exist.

World's foremost 486 enjoyer.

Reply 1 of 17, by middlenibble

Rank: Newbie

A quickie: I presume you know about the -os flag which instructs the compiler to optimise for size rather than speed. Just noting it down for the benefit of our readership.
So... I cloned the open watcom repo and unleashed Opus on it with the mission to explain the various 16 bit optimisations the compiler pulls. The report is here: https://github.com/ggeorgovassilis/public/blo … tions_report.md

Then I gave it your questions. The answer is a rather lengthy, nerdy, fascinating read. The tl;dr:

Complaint 1: new ClassName() adds 1–2KB per instance (but not in a loop). Root cause: C++ exception-handling state tables plus runtime pull-in per new expression. Why a loop doesn't have this problem: in a loop, there is only one new expression in the source code. [There's a lengthy explanation of how the compiler sets up error handling.]
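To make the distinction concrete, here's a hedged sketch of the two patterns being compared (hypothetical Sprite class, not from the original code): both functions perform three runtime allocations, but the first contains three textual new expressions while the second contains only one.

```cpp
#include <cassert>

struct Sprite { int x, y; };

// Pattern A: three textual `new` expressions. Per the report, each one
// gets its own entry in the function's exception-handling state table.
int count_separate() {
    Sprite* a = new Sprite();
    Sprite* b = new Sprite();
    Sprite* c = new Sprite();
    int n = (a != 0) + (b != 0) + (c != 0);
    delete a; delete b; delete c;
    return n;
}

// Pattern B: one textual `new` expression executed three times.
// Same runtime allocations, but only one state-table entry.
int count_loop() {
    Sprite* p[3];
    for (int i = 0; i < 3; i++) p[i] = new Sprite();
    int n = 0;
    for (int i = 0; i < 3; i++) { n += (p[i] != 0); delete p[i]; }
    return n;
}
```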

Complaint 2: Single if block with one function call + return adds 2KB. The return is the trigger, not the if. The real cost comes from the early return statement inside the conditional block. In a C++ function that has local objects with destructors (or any state-table participation), every return statement must unwind the current state — i.e., destruct all live objects back to state 0 before leaving.
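A minimal sketch of that pattern (hypothetical function name; std::string stands in for any local type with a destructor): each return is a point where the compiler must emit unwind code to destroy the live local before leaving.

```cpp
#include <cassert>
#include <string>

int classify(int code) {
    std::string name = "lookup";  // live object with a destructor
    if (code < 0) {
        return -1;                // early return: `name` must be unwound here
    }
    // ... more work ...
    return code;                  // normal return: `name` unwound here too
}
```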

Complaint 3: Multiplying by 1.5 instead of (constant/2 + constant) adds 20KB
Root cause: the floating-point literal 1.5 forces the linker to pull in the entire 8087 floating-point emulator library.

Reply 2 of 17, by mkarcher

Rank: l33t
middlenibble wrote on 2026-03-04, 05:55:

Root cause: the floating-point literal 1.5 forces the linker to pull in the entire 8087 floating-point emulator library.

I don't think it's the entire math library, as it consists of many object files, each of which is linked in on demand. But if the compiler defaults to floating-point emulation, it will link in the entire 8087 emulator. 20K is a typical size for a 16-bit floating-point emulation library.

Reply 3 of 17, by middlenibble

Rank: Newbie
mkarcher wrote on 2026-03-04, 06:32:

I don't think it's the entire math library, as it consists of many object files, each of which is linked in on demand. But if the compiler defaults to floating-point emulation, it will link in the entire 8087 emulator. 20K is a typical size for a 16-bit floating-point emulation library.

Good point! Note how it said "emulator" and not "math" 😀

Reply 4 of 17, by mkarcher

Rank: l33t
middlenibble wrote on 2026-03-04, 06:41:

Good point! Note how it said "emulator" and not "math" 😀

I'm sorry, I obviously wasn't reading your post carefully enough. I missed the word "emulator", even though I quoted it.

Reply 5 of 17, by BloodyCactus

Rank: Oldbie

As others have said, it depends a lot on flags. I like -oneatx for optimisations, and -5 to use 586-optimised layout (will work fine on a 386) and an FPU, so no emulated math calls.

--/\-[ Stu : Bloody Cactus :: [ https://bloodycactus.com :: http://kråketær.com ]-/\--

Reply 6 of 17, by keenmaster486

Rank: l33t

Interesting stuff.

Yes, I'm aware of the exception handling, although I didn't think it would cause this much grief. The compiler makes me enable it in order to use the string library. Now I wonder whether that's worth it.

Here are my compiler flags:

-xst -0 -ml -ot -oh -or -s -d0 -ol -ol+

Here's what I've tried:

  • Changing -xst to -xss (compiler hardlocks or quits with an exception depending on whether I run it in DOSBox or natively)
  • Changing -xst to -xs (gains me maybe a couple KB, nothing very useful, not worth the speed decrease, doesn't fix anything from my OP)
  • Removing all optimization flags and using -os only (gains a few KB, doesn't change any of the behaviors in my OP)

I don't buy the AI's explanation about the if block. I removed the early return and changed it to a break (gets out of the loop and lets the function complete normally), and it still inflates the executable by 1K. There is a single function call within the if block, to a void function that sets two global variables to different values, and that's all. However, that function accepts a string as a parameter, so maybe the exception handler is the culprit here once again.

I wish I could disable exception handling for certain blocks of code.

Now as for that arithmetic operation that "forces" the linker to pull in the floating point library - what utter nonsense. That is a #define constant, and the calculation could be performed at compile time. Or, perhaps, you could do something much more sane by default, like casting that int constant to a double long and multiplying by 1.5*65536 (calculated at compile time) then bitshifting right... how about it pulls in a giant library when I tell it to, not when it infers from something I've typed that I somehow wanted that?
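For what it's worth, a sketch of that fixed-point idea (hypothetical function name), assuming a 16-bit int, a 32-bit long, and a non-negative operand: the factor 1.5 × 65536 = 98304 folds to a constant at compile time, so no floating-point code, and hence no emulator library, is needed.

```cpp
#include <cassert>

// x * 1.5 in pure integer math, for x >= 0.
// (Right-shifting a negative value is implementation-defined in older C++.)
long mul_1_5(int x) {
    return ((long)x * 98304L) >> 16;  // 98304 == 1.5 * 65536
}
```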

BloodyCactus wrote on 2026-03-04, 13:29:

I like -oneatx for optimisations

I have tried this (the OW manual suggests -otexan, same thing in a different order). The -ox option enables -ob (branch prediction), which I have found introduces undefined behavior sometimes, depending on where things are in memory. The program just doesn't run stably with this option enabled, and it doesn't gain enough speed to warrant it.

World's foremost 486 enjoyer.

Reply 7 of 17, by jmarsh

Rank: Oldbie
keenmaster486 wrote on 2026-03-04, 17:59:

Or, perhaps, you could do something much more sane by default, like casting that int constant to a double long and multiplying by 1.5*65536 (calculated at compile time) then bitshifting right...

(x*3)>>1 if x is guaranteed to be positive, else (x*3)/2.

how about it pulls in a giant library when I tell it to, not when it infers from something I've typed that I somehow wanted that?

The compiler isn't allowed to make assumptions when it comes to undefined behavior.
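The sign caveat above is real: for negative operands an arithmetic right shift rounds toward negative infinity while integer division rounds toward zero, so the two forms disagree. A small illustration (hypothetical helper names):

```cpp
#include <cassert>

int scale_shift(int x) { return (x * 3) >> 1; }  // safe only for x >= 0
int scale_div(int x)   { return (x * 3) / 2; }   // sign-correct: rounds toward zero

// e.g. scale_div(-1) yields -1, while (x*3)>>1 on a two's-complement
// machine with arithmetic shifts would yield -2.
```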

Reply 8 of 17, by keenmaster486

Rank: l33t

I know I'm being grumpy about this. But it's very frustrating to be afraid of dynamically instantiating classes or adding trivial conditionals because of an outsized influence those things have on the executable size of all things.

World's foremost 486 enjoyer.

Reply 9 of 17, by Harry Potter

Rank: Oldbie

If class allocation is causing heavy bloat when optimizing for speed in OW, try putting the allocation and deallocation code in a separate module, optimize the module for size and optimize the rest of your code for speed.

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 10 of 17, by keenmaster486

Rank: l33t

I would try that, but optimizing the entire program for size only gains me a couple of KB and doesn't solve any of these problems.

I'm going to try removing all of my uses of the string library and compiling with exceptions disabled to see if that helps.

World's foremost 486 enjoyer.

Reply 11 of 17, by wbahnassi

Rank: Oldbie

Is this std::string? Also, for the function do you pass the string by reference (or pointer) or do you pass it by value? The latter would cause full copy code to run.
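A toy illustration of that copy cost (hypothetical Tracked type, nothing to do with OW's string class): passing by value invokes the copy constructor on every call, while passing by const reference does not.

```cpp
#include <cassert>

struct Tracked {
    static int copies;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }  // counts every full copy
};
int Tracked::copies = 0;

void by_value(Tracked t) {}        // one copy per call
void by_ref(const Tracked& t) {}   // no copy

int count_copies() {
    Tracked::copies = 0;
    Tracked t;
    by_value(t);   // copies -> 1
    by_ref(t);     // copies unchanged
    return Tracked::copies;
}
```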

Turbo XT 12MHz, 8-bit VGA, Dual 360K drives
Intel 386 DX-33, Speedstar 24X, SB 1.5, 1x CD
Intel 486 DX2-66, CL5428 VLB, SBPro 2, 2x CD
Intel Pentium 90, Matrox Millenium 2, SB16, 4x CD
HP Z400, Xeon 3.46GHz, YMF-744, Voodoo3, RTX2080Ti

Reply 12 of 17, by keenmaster486

Rank: l33t

It's Open Watcom's string library, which is afaik something different.

Yes, I was passing by value. Good point. Maybe that was having an effect.

In any case, I got rid of OW strings and switched to using C strings everywhere, which let me ditch the OW string lib and turn off exception handling. This gained me about 35KB, which is nice, but once again did not change any of the behaviors listed in my OP.

World's foremost 486 enjoyer.

Reply 13 of 17, by Harry Potter

Rank: Oldbie

keenmaster486: you just need to optimize the module that allocates the memory for size. If you're right that, when optimizing for speed, every memory allocation attempt costs 1-2K, then the allocation code must be generated inline. In that case, my idea should help significantly.

Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply 14 of 17, by st31276a

Rank: Member

Afaik most competent string classes do refcounting and only copy on write: the copy constructor creates a new object that merely references the copied string's data.

No idea what OW does, but good job ditching it entirely.

(You could of course implement your own string class, with only the functionality you actually need...)
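To make the idea concrete, here is a hypothetical minimal copy-on-write string along those lines (refcounted buffer, real copy deferred until a write). This is illustrative only; it is not what Open Watcom's string class does, and it ignores thread safety, which is one of the reliability problems with the technique.

```cpp
#include <cassert>
#include <cstring>

class CowString {
    struct Buf {
        int refs;
        char* data;
    };
    Buf* buf;

    void detach() {                    // make our buffer unique before writing
        if (buf->refs > 1) {
            Buf* b = new Buf;
            b->refs = 1;
            b->data = new char[std::strlen(buf->data) + 1];
            std::strcpy(b->data, buf->data);
            --buf->refs;
            buf = b;
        }
    }
    void release() {
        if (--buf->refs == 0) { delete[] buf->data; delete buf; }
    }
public:
    CowString(const char* s) {
        buf = new Buf;
        buf->refs = 1;
        buf->data = new char[std::strlen(s) + 1];
        std::strcpy(buf->data, s);
    }
    CowString(const CowString& o) : buf(o.buf) { ++buf->refs; }  // shares the buffer
    ~CowString() { release(); }
    CowString& operator=(const CowString& o) {
        ++o.buf->refs;                 // order handles self-assignment
        release();
        buf = o.buf;
        return *this;
    }
    const char* c_str() const { return buf->data; }
    void set(int i, char c) { detach(); buf->data[i] = c; }  // write triggers the copy
    bool shares_with(const CowString& o) const { return buf == o.buf; }
};
```

Copies are O(1) pointer bumps until the first mutation, at which point `detach()` pays for the real copy.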

Reply 15 of 17, by wbahnassi

Rank: Oldbie
st31276a wrote on Today, 06:09:

Afaik most competent string classes do refcounting and only copy on write: the copy constructor creates a new object that merely references the copied string's data.

No idea what OW does, but good job ditching the entire attempt entirely.

(You could of course implement your own string class, with only the functionality you actually need...)

Got a link to such a string implementation? I'm curious, as IMO such an optimisation would be totally unreliable.

Turbo XT 12MHz, 8-bit VGA, Dual 360K drives
Intel 386 DX-33, Speedstar 24X, SB 1.5, 1x CD
Intel 486 DX2-66, CL5428 VLB, SBPro 2, 2x CD
Intel Pentium 90, Matrox Millenium 2, SB16, 4x CD
HP Z400, Xeon 3.46GHz, YMF-744, Voodoo3, RTX2080Ti

Reply 16 of 17, by st31276a

Rank: Member

I know the Qt framework's QString and QByteArray classes do that. Have not looked in depth at stl yet.

Reply 17 of 17, by st31276a

Rank: Member

https://gist.github.com/alf-p-steinbach/c5379 … 558bb514204e755

It seems older C++ standards permitted COW strings, but the technique has since been disallowed (as of C++11), partly for the reliability reasons you state.

Some Qt docs about their approach -
https://doc.qt.io/qt-6/implicit-sharing.html
https://doc.qt.io/qt-6/qstring.html