VOGONS


First post, by superfury

User metadata
Rank l33t++
Rank
l33t++

I just found out something strange. When I implement a simple if-else clause, the compiler seems to always compile it to a jump to the else clause, instead of the else clause being the one not jumped to.

So if I have the following code:

	counter = (counter!=PWM->Amplitude); //Are we not to expire?
counter |= PWM->output; //Are we not to expire(already expired)?
if (counter) //To give a result?
{
result = PWM->result; //Get the result!
}
else //Are we to expire?
{
//Finished PWM sample!
PWM->output = AMPENV_RESULT_SILENCE; //We're finished! Return to 0V always for the rest of the period!
result = PWM->result = 0; //No output anymore!
}
return result; //Give the proper output as a 16-bit sample!

The compiler will generate assembly code that makes the if(counter) roughly equivalent to a "jz elseclauseaddress". This is kind of counter-intuitive, since (as far as I know) having an if condition being true is usually supposed to jump to the code within it, otherwise normally continuing executing the next instruction(which is supposed to be the else clause).

So you would expect a simple "if(x) whatifx else whatifnotx" to be compiled into a:

cmp x,0(might be omitted)
jnz whatifx
whatifnotx
jmp done
whatifx:
whatifx
done:

But instead, it reverses the whole thing, making it jump to the else section if x is zero and not jump if x isn't zero? Why would it do this(it destroys properly using the CPU caches)?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 7, by spiroyster

User metadata
Rank Oldbie
Rank
Oldbie

Interesting, yes it does seem counter-intuitive, perhaps the compiler author was a pessimist o.0

It looks like your conditional is fuzzy there? The only explicit value that can be checked is 0, maybe this has a bearing in the algorithm used to generate the resultant instructions (not that it makes a difference in the grand scheme of things in terms of your program).

Mind you, whats counter intuitive, it may seem counter-intuitive to you (with a higher knowledge of the programs intention, and biased to what you think your most important code path is through any particular block), but to a compiler there is no right or wrong with a conditional, just one state and the other. I have no idea tbh, but suspect it's the result of some optimization algorithm which itself resolved to something weird (yet correct... -0.000 == Pessimists version of nothing). What compilation flags were used? if you change the flags, how different (if at all), is this resultant assembly?

Reply 2 of 7, by dr_st

User metadata
Rank l33t
Rank
l33t

Without prior knowledge, there is no reason why one is more intuitive than the other, so the compiler author can implement it either way.

Modern compilers try to guess which one is more likely, and optimize the code so that the jump happens less frequently than non-jump (seeing as 'jump' is an expensive CPU instruction, because it involves flushing most of the pipeline). So if counter is more often non-zero, the code generated by the compiler in your example makes sense.

It gets more complicated, since modern CPUs have branch predictors that try to guess whether a jump is going to happen or not, and optimize the pipeline accordingly. In modern compilers it is also possible to give "hints" that suggest which condition is more likely, e.g., likely() and unlikely() macros.

if (likely(x>0))
do_something();
else
do_something_else();

https://cloakedthargoid.wordpress.com/ - Random content on hardware, software, games and toys

Reply 3 of 7, by superfury

User metadata
Rank l33t++
Rank
l33t++

In the case of the first code in my first post, Visual C++ executes the jz to the else block 15 out of 16 times it reaches that point, resulting in a big performance hit(it jumps 15 out of 16 times, of which the part containing the jump is executed quite a lot: it's generating PWM samples for 6 channels on 2 virtual chips at a rate of 3.57 million times per second(Game Blaster audio softsynth). Each sample jumped 15 out of 16 PWM subsamples, so that's 15x2x6x3.57M cache misses each second. This would be reduced by 14x2x6x3.57M if the compiler would just properly assume most of the time the if isn't followed, executing the return statement(currently not used anymore due to newly createn PDM sampling instead of PWM sampling(which uses a lookup table instead now).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 4 of 7, by Azarien

User metadata
Rank Oldbie
Rank
Oldbie
superfury wrote:

In the case of the first code in my first post, Visual C++ executes the jz to the else block 15 out of 16 times it reaches that point, resulting in a big performance hit(it jumps 15 out of 16 times, of which the part containing the jump is executed quite a lot: it's generating PWM samples for 6 channels on 2 virtual chips at a rate of 3.57 million times per second(Game Blaster audio softsynth). Each sample jumped 15 out of 16 PWM subsamples, so that's 15x2x6x3.57M cache misses each second.

Are you sure there are cache misses? Because modern CPUs ("modern" meaning Pentium and later...) have branch prediction mechanisms. If the particular branch is taken 15 out of 16 times, after several iterations the CPU will note itself "aha, this branch is almost always taken, I'll queue whatever there is as if the branch were taken", resulting in a cache miss only once out of 16 times, and the jump being almost free (one cycle or so). Or, if there is a fixed pattern (15 times taken, 1 time not taken) it might notice even that resulting in nearly 100% hit.

Last edited by Azarien on 2017-03-29, 10:10. Edited 1 time in total.

Reply 5 of 7, by Scali

User metadata
Rank l33t
Rank
l33t
dr_st wrote:

It gets more complicated, since modern CPUs have branch predictors that try to guess whether a jump is going to happen or not, and optimize the pipeline accordingly.

Yes, on some CPUs a branch is actually 'free' as long as it was predicted correctly (eg the Motorola 68060), because it is processed by a 'side channel' in the pipeline, so it can be executed alongside regular instructions, not taking any processing time.
I believe on all modern x86 processors the actual 'jcc' instruction still takes 1 cycle/slot in the pipeline, but taken/not-taken doesn't make a difference, it's all about matching prediction or not.

There are some fixed heuristics, such as a jump to a lower address is always assumed to be taken (it is likely a loop), and a jump to a higher address is assumed to be not-taken (probably a special-case like error-handling).

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/

Reply 6 of 7, by Azarien

User metadata
Rank Oldbie
Rank
Oldbie
Scali wrote:

There are some fixed heuristics, such as a jump to a lower address is always assumed to be taken (it is likely a loop), and a jump to a higher address is assumed to be not-taken (probably a special-case like error-handling).

These may be the defaults, but x86 CPUs keep history of recent jumps, and if particular jump behaves consistently (e.g. taken - not taken - taken - not taken) it will adapt to that.

Reply 7 of 7, by Scali

User metadata
Rank l33t
Rank
l33t
Azarien wrote:

These may be the defaults, but x86 CPUs keep history of recent jumps, and if particular jump behaves consistently (e.g. taken - not taken - taken - not taken) it will adapt to that.

Of course. That's what (dynamic) branch prediction actually is.
The heuritiscs are just the 'starting point', the initial prediction.
There's also a thing as 'predication', where you can give hints as to what the starting point should be (which is what the aforementioned macros such as likely() will amount to).
Some CPUs (not x86 of course, too much legacy, too little sophistication) allow you to put an entire bitmask of the behaviour into the jump opcode, so you pre-initialize the prediction. This can be done at compile-time by profiling the code. Eg, you could put 16 bits worth of 'history' into the opcode.

http://scalibq.wordpress.com/just-keeping-it- … ro-programming/