I haven't looked at it in detail, as it doesn't look like I'd be able to compile it on my Linux machine anyway, but here are a couple of suggestions. Function calls can be expensive inside a hot loop, so have you tried inlining the functions that implement each instruction? If the compiler honours the request (inline is only a hint), you avoid a function call on every emulated opcode.
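To make the inlining idea concrete, here is a minimal sketch in C. I'm guessing at the shape of your code, so the Cpu struct, the opcode values and the function names are all made up for illustration, not taken from your emulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state; the real emulator's registers will differ. */
typedef struct {
    uint8_t  a;   /* accumulator */
    uint16_t pc;  /* program counter */
} Cpu;

/* Hypothetical opcode values, chosen only for this example. */
enum { OP_NOP = 0x00, OP_INC_A = 0x01, OP_DEC_A = 0x02 };

/* 'static inline' is only a hint; the compiler may still emit a call. */
static inline void op_inc_a(Cpu *cpu) { cpu->a++; }
static inline void op_dec_a(Cpu *cpu) { cpu->a--; }

/* Execute one instruction from 'mem' and advance the program counter. */
static inline void cpu_step(Cpu *cpu, const uint8_t *mem)
{
    switch (mem[cpu->pc++]) {
    case OP_INC_A: op_inc_a(cpu); break;
    case OP_DEC_A: op_dec_a(cpu); break;
    default:       break;  /* OP_NOP and unknown opcodes do nothing here */
    }
}

/* Run 'count' instructions starting at the current program counter. */
void cpu_run(Cpu *cpu, const uint8_t *mem, size_t count)
{
    for (size_t i = 0; i < count; i++)
        cpu_step(cpu, mem);
}
```

With optimisation turned on, compilers will often inline small static functions like these on their own, so it's worth checking the generated assembly before and after. GCC and Clang also accept __attribute__((always_inline)) if you want to insist.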
A couple of your opcodes also include conditional statements, which can incur a performance hit whenever the host CPU's branch predictor guesses wrong. You might want to see how often those instructions are used, and if one of them is common, see if you can replace the branch with an alternative such as an array lookup. Benchmark before and after each change though, because you can waste a lot of time doing this for little benefit!
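As a rough illustration of swapping a branch for an array lookup, here is a sketch that computes status flags from an 8-bit result. The flag bit positions and function names are assumptions for the example, so adapt them to whatever your emulated CPU actually uses.

```c
#include <stdint.h>

/* Hypothetical flag bit positions, for illustration only. */
#define FLAG_ZERO 0x80u
#define FLAG_SIGN 0x40u

/* Branching version: two conditionals per call. */
uint8_t flags_branchy(uint8_t result)
{
    uint8_t flags = 0;
    if (result == 0)   flags |= FLAG_ZERO;
    if (result & 0x80) flags |= FLAG_SIGN;
    return flags;
}

/* Precomputed flags for every possible 8-bit result. */
static uint8_t flag_table[256];

/* Fill the table once at start-up, reusing the branching version. */
void flags_init(void)
{
    for (int i = 0; i < 256; i++)
        flag_table[i] = flags_branchy((uint8_t)i);
}

/* Lookup version: one array read, no conditionals. */
uint8_t flags_lookup(uint8_t result)
{
    return flag_table[result];
}
```

Call flags_init() once before emulation starts. Note that the compiler may already turn the branching version into branch-free code, and the table costs a memory access, so only timing both will tell you which is faster on your machine.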
Of course, before you do any of this, it'd probably be a good idea to run a code profiler to work out exactly where the most time is being spent, so you know where to concentrate your efforts. Otherwise you might end up optimising something that is rarely executed, which makes little difference to the overall emulation speed.
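Alongside a proper profiler, a cheap way to find out which emulated instructions are common is to count them in the dispatch loop. This is only a sketch: it assumes 8-bit opcodes, and the function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One counter per possible 8-bit opcode value (assumes 8-bit opcodes). */
static unsigned long opcode_counts[256];

/* Call this from the dispatch loop for every opcode executed. */
void count_opcode(uint8_t opcode)
{
    opcode_counts[opcode]++;
}

/* Return the opcode executed most often so far (lowest value wins ties). */
uint8_t most_frequent_opcode(void)
{
    size_t best = 0;
    for (size_t i = 1; i < 256; i++)
        if (opcode_counts[i] > opcode_counts[best])
            best = i;
    return (uint8_t)best;
}
```

Run a typical program through the emulator with the counter enabled, then look at the top few entries. Remember to take the counting out again before you benchmark, since it adds work to every instruction.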