I think I'm about to close this case. It's not fully solved and some mystery remains, but there are no more clues to keep it going. Here's a brief summary in case someone finds it useful, or at least interesting.
Intel's own manual supplement for the 286/287 has this to say about FPREM:
An important use for FPREM is to reduce arguments (operands) of periodic transcendental functions to the range permitted by these instructions. For example, the FPTAN (tangent) instruction requires its argument to be less than PI/4. Using PI/4 as a modulus, FPREM will reduce an argument so that it is in range of FPTAN. Because FPREM produces an exact result, the argument reduction does not introduce roundoff error into the calculation, even if several iterations are required to bring the argument into range. (The rounding of PI does not create the effect of a rounded argument, but of a rounded period.)
FPREM also provides the least-significant three bits of the quotient generated by FPREM (in C3, C1, C0). This is also important for transcendental argument reduction, because it locates the original angle in the correct one of eight PI/4 segments of the unit circle (see table 2-4). If the quotient is less than 4, then C0 will be the value of C3 before FPREM was executed. If the quotient is less than 2, then C3 will be the value of C1 before FPREM was executed.
This explains why the lowest 3 bits of the quotient are useful and why they are returned in the status bits. It also says the result is broken for small quotients, and useless unless the code has a way to set C3 and C1 to zero before executing FPREM. That can be done, but it's tricky and requires extra code and care.
Except... not. FPREM seems to work properly for all dividend / divisor pairs, even those that result in a small quotient (less than 4, or less than 2). In fact my tests didn't fail on the 8087 either. My original assumption was that the 8087 had this issue and that it was corrected in the 287, while for some reason the docs were never fully updated. It's important to note that starting with the 287XL datasheets there is no longer any mention of FPREM glitching, just that it produces a correct 3-bit status.
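To make "works properly" concrete, here is a rough Python model of what a fully working FPREM would report for non-negative operands. It's only a sketch: fprem_model is a made-up name, it ignores the partial-remainder case (C2 set when the exponents are far apart), and the bit mapping (C0 = quotient bit 2, C3 = bit 1, C1 = bit 0) is my reading of the quoted text.

```python
import math


def fprem_model(dividend, divisor):
    """Model of a glitch-free FPREM for dividend >= 0 and divisor > 0.

    Returns (remainder, c0, c3, c1). Hypothetical helper, not real 287 code.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch only covers dividend >= 0, divisor > 0")
    # fmod gives the exact remainder of a truncating division, like FPREM.
    remainder = math.fmod(dividend, divisor)
    # Recover the integer quotient from the exact remainder
    # (fine for moderate magnitudes, which is all this sketch targets).
    quotient = int(round((dividend - remainder) / divisor))
    c0 = (quotient >> 2) & 1  # quotient bit 2
    c3 = (quotient >> 1) & 1  # quotient bit 1
    c1 = quotient & 1         # quotient bit 0
    return remainder, c0, c3, c1


if __name__ == "__main__":
    # Quotients 0, 1, 3 and 6: the first three are the "small" cases
    # the manual warns about.
    for x in (0.5, 1.0, 3.0, 5.0):
        print(x, fprem_model(x, math.pi / 4))
```

For quotients below 4 the model always returns C0 = 0, and below 2 it also returns C3 = 0, regardless of any earlier flag state. That is the behaviour the manual says you can't rely on.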
What's more, the same manual has a code example showing how to derive sine and cosine values using FPREM and FPTAN (just like in the explanation), and guess what: it makes no attempt to work around those supposedly glitchy bits for small quotients. In fact that code seems buggy. It uses FABS to bring negative arguments into the positive range, but does so after FPREM, in a clever attempt to hide some of the logic execution time while the 287 is busy anyway. That won't work: FPREM will then calculate a negative quotient (the dividend, i.e. the original argument, is negative and the divisor is the positive PI/4 constant), so the status bits will be the 3 lowest bits of a negative value, which means they will be negated vs what you'd get for positive values. The octant detection logic will then fail and the whole thing will return invalid values. Also, FABS has a hidden side effect of setting C1 to zero, so it'd make even more sense to put it in front of FPREM if the glitch was actually a thing.
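For illustration, here is a minimal Python sketch of the ordering that would make sense to me: take the absolute value first, reduce by PI/4, pick the octant from the low 3 quotient bits, and restore the sign at the very end using sin(-x) = -sin(x). This is not the manual's code and not 287 assembly. sin_by_octant and QUARTER_PI are made-up names, and math.sin / math.cos on the reduced argument are stand-ins for the values real code would derive from FPTAN.

```python
import math

QUARTER_PI = math.pi / 4  # rounded PI/4, so the period is rounded too


def sin_by_octant(x):
    """Sine via PI/4 reduction with the sign handled before reducing."""
    sign = -1.0 if x < 0 else 1.0
    ax = abs(x)                                # the "FABS first" step
    r = math.fmod(ax, QUARTER_PI)              # exact remainder, 0 <= r < PI/4
    q = int(round((ax - r) / QUARTER_PI))      # integer quotient
    octant = q & 7                             # the 3 bits FPREM would report

    if octant & 1:
        # Odd octants measure the angle back from the next axis or diagonal.
        r = QUARTER_PI - r
    # Octants 1, 2, 5, 6 need the cosine of the reduced angle, the rest the sine.
    value = math.cos(r) if (octant & 3) in (1, 2) else math.sin(r)
    if octant & 4:
        # Second half of the circle: sine is negative.
        value = -value
    return sign * value


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1.0, 2.5, 6.0):
        print(x, sin_by_octant(x), math.sin(x))
```

Because the sign is stripped before the reduction, the quotient is never negative and the octant bits mean the same thing for every input.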
That's not the only weird thing about the manual's example. It also makes a special case for zero, with comments suggesting that FPTAN can't deal with a zero argument. It can, and that is clearly indicated in other parts of the manual. Speaking of which, the manual also says FSCALE can't deal with zero as the exponent argument, and that isn't the case either. Yet again, this seems to work properly on the 8087 too.
Well, it's not the first manual or datasheet I know of that has errors in it, but this is Intel, a very popular chip, and more than one document is affected. I do wonder how this whole FPREM/FSCALE thing came to be in the first place. Perhaps the original design of the 8087 was flawed but was corrected before it even got manufactured? Or maybe some of the oldest 8087s are actually buggy, but I certainly haven't found any 287 code with workarounds for these supposed issues.