VOGONS


Reply 20 of 34, by maxtherabbit

Rank l33t
Horun wrote on 2020-03-05, 02:19:
maxtherabbit wrote on 2020-03-04, 23:48:

I have an 8087 in my V20 system that I can test on, but it looks like matze 79 already has this combo covered

do modern browsers and OS even recognize COM as an executable format? I doubt it

No 🤣 and do not think anyone will be running a modern OS on a 286 😜 I used DOS 6.

the OS you run on your 286 has fuckall to do with attaching files to a vogons post

Reply 21 of 34, by Shagittarius

Rank Oldbie

Intel 80286, 8 MHz, Real Mode, 1/2/8/7
Intel 80287, 5 MHz, 2/2/2/4


Reply 22 of 34, by Deunan

Rank Oldbie
Shagittarius wrote on 2020-03-05, 14:58:

Intel 80286, 8 MHz, Real Mode, 1/2/8/7
Intel 80287, 5 MHz, 2/2/2/4

Thanks! You'll be happy to know your 287 works as it should (and not as Intel says it would).

Reply 23 of 34, by Per

Rank Newbie

I have the possibility of testing on an IBM 5162 with a 287XL, and an 8088/87 system that has DOS (int 2#h) but is otherwise not IBM compatible (missing any int 1xh and the entire chipset).

I also have some 287 NPUs that are not XL, but it seems like there are quite a few results from those already.

Reply 24 of 34, by Deunan

Rank Oldbie
Per wrote on 2020-03-06, 15:16:

I have the possibility of testing on an IBM 5162 with a 287XL, and an 8088/87 system that has DOS (int 2#h) but is otherwise not IBM compatible (missing any int 1xh and the entire chipset).

I also have some 287 NPUs that are not XL, but it seems like there are quite a few results from those already.

You've piqued my curiosity, what kind of machine is that 8088/87? I only use 2 functions of int 21h for these test programs, so all of them should work if COM executables are acceptable.

Reply 25 of 34, by Per

Rank Newbie
Deunan wrote on 2020-03-06, 17:12:

You've piqued my curiosity, what kind of machine is that 8088/87? I only use 2 functions of int 21h for these test programs, so all of them should work if COM executables are acceptable.

It's the x86 add-on expansion board for the Tiki-100. It runs both the 8088 and 8087 at the same 4MHz base-clock as the Z80 on the main motherboard. It looks about like this:

[Attachment: Webp.net-resizeimage.jpg, 310.82 KiB, license CC-BY-4.0]

(There's a bit of a mess going on: the cable for the floppy passes under everything to avoid going through the power supply, and there are some video sync signals going across to the parallel port for demoscene support. The memory expansion is home-built in order to max out the RAM.)

And for the results:

[Attachment: DSC_0121.JPG, 1.37 MiB, license CC-BY-4.0]

Reply 26 of 34, by Deunan

Rank Oldbie

That's a very interesting setup. What is the purpose of that add-on, if it has any specific one? I looked it up on Wikipedia, something to do with Rev. D?
I like how many various systems were built around Z80, including some gaming consoles. A lot of people, especially C64 owners, consider the 6502 to be a superior chip but I think it was Z80 that ultimately won the contest.

Anyway, thanks for running the tests. All of them passed, so I have no idea what Intel is trying to tell us about the fprem instruction. I even disassembled some old 286 code to see what the sin() / cos() implementation did, and I don't see any attempts to work around the supposed problem. I'm stumped.

So... does anyone have 287 datasheets for non-Intel chips? AMD, Cyrix, IIT? Or maybe an Intel datasheet that doesn't say "advance information" in the header, because this one looks a bit like a copy of the 8087 docs with some timings updated. I have the following datasheets:
- Intel 8087
- Intel 80287 (marked "advance information")
- Intel 80287XL

And an old book or two, back from when DOS was still the king. I just noticed a single sentence in one of these books that mentions a particular class of numbers that are supposedly problematic on the 287, but I can't find anything more about that. Perhaps it was transcribed from some errata docs. Since I have a 286/287 system, I will test that myself over the weekend.

Reply 27 of 34, by Per

Rank Newbie
Deunan wrote on 2020-03-06, 19:35:

That's a very interesting setup. What is the purpose of that add-on, if it has any specific one? I looked it up on Wikipedia, something to do with Rev. D?

The intention of the card is to add support for CP/M-86 applications and MS-DOS. The stock Tiki-100 without the card only has the Z80, and runs a clone of CP/M 2.2. The developers didn't like that they were basically forced to use the Z80, and the expansion was probably made because the x86 was the CPU that "made sense" to use at that time and price-point.

The rev.D is kind of the opposite, that's an IBM-PC compatible with a Z80 expansion card for half-broken backwards compatibility.

Reply 28 of 34, by Jo22

Rank l33t++
Per wrote on 2020-03-06, 20:23:

The rev.D is kind of the opposite, that's an IBM-PC compatible with a Z80 expansion card for half-broken backwards compatibility.

Wait, even more broken than the backwards compatibility of some CP/M emulators using the NEC V20/V30 8080 emulation mode? 😉
(The extremely popular Turbo Pascal software required Z80 compatibility)

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel


Reply 29 of 34, by Horun

Rank l33t++
Deunan wrote on 2020-03-06, 19:35:
So... does anyone have 287 datasheets for non-Intel chips? AMD, Cyrix, IIT? Or maybe an Intel datasheet that doesn't say "advance information" in the header, because this one looks a bit like a copy of the 8087 docs with some timings updated. I have the following datasheets:
- Intel 8087
- Intel 80287 (marked "advance information")
- Intel 80287XL

You probably already checked this archive but just in case: http://ohwc.narod.ru/man-dat/cpu.html

Hate posting a reply and then have to edit it because it made no sense 😁 First computer was an IBM 3270 workstation with CGA monitor. Stuff: https://archive.org/details/@horun

Reply 30 of 34, by Deunan

Rank Oldbie

I did, but the only thing there that's an actual datasheet and not a 2-page leaflet is the AMD doc, but that one only gives the electrical parameters and just says the product is Intel-compatible. All Intel ones on the other hand are just copies of each other with some small changes.

I was also looking for the 1984 supplement to the 286 manual. I couldn't find that, but there is a compiled work from 1985: iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX 286 NUMERIC SUPPLEMENT
Googling that isn't easy: you need to know the title and use it as keywords, or all you're going to get is useless links to all those wiki copies. And to add insult to injury, the supplement is at the end of the document, including a separate TOC, so a cursory glance at the first few pages almost made me skip it as a duplicate copy of the original 1983 manual.

I'm going to study that, and then maybe disassemble a few more old programs to be sure. Since I don't plan on supporting the 8087, all I need to know is whether the problem really does exist on the 287 or not, so that I can skip the workaround as it lowers performance.

Reply 31 of 34, by Deunan

User metadata
Rank Oldbie
Rank
Oldbie

I think I'm about to close this case. It's not fully solved, some mystery remains, but there are no more clues to keep it going. Here's a brief summary in case someone finds it useful, or at least interesting.

Intel's own manual supplement for the 286/287 has this to say about FPREM:

An important use for FPREM is to reduce arguments (operands) of periodic transcendental functions to the range permitted by these instructions. For example, the FPTAN (tangent) instruction requires its argument to be less than PI/4. Using PI/4 as a modulus, FPREM will reduce an argument so that it is in range of FPTAN. Because FPREM produces an exact result, the argument reduction does not introduce roundoff error into the calculation, even if several iterations are required to bring the argument into range. (The rounding of PI does not create the effect of a rounded argument, but of a rounded period.)

FPREM also provides the least-significant three bits of the quotient generated by FPREM (in C3, C1, C0). This is also important for transcendental argument reduction, because it locates the original angle in the correct one of eight PI/4 segments of the unit circle (see table 2-4). If the quotient is less than 4, then C0 will be the value of C3 before FPREM was executed. If the quotient is less than 2, then C3 will be the value of C1 before FPREM was executed.

This explains why the lowest 3 bits of quotient are useful, and returned in status bits - and also that it's broken for small numbers, and useless unless the code has a way to set C3 and C1 to zero before executing FPREM. Which can be done but it's tricky and requires extra code and care.
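As a rough illustration of how those three status bits are meant to be read, here is a minimal Python sketch. The bit positions (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14 of the status word) and the mapping of C0/C3/C1 to quotient bits 2/1/0 follow the documented x87 layout. The function names are made up for this example, and the sketch only decodes a status word value; it does not model the supposed glitch for small quotients.

```python
# Bit positions of the condition codes in the x87 status word.
C0_BIT, C1_BIT, C2_BIT, C3_BIT = 8, 9, 10, 14


def fprem_octant(status_word):
    """Return the low three quotient bits that FPREM leaves in C0, C3, C1."""
    c0 = (status_word >> C0_BIT) & 1  # quotient bit 2
    c3 = (status_word >> C3_BIT) & 1  # quotient bit 1
    c1 = (status_word >> C1_BIT) & 1  # quotient bit 0
    return (c0 << 2) | (c3 << 1) | c1


def reduction_complete(status_word):
    """C2 clear means the remainder is final; C2 set means run FPREM again."""
    return ((status_word >> C2_BIT) & 1) == 0
```

On real hardware the status word would come from FSTSW after FPREM; the octant value is only meaningful once C2 is clear.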

Except... not. FPREM seems to work properly for all dividend / divisor pairs, even those that result in a small quotient, less than 4 or 2. In fact my tests didn't fail on the 8087 either, and my original assumption was that the 8087 had this issue but it was corrected in the 287, yet for some reason the docs were not fully updated. It's important to note that starting with the 287XL datasheets there is no longer any mention of FPREM glitching, just that it produces a correct 3-bit status.

What's more, the same manual has a code example showing how to derive sine and cosine values using FPREM and FPTAN (just like in the explanation) and guess what, there isn't any attempt to work around those supposedly glitchy bits for small numbers. In fact that code seems buggy: it uses FABS to bring negative arguments into the positive range, but does so after FPREM, in a clever attempt to hide some of the logic execution time while the 287 is busy anyway. That won't work, as FPREM will then calculate a negative quotient (since the dividend, the original argument, is negative and the divisor is the positive PI/4 constant) and the status bits will be the 3 lowest bits of a negative value, which means they will be negated vs what you'd get on positive values. So the octant detection logic will fail and the whole thing will return invalid values. Also, FABS has a hidden side effect of setting C1 to zero, so it'd actually make even more sense to have it in front of FPREM if the glitch was actually a thing.
That's not the only weird thing about that code. It also makes a special case for zero, and there are comments suggesting that FPTAN can't deal with a zero argument while it can, and that is clearly indicated in other parts of the manual. Speaking of which, the manual also says FSCALE can't deal with zero as the exponent argument and that isn't the case either. Yet again that seems to work properly on the 8087 too.
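The FABS ordering problem can be sketched in Python under one loud assumption: that the quotient bits for a negative argument behave like the low three bits of the negative quotient, as described above. Python's `&` on negative integers acts like two's complement, which is used here to model that description; it is not a verified model of the hardware, and the function names are hypothetical.

```python
import math

QUARTER_PI = math.pi / 4


def quotient_low_bits(angle):
    """Low three bits of the truncated quotient angle / (pi/4).

    Assumption: for a negative quotient the bits are those of the negative
    value, modelled with Python's two's-complement style '&'.
    """
    return math.trunc(angle / QUARTER_PI) & 7


def octant_fabs_first(angle):
    """Take the absolute value before the reduction."""
    return quotient_low_bits(abs(angle))


def octant_fabs_last(angle):
    """Reduce the signed argument and take the absolute value afterwards."""
    return quotient_low_bits(angle)
```

Under this model `octant_fabs_first(-2.0)` gives 2 while `octant_fabs_last(-2.0)` gives 6, so octant logic written for positive arguments would pick the wrong segment when FABS comes last.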

Well, it's not the first manual or datasheet that I know to have errors in it but it's Intel, a very popular chip, and more than one document is affected. And I do wonder how this whole FPREM/FSCALE thing came to be in the first place. Perhaps the original design of the 8087 was flawed but that was corrected even before it got manufactured? Or maybe some of the oldest 8087 are actually buggy but I certainly haven't found any 287 code with any workarounds for these supposed issues.

Reply 32 of 34, by root42

Rank l33t

Finally got around to testing it on the 386DX with 387:

https://youtu.be/IFueAukNyxk

YouTube and Bonus
80486DX@33 MHz, 16 MiB RAM, Tseng ET4000 1 MiB, SnarkBarker & GUSar Lite, PC MIDI Card+X2+SC55+MT32, OSSC

Reply 33 of 34, by Deunan

User metadata
Rank Oldbie
Rank
Oldbie

What do you know, the plot thickens once again. I've run some more tests and FSCALE does indeed have limitations on the 287. Here are the results of two different approaches to the e^x function: the A variant was written to avoid FSCALE issues, the B one was not (and is faster):

12a x=-0.800000 (-1.154156) e=0.449329
12b x=-0.800000 (-1.154156) e=0.449329
13a x=-0.700000 (-1.009887) e=0.496585
13b x=-0.700000 (-1.009887) e=0.496585
14a x=-0.600000 (-0.865617) e=0.548812
14b x=-0.600000 (-0.865617) e=0.000000
15a x=-0.500000 (-0.721348) e=0.606531
15b x=-0.500000 (-0.721348) e=inf
16a x=-0.400000 (-0.577078) e=0.670320
16b x=-0.400000 (-0.577078) e=0.000000
17a x=-0.300000 (-0.432809) e=0.740818
17b x=-0.300000 (-0.432809) e=inf
18a x=-0.200000 (-0.288539) e=0.818731
18b x=-0.200000 (-0.288539) e=0.000000
19a x=-0.100000 (-0.144270) e=0.904837
19b x=-0.100000 (-0.144270) e=0.000000
20a x=0.000000 (0.000000) e=1.000000
20b x=0.000000 (0.000000) e=1.000000
21a x=0.100000 (0.144270) e=1.105171
21b x=0.100000 (0.144270) e=inf
22a x=0.200000 (0.288539) e=1.221403
22b x=0.200000 (0.288539) e=inf
23a x=0.300000 (0.432809) e=1.349859
23b x=0.300000 (0.432809) e=0.000000
24a x=0.400000 (0.577078) e=1.491825
24b x=0.400000 (0.577078) e=inf
25a x=0.500000 (0.721348) e=1.648721
25b x=0.500000 (0.721348) e=0.000000
26a x=0.600000 (0.865617) e=1.822119
26b x=0.600000 (0.865617) e=inf
27a x=0.700000 (1.009887) e=2.013753
27b x=0.700000 (1.009887) e=2.013753
28a x=0.800000 (1.154156) e=2.225541
28b x=0.800000 (1.154156) e=2.225541
29a x=0.900000 (1.298426) e=2.459603
29b x=0.900000 (1.298426) e=2.459603

FSCALE works OK for zero, just not values between 0 and 1 - only the Intel manual explains this properly. Other documents, not so much. I wonder why it didn't trip my previous test - perhaps because I used 1 as a base for FSCALE so that it would be easier to write down results. Should've used PI or e instead.
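The numbers in parentheses in the results above are x multiplied by log2(e), which suggests e^x is being computed as 2^(x * log2 e). One common way to keep the scale factor integral is to split that product into an integer part and a fraction, and only scale by the integer part. The Python sketch below shows that general technique; it is an assumption about the approach and not the actual code of variant A. `math.ldexp` stands in for FSCALE with an integer argument, and `exp_via_scale` is a made-up name.

```python
import math

LOG2_E = math.log2(math.e)


def exp_via_scale(x):
    """Compute e**x as 2**f scaled by 2**n, with n an integer."""
    y = x * LOG2_E          # e**x == 2**y
    n = math.trunc(y)       # integer part, the only value used for scaling
    f = y - n               # fractional part, |f| < 1
    return math.ldexp(2.0 ** f, n)
```

In the failing B rows above, the value in parentheses always lies strictly between -1 and 1 and is nonzero, which is the range where a non-integer scale factor would be handed to FSCALE.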

Reply 34 of 34, by Deunan

User metadata
Rank Oldbie
Rank
Oldbie

Another update - so it looks to be a HW (microcode) bug. FSCALE expects an integer argument but will take anything and chop the fractional part off (that is, always round toward zero).

That chop is where the problem is. It works properly for all numbers except the 0 < |x| < 1 range - here it will only chop the first 23 bits of the mantissa. So if the value was loaded from a 32-bit format (float), it'll work if FSCALE is used right after the load. If a value with higher precision is calculated or loaded, like a 64 or 80-bit one (double/extended), and has more bits set to 1 in the mantissa after the first 23, the instruction will fail. My test was flawed and didn't trigger the bug because I only ever loaded 32-bit floats.
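The condition described above can be written down as a small predicate. This Python sketch treats "no mantissa bits set after the first 23" as "the value survives a round trip through a 32-bit float unchanged", which is an approximation made for illustration (it ignores 80-bit extended values, which Python floats cannot hold). Both function names are hypothetical, and the code only classifies inputs; it does not reproduce what the 287 returns.

```python
import struct


def fits_in_float32(x):
    """True if x is unchanged by a round trip through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0] == x


def hits_described_fscale_case(scale):
    """True if scale matches the failing case described in the post:
    0 < |scale| < 1 with mantissa bits set beyond what a float holds."""
    return 0.0 < abs(scale) < 1.0 and not fits_in_float32(scale)
```

For example, 0.5 and 0.0 are classified as safe, while a double such as -0.865617 is flagged, matching the pattern of good and bad rows in the B results.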

None of this is explained anywhere; possibly there were some errata docs but I can't find those now. Could've been under NDA. But now that I know this, the fix seems simple: just store the value to a 32-bit float in memory and load it again, right? Well, sure, that works, but it's costly. FSTP+FLD add some 130+ 287 cycles and that alone made my faster function slower than the original one. Adding a test to weed out numbers greater than 1 is also not viable, that is another 60+ cycles wasted and at that point you can just choose a different code path that skips FSCALE altogether - which is what my other code does.

In the end I was able to slightly optimize the original procedure and left that in for 287 code. I dare say it doesn't get any better than this 😀