PCEm. Another PC emulator.

Post by hail-to-the-ryzen » 2017-6-04 @ 00:33

In x86_ops_misc.h, several x86 division cases have a higher cycle cost on the 486/Pentium than on the 386. Did the time to complete a division actually increase between the 386 and 486 designs?
Code:
                case 0x30: /*DIV AL,b*/
                src16 = AX;
                if (dst) tempw = src16 / dst;
                if (dst && !(tempw & 0xff00))
                {
                        AH = src16 % dst;
                        AL = (src16 / dst) &0xff;
                        flags_rebuild();
                        flags |= 0x8D5; /*Not a Cyrix*/
                }
                else
                {
                        x86_int(0);
                        return 1;
                }
                CLOCK_CYCLES(is486 ? 16 : 14);
...
                case 0x30: /*DIV AL,b*/
                src16 = AX;
                if (dst) tempw = src16 / dst;
                if (dst && !(tempw & 0xff00))
                {
                        AH = src16 % dst;
                        AL = (src16 / dst) &0xff;
                        flags_rebuild();
                        flags |= 0x8D5; /*Not a Cyrix*/
                }
                else
                {
                        x86_int(0);
                        return 1;
                }
                CLOCK_CYCLES(is486 ? 16 : 14);
...
                case 0x30: /*DIV AX,w*/
                templ = (DX << 16) | AX;
                if (dst) templ2 = templ / dst;
                if (dst && !(templ2 & 0xffff0000))
                {
                        DX = templ % dst;
                        AX = (templ / dst) & 0xffff;
                        setznp16(AX); /*Not a Cyrix*/
                }
                else
                {
//                        fatal("DIVw BY 0 %04X:%04X %i\n",cs>>4,pc,ins);
                        x86_int(0);
                        return 1;
                }
                CLOCK_CYCLES(is486 ? 24 : 22);
...
                case 0x30: /*DIV AX,w*/
                templ = (DX << 16) | AX;
                if (dst) templ2 = templ / dst;
                if (dst && !(templ2 & 0xffff0000))
                {
                        DX = templ % dst;
                        AX = (templ / dst) & 0xffff;
                        setznp16(AX); /*Not a Cyrix*/
                }
                else
                {
//                        fatal("DIVw BY 0 %04X:%04X %i\n",cs>>4,pc,ins);
                        x86_int(0);
                        return 1;
                }
                CLOCK_CYCLES(is486 ? 24 : 22);
...
                case 0x30: /*DIV EAX,l*/
                if (divl(dst))
                        return 1;
                setznp32(EAX); /*Not a Cyrix*/
                CLOCK_CYCLES((is486) ? 40 : 38);
                break;
...
                case 0x30: /*DIV EAX,l*/
                if (divl(dst))
                        return 1;
                setznp32(EAX); /*Not a Cyrix*/
                CLOCK_CYCLES((is486) ? 40 : 38);
                break;

Edit: ENTER shows the same pattern:
Code:
static int opENTER_l(uint32_t fetchdat)
{
        uint16_t offset = getwordf();
        int count = (fetchdat >> 16) & 0xff; cpu_state.pc++;
        uint32_t tempEBP = EBP, tempESP = ESP, frame_ptr;
       
        PUSH_L(EBP); if (cpu_state.abrt) return 1;
        frame_ptr = ESP;
       
        if (count > 0)
        {
                while (--count)
                {
                        uint32_t templ;
                       
                        EBP -= 4;
                        templ = readmeml(ss, EBP);
                        if (cpu_state.abrt) { ESP = tempESP; EBP = tempEBP; return 1; }
                        PUSH_L(templ);
                        if (cpu_state.abrt) { ESP = tempESP; EBP = tempEBP; return 1; }
                        CLOCK_CYCLES(3);
                }
                PUSH_L(frame_ptr);
                if (cpu_state.abrt) { ESP = tempESP; EBP = tempEBP; return 1; }
                CLOCK_CYCLES(3);
        }
        EBP = frame_ptr;
       
        if (stack32) ESP -= offset;
        else          SP -= offset;
        CLOCK_CYCLES((is486) ? 14 : 10);
        return 0;
}
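
Reading the ENTER listing above, the total cost can be worked out by hand: each nesting level adds 3 cycles (count - 1 passes through the loop plus the final frame-pointer push), on top of a base of 14 cycles on the 486 or 10 on the 386. A minimal sketch of that arithmetic follows; enter_cycles is a hypothetical helper for illustration, not a PCem function.

```c
/* Total cycles charged by the opENTER_l listing quoted above.
   enter_cycles() is a hypothetical helper, not part of PCem.
   level is the nesting level operand (0..255 in the listing). */
int enter_cycles(int level, int is486)
{
    int base = is486 ? 14 : 10;   /* final CLOCK_CYCLES in the listing */
    int per_level = 3;            /* CLOCK_CYCLES(3) per pushed frame pointer */

    if (level <= 0)
        return base;              /* no nested frame pointers are copied */

    /* (level - 1) loop iterations plus one push of frame_ptr */
    return per_level * level + base;
}
```

For example, enter_cycles(4, 1) gives 26 and enter_cycles(4, 0) gives 22, so the 486 penalty is a fixed 4 cycles regardless of nesting level.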

Post by SarahWalker » 2017-6-04 @ 07:18

Yes, there are a handful of instructions that are slower on 486 than on 386. RCL/RCR are more examples of this.
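
For reference, the DIV costs quoted in the first post can be collected into a small lookup table. This is only a sketch built from the numbers in that excerpt; div_cost and div_cycles are hypothetical names, not PCem code, and the table covers only the three operand sizes shown.

```c
#include <stddef.h>

/* DIV cycle costs as quoted from x86_ops_misc.h in this thread.
   The struct and function names are hypothetical, for illustration only. */
struct div_cost {
    int operand_bits;
    int cycles_386;
    int cycles_486;
};

static const struct div_cost div_costs[] = {
    {  8, 14, 16 },   /* DIV AL,b  */
    { 16, 22, 24 },   /* DIV AX,w  */
    { 32, 38, 40 },   /* DIV EAX,l */
};

/* Returns the cycle cost, or -1 for an operand size not in the table. */
int div_cycles(int operand_bits, int is486)
{
    size_t i;

    for (i = 0; i < sizeof div_costs / sizeof div_costs[0]; i++) {
        if (div_costs[i].operand_bits == operand_bits)
            return is486 ? div_costs[i].cycles_486 : div_costs[i].cycles_386;
    }
    return -1;
}
```

In all three rows the 486 figure is exactly 2 cycles higher than the 386 figure.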

Post by hail-to-the-ryzen » 2017-6-04 @ 09:07

Thank you for the information. The MIPS rating of the emulated 486 CPUs also seems to agree with real benchmarks. However, the Pentium CPUs have a much higher MIPS rating than the 486. Doesn't this indicate that the Pentium processes instructions at a higher rate, and that the cycle count alone does not fully model overall CPU speed? If so, is there any software that would be affected?

Post by Scali » 2017-6-04 @ 10:57

hail-to-the-ryzen wrote: Thank you for the information. The MIPS rating of the emulated 486 CPUs also seems to agree with real benchmarks. However, the Pentium CPUs have a much higher MIPS rating than the 486. Doesn't this indicate that the Pentium processes instructions at a higher rate, and that the cycle count alone does not fully model overall CPU speed? If so, is there any software that would be affected?


The Pentium is a superscalar design with two parallel integer pipelines (the U-pipe and the V-pipe), so in theory it can execute up to two instructions per clock, roughly twice the throughput of the 486.
I assume PCem emulates the two pipes correctly, resulting in the much higher MIPS rating, as expected.

See also here: https://en.wikipedia.org/wiki/Instructions_per_second
486DX2-66: 25.6 MIPS
486DX4-100: 70 MIPS
Pentium 100: 188 MIPS
Normalized for clock speed (MIPS per MHz):
486DX2-66: 25.6/66 = 0.39
486DX4-100: 70/100 = 0.70
Pentium 100: 188/100 = 1.88
Advances in caching and memory technology probably explain why it's far more than twice as fast in the Dhrystone test.
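
The normalization above is just MIPS divided by clock speed. A minimal sketch of that arithmetic, using only the figures quoted from the wiki page; cpu_rating, mips_per_mhz and speedup_per_clock are hypothetical names chosen for this example.

```c
/* Dhrystone MIPS figures as quoted in this post.
   All names below are hypothetical, for illustration only. */
struct cpu_rating {
    const char *name;
    double mips;
    double mhz;
};

static const struct cpu_rating ratings[] = {
    { "486DX2-66",   25.6,  66.0 },
    { "486DX4-100",  70.0, 100.0 },
    { "Pentium 100", 188.0, 100.0 },
};

/* MIPS per MHz, i.e. the rating normalized for clock speed. */
double mips_per_mhz(const struct cpu_rating *cpu)
{
    return cpu->mips / cpu->mhz;
}

/* How many times more work per clock cpu a does compared with cpu b. */
double speedup_per_clock(const struct cpu_rating *a, const struct cpu_rating *b)
{
    return mips_per_mhz(a) / mips_per_mhz(b);
}
```

With these inputs, the Pentium 100 comes out at roughly 2.7 times the per-clock rating of the DX4-100, which is the "far more than twice" gap mentioned above.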

Post by SarahWalker » 2017-6-04 @ 11:57

I'd take the measurements on that wiki page with an enormous pinch of salt - no way is there that much difference between a DX2/66 and DX4/100!

But yes, PCem does emulate the Pentium's dual integer pipelines, hence the major performance difference. Have a look at codegen_timing_pentium.c.

Post by Scali » 2017-6-04 @ 12:54

SarahWalker wrote: I'd take the measurements on that wiki page with an enormous pinch of salt - no way is there that much difference between a DX2/66 and DX4/100!


True... at least, if both CPUs were tested in an otherwise equal system.
Then again, if the DX2-66 had no L2 cache, but the DX4-100 did, perhaps that would explain these numbers somewhat.
Anyway, bottom line is: the leap from 486 to Pentium was one of the largest in x86 history.

Post by hail-to-the-ryzen » 2017-6-19 @ 01:23

I tested the UT99 non-MMX software renderer in PCem and saw the previously documented lighting issue. Although the cause is likely in the Gouraud-shaded triangle routines, the colors don't appear to be mismatched. The working MMX software renderer supports this: it shows light sources reflecting off models across a greater area than the non-MMX artifacts cover. I think it is more likely that the artifacts come from insufficient precision in those routines.


Edit: the PCem author is correct. The lighting issue in the non-MMX software renderer is a color mismatch in the Gouraud-shaded triangle routines. However, it is the blue and green channels that are mismatched:
dRY = (R2-R1) / dY
dGY = (B2-B1) / dY
dBY = (G2-G1) / dY

Tested in v400 and v428.
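
To illustrate what the swap reported above would do to the per-scanline color steps, here is a minimal sketch. It is not UT99 or PCem code; the rgb struct and both function names are hypothetical, and dy is assumed to be non-zero.

```c
/* One color, or one set of per-scanline color steps, along a triangle edge.
   All names here are hypothetical, for illustration only. */
struct rgb {
    double r, g, b;
};

/* Expected behaviour: each channel steps by its own difference over dy. */
struct rgb edge_deltas(struct rgb c1, struct rgb c2, double dy)
{
    struct rgb d;

    d.r = (c2.r - c1.r) / dy;
    d.g = (c2.g - c1.g) / dy;
    d.b = (c2.b - c1.b) / dy;
    return d;
}

/* Mismatched behaviour as described in the post: the green step is
   computed from the blue endpoints and the blue step from the green ones. */
struct rgb edge_deltas_swapped(struct rgb c1, struct rgb c2, double dy)
{
    struct rgb d;

    d.r = (c2.r - c1.r) / dy;
    d.g = (c2.b - c1.b) / dy;   /* dGY = (B2-B1) / dY */
    d.b = (c2.g - c1.g) / dy;   /* dBY = (G2-G1) / dY */
    return d;
}
```

The two versions agree only when the green and blue differences along the edge happen to be equal, which would fit artifacts that show up under colored lighting.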