VOGONS


Common log format

Topic actions

First post, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

Every single emulator I looked at has the ability to log executed instruction in one way or the other but the format is always different, therefore making log comparisons impossible. If we can come up with a common format, that can convey as much information as possible emulator developers can help each other.

Attacking the problem at the cycle level is hard and we can probably leave that for version 2. But we can come up with a format that logs instruction and registers. I don't like the AT&T dissasembly so we can use the Intel one.

So how about something like this

<32bit address> <instruction bytes hex> <mnemonic> <destination> [, source1]

So for example

000f004f 43 INC BX
000f537b EB 1C JMP 0x1C

For destination/source addresses we have the option to either display the calculated address "[0x45C2]" or decoded address "[DS:DI+3]".

The next step would be to come up with a register file print format where we can dump the registers in same format, again for easier diff-ing.

What do you think?

EDIT: Current spec version: https://sites.google.com/site/capex86/common

Last edited by vladstamate on 2017-10-28, 13:26. Edited 1 time in total.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 1 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

An issue that will rise is the fact that due to timing being different, we will need an ability (which CAPE already has) to disable logging for certain ranges like BIOS. This is extremely useful if we were to compare logs but the INT 08 come at a different times. So all we have to do is disable logging for any instruction above 0xF0000. This is not to say that different times for INT 08 is not a bug, but it is a timing bug. I have this feature to also only allow certain ranges, useful to say compare our VGA BIOS execution (only allow logs of instruction between 0xC0000 and 0xC8000).

The key is to start simple, have common ground and extend it from there.

For example, an emulator could load the log of another one, and, as it executes instructions it could compare live against the log and assert (or break in the debugger) when it spots a difference.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 2 of 90, by hottobar

User metadata
Rank Newbie
Rank
Newbie

Comparing the execution of two different emulators that execute the same(-ish) program is extremely difficult due to so many external variables like IRQs and different DOS's behaviour for different setups. I had an hard time comparing different execution logs for the same program, in the same emulator but slightly different OS environment. I ended up creating a special BIOS image that tries to test a 386 as much as possible, without the OS and with IRQs disabled. Unfortunately it's incomplete and incapable of detecting many subtle emulation bugs., but if enyone is interested: https://github.com/barotto/test386.asm

Anyhow, I could try to help for a common log format which is better than nothing for sure.

Reply 3 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

I've tried the test386.asm, compiling with masm(with a window full of the same warnings about an unfinished string) and tried running it in UniPCemu's Compaq Deskpro 386 emulation as a BIOS ROM (just some support added to the code to load diagnostic non-odd/even ROMs). It seems to go wrong right away(POST diagnostic code 00h, no normal parallel/serial file output is supported though(just serial mouse and modem and parallel Dosbox-based Disney Soundsource/Covox speech thing). Thus two parallel and one UART port is emulated.

The POST diagnostics card(port 80h) reports 0x00. So the first check?

Edit: The diagnostic port is never even written to(port 0x84, I've modified the POST port define to reflect that), before crashing in SDL2.dll due to a NULL pointer(SDL2 2.0.6)?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 5 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

I'll try to put up a release of CAPE soon to support the logging format that I proposed above. Any inclination as to what to do with address? [0x1234] or [DS:BX] or DS:[BX] or ... ?

As for things like IRQ, limiting the range of addresses you log on will solve 99% of those issues (except for when IRQ handlers are not the BIOS original ones).

The other issue is how do you want to deal with prefixes?

1) segment override prefixes: for me they count as actual instruction but we can also roll them in the next instruction

000f004f 2E CS:
000f0050 89 34 MOV AX, CS:[BX]

versus

000f004f 2E 89 34 MOV AX, CS:[BX]

2) repetition prefixes. Those for me also count as actual instructions but they can be rolled as well:

REPZ
STOSW

or

REPZ STOSW

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 6 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

Didn't expect that as well. Usually SDL has ran without any problems, but this latest build seems to be crashing internally(within the SDL.dll's own thread, according to Visual Studio Community 2017?)? I've never ever have that happened when trying to run my emulator:S It's supposed to be stable(according to it's download page at libsdl.org)?

I don't have any special parallel/serial port hardware installed, only mouse and modem(serial) and Dosbox-based audio extension for Covox Speech Thing/Disney Sound Source(parallel) combination hardware. Although that shouldn't cause any problems, since it's only output?

I don't se any test386.asm thread you've mentioned?

@vladstamate: My emulator dumps them in a single-line form (e.g. first all read instruction bytes including prefixes between (), followed by a space and then the decoded instruction in readable format). That of course is followed by the entire register state, sorted by category on seperate lines(general purpose, segment, remaining registers) and finally some simple dump of hardware status(atm it's PIC status lines followed by Video adapter x,y coordinates for CRT and virtual coordinates in rendering status).

Of course some simplified formats are supported as well, but the decoding of prefixes cannot be seperated from each other like your emulator does: UniPCemu threats all prefix during it's instruction fetching, only getting a complete picture after the instruction is completely read, after which it starts decoding the instruction(setting up pointers for starting execution, like current instruction handler pointer, processor state handler(which can switch between the instruction, task switch and interrupt states. running one of those three at a time in parallel to the BIU).

Edit: In the examples you've mentioned, it's more like the latter cases you've mentioned, except no linear addresses are logged, but CS:(E)IP addresses instead. In between are the memory accesses done by the instruction(the instruction that's executed followed it's transferring data, which itself follows any other memory transactions(like interrupts firing). Stuff like reading of descriptors is logged as well. All memory accesses are logged as three-stage translation logs(on single line when enabled): virtual->linear->physical(as applied) or in reverse order when reading from memory or hardware(hardware input/output isn't logged itself, just RAM accesses).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 7 of 90, by hottobar

User metadata
Rank Newbie
Rank
Newbie

You can limit the address space of logged instructions, but many interrupt handlers are DOS programs below the 0xa0000 limit.

For addresses I'm currently using the DosBox approach so, for example, this is a typical log entry for me:
0028:800073B5 8D 14 9A lea edx,[edx+ebx*4]

So
<segment:offset> <instruction bytes hex> <mnemonic> <dest>,<src>

<segment> is the value of CS so in protected mode it's the gdt selector.
Note that 800073B5 is a valid offset when paging in on.

Imho segment prefixes should be on the same line of the executed instruction.
Same thing for REPs and operand/address size prefixes.

Last edited by hottobar on 2017-10-15, 14:35. Edited 1 time in total.

Reply 9 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

@hottobar: That's about the same format UniPCemu dumps it's instructions in, except it puts () arround the <instruction bytes hex> part, with no spaces between hex opcode byte representations. It's also what it dumps in one of it's formatting methods for logging(doesn't apply to some logging methods as well as single-line format logging).

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 10 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie
hottobar wrote:
You can limit the address space of logged instructions, but many interrupt handlers are DOS programs below the 0xa0000 limit. […]
Show full quote

You can limit the address space of logged instructions, but many interrupt handlers are DOS programs below the 0xa0000 limit.

For addresses I'm currently using the DosBox approach so, for example, this is a typical log entry for me:
0028:800073B5 8D 14 9A lea edx,[edx+ebx*4]

So
<segment:offset> <instruction bytes hex> <mnemonic> <dest>,<src>

<segment> is the value of CS so in protected mode it's the gdt selector.
Note that 800073B5 is a valid offset when paging in on.

Imho segment prefixes should be on the same line of the executed instruction.
Same thing for REPs and operand/address size prefixes.

I like this format. Lets go with that. I don't think we need ( ) as this is quite clean. I prefer CAPS for instructions and registers but I can change CAPE to go to lower case.

Question though. You seem to have a mix of single spaces and double spaces. Can we make it all single space? It makes much easier to parse:

0028:800073B5 8D 14 9A lea edx,[edx+ebx*4]

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 11 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

So all in caps and using the format given in the above post? That can be easily converted in UniPCemu to allow the new format(instructions are already in all caps atm).

Then there's the issue on memory accesses themselves, as well as register states. Both would have to be included, as they both carry information processed by the instruction. When logging fully, this is what UniPCemu generates atm(the test386.asm BIOS ROM executing) in it's "Always log, even during skipping, single line format, simplified" formatting setting):

00:00:56:77.09532: 		Physical(r):FFFFFFF0=EA(ê); Paged(r):FFFFFFF0=EA(ê); Physical(r):FFFFFFF1=45(E); Paged(r):FFFFFFF1=45(E); Physical(r):FFFFFFF2=00( ); Paged(r):FFFFFFF2=00( ); Physical(r):FFFFFFF3=00( ); Paged(r):FFFFFFF3=00( )
00:00:56:78.03260: Physical(r):FFFFFFF4=F0(ð); Paged(r):FFFFFFF4=F0(ð); Physical(r):FFFFFFF5=32(2); Paged(r):FFFFFFF5=32(2); Physical(r):FFFFFFF6=37(7); Paged(r):FFFFFFF6=37(7); Physical(r):FFFFFFF7=2F(/); Paged(r):FFFFFFF7=2F(/)
00:00:56:78.03300: F000:FFF0 (EA450000F0)JMP F000:0045
00:00:56:78.03376: Physical(r):000F0045=FA(ú); Paged(r):000F0045=FA(ú); Physical(r):000F0046=B4(´); Paged(r):000F0046=B4(´); Physical(r):000F0047=01(); Paged(r):000F0047=01(); Physical(r):000F0048=9E(ž); Paged(r):000F0048=9E(ž)
00:00:56:78.03448: Physical(r):000F0049=73(s); Paged(r):000F0049=73(s); Physical(r):000F004A=24($); Paged(r):000F004A=24($); Physical(r):000F004B=72(r); Paged(r):000F004B=72(r); Physical(r):000F004C=23(#); Paged(r):000F004C=23(#)
00:00:56:78.03520: Physical(r):000F004D=F4(ô); Paged(r):000F004D=F4(ô); Physical(r):000F004E=B4(´); Paged(r):000F004E=B4(´); Physical(r):000F004F=40(@); Paged(r):000F004F=40(@); Physical(r):000F0050=9E(ž); Paged(r):000F0050=9E(ž)
00:00:56:78.03856: Physical(r):000F0051=75(u); Paged(r):000F0051=75(u); Physical(r):000F0052=1C(); Paged(r):000F0052=1C(); Physical(r):000F0053=74(t); Paged(r):000F0053=74(t); Physical(r):000F0054=1D(); Paged(r):000F0054=1D()
00:00:56:78.03916: F000:0045 (FA)CLI
00:00:56:78.03972: F000:0046 (B401)MOVB AH, 01
00:00:56:78.04060: F000:0048 (9E)SAHF Physical(r):000F0055=F4(ô); Paged(r):000F0055=F4(ô); Physical(r):000F0056=B4(´); Paged(r):000F0056=B4(´); Physical(r):000F0057=04(); Paged(r):000F0057=04(); Physical(r):000F0058=9E(ž); Paged(r):000F0058=9E(ž)
00:00:56:78.04092: F000:0049 (7324)JAE 006F
00:00:56:78.04184: F000:004B (7223)JB 0070 Physical(r):000F0070=72(r); Paged(r):000F0070=72(r); Physical(r):000F0071=DC(Ü); Paged(r):000F0071=DC(Ü); Physical(r):000F0072=74(t); Paged(r):000F0072=74(t); Physical(r):000F0073=E2(â); Paged(r):000F0073=E2(â)
00:00:56:78.04256: Physical(r):000F0074=7A(z); Paged(r):000F0074=7A(z); Physical(r):000F0075=E8(è); Paged(r):000F0075=E8(è); Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î)
00:00:56:78.04332: Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó); Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô)
00:00:56:78.04404: Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸); Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( )
00:00:56:78.04492: F000:0070 (72DC)JB 004E Physical(r):000F004E=B4(´); Paged(r):000F004E=B4(´); Physical(r):000F004F=40(@); Paged(r):000F004F=40(@); Physical(r):000F0050=9E(ž); Paged(r):000F0050=9E(ž); Physical(r):000F0051=75(u); Paged(r):000F0051=75(u)
00:00:56:78.04572: Physical(r):000F0052=1C(); Paged(r):000F0052=1C(); Physical(r):000F0053=74(t); Paged(r):000F0053=74(t); Physical(r):000F0054=1D(); Paged(r):000F0054=1D(); Physical(r):000F0055=F4(ô); Paged(r):000F0055=F4(ô)
00:00:56:78.04648: Physical(r):000F0056=B4(´); Paged(r):000F0056=B4(´); Physical(r):000F0057=04(); Paged(r):000F0057=04(); Physical(r):000F0058=9E(ž); Paged(r):000F0058=9E(ž); Physical(r):000F0059=7B({); Paged(r):000F0059=7B({)
00:00:56:78.04784: Physical(r):000F005A=14(); Paged(r):000F005A=14(); Physical(r):000F005B=7A(z); Paged(r):000F005B=7A(z); Physical(r):000F005C=17(); Paged(r):000F005C=17(); Physical(r):000F005D=F4(ô); Paged(r):000F005D=F4(ô)
00:00:56:78.04824: F000:004E (B440)MOVB AH, 40
00:00:56:78.04852: F000:0050 (9E)SAHF
00:00:56:78.04948: F000:0051 (751C)JNE 006F Physical(r):000F005E=B4(´); Paged(r):000F005E=B4(´); Physical(r):000F005F=80(€); Paged(r):000F005F=80(€); Physical(r):000F0060=9E(ž); Paged(r):000F0060=9E(ž); Physical(r):000F0061=79(y); Paged(r):000F0061=79(y)
00:00:56:78.05056: F000:0053 (741D)JE 0072 Physical(r):000F0072=74(t); Paged(r):000F0072=74(t); Physical(r):000F0073=E2(â); Paged(r):000F0073=E2(â); Physical(r):000F0074=7A(z); Paged(r):000F0074=7A(z); Physical(r):000F0075=E8(è); Paged(r):000F0075=E8(è)
00:00:56:78.05128: Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î); Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó)
00:00:56:78.05200: Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô); Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸)
00:00:56:78.05272: Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( ); Physical(r):000F0080=9E(ž); Paged(r):000F0080=9E(ž); Physical(r):000F0081=73(s); Paged(r):000F0081=73(s)
00:00:56:78.05360: F000:0072 (74E2)JE 0056 Physical(r):000F0056=B4(´); Paged(r):000F0056=B4(´); Physical(r):000F0057=04(); Paged(r):000F0057=04(); Physical(r):000F0058=9E(ž); Paged(r):000F0058=9E(ž); Physical(r):000F0059=7B({); Paged(r):000F0059=7B({)
00:00:56:78.05428: Physical(r):000F005A=14(); Paged(r):000F005A=14(); Physical(r):000F005B=7A(z); Paged(r):000F005B=7A(z); Physical(r):000F005C=17(); Paged(r):000F005C=17(); Physical(r):000F005D=F4(ô); Paged(r):000F005D=F4(ô)
00:00:56:78.05500: Physical(r):000F005E=B4(´); Paged(r):000F005E=B4(´); Physical(r):000F005F=80(€); Paged(r):000F005F=80(€); Physical(r):000F0060=9E(ž); Paged(r):000F0060=9E(ž); Physical(r):000F0061=79(y); Paged(r):000F0061=79(y)
00:00:56:78.05760: Physical(r):000F0062=0C(); Paged(r):000F0062=0C(); Physical(r):000F0063=78(x); Paged(r):000F0063=78(x); Physical(r):000F0064=11(); Paged(r):000F0064=11(); Physical(r):000F0065=F4(ô); Paged(r):000F0065=F4(ô)
00:00:56:78.05796: F000:0056 (B404)MOVB AH, 04
00:00:56:78.05824: F000:0058 (9E)SAHF
00:00:56:78.05920: F000:0059 (7B14)JNP 006F Physical(r):000F0066=B4(´); Paged(r):000F0066=B4(´); Physical(r):000F0067=41(A); Paged(r):000F0067=41(A); Physical(r):000F0068=9E(ž); Paged(r):000F0068=9E(ž); Physical(r):000F0069=77(w); Paged(r):000F0069=77(w)
00:00:56:78.06012: F000:005B (7A17)JP 0074 Physical(r):000F0074=7A(z); Paged(r):000F0074=7A(z); Physical(r):000F0075=E8(è); Paged(r):000F0075=E8(è); Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î)
00:00:56:78.06192: Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó); Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô)
00:00:56:78.06272: Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸); Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( )
00:00:56:78.06348: Physical(r):000F0080=9E(ž); Paged(r):000F0080=9E(ž); Physical(r):000F0081=73(s); Paged(r):000F0081=73(s); Physical(r):000F0082=1B(); Paged(r):000F0082=1B(); Physical(r):000F0083=F4(ô); Paged(r):000F0083=F4(ô)
00:00:56:78.06440: F000:0074 (7AE8)JP 005E Physical(r):000F005E=B4(´); Paged(r):000F005E=B4(´); Physical(r):000F005F=80(€); Paged(r):000F005F=80(€); Physical(r):000F0060=9E(ž); Paged(r):000F0060=9E(ž); Physical(r):000F0061=79(y); Paged(r):000F0061=79(y)
00:00:56:78.06508: Physical(r):000F0062=0C(); Paged(r):000F0062=0C(); Physical(r):000F0063=78(x); Paged(r):000F0063=78(x); Physical(r):000F0064=11(); Paged(r):000F0064=11(); Physical(r):000F0065=F4(ô); Paged(r):000F0065=F4(ô)
00:00:56:78.06584: Physical(r):000F0066=B4(´); Paged(r):000F0066=B4(´); Physical(r):000F0067=41(A); Paged(r):000F0067=41(A); Physical(r):000F0068=9E(ž); Paged(r):000F0068=9E(ž); Physical(r):000F0069=77(w); Paged(r):000F0069=77(w)
00:00:56:78.06660: Physical(r):000F006A=04(); Paged(r):000F006A=04(); Physical(r):000F006B=76(v); Paged(r):000F006B=76(v); Physical(r):000F006C=0B(); Paged(r):000F006C=0B(); Physical(r):000F006D=EB(ë); Paged(r):000F006D=EB(ë)
00:00:56:78.06684: F000:005E (B480)MOVB AH, 80
00:00:56:78.06700: F000:0060 (9E)SAHF
00:00:56:78.06800: F000:0061 (790C)JNS 006F Physical(r):000F006E=0B(); Paged(r):000F006E=0B(); Physical(r):000F006F=F4(ô); Paged(r):000F006F=F4(ô); Physical(r):000F0070=72(r); Paged(r):000F0070=72(r); Physical(r):000F0071=DC(Ü); Paged(r):000F0071=DC(Ü)
00:00:56:78.06888: F000:0063 (7811)JS 0076 Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î); Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó)
00:00:56:78.06964: Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô); Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸)
00:00:56:78.07032: Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( ); Physical(r):000F0080=9E(ž); Paged(r):000F0080=9E(ž); Physical(r):000F0081=73(s); Paged(r):000F0081=73(s)
00:00:56:78.09584: Physical(r):000F0082=1B(); Paged(r):000F0082=1B(); Physical(r):000F0083=F4(ô); Paged(r):000F0083=F4(ô); Physical(r):000F0084=B4(´); Paged(r):000F0084=B4(´); Physical(r):000F0085=95(•); Paged(r):000F0085=95(•)
00:00:56:78.09784: F000:0076 (78EE)JS 0066 Physical(r):000F0066=B4(´); Paged(r):000F0066=B4(´); Physical(r):000F0067=41(A); Paged(r):000F0067=41(A); Physical(r):000F0068=9E(ž); Paged(r):000F0068=9E(ž); Physical(r):000F0069=77(w); Paged(r):000F0069=77(w)
00:00:56:79.00220: Physical(r):000F006A=04(); Paged(r):000F006A=04(); Physical(r):000F006B=76(v); Paged(r):000F006B=76(v); Physical(r):000F006C=0B(); Paged(r):000F006C=0B(); Physical(r):000F006D=EB(ë); Paged(r):000F006D=EB(ë)
00:00:56:79.00308: Physical(r):000F006E=0B(); Paged(r):000F006E=0B(); Physical(r):000F006F=F4(ô); Paged(r):000F006F=F4(ô); Physical(r):000F0070=72(r); Paged(r):000F0070=72(r); Physical(r):000F0071=DC(Ü); Paged(r):000F0071=DC(Ü)
00:00:56:79.00376: Physical(r):000F0072=74(t); Paged(r):000F0072=74(t); Physical(r):000F0073=E2(â); Paged(r):000F0073=E2(â); Physical(r):000F0074=7A(z); Paged(r):000F0074=7A(z); Physical(r):000F0075=E8(è); Paged(r):000F0075=E8(è)
00:00:56:79.00404: F000:0066 (B441)MOVB AH, 41
00:00:56:79.00420: F000:0068 (9E)SAHF
00:00:56:79.00508: F000:0069 (7704)JA 006F Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î); Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó)
00:00:56:79.00636: F000:006B (760B)JBE 0078 Physical(r):000F0078=76(v); Paged(r):000F0078=76(v); Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó); Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô)
00:00:56:79.00712: Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸); Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( )
00:00:56:79.00784: Physical(r):000F0080=9E(ž); Paged(r):000F0080=9E(ž); Physical(r):000F0081=73(s); Paged(r):000F0081=73(s); Physical(r):000F0082=1B(); Paged(r):000F0082=1B(); Physical(r):000F0083=F4(ô); Paged(r):000F0083=F4(ô)
00:00:56:79.00856: Physical(r):000F0084=B4(´); Paged(r):000F0084=B4(´); Physical(r):000F0085=95(•); Paged(r):000F0085=95(•); Physical(r):000F0086=9E(ž); Paged(r):000F0086=9E(ž); Physical(r):000F0087=75(u); Paged(r):000F0087=75(u)
00:00:56:79.00944: F000:0078 (76F3)JBE 006D Physical(r):000F006D=EB(ë); Paged(r):000F006D=EB(ë); Physical(r):000F006E=0B(); Paged(r):000F006E=0B(); Physical(r):000F006F=F4(ô); Paged(r):000F006F=F4(ô); Physical(r):000F0070=72(r); Paged(r):000F0070=72(r)
Show last 8 lines
00:00:56:79.01284: 		Physical(r):000F0071=DC(Ü); Paged(r):000F0071=DC(Ü); Physical(r):000F0072=74(t); Paged(r):000F0072=74(t); Physical(r):000F0073=E2(â); Paged(r):000F0073=E2(â); Physical(r):000F0074=7A(z); Paged(r):000F0074=7A(z)
00:00:56:79.01360: Physical(r):000F0075=E8(è); Paged(r):000F0075=E8(è); Physical(r):000F0076=78(x); Paged(r):000F0076=78(x); Physical(r):000F0077=EE(î); Paged(r):000F0077=EE(î); Physical(r):000F0078=76(v); Paged(r):000F0078=76(v)
00:00:56:79.01432: Physical(r):000F0079=F3(ó); Paged(r):000F0079=F3(ó); Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô); Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž)
00:00:56:79.01520: F000:006D (EB0B)JMP 007A Physical(r):000F007A=B4(´); Paged(r):000F007A=B4(´); Physical(r):000F007B=D4(Ô); Paged(r):000F007B=D4(Ô); Physical(r):000F007C=9E(ž); Paged(r):000F007C=9E(ž); Physical(r):000F007D=B8(¸); Paged(r):000F007D=B8(¸)
00:00:56:79.01596: Physical(r):000F007E=00( ); Paged(r):000F007E=00( ); Physical(r):000F007F=00( ); Paged(r):000F007F=00( ); Physical(r):000F0080=9E(ž); Paged(r):000F0080=9E(ž); Physical(r):000F0081=73(s); Paged(r):000F0081=73(s)
00:00:56:79.01668: Physical(r):000F0082=1B(); Paged(r):000F0082=1B(); Physical(r):000F0083=F4(ô); Paged(r):000F0083=F4(ô); Physical(r):000F0084=B4(´); Paged(r):000F0084=B4(´); Physical(r):000F0085=95(•); Paged(r):000F0085=95(•)
00:00:56:79.01740: Physical(r):000F0086=9E(ž); Paged(r):000F0086=9E(ž); Physical(r):000F0087=75(u); Paged(r):000F0087=75(u); Physical(r):000F0088=17(); Paged(r):000F0088=17(); Physical(r):000F0089=F4(ô); Paged(r):000F0089=F4(ô)

That is already in a parsable format, seperated by tabs. Although it still contains timestamps of execution currently(which I might need to be able to disable when logging this for compatibility with other logs(it's enabled by the logging function by default).

Also, one other thing: UniPCemu uses B/W/D suffixes on instructions that have different operand sizes(Byte/Word/Dword). Although this usually isn't included in most emulators, it does make it clear what kind of operation is performed(and helps detect faults in operand and memory access sizes). For example NEGW [CS:0000] vs NEGD [CS:0000], which isn't usually visible with simple disassembly(except when adding different keywords, which increases the log size by a lot(e.g. "dword ptr" "word ptr" "byte ptr"), as well as complicating the whole ModR/M decoding process(that would need different decoders for all operand sizes instead of a single one for 16-bit and a single one for 32-bit offsets)).

The register log currently used is dependant on the logging mode, as well as the current CPU mode(real vs protected vs virtual 8086) and the emulated CPU(removing some registers from the log or 16-bit registers only):

void debugger_logregisters(char *filename, CPU_registers *registers, byte halted, byte isreset)
{
if ((DEBUGGER_LOG==DEBUGGERLOG_ALWAYS_NOREGISTERS) || (DEBUGGER_LOG==DEBUGGERLOG_ALWAYS_SINGLELINE) || (DEBUGGER_LOG==DEBUGGERLOG_DEBUGGING_SINGLELINE) || (DEBUGGER_LOG==DEBUGGERLOG_ALWAYS_SINGLELINE_SIMPLIFIED) || (DEBUGGER_LOG==DEBUGGERLOG_DEBUGGING_SINGLELINE_SIMPLIFIED)) return; //Don't log the register state?
if (!registers || !filename) //Invalid?
{
dolog(filename,"Log registers called with invalid argument!");
return; //Abort!
}
if (EMULATED_CPU<=CPU_80286) //Emulating 80(1)86 registers?
{
#ifndef LOGFLAGSONLY
dolog(filename,"Registers:"); //Start of the registers!
dolog(filename,"AX: %04X, BX: %04X, CX: %04X, DX: %04X",registers->AX,registers->BX,registers->CX,registers->DX); //Basic registers!
if (registers->CR0&1) //Protected mode?
{
dolog(filename,"CS: %04X, DS: %04X, ES: %04X, SS: %04X, TR: %04X, LDTR:%04X",registers->CS,registers->DS,registers->ES,registers->SS,registers->TR,registers->LDTR); //Segment registers!
}
else //Real mode?
{
dolog(filename,"CS: %04X, DS: %04X, ES: %04X, SS: %04X",registers->CS,registers->DS,registers->ES,registers->SS); //Segment registers!
}
dolog(filename,"SP: %04X, BP: %04X, SI: %04X, DI: %04X",registers->SP,registers->BP,registers->SI,registers->DI); //Segment registers!
dolog(filename,"IP: %04X, FLAGS: %04X",registers->IP,registers->FLAGS); //Rest!
if (EMULATED_CPU==CPU_80286) //80286 has CR0 as well?
{
dolog(filename, "CR0: %04X", (registers->CR0&0xFFFF)); //Rest!
dolog(filename,"GDTR: " LONGLONGSPRINTX ", IDTR: " LONGLONGSPRINTX,(LONG64SPRINTF)registers->GDTR.data,(LONG64SPRINTF)registers->IDTR.data); //GDTR/IDTR!
}
#endif
dolog(filename,"FLAGSINFO:%s%c",debugger_generateFlags(registers),decodeHLTreset(halted,isreset)); //Log the flags!
//More aren't implemented in the 80(1/2)86!
}
else //80386+? 32-bit registers!
{
dolog(filename,"Registers:"); //Start of the registers!
#ifndef LOGFLAGSONLY
dolog(filename,"EAX: %08x, EBX: %08x, ECX: %08x, EDX: %08x",registers->EAX,registers->EBX,registers->ECX,registers->EDX); //Basic registers!

if (registers->CR0&1) //Protected mode?
{
dolog(filename,"CS: %04X, DS: %04X, ES: %04X, FS: %04X, GS: %04X SS: %04X, TR: %04X, LDTR:%04X",registers->CS,registers->DS,registers->ES,registers->FS,registers->GS,registers->SS,registers->TR,registers->LDTR); //Segment registers!
}
else //Real mode?
{
dolog(filename,"CS: %04X, DS: %04X, ES: %04X, FS: %04X, GS: %04X SS: %04X",registers->CS,registers->DS,registers->ES,registers->FS,registers->GS,registers->SS); //Segment registers!
}

dolog(filename,"ESP: %08x, EBP: %08x, ESI: %08x, EDI: %08x",registers->ESP,registers->EBP,registers->ESI,registers->EDI); //Segment registers!
dolog(filename,"EIP: %08x, EFLAGS: %08x",registers->EIP,registers->EFLAGS); //Rest!

dolog(filename, "CR0: %08X; CR1: %08X; CR2: %08X; CR3: %08X", registers->CR0, registers->CR1, registers->CR2, registers->CR3); //Rest!
dolog(filename, "CR4: %08X; CR5: %08X; CR6: %08X; CR7: %08X", registers->CR4, registers->CR5, registers->CR6, registers->CR7); //Rest!

dolog(filename, "DR0: %08X; DR1: %08X; DR2: %08X; CR3: %08X", registers->DR0, registers->DR1, registers->DR2, registers->DR3); //Rest!
if (EMULATED_CPU>=CPU_PENTIUM) //Pentium has DR4?
{
dolog(filename, "DR4: %08X; DR6: %08X; DR5&7: %08X", registers->DR4, registers->DR6, registers->DR7); //Rest!
}
else //DR4=>DR6
{
Show last 9 lines
			dolog(filename, "DR6: %08X; DR5&7: %08X", registers->DR6, registers->DR7); //Rest!
}

dolog(filename,"GDTR: " LONGLONGSPRINTX ", IDTR: " LONGLONGSPRINTX,(LONG64SPRINTF)registers->GDTR.data,(LONG64SPRINTF)registers->IDTR.data); //GDTR/IDTR!
#endif
//Finally, flags seperated!
dolog(filename,"FLAGSINFO:%s%c",debugger_generateFlags(registers),(char)(halted?'H':' ')); //Log the flags!
}
}

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 12 of 90, by hottobar

User metadata
Rank Newbie
Rank
Newbie
vladstamate wrote:
Question though. You seem to have a mix of single spaces and double spaces. Can we make it all single space? It makes much easie […]
Show full quote

Question though. You seem to have a mix of single spaces and double spaces. Can we make it all single space? It makes much easier to parse:

0028:800073B5 8D 14 9A lea edx,[edx+ebx*4]

Sorry, I mangled spacing when pasting the log entry.
Actually my current format is the following (in sprintf format):

%04X:%08X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %s

Translated in english: <segment:offset> <instruction hexes as "1-byte space" repeated 15 times> <disassebly>

An instruction byte is replaced by 2 spaces if the byte is not defined, so I always have a fixed width column of 45 characters.

Example:

0028:80282D6C 80 7A FF 00                                  cmp byte [edx-0001],00

The disassembler is the DosBox's one.

Reply 13 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

@superfury: I agree, there are a lot of useful information to help with debugging, but we should tackle things one at a time. For now lets worry about instruction format. Then we can agree on things like register dump, memory accesses, etc.

@hottobar: I like that format, except for the empty spaces. When we are dumping millions of instructions that can easily double the log size for no actual useful reason. We should just print the instruction bytes followed by space then instruction decoding. Compact is important.

Also it seems we are going with no caps. Me and Superfury will have to change our dissasemblers since we both output caps. Since Dosbox and IBMulator already do no caps and it will be harder to get someone to change Dosbox for this.

I am willing to make necessary changes to my emulator because common log format will help tremendously with debugging. If we manage to agree we'll have 4 emulators with same format and we might be able to change bochs too.

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 14 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

Here is the result of changing CAPE to match the new log format. There are still some issues. This is the log from running test386.asm (first few instructions):

f000:00000045 fa cli 
f000:00000046 b4 01 mov ah,0x1
f000:00000048 9e sahf
f000:00000049 73 24 jae 0x24
f000:0000004e b4 40 mov ah,0x40
f000:00000050 9e sahf
f000:00000051 75 1c jnz 0x1c
f000:00000056 b4 04 mov ah,0x4
f000:00000058 9e sahf
f000:00000059 7b 14 jpo 0x14
f000:0000005e b4 80 mov ah,0x80
f000:00000060 9e sahf
f000:00000061 79 0c jns 0xc

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 15 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

@vladstamate: wondering about something, though: shouldn't the (conditional) jumps display the destination ip address, like UniPCemu does? Displaying the raw unsigned byte/word value won't give any clarity? When displaying the IP+disp8 result instead(like UniPCemu does) works both as normal disassemblers do with their labeling(like 0xfff0: jmp ... and ... jmp 0xfff0 for the jump to that offset)? That way you easily see the jump being (not) taken, as well as that the address is (in)correct? Although UniPCemu always displays in hex format, making 0x or h pre/suffix unneeded, like the opcode bytes?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 16 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

Good point. Yes I think you are right that displaying the final jmp would be more useful. Should that be a 32bit EIP style address? Like this:

f000:00000051 75 1c jnz 0x6d

As for you second point, ok, lets dump the 0x in front, I agree it is superfluous. Making it like this:

f000:00000051 75 1c jnz 0000006d

Does that look better?

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 17 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

yes. And should the size of 16-bits use four or eight hex digits? Like when it's in real/virtual 8086 or 16-bit protected mode? Although eight would make it use a lot of size, but incompatible with huge unreal mode(32-bit EIP used in unreal mode)? The same applies to the execution address?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 18 of 90, by vladstamate

User metadata
Rank Oldbie
Rank
Oldbie

I think we should keep 8 digits for anything related to EIP, like execution address or jump/loop/call destination. You are right that it uses more space but makes it for easy searching. Since addresses (well offsets) are always 8 digits we can search for them in log files much easier.

Also for simplicity, so we do not have different formats when you need EIP as opposed to IP.

We pad with 0 all hex numbers.

Segment: 4 digits (f000)
EIP: 8 digts (00000045)
opcode bytes: 2 digits each (1f 05)

YouTube channel: https://www.youtube.com/channel/UC7HbC_nq8t1S9l7qGYL0mTA
Collection: http://www.digiloguemuseum.com/index.html
Emulator: https://sites.google.com/site/capex86/
Raytracer: https://sites.google.com/site/opaqueraytracer/

Reply 19 of 90, by superfury

User metadata
Rank l33t++
Rank
l33t++

Is IP even used on the CPU instruction fetching with 32-bit CPUs? Is it dependant on the operand size of the CS descriptor, or never used? So e.g. loading low word without loading high word as well?

Last edited by superfury on 2017-10-27, 13:27. Edited 3 times in total.

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io