VOGONS


First post, by foul_owl

Rank Newbie

I know that DOSBox running in debug mode can log executed instructions to a file, but I was wondering whether it could dump the entire EXE loaded in memory to an assembly file.

I know that DOS programs sometimes compress themselves, so I think something like this would be useful.

By tracing execution, it seems like DOSBox could be used to separate code from data.

I realize this is probably beyond the scope of the DOSBox project, but DOSBox seems ideally suited to the task.

Any information would be extremely helpful, thanks!

(running Ubuntu)

Reply 1 of 3, by ripsaw8080

Rank DOSBox Author

Compressed DOS executables can often be unpacked with UNP or similar tools; however, programs that are encrypted or use unusual compression methods are another matter.

Decompiling is no easy thing, but there are specialized programs for it, such as Sourcer for DOS.

A tracing disassembly to separate code and data is an interesting concept, but seems tricky to do. It would have to follow all code branches, even those that are rarely executed. That seems straightforward when you have a call, jump, or conditional jump; but with indexed calls and jumps, where the index is computed... that would be a trick. Example: you arrive at a CALL BX instruction, what are valid/possible values for BX?
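To make the CALL BX problem concrete, here is a minimal sketch of a recursive-descent (branch-following) disassembler in Python. It runs over a hypothetical toy instruction table rather than real x86 encodings, and the names PROGRAM and recursive_descent are made up for illustration. Direct jumps and calls are easy to follow, but the indirect call can only be recorded as unresolved, so a subroutine reachable only through BX is never found.

```python
from collections import deque

# Hypothetical toy instruction table (not real x86 encodings):
# address -> (mnemonic, static target or None, length in bytes)
PROGRAM = {
    0x100: ("MOV", None, 3),
    0x103: ("JZ", 0x10A, 2),       # conditional jump: both paths possible
    0x105: ("CALL", 0x110, 3),     # direct call: target known statically
    0x108: ("JMP", 0x10D, 2),      # unconditional jump: no fall-through
    0x10A: ("CALL_BX", None, 2),   # indirect call: target computed at run time
    0x10C: ("RET", None, 1),
    0x10D: ("RET", None, 1),
    0x110: ("RET", None, 1),
    0x114: ("MOV", None, 3),       # subroutine only ever reached via BX
    0x117: ("RET", None, 1),
}


def recursive_descent(program, entry):
    """Follow every statically known path from entry.

    Returns (reached, unresolved): the set of instruction addresses visited
    and the list of indirect branches whose targets could not be determined.
    """
    reached, unresolved = set(), []
    work = deque([entry])
    while work:
        addr = work.popleft()
        if addr in reached or addr not in program:
            continue
        reached.add(addr)
        op, target, length = program[addr]
        fallthrough = addr + length
        if op == "JMP":
            work.append(target)
        elif op in ("JZ", "CALL"):
            work.append(target)        # taken branch / callee
            work.append(fallthrough)   # not-taken branch / return point
        elif op == "CALL_BX":
            unresolved.append(addr)    # what values can BX hold here?
            work.append(fallthrough)   # assume the callee eventually returns
        elif op != "RET":
            work.append(fallthrough)   # ordinary instruction
    return reached, unresolved


if __name__ == "__main__":
    reached, unresolved = recursive_descent(PROGRAM, 0x100)
    missed = sorted(set(PROGRAM) - reached)
    print("unresolved:", [hex(a) for a in unresolved])
    print("never reached:", [hex(a) for a in missed])
```

Starting from 0x100, the walk visits eight instructions and reports 0x10A as unresolved; the routine at 0x114 stays undiscovered because nothing but the BX call leads there.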

Reply 2 of 3, by ripa

Rank Oldbie

I've been thinking of the exact same feature. My current process (when debugging the 3 Stooges sound, for example) has been to set a breakpoint in DOSBox at some interesting location, dump the code and data segments to BIN files once the breakpoint is hit, and then load the BINs into IDA Free, which does a pretty good automatic analysis.

ripsaw8080 wrote: "A tracing disassembly to separate code and data is an interesting concept, but seems tricky to do. It would have to follow all code branches, even those that are rarely executed. That seems straightforward when you have a call, jump, or conditional jump; but with indexed calls and jumps, where the index is computed... that would be a trick. Example: you arrive at a CALL BX instruction, what are valid/possible values for BX?"

I don't think that would be a problem with this idea, since DOSBox knows exactly which memory locations it executes as code and which it reads or writes as data. You could separate code and data just by playing the game. Of course, not all code branches would be taken and not all data would be used, but "unvisited" locations could be categorized heuristically as code or data based on, say, their proximity to already recognized code or data.

For example, the memory location of that CALL BX instruction would be classified as code, and the next instruction (wherever BX sends the instruction pointer) would again be classified as code when DOSBox executes it.
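The classification described above could be sketched like this, assuming a hypothetical access log of (kind, address, size) records in which the emulator distinguishes instruction execution from data reads and writes. DOSBox does not produce such a log as-is; the log format, the classify function and its radius parameter are all illustrative. Touched bytes get a definite label, and untouched bytes borrow the label of the nearest touched code or data byte, marked with "?" as a guess.

```python
CODE, DATA = 1, 2  # per-byte flags: executed / read or written

LABELS = {CODE: "code", DATA: "data", CODE | DATA: "mixed"}


def classify(image_size, trace, radius=16):
    """Label each byte of a memory image from a hypothetical access trace.

    trace: iterable of (kind, address, size), kind in {'exec', 'read', 'write'}.
    Bytes seen in the trace get 'code', 'data' or 'mixed'. Untouched bytes
    take the label of the nearest 'code' or 'data' byte within radius,
    suffixed with '?'; if none is found, or code and data are equally close,
    they stay 'unknown'.
    """
    flags = [0] * image_size
    for kind, addr, size in trace:
        bit = CODE if kind == "exec" else DATA
        for a in range(addr, min(addr + size, image_size)):
            flags[a] |= bit

    observed = [LABELS.get(f) for f in flags]  # None = never touched
    result = list(observed)
    for i, label in enumerate(observed):
        if label is not None:
            continue
        guess = "unknown"
        for d in range(1, radius + 1):
            near = {observed[j] for j in (i - d, i + d)
                    if 0 <= j < image_size and observed[j] in ("code", "data")}
            if len(near) == 1:
                guess = near.pop() + "?"
                break
            if len(near) == 2:
                break  # equidistant from code and data: no guess
        result[i] = guess
    return result
```

For example, with a trace that executes bytes 0 to 4 and reads or writes bytes 8 to 10 of a 12-byte image, bytes 5, 7 and 11 come out as "code?", "data?" and "data?", while byte 6, equally close to both, stays "unknown". A byte that was both executed and written shows up as "mixed", which is exactly the kind of case the proximity heuristic cannot settle.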

Reply 3 of 3, by ripsaw8080

Rank DOSBox Author

I think it's true that tracing would handle a computed jump or call better than other types of disassemblers (at least the ones I'm familiar with), but you're still only following the execution paths triggered by what you do or don't do in the program. There are also issues of overlays, self-modifying code, and probably many others that don't come to mind immediately. I've debugged games that write temporary code byte-by-byte onto the stack and then execute it (not wonderful for the recompiling core, I'm sure). It's not too difficult for a person to figure out these things and then deal with them, but implementing such intelligence in a program is not trivial.
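Self-modifying code and code built on the stack can at least be detected in a trace, even if handling it automatically is much harder. This minimal sketch, again assuming a hypothetical ordered (kind, address, size) access log rather than any existing DOSBox output, flags every execution that touches bytes the program itself wrote earlier. As noted above, it can only report what actually happened during the recorded run.

```python
def find_written_then_executed(trace):
    """Return (exec_address, modified_bytes) for each execution in an ordered
    trace that runs bytes previously written by the program itself."""
    written = set()
    flagged = []
    for kind, addr, size in trace:
        span = range(addr, addr + size)
        if kind == "write":
            written.update(span)
        elif kind == "exec":
            modified = [a for a in span if a in written]
            if modified:
                flagged.append((addr, modified))
        # 'read' records do not affect the result
    return flagged


if __name__ == "__main__":
    # Hypothetical run: normal code at 0x100, then two bytes are written
    # one at a time near the top of the stack and executed.
    trace = [
        ("exec", 0x100, 3),
        ("write", 0xFFF0, 1),
        ("write", 0xFFF1, 1),
        ("exec", 0xFFF0, 2),
        ("exec", 0x103, 2),
    ]
    for addr, modified in find_written_then_executed(trace):
        print(hex(addr), [hex(a) for a in modified])
```

Here only the execution at 0xFFF0 is flagged, with both of its freshly written bytes listed; a static dump taken at a different moment could contain entirely different bytes at that address.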