squelch41 wrote on 2024-11-24, 10:09:
I'd love to understand how you figured out how to do this (in simple terms, I'm a doctor and have no engineering training!)
How did you know where the bios was looking for the option rom
I will try my best to give a simple description of the process, but if something isn't clear, feel free to ask questions.
I read the code with the help of Hiew, an interactive hex editor and disassembler for viewing and patching binary files.
I also used a little bit of ndisasm for the initial search, because it produces plain-text output that can easily be processed by other tools, such as the Unix utilities grep and sort.
As you probably already know, option ROMs are code modules that act like BIOS add-ons, usually extending the BIOS with some kind of hardware support functions. Each option ROM starts with a 3-byte header: 0x55, 0xAA and the size in 512-byte units (the size includes the header). If you add up every byte of the option ROM, the sum should be divisible by 256 (equivalently, it should add up to 0 in an 8-bit register); that is the checksum algorithm. Most of the time, free space is filled with 0xFF and the last byte is used to make the checksum correct. The main BIOS scans the memory address space for option ROMs and does a far call to each of them at offset 3, just after the header, then expects the option ROM to do a far return back to the main BIOS.
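The header and checksum rules above fit in a few lines of Python. This is a minimal sketch for illustration, not code taken from any BIOS; the function names parse_option_rom and fix_checksum are hypothetical. fix_checksum mimics the usual trick of using the last byte to make the sum come out right.

```python
from typing import Optional


def parse_option_rom(image: bytes, offset: int = 0) -> Optional[int]:
    """Return the ROM size in bytes if a valid option ROM starts at `offset`."""
    if image[offset:offset + 2] != b"\x55\xAA":
        return None                      # no 55 AA signature
    size = image[offset + 2] * 512       # third byte: size in 512-byte units
    if size == 0 or offset + size > len(image):
        return None                      # empty or truncated image
    if sum(image[offset:offset + size]) % 256 != 0:
        return None                      # all bytes must add up to 0 modulo 256
    return size


def fix_checksum(rom: bytearray) -> None:
    """Set the last byte so that all bytes of `rom` add up to 0 modulo 256."""
    rom[-1] = 0
    rom[-1] = -sum(rom) % 256


# A 1 KB ROM (header 55 AA 02) padded with 0xFF becomes valid after fixing:
rom = bytearray(b"\x55\xAA\x02" + b"\xFF" * (1024 - 3))
fix_checksum(rom)
print(parse_option_rom(bytes(rom)))  # 1024
```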
Here are some screenshots that I have edited slightly to make them more descriptive.
First I opened the whole file in Hiew, switched to disassembly mode (F4, Decode; or just press Enter twice), then set the base offset (Ctrl+F5, "E0000", Enter) so that the last byte would have the address FFFFF, as it does on the hardware when the ROM chip is mapped to the bus (on 386+ it is actually FFFFFFFF, but that doesn't matter now).
BIOS ROM code runs in 16-bit real mode. That means the memory location the CPU is currently executing instructions from is specified by a pair of registers: CS (code segment) and IP (instruction pointer). Each of those registers is 16 bits wide. The CPU combines them (CS*16+IP), resulting in a 20-bit physical address. So by setting the base offset, we are looking at the same addresses the CPU would use.
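The address arithmetic can be written out as a small sketch. linear_address is the CS*16+IP combination described above, and file_offset undoes Hiew's base offset, assuming (as here) a 128 KB image whose last byte sits at FFFFF. Both helper names are made up for this example, and the sketch does not model the 8086 wrap-around at 1 MB.

```python
IMAGE_SIZE = 0x20000                 # assumption: a 128 KB BIOS image
BASE = 0x100000 - IMAGE_SIZE         # last byte lands at 0xFFFFF, so base is 0xE0000


def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset (no 1 MB wrap-around modelled)."""
    return (segment << 4) + offset


def file_offset(address: int, base: int = BASE) -> int:
    """Position of `address` inside the ROM image file."""
    return address - base


# Different segment:offset pairs can name the same byte:
print(hex(linear_address(0xE000, 0xE000)))  # 0xee000
print(hex(linear_address(0xEE00, 0x0000)))  # 0xee000
print(hex(file_offset(0xF09DA)))            # 0x109da
```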
You can see that a single hex byte is highlighted - it is the cursor, and the position of the cursor can be seen in the top (status) line: 000F09DA.
Hiew's disassembly mode interprets the bytes as a stream of CPU instructions. Each line represents a single instruction: the left column shows the offset and hex bytes, the right column shows the x86 assembly language representation.
If you already know the address you want to examine (for example, if you just want to follow these screenshots), you can press F5, type the address in hex and press Enter. Hiew always treats every byte as code, even if it is not, and sometimes gets confused: x86 instructions have variable length, so the disassembler has to know where an instruction starts to stay synchronized with the instruction stream. If synchronization is lost, you can force it by selecting the first byte of an instruction with the cursor and pressing '/'; Hiew will restart the disassembly from that specific byte offset.
Some instructions reference other locations (jumps and calls); for those, Hiew prints a gray digit. A digit to the left of an instruction is a label (xref-from), a digit to the right is a pointer (xref-to) with an arrow showing the relative direction of the target address. If you press the corresponding digit on the keyboard, Hiew jumps to the target.
This is the ROM scan loop. It can be found by searching for the byte pattern "55 aa". To do a byte search, press F7, type in the hex values and press Enter; Hiew will jump to the first result. Press Shift+F7 to advance to the next result.
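Hiew's F7/Shift+F7 search can be approximated with bytes.find. This sketch (find_all is a hypothetical helper) lists every match already translated to a CPU address using the same E0000 base. Expect more hits than just ROM headers: an instruction such as "cmp ax, 0AA55h" is encoded as 3D 55 AA, so code that checks for the signature matches too, which is exactly what makes this search useful for finding the scan loop.

```python
from typing import Iterator


def find_all(image: bytes, pattern: bytes, base: int = 0xE0000) -> Iterator[int]:
    """Yield the CPU address of every occurrence of `pattern` in `image`."""
    pos = image.find(pattern)
    while pos != -1:
        yield base + pos
        pos = image.find(pattern, pos + 1)   # like Shift+F7: continue after the hit


# Usage (the file name is a placeholder):
# with open("bios.bin", "rb") as f:
#     for addr in find_all(f.read(), b"\x55\xAA"):
#         print(f"{addr:05X}")
```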
It sets the data segment (DS) to BX (F09DE), SI to 0 (F09E0), loads a 16-bit word from DS:SI into AX (F09E2) and compares it to "55 AA" (F09E5). If it matches, it loads the next byte (F09E9), which specifies the ROM size in 512-byte units (the conversion is done at F09F3..F09FC), then calculates the checksum by running a nested loop (F09FE..F0A03, see the "3" xrefs) that reads every byte and adds it to an 8-bit register, AH. Then it checks whether the checksum is zero. If it is, it updates some internal variables pointed to by ES:DI (F0A06..F0A14), moves the DI pointer back to the start of the option ROM (F0A14), runs the ROM code (F0A1A) and goes to the end of the loop (F0A22).
If one of the checks fails, it goes to F0A24. This path adds 0x40 to BX (F0A24), and another 0x40 (F0A2E) if the start address was flagged by the least-significant bit (F0A27..F0A2C). Since BX is a segment value, a step of 0x40 means 0x400 bytes (1 KB), and the flagged step of 0x80 means 2 KB. If the result does not exceed DX, the loop restarts (F0A32..F0A36).
That means that when the ROM scan loop starts, BX holds the minimum segment address and DX the maximum; together they set the scan range.
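To tie the walkthrough together, here is the scan loop re-expressed in Python, working on a bytes object that stands in for the first megabyte of memory. It is a sketch based only on the description above: the name rom_scan is hypothetical, and instead of far-calling a ROM it just records its segment. Two details are assumptions: the flag bit is masked off before BX is used as a segment, and after a successful call the loop simply advances by the same step (what happens at F0A22 is not spelled out).

```python
def rom_scan(memory: bytes, bx: int, dx: int) -> list:
    """Return the segments of all valid option ROMs found between BX and DX."""
    found = []
    while True:
        segment = bx & ~1                   # assumption: bit 0 is only a flag
        start = segment * 16                # DS = segment, SI = 0
        if memory[start:start + 2] == b"\x55\xAA":
            size = memory[start + 2] * 512  # size byte, in 512-byte units
            # Checksum: add every byte in an 8-bit register (AH); must end at 0.
            checksum = 0
            for byte in memory[start:start + size]:
                checksum = (checksum + byte) & 0xFF
            if size and checksum == 0:      # assumption: empty ROMs are skipped
                found.append(segment)       # the real code far-calls segment:0003
        bx += 0x40                          # next 1 KB block (0x40 paragraphs)
        if bx & 1:
            bx += 0x40                      # flagged scan: 2 KB steps instead
        if bx > dx:                         # stop once BX exceeds DX
            break
    return found
```

For instance, rom_scan(memory, 0xC800, 0xE000) corresponds to the first call site described further down, and the flagged F000..F800 scan would pass 0xF001 as BX.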
Now that we know how the ROM scan works, let's find where it is called from. The function that does the ROM scan starts at F09CB, the instruction following the "ret" of the previous function. Pressing '+' at F09CB adds a bookmark: the top line will show a diamond in the first position of its middle part, which means Alt-1 will go back to that location. If we select that address and press F6, Hiew will scan the whole file for references to it and print a list of them.
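Hiew's actual F6 implementation isn't described here, but the core idea for the common case can be sketched: look for near calls (opcode E8 followed by a 16-bit displacement) whose target, computed relative to the next instruction, lands on F09CB. The helper name and the assumption that callers live in the same F000 segment are mine. Because every E8 byte is treated as a possible instruction start, expect some false positives, the same synchronization problem as in the disassembly view.

```python
def find_near_calls(image: bytes, target: int,
                    base: int = 0xE0000, segment: int = 0xF000) -> list:
    """Addresses of E8 (call near) instructions in `segment` that reach `target`."""
    seg_start = (segment << 4) - base             # file offset of segment:0000
    target_ip = target - (segment << 4)           # target as an offset within CS
    hits = []
    for pos in range(seg_start, len(image) - 2):
        if image[pos] != 0xE8:
            continue
        next_ip = (pos - seg_start + 3) & 0xFFFF  # IP after the 3-byte call
        disp = int.from_bytes(image[pos + 1:pos + 3], "little", signed=True)
        if (next_ip + disp) & 0xFFFF == target_ip:  # IP wraps at 64 KB
            hits.append(base + pos)
    return hits
```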
Following these references leads to the following fragments of code. A bookmark is useful here for going back, pressing F6 again and following the next reference.
The first one outputs the POST code 0x31 and calls the ROM scan function with a range of C800..E000.
The second one sets the least-significant bit of BX to 1, indicating a larger (0x80) step and scans F000..F800.
The third one scans a range of E000..EC00.
I wrote a small piece of code (a stub) that calls the VGA option ROM first, then XTIDE. I put that code at the next free location, EE000.
Hiew can not only disassemble code but also do the reverse. If you press F3 to enter edit mode and then press Tab, the Assembler window opens. You can type a single instruction and press Enter; the instruction is assembled into bytes that overwrite the current location at the cursor, and the cursor moves down. Once you have entered your last instruction, press Esc to close the window and F9 to save the changes to the file.
First I save (EE000) AX on the stack. It may not be required, but generally it is good practice to restore every register you have changed before returning from a function. In common calling conventions such registers are called "callee-saved", but old BIOSes may not follow any convention, so it is safer to preserve it.
Then I create a far return address. Normally you call a function with a "call" or "call far" instruction: it saves the return address (the address of the next instruction) on the stack and jumps to the specified address. When the function is done, it executes "ret" (Hiew calls it "retn", meaning "ret near") or "retf", which does the reverse: it pops the address off the stack and goes back. But nothing stops you from using "call" without a "ret", or "ret" without a "call". Far calls push a far pointer onto the stack, both the segment and the offset, allowing inter-segment calls, while near calls push only the offset, keeping CS (the code segment register) the same.
So the next instruction (EE001) pushes the current code segment to the stack. And the next one (EE002), "call", pushes the address of the next instruction.
Now the stack looks like this:
- old AX value
- E000 (code segment)
- E005 (offset) <-- stack pointer points here
We pop (EE005) the offset from the stack into AX (that's why we have saved it - here we are overwriting it) and add 9 to it, which makes AX=0xE00E, a pointer to the instruction following the next "ret". Then we push (EE009..EE00D) the new AX and a constant 0x00C0 to the stack. Here is how it looks now:
- old AX value
- E000 (code segment)
- E00E
- 00C0 <-- stack pointer points here
Once we execute the "ret" (EE00D), the CPU pops 0x00C0 off the stack and goes to that offset in the current segment, E000:00C0. That is the entry point of the VGA option ROM: it had a "jmp 0000E00C0" instruction at E0003, which I overwrote with another jump to 0000E0613.
After the VGA option ROM has finished initializing, it reaches its "retf" instruction. That instruction pops another two values from the stack, E00E (new IP) and E000 (new CS), returning to the physical address EE00E. Now only the old value of AX remains on the stack.
We restore (EE00E) the old AX and push the current code segment again. The XTIDE option ROM is located at EC000 (segment 0xEC00), 0xC000 bytes after the VGA option ROM, and we need to compensate for that difference because option ROMs expect CS to point to their start. So we add 0xC00 (0xC000 / 16) to the CS value we pushed onto the stack. Since 16-bit addressing modes cannot use SP as a base register, we need to use another register, BP (base pointer). So the values of BP and SP are swapped, the constant 0xC00 is added to the 16-bit word ("w," in Hiew's syntax) at the location pointed to by BP (with a displacement of 0), and the values of BP and SP are swapped back. Then we push a constant value of 3. Here is the stack now:
- EC00
- 3 <-- stack pointer points here
Lastly, a far return pops 3 into IP and 0xEC00 into CS, passing control to XTIDE's entry point. The last byte (0x13) at EE01C is not an actual instruction; it is there just to make the checksum correct.
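The stack juggling is easy to lose track of, so here is a small Python trace of the stub, with a list standing in for the stack (last item = top). It follows the addresses given above and only models the values that matter; the starting AX value 0x1234 is an arbitrary placeholder, and the VGA and XTIDE ROMs themselves are not simulated, only the addresses control is handed to.

```python
def trace_stub(ax: int = 0x1234) -> tuple:
    """Follow the stub at E000:E000 and return (targets, stack, ax)."""
    cs = 0xE000                            # the stub runs at E000:E000 = EE000
    stack = []                             # top of stack = last element
    targets = []                           # physical addresses control goes to

    stack.append(ax)                       # EE000: save AX
    stack.append(cs)                       # EE001: push CS (E000)
    stack.append(0xE002 + 3)               # EE002: 3-byte call pushes next IP (E005)
    ax = stack.pop()                       # EE005: pop that offset into AX
    ax = (ax + 9) & 0xFFFF                 # add 9 -> E00E
    stack.append(ax)                       # EE009: push E00E
    stack.append(0x00C0)                   # push 00C0, the VGA entry offset
    ip = stack.pop()                       # EE00D: near ret, CS stays E000
    targets.append((cs << 4) + ip)         # -> E00C0, VGA option ROM

    ip, cs = stack.pop(), stack.pop()      # VGA ROM's retf: IP first, then CS
    targets.append((cs << 4) + ip)         # -> EE00E, back in the stub

    ax = stack.pop()                       # EE00E: restore the caller's AX
    stack.append(cs)                       # push CS (E000) again
    stack[-1] = (stack[-1] + 0xC00) & 0xFFFF  # swap BP/SP, add 0C00 at [BP+0], swap back
    stack.append(3)                        # push 3, the XTIDE entry offset
    ip, cs = stack.pop(), stack.pop()      # retf: IP = 3, CS = EC00
    targets.append((cs << 4) + ip)         # -> EC003, XTIDE option ROM
    return targets, stack, ax


print([hex(a) for a in trace_stub()[0]])   # ['0xe00c0', '0xee00e', '0xec003']
```

The stack ends up empty and AX holds its original value, which is what a well-behaved option ROM entry should leave behind.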
I could have just used "jmp 0EC00:00003", but that would make this fragment of code depend on the size of the XTIDE option ROM. Maybe it is overcomplicated, but it was the first idea I had back then.
So the last part: how does the code at EE000 get executed? The VGA option ROM had a 3-byte near jump at E0003, and I assumed it couldn't reach a location that far away: its displacement is a signed 16-bit value (-32768..32767 bytes), while the distance to the stub code is 57338 bytes. (Strictly speaking, IP wraps around within the 64 KB segment in real mode, so a displacement of 0xDFFA, i.e. -8198, would also have landed at E000:E000.) To get around this, I overwrote an error message at E0613:
It saves AX (E0613), sets AX to 0xE000, restores AX while putting 0xE000 on the stack and does a near return, effectively jumping to E000:E000, which is the physical address EE000. This is probably overcomplicated too; a single jump would have done.
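The displacement numbers can be checked with a short sketch. It computes the rel16 field of a 3-byte near jump (opcode E9 followed by rel16) and the address the CPU actually reaches. One detail worth knowing: in real mode IP is a 16-bit register and the sum wraps around at 64 KB, so a displacement that looks out of range as a signed number can still reach any offset inside the current segment. The helper names are hypothetical; the addresses are the ones from this post.

```python
def near_jmp_disp(jmp_ip: int, target_ip: int, length: int = 3) -> int:
    """rel16 field of a near jump at jmp_ip that should land on target_ip."""
    return (target_ip - (jmp_ip + length)) & 0xFFFF


def near_jmp_target(cs: int, jmp_ip: int, disp: int, length: int = 3) -> int:
    """Physical address reached by a near jump; IP wraps within the segment."""
    ip = (jmp_ip + length + disp) & 0xFFFF
    return (cs << 4) + ip


disp = near_jmp_disp(0x0003, 0xE000)               # jump at E000:0003 to E000:E000
print(hex(disp), disp, disp - 0x10000)             # 0xdffa 57338 -8198
print(bytes([0xE9]) + disp.to_bytes(2, "little"))  # b'\xe9\xfa\xdf'
print(hex(near_jmp_target(0xE000, 0x0003, disp)))  # 0xee000
print(hex(near_jmp_disp(0x0003, 0x0613)))          # 0x60d, the jump to E0613
```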