First post, by Wilczek_h
Hi,
This first article is going to be an unorganized dump of my mind. Maybe later notes will be more organized. Anyway, let's get started. 😀
I spent quite some time in the last 3 months reverse engineering Sid Meier's Railroad Tycoon. Why would I do that, you ask? I stopped playing that awesome game when I hit the upper limit of the money amount the game can handle, even on the hardest level. (Why couldn't they just pick a 4-byte integer to represent money?) My kids also like this game, but for them the problem is not hitting the limit but the competition, which is too tough for them. They do not care about competing; they just want to build railroads and manage trains, and that is it. I also find the 320x200 resolution quite suboptimal on large screens nowadays.
So, I thought I would give it a try, and after 3 months of debugging in DOSBox (and debugging DOSBox itself too), identifying code that is relevant for porting, and actually porting it, I have got this far (I could not figure out how to embed this video here, so I am sharing the link):
Video showing the ported parts of the game (Make sure to bump up the quality settings in the player to actually see something)
What do I have already?
- I know how to extract images (PIC). This took me more or less a week. However, this does not seem to be a big achievement as others have already figured out the format itself.
- BUT, I could also figure out how to load and play PAN animation files, see the video above 😀. (This was around a month, pretty tough, and I cannot say that I can fully understand it, even though it is working…)
- It was pretty easy to figure out the font file format, just by looking at the binary.
- I also know how to load maps and parts of saved games. I can render the map with rails and stations, including the rails of the opponents at any resolution.
- I can also render the signal lights on stations.
- I can render the city names (pixel perfect).
I started with building DOSBox from source. Why? Because sometimes I also had to debug DOSBox itself to be able to progress. Example: I wanted to know the CS:IP values when a file is opened. Solution: set a breakpoint in DOSBox (dos_files.cpp/DOS_OpenFile), read the register values and call it a day. (Btw., IDA Pro 5.0 was not helpful at all.)
The good things: Railroad Tycoon has been relatively easy to debug, understand and port. From the memory handling point of view the game is pretty static. So far I have bumped into only 2 malloc calls, and those only happened during PAN file processing. (Counting the malloc calls from the point when the MicroProse logo is displayed.) Most things are loaded from or written to fixed memory locations.
Another good thing is that only a part of the code needs to be disassembled, as many functions or behaviors can simply be written from scratch, for example: menus, file handling, etc.
However, there are some gray zones. To understand how tracks, stations and trains are rendered I had to debug the drawing code. Interestingly, the game transforms tracks.pic into another format to save on iterations when drawing, as the file contains a lot of transparent pixels. That approach is not friendly to modern hardware at all, so I changed the drawing logic and decided not to port the tracks.pic preprocessor code or the infra asset drawing code, as neither would be needed. Here is how the original game stores the infra assets in memory:
Another mind blowing drawing solution I discovered is how the game renders the "blue" assets. Like tracks in the ocean or river landing parts. Uhh. It renders the original assets first, then replaces one or more of the on screen colors with different ones in the given cell. So, while the ported GDI version did the same, in the final(-ish) OpenGL version I simply precalculated all the blue assets in memory, created a texture from it and only addressed the texture at runtime. I had to be careful though, an infra asset is 20x20 pixels large, while a cell is only 16x16, so I only had to convert the inner 16x16 pixels of each asset to "blue".
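To make the "blue" precalculation a bit more concrete, here is a minimal sketch of the idea, not the actual port code. It assumes that an asset is stored as 20x20 palette indices, that the 16x16 cell sits centred inside it (a 2 pixel border on each side), and that a single colour pair is swapped. The names Asset and makeBlueVariant and the colour values are made up for illustration.

```cpp
#include <array>
#include <cstdint>

constexpr int kAssetSize = 20;  // an infra asset is 20x20 pixels
constexpr int kCellSize  = 16;  // a map cell is 16x16 pixels
// Assumption: the cell is centred in the asset, leaving a 2 pixel border.
constexpr int kBorder = (kAssetSize - kCellSize) / 2;

// One asset as palette indices, row by row (hypothetical layout).
using Asset = std::array<std::uint8_t, kAssetSize * kAssetSize>;

// Returns a copy of src in which every pixel of the inner 16x16 area
// that equals `from` is replaced with `to`. The border is left untouched.
Asset makeBlueVariant(const Asset& src, std::uint8_t from, std::uint8_t to) {
    Asset out = src;
    for (int y = kBorder; y < kBorder + kCellSize; ++y) {
        for (int x = kBorder; x < kBorder + kCellSize; ++x) {
            std::uint8_t& p = out[y * kAssetSize + x];
            if (p == from) {
                p = to;
            }
        }
    }
    return out;
}
```

Running this once per asset at load time and uploading the results as a texture means nothing has to be read back from the screen at runtime. If more than one colour is replaced, the function can simply be applied once per colour pair.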
The bad thing: RRT uses overlays extensively. I mean a lot. Therefore, many times I went through code that was handling overlays, which turned out to be a huge waste of time. After a while you get the pattern and can just skip huge chunks of ASM code. However, the real pain is that if you just step over code that does the overlay load and replaces code segments in memory, then you usually get a crash from the game saying: "Overlay not found". This makes debugging pretty difficult from time to time. Also, code gets overwritten, so if you previously had a function at XXXX:YYYY, the next time you might have something else there, yet the code still calls the same location. I had to be careful and check each and every function each time it got called after an overlay interrupt (because call XXXX:YYYY before the overlay might have called a different function than call XXXX:YYYY after the overlay interrupt).
The signal drawing code was harder to port to OpenGL. The original code draws the station, then calculates the lights at one or both ends, and then picks an 8x8 pixel square in which the teal color is replaced with the calculated signal light color. Pretty easy when you can read from A000:xxxx and write to A000:xxxx directly. I did not want to do glReadPixels, FBO and co., so I decided to write a pixel shader that gets the source and destination colors as parameters and draws the station again at a given location, but discards all pixels that do not match the source color.
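For comparison, the original in-place approach could look roughly like the sketch below when done on a plain 320x200 buffer of palette indices. This is only an illustration of the technique described above, not the game's code: the function name recolorSquare, the buffer type and the clipping are my assumptions. The shader version turns the same test around per fragment: keep the fragment only if it has the source color, and output the destination color for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kScreenWidth  = 320;
constexpr int kScreenHeight = 200;

// Replaces `from` with `to` inside the 8x8 square whose top-left corner
// is (x0, y0). Pixels outside the screen are skipped.
// Assumption: fb holds kScreenWidth * kScreenHeight palette indices.
void recolorSquare(std::vector<std::uint8_t>& fb, int x0, int y0,
                   std::uint8_t from, std::uint8_t to) {
    for (int y = y0; y < y0 + 8; ++y) {
        if (y < 0 || y >= kScreenHeight) continue;
        for (int x = x0; x < x0 + 8; ++x) {
            if (x < 0 || x >= kScreenWidth) continue;
            std::uint8_t& p = fb[static_cast<std::size_t>(y) * kScreenWidth + x];
            if (p == from) {
                p = to;
            }
        }
    }
}
```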
Another weird thing is that the game stores a lookup table at 2815:1660 that holds the byte offset of each row on the screen. For example: [2815:1660]=0, [2815:1662]=0x140… I did not need these offsets in the end, but it took some time to eliminate most of the references.
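As an illustration of what such a table contains, here is a small sketch that builds the same kind of table for a 320 pixel wide, 200 row screen with one byte per pixel, which matches the two example values above (0 and 0x140 = 320). The names makeRowOffsets and pixelOffset are mine, not the game's.

```cpp
#include <array>
#include <cstdint>

constexpr int kScreenWidth  = 320;  // bytes per row, one byte per pixel
constexpr int kScreenHeight = 200;

// Builds a table where entry y is the byte offset of row y.
// The largest value, 199 * 320 = 63680, still fits into 16 bits.
std::array<std::uint16_t, kScreenHeight> makeRowOffsets() {
    std::array<std::uint16_t, kScreenHeight> table{};
    for (int y = 0; y < kScreenHeight; ++y) {
        table[y] = static_cast<std::uint16_t>(y * kScreenWidth);
    }
    return table;
}

// Offset of pixel (x, y) using the table instead of a multiplication.
std::uint16_t pixelOffset(const std::array<std::uint16_t, kScreenHeight>& table,
                          int x, int y) {
    return static_cast<std::uint16_t>(table[y] + x);
}
```

A table like this trades a multiplication for a memory read, which made sense on the CPUs of that era. On modern hardware y * 320 + x is cheap enough that the table can go.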
The map: the map is just a PIC file actually. It utilizes only 15 "colors", but there are more than 15 possible cell types in the game. Like industries and producers. Not everything is stored in the map, some tables are hardcoded in the game's executable. So far, I could not decide whether the logic that generates industries and producers is extremely clever or utterly insane. There is a very important uint16_t value at 17ED:8CF6 at runtime, or at 0x3736 in the SVE file. I call it the seed value. This seed value, the map color at the given location and the cell coordinates are all used to generate an index into the lookup table to get the industry or producer index. Take a look at the function in memory at 02BB:3033. Originally, I would have expected all this information to be stored in the map, as there would be enough room I guess, but no. How did I figure this out? The problem was that city sizes, industries and producers all changed each time a new map was generated in the original game. So, I saved a generated world for each scenario and loaded them back for investigation. This gave me a little bit of deterministic behavior 😀. The lookup tables are in the game code and there are 3 of them:
Interestingly, the opponents' stations are encoded into the map, but the tracks are stored separately.
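Pulling the seed value mentioned above out of a saved game can be sketched like this. The offset 0x3736 is the one given above; the little-endian byte order is my assumption (it is what an x86 DOS program would normally write), and the function name readSeed is made up. The sketch only reads the value. It does not attempt to reproduce the index calculation done by the function at 02BB:3033.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kSeedOffset = 0x3736;  // offset of the seed in the SVE file

// Reads the 16-bit seed from the raw bytes of an SVE file.
// Assumption: the value is stored little-endian.
// Returns false if the buffer is too short to contain the seed.
bool readSeed(const std::vector<std::uint8_t>& sve, std::uint16_t& seed) {
    if (sve.size() < kSeedOffset + 2) {
        return false;
    }
    seed = static_cast<std::uint16_t>(sve[kSeedOffset] |
                                      (sve[kSeedOffset + 1] << 8));
    return true;
}
```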
Other great sources that were essential to my research:
- https://stanislavs.org/helppc/int_21.html
- Intel's CPU manual. Yes, this is a very important read when CPU instructions need to be implemented as C/C++ functions. 😀
What do I consider an MVP (minimum viable product)?
- all features of the original are working in the port (except sound)
- resolution independent map rendering (already working). This screenshot was taken on my other monitor (max. res. 3440x1440) utilizing all the available space to render the map:
- money bug fixed
- "unlimited" saved games (at least more than 4 😀)
- play alone feature
Once this is done, other limitations can be eliminated, like the limited number of stations and trains.
How far will I get? I do not know. I will have less time for this project until ~October comes again, but I will try to keep this project alive. The hard thing will be to find and port the non-deterministic, but essential code fragments, like the code that drives opponents, or the code that washes bridges away.
This became quite a long and random "article" scratching the surface, but I hope you enjoyed it. I will try to share more information in the future. Debugging, identifying code and porting take weeks/months, so do not expect daily reports here 😀, but once I have something, I will post it here.
Take care and best regards,
Wilczek