VOGONS


First post, by vvbee

User metadata
Rank Oldbie

I've previously found general AI models fairly unusable at generating retro assembly programs, but the recently released Claude 4 looks to have made the jump into useful territory.

Claude Sonnet models' scores over time, in percent, in my semi-public DOS FASM benchmark; the tasks are easy/medium difficulty for someone with some assembly experience:

[The attachment nkghuk.png (the score chart) is no longer available.]

Token price has remained the same. Zero-shot, no test-time thinking.

Sample output for the tasks 'display information about mouse input' and 'load and display a paletted image':
https://leikareipa.github.io/dosbox/#/ai-asm- … 4-sonnet/mouse/
https://leikareipa.github.io/dosbox/#/ai-asm- … -sonnet/parrot/
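
For a sense of what the first task involves, here's a minimal hand-written sketch of my own (not the model's output, and not the benchmark harness): DOS mouse support boils down to a handful of INT 33h calls. This one waits for a left click and exits.

org 100h                    ; FASM flat .COM binary

        xor ax, ax          ; INT 33h AX=0000h: reset driver; returns AX=FFFFh if installed
        int 33h
        test ax, ax
        jz done             ; no mouse driver present

        mov ax, 0001h       ; AX=0001h: show the mouse cursor
        int 33h

poll:   mov ax, 0003h       ; AX=0003h: BX=buttons, CX=X, DX=Y
        int 33h
        test bx, 1          ; bit 0 = left button
        jz poll

        mov ax, 0002h       ; AX=0002h: hide the cursor again
        int 33h

done:   mov ax, 4C00h       ; DOS: terminate
        int 21h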

Reply 1 of 19, by vvbee

User metadata
Rank Oldbie

Google Gemini 3 Pro Preview, which was released today, also scored 79% in this benchmark.

Due to the way the benchmark is graded, a score of 75% in a given test means the AI did what was asked, and a score of 100% means it did what was asked and took the initiative to add something useful on top. So a score of 75% in each of the six tests means the benchmark has saturated. In the case of Claude 4, I recall most results were 75%, with a few at either 50% (mostly but not totally correct) or 100%, so not quite saturated overall. Gemini 3 Pro scores 75% in all but one test, where it scores 100% (five tests at 75% plus one at 100% average out to about 79%), so this benchmark that was almost impossible for AI about 1.5 years ago has now saturated.

It's possible that the prompts have leaked into the training data, but the responses haven't been public, and in any case it's easy for someone else to come up with more retro assembly coding prompts to probe it.

I don't remember if Claude 4 was available for free at that time, but Google provides free preview access to Gemini 3 Pro, and of course being Google they have a wider audience to begin with. In theory this has direct implications for the retro scene since much of it relies on software.

Reply 2 of 19, by igully

User metadata
Rank Newbie

I have personally found AI to be pretty bad at retro assembly coding.

However, it is a good tool for finding a function you could use for a certain job, and then doing it yourself. A sort of dummy online Ralf Brown's Interrupt List with awful examples that most of the time won't work at all.
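
To illustrate the kind of lookup I mean: ask which DOS call prints a string, get pointed at INT 21h AH=09h, then write the actual code yourself. A minimal example (mine, not AI output):

org 100h
        mov ah, 09h         ; DOS INT 21h AH=09h: print '$'-terminated string at DS:DX
        mov dx, msg
        int 21h
        mov ax, 4C00h       ; DOS: terminate
        int 21h
msg     db 'Hello from DOS!', 0Dh, 0Ah, '$'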

Reply 3 of 19, by gerry

User metadata
Rank l33t
igully wrote on 2025-11-20, 15:54:

I have personally found AI to be pretty bad at retro assembly coding.

I have tried a little and agree: even when given context and references there can be small mistakes, and a small mistake in assembly usually means a frozen or crashed program. However, it is quite good at creating variations on existing source, e.g. adding something from program A to program B. Perhaps it is simply that LLMs have access to so much more material for modern and current toolsets, those being the focus for the humans! Still, that's coming from sparse experience. I can imagine someone who has built up a full library of references and examples not already 'trained' into the model could do better with it.

Reply 4 of 19, by elszgensa

User metadata
Rank Oldbie

Did you assume they already know everything (and have perfect recall), or did you acknowledge that their memory is based on lossy compression and prime them with some documentation? I've found their logic to be mostly reasonable (unless the task is underspecified), but more factual things like "was this function already available on Win95" or "what fields does this struct have" can be prone to hallucinations. They're smart (at least in certain ways) but definitely not omniscient.

Now, I'm not usually going as low as assembly, mostly just C, but I found that "refreshing" their memory helps things along quite a bit. It can be some random, somewhat on-topic stuff thrown their way before specifying the task, but of course the more relevant the better. For Win9x code that's something like MSDN docs and system headers; for DOS assembly I'd try providing a copy of the target assembler's manual, plus the RBIL and maybe something about any video modes to be used.

Reply 5 of 19, by igully

User metadata
Rank Newbie

It does not require a copy of the target assembler's manual, just that you mention the assembler in your prompt. The knowledge of interrupt lists is already built in there. The problem is in the tiny details, much as @gerry mentions above.

Another useful thing about these AIs is that they sometimes let you make sense of cryptic compiler error messages: I usually paste the offending line and the compiler message, and I get to understand how to correct my error. It also suggests solutions, which I tend to ignore because they are prone to fail miserably due to lack of arithmetic context.

Reply 6 of 19, by vvbee

User metadata
Rank Oldbie
igully wrote on 2025-11-20, 15:54:

I have personally found AI to be pretty bad at retro assembly coding.

However, it is a good tool for finding a function you could use for a certain job, and then doing it yourself. A sort of dummy online Ralf Brown's Interrupt List with awful examples that most of the time won't work at all.

Does it make you wonder that you find AI can't even write functional sample snippets while someone else finds it can write programs to spec?

AI can also be used in decompilation and disassembly. It's quite good at guessing the intent of a block of obscure-looking auto-decompiled code, for example.
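
For instance, a fragment like this (made up, but typical of what a disassembler spits out) is the kind of thing where asking the model "what does this compute?" works well:

        mov bx, ax          ; save x
        shl ax, 2           ; ax = 4*x
        add ax, bx          ; ax = 5*x
        shl ax, 1           ; ax = 10*x

A model will usually identify it as "multiply AX by 10" at a glance, which beats stepping through it by hand.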

Reply 7 of 19, by elszgensa

User metadata
Rank Oldbie
igully wrote on 2025-11-22, 01:21:

The knowledge of [stuff] is already built in there.

There are statistical approximations of facts and concepts, meaning it's close but not quite perfect - and that's exactly where the "errors in tiny details" creep in, and why it helps to give the model a way to look up and verify any hard facts you rely on. During training, frontier LLMs routinely get stuffed with dozens of terabytes of data. Do you honestly believe they manage to retain all of it, perfectly, in the resulting ~200 GB of model data (or wherever the largest ones are at right now)?

Reply 8 of 19, by igully

User metadata
Rank Newbie

For disassembly, I find the long-established non-AI tools are already pretty good at what they do. But your mileage may vary, and a "second opinion" might indeed help under some specific circumstances.

Perhaps with other, more popular and hardware-abstracted programming languages such as C, current AIs perform much better and can write complete programs. I haven't tested this, but I assume it could be the case. It's just that my focus right now is exactly what the title of this thread says: retro assembly coding.

My comments are based on real-life experience, not on wild expectations:
If a tool says it can do X and solve B, I expect it to deliver both (anything more can be considered a bonus). The problem with AI is that it is not remotely reliable in delivering what it claims it can, for example sometimes giving you a complete code example that will never compile under any assembler.

However, as I mentioned above, I find it useful in a couple of circumstances that I do profit from in retro assembly coding. So I would say it is just another useful tool to incorporate, but not a game changer, and far from the pipe dream of having an extra coder at your command. That, of course, can only improve as AI models evolve.

Reply 9 of 19, by vvbee

User metadata
Rank Oldbie

I think you misunderstand this kind of AI precisely because you approach it as a tool and not as another coder. The interface is the hint, in this case natural-language dialogue. It may also be that this kind of interface just doesn't suit you: too fuzzy, or whatever.

Reply 10 of 19, by igully

User metadata
Rank Newbie

It really does not work as another coder for retro assembly. It is not about the user input, but about the AI's attention to detail in the code it generates. I invite you to try the most popular ones on small, easy tasks, and you will quickly get a glimpse of the current limitations. Uncompilable code, switching assembler syntax on a whim within the same snippet, illegal addressing modes, lost pointers, illegal instructions: that is just the tip of the iceberg. The current AI models simply fail to replace a coder, but they are valuable at certain tasks.
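
To make the illegal addressing modes concrete: in 16-bit code a memory operand can only combine BX or BP as a base with SI or DI as an index (or use one of those registers alone, plus a displacement), yet models happily emit combinations that simply can't be encoded:

        ; mov ax, [bx + cx] ; typical AI output; BX+CX is not encodable in 16-bit mode
        mov ax, [bx + si]   ; legal: base BX + index SI
        mov ax, [bp + di]   ; legal: base BP + index DI (BP defaults to segment SS)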

Another point in favor of current AIs in retro assembly coding is that the generated code, despite its lack of quality, is very well commented, which is something many coders neglect, wrongly assuming another coder will understand right away. AIs are currently extremely good in this specific respect, as they tend to provide plenty of inline comments and explanations of the overall logic structure.

Reply 11 of 19, by vvbee

User metadata
Rank Oldbie

Good. Can you make a table grading the current popular AI models in retro assembly coding?

Reply 12 of 19, by ratfink

User metadata
Rank Oldbie

Would AI be good for adding comments to code that's already been written (but is poorly documented)? That sounds very useful. We tried this at work last year and it was reasonably OK at a basic level (you can't expect it to understand the user's view of what the code is for).

Reply 13 of 19, by eM-!3

User metadata
Rank Newbie

A few weeks ago I managed to complete a reboot.com in ASM for DOS using ChatGPT. I guess it was fairly easy, as it looks like GitHub is filled with school tasks like this. I also had some success asking it to translate source from one assembler to another (e.g. NASM to TASM). But I failed at everything else in ASM.
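
(For reference, the canonical tiny version of that task is just a far jump to the BIOS reset entry point, which is probably why it's all over GitHub:)

org 100h
        jmp 0FFFFh:0000h    ; far jump to the BIOS reset vector (cold boot)
                            ; storing 1234h at 0040h:0072h first would request a warm boot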

Reply 14 of 19, by vvbee

User metadata
Rank Oldbie

Only Claude 4 and Gemini 3 can do it.

Reply 16 of 19, by vvbee

User metadata
Rank Oldbie

Claude Opus 4.5 was released a few hours ago. Again an improvement in coding.

I'm testing it against Gemini 3 Pro Preview with slightly harder assembly tasks, since the previous ones got too easy: in this case small self-playing games (snake, Space Invaders, pong, tic-tac-toe) as DOS executables in FASM syntax. The model had to do graphics, gameplay logic, timing, and basic AI. Unlike in previous tests, the model was allowed follow-up prompts to fix issues or missing functionality.
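
For a sense of the level these tasks sit at: all four games rest on the same two primitives, mode 13h framebuffer writes and frame pacing off the BIOS tick counter. A minimal hand-written sketch (mine, not model output):

org 100h

        mov ax, 0013h       ; BIOS: 320x200, 256-color graphics mode
        int 10h

        push 0A000h         ; VGA framebuffer segment
        pop es
        mov di, 100*320+160 ; offset of pixel (160,100)
        mov al, 15          ; color 15 = white
        stosb               ; plot one pixel

        push 0040h          ; BIOS data area segment
        pop ds
        mov ax, [6Ch]       ; tick counter at 0040h:006Ch, ~18.2 Hz
tick:   cmp ax, [6Ch]
        je tick             ; spin one tick (~55 ms); a game paces each frame like this

        xor ah, ah          ; BIOS: wait for a keypress
        int 16h
        mov ax, 0003h       ; back to 80x25 text mode
        int 10h
        mov ax, 4C00h       ; DOS: terminate
        int 21h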

Claude: https://leikareipa.github.io/dosbox/#/ai-asm- … opus-4-5/games/
Gemini: https://leikareipa.github.io/dosbox/#/ai-asm- … -preview/games/

Scoring:
81% Claude Opus 4.5 (16k thinking)
63% Gemini 3 Pro Preview (high thinking)

It's immediately clear that Claude's games are more polished. On average Claude also required fewer prompts, 2 per game versus Gemini's 4.

Syntax errors weren't particularly common, typically one per game, where a game would be 500-1000 lines of code. Both models were always able to fix them when prompted.

I don't know if Claude Opus 4.5 is available for free. Running these tests via the API cost about $3, so probably not many want to be doing that. Then again, with all models these assembly coding tasks typically generate longer thinking chains than easier languages do.

Reply 17 of 19, by zyzzle

User metadata
Rank Member

Devise very hard tasks, such as a text-mode Pac-Man game or a text-mode tunnel effect, and see how Claude does.

How about a text-mode simulation of the Oregon Trail game, or a nice game of Hunt the Wumpus, as an assembler DOS .COM file? That would take us to 1970s assembler-level coding ability!

This is progress. Next step is CGA graphics, I guess?

Reply 18 of 19, by Bondi

User metadata
Rank Oldbie

Impressive stuff.
Do you think it would be possible to have a device driver written this way?
I have a PCMCIA card whose DOS drivers are lost. What if I fed the card's CIS to one of these more advanced coding AIs? Could it then write an enabler? The CIS contains all the info about the resources the card needs to work.

PCMCIA Sound Cards chart
archive.org: PCMCIA software, manuals, drivers

Reply 19 of 19, by vvbee

User metadata
Rank Oldbie

I doubt it, but what's stopping you from trying? With these models the outcome still depends on the human's skill as well: choose the wrong model or take the wrong approach with it and you get a worse outcome.
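
If you do try, here's a sense of scale: before the CIS even comes into it, an enabler first has to talk to the socket controller. A crude presence check, assuming an Intel 82365-compatible controller at the conventional ExCA index/data ports 3E0h/3E1h (other controllers and register layouts exist, so treat this as a hedged sketch, not a recipe):

org 100h
        mov dx, 3E0h        ; ExCA index port
        xor al, al          ; register 00h: identification/revision, socket 0
        out dx, al
        inc dx              ; 3E1h: data port
        in  al, dx
        and al, 0C0h        ; top two bits 10b suggest an 82365-class chip
        cmp al, 80h
        mov dl, 'N'         ; 'N' = no controller detected
        jne print
        mov dl, 'Y'         ; 'Y' = controller responded
print:  mov ah, 02h         ; DOS: print the character in DL
        int 21h
        mov ax, 4C00h       ; DOS: terminate
        int 21h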