VOGONS


First post, by bcorp

User metadata
Rank Newbie

Hi all,

I've been working on something a bit unusual — an AI agent that runs natively
on Windows XP SP3, targeting real period hardware (Pentium III/IV class,
64-512 MB RAM).

It's a terminal-based tool written in Zig that connects to Ollama (or any
OpenAI-compatible API) over local HTTP. You type questions in natural language
and it runs diagnostic tools via function calling — system info, processes,
network, disk, services, etc.
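To make the function-calling part concrete, here's a rough Python sketch of the kind of request that goes over the wire to an OpenAI-compatible endpoint. The tool name and schema here are made up for illustration — not retro-agent's actual definitions:

```python
import json

# Illustrative payload in the OpenAI-compatible chat shape.
# "disk_info" is a hypothetical tool name, not the agent's real schema.
payload = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "Why is this machine low on disk space?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "disk_info",  # hypothetical diagnostic tool
                "description": "Report free/used space per volume",
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
}

# This JSON body is what gets POSTed to /v1/chat/completions
body = json.dumps(payload)
```

The model then answers either with plain text or with a tool call naming one of the advertised functions.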

Some implementation details that might interest this crowd:

- Binary is ~750 KB, single-threaded, no CRT/MSVC dependency
- TUI uses Win32 Console API with CP437 box-drawing characters
- Automatic CP850 → UTF-8 conversion for localized Windows output
- UTF-8 → ASCII sanitization for safe console rendering
- Compatibility shim: RtlGetSystemTimePrecise redirected to
GetSystemTimeAsFileTime (doesn't exist on XP)
- Cross-compiles from any modern OS via Zig's toolchain
- os_version_min set to .xp in the build config
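For the curious, the two text conversions can be illustrated in a few lines of Python (the real tool does this in Zig, and the sample string and the NFKD-based sanitization are just one common approach, not necessarily what retro-agent does internally):

```python
import unicodedata

# "Größe" encoded in CP850, as a localized German XP console would emit it
cp850_bytes = bytes([0x47, 0x72, 0x94, 0xE1, 0x65])

# CP850 -> UTF-8: decode with the OEM code page, re-encode as UTF-8
text = cp850_bytes.decode("cp850")           # "Größe"
utf8 = text.encode("utf-8")

# UTF-8 -> ASCII sanitization: strip accents, drop what can't be mapped
# (lossy by design -- ß has no ASCII decomposition, so it is dropped)
sanitized = (unicodedata.normalize("NFKD", text)
             .encode("ascii", "ignore")
             .decode("ascii"))               # "Groe"
```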

The LLM itself runs on a separate modern machine on the network — this is
purely the client/agent side. Tested with llama3, qwen2, mistral on Ollama.

Screenshots and source: https://github.com/benmaster82/retro-agent

Would love to hear from anyone who tests it on actual XP hardware.
I've been testing in VMs but real-world feedback on Pentium III/IV era
machines would be invaluable.

Reply 1 of 2, by gerry

User metadata
Rank l33t

interesting and forgive any ignorance, is the LLM that runs separately offline (eg ollama seems to have that option)? that in itself is interesting, takes a lot of GB and fast CPU to cope with all that. Or maybe not for this purpose.

natural language interface may be useful for those not versed in XP commands, and as it can run things that's interesting - looks like the diagnostic output becomes input for the LLM.

And I just thought of having an offline LLM on a single-thread Pentium 3 😀 there's a challenge!

Hope you get some interest 😀

Reply 2 of 2, by bcorp

User metadata
Rank Newbie
gerry wrote on Today, 17:18:

interesting and forgive any ignorance, is the LLM that runs separately offline (eg ollama seems to have that option)? that in itself is interesting, takes a lot of GB and fast CPU to cope with all that. Or maybe not for this purpose.

natural language interface may be useful for those not versed in XP commands, and as it can run things that's interesting - looks like the diagnostic output becomes input for the LLM.

And I just thought of having an offline LLM on a single-thread Pentium 3 😀 there's a challenge!

Hope you get some interest 😀

Thanks gerry, appreciate the kind words!

Good question — yes, Ollama can run fully offline once you've downloaded
a model. No internet needed after that. The XP machine and the Ollama
host just need to see each other on the local network (plain HTTP, no
TLS — XP's TLS stack is too old anyway).

For hardware on the Ollama side, it depends on the model. Something like
llama3 8B quantized (Q4) or Qwen2.5 7B needs about 4-6 GB of RAM and runs
fine on any modern CPU — even without a GPU, just slower.

BUT — if you don't want to dedicate hardware to running the LLM, Ollama
also has cloud-hosted models since v0.12. You just run:

ollama run gpt-oss:20b-cloud

and it connects to their cloud backend. Models like gpt-oss (20B and
120B), deepseek-v3.1, and qwen3-coder are available. They work through
the same OpenAI-compatible API at /v1/chat/completions, so retro-agent
works with them out of the box — just point --url at the machine running
Ollama.

There's also a free tier, so you can test the whole setup without spending
anything. Useful if you just want to try it without setting up a big model
locally.

And since retro-agent works with any OpenAI-compatible endpoint, you
could also point it at other providers (OpenRouter, Together.ai, etc.)
if you prefer.

The XP machine itself does almost nothing compute-wise — it sends JSON
over HTTP, parses the tool calls, runs the local commands, and sends back
the output. Heavy lifting is 100% on the Ollama side (local or cloud).
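In Python-flavoured pseudocode (handler names made up for illustration), the loop on the XP side is basically:

```python
import json

# Hedged sketch of the client-side round trip: the model replies with a
# tool call, the agent runs the matching local command and sends the
# output back. "process_list" is an illustrative name, not necessarily
# retro-agent's actual tool.
def run_tool(name: str, args: dict) -> str:
    # Stand-in for shelling out to real XP diagnostics (tasklist, etc.)
    handlers = {"process_list": lambda a: "smss.exe\ncsrss.exe\nexplorer.exe"}
    return handlers[name](args)

# Canned response in the OpenAI-compatible shape (normally read over HTTP)
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_1",
                "function": {"name": "process_list", "arguments": "{}"},
            }],
        }
    }]
}

messages = []
for call in response["choices"][0]["message"]["tool_calls"]:
    output = run_tool(call["function"]["name"],
                      json.loads(call["function"]["arguments"]))
    # The tool result goes back as a "tool" role message on the next request
    messages.append({"role": "tool",
                     "tool_call_id": call["id"],
                     "content": output})
```

That's the whole job: parse JSON, run a command, append a message, repeat.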

> And I just thought of having an offline LLM on a single thread Pentium 3
> there's a challenge!

Ha! That would be something. I've seen tinyllama running on a Raspberry
Pi at ~2 tokens/second — a Pentium III would probably manage... eventually.
Maybe one day someone will get llama.cpp cross-compiled for XP x86. Now
THAT would be the ultimate retro AI setup.

If you or anyone here wants to try it, the prebuilt binary is in the
GitHub releases — just copy the .exe over and point it at an Ollama
instance (local or cloud). No install needed.