VOGONS


First post, by BinaryDemon

User metadata
Rank Oldbie

So this post might not exactly belong here because I’m not necessarily looking for a DOS utility (but it could be!).

My issue is probably with my searching - is there a more accurate wording for ‘DOS-compatible plain text’?

Specifically, my problem is setting up DOS for reading e-books. Most sources, even in plain text, contain characters that DOS displays improperly - the biggest offenders seem to be the right and left single quotation marks. Reading e-books on DOS is already a challenge due to line-length and character constraints; throwing in a bunch of garbage characters detracts further.

Currently I use something like Notepad in Windows to prepare the text files for DOS - just searching and replacing the offending characters - but it’s very manual. What do you guys recommend?

Thanks in advance!

Reply 1 of 10, by keenmaster486

User metadata
Rank l33t

Those ebooks are probably UTF-8.

DOS uses code page 437, an extension of ASCII: https://en.wikipedia.org/wiki/Code_page_437

Typically, text editors designed for programming can convert between character encoding schemes.

World's foremost 486 enjoyer.

Reply 2 of 10, by jakethompson1

User metadata
Rank l33t

On *nix iconv is the tool for converting between character sets.

When reading old DOS README files with box-drawing characters and such, you can read them in a modern way via iconv -f cp437 -t utf-8
For the reverse, you'd swap the to/from.
There has to be a Windows/DOS equivalent or cross-compiled version out there.
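For anyone without iconv handy, both one-way conversions can be sketched in a few lines of Python, since its standard library ships a cp437 codec. This is only a rough stand-in for iconv, not its equal; the function names below are made up:

```python
# Rough Python stand-in for iconv's cp437 <-> utf-8 conversions.
# Only the standard library is used; Python has a built-in cp437 codec.

def cp437_to_utf8(data: bytes) -> bytes:
    """Like `iconv -f cp437 -t utf-8`: old DOS README -> modern text."""
    return data.decode("cp437").encode("utf-8")

def utf8_to_cp437(data: bytes) -> bytes:
    """Like `iconv -f utf-8 -t cp437`, but replaces unmappable
    characters with '?' instead of aborting (iconv would need
    //TRANSLIT or //IGNORE appended to the target for similar behavior)."""
    return data.decode("utf-8").encode("cp437", errors="replace")

# A right single quotation mark (U+2019) has no CP437 equivalent:
print(utf8_to_cp437("don\u2019t".encode("utf-8")))  # b'don?t'
```

Going the other way, CP437 box-drawing bytes such as 0xC9 0xCD 0xBB come out as the expected ╔═╗ characters.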

Converting between character sets was also a specialty of Kermit, which is all open source now.

Reply 3 of 10, by BinaryDemon

User metadata
Rank Oldbie
jakethompson1 wrote on Today, 15:29:
On *nix iconv is the tool for converting between character sets. […]

Iconv seemed like a great option for me until I started going down the rabbit hole. I’m using TinyCore Linux, which is probably the issue; iconv is not available by default, and I had to install the glibc apps and conversion tools.

But even then it spits out the error “illegal input sequence at position 0”, so I thought maybe they weren’t legitimate UTF-8.

Apparently “iconv file name” is supposed to tell you the encoding, but I get the same error: illegal input sequence at position 0. It’s a shame I can’t just force a conversion without knowing the original format.

Maybe I’ll download a more standard live distro and try this again later.
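For what it’s worth, iconv doesn’t really detect encodings, but a few lines of Python can at least say whether a file’s bytes are plain ASCII, valid UTF-8, or neither. A minimal sketch (the function name is made up); single-byte encodings like cp1252 can’t be told apart this way, since any byte sequence is “valid” in them:

```python
# Cheap encoding check: ASCII, UTF-8, or neither?

def guess_encoding(data: bytes) -> str:
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown (maybe cp1252/latin-1)"

print(guess_encoding(b"plain text"))                 # ascii
print(guess_encoding("don\u2019t".encode("utf-8")))  # utf-8
# 0x92 is cp1252's curly apostrophe; it's a stray continuation byte in UTF-8:
print(guess_encoding(b"don\x92t"))                   # unknown (maybe cp1252/latin-1)
```

An “illegal input sequence at position 0” from iconv when converting from UTF-8 often means the file is actually in one of those single-byte encodings.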

Reply 4 of 10, by BinaryDemon

User metadata
Rank Oldbie
keenmaster486 wrote on Today, 15:21:

Those ebooks are probably UTF-8.

DOS uses ASCII codepage 437: https://en.wikipedia.org/wiki/Code_page_437

Typically, text editors designed for programming can convert between character encoding schemes.

Do you have an example of one? The issue I’m hitting with most DOS word processors is that some of these files are too large, and I’d rather not break them into smaller pieces.

Reply 5 of 10, by keenmaster486

User metadata
Rank l33t

Oh, I was just referring to modern editors such as Sublime or VSCode, or even xed. Not sure what utility you would use that runs in DOS.

World's foremost 486 enjoyer.

Reply 6 of 10, by DaveDDS

User metadata
Rank Oldbie

I do have a little DOS tool I wrote years ago to do adjustments like this to text files.

Convert Text File

use: CTF infile outfile[or .] [options]

opts: -Nc - Don't output char 'c'
c1=c2 - change char c1 to be c2
c1-c2=c3 - "" from c1 to c2 to be c3
c1-c2=c3+ - "" (and increment)

character values:
'x' `x` "x" = character matching x
n.. = character with hex value: nnn
%n=binary @n=octal .n=decimal $n=hexadecimal (default)

outfile = . means write to console

eg: ctf file . '$'='?' all '$'s changed to '?'s
ctf file . 'a'-'z'='A'+ lowercase converted to upper
ctf f1 f2 -N00 remove 00 chars
ctf f1 f2 00-1F=0 7F-FF=0 0A=0A 0D=0D -n0 remove all non-printable ASCII
^-----^----- replace LF and CR removed by 00-1F

If it would help, I could dig it out, clean it up a bit, and make it available.

- Dave ; https://dunfield.themindfactory.com ; "Daves Old Computers" ; SW dev addict best known:
ImageDisk: rd/wr ANY floppy PChardware can ; Micro-C: compiler for DOS+ManySmallCPU ; DDLINK: simple/small FileTrans(w/o netSW)via Lan/Lpt/Serial

Reply 7 of 10, by BinaryDemon

User metadata
Rank Oldbie

I've seen Linux commands that will strip out single characters too, but I was hoping for something that would basically force CP437 compliance: delete all unsupported characters.
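That “force CP437 compliance” idea is easy to sketch on the modern side of things with Python’s codec error handlers, assuming the input has already been decoded to text (the function name here is hypothetical):

```python
# Force text down to CP437, silently dropping anything the code page
# can't represent. Use errors="replace" instead to get '?' placeholders
# rather than deletions.

def force_cp437(text: str) -> bytes:
    return text.encode("cp437", errors="ignore")

# The curly apostrophe (U+2019) simply disappears:
print(force_cp437("don\u2019t"))  # b'dont'
```

The downside, as this example shows, is that dropping characters outright can mangle words; replacing or transliterating them is usually kinder to the reader.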

Reply 8 of 10, by cookertron

User metadata
Rank Member

This 16-bit DOS command-line tool will remove all non-DOS characters from a text file (of any size).

The attachment txtclean.zip is no longer available

usage:
txtclean input.txt output.txt

Asus P5A v1.06, Gigabyte GA-6BXDS, Soyo SY-5EMA (faulty), Viglen 486, Asus SP97-V

Reply 9 of 10, by jakethompson1

User metadata
Rank l33t
BinaryDemon wrote on Today, 17:10:
Iconv seemed like a great option for me until I started going down the rabbit hole. I’m using TinyCoreLinux which is probably th […]

It's unfortunate that it doesn't work, as it (and Kermit, which maybe deserves a second look since it has a native DOS version) is smart enough to do things like substitute curly “ and ‘ with ASCII equivalents rather than just stripping them out.
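That substitution behavior can be crudely approximated in Python with str.translate, run before the CP437 step. The table below is only an illustrative subset of the “smart” punctuation you’d want to map:

```python
# Map common "smart" punctuation to plain ASCII before converting to
# CP437, so curly quotes become readable instead of being stripped or
# turned into '?'. A very crude imitation of what iconv's //TRANSLIT
# or Kermit's character-set translation does.

SMART_TO_ASCII = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2026: "...",  # horizontal ellipsis
}

def asciify(text: str) -> str:
    return text.translate(SMART_TO_ASCII)

print(asciify("\u201cDon\u2019t\u201d \u2013 fine"))  # "Don't" - fine
```

Anything not in the table passes through untouched, so this composes cleanly with a later strip-or-replace pass for the remaining non-CP437 characters.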

Reply 10 of 10, by BinaryDemon

User metadata
Rank Oldbie
cookertron wrote on 35 minutes ago:
This 16-bit DOS command line tool will remove all none DOS characters from a text file (of any size). […]

This works perfectly, thanks! The first document I tested it on, it made 2114 corrections, and the resulting document is much easier to read.