VOGONS


First post, by BinaryDemon

User metadata
Rank Oldbie

So this post might not exactly belong here because I’m not necessarily looking for a DOS utility (but it could be!).

My issue is probably with my searching - is there a more accurate wording for ‘DOS-compatible plain text’?

Specifically, my problem is setting up DOS for reading e-books. Most sources, even in plain text, contain characters that DOS displays improperly - the biggest offenders seem to be the right and left single quotation marks. Reading e-books on DOS is already a challenge due to line-length and character constraints; throwing in a bunch of garbage characters detracts further.

Currently I use something like Notepad in Windows to prepare the text files for DOS - just searching and replacing the offending characters - but it’s very manual. What do you guys recommend?

Thanks in advance!

Reply 1 of 10, by keenmaster486

User metadata
Rank l33t

Those ebooks are probably UTF-8.

DOS uses code page 437, an extension of ASCII: https://en.wikipedia.org/wiki/Code_page_437

Typically, text editors designed for programming can convert between character encoding schemes.

World's foremost 486 enjoyer.

Reply 2 of 10, by jakethompson1

User metadata
Rank l33t

On *nix iconv is the tool for converting between character sets.

When reading old DOS README files with box-drawing characters and such, you can read them in a modern way via iconv -f cp437 -t utf-8
For the reverse, you'd swap the to/from.
There has to be a Windows/DOS equivalent or cross-compiled version out there.
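For anyone without iconv handy, both one-way conversions can be sketched in a few lines of Python, since its standard library ships a cp437 codec. This is only a rough stand-in for iconv, not its equal; the function names below are made up:

```python
# Rough Python stand-in for iconv's cp437 <-> utf-8 conversions.
# Only the standard library is used; Python has a built-in cp437 codec.

def cp437_to_utf8(data: bytes) -> bytes:
    """Like `iconv -f cp437 -t utf-8`: old DOS README -> modern text."""
    return data.decode("cp437").encode("utf-8")

def utf8_to_cp437(data: bytes) -> bytes:
    """Like `iconv -f utf-8 -t cp437`, but replaces unmappable
    characters with '?' instead of aborting (iconv would need
    //TRANSLIT or //IGNORE appended to the target for similar behavior)."""
    return data.decode("utf-8").encode("cp437", errors="replace")

# A right single quotation mark (U+2019) has no CP437 equivalent:
print(utf8_to_cp437("don\u2019t".encode("utf-8")))  # b'don?t'
```

Going the other way, CP437 box-drawing bytes such as 0xC9 0xCD 0xBB come out as the expected ╔═╗ characters.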

Converting between character sets was also a specialty of Kermit, which is all open source now.

Reply 3 of 10, by BinaryDemon

User metadata
Rank Oldbie
jakethompson1 wrote on Today, 15:29:
On *nix iconv is the tool for converting between character sets. […]

Iconv seemed like a great option for me until I started going down the rabbit hole. I’m using TinyCore Linux, which is probably the issue; iconv is not available by default, and I had to install the glibc apps and conversion tools.

But even then it spits out the error “illegal input sequence at position 0”, so I thought maybe they weren’t legitimate UTF-8.

Apparently “iconv file name” is supposed to tell you the encoding, but I get the same error: illegal input sequence at position 0. It’s a shame I can’t just force a conversion without knowing the original format.

Maybe I’ll download a more standard live distro and try this again later.
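For what it’s worth, iconv doesn’t really detect encodings, but a few lines of Python can at least say whether a file’s bytes are plain ASCII, valid UTF-8, or neither. A minimal sketch (the function name is made up); single-byte encodings like cp1252 can’t be told apart this way, since any byte sequence is “valid” in them:

```python
# Cheap encoding check: ASCII, UTF-8, or neither?

def guess_encoding(data: bytes) -> str:
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown (maybe cp1252/latin-1)"

print(guess_encoding(b"plain text"))                 # ascii
print(guess_encoding("don\u2019t".encode("utf-8")))  # utf-8
# 0x92 is cp1252's curly apostrophe; it's a stray continuation byte in UTF-8:
print(guess_encoding(b"don\x92t"))                   # unknown (maybe cp1252/latin-1)
```

An “illegal input sequence at position 0” from iconv when converting from UTF-8 often means the file is actually in one of those single-byte encodings.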

Reply 4 of 10, by BinaryDemon

User metadata
Rank Oldbie
keenmaster486 wrote on Today, 15:21:

Those ebooks are probably UTF-8.

DOS uses ASCII codepage 437: https://en.wikipedia.org/wiki/Code_page_437

Typically, text editors designed for programming can convert between character encoding schemes.

Do you have an example of one? The issue I’m hitting with most DOS word processors is that some of these files are too large, and I’d rather not break them into smaller pieces.

Reply 5 of 10, by keenmaster486

User metadata
Rank l33t

Oh, I was just referring to modern editors such as Sublime or VSCode, or even xed. Not sure what utility you would use that runs in DOS.

World's foremost 486 enjoyer.

Reply 6 of 10, by DaveDDS

User metadata
Rank Oldbie

I do have a little DOS tool I wrote years ago to do adjustments like this to text files.

Convert Text File

use: CTF infile outfile[or .] [options]

opts: -Nc - Don't output char 'c'
c1=c2 - change char c1 to be c2
c1-c2=c3 - "" from c1 to c2 to be c3
c1-c2=c3+ - "" (and increment)

character values:
'x' `x` "x" = character matching x
n.. = character with hex value: nnn
%n=binary @n=octal .n=decimal $n=hexadecimal (default)

outfile = . means write to console

eg: ctf file . '$'='?' all '$'s changed to '?'s
ctf file . 'a'-'z'='A'+ lowercase converted to upper
ctf f1 f2 -N00 remove 00 chars
ctf f1 f2 00-1F=0 7F-FF=0 0A=0A 0D=0D -n0 remove all non-printable ASCII
^-----^----- replace LF and CR removed by 00-1F

If it would help, I could dig it out, clean it up a bit, and make it available.

- Dave ; https://dunfield.themindfactory.com ; "Daves Old Computers" ; SW dev addict best known:
ImageDisk: rd/wr ANY floppy PChardware can ; Micro-C: compiler for DOS+ManySmallCPU ; DDLINK: simple/small FileTrans(w/o netSW)via Lan/Lpt/Serial

Reply 7 of 10, by BinaryDemon

User metadata
Rank Oldbie

I've seen Linux commands that will strip out single characters too, but I was hoping for something that would basically force CP437 compliance: delete all unsupported characters.
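That “force CP437 compliance” idea is easy to sketch on the modern side of things with Python’s codec error handlers, assuming the input has already been decoded to text (the function name here is hypothetical):

```python
# Force text down to CP437, silently dropping anything the code page
# can't represent. Use errors="replace" instead to get '?' placeholders
# rather than deletions.

def force_cp437(text: str) -> bytes:
    return text.encode("cp437", errors="ignore")

# The curly apostrophe (U+2019) simply disappears:
print(force_cp437("don\u2019t"))  # b'dont'
```

The downside, as this example shows, is that dropping characters outright can mangle words; replacing or transliterating them is usually kinder to the reader.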

Reply 8 of 10, by cookertron

User metadata
Rank Member

This 16-bit DOS command-line tool will remove all non-DOS characters from a text file (of any size).

The attachment txtclean.zip is no longer available

usage:
txtclean input.txt output.txt

Asus P5A v1.06, Gigabyte GA-6BXDS, Soyo SY-5EMA (faulty), Viglen 486, Asus SP97-V

Reply 9 of 10, by jakethompson1

User metadata
Rank l33t
BinaryDemon wrote on Today, 17:10:
Iconv seemed like a great option for me until I started going down the rabbit hole. I’m using TinyCoreLinux which is probably th […]

It's unfortunate that it doesn't work, as it (and Kermit, which maybe deserves a second look since it has a native DOS version) is smart enough to do things like substitute curly “ and ‘ with ASCII equivalents rather than just stripping them out.
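That substitution behavior can be crudely approximated in Python with str.translate, run before the CP437 step. The table below is only an illustrative subset of the “smart” punctuation you’d want to map:

```python
# Map common "smart" punctuation to plain ASCII before converting to
# CP437, so curly quotes become readable instead of being stripped or
# turned into '?'. A very crude imitation of what iconv's //TRANSLIT
# or Kermit's character-set translation does.

SMART_TO_ASCII = {
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2013: "-",    # en dash
    0x2026: "...",  # horizontal ellipsis
}

def asciify(text: str) -> str:
    return text.translate(SMART_TO_ASCII)

print(asciify("\u201cDon\u2019t\u201d \u2013 fine"))  # "Don't" - fine
```

Anything not in the table passes through untouched, so this composes cleanly with a later strip-or-replace pass for the remaining non-CP437 characters.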

Reply 10 of 10, by BinaryDemon

User metadata
Rank Oldbie
cookertron wrote on 35 minutes ago:
This 16-bit DOS command line tool will remove all none DOS characters from a text file (of any size). […]

This works perfectly, thanks! The first document I tested it on, it made 2114 corrections, and the resulting document is much easier to read.