VOGONS


First post, by doshea

I've reverse-engineered enough of the HyperReader!/HyperWriter! version 3 (.HW3) file format to generate a fairly high-fidelity HTML representation of the contents of the later PC-SIG CD-ROMs. I've written Python code which converts HW3 to XML, and XSLT which converts that XML to HTML with CSS. None of what I've written is intended to be retro: I'm not trying to use old versions of CSS, XSLT, etc., although my knowledge is dated, so they're probably a little retro anyway! I plan to release the conversion code on GitHub, but I want to tidy it up first and was hoping for some advice from people who know more about these more modern technologies. I'm hoping that since this is a retro project and I'm asking on a retro forum, people might have more sympathy for my clueless questions than if I were asking on Stack Overflow or something 😁

Schema questions:

Out of laziness I haven't written an XML schema (I don't know how, and I don't know which type of schema I should write if I do), and I plan to release the code without one initially. I still want the generated XML to look okay, and for the unwritten schema to seem "normal", if that's not too hard. I took a look at https://www.liquid-technologies.com/xml-schem … xsd-conventions and wanted to check whether these are normal conventions that are typically followed:

All Element and attributes should use Upper Camel Case (UCC), e.g. (PostalAddress), and should avoid hyphens, spaces or other syntax.

Is this a common convention or just the opinion of the author? I'm mostly familiar with DocBook XML which just runs words together and keeps everything in lowercase, e.g. it uses "linkend" not "LinkEnd". I actually used hyphens to separate words, but it should be pretty easy to change.
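If the hyphenated names do need to change, the rename can be done mechanically. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the converter's actual code. The helper names `to_upper_camel` and `rename_tree` are hypothetical, as are the sample names `hw-doc` and `tab-stop`, and it assumes no namespaces are in use.

```python
import xml.etree.ElementTree as ET


def to_upper_camel(name: str) -> str:
    """Turn 'tab-stop' into 'TabStop'; a name without hyphens is just capitalised."""
    return "".join(part[:1].upper() + part[1:] for part in name.split("-"))


def rename_tree(root: ET.Element) -> None:
    """Rename every element and attribute under root, in place."""
    for elem in root.iter():
        elem.tag = to_upper_camel(elem.tag)
        renamed = {to_upper_camel(key): value for key, value in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(renamed)


# Hypothetical input: these names are made up for illustration.
doc = ET.fromstring('<hw-doc><para tab-stop="3">x</para></hw-doc>')
rename_tree(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Note that a run-together name such as "linkend" would become "Linkend", not "LinkEnd", since there is no hyphen to mark the word boundary.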

Names should not include the name of the containing structure, e.g. CustomerName should be Name within the parent element Customer.

As above, is this something that is generally done? If so, what are you supposed to do if, for example, you had Customer and Company elements which both had sub-elements for their names, but the valid attributes differed between a customer's name and a company's name? Even if an XML schema can indicate that the valid attributes differ depending on the parent (I have no idea whether it can), it seems confusing to me.

Avoid the use of mixed content.

I'm using XML to mark up a document, e.g.:

<para style="6"><link type="text" number="11">BBS/Co<anchor number="14" unknown="0" unknown2="0" />mmunications/Networks</link></para>

I assume it's reasonable to use mixed content in this case? Is there something better I should be doing?

(and yes, I'm aware that when translating that to HTML, I can't actually nest the A elements that are called for by that sample XML; I've already dealt with that)
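For anyone wanting to experiment with that sample, here is a minimal sketch of how its mixed content looks to Python's standard `xml.etree.ElementTree`: the text before a child element is in the parent's `.text`, and the text after a child is in that child's `.tail`. The `flatten` helper is hypothetical and is not taken from the converter.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<para style="6"><link type="text" number="11">BBS/Co'
    '<anchor number="14" unknown="0" unknown2="0" />'
    'mmunications/Networks</link></para>'
)


def flatten(elem):
    """Yield (kind, value) pairs in document order, including interleaved text."""
    yield ("start", elem.tag)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from flatten(child)
        if child.tail:  # text that follows the child inside the same parent
            yield ("text", child.tail)
    yield ("end", elem.tag)


root = ET.fromstring(SAMPLE)
events = list(flatten(root))
for kind, value in events:
    print(kind, value)
```

The anchor splits the link text into two runs, "BBS/Co" and "mmunications/Networks", and both survive with their order intact.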

CSS question:

These documents contain what are effectively tab characters and tab stops, but they seem to behave differently from tabs in most word processors. In a word processor, a tab advances the output position to the next tab stop after the current position. Here, the Nth tab on a line advances the output position to the Nth tab stop if that stop is ahead of the current position; otherwise the output position is unchanged. I think that might make them a bit easier to handle, but I have no idea how to implement this in CSS. Perhaps the JavaScript solution at https://stackoverflow.com/a/6311838/251448 would be a good starting point, but I'm not sure. I'd be interested to hear whether anyone has any suggestions.
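To pin down the rule, here is a minimal Python sketch of the tab behaviour as described above. It is not an answer to the CSS question and not the original program's algorithm. It assumes tabs appear as "\t" in the text, that tab stops are a list of positions in the same units as the text width, that width can be approximated by character count (only true for a monospaced font), and that a tab with no corresponding stop leaves the position unchanged. The function name `segment_positions` is hypothetical.

```python
def segment_positions(line, tab_stops, width=len):
    """Return (start_position, text) for each tab-separated segment of a line.

    The Nth tab on the line moves the output position to the Nth tab stop,
    but only if that stop is ahead of the current position.
    """
    pos = 0
    placed = []
    for n, segment in enumerate(line.split("\t")):
        if n > 0:  # this segment follows the Nth tab on the line
            # Assumption: a tab without a matching stop does nothing.
            if n <= len(tab_stops) and tab_stops[n - 1] > pos:
                pos = tab_stops[n - 1]
        placed.append((pos, segment))
        pos += width(segment)
    return placed


print(segment_positions("ab\tcd\tef", [10, 20]))
print(segment_positions("a long first column\tx", [5, 30]))
```

In the second call the first segment already extends past the first stop, so the tab does not move the position and "x" follows directly. Positions computed this way could, for instance, be written out as per-segment offsets in the generated HTML, though whether that suits the real documents is untested.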

WordPerfect Magazine

In case anyone is interested, WordPerfect Magazine was also released in the .HW3 format. I can parse and convert those files to XML too, but the HTML conversion needs a lot more work and I'm deferring that for now.

Thanks in advance for any advice and assistance!