VOGONS


EXMS86 (XMS for your 8086)


Reply 60 of 89, by wierd_w

Rank Oldbie

1) Only needs to be considered when an XMS program requests a locked address.

2) This is already an issue when an XMS function is called at the same time as an EMS one. Logically, this XMS emulator allocates these EMS pages; as far as the EMS driver is concerned, it already owns them and is already paging them in and out as needed to satisfy requests. The only magic here would be a clever misuse of virtual addresses, such that 'any above-1 MB location with a requested lock' is allocated from 'clever' locations that 'just happen' to become the address of the page frame when the upper bits are omitted. There's 4 GB of logical address space, so being 'wasteful' to pull off this trick is maybe OK.
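A minimal Python sketch of that aliasing trick, assuming a hypothetical page frame at segment D000h and 8086-style truncation to 20 address bits. The names are made up for illustration; this is not EXMS86 code. Each fake lock address burns a whole 1 MB window, which is where the 'wasteful' part comes in.

```python
PAGE_FRAME_SEG = 0xD000                 # hypothetical EMS page frame segment
PAGE_FRAME_BASE = PAGE_FRAME_SEG << 4   # physical address 0xD0000
ADDRESS_SPACE = 1 << 32                 # 4 GB of logical address space

def aliased_lock_address(n: int) -> int:
    """Return the n-th fake 'locked' address above 1 MB.

    Every returned address carries the page frame base in its low 20 bits,
    so dropping the upper bits lands exactly on the page frame.
    """
    addr = ((n + 1) << 20) | PAGE_FRAME_BASE
    if addr >= ADDRESS_SPACE:
        raise ValueError("no 1 MB windows left below 4 GB")
    return addr

def truncate_to_8086(addr: int) -> int:
    """What an 8086 sees: only the low 20 address bits survive."""
    return addr & 0xFFFFF

# One 1 MB window per lock request: 4095 windows fit between 1 MB and 4 GB.
MAX_LOCKS = (ADDRESS_SPACE >> 20) - 1
```

With these assumptions, `aliased_lock_address(0)` is 0x1D0000, and truncating any returned address gives 0xD0000.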

Reply 61 of 89, by digger

Rank Oldbie
mateusz.viste wrote on 2025-07-30, 18:14:

It would be nice to know which applications exactly need a kludge in the first place, i.e. applications that are 8086-compatible AND XMS-capable AND EMS-unaware AND use DMA-over-XMS.
The "Legend of Kyrandia 2" has been mentioned earlier, but it's a 200M download so I somehow doubt it's 8086-compatible. I've checked the first part though, and it works well with EXMS86 (sound included).

As mentioned, Dune II is also 8086 compatible, XMS-capable, EMS-unaware and highly likely also uses DMA-over-XMS, since the installer mentions that digital sound effects will only be available if (enough) XMS memory is present in the system. Dune II is also a game by Westwood Studios, like Legend of Kyrandia 2, but it's a lot smaller than 200M. If I'm not mistaken, the installer is only about 5MB or so.

Dune II uses Miles ADV drivers (but can be swapped with DIGPAK drivers). That might help you debug exactly how digital audio playback is performed in the game.

Oh wait, I just realized that the game likely does not use DMA-over-XMS, if you can swap the digital audio driver with a driver that plays back the digital audio on a non-DMA sound device, which I described successfully doing in that linked thread. 🤔

Reply 62 of 89, by mateusz.viste

Rank Member
digger wrote on 2025-07-31, 08:31:

As mentioned, Dune II is also 8086 compatible, XMS-capable, EMS-unaware and highly likely also uses DMA-over-XMS, since the installer mentions that digital sound effects will only be available if (enough) XMS memory is present in the system. Dune II is also a game by Westwood Studios, like Legend of Kyrandia 2, but it's a lot smaller than 200M. If I'm not mistaken, the installer is only about 5MB or so.

I tried Dune 2. Without EXMS86 it plays music and sfx, but no speech. With EXMS86 it plays music, sfx and speech (music is disabled during the intro, but I guess this is normal, it's either music or speech there; later in the game the music works fine). I tested this in 86Box with an emulated 8088, a LoTech card and a Sound Blaster Pro card. I had to set the SB card to IRQ 7; for some reason speech wasn't working when the card was set to IRQ 5 (surely unrelated to EXMS86).

The game is totally playable on an 8088, very cool.

digger wrote on 2025-07-31, 08:31:

Oh wait, I just realized that the game likely does not use DMA-over-XMS, if you can swap the digital audio driver with a driver that plays back the digital audio on a non-DMA sound device

Either that, or the game has a graceful fallback to a non-DMA mode when the XMS driver refuses to expose the physical addresses of XMS regions.

http://mateusz.fr

Reply 63 of 89, by DosFreak

Rank l33t++
mateusz.viste wrote on 2025-07-17, 12:22:
zb10948 wrote on 2025-07-16, 21:51:

mem shows no XMS, Wolf3D shows no XMS?

Checked these two.

It appears that MS MEM doesn't even try to detect XMS on machines that are not at least a 286:
LINK REMOVED TO POSSIBLY LEAKED MICROSOFT CODE

As for Wolf3D, the starting screen doesn't actually show the amount of available memory, but rather the amount of memory that it was able to allocate. First, it looks for EMS and allocates as much as possible (i.e. everything). Then, it looks for XMS and asks the XMM driver how much memory is available, and my driver answers "0 bytes", because there is no EMS memory left that I could use to back my XMS emulation.

If you'd like to play WOLF3D with EXMS86, then you need to instruct the game not to reserve all the EMS memory. It's actually easy: just run "WOLF3D NOEMS".

Removed link to Microsoft source code. If the OP is using this code to work on their projects, then this thread will likely need to be closed.

How To Ask Questions The Smart Way
Make your games work offline

Reply 64 of 89, by wierd_w

Rank Oldbie

Was it from released dos 4 sources?

It would be period correct, and kosher.

Reply 65 of 89, by GemCookie

Rank Member

No, it was a repository that supposedly contained the MS-DOS 6.0 source code.

Gigabyte GA-8I915P Duo Pro | P4 530J | GF 6600 | 2GiB | 120G HDD | 2k/Vista/10
MSI MS-5169 | K6-2/350 | TNT2 M64 | 384MiB | 120G HDD | DR-/MS-DOS/NT/2k/XP/Ubuntu
Dell Precision M6400 | C2D T9600 | FX 2700M | 16GiB | 128G SSD | 2k/Vista/11/Arch/OBSD

Reply 66 of 89, by mateusz.viste

Rank Member
GemCookie wrote on 2025-07-31, 21:04:

No, it was a repository that supposedly contained the MS-DOS 6.0 source code.

It's on GitHub, easy to find.

GemCookie wrote on 2025-07-31, 21:04:

If the OP is using this code to work on their projects, then this thread will likely need to be closed.

You need to define "using".


Reply 67 of 89, by mkarcher

Rank l33t
digger wrote on 2025-07-30, 07:48:

That's the cool thing about this forum. I keep learning new stuff from the DOS PC era. Thanks. 🙂

Nevertheless, this thread is about EXMS86, not the design(?) of the DMA system of IBM-compatible computers, so we shouldn't derail this thread too much with off-topic posts. If you are interested in a more detailed discussion, I suggest you open a new thread if you have further questions. Feel free to send me a PM with a link if you are afraid I might miss it otherwise.

digger wrote on 2025-07-30, 07:48:

Wasn't it still the case that ISA DMA was kind of a bottleneck, especially on anything faster than a 286, since it always had to access the system RAM at 5MHz?

While you have a point in that the transfer rate of ISA DMA is quite low, and the base frequency of the DMA controller in AT computers is half the ISA clock, i.e. around 4 MHz (even less than the 4.77 MHz of the PC/XT), calling it a "bottleneck" is an exaggeration.

digger wrote on 2025-07-30, 07:48:

It's kind of disappointing that it ended up being the primary I/O method for playing back digitized audio samples on popular sound cards back in the day, causing major headaches w.r.t. DOS sound compatibility once PCs moved beyond ISA slots.

For ISA sound cards, ISA DMA is the method of choice for data transfer. Having a central multi-channel DMA controller on the mainboard instead of some DMA technique on each sound card is also a smart choice, as it makes card design easier. The issue with non-ISA sound cards is that there was no standard for how PCI cards can be the target of the central (ISA) DMA controller, which was a design choice made when PCI was specified.

digger wrote on 2025-07-30, 07:48:

If I understand correctly, from the 386 and up, this became such a bottleneck that even driving a Covox-like dumb LPT DAC directly would take up less overhead than relying on ISA DMA for digital sound playback. Or does that deserve more nuance?

This paragraph reflects a common sentiment about ISA DMA. ISA DMA clearly is deficient; in my opinion it was barely adequate for the original PC and never kept up. Yet the sentiment is exaggerated so far that one has to consider it factually false. So, let's put the rants aside and get back to the facts.

ISA DMA is inconvenient to use, as the 8237A DMA controller used in the original IBM PC was actually designed for 8080 or 8085 systems with a total address space of 64K. IBM added a "page register" that supplies four extra address bits per channel, but these page registers are not connected to the DMA controller's address counter, so a carry out of the low 16 bits never reaches them and a transfer always stays inside one 64K block. You can't do a transfer starting at physical address 60K (which is in the first 64K block) and ending at address 70K (which is in the second 64K block). The IBM mainboard BIOS rejects a floppy operation like this with error code 9 ("DMA segment overrun"). You quite likely never knew this, even if you wrote programs that directly access the floppy drive using INT 13h, because DOS installs a shim layer above the BIOS that transparently works around this limitation; so the disk operating system actually did care about making disk operations easier. As the second DMA controller in the AT is wired to address 64K words (128K bytes, but each transfer has to start at an even address), you get barriers every 128K instead of every 64K, but the issue persists, as the AT still uses the 8237A meant for 8-bit systems. If the AT hard drive controller had used a 16-bit DMA channel instead of port I/O, all hard disk transfers would have required word alignment, which is something XT software didn't care about; this is likely a contributing factor in why the AT hard drive controller did not use DMA.
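A small Python sketch of the 64K/128K barrier described above. The function names are hypothetical, and the 16-bit case assumes the caller already honors the even-address rule.

```python
def dma_registers_8bit(phys_addr: int):
    """Split a physical address for an 8-bit 8237 channel.

    The page register gets bits 16 and up; the 8237 address register
    gets the low 16 bits, which is the only part the 8237 ever increments.
    """
    return phys_addr >> 16, phys_addr & 0xFFFF

def crosses_dma_boundary(phys_addr: int, length: int, sixteen_bit: bool = False) -> bool:
    """True if the transfer would run past the end of its DMA block.

    8-bit channels are confined to 64 KB blocks; the AT's 16-bit channels
    count words, so their blocks are 128 KB. Crossing means the 8237 would
    wrap around inside the block instead of reaching the next one.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    block = 0x20000 if sixteen_bit else 0x10000
    return phys_addr // block != (phys_addr + length - 1) // block
```

The 60K-to-70K floppy example is `crosses_dma_boundary(60 * 1024, 10 * 1024)`, which returns True.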

Indeed, the clock speed of the 8237A is limited to 5 MHz; the controller is operated at 3 MHz in the original 6 MHz AT, and at 4 to 4.2 MHz in later systems. The standard configuration of the ISA DMA controller requires 4 clock cycles per transfer (plus some overhead), so at 4 MHz a transfer rate of 1 mega-transfer per second is theoretically obtainable, which is 2 MB/s with 16-bit transfers. This is faster than the transfer rate of an MFM hard drive.

Typical controllers first transferred a sector from the drive surface into controller RAM, and only after getting the signal from the hardware CRC comparator that the checksum was OK was the sector transferred from the controller to the host. The hardware CRC comparator checks the CRC while the data is read from the drive, so there is no notable latency for the CRC check, and the transfer to the host can begin immediately after reading the sector plus its CRC. (If the CRC mismatched, correction was a slow process performed on the controller, but let's omit this fringe case for now.) A desirable design uses interleave 2, which records the sectors in the order 1 - 10 - 2 - 11 - 3 - 12 - 4 - 13 - 5 - 14 - 6 - 15 - 7 - 16 - 8 - 17 - 9. This means that the data from sector 1 can be transferred from the controller to the host while sector 10 passes the drive head. Thus 50% of the time is used for reading data from the disk to the controller, and the other 50% is used to transfer data from the controller to the host. At 3600 rpm (60 revolutions per second), we can read 17 sectors in two revolutions, so 30 * 17 = 510 sectors can be read per second, yielding a net transfer rate of 261 KB/s (510 * 512 bytes). As only 50% of the time is spent on transferring data from the controller to the host, that transfer needs twice the speed, i.e. around 520 KB/s. This is way below the 2 MB/s theoretical (and likely 1.6 MB/s practical) limit of ISA DMA, so contrary to what some people say, ISA DMA was not prohibitively slow for hard drive transfers when IBM designed the AT.
Even if you used very modern MFM drives with 50% higher transfer rates, you would still end up at a required DMA rate of around 780 KB/s. While faster hard drives did exist in 1984, they were not available in a form factor or at a price point that was a good fit for the IBM AT. So while the transfer rate is limited, disregard any claim that the rates are unusable for any practical purpose.
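The interleave and rate arithmetic from the paragraphs above, reproduced as a Python sketch. It assumes 512-byte sectors and uses 1 KB = 1000 bytes, which is how the 261 KB/s figure comes out; the function names are illustrative only.

```python
SECTOR_BYTES = 512

def interleave_order(sectors, interleave):
    """Physical order of logical sectors 1..sectors around one track."""
    track = [0] * sectors
    pos = 0
    for logical in range(1, sectors + 1):
        while track[pos]:                    # slot taken: slide to the next free one
            pos = (pos + 1) % sectors
        track[pos] = logical
        pos = (pos + interleave) % sectors
    return track

def disk_rate(rpm, sectors, interleave):
    """Net bytes/s when a whole track is read in `interleave` revolutions."""
    revs_per_s = rpm / 60
    return revs_per_s / interleave * sectors * SECTOR_BYTES

def isa_dma_peak(dma_clock_hz, clocks_per_transfer=4, bytes_per_transfer=2):
    """Theoretical 8237 throughput; defaults model a 16-bit channel."""
    return dma_clock_hz / clocks_per_transfer * bytes_per_transfer
```

`interleave_order(17, 2)` reproduces the 1 - 10 - 2 - 11 ... sequence, doubling `disk_rate(3600, 17, 2)` gives the ~520 KB/s host-side rate, and `isa_dma_peak(4_000_000)` gives the 2 MB/s ceiling.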

Furthermore, it is true that DMA transfers, at their quite low rate, put a burden on the ISA bus. If you run a WSS-type card in the less efficient "single mode DMA" variant at 48 kHz 16-bit stereo (around 200 KB/s), expect ISA transfer rates to go down by 33%. This is because WSS-type cards use an 8-bit channel, and "single mode" is the least efficient of the three available modes. Basically, you have:

  • Block mode: The device requests DMA, the DMA controller obtains bus ownership and then issues as many DMA read or write cycles from/to the device as the DMA controller is set up for. It doesn't matter whether the device stops requesting DMA during the transfer; the transfer continues until the programmed number of bytes/words has been transferred.
  • Single mode: The device requests DMA, the DMA controller obtains bus ownership, transfers a single byte or word and then releases bus ownership. If the device is still requesting DMA after the bus has been released, the DMA controller enters bus arbitration again to obtain the bus for the next byte or word.
  • Demand mode: This mode is a cross-over of the previous two. If the device requests DMA, the DMA controller obtains the bus and starts sending/requesting multiple bytes/words just like in block mode, but the device can interrupt the transfer any time it likes by ceasing to request DMA. In that case, the current transfer gets finished and the DMA controller releases the bus, even if it still has some bytes to transfer.

So, assuming the device wants to transfer a certain number of bytes (or words, but ignore that for now), the DMA controller is set up for exactly that size, and the device keeps its DMA request line active all the time, it will work flawlessly with all modes, even without knowing what mode the DMA controller is in! The DMA controller is quite primitive. It cannot interrupt a transfer if a higher priority request comes in. As long as bus ownership for one DMA channel is established, the DMA controller keeps the bus assigned to that channel. This effectively means block transfers cannot be interrupted, not even for memory refresh! Also, demand mode transfers can only be interrupted if the device temporarily releases the DMA request line. Single mode transfers can be interrupted at any time. As we know that we get around 1 transfer per microsecond, the rule that you may not "own" the ISA bus for longer than 15 µs (so that memory refresh is not held off) immediately shows that transfers exceeding 15 bytes or words must not be done in block mode, and in demand mode the device needs to "cooperate" by periodically releasing its DMA request. In single mode, re-arbitration happens after every byte/word transferred, so there is no issue with RAM refresh in single mode.
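A deliberately simplified Python model of the three modes and the refresh rule, assuming 1 µs per transfer as above and ignoring the cost of each arbitration. The 8-unit burst in the example is a hypothetical FIFO refill size, not a figure for any specific chip.

```python
import math

def arbitrations(total, mode, burst=1):
    """Bus arbitrations needed to move `total` bytes/words.

    single: one arbitration per byte/word
    demand: one per burst (the device drops DREQ after `burst` units)
    block:  one for the whole programmed count
    """
    if mode == "single":
        return total
    if mode == "demand":
        return math.ceil(total / burst)
    if mode == "block":
        return 1
    raise ValueError(f"unknown mode: {mode}")

def max_units_per_ownership(limit_us=15.0, us_per_transfer=1.0):
    """Largest burst that respects the 15 µs bus-ownership limit."""
    return int(limit_us // us_per_transfer)

# WSS-style stream: 48 kHz, 16-bit, stereo, all through an 8-bit channel
WSS_BYTES_PER_S = 48_000 * 2 * 2
```

Under these assumptions, 1000 bytes cost 1000 arbitrations in single mode but only 125 in demand mode with 8-byte bursts, and bursts must stay at or below 15 units.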

WSS-type cards are meant to be used in demand mode. The AD1848 (and later compatible chips) and the SB16 have a FIFO on the card; they transfer a couple of bytes or words in demand mode until the FIFO is full, and then release the bus until the FIFO is nearly empty. This is more efficient than arbitrating for the ISA bus for every byte/word. A sound chip designed for demand mode will not notice if the DMA controller is in single mode instead, except for the worse performance delivered in single mode.

So, ISA DMA does put a clearly relevant burden on the ISA bus, which reduces the bandwidth available for writing to an ISA graphics card (which is not good for games). On the other hand, the comparison with simple LPT DACs is nonsense. Using a device like that in the background means that for every sample the processor gets interrupted, needs to save the current execution address and processor state, look up the timer interrupt handler, jump there, push some registers to the stack, output a sample to the DAC, pop the manually saved registers and then return to the interrupted task. This is clearly more overhead than even single-byte-mode DMA for 16-bit stereo sound. Things might turn against DMA if the parallel port sound device has a FIFO, so you don't need an interrupt for every single byte. The Disney Sound Source works this way, but as the parallel port usually uses 8-bit I/O at default wait states (which are usually set quite high to stay PC compatible), I don't think you can beat DMA.

Now, if EMM386 is loaded, things look even worse for non-DMA sound playback, because EMM386 is basically a virtual machine monitor (or hypervisor) that runs the DOS tasks. As the monitor program, EMM386 receives all interrupts and handles them in protected mode. When a DOS program is interrupted, the processor switches from virtual-8086 mode (a sub-mode of protected mode) to standard protected mode (which takes ~100 clock cycles) to have EMM386 handle the interrupt. EMM386 is then supposed to "forward" the interrupt to the virtualized DOS task, which means it has to switch back to virtual-8086 mode. I assume it can set up the processor in a way that returning from the interrupt (for example the timer interrupt that sends samples to a parallel-port DAC) does not require another round trip through standard protected mode. Every kind of DMA is more efficient than a high-frequency timer interrupt with EMM386 loaded.
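Some back-of-the-envelope Python for the per-sample interrupt cost. The 100-cycle ISR cost used in the example is a made-up placeholder, not a measurement; the only fixed arithmetic is the cycle budget between two samples. `extra_cycles` is a hypothetical knob for add-ons such as EMM386's mode switches.

```python
def cycles_per_sample(cpu_hz, sample_rate_hz):
    """CPU clocks available between two per-sample timer interrupts."""
    return cpu_hz / sample_rate_hz

def irq_cpu_share(cpu_hz, sample_rate_hz, cycles_per_irq, extra_cycles=0):
    """Fraction of CPU time spent in a per-sample timer ISR.

    cycles_per_irq covers entry, register saves, the OUT to the DAC and the
    return; extra_cycles can model added monitor overhead (assumed values).
    """
    return sample_rate_hz * (cycles_per_irq + extra_cycles) / cpu_hz
```

At 4.77 MHz and 22,050 Hz there are only about 216 CPU clocks between two samples, so a hypothetical 100-cycle handler would eat roughly 46% of the CPU before any EMM386 overhead is added on top.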

In the end, (non-busmaster) ISA DMA was only used in cases where high throughput wasn't required but background transfers were important. Sound cards fit this pattern perfectly. The programming model of ISA DMA is awful, and if EMM386 virtualizes RAM, it also needs to virtualize DMA, which is a very cumbersome operation. So nobody in the software industry liked ISA DMA. This explains why system designers were happy to get rid of the obsolete ISA DMA scheme when they designed PCI systems. They did not expect the use case of "ISA-compatible sound cards" to be that important. Later on, some standards (e.g. PC/PCI) were designed to give PCI sound cards access to the ISA DMA/IRQ system, and that's why PCI cards generally can't offer a nice Plug&Play experience that includes Sound Blaster compatibility.

Reply 68 of 89, by digger

Rank Oldbie

Thank you for the very extensive and informative answer, mkarcher. Your level of knowledge about this topic is quite impressive.

Reply 69 of 89, by digger

Rank Oldbie
mateusz.viste wrote on 2025-07-31, 13:09:

I tried Dune 2. Without EXMS86 it plays music and sfx, but no speech. With EXMS86 it plays music, sfx and speech (music is disabled during the intro, but I guess this is normal, it's either music or speech there - later in the game the music works fine).

It's not supposed to be "either/or" in the intro. On 286 and higher systems with sufficient XMS memory, the intro should play with music, sound effects and speech. No idea why you experienced this limitation in the intro. What version of the game did you download? And how much emulated XMS was available to the game? Maybe this happened due to a lack of conventional memory?

I tested this on 86Box with an emulation of an 8088 with a LoTech card and a Sound Blaster Pro card. I had to set the SB card to IRQ 7, for some reason speech wasn't working when the card was set to IRQ 5 (surely unrelated to EXMS86).

The game is totally playable on an 8088, very cool.

That is indeed amazing. 😄 It must run a bit on the sluggish side, though. Or is it not that bad? It might get a lot worse later in the game, with more units moving around.

Either that, or the game has a graceful fallback to a non-DMA mode when the XMS driver refuses to expose the physical addresses of XMS regions.

Maybe. But why go through the trouble of implementing such a workaround if the XMS drivers back in the day did expose those physical addresses in most cases? Or was this limitation quite common in XMS drivers?

Reply 70 of 89, by mateusz.viste

Rank Member
digger wrote on 2025-08-01, 15:11:

It's not supposed to be "either/or" in the intro. On 286 and higher systems with sufficient XMS memory, the intro should play with music, sound effects and speech. No idea why you experienced this limitation in the intro. What version of the game did you download? And how much emulated XMS was available to the game? Maybe this happened due to a lack of conventional memory?

I retested on a stronger VM (486), and speech does indeed play at the same time as music.
When I tested on the 8088 VM, it had 2MB of XMS (from an emulated LoTech 2 MB EMS board), but I think the problem is either the sound card (on the 486 I tested with an SB16, while the 8088 could only have an SB Pro) or the amount of conventional RAM. I had only ~570K of conventional RAM available, while the game's setup program said that I needed 602K... Unfortunately I wasn't able to get more on this 86Box setup without HMA.

But why go through the trouble of implementing such a workaround if the XMS drivers back in the day did expose those physical addresses in most cases? Or was this limitation quite common in XMS drivers?

I checked now - Dune2 does not even try to lock XMS regions. It does not care about the XMS version either. The only functions it calls are "query available XMS", "allocate XMS", "move XMS" and "free XMS".


Reply 71 of 89, by mkarcher

Rank l33t
mateusz.viste wrote on 2025-08-01, 21:23:

I checked now - Dune2 does not even try to lock XMS regions. It does not care about the XMS version either. The only functions it calls are "query available XMS", "allocate XMS", "move XMS" and "free XMS".

So it just uses the XMS as swap space / RAM drive to be able to load the speech samples quickly to conventional RAM when they are supposed to be played. Programs like that are the perfect use case for EXMS86.
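For reference, this is what such a "move XMS" call is handed, sketched in Python following the Extended Memory Move Structure from the XMS specification (function 0Bh, structure pointer in DS:SI). The handle, offsets and buffer segment in the usage example are hypothetical.

```python
import struct

# XMS function numbers (AH) matching the four calls Dune II makes
XMS_QUERY_FREE = 0x08
XMS_ALLOCATE = 0x09
XMS_FREE = 0x0A
XMS_MOVE = 0x0B

def real_mode_ptr(segment, offset):
    """Handle 0 means 'conventional memory': offset field holds seg:off."""
    return (segment << 16) | offset

def move_struct(length, src_handle, src_offset, dst_handle, dst_offset):
    """Pack the 16-byte move structure passed to function 0Bh.

    Layout: DWORD length, WORD source handle, DWORD source offset,
            WORD destination handle, DWORD destination offset.
    """
    if length % 2:
        raise ValueError("XMS move length must be even")
    return struct.pack("<IHIHI", length, src_handle, src_offset,
                       dst_handle, dst_offset)
```

For example, copying a 4 KB sample from offset 8000h of hypothetical handle 1 into a conventional buffer at 2000:0000 would be `move_struct(4096, 1, 0x8000, 0, real_mode_ptr(0x2000, 0))`.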

Reply 72 of 89, by Jo22

Rank l33t++

But why go through the trouble of implementing such a workaround if the XMS drivers back in the day did expose those physical addresses in most cases? Or was this limitation quite common in XMS drivers?

Did they all expose it? I mean, less popular DOSes such as ROM-DOS, X-DOS or PTS/Paragon DOS had their own Himem.sys substitutes.
Then there are the synthetic DOS environments of Unixes or OSes such as L3.
If they had XMS support, they maybe had to keep it abstract in order to not clash with the limits of the, um, sandbox.

"Time, it seems, doesn't flow. For some it's fast, for some it's slow.
In what to one race is no time at all, another race can rise and fall..." - The Minstrel

//My video channel//

Reply 73 of 89, by mkarcher

Rank l33t
Jo22 wrote on 2025-08-01, 22:33:

But why go through the trouble of implementing such a workaround if the XMS drivers back in the day did expose those physical addresses in most cases? Or was this limitation quite common in XMS drivers?

Did they all expose it?

They had to. XMS was not only used for real-mode software like disk caches or RAM drives, but also as a hardware abstraction layer for Windows 3.0 in standard mode (or other protected-mode software). Such software uses the XMS driver to allocate extended memory and lock all the blocks before entering protected mode. Then the locked physical addresses of the XMS blocks can be used to set up descriptors pointing into memory allocations managed by Windows 3.1.

This use case will break with EXMS86. Protected-mode software requires "real XMS", but as long as you only use EXMS86 on machines older than a 286, you can't run protected-mode software anyway.

Reply 74 of 89, by Jo22

Rank l33t++

They had to.

It makes sense in principle. Had this ever been checked in practice, though?
I vaguely remember that some himem.sys alternatives are Windows 3.x incompatible (also including public domain XMS managers).
The PTS/Paragon version is quite tiny and may lack certain features.
X-DOS had a compatibility switch in config.sys for making Windows 3.x run, I vaguely remember.
That being said, I often had to use Microsoft's or IBM's himem.sys to run Windows 3.1x.

Edit: I don’t mean to argue, I just wonder if there had been, um, empirical evidence collected regarding various XMS managers.


Reply 75 of 89, by mateusz.viste

Rank Member

Introducing EXMS86 v0.9.4 - Now your 8088 PC can enjoy the benefits of XMS instead of EMS. With this release, you can let EXMS86 disable EMS support and repurpose the page frame into a valuable 64K of UMB. On my virtual 86Box setup, running an 8088 with an emulated LoTech EMS card, I was able to get to 622K of available conventional RAM.


Reply 76 of 89, by mkarcher

Rank l33t
mateusz.viste wrote on 2025-08-06, 14:30:

With this release, you can let EXMS86 disable EMS support and repurpose the page frame into a valuable 64K of UMB.

That is a great idea, but while using the EMS page frame from interrupt handlers is a clear specification violation, accessing UMBs from interrupt handlers is standard practice. I checked the EXMS source code and it seems you do not disable interrupts while you remap the page frame (and temporarily make the UMB inaccessible). I wonder if this will cause issues in practice. My gut feeling says that many TSRs hook hardware interrupts and thus cannot be loaded high with EXMS86; on the other hand, an XT system does not have that many components that generate interrupts and need a DOS driver (the mouse might be the most common), so you might get away with it.

Reply 77 of 89, by mateusz.viste

Rank Member
mkarcher wrote on 2025-08-06, 21:41:

I checked the EXMS source code and it seems you do not disable interrupts while you remap the page frame (and temporarily make the UMB inaccessible). I wonder if this will cause issues in practice

You are totally right. Perhaps I should disable interrupts to be on the safe side. I did think about it for a brief moment but decided not to in 0.9.4 so I can assess how it works in practice. The problem is that XMS transfers may be quite large and hence take a long time on an XT (imagine a program transferring 1 MB of assets from one XMS region to another over the 8-bit ISA bus...). This might make the system clock fall behind and possibly cause other inconveniences. A workaround would be to pause transfers every now and then, restore the page frame to its original state and enable interrupts for a couple of cycles. But this would come at a performance cost. Another problem is that I am not sure if all EMS cards (and their drivers) work properly with interrupts disabled.

Another option is to only use the higher 32K of the page frame for UMBs. This is totally safe because EXMS86 never touches this part of the page frame, it only plays with the two lower pages. So this might be the best compromise - plus it requires no code change, just a documentation update.
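The arithmetic behind that compromise, as a Python sketch assuming a hypothetical page frame at segment D000h and that EXMS86 only remaps physical pages 0 and 1, as stated above.

```python
PAGE_SIZE_PARAS = 0x400      # one 16 KB EMS page, in 16-byte paragraphs
PAGES_IN_FRAME = 4           # a 64 KB page frame holds four pages
EXMS86_PAGES = (0, 1)        # per the post: only the two lower pages are remapped

def safe_umb_range(frame_seg):
    """Return (first_segment, size_in_paragraphs) of the frame part EXMS86 never touches.

    Assumes the untouched pages are contiguous, which holds for pages 2 and 3.
    """
    untouched = [p for p in range(PAGES_IN_FRAME) if p not in EXMS86_PAGES]
    first = frame_seg + untouched[0] * PAGE_SIZE_PARAS
    return first, len(untouched) * PAGE_SIZE_PARAS
```

With a D000h frame, the untouched part starts at D800h and is 800h paragraphs (32 KB) long.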


Reply 78 of 89, by mkarcher

Rank l33t
mateusz.viste wrote on 2025-08-06, 22:16:
mkarcher wrote on 2025-08-06, 21:41:

I checked the EXMS source code and it seems you do not disable interrupts while you remap the page frame (and temporarily make the UMB inaccessible). I wonder if this will cause issues in practice

A workaround would be to pause transfers every now and then, restore the page frame to its original state and enable interrupts for a couple of cycles. But this would come at a performance cost. Another problem is that I am not sure if all EMS cards (and their drivers) work properly under disabled ints.

Another option is to only use the higher 32K of the page frame for UMBs. This is totally safe because EXMS86 never touches this part of the page frame, it only plays with the two lower pages. So this might be the best compromise - plus it requires no code change, just a documentation update.

Both are interesting options. The second option is by far the easier one, but it comes at the cost of 32KB of UMBs.

You correctly identified the issues of the first option; I thought about them as well. I wouldn't expect any EMS driver to require interrupts to be enabled to remap the page frame, so I expect them to work even if interrupts are disabled. On the other hand, I don't know whether you can rely on the EMS driver keeping interrupts disabled all the time; a driver that executes STI would undo your CLI instruction. So my suggestion for the first route would be to disable interrupts at the PIC (I think I wouldn't care about the NMI, or even better, keep it enabled to catch parity errors during the block move), which avoids issues if the EMS driver uses CLI/STI pairs. For performance optimization, you might want to check whether (a) interrupts were enabled at the CPU when the XMS driver was called and (b) the PIC reports pending interrupt requests in the IRR (interrupt request register). If you do that check for every chunk you transfer, it won't hurt performance noticeably, as checking the IRR after selecting it once in the setup code is just a single port read. I didn't check your logic, but given that you say you are using two pages, I assume you get two chunks per 16KB to deal with alignment.
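A rough Python sketch of the PIC-masking route, with port I/O replaced by a fake object so the control flow can be exercised anywhere. Ports 20h/21h and the OCW3 value 0Ah (select IRR for reads of port 20h) are the standard master 8259 ones; everything else (`FakePorts`, `chunked_move`, the chunk representation) is hypothetical, and the real page-frame restore and STI/CLI window are only indicated in comments.

```python
MASTER_CMD, MASTER_DATA = 0x20, 0x21
OCW3_READ_IRR = 0x0A

class FakePorts:
    """Hypothetical stand-in for IN/OUT on the master PIC."""
    def __init__(self, imr=0x00, irr=0x00):
        self.imr, self.irr = imr, irr

    def outb(self, port, value):
        if port == MASTER_DATA:
            self.imr = value
        # OCW3 writes to MASTER_CMD only select IRR/ISR; this model always returns IRR

    def inb(self, port):
        return self.imr if port == MASTER_DATA else self.irr

def chunked_move(ports, chunks, copy_chunk, caller_if_set):
    """Copy chunks with IRQ0-7 masked, opening a window when an IRQ is pending."""
    saved_imr = ports.inb(MASTER_DATA)
    ports.outb(MASTER_CMD, OCW3_READ_IRR)    # later reads of port 20h return the IRR
    ports.outb(MASTER_DATA, 0xFF)            # mask everything; an EMS driver's STI can't leak IRQs
    windows = 0
    for chunk in chunks:
        copy_chunk(chunk)                    # real driver: map EMS pages, REP MOVSB
        pending = ports.inb(MASTER_CMD) & ~saved_imr & 0xFF
        if caller_if_set and pending:
            # real driver: restore the page frame, unmask, STI, short pause, CLI
            ports.outb(MASTER_DATA, saved_imr)
            ports.outb(MASTER_DATA, 0xFF)
            windows += 1
    ports.outb(MASTER_DATA, saved_imr)       # leave the PIC as we found it
    return windows
```

Masking only the master PIC also covers an AT, because every interrupt from the slave PIC enters the master through IRQ2, so a pending slave interrupt shows up as IRR bit 2 on the master.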

I remember watching my father troubleshoot random crashes of our AT clone running DR DOS 5.0 and the disk cache shipped with DR DOS. He found out that the issue was caused by interrupts arriving at an inopportune time during the transfer (I no longer remember the specifics of "inopportune"; it might be related to the A20 gate and how DR DOS handled the HMA), and he developed a fix-up tool that masked the interrupts at the PIC during every INT 15h call to move memory. While MS-DOS 5.0 HIMEM uses LOADALL for XMS transfers on a 286, DR DOS 5.0 HIMEM uses INT 15h. While INT 15h transfers memory in protected mode, interrupts are not delivered anyway, so masking them for a slightly longer time, including the switch into and out of protected mode and possibly the restoration of the A20 state, didn't add significant extra latency.

Reply 79 of 89, by mateusz.viste

Rank Member
mkarcher wrote on Yesterday, 06:20:

Both are interesting options. The second option is by far the easier one, but it comes at the cost of 32KB of UMBs.

Indeed. But no int drama, no performance hit, and no code overhead. These are very compelling arguments to me for sacrificing 32K of UMB. :-P

On the other hand, XMS drivers are expected to disable interrupts during transfers anyway. The XMS 2.0 spec even explicitly states it: "(...) function is guaranteed to provide a reasonable number of interrupt windows during long transfers". I have to think about it. I also don't know whether an application might call XMS with interrupts already disabled and expect them to stay disabled once the XMS transfer returns (?).

mkarcher wrote on Yesterday, 06:20:

So my suggestion for the first route would be to disable interrupts at the PIC (I think I wouldn't care about the NMI, or even better, keep it enabled to catch parity errors during block move), which will avoid issues if the EMS driver uses CLI/STI pairs. For performance optimization, you might want to check whether (a) interrupts were enabled at the CPU when the XMS driver was called and (b) the PIC reports interrupt requests in the IRR (interrupt request register).

Accessing the PIC directly is a very neat idea; I had not thought about it. I am a bit afraid of it, though, as it would require more extensive testing on real hardware since it is murky territory to me. Also, there could be two PICs (on AT+), which would lead to more doubts, questions and testing. I'd rather keep EXMS86 "simple" if possible so the code-flow does not follow too many different paths.

mkarcher wrote on Yesterday, 06:20:

given that you say you are using two pages, I assume you get two chunks per 16KB to deal with alignment.

It depends on the requested XMS offsets, but yes: in the worst case I transfer three chunks at the start (to deal with page alignment for src and dst), and then only 1 chunk for every further page.
EXMS86 maps two pages in the page frame: one for the src, the other for the dst. An optimization would be to map all 4 pages when the requested transfer is large enough, but it would make the logic more complicated and I wouldn't expect much of a performance gain.
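A Python sketch of such chunking, under the assumption that each chunk may use exactly one mapped 16 KB page for the source and one for the destination. It is a simple stand-in with hypothetical names, not EXMS86's actual logic.

```python
PAGE = 16 * 1024   # one EMS page

def plan_chunks(src, dst, length):
    """Split a move so that no chunk crosses a 16 KB page on either side.

    src and dst are linear offsets into the EMS-backed XMS space; every
    returned (src, dst, n) chunk fits in one source page and one dest page.
    """
    chunks = []
    while length:
        n = min(length, PAGE - src % PAGE, PAGE - dst % PAGE)
        chunks.append((src, dst, n))
        src, dst, length = src + n, dst + n, length - n
    return chunks
```

If src and dst share the same offset within a page, this yields one chunk per page; with different offsets the two sets of page boundaries interleave and you get roughly two chunks per page.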

Speaking of performance: EXMS86 uses rep movsb to copy bytes. Using rep movsw could be faster, but again, that means more complexity to deal with odd addresses and odd lengths (the XMS spec says that the requested transfer length must be even, but do all applications comply?), and performance would not be better on an 8088 anyway (and not even on an 80286 if the EMS board is 8-bit). So it's the same theme again: giving up some capabilities/performance in the name of simplicity and easier testability.
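A crude Python model of the movsb/movsw trade-off, counting only memory bus cycles (one read plus one write per unit) and assuming even addresses. It ignores instruction fetch, prefetch queue effects and wait states, so treat it as an illustration of the 8-bit bus argument, not a benchmark; the function names are hypothetical.

```python
def movs_plan(length):
    """REP MOVSW for the whole words, then one MOVSB for an odd trailing byte."""
    return length // 2, length & 1

def bus_cycles(length, bus_width_bytes, use_words):
    """Rough count of memory bus cycles for a copy of `length` bytes.

    On an 8-bit bus a word access is split into two byte cycles, so
    MOVSW ends up with the same cycle count as MOVSB there.
    """
    if use_words and bus_width_bytes >= 2:
        words, tail = movs_plan(length)
        return 2 * (words + tail)
    return 2 * length
```

For a 1001-byte copy the model gives 2002 cycles either way on an 8-bit bus, and 1002 with MOVSW on a 16-bit bus.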

mkarcher wrote on Yesterday, 06:20:

I remember watching my father troubleshoot random crashes of our AT clone running DR DOS 5.0 and the disk cache shipped with DR DOS. He found out that the issue was caused by interrupts arriving at an inopportune time during the transfer

Sounds like a very cool childhood! I myself didn't have electricity until I was ca. 10 years old, and even then computing was a science-fiction thing I only heard about in American movies, so I had to catch up with all of this much later. :-P
