VOGONS


First post, by OCTAGRAM

Rank Newbie

How should a DOSBox-aware DOS program implement a delay function? It should neither consume CPU time nor depend on CPU frequency.

I have heard that there is an HLT instruction that halts the CPU until an interrupt occurs. Maybe one could install an IRQ 0 handler and then execute HLT. The interrupt handler would wake the CPU up when the sleep is over.

Are there other, easier ways as of DOSBox 0.72?

Reply 1 of 18, by wd

Rank DOSBox Author

HLT does NOT idle in dosbox, it's a (very) busy loop.

The only way to get some sort of cpu-friendly idling is to set the cycles
very low and have a regular busy loop in your dos code.

Reply 3 of 18, by OCTAGRAM

Rank Newbie

I have found two ways to implement a delay:

1) mov ax,1680h ; int 2fh
2) mov ah,86h ; int 15h

The first one works in NTVDM and is also said to work fine in DOSEmu (decreasing CPU usage from 90% to 0.1%). However, it does not seem to work in DOSBox.

The second one works fine in DOSBox. I see that this interrupt is implemented internally, so I can hope for a performance gain. I can't say anything for sure about performance, but at least the delay times do not drift depending on CPU usage by other programs. The second one doesn't work outside of DOSBox 🙁

Reply 12 of 18, by OCTAGRAM

Rank Newbie

Well, at least it was not uncommon in my games 😀

INT 15/AH=86 has indeed a custom implementation (handled by DOSBox):

http://dosbox.cvs.sourceforge.net/viewvc/dosb … .73&view=markup

It invokes CALLBACK_Idle();

CALLBACK_Idle is implemented here:

http://dosbox.cvs.sourceforge.net/viewvc/dosb … .40&view=markup

if (!CPU_CycleAutoAdjust && CPU_Cycles>0)
CPU_Cycles=0;

These lines are probably the optimisation, but I didn't notice any decrease in CPU usage (maybe it wasn't implemented in 0.72 yet? After all, 0.72 is two years old).

My Internet research shows that those who patch RTLs patch them with the INT 2F/AX=1680 method. So will I.

Another alternative to tweaking CPU cycles would be to tweak priority instead. Putting DOSBox.exe into idle priority serves the same aim: it prevents DOSBox from disturbing other programs.

Reply 14 of 18, by OCTAGRAM

Rank Newbie

I have done some measurements on Mac OS X Intel, DOSBox 0.72, core=dynamic, cycles=max.

The red color is kernel CPU usage,
the green color -- user programs (mostly DOSBox)
the blue color -- niced programs (mostly Folding@Home)

The DOSBox process is highlighted on screenshots.

Code:

uses CRT, DOS;

{ CRT.Delay with anti-200 patch }

var Hour, Minute, Second, Sec100 : Word;

begin
repeat
GetTime(Hour, Minute, Second, Sec100);
WriteLn(Hour : 2, ':', Minute : 2, ':', Second : 2, '.', Sec100 : 2);
Delay(1000);
until KeyPressed;
end.

Results:
af5fd962f213c6e4a8f4edeec6251d1f.png

Code:

uses CRT, DOS;

{ INT 2F/AX=1680 }

{$Q-}
{ Disable overflow checks
to support timer wraparound,
though I didn't check whether it actually helps }

const
TimerRemainder : LongInt = 0;
{ Accumulates the remainders }
{ When ms is being converted to ticks,
the remainder is usually being thrown out;
This variable accumulates remainders.
The intention is to make some delays
one tick longer, so that rounded delay
interval will be close to the specified
amount of ms.
}

procedure YieldDelay (ms: Word);
var
Timer : LongInt absolute $0040:$006C;
TimerSave,
TimerWait : LongInt; { amount of ticks to wait }
begin
if ms = 0 then Exit;

TimerSave := Timer;
{
Timer increments about every 55 ms (18.2 Hz).
More precisely, it increments 1573040 ticks per day (24 hours).
TimerWait is thus (ms * 1573040) / (24 * 60 * 60 * 1000).
These values are big and might overflow LongInt,
so I have reduced them:
1573040 ticks per 86400000 milliseconds (24:00:00)
157304 ticks per 8640000 milliseconds (02:24:00)
78652 ticks per 4320000 milliseconds (01:12:00)
39326 ticks per 2160000 milliseconds (00:36:00)
19663 ticks per 1080000 milliseconds (00:18:00)
The values are rather precise and should scale well.
It is better than TimerWait := ms div 55;
}
TimerWait := (LongInt(ms) * LongInt(19663));
TimerWait := TimerWait + TimerRemainder;
TimerRemainder := TimerWait mod LongInt(1080000);
TimerWait := TimerWait div LongInt(1080000);
{
0 <= ms < 2^16; 0 <= 19663 < 2^15; 0 <= ms * 19663 < 2^31
All the values should stay within bounds under any
circumstances (though I didn't check it)

Given that the max delay is 65.5 seconds, this might not be
worth the trouble
}
while (Timer - TimerSave) < TimerWait do
asm
push ax; mov ax, $1680; int $2f; pop ax
end;
    { Returns the timeslice to the multitasker }

{Timer - TimerSave -- there might be wraparound here,
that's why I used $Q- }
end;


var Hour, Minute, Second, Sec100 : Word;

begin
repeat
GetTime(Hour, Minute, Second, Sec100);
WriteLn(Hour : 2, ':', Minute : 2, ':', Second : 2, '.', Sec100 : 2);
YieldDelay(1000);
until KeyPressed;
end.

Results:
2a64a63d4ce3cd8ffe445bb85b2ccf3b.png
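
The remainder accumulation used in YieldDelay can be checked in isolation. The following is a minimal sketch for illustration only: MsToTicks and Carry are names made up for it (in YieldDelay the carry is kept in TimerRemainder), and it does no waiting at all. It converts five delays of 1000 ms to ticks. Four of them come out as 18 ticks and the fifth as 19, so the total is 91 ticks, against 90 for plain ms div 55.

```pascal
program TickCarryDemo;
{ Illustration only: MsToTicks and Carry are hypothetical names,
  not part of the programs in this post. }

const
  TicksNum = 19663;    { 1573040 ticks per day, divided by 80 }
  MsDen    = 1080000;  { 86400000 ms per day, divided by 80 }

var
  Carry : LongInt;     { remainder carried over between calls }
  Total : LongInt;
  i     : Integer;

function MsToTicks(ms : Word) : LongInt;
var
  Scaled : LongInt;
begin
  { Largest possible value:
    65535 * 19663 + 1079999 = 1289694704, which is below 2^31 }
  Scaled := LongInt(ms) * TicksNum + Carry;
  Carry := Scaled mod MsDen;
  MsToTicks := Scaled div MsDen;
end;

begin
  Carry := 0;
  Total := 0;
  for i := 1 to 5 do
    Total := Total + MsToTicks(1000);
  WriteLn('Ticks for 5 x 1000 ms: ', Total);               { 91 }
  WriteLn('Plain ms div 55 gives: ', 5 * (1000 div 55));   { 90 }
end.
```

The exact figure is 5000 ms * 1573040 / 86400000, about 91.03 ticks, so carrying the remainder keeps a series of delays closer to the requested total than truncating each one.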

Code:

uses CRT, DOS;

{ INT 15/AH=86 }

procedure BIOSDelay(ms : word); Assembler;
asm
mov ax,1000
mul ms
mov cx,dx
mov dx,ax
mov ah,$86
int $15
end;

var Hour, Minute, Second, Sec100 : Word;

begin
repeat
GetTime(Hour, Minute, Second, Sec100);
WriteLn(Hour : 2, ':', Minute : 2, ':', Second : 2, '.', Sec100 : 2);
BIOSDelay(1000);
until KeyPressed;
end.

Results:
05015e2e0ec112e91ddc401d5607b077.png
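
INT 15/AH=86 takes the interval in microseconds as a 32-bit value in CX:DX, which is why BIOSDelay multiplies ms by 1000 and moves DX:AX into CX:DX. As a small illustration of that arithmetic only (it does not call the interrupt, and CXVal and DXVal are names made up for the sketch):

```pascal
program MicrosecondSplitDemo;
{ Illustration only: computes the CX:DX pair that BIOSDelay
  would pass to INT 15h/AH=86h; no interrupt is called here. }

var
  ms     : Word;
  Micros : LongInt;
  CXVal,
  DXVal  : Word;
begin
  ms := 1000;
  Micros := LongInt(ms) * 1000;        { 1000 ms = 1000000 us = $000F4240 }
  CXVal := Word(Micros shr 16);        { high word, goes into CX }
  DXVal := Word(Micros and $FFFF);     { low word, goes into DX }
  WriteLn('CX = ', CXVal, ', DX = ', DXVal);   { CX = 15, DX = 16960 }
end.
```

Even the largest argument, 65535 ms, gives 65535000 microseconds, which fits into 32 bits without trouble.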

Code:

uses CRT, DOS;

{ HLT }

{$Q-}
{ Disable overflow checks
to support timer wraparound,
though I didn't check whether it actually helps }

const
TimerRemainder : LongInt = 0;
{ Accumulates the remainders }
{ When ms is being converted to ticks,
the remainder is usually being thrown out;
This variable accumulates remainders.
The intention is to make some delays
one tick longer, so that rounded delay
interval will be close to the specified
amount of ms.
}

procedure HLTDelay (ms: Word);
var
Timer : LongInt absolute $0040:$006C;
TimerSave,
TimerWait : LongInt; { amount of ticks to wait }
begin
if ms = 0 then Exit;

TimerSave := Timer;
{
Timer increments about every 55 ms (18.2 Hz).
More precisely, it increments 1573040 ticks per day (24 hours).
TimerWait is thus (ms * 1573040) / (24 * 60 * 60 * 1000).
These values are big and might overflow LongInt,
so I have reduced them:
1573040 ticks per 86400000 milliseconds (24:00:00)
157304 ticks per 8640000 milliseconds (02:24:00)
78652 ticks per 4320000 milliseconds (01:12:00)
39326 ticks per 2160000 milliseconds (00:36:00)
19663 ticks per 1080000 milliseconds (00:18:00)
The values are rather precise and should scale well.
It is better than TimerWait := ms div 55;
}
TimerWait := (LongInt(ms) * LongInt(19663));
TimerWait := TimerWait + TimerRemainder;
TimerRemainder := TimerWait mod LongInt(1080000);
TimerWait := TimerWait div LongInt(1080000);
{
0 <= ms < 2^16; 0 <= 19663 < 2^15; 0 <= ms * 19663 < 2^31
All the values should stay within bounds under any
circumstances (though I didn't check it)

Given that the max delay is 65.5 seconds, this might not be
worth the trouble
}
while (Timer - TimerSave) < TimerWait do
asm
hlt
end;
    { Powers down the CPU for the moment }

{Timer - TimerSave -- there might be wraparound here,
that's why I used $Q- }
end;


var Hour, Minute, Second, Sec100 : Word;

begin
repeat
GetTime(Hour, Minute, Second, Sec100);
WriteLn(Hour : 2, ':', Minute : 2, ':', Second : 2, '.', Sec100 : 2);
HLTDelay(1000);
until KeyPressed;
end.

Results:
db128ac9f0e18322b8196d472d2b0033.png

So far, we have:

CRTDelay -- hogs, drifts very much
YieldDelay -- hogs
BIOSDelay -- hogs, drifts a bit
HLTDelay -- works

That is how each method behaves in DOSBox.
Outside of DOSBox (summary of facts):

BIOSDelay -- should not be used, not portable
HLTDelay -- mostly works; is reported to crash on Windows 2000
YieldDelay -- mostly works; is reported to result in long delays (8 seconds) on Windows 3.1

The basic scheme I use (not counting my recent measurements) is to try INT 2F/AX=1680 first. According to the interrupt list, it returns 00h in AL if the call is supported and leaves 80h there if not.

TimerSave := Timer;

if EnvType = NotDetectedYet then
EnvType := try_int_2f_ax;

if EnvType = MultiTasker then
while Timer .... do
int 2f/ax=1680
else
while Timer ... do
hlt;

An expanded version is at the bottom of my post.

This scheme should work properly on Windows 2000 and on true DOS systems whose CPU can be powered down (e.g. DOS laptops).

DOSBox, however, doesn't fit into this scheme. It reports "supported" for int 2f/ax=1680, but HLT is indeed the better choice in DOSBox.

So I also need to detect DOSBox (and use HLT in that case) and to detect Windows 3.1 (where I don't know what to do).

{$Q-}
{ Disable overflow checks
to support timer wraparound,
though I didn't check whether it actually helps }

const
TimerRemainder : LongInt = 0;
{ Accumulates the remainders }
{ When ms is being converted to ticks,
the remainder is usually being thrown out;
This variable accumulates remainders.
The intention is to make some delays
one tick longer, so that rounded delay
interval will be close to the specified
amount of ms.
}
TestYield : Byte = 0;
{ 0 -- not tested yet
1 -- no multitasker detected, using HLT
2 -- multitasker detected, using INT 2F

The intention is to do our best on every
known platform.

HLT lowers CPU usage on pure DOS systems.
INT 2F/AX=1680 yields a timeslice in a
multitasking environment

HLT can also yield a timeslice, but it is
reported that HLT crashes NTVDM on Windows
2000. The hybrid method implemented here
won't use HLT in presence of multitasker. On
Windows 2000, it will use INT 2F/AX=1680, and
won't crash.
}

procedure Delay (ms: Word);
var
Timer : LongInt absolute $0040:$006C;
TimerSave,
TimerWait : LongInt; { amount of ticks to wait }
begin
if ms = 0 then Exit;

TimerSave := Timer;
{
Timer increments about every 55 ms (18.2 Hz).
More precisely, it increments 1573040 ticks per day (24 hours).
TimerWait is thus (ms * 1573040) / (24 * 60 * 60 * 1000).
These values are big and might overflow LongInt,
so I have reduced them:
1573040 ticks per 86400000 milliseconds (24:00:00)
157304 ticks per 8640000 milliseconds (02:24:00)
78652 ticks per 4320000 milliseconds (01:12:00)
39326 ticks per 2160000 milliseconds (00:36:00)
19663 ticks per 1080000 milliseconds (00:18:00)
The values are rather precise and should scale well.
It is better than TimerWait := ms div 55;
}
TimerWait := (LongInt(ms) * LongInt(19663));
  TimerWait := TimerWait + TimerRemainder;
TimerRemainder := TimerWait mod LongInt(1080000);
TimerWait := TimerWait div LongInt(1080000);
{
0 <= ms < 2^16; 0 <= 19663 < 2^15; 0 <= ms * 19663 < 2^31
All the values should stay within bounds under any
circumstances (though I didn't check it)

Given that the max delay is 65.5 seconds, this might not be
worth the trouble
}
if TestYield = 0 then
asm
push ax; mov ax, $1680; int $2f
or al,al
jz @1
mov TestYield,1
jmp @2
@1:
mov TestYield,2
@2:
pop ax
end;

if TestYield = 1 then
while (Timer - TimerSave) < TimerWait do
asm
hlt
end
{ Powers down the CPU for the moment }
else
while (Timer - TimerSave) < TimerWait do
asm
push ax; mov ax, $1680; int $2f; pop ax
end;
{ Returns the timeslice to the multitasker }

{Timer - TimerSave -- there might be wraparound here,
that's why I used $Q- }
end;
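
One case that $Q- does not cover: assuming the standard BIOS behaviour, the counter at 0040:006C does not wrap at 2^32 but restarts from 0 at midnight, after the 1573040 ticks per day mentioned in the comments. Timer - TimerSave then turns negative and the loop would keep waiting far too long. A minimal sketch of an elapsed-ticks calculation that tolerates one midnight rollover (ElapsedTicks is a hypothetical helper, not part of the Delay procedure above, and I have not tested it inside DOSBox):

```pascal
program MidnightDemo;
{ Illustration only: ElapsedTicks is a hypothetical helper,
  not part of the Delay procedure in this post. }

const
  TicksPerDay = 1573040;  { assumed: the counter restarts from 0 here }

function ElapsedTicks(StartTick, NowTick : LongInt) : LongInt;
begin
  if NowTick >= StartTick then
    ElapsedTicks := NowTick - StartTick
  else
    { the counter passed midnight between the two readings }
    ElapsedTicks := NowTick + TicksPerDay - StartTick;
end;

begin
  WriteLn(ElapsedTicks(100, 118));       { same day: 18 }
  WriteLn(ElapsedTicks(1573030, 8));     { across midnight: 18 }
end.
```

In the delay loops the condition would then read ElapsedTicks(TimerSave, Timer) < TimerWait instead of (Timer - TimerSave) < TimerWait.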

Reply 18 of 18, by OCTAGRAM

Rank Newbie

I'm trying to solve it without changing dosbox itself.

Intercepting int 2fh isn't a big problem. My quick and dirty TSR worked: YieldDelay now consumes only 16% CPU.

I'd also like to change int 16h (keyboard) behaviour.

Int 16h is handled by callback at 0f100h:01c0h:

sti
db 0feh, 38h, 0eh, 00h, 0cfh
nop
nop
...
nop
nop
jmp short 01c1h

According to the sources (dosbox/src/ints/bios_keyboard.cpp), when no key has been pressed yet, DOSBox starts executing the NOPs and eventually re-enters the same callback. I replaced the last NOP (opcode 90h) with HLT (opcode 0f4h), since the BIOS area turns out to be writable, and it did reduce CPU usage when I call int 16h with ah=00h. However, I see no difference when int 16h is called with ah=10h (which is how it is mostly invoked).

I have picked up a two-year-old bios_keyboard.cpp.

According to sources, the idling method is the same:

/* enter small idle loop to allow for irqs to happen */
reg_ip+=1;

This is strange. I wonder whether it is caused by the instruction cache.