VOGONS


The Soundblaster DSP project

Reply 720 of 1053, by LSS10999

Maelgrum wrote on 2023-10-06, 06:24:

So I did an execution trace of the interrupt handler: it's ~94 cycles.
The execution trace of the command 0x14 execution path is ~174 cycles.
In total that is 268 cycles, so even in the most optimistic case we are out of bounds (> 250).

Hmmm... I wonder if it's possible to configure the 8052 to run in faster modes (6T or even fewer clocks per cycle) for the DSP code, if supported.

If the DSP code itself contains nothing requiring complex delays then it might work out okay, allowing the execution path to finish a bit faster to meet the timing requirements.

EDIT: Just read the disassembly... Only a bunch of small delays consisting of 1 or 2 NOPs are present... if these NOPs are not too sensitive, maybe it can be made to work with an 8052 MCU capable of faster timings.
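
For reference, the cycle arithmetic from the quoted trace can be sketched in a few lines of Python. This is only a back-of-the-envelope check under stated assumptions: the counts are taken to be 8051 machine cycles, the 250-cycle limit is taken to be fixed in wall-clock time at 12 clocks per machine cycle, and `effective_cycles` is a hypothetical helper, not anything from the firmware.

```python
# Cycle counts reported in the quoted execution trace.
ISR_CYCLES = 94        # interrupt handler
CMD_14_CYCLES = 174    # command 0x14 execution path
BUDGET_CYCLES = 250    # limit quoted in the post

def effective_cycles(machine_cycles, clocks_per_cycle, baseline_clocks=12):
    """Cost of a code path expressed in baseline (12T) machine cycles.

    Assumption: the instruction stream is unchanged and every machine cycle
    simply takes clocks_per_cycle oscillator clocks instead of 12.
    """
    return machine_cycles * clocks_per_cycle / baseline_clocks

total = ISR_CYCLES + CMD_14_CYCLES
for clocks in (12, 6):
    cost = effective_cycles(total, clocks)
    verdict = "fits" if cost <= BUDGET_CYCLES else "over budget"
    print(f"{clocks}T: {cost:.0f} of {BUDGET_CYCLES} cycles ({verdict})")
```

Under those assumptions a 6T part would bring the 268-cycle path down to the equivalent of 134 cycles, provided nothing else on the card depends on the original timing.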

Reply 722 of 1053, by mkarcher

mattw wrote on 2023-10-06, 09:34:
Maelgrum wrote on 2023-10-06, 02:19:

4.16 is shorter than 4.13 - something has been cut.

based on the reports here that CSP.SYS fails to load on AWE32 when the MCU is flashed with V4.16, I think the best guess is that the ASP code was removed from the V4.16 DSP - looking at the 86Box AWE32/ASP emulation code:

https://github.com/86Box/86Box/blob/master/sr … nd/snd_sb_dsp.c

those are the following DSP commands:

case 0x01: /* asp_data_len???? */
case 0x03: /* ASP status */
case 0x04: /* ASP set mode register */
case 0x05: /* ASP set codec parameter */
case 0x08: /* ASP get version */
case 0x0E: /* ASP set register */
case 0x0F: /* ASP get register */

also, in "Reverse engineering the SB16 ASP/CSP" thread here:

Tests/Info welcome: Reverse engineering the SB16 ASP/CSP

it was figured out that Creative used the ST18933 signal processor from SGS-Thomson (STMicroelectronics).

Nice references, but in my "totally unbiased" opinion, the best reference for the CSP/ASP command set is in this thread: Re: The Soundblaster DSP project . That post (and this whole thread) should put an end to the myth that 0x0E and 0x0F are "ASP commands", even though they can be used to access the ASP/CSP chip.

Reply 723 of 1053, by mattw

mkarcher wrote on 2023-10-06, 16:54:

Nice references, but in my "totally unbiased" opinion, the best reference for the CSP/ASP command set is in this thread: Re: The Soundblaster DSP project . That post (and this whole thread) should put an end to the myth that 0x0E and 0x0F are "ASP commands", even though they can be used to access the ASP/CSP chip.

thank you! I don't doubt your research on the subject and let's hope people will start submitting code (even if it's just a fix to a wrong comment in the code) to projects like '86box' and fix the mistakes and misunderstandings accumulated over the years. otherwise, the wrong information will continue spreading from one open-source project to the next, because almost no such project starts developing from zero: '86box' inherited all of that from 'PCem' and it seems parts of it have now been ported to 'DOSBox-X' as well.

Reply 724 of 1053, by Maelgrum

LSS10999 wrote on 2023-10-06, 15:11:

Hmmm... I wonder if it's possible to configure the 8052 to run in faster modes (6T or even fewer clocks per cycle) for the DSP code, if supported.

If the DSP code itself contains nothing requiring complex delays then it might work out okay, allowing the execution path to finish a bit faster to meet the timing requirements.

EDIT: Just read the disassembly... Only a bunch of small delays consisting of 1 or 2 NOPs are present... if these NOPs are not too sensitive, maybe it can be made to work with an 8052 MCU capable of faster timings.

Two more NOPs can be added, and source recompiled ))

Reply 725 of 1053, by Maelgrum

georgel wrote on 2023-10-06, 16:45:

Keep in mind we are not sure at what clock an integrated 8052 is running.

That is true, but 24MHz is a reasonable assumption until proven otherwise.
Even a modern '52 is not much faster.
And the classic SB16 has a 24MHz OSC.

Reply 726 of 1053, by georgel

Maelgrum wrote on 2023-10-06, 17:25:
LSS10999 wrote on 2023-10-06, 15:11:

Hmmm... I wonder if it's possible to configure the 8052 to run in faster modes (6T or even fewer clocks per cycle) for the DSP code, if supported.

If the DSP code itself contains nothing requiring complex delays then it might work out okay, allowing the execution path to finish a bit faster to meet the timing requirements.

EDIT: Just read the disassembly... Only a bunch of small delays consisting of 1 or 2 NOPs are present... if these NOPs are not too sensitive, maybe it can be made to work with an 8052 MCU capable of faster timings.

Two more NOPs can be added, and source recompiled ))

Any idea on how to measure the clock frequency of the integrated 8052s? E.g. is it equal to or higher than that of the discrete DSPs?

Reply 727 of 1053, by Maelgrum

georgel wrote on 2023-10-06, 17:34:

Any idea on how to measure the clock frequency of the integrated 8052s? E.g. is it equal to or higher than that of the discrete DSPs?

If it feels like a discrete 24MHz, sounds like 24MHz, maybe it is 24MHz? ))
By measuring the time delay between bytes in the 'get copyright string' command, for example.
The read time of the whole string (on a fast PC) can be a good indicator.

Last edited by Maelgrum on 2023-10-06, 18:06. Edited 2 times in total.

Reply 728 of 1053, by rasz_pl

aren't sampling rates expressed in internal clock delays? The Soundblaster DSP project
>256 - (1000000 / rate)
should be easy to derive from the internal table

Reply 729 of 1053, by Maelgrum

Maelgrum wrote on 2023-09-27, 18:08:

Interesting facts:
In the SB16, the sample rate is set by one byte (which is stored at memory location 0x37 and x-bus register 0x09). Let's call it SR.
The output frequency will be SR * 46615120 / (256 * 1024).
Only 3 frequencies can be set exactly with SR: 44100, 22050 and 11025 (with SR equal to 248, 124 and 62).
Another set of usable frequencies is 8000, 16000, 24000, 32000, 40000. Not exact, but the error is low.
For all other frequencies the error is high.

Using command 0x40 you cannot set 44100, but you can set 22050 and 11025.
Max samplerate (for SR = 0xFF) is 45345
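
The formula above is easy to tabulate. The following is a small sketch in Python that uses only the constants given in the post; the function names `output_rate` and `nearest_sr` are made up for illustration, and rounding to the nearest SR byte is an assumption about how one would pick a value, not a description of what the firmware does.

```python
MASTER = 46_615_120      # constant from the formula in the post
DIVISOR = 256 * 1024

def output_rate(sr):
    """Output frequency in Hz for a given SR byte (1..255)."""
    return sr * MASTER / DIVISOR

def nearest_sr(target_hz):
    """SR byte whose output rate is closest to target_hz (assumed rounding)."""
    sr = round(target_hz * DIVISOR / MASTER)
    return max(1, min(255, sr))

for target in (11025, 22050, 44100, 8000, 16000, 24000, 32000, 40000):
    sr = nearest_sr(target)
    actual = output_rate(sr)
    error_pct = (actual - target) / target * 100
    print(f"{target:>6} Hz -> SR={sr:3d}, actual {actual:9.2f} Hz ({error_pct:+.3f}%)")

print(f"max rate (SR=0xFF): {output_rate(0xFF):.0f} Hz")
```

With these numbers the 8000 Hz family lands on multiples of SR = 45, which is why the error stays small and constant across that set.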

Reply 730 of 1053, by Maelgrum

rasz_pl wrote on 2023-10-06, 17:52:

aren't sampling rates expressed in internal clock delays? The Soundblaster DSP project
>256 - (1000000 / rate)
should be easy to derive from the internal table

No, see my post above.

Reply 732 of 1053, by maxtherabbit

I've tried 4.16 in two different CT2230 cards now, it does not work in either one. Stuck the MCU back in the programmer and validated the data just to be sure, and it's a good flash, so 4.16 firmware simply will not work on this card

Reply 733 of 1053, by LSS10999

Maelgrum wrote on 2023-10-06, 17:25:

Two more NOPs can be added, and source recompiled ))

From the 4.13 disassembly, all the double NOP occurrences are the same, with only the last MOVX instruction different in direction (read or write).

	mov	r0,#0		; 1332   78 00      x.
	nop			; 1334   00         .
	nop			; 1335   00         .
	movx	a,@r0		; 1336   e2         b

or

	mov	r0,#0		; 108c   78 00      x.
	nop			; 108e   00         .
	nop			; 108f   00         .
	movx	@r0,a		; 1090   f2         r

And there's a single NOP occurrence.

X11dd:	jb	p1.0,X11dd	; 11dd   20 90 fd    .}
	nop			; 11e0   00          .
	setb	2fh.0		; 11e1   d2 78       Rx
	ljmp	cmd_E5		; 11e3   02 12 12    ...

I'm not sure what kind of operation requires those NOPs to behave properly. On the surface it feels like it's trying to align the code in question to 4 bytes (the NOP in 8051 is 00h).

The only thing to worry about would be timers and UART. If these also operate in 6T when set to 6T mode then the related values may need to be doubled to ensure original functionality. Some faster (like 1T) MCUs I know of appear to retain 12T for timers/UART so no value changes needed for these parts of the code, though all the features depend on the MCU model being used.

But ultimately it depends on whether other chips on the sound card would mind if the MCU is operating at faster timing (that commands will complete faster than usual)...

Reply 734 of 1053, by Maelgrum

LSS10999 wrote on 2023-10-07, 04:38:
Maelgrum wrote on 2023-10-06, 17:25:

Two more NOPs can be added, and source recompiled ))

From the 4.13 disassembly, all the double NOP occurrences are the same, with only the last MOVX instruction different in direction (read or write).

	mov	r0,#0		; 1332   78 00      x.
	nop			; 1334   00         .
	nop			; 1335   00         .
	movx	a,@r0		; 1336   e2         b

or

	mov	r0,#0		; 108c   78 00      x.
	nop			; 108e   00         .
	nop			; 108f   00         .
	movx	@r0,a		; 1090   f2         r

And there's a single NOP occurrence.

X11dd:	jb	p1.0,X11dd	; 11dd   20 90 fd    .}
	nop			; 11e0   00          .
	setb	2fh.0		; 11e1   d2 78       Rx
	ljmp	cmd_E5		; 11e3   02 12 12    ...

I'm not sure what kind of operation requires those NOPs to behave properly. On the surface it feels like it's trying to align the code in question to 4 bytes (the NOP in 8051 is 00h).

The only thing to worry about would be timers and UART. If these also operate in 6T when set to 6T mode then the related values may need to be doubled to ensure original functionality. Some faster (like 1T) MCUs I know of appear to retain 12T for timers/UART so no value changes needed for these parts of the code, though all the features depend on the MCU model being used.

But ultimately it depends on whether other chips on the sound card would mind if the MCU is operating at faster timing (that commands will complete faster than usual)...

All accesses to x-bus port 0 (the Sound Blaster data port) are done with 2 NOPs in the fw. I think this is a delay required by some unknown hardware constraint.
So every time you see:
mov r0, #0
nop
nop
movx a, @r0 (or movx @r0, a)
add 2 or 3 more NOPs; this should be OK as a delay.
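
A rough way to sanity-check the "2 or 3 more NOPs" figure is to compute how many NOPs a faster core needs to wait at least as long as the original two. This Python sketch assumes a NOP takes exactly one machine cycle on both the classic and the faster core, and that the only goal is to preserve the wall-clock delay; the real hardware constraint is unknown, and both function names are hypothetical.

```python
import math

def nop_time_ns(f_osc_hz, clocks_per_cycle):
    """Duration of one NOP, assuming a NOP takes exactly one machine cycle."""
    return clocks_per_cycle * 1_000_000_000 / f_osc_hz

def nops_for_same_delay(original_nops, original_clocks, new_clocks):
    """NOP count needed on the faster core to wait at least as long."""
    return math.ceil(original_nops * original_clocks / new_clocks)

F_OSC = 24_000_000  # 24MHz oscillator, as on the classic SB16
print(f"2 NOPs at 12T: {2 * nop_time_ns(F_OSC, 12):.0f} ns")
for clocks in (6, 4):
    needed = nops_for_same_delay(2, 12, clocks)
    print(f"{clocks}T core: {needed} NOPs ({needed - 2} extra)")
```

For a 6T core this gives 4 NOPs in total, i.e. 2 extra. The surrounding mov and movx instructions also get faster, so one more NOP as margin does no harm.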
Timers are used for MIDI time stamping (don't care) and for baud rate control of the MIDI UART. So MIDI won't work.
Different timings with the bus control chip (x-bus exchange): don't know. It may work, or not))
The most compatible will be a classic 24MHz MCU.
If the only thing wanted is a cure for the single-cycle DMA bug, the existing fw can be patched to significantly reduce the execution time of the critical parts of the code.
It may help. Or not))

Reply 735 of 1053, by mattw

Maelgrum wrote on 2023-10-07, 07:42:

The most compatible will be a classic 24MHz MCU.

I see the Oscillator crystal connected to the 8052 MCU is 24MHz, on all my SB16 cards, but is that really the speed, because here:

https://web.archive.org/web/20100103134333/ht … /tuttimng.phtml

it's explained that it depends on how many instructions a particular 8052 can run per how many "oscillator cycles". (I am confused, so I could be reading that wrong)

[EDIT] I further read:

The standard 8052 microcontroller requires 12 oscillator cycles for each instruction cycle. The 80C32 requires only 4. This means that, given the exact same hardware design and crystal speed, dropping in an 80C32 will generally increase performance by about 250%.

Reply 736 of 1053, by georgel

Maelgrum wrote on 2023-10-07, 07:42:

...
The most compatible will be a classic 24MHz MCU.
If the only thing wanted is a cure for the single-cycle DMA bug, the existing fw can be patched to significantly reduce the execution time of the critical parts of the code.
It may help. Or not))

Had it been only a firmware bug, 4.16 would have corrected it, because cards with 4.16 do not experience the single-cycle DMA bug.

Reply 737 of 1053, by mattw

georgel wrote on 2023-10-07, 08:55:
Maelgrum wrote on 2023-10-07, 07:42:

...
The most compatible will be a classic 24MHz MCU.
If the only thing wanted is a cure for the single-cycle DMA bug, the existing fw can be patched to significantly reduce the execution time of the critical parts of the code.
It may help. Or not))

Had it been only a firmware bug, 4.16 would have corrected it, because cards with 4.16 do not experience the single-cycle DMA bug.

what about the possibility that the newer ICs found on V4.16 cards have a more efficient 8052 that can execute more instructions per oscillator cycle, speeding up the work that way? I.e. the example I cited in my previous post above, where a "drop-in" replacement of a classic 8052 with a more efficient one could lead to an increase in performance of up to 250% without changing anything else in the surrounding circuit. Also, what if on those older SB models we not only replace the 8052 MCU, but also change the surrounding circuit by swapping the crystal and increasing its speed from 24MHz to, let's say, 33MHz? Are all those options ruled out somehow?

Reply 738 of 1053, by LSS10999

mattw wrote on 2023-10-07, 08:38:
Maelgrum wrote on 2023-10-07, 07:42:

The most compatible will be a classic 24MHz MCU.

I see the Oscillator crystal connected to the 8052 MCU is 24MHz, on all my SB16 cards, but is that really the speed, because here:

https://web.archive.org/web/20100103134333/ht … /tuttimng.phtml

it's explained that it depends on how many instructions a particular 8052 can run per how many "oscillator cycles". (I am confused, so I could be reading that wrong)

[EDIT] I further read:

The standard 8052 microcontroller requires 12 oscillator cycles for each instruction cycle. The 80C32 requires only 4. This means that, given the exact same hardware design and crystal speed, dropping in an 80C32 will generally increase performance by about 250%.

A classic 8051/8052 takes 12 clock cycles per machine cycle. With a 24MHz crystal, that is 2 million machine cycles per second, so at most 2MIPS for single-cycle instructions.

Having a MCU operating at fewer clock cycles per instruction will make internal operations complete faster, but the code for timers/UART as well as places requiring a minimum amount of time delay (like I/O with an external bus/device) will have to be adjusted accordingly to ensure proper function.

A side note: To get a very accurate UART baud rate for regular COM operations, you'll want to use a crystal that is a multiple of 11.0592MHz.

EDIT: Thanks for pointing out that MIDI uses a 31250 baud UART rate, which can be divided from a 24MHz crystal, so everything's good.

Last edited by LSS10999 on 2023-10-07, 10:33. Edited 1 time in total.

Reply 739 of 1053, by Tiido

The UART is for MIDI, which is 31250 baud, and that can be divided exactly from the 24MHz already present. For regular COM port stuff you do want 11.0592MHz or some multiple of that.
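
The baud rate point can be checked with the standard 8051 formula for UART mode 1 or 3 clocked from Timer 1 in 8-bit auto-reload mode: baud = 2^SMOD / 32 * f_osc / (12 * (256 - TH1)). Whether the SB16 firmware actually uses Timer 1 (rather than the 8052's Timer 2) for the MIDI UART is an assumption here, and `timer1_baud` and `best_th1` are hypothetical helper names used only for this sketch.

```python
def timer1_baud(f_osc_hz, th1, smod=0):
    """Baud rate of a classic 8051 UART driven by Timer 1 in mode 2."""
    return (2 ** smod) * f_osc_hz / (32 * 12 * (256 - th1))

def best_th1(f_osc_hz, target_baud, smod=0):
    """Closest TH1 reload value, the resulting baud rate and the error in %."""
    divisor = round((2 ** smod) * f_osc_hz / (32 * 12 * target_baud))
    divisor = max(1, min(256, divisor))
    th1 = 256 - divisor
    actual = timer1_baud(f_osc_hz, th1, smod)
    error_pct = (actual - target_baud) / target_baud * 100
    return th1, actual, error_pct

for f_osc, baud in ((24_000_000, 31250), (24_000_000, 9600), (11_059_200, 9600)):
    th1, actual, err = best_th1(f_osc, baud)
    print(f"{f_osc / 1e6:7.4f} MHz, {baud:5d} baud: "
          f"TH1=0x{th1:02X}, actual {actual:8.1f} ({err:+.2f}%)")
```

With a 24MHz clock the MIDI rate comes out exact, while 9600 baud is several percent off; with 11.0592MHz the situation is reversed for standard COM rates.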
