It's CSW (Composite Sine Wave) synthesis. It also involves the two OPL2 timers, and I believe it works almost identically on other chips like the YM2612 used in the Sega Genesis. One timer controls pitch; the other maybe controls pulse width?
Edit: CSW speech synthesis for dummies: do LPC analysis of the speech. Find all the peaks (these are most likely your formants). Place a sine wave at each peak, starting at the highest one and going down until you run out of sine waves. Move the sine waves around each frame (i.e. a frame period of roughly 50 ms or less) as the peaks change and move.
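Here is a rough Python sketch of the peak-picking step for a single frame. It assumes you already have the frame's spectral envelope (e.g. from LPC analysis, which is not shown) as a list of magnitudes at evenly spaced frequencies, and it reads "highest peak" as "strongest peak". The names `find_peaks`, `assign_sines`, and `bin_hz` are made up for illustration, and the envelope values are toy numbers, not real speech data.

```python
def find_peaks(envelope):
    """Return the indices of local maxima in a list of magnitudes."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] >= envelope[i + 1]]


def assign_sines(envelope, bin_hz, num_sines):
    """Give each available sine wave one peak, strongest peak first.

    Returns a list of (frequency_hz, magnitude) pairs, at most num_sines long.
    """
    peaks = find_peaks(envelope)
    # Strongest peaks first; stop once we run out of sine waves.
    strongest = sorted(peaks, key=lambda i: envelope[i], reverse=True)[:num_sines]
    return [(i * bin_hz, envelope[i]) for i in strongest]


# Toy envelope for one frame, 100 Hz per bin (hypothetical values).
frame = [0.0, 0.2, 0.9, 0.3, 0.1, 0.6, 0.2, 0.1, 0.4, 0.1]
sines = assign_sines(frame, bin_hz=100.0, num_sines=2)
print(sines)
```

You would run this once per frame and retune each sine wave to its new frequency and level as the peaks move.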
Edit2: the lowest-frequency high peak is most likely your pitch/first formant/'f0'. There are complicated mathematical ways to best estimate the fundamental pitch, and simpler, less accurate ways, which are either analog (Gold) or digital, using voting between 5 algorithms (Gold-Rabiner), and more still. Some simply use the highest peak as the fundamental frequency/pitch.
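A minimal Python sketch of the "lowest-frequency high peak" shortcut, not of Gold or Gold-Rabiner. It assumes the peaks for one frame are already available as (frequency_hz, magnitude) pairs. What counts as a "high" peak is my own assumption: anything at least half as strong as the strongest peak. The name `estimate_f0` and the parameter `threshold_ratio` are hypothetical.

```python
def estimate_f0(peaks, threshold_ratio=0.5):
    """Return the frequency of the lowest-frequency 'high' peak, or None.

    peaks: list of (frequency_hz, magnitude) pairs for one frame.
    A peak counts as 'high' if its magnitude is at least threshold_ratio
    times the strongest peak's magnitude (an assumed rule of thumb).
    """
    if not peaks:
        return None
    strongest = max(mag for _, mag in peaks)
    high = [freq for freq, mag in peaks if mag >= threshold_ratio * strongest]
    return min(high)


# Toy peaks (hypothetical values): the weak 60 Hz peak is ignored.
peaks = [(120.0, 0.7), (60.0, 0.1), (700.0, 0.9), (1200.0, 0.5)]
f0 = estimate_f0(peaks)
print(f0)
```

Setting `threshold_ratio=1.0` keeps only the strongest peak, which matches the "just use the highest peak" approach.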
See the Wikipedia article on "Sinewave synthesis" for more info.