That is indeed strange.
The fact that a 386SX can only address 16 MB might have something to do with how the cache is used, so that it can cover more RAM with less tag RAM?
But then again, a 386SX chipset probably works in 16-bit chunks instead of the 32-bit chunks of a DX chipset, so the tag table would need to be even bigger than it would be when caching 32-bit words.
So the 32 KB might be caching 16K lines of 16-bit words, where a DX would do 8K lines of 32-bit words in its cache, so the tag table would be bigger on an SX???
Or the 386SX, which is still 32-bit internally, does two separate 16-bit reads from memory for one request, and the cache is still built in 32-bit chunks. In that case the tag should be the same size, because the chipset knows it also has to address the next location to read the second 16-bit word and fill the 386's 32-bit request. So this would save tag space in the end.
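To put rough numbers on the two cases above, here is a quick back-of-the-envelope sketch. It assumes a plain direct-mapped cache where every line has one tag, and it ignores valid/dirty bits. The function name `tag_ram` and the three configurations are just my own illustration; a real chipset may well do it differently.

```python
def tag_ram(addr_bits, cache_bytes, line_bytes):
    """Tag cost of a hypothetical direct-mapped cache.

    Returns (entries, tag_bits_per_entry, total_tag_bits).
    Sizes are assumed to be powers of two.
    """
    entries = cache_bytes // line_bytes
    index_bits = entries.bit_length() - 1      # log2(entries)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    tag_bits = addr_bits - index_bits - offset_bits
    return entries, tag_bits, entries * tag_bits

CACHE = 32 * 1024  # the 32 KB cache from the discussion

# Assumed cases: SX = 24 address bits (16 MB), DX = 32 address bits (4 GB)
sx_16 = tag_ram(24, CACHE, 2)  # SX, one 16-bit word per line
sx_32 = tag_ram(24, CACHE, 4)  # SX, one 32-bit word per line
dx_32 = tag_ram(32, CACHE, 4)  # DX, one 32-bit word per line

for name, (entries, bits, total) in [("SX 16-bit lines", sx_16),
                                     ("SX 32-bit lines", sx_32),
                                     ("DX 32-bit lines", dx_32)]:
    print(f"{name}: {entries} entries x {bits} tag bits = {total} bits")
```

Under these assumptions the tag is 9 bits wide on the SX either way, but with 16-bit lines there are twice as many entries, so the tag RAM doubles. Going to 32-bit lines halves it again, and the DX needs a wider tag (17 bits) only because of its larger address space.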
I dunno.
I need someone l33t to explain to me how this stuff works.