Others are welcome to correct me, but this is my understanding of how it works.
A 256KB cache is 2^18 bytes. The cache is divided into 16-byte (2^4) lines to match the line size of the L1 cache. So, there are 2^18/2^4=2^14 lines in the cache.
That tells you that your tag RAM must have at least 2^14 entries (one per cache line), or 16KB with one-byte (8-bit) tags.
Now, how much RAM can be cached? Take a look at how the bits of a memory address are used:
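To make that arithmetic concrete, here is a minimal Python sketch. The function name `cache_geometry` and its parameters are just made up for illustration; it assumes 16-byte lines and one-byte tags as described above.

```python
def cache_geometry(cache_bytes, line_bytes=16, tag_bytes=1):
    """Return (number of cache lines, minimum tag RAM size in bytes).

    Assumes one tag entry per cache line (direct-mapped cache).
    """
    lines = cache_bytes // line_bytes
    tag_ram_bytes = lines * tag_bytes
    return lines, tag_ram_bytes


lines, tag_ram = cache_geometry(256 * 1024)
print(lines, tag_ram)  # 16384 16384, i.e. 2^14 lines and 16KB of tag RAM
```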
4 bits are used to identify the byte within the 16-byte cache line.
14 bits are used to identify the cache line. (486 L2 caches are typically direct-mapped, so at any given time a memory address can either sit in one specific cache line or not be in the cache at all - kind of like a hash table with no way of handling collisions.)
8 bits come from the tag and identify the "rest" of the memory address that is occupying a cache line at that time. For a write-through cache, that gives you 2^(4+14+8)=2^26=64MB of cacheable area. The tag being only 8 bits wide is what limits the cacheable area.
For a write-back cache, one bit of the tag tends to be robbed for use as a dirty bit (which records whether the data in the cache is newer than what is in RAM - an impossible situation for a write-through cache), cutting the cacheable area in half.
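Here is a rough sketch of that address split for the 256KB case, again in Python. The names (`split_address`, `is_cacheable`, `dirty_bit_in_tag`) are hypothetical, and it models only the bit arithmetic described above, not any particular chipset.

```python
OFFSET_BITS = 4   # byte within the 16-byte line
INDEX_BITS = 14   # which of the 2^14 lines (256KB cache)
TAG_BITS = 8      # width of one tag RAM entry (assumed one byte)


def split_address(addr):
    """Split a physical address into (tag, line index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def is_cacheable(addr, dirty_bit_in_tag=False):
    """True if the address's tag fits in the usable tag bits.

    With dirty_bit_in_tag=True, one tag bit is given up for the dirty flag.
    """
    usable_bits = TAG_BITS - (1 if dirty_bit_in_tag else 0)
    tag, _, _ = split_address(addr)
    return tag < (1 << usable_bits)


MB = 1024 * 1024
print(split_address(0x123456))                        # (4, 9029, 6)
print(is_cacheable(64 * MB - 1))                      # True
print(is_cacheable(64 * MB))                          # False
print(is_cacheable(32 * MB, dirty_bit_in_tag=True))   # False
```

Note that two addresses 256KB apart, such as 0x000010 and 0x040010, get the same line index but different tags, which is exactly the collision case a direct-mapped cache cannot hold at the same time.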
So, assuming 8-bit tags, you would get:
64KB cache -> 16MB cacheable (write-through), 8MB cacheable (write-back); tag RAM >= 4KB
128KB cache -> 32MB cacheable (write-through), 16MB cacheable (write-back); tag RAM >= 8KB
256KB cache -> 64MB cacheable (write-through), 32MB cacheable (write-back); tag RAM >= 16KB
512KB cache -> 128MB cacheable (write-through), 64MB cacheable (write-back); tag RAM >= 32KB
1024KB cache -> 256MB cacheable (write-through), 128MB cacheable (write-back); tag RAM >= 64KB
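The list above can be reproduced with a few lines of Python. This is only a sketch under the same assumptions (16-byte lines, 8-bit tags, one tag bit lost to the dirty flag in write-back mode); `cacheable_mb` and `tag_ram_kb` are names I made up.

```python
def cacheable_mb(cache_kb, tag_bits=8, dirty_bit_in_tag=False):
    """Cacheable area in MB: cache size times 2^(usable tag bits)."""
    usable_bits = tag_bits - (1 if dirty_bit_in_tag else 0)
    return (cache_kb * 1024 << usable_bits) // (1024 * 1024)


def tag_ram_kb(cache_kb, line_bytes=16, tag_bytes=1):
    """Minimum tag RAM in KB: one tag entry per cache line."""
    return cache_kb * tag_bytes // line_bytes


for size_kb in (64, 128, 256, 512, 1024):
    wt = cacheable_mb(size_kb)
    wb = cacheable_mb(size_kb, dirty_bit_in_tag=True)
    print(f"{size_kb}KB cache -> {wt}MB (write-through), "
          f"{wb}MB (write-back); tag RAM >= {tag_ram_kb(size_kb)}KB")
```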
If the cache used a separate chip to track the dirty bits (mkarcher, are you out there?) instead of robbing a tag bit, the write-back cacheable area would be the same as the write-through one.