For now, I've just adjusted the paging emulation to invalidate (evict) the equivalent 4KB pages from the TLB when a 4MB one is added, and vice versa (removing the 4MB one when a 4KB page inside it is written to the TLB). The main translation simply tries the 4MB TLB first (finishing on a hit), then the 4KB TLB (also finishing on a hit). On a TLB miss, it reads the 4MB PDE or the 4KB PTE, depending on what's in memory, and writes that to the TLB when it doesn't fault, invalidating the other size of TLB entry: a 4KB PTE being written invalidates the corresponding 4MB TLB entry, and a 4MB PDE being written invalidates the corresponding 4KB TLB entries.
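To make the lookup order concrete, here is a minimal sketch in C of the flow described above. It is not the emulator's actual code: the tiny fully associative TLBs, the `phys_mem` array, and the names `tlb_find`, `tlb_insert`, `read_phys32` and `translate` are all made up for illustration, CR4.PSE is assumed to be enabled, and access rights and replacement policy are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8      /* assumed size; fully associative for brevity */
#define PXE_P  0x01u       /* present bit of a PDE/PTE */
#define PDE_PS 0x80u       /* page-size bit of a PDE (4MB page) */

typedef struct { bool valid; uint32_t tag, base; } TlbEntry;

TlbEntry tlb4m[TLB_ENTRIES];   /* tag = linear >> 22 */
TlbEntry tlb4k[TLB_ENTRIES];   /* tag = linear >> 12 */
uint32_t phys_mem[2048];       /* hypothetical guest RAM holding the tables */

uint32_t read_phys32(uint32_t addr) {
    return (addr >> 2) < 2048 ? phys_mem[addr >> 2] : 0;
}

TlbEntry *tlb_find(TlbEntry *tlb, uint32_t tag) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].tag == tag) return &tlb[i];
    return NULL;
}

void tlb_insert(TlbEntry *tlb, uint32_t tag, uint32_t base) {
    TlbEntry *e = tlb_find(tlb, tag);            /* reuse a matching entry */
    for (int i = 0; i < TLB_ENTRIES && !e; i++)  /* else take a free one */
        if (!tlb[i].valid) e = &tlb[i];
    if (!e) e = &tlb[0];                         /* else overwrite slot 0 (no LRU here) */
    e->valid = true; e->tag = tag; e->base = base;
}

/* Returns false on a page fault; nothing is cached in that case. */
bool translate(uint32_t cr3, uint32_t linear, uint32_t *phys) {
    TlbEntry *e;
    if ((e = tlb_find(tlb4m, linear >> 22))) {   /* 1: 4MB TLB */
        *phys = e->base | (linear & 0x3FFFFFu);
        return true;
    }
    if ((e = tlb_find(tlb4k, linear >> 12))) {   /* 2: 4KB TLB */
        *phys = e->base | (linear & 0xFFFu);
        return true;
    }
    /* 3: TLB miss, walk the tables in memory */
    uint32_t pde = read_phys32((cr3 & ~0xFFFu) + ((linear >> 22) << 2));
    if (!(pde & PXE_P)) return false;
    if (pde & PDE_PS) {
        /* 4MB fill: evict every 4KB entry inside the same 4MB region */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb4k[i].valid && (tlb4k[i].tag >> 10) == (linear >> 22))
                tlb4k[i].valid = false;
        tlb_insert(tlb4m, linear >> 22, pde & 0xFFC00000u);
        *phys = (pde & 0xFFC00000u) | (linear & 0x3FFFFFu);
        return true;
    }
    uint32_t pte = read_phys32((pde & ~0xFFFu) + (((linear >> 12) & 0x3FFu) << 2));
    if (!(pte & PXE_P)) return false;
    /* 4KB fill: evict the 4MB entry covering this address, if any */
    if ((e = tlb_find(tlb4m, linear >> 22))) e->valid = false;
    tlb_insert(tlb4k, linear >> 12, pte & ~0xFFFu);
    *phys = (pte & ~0xFFFu) | (linear & 0xFFFu);
    return true;
}
```

In this sketch, a 4KB entry cached for one page is dropped as soon as a different page in the same 4MB region gets filled from a PDE that has since been changed to a 4MB page.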
The invalidation uses a simple filter: just the 4MB-aligned linear address, which is compared against the entries of the other TLB to determine whether or not to evict them. The set that is checked is the one the address would map to if it had been a 4KB page (when a 4MB TLB entry is being written) or a 4MB page (when a 4KB TLB entry is being written). Put simply, this is just a comparison of the top 10 bits of the linear address against any 4KB/4MB TLB entry of the other size, removing a match from the TLB before writing the new PTE/PDE to a fresh TLB entry (or overwriting the existing 4KB/4MB entry when it's already in the TLB of the correct size). And of course, the new or reused entry (reused when the address isn't in the TLB yet and all entries of said set are in use) is updated to be the MRU.
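The filter and the MRU update could look roughly like the sketch below. Again this is only an illustration under my own assumptions: `SETS`, `WAYS`, the age-counter LRU and the names `tlb_lookup`, `tlb_evict_region` and `tlb_write` are hypothetical, and the eviction scans every set of the other TLB instead of a single set to keep the code short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SETS 4   /* assumed geometry, not taken from real hardware */
#define WAYS 2

typedef struct { bool valid; uint32_t tag, base, age; } Way;
/* tag_shift is 12 for the 4KB TLB and 22 for the 4MB TLB */
typedef struct { Way way[SETS][WAYS]; unsigned tag_shift; } Tlb;

Way *tlb_lookup(Tlb *t, uint32_t linear) {
    uint32_t tag = linear >> t->tag_shift;
    Way *set = t->way[tag % SETS];
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag) return &set[w];
    return NULL;
}

/* The filter: evict every entry whose top 10 linear address bits match. */
int tlb_evict_region(Tlb *t, uint32_t linear) {
    int evicted = 0;
    for (int s = 0; s < SETS; s++)
        for (int w = 0; w < WAYS; w++) {
            Way *e = &t->way[s][w];
            uint32_t top10 = (e->tag << t->tag_shift) >> 22;
            if (e->valid && top10 == (linear >> 22)) {
                e->valid = false;
                evicted++;
            }
        }
    return evicted;
}

/* Write an entry into t, after evicting the same region from the other size. */
Way *tlb_write(Tlb *t, Tlb *other, uint32_t linear, uint32_t base) {
    tlb_evict_region(other, linear);
    uint32_t tag = linear >> t->tag_shift;
    Way *set = t->way[tag % SETS];
    Way *pick = NULL;
    for (int w = 0; w < WAYS; w++)            /* 1: already cached, overwrite */
        if (set[w].valid && set[w].tag == tag) pick = &set[w];
    for (int w = 0; w < WAYS && !pick; w++)   /* 2: a free way in the set */
        if (!set[w].valid) pick = &set[w];
    if (!pick) {                              /* 3: set full, reuse the LRU way */
        pick = &set[0];
        for (int w = 1; w < WAYS; w++)
            if (set[w].age > pick->age) pick = &set[w];
    }
    for (int w = 0; w < WAYS; w++) set[w].age++;  /* the whole set ages */
    pick->valid = true; pick->tag = tag; pick->base = base;
    pick->age = 0;                                /* written entry becomes MRU */
    return pick;
}
```

With two such structures (one with `tag_shift` 12, one with 22), writing a 4MB entry clears all 4KB entries in that region and leaves other regions alone, and writing a 4KB entry clears the covering 4MB entry.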
Would that be correct behaviour?