VOGONS


First post, by superfury

Rank l33t++

I've been reading http://www.logix.cz/michal/doc/i386/chp10-06.htm and have already implemented the TLB tag bits it describes in UniPCemu (with the top 2 bits of the address being assigned to the set).

- The set is bits 28-29 of the logical address.
- The tag consists of the logical address and the r/w, u/s and dirty bits.
- Matching the tag ignores the r/w, u/s and dirty bits.
- Writing an existing tag updates the r/w, u/s and dirty bits.

Paging lookups work as follows (lookup syntax: hit(address,RW,US,dirty)):
1. Read TLB: hit(address,RW,US,RW)? If so, use that entry and finish.
2. Read TLB: hit(address,RW,US,1)? If so, use that entry and finish.
3. If prefetching, abort with a failure (this stops the prefetch unit from continuing and leaves steps 4-5 to the EU).
4. Read the PDE and PTE from memory and handle any faults, which abort. When writing, set PTE.D to 1 (and write it back to its memory location).
5. Successful fetch: write the TLB entry (address,RW,US,PTE.D) with the PTE as its value.

Steps 1 and 2 ensure that a read operation accepts both a dirty and a non-dirty TLB entry, while a write accepts only a dirty one. A write that only finds a non-dirty entry for that combination therefore misses and runs steps 3-5, exploiting the TLB behaviour to get the dirty bit set.
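A minimal sketch of steps 1-2, assuming a tiny fully associative TLB. The names `tlb_entry`, `tlb_hit` and `tlb_cached_access` are hypothetical and stand in for UniPCemu's real `Paging_readTLB`; the point is only to show how probing with dirty=RW and then dirty=1 lets reads accept clean or dirty entries while writes need a dirty one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLB entry: key = linear page + RW + US + dirty, value = frame. */
typedef struct {
    int      valid;
    uint32_t page;  /* linear address >> 12 */
    uint8_t  rw;    /* 0 = entry stored by a read, 1 = by a write */
    uint8_t  us;    /* effective user level of the access */
    uint8_t  dirty; /* PTE.D when the entry was stored */
    uint32_t frame; /* physical address >> 12 */
} tlb_entry;

#define TLB_SIZE 8

/* Exact-key probe, mirroring hit(address,RW,US,dirty) from the steps above. */
static int tlb_hit(const tlb_entry *tlb, uint32_t addr, uint8_t rw,
                   uint8_t us, uint8_t dirty, uint32_t *frame)
{
    for (size_t i = 0; i < TLB_SIZE; i++) {
        const tlb_entry *e = &tlb[i];
        if (e->valid && e->page == (addr >> 12) && e->rw == rw &&
            e->us == us && e->dirty == dirty) {
            *frame = e->frame;
            return 1;
        }
    }
    return 0;
}

/* Steps 1-2: probe with dirty=RW, then dirty=1. A read (rw=0) accepts a clean
   or dirty read entry; a write (rw=1) only accepts a dirty write entry, so a
   miss here sends it on to the page walk of steps 3-5. */
static int tlb_cached_access(const tlb_entry *tlb, uint32_t addr, uint8_t rw,
                             uint8_t us, uint32_t *frame)
{
    return tlb_hit(tlb, addr, rw, us, rw, frame) ||
           tlb_hit(tlb, addr, rw, us, 1, frame);
}
```

In this model a read followed by a write to the same page leaves two entries, one per RW value, which is the duplication the thread goes on to discuss.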

Is this correct behaviour? It works when not accounting for the TR6/TR7 register behaviours, but is this what a real 80386 does as well?

Author of the UniPCemu emulator.
UniPCemu Git repository
UniPCemu for Android, Windows, PSP, Vita and Switch on itch.io

Reply 1 of 11, by superfury

To put it more simply: apart from the resulting frame number and the logical page number, is the remaining information stored in the TLB tag or in the TLB value?

Edit: These are my current TLB-based paging checks (made before the instruction executes, or before the bytes of the GDT/IDT/LDT are accessed):

int isvalidpage(uint_32 address, byte iswrite, byte CPL, byte isPrefetch) //Do we have paging without error? userlevel=CPL usually.
{
	word DIR, TABLE;
	byte PTEUPDATED = 0; //PTE not updated yet!
	uint_32 PDE, PTE; //PDE/PTE entries currently used!
	if (!CPU[activeCPU].registers) return 0; //No registers available!
	DIR = (address>>22)&0x3FF; //The directory entry!
	TABLE = (address>>12)&0x3FF; //The table entry!

	byte effectiveUS;
	byte RW;
	RW = iswrite?1:0; //Are we trying to write?
	effectiveUS = getUserLevel(CPL); //Our effective user level!

	uint_32 temp;
	if (Paging_readTLB(address,RW,effectiveUS,RW,&temp)) //Cache hit with dirty=RW? Reads accept a non-dirty entry here; writes only accept a dirty one, so a non-dirty page is walked below and marked dirty.
	{
		return 1; //Valid!
	}
	if (Paging_readTLB(address,RW,effectiveUS,1,&temp)) //Cache hit dirty?
	{
		return 1; //Valid!
	}
	if (isPrefetch) return 0; //Stop the prefetch when not in the TLB!
	//Check PDE
	PDE = memory_BIUdirectrdw(PDBR+(DIR<<2)); //Read the page directory entry!
	if (!(PDE&PXE_P)) //Not present?
	{
		raisePF(address,(RW<<1)|(effectiveUS<<2)); //Run a not present page fault!
		return 0; //We have an error, abort!
	}

	//Check PTE
	PTE = memory_BIUdirectrdw(((PDE&PXE_ADDRESSMASK)>>PXE_ADDRESSSHIFT)+(TABLE<<2)); //Read the page table entry!
	if (!(PTE&PXE_P)) //Not present?
	{
		raisePF(address,(RW<<1)|(effectiveUS<<2)); //Run a not present page fault!
		return 0; //We have an error, abort!
	}

	if (!verifyCPL(RW,effectiveUS,((PDE&PXE_RW)>>1),((PDE&PXE_US)>>2),((PTE&PXE_RW)>>1),((PTE&PXE_US)>>2))) //Protection fault on combined flags?
	{
		raisePF(address,PXE_P|(RW<<1)|(effectiveUS<<2)); //Run a protection violation page fault (P=1 in the error code)!
		return 0; //We have an error, abort!
	}
	if (!(PTE&PXE_A))
	{
		PTEUPDATED = 1; //Updated!
		PTE |= PXE_A; //Accessed!
	}
	if (iswrite) //Writing?
	{
		if (!(PTE&PTE_D))
		{
			PTEUPDATED = 1; //Updated!
		}
		PTE |= PTE_D; //Dirty!
	}
	if (!(PDE&PXE_A)) //Not accessed yet?
	{
		PDE |= PXE_A; //Accessed!
		memory_BIUdirectwdw(PDBR+(DIR<<2),PDE); //Update in memory!
	}
	if (PTEUPDATED) //Updated?
	{
		memory_BIUdirectwdw(((PDE&PXE_ADDRESSMASK)>>PXE_ADDRESSSHIFT)+(TABLE<<2),PTE); //Update in memory!
	}
	Paging_writeTLB(address,RW,effectiveUS,(PTE&PTE_D)?1:0,(PTE&PXE_ADDRESSMASK)); //Save the PTE 32-bit address in the TLB!
	return 1; //Valid!
}

As you can see, a TLB hit is treated as already validated for that linear address/RW/US combination, so no further checks are made and the access is allowed (the next read/write step on linear memory performs the actual access using the PTE stored in the TLB).

The first two parts (the two Paging_readTLB calls) check for an already loaded, valid and protection-verified entry. If either matches, the operation succeeds using that entry (as described above).

If both fail, a check is made for the BIU/prefetch: if the call came from the prefetching process, always abort, because the prefetch unit isn't supposed to handle page faults. The same address is reached again when the CPU actually fetches an instruction from it; that fetch performs the page walk (or raises the fault), and a valid walk fills the TLB. From then on the readTLB checks above succeed, so the prefetch unit can continue until the next address that isn't cached in the TLB.

The next step will read the PDE. It's then checked for validity using the present flag. If not present, a Page fault is asserted. Otherwise, continue on to reading the PTE.

The next step will read the PTE. It's also checked for validity using the present flag and, like the PDE, aborts with a Page fault when not present.

Next, the combined Read/Write and User/Supervisor bits of the PDE and PTE are checked. If the check fails, a page fault is raised.
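The error codes passed to raisePF follow the architectural page-fault error code: bit 0 (P) is 0 for a not-present fault and 1 for a protection violation, bit 1 (W/R) is set for a write, and bit 2 (U/S) is set for a user-mode access. A minimal sketch, with `pf_error_code` as a hypothetical helper name:

```c
#include <stdint.h>

/* Page-fault error code bits (80386). */
#define PFEC_P  0x1u /* 0 = page not present, 1 = protection violation */
#define PFEC_WR 0x2u /* 1 = the faulting access was a write */
#define PFEC_US 0x4u /* 1 = the access was made at user level (CPL 3) */

/* Hypothetical helper: build the error code for the three fault checks above. */
static uint32_t pf_error_code(int protection_violation, int iswrite, int user)
{
    return (protection_violation ? PFEC_P : 0u) |
           (iswrite ? PFEC_WR : 0u) |
           (user ? PFEC_US : 0u);
}
```

This produces the same value the code builds as (RW<<1)|(effectiveUS<<2), with PXE_P OR'ed in for the protection case, assuming PXE_P is the present bit (value 1).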

As the final step, since the access is deemed valid, the Accessed and Dirty updates are applied to the loaded entries, and any entry whose Accessed/Dirty bits changed is written back to its memory location. Before the access is reported as valid, the TLB entry for the address/RW/US/dirty combination is written, containing the PTE to use.
The function then returns success, since the PTE stored in the TLB can now be used to find the physical memory address correctly.
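A minimal sketch of the Accessed/Dirty writeback decision, assuming the standard x86 PTE bit positions (P = bit 0, A = bit 5, D = bit 6). `pte_update_ad` is a hypothetical name, not a UniPCemu function; it plays the role of the PTEUPDATED flag in the code above.

```c
#include <stdint.h>

#define PTE_P 0x01u /* present */
#define PTE_A 0x20u /* accessed */
#define PTE_D 0x40u /* dirty (meaningful in PTEs only) */

/* Set A (always) and D (on writes). Returns 1 if *pte changed and must be
   written back to memory, 0 if the in-memory copy is already current. */
static int pte_update_ad(uint32_t *pte, int iswrite)
{
    uint32_t updated = *pte | PTE_A;
    if (iswrite)
        updated |= PTE_D;
    int changed = (updated != *pte);
    *pte = updated;
    return changed;
}
```

Only the first access (and the first write) to a page returns 1, so repeated accesses through a fresh walk don't cause redundant memory writes.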

Is this correct behaviour? Logically, this seems to be the only way to honour the bits described in the 80386 programmer's reference manual while supporting all those settings and protection bits through the TLB, without walking the page tables again on every access just to check whether it's valid (by using the CPL, iswrite (RW) and Dirty information).

Of course, this causes the TLB to contain two entries for the same address when a page is read and then written, since reads and writes use two different TLB entries for that purpose (according to the manual?).

Anyone?

Reply 3 of 11, by superfury

Well, one thing is for sure (from the listings of the TR6/TR7 register layouts): the key within the 80386 TLB DOES include the Dirty, U/S, Valid and R/W bits, as well as the linear address required for the lookup, since they're all stored within the command register. The data register contains at least the physical address (the value read from the TLB) and some diagnostic bits. Since it also describes three attribute bits and a valid bit being stored, you can conclude that:
- The key consists of at least the linear address (upper 20 bits).
- The remaining bits (present, R/W, dirty and U/S) could be in either the key or the value.
- Since the remaining bits reside in the command register only, together with the logical address, one can assume that they are also combined with the logical address to form the lookup key. That way, the lookup avoids extra checks, and the CPU doesn't need to strip them to obtain a physical address (which is what the TLB is documented to return).
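To make the TR6/TR7 argument concrete, here is a small decoder for the two test registers. The bit positions are as I read the 80386 manual's TLB-testing section (the page linked in the first post), so treat them as an assumption to verify against the manual; `tr6_fields`, `decode_tr6` and the `tr7_*` helpers are hypothetical names.

```c
#include <stdint.h>

/* TR6 (test command register), per the 80386 manual's layout as read here. */
typedef struct {
    uint32_t linear_page; /* bits 31-12: linear address tag */
    uint8_t  v;           /* bit 11: valid */
    uint8_t  d, d_n;      /* bits 10, 9: dirty and its complement */
    uint8_t  u, u_n;      /* bits 8, 7: user and its complement */
    uint8_t  w, w_n;      /* bits 6, 5: writable and its complement */
    uint8_t  c;           /* bit 0: 0 = write entry, 1 = lookup */
} tr6_fields;

static tr6_fields decode_tr6(uint32_t tr6)
{
    tr6_fields f;
    f.linear_page = tr6 >> 12;
    f.v   = (tr6 >> 11) & 1;
    f.d   = (tr6 >> 10) & 1;
    f.d_n = (tr6 >> 9) & 1;
    f.u   = (tr6 >> 8) & 1;
    f.u_n = (tr6 >> 7) & 1;
    f.w   = (tr6 >> 6) & 1;
    f.w_n = (tr6 >> 5) & 1;
    f.c   = tr6 & 1;
    return f;
}

/* TR7 (test data register): physical page plus diagnostic fields. */
static uint32_t tr7_physical_page(uint32_t tr7) { return tr7 >> 12; }
static uint8_t  tr7_ht(uint32_t tr7)  { return (tr7 >> 4) & 1; } /* bit 4: hit */
static uint8_t  tr7_rep(uint32_t tr7) { return (tr7 >> 2) & 3; } /* bits 3-2: way */
```

The layout supports the point above: V, D, U and W all sit in the command register next to the linear address, while TR7 only carries the physical page and diagnostic bits.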

This behaviour is even confirmed by the manual's own text:

Tags are 24-bits wide. They contain the high-order 20 bits of the linear address, the valid bit, and three attribute bits. The data portion of each entry contains the high-order 20 bits of the physical address.

So the tag must contain all those bits (logical address bits, present bit, R/W bit, U/S bit and Dirty bit), while the data only contains the upper 20 bits taken from the PTE.
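A minimal sketch of the 24-bit tag and 20-bit data described in the quoted text. The order of the valid and attribute bits inside the low nibble is an arbitrary choice for illustration, and `make_tag`/`make_data` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical packing of the 24-bit tag:
   bits 23-4 linear page, bit 3 valid, bit 2 dirty, bit 1 user, bit 0 writable.
   The order within the low 4 bits is an assumption, not the real layout. */
static uint32_t make_tag(uint32_t linear, int valid, int dirty, int user, int writable)
{
    return ((linear >> 12) << 4) |
           ((uint32_t)(valid != 0) << 3) |
           ((uint32_t)(dirty != 0) << 2) |
           ((uint32_t)(user != 0) << 1) |
           (uint32_t)(writable != 0);
}

/* The data side holds only the 20-bit physical page number from the PTE. */
static uint32_t make_data(uint32_t pte)
{
    return pte >> 12;
}
```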

Combining that information, is my emulator handling the lookups correctly? Is the method described in my earlier posts the only correct and simple way to combine the TLB with paging and protection while fully adhering to the TLB documentation, without adding memory overhead (which would be needed if those bits weren't exploited this way)?

Reply 4 of 11, by superfury

Just found a little slideshow explaining TLB vs Paging on a 80386: https://www.slideshare.net/mobile/aniketbhute … by-aniket-bhute

It matches my x86 TLB implementation.
Page table rewalks (the TLB must miss first) are also used in UniPCemu as specified, only on a TLB miss, to speed up protection checks.

Since UniPCemu combines the dirty bit with the read/write type (either 0 or 1 on reads, only 1 on writes), it only walks the page tables (setting the dirty and accessed bits in them) on a TLB miss. This also matches the documentation while minimizing the table walks needed to update those bits. The tables must be walked to update the bits anyway, so this Paging vs TLB policy adheres to the TLB and Paging documentation while keeping memory access fast (less rewalking of the page tables).

Your thoughts on this?

Reply 5 of 11, by superfury

Just implemented the TR6 and TR7 registers based on the basics mentioned in the documentation.

I've also improved paged writes to only use a TLB entry if it's marked Dirty and Write. Reads may use any kind of TLB entry (it doesn't matter whether the entry was for a read or a write, dirty or not; only the address and valid bit are checked for the lookup).

That keeps the full functionality of my earlier posts working, while also making lookups and stores fast (so writing and then reading the same page, dirty or not, won't cause another page walk).

Last edited by superfury on 2018-05-13, 20:25. Edited 1 time in total.

Reply 6 of 11, by superfury

I'm wondering about something now. UniPCemu now uses the TLB as follows:
- TLB lookups for memory writes use the logical address and U/S bits according to the access, with the looked-up R/W and dirty bits set to 1. Thus only a TLB entry matching the user's rights and address can hit, while non-dirty and read-cached entries result in a TLB miss and thus a page walk (for protection checks and dirty bit updates).
- TLB lookups for memory reads are a bit more lax: the logical address and U/S are the same as for writes, but the R/W and dirty bits are ignored (as a simple speedup). Thus any valid entry for the address and privilege matches quickly, so addresses that have only been written to, or that were written (dirty) and then read, don't need more lookups. The speedup is achieved with a simple mask on the tag (not used when reading the TLB through the test registers).
- TLB entries are written after the PTE lookup (and writeback) completes without faulting, including on access privileges. The tag holds the R/W state (0 = memory read, 1 = memory write), U/S (1 if the privilege level is 3, else 0), the Dirty bit from the PTE (after any accessed/dirty writeback), the logical address and P=1; the value is the high 20 bits of the PTE.
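A minimal sketch of the masked tag match described in the list above. The tag layout (bits 23-4 linear page, then P, D, U, RW) and the names `lookup_key`/`tag_matches` are assumptions for illustration, not UniPCemu's actual encoding.

```c
#include <stdint.h>

/* Hypothetical tag layout: bits 23-4 linear page, bit 3 P, bit 2 D,
   bit 1 U, bit 0 RW. */
#define TAG_P  0x8u
#define TAG_D  0x4u
#define TAG_U  0x2u
#define TAG_RW 0x1u

/* U is derived from CPL: CPL 3 = user (1), CPL 0-2 = supervisor (0). */
static uint32_t lookup_key(uint32_t linear, uint8_t cpl)
{
    return ((linear >> 12) << 4) | TAG_P | (cpl == 3 ? TAG_U : 0u);
}

/* Writes must match every tag bit with RW=1 and D=1; reads mask RW and D out. */
static int tag_matches(uint32_t stored_tag, uint32_t linear, uint8_t cpl, int iswrite)
{
    uint32_t key = lookup_key(linear, cpl);
    uint32_t mask = 0xFFFFFFu;
    if (iswrite)
        key |= TAG_RW | TAG_D;
    else
        mask &= ~(TAG_RW | TAG_D);
    return (stored_tag & mask) == (key & mask);
}
```

A clean read entry thus serves later reads at the same privilege, but a write against it misses and goes through the page walk.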

This does result in pages changing from non-dirty to dirty filling up the cache, because the dirty and read/write bits add entries (e.g. a read then a write to the same memory location results in two entries: first the read/(non-)dirty entry, then the write/dirty entry).
I assume this happens in a real CPU as well, since it cannot be avoided when dealing with read/write operations (e.g. ADD [mem],imm)?
Of course, the mask only exists so the TLB doesn't have to be searched multiple times for the different RW/Dirty combinations when executing a read, so it's purely a speed optimization. The mask is disabled (requiring a full tag match) when the TLB is accessed through the TR6/TR7 registers on TR6 writes, for full 80386/80486 emulation.

Reply 7 of 11, by crazyc

Rank Member

(e.g. read then write to same memory location resulting in two entries: first the read/(non-)dirty entry, then the second write/dirty entry).

What? Why? There's no way the real CPU would work this way. With a 4-way associative TLB you'd fill it quickly with redundant entries.

Reply 8 of 11, by superfury

Then what are the four bit values that, together with the high 20 bits of the linear address, form the complete tag? How does a real CPU use them?

Afaik, those four bits are Present, Dirty, Read/Write and User/Supervisor(according to the 80386 documentation). But are they the direct values that are read from the PTE? Does it have a combination of the PDE/PTE in those bits? What when PDE.U!=PTE.U, and all those combinations? Same for R/W bits? What is used for the TLB tag?

How is all this then combined with protection and page walks? I'd assume the whole process is optimized for as few page walks as possible (the whole purpose of the TLB)?

Edit: I've changed the TLB write code a bit: when an entry is present whose bits all match except Dirty, that entry's Dirty bit is updated. Thus a non-dirty to dirty change becomes an update instead of a new entry.
A Present/RW/US/tag combination that doesn't match as a whole still creates a new record.

The paging code itself is unmodified.

Reply 9 of 11, by superfury

I've improved the RW value to contain the writable(1)/not writable(0) status when requesting a store for the TLB entry. Now the TLB contains a few fields in the key:
- Dirty: dirty status of the page. A change in only the dirty bit updates the existing entry instead of creating a new one.
- Writable: 1 for writable, 0 for read-only.
- User: 1 for a user's entry, 0 for a supervisor's entry.
- Present: always read/written as 1 by the Paging Unit. Used as a lookup filter on the tag to select used entries (entries with 0 in the Present bit are ignored and treated as unused).
- Linear address 20-bits.

Thus the combination Writable/User/Present/Dirty/Address is used for lookups, while only Writable/User/Present/Address is used to select the entry to overwrite when a TLB write occurs (if nothing matches, an unused entry in the set is used if there is one, otherwise the oldest entry).
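A minimal sketch of that write policy for one 4-way set, under stated assumptions: `tlb_way`, `tlb_set` and `tlb_write` are hypothetical names, and "oldest" is approximated here with a last-write timestamp. This does not model the real 80386 replacement algorithm.

```c
#include <stdint.h>

#define WAYS 4

/* Hypothetical 4-way set; the key fields follow the list above. */
typedef struct {
    uint8_t  present;  /* 0 = unused way */
    uint32_t page;     /* linear address >> 12 */
    uint8_t  writable;
    uint8_t  user;
    uint8_t  dirty;
    uint32_t frame;    /* physical address >> 12 */
    uint32_t stamp;    /* last-write time, used to pick the oldest way */
} tlb_way;

typedef struct {
    tlb_way  way[WAYS];
    uint32_t clock;
} tlb_set;

/* Returns the way index used. A match on page/writable/user (dirty ignored)
   is updated in place; otherwise an unused way, else the oldest, is replaced. */
static int tlb_write(tlb_set *s, uint32_t linear, uint8_t writable,
                     uint8_t user, uint8_t dirty, uint32_t frame)
{
    uint32_t page = linear >> 12;
    int victim = -1;
    for (int i = 0; i < WAYS && victim < 0; i++) {
        const tlb_way *w = &s->way[i];
        if (w->present && w->page == page && w->writable == writable &&
            w->user == user)
            victim = i; /* same key except D: update in place */
    }
    for (int i = 0; i < WAYS && victim < 0; i++)
        if (!s->way[i].present)
            victim = i; /* first unused way */
    if (victim < 0) {
        victim = 0;     /* set is full: pick the least recently written way */
        for (int i = 1; i < WAYS; i++)
            if (s->way[i].stamp < s->way[victim].stamp)
                victim = i;
    }
    tlb_way *w = &s->way[victim];
    w->present = 1;
    w->page = page;
    w->writable = writable;
    w->user = user;
    w->dirty = dirty;
    w->frame = frame;
    w->stamp = ++s->clock;
    return victim;
}
```

With this policy a read followed by a write to the same writable page reuses one way, which addresses the redundancy concern raised earlier in the thread.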

The old verifyCPL function now also returns whether the page is writable (only when it doesn't fault for the current access). This value is used for the Writable bit when writing the TLB entry.

Of course, the User bit is taken directly from the CPL: CPL 3 gives 1, CPL 0-2 give 0. An entry only matches for the same privilege (user accesses match user entries, supervisor accesses match supervisor entries). It's the same kind of filter as the linear address: linear address + U bit + P (always 1) form the filter for a TLB read. Writable (1) and Dirty (1) are only used as a filter for writes, to check for a fault; reads don't filter on them at all (a match means no fault and the entry is used).

Is this correct behaviour?

Edit: note that P (i.e. V) is always 1 in the TLB for allocated entries.

Last edited by superfury on 2018-05-15, 05:17. Edited 2 times in total.

Reply 10 of 11, by crazyc

superfury wrote:

Afaik, those four bits are Present, Dirty, Read/Write and User/Supervisor(according to the 80386 documentation). But are they the direct values that are read from the PTE? Does it have a combination of the PDE/PTE in those bits? What when PDE.U!=PTE.U, and all those combinations? Same for R/W bits? What is used for the TLB tag?

The IA32 manual is pretty clear that the table permission is ANDed with the directory permission.

Reply 11 of 11, by superfury

UniPCemu currently takes the U/S bit from the requesting CPL (address space separation), while the writable bit for the TLB write is set from the access rights at that level combined with the U/S bits of the request, according to the documentation. The verifyCPL call does that: any failed comparison of the U/S (from CPL being 3 or not) and R/W (from the action executed) against the PDE/PTE U/S and R/W bits faults. When there's no protection fault, the U/S from the CPL is used as the U bit for the TLB, while the W bit depends on the PDE/PTE (according to the writability tables in the manual): supervisor combinations are always writable (W=1), while user combinations can be either, depending on PDE.W and PTE.W. All of this follows the manual's tables and descriptions.

OPTINLINE byte verifyCPL(byte iswrite, byte userlevel, byte PDERW, byte PDEUS, byte PTERW, byte PTEUS, byte *isWritable) //userlevel=CPL or 0 (with special instructions LDT, GDT, TSS, IDT, ring-crossing CALL/INT)
{
	byte uslevel; //Combined US level! 0=Supervisor, 1=User
	byte rwlevel; //Combined RW level! 1=Writable, 0=Not writable
	if (PDEUS&&PTEUS) //User level?
	{
		uslevel = 1; //We're user!
		rwlevel = ((PDERW&&PTERW)?1:0); //Are we writable?
	}
	else //System? Allow read/write if supervisor only! Otherwise, fault!
	{
		uslevel = 0; //We're system!
		rwlevel = 1; //Ignore read/write!
	}
	if ((uslevel==0) && userlevel) //System access by user isn't allowed!
	{
		return 0; //Fault: system access by user!
	}
	if (userlevel && (rwlevel==0) && iswrite) //Write to read-only page for user level?
	{
		return 0; //Fault: read-only write by user!
	}
	*isWritable = rwlevel; //Are we writable?
	return 1; //OK: verified!
}
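To check the closing question mechanically, here is a small self-contained harness that runs the same verifyCPL logic (condensed) against every row of the 80386 manual's table for combining directory and page protection, for user and supervisor accesses, reads and writes. The `byte`/`OPTINLINE` definitions are stand-ins for UniPCemu's own, `prot_row`/`count_mismatches` are hypothetical names, and the table rows plus the access rule (supervisor accesses ignore R/W, since CR0.WP only arrived with the 80486) are transcribed as I understand the manual, so double-check them against it.

```c
#include <stdint.h>

/* Stand-ins for UniPCemu's own definitions (assumed). */
typedef uint8_t byte;
#define OPTINLINE static inline

/* Same logic as the verifyCPL posted above, condensed. */
OPTINLINE byte verifyCPL(byte iswrite, byte userlevel, byte PDERW, byte PDEUS,
                         byte PTERW, byte PTEUS, byte *isWritable)
{
    byte uslevel, rwlevel;
    if (PDEUS && PTEUS) { uslevel = 1; rwlevel = ((PDERW && PTERW) ? 1 : 0); }
    else { uslevel = 0; rwlevel = 1; }
    if ((uslevel == 0) && userlevel) return 0;           /* user on system page */
    if (userlevel && (rwlevel == 0) && iswrite) return 0; /* user write to RO page */
    *isWritable = rwlevel;
    return 1;
}

/* One row of the combined-protection table (transcribed; verify it). */
typedef struct { byte pdeus, pderw, pteus, pterw, us, rw; } prot_row;

static const prot_row table[16] = {
    /* PDE U/S,R/W   PTE U/S,R/W   combined U/S,R/W */
    {0, 0, 0, 0, 0, 1}, {0, 0, 0, 1, 0, 1}, {0, 0, 1, 0, 0, 1}, {0, 0, 1, 1, 0, 1},
    {0, 1, 0, 0, 0, 1}, {0, 1, 0, 1, 0, 1}, {0, 1, 1, 0, 0, 1}, {0, 1, 1, 1, 0, 1},
    {1, 0, 0, 0, 0, 1}, {1, 0, 0, 1, 0, 1}, {1, 0, 1, 0, 1, 0}, {1, 0, 1, 1, 1, 0},
    {1, 1, 0, 0, 0, 1}, {1, 1, 0, 1, 0, 1}, {1, 1, 1, 0, 1, 0}, {1, 1, 1, 1, 1, 1},
};

/* Count cases where verifyCPL disagrees with the table plus the 80386 rule:
   user accesses need a user page and, for writes, a writable one;
   supervisor accesses never fault. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int r = 0; r < 16; r++) {
        const prot_row *t = &table[r];
        for (byte user = 0; user <= 1; user++) {
            for (byte wr = 0; wr <= 1; wr++) {
                byte expect_ok = !user || (t->us && (!wr || t->rw));
                byte writable = 0xFF; /* sentinel: must be set on success */
                byte ok = verifyCPL(wr, user, t->pderw, t->pdeus,
                                    t->pterw, t->pteus, &writable);
                if (ok != expect_ok || (ok && writable != t->rw))
                    bad++;
            }
        }
    }
    return bad;
}
```

A result of 0 means verifyCPL agrees with that table for all 64 cases.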

Is that correct behaviour?
