I think these are generally the timings that you "should" be able to get:
60ns FPM DRAM: 0WS at 33MHz, 1WS at 40MHz, 1WS at 50MHz, 2WS at 60MHz
70ns FPM DRAM: 0WS at 25MHz, 1WS at 33MHz, 1WS at 40MHz, 2WS at 50MHz
15ns SRAM: 2-1-1-1 at 33MHz, 3-1-1-1 at 40MHz, 3-2-2-2 at 50MHz
20ns SRAM: 2-1-1-1 at 25MHz, 3-1-1-1 at 33MHz, 3-2-2-2 at 40MHz
Sometimes 3-1-1-1 won't work but 2-2-2-2 will. 60ns RAM will often work with zero waitstates at 40MHz.
What happens on a 486 when there is a read that misses the L1 cache is that it reads a block of 16 bytes (4 32-bit words, this is the cache "line size") from the motherboard (whether that be L2 or RAM). So when you have 2-1-1-1 timing that means that it can read the first 32 bits in 2 cycles, and the next three reads take one cycle each. So reading 16 bytes from the L2 takes 5 bus cycles. If you have a DX2 CPU then 5 bus cycles equals 10 CPU cycles, and so on.
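To make the arithmetic concrete, here is a rough Python sketch of how a burst timing string turns into bus cycles and CPU cycles. The function names and the `clock_multiplier` parameter are just made up for illustration (2 for a DX2, 3 for a DX4, and so on).

```python
def burst_cycles(timing):
    """Total bus cycles for one 16-byte line fill, e.g. '2-1-1-1' -> 5."""
    return sum(int(part) for part in timing.split("-"))

def cpu_cycles(timing, clock_multiplier=1):
    """Same burst expressed in CPU cycles (hypothetical helper).

    clock_multiplier is the CPU-to-bus clock ratio: 1 for a plain DX,
    2 for a DX2.
    """
    return burst_cycles(timing) * clock_multiplier

for timing in ("2-1-1-1", "3-1-1-1", "2-2-2-2", "3-2-2-2", "4-2-2-2"):
    print(timing, "bus cycles:", burst_cycles(timing),
          "DX2 CPU cycles:", cpu_cycles(timing, 2))
```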
Reading from DRAM is usually 4-2-2-2 when you have it set at 0 WS. But it depends on the chipset. Once I had a 486-16 with 60ns RAM and no L2. It used 2-1-1-1 timing to access RAM.
On a 386 the cache line size could be different, since it is all handled off-chip. Who knows?
Sometimes you can work out what is going on from memory benchmarks. For instance, if you have a 486DX2-80, and you get 76MB/s for L1 and 38MB/s for L2, then you can figure 16 cycles for reading 16 bytes in the L1 (4 cycles for one LODSD instruction) plus an additional 16 cycles to read from L2, which is 8 bus cycles, which would suggest 2-2-2-2 timing...
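The same back-of-the-envelope calculation as a small Python sketch. This is only an estimate under assumptions: I'm assuming the benchmark reports MB as 2^20 bytes, that the L1 figure is pure instruction overhead, and the helper names are invented for this example.

```python
LINE_BYTES = 16          # 486 cache line size
MB = 1024 * 1024         # assumption: the benchmark's "MB" is 2**20 bytes

def cpu_cycles_per_line(cpu_mhz, mb_per_s):
    """CPU cycles spent per 16-byte line at a measured bandwidth."""
    lines_per_s = mb_per_s * MB / LINE_BYTES
    return cpu_mhz * 1e6 / lines_per_s

def extra_bus_cycles(cpu_mhz, clock_multiplier, l1_mb_s, l2_mb_s):
    """Bus cycles per line that L2 costs on top of the L1 baseline."""
    extra_cpu = (cpu_cycles_per_line(cpu_mhz, l2_mb_s)
                 - cpu_cycles_per_line(cpu_mhz, l1_mb_s))
    return extra_cpu / clock_multiplier

# The 486DX2-80 example: 76MB/s from L1, 38MB/s from L2
print(round(cpu_cycles_per_line(80, 76)))      # cycles per line out of L1
print(round(extra_bus_cycles(80, 2, 76, 38)))  # extra bus cycles for L2
```

If the extra bus cycles come out near 8, that points at 2-2-2-2; near 5 would point at 2-1-1-1, near 6 at 3-1-1-1. Real results won't land exactly on a whole number.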
It gets trickier though. RAM access can be interrupted by memory refresh, which would slightly lower the benchmark results. Some chipsets also have a huge penalty for L2 misses (turn off the L2 in the BIOS and your RAM performance can jump way up).
The "60ns" for 60ns FPM DRAM is the time for a random access (could be the first one in a series) but NOT including RAS precharge, which is sort of like a rest period. RAS precharge itself takes almost as long as the stated access time (60ns), so at a minimum it might be 40-50ns. Consecutive memory access in the same page is faster: it takes half or less of the nominal 60ns, so it could be 20-30ns. So this is roughly how the timing breaks down for 4-2-2-2 at 33MHz. One cycle is 30ns. RAS precharge gets 2 cycles. First memory access is 2 cycles. Second, third, and fourth are each 2 cycles. Why not one cycle for a page mode read? Once the data is ready at the memory chip, it has to remain on the bus for a time so the CPU/chipset can read it before we can start the next read. (With EDO the cycles can overlap a bit, and 3-1-1-1 would be possible at 33MHz.)
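Here is the 4-2-2-2 breakdown laid out as a quick Python sketch, just to show where the nanoseconds go. The phase labels and names are mine, and I'm assuming "33MHz" really means a 33.33MHz bus so that one cycle comes out at about 30ns.

```python
# Assumption: a nominal "33MHz" bus runs at 33.33 MHz (about 30ns per cycle).
BUS_MHZ = 33.33

# Phases of one 4-2-2-2 line fill as (label, bus cycles).
# The leading "4" is split into 2 cycles of precharge + 2 of access.
PHASES = [
    ("RAS precharge", 2),
    ("first access", 2),
    ("page-mode read 2", 2),
    ("page-mode read 3", 2),
    ("page-mode read 4", 2),
]

def cycle_ns(bus_mhz):
    """Length of one bus cycle in nanoseconds."""
    return 1000.0 / bus_mhz

def breakdown(phases, bus_mhz):
    """Return (label, cycles, nanoseconds) for each phase."""
    ns = cycle_ns(bus_mhz)
    return [(label, cycles, cycles * ns) for label, cycles in phases]

rows = breakdown(PHASES, BUS_MHZ)
total_cycles = sum(cycles for _, cycles, _ in rows)
total_ns = sum(t for _, _, t in rows)

for label, cycles, t in rows:
    print(f"{label:18s} {cycles} cycles  {t:5.1f} ns")
print(f"total: {total_cycles} cycles, {total_ns:.0f} ns for 16 bytes")
```

Each phase gets about 60ns, which covers the 40-50ns precharge and the 60ns random access, and leaves the 20-30ns page-mode reads with time to spare for holding data on the bus. The whole line fill comes to 10 bus cycles, roughly 300ns.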