VOGONS


First post, by ih8registrations

Rank Oldbie

I've only gone through a little so far: dma.cpp and mixer.cpp. For instance, dma.o is 821 bytes smaller, which means less cache thrashing. The size reductions for the various changes are noted in the comments. Edit: the changes in mouse aren't size optimizations, just some other changes that slipped into the patch, mostly applying the bound optimization you can see in mixer.cpp.

Attachment: what.diff (17.84 KiB, 427 downloads; license: Fair use/fair dealing exception)

Reply 1 of 12, by ih8registrations

Rank Oldbie

More changes; a notable one shaves ~800 bytes off the AdLib/OPL handling.

Attachment: what2.diff (29.92 KiB, 365 downloads; license: Fair use/fair dealing exception)

Reply 4 of 12, by ih8registrations

Rank Oldbie

I would say not really, then; that just covers prefetch. If you're a bit rusty on caching, like me: caches range from direct-mapped to fully associative, with n-way set associative in between. With direct-mapped, each memory location maps to exactly one cache line; with fully associative, any address can go into any cache line; and with n-way, each address maps to one set of n lines and can occupy any line in that set. With n-way, if I understand correctly, the page/high bits of the address are used for the mapping. I've been trying to find a reference that gives a non-pseudo example of the address calculation, to no avail. When calculating how many cache lines the function's common path occupies, there's also the question of where the function starts within a cache line (not necessarily at the beginning?), and of which cache line the code resumes in after skipping past the capture handling.

Reply 5 of 12, by ih8registrations

Rank Oldbie

The beginning of mixdata is 274 bytes, the end is 276 bytes, the function call is 4 bytes, passing the needed Bitu is 4 bytes, and cache lines are 64 bytes. If one assumes the function starts at the beginning of a cache line, the last line filled for the beginning of mixdata includes not only the function call but also the instructions past it. The cache line fills would be consecutive, so the common path (274 + 276 + 4 + 4 = 558 bytes) would take 558/64 ≈ 8.72, i.e. 9 cache lines. The capture handling is 125 bytes; with it in between, execution skips past the capture handling and starts a new cache line fill: 274/64 ≈ 4.28, i.e. 5 cache lines, plus 276/64 ≈ 4.31, i.e. 5 cache lines, for 10 cache lines in total.

Reply 6 of 12, by wd

Rank DOSBox Author

Cache associativity, cache line size, total cache size, and line replacement strategies
differ considerably across processor types and their cache levels,
so you'd better not assume too much about these parameters.
Maybe Moe wants to say more about that.

Reply 10 of 12, by ih8registrations

Rank Oldbie

+pic: this version also includes changes to the PIC code.

There's a bug in this one.

Attachment: what3.diff (34.83 KiB, 413 downloads; license: Fair use/fair dealing exception)
Last edited by ih8registrations on 2007-09-26, 16:32. Edited 1 time in total.

Reply 11 of 12, by `Moe`

Rank Oldbie

About caches:

Imagine a non-associative (direct-mapped) 4KB cache: a memory location can't be put anywhere in the cache, only at the location that matches its lowest 12 bits. So address 0x0000 and address 0x1000 share the same cache location. If you load a contiguous area, like a piece of linear code, it doesn't matter, since that is, by definition, linear. But if 0x0000 is in the cache and 0x1000 is to be loaded, then 0x0000 must be removed, even if the rest of the cache is unused.

Now imagine a 16KB cache, 4-way associative. You can think of it as 4 non-associative caches, 4KB each. If 0x0000 is loaded and 0x1000 is to be loaded, 0x1000 is put in the second "way". 0x2000 and 0x3000 can also be loaded, and only when 0x4000 is loaded will one of the 4 previous locations be thrown out.

4-way is quite common. If your code+data fits into 64KB (even if split across 2-4 distinct memory blocks), practically every CPU since the Pentium 2 will fit it into its L1 cache. Use a Linux machine and OProfile to get a detailed profile of which locations suffer the most cache misses.

GCC will obey the GCC_UNLIKELY hint (a macro around GCC's __builtin_expect) and put that code path out of the way, arranging the code flow for minimal branching. If you use PGO, you don't even need GCC_UNLIKELY: the profile data will be used to decide which code path is the hottest (that was the very first use of PGO). Use as new a GCC as you can get, as the PGO features are still quite new and new profile-guided optimization steps are added all the time.

Reply 12 of 12, by ih8registrations

Rank Oldbie

The bug is an issue with pointing to a function inside a C++ namespace. It pukes on oplwrite= OPL2::YM3812Write/THEOPL3::YMF262Write in adlib.cpp. It shows up with Master of Orion. Oh, how I love C++ and OOP... Anyone know how to get past this silliness?