First post, by OttoPS
Hi everyone,
I want to share p4tool, a DOS utility for Intel Pentium 4 / NetBurst systems focused on performance control beyond traditional slowdown methods.
Project page:
GitHub repository
Most slowdown tools on these systems rely on one of these approaches:
ODCM duty-cycle throttling
- Reduces CPU performance
- Also reduces external bus throughput proportionally
- Affects overall system responsiveness (video, memory, I/O)
In practice, this behaves as a global throttle affecting the entire platform.
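For reference, ODCM is controlled through the IA32_CLOCK_MODULATION MSR (0x19A): bit 4 enables modulation and bits 3:1 select the duty cycle in 12.5% steps. The snippet below is only a sketch of how the register value could be built, not p4tool's actual code. The helper names are made up for illustration, and the RDMSR/WRMSR part (ring 0 only) is left out.

```c
#include <stdint.h>

#define IA32_CLOCK_MODULATION 0x19Au      /* MSR address */
#define ODCM_ENABLE           (1u << 4)   /* on-demand clock modulation enable */

/* Hypothetical helper: steps = 1..7 selects 12.5% .. 87.5% duty cycle.
 * Returns 0 (modulation disabled) for out-of-range input, since the
 * encoding 000b is reserved. */
uint32_t odcm_encode(unsigned steps)
{
    if (steps < 1 || steps > 7)
        return 0;
    return ODCM_ENABLE | (steps << 1);
}

/* Duty cycle in tenths of a percent (125 = 12.5%). */
unsigned odcm_duty_permille(unsigned steps)
{
    return steps * 125u;
}
```

A real tool would read the MSR first and only replace bits 4:1, so that the reserved bits keep their current value.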
Cache disabling via CR0
- Produces inconsistent or limited effects on NetBurst
- Does not disable every cache structure (the trace/uop cache remains active)
- Not suitable for fine-grained control
- Can be overridden by software (e.g. Ultima VII re-enabling cache at runtime)
This makes CR0-based approaches unreliable.
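To make the override problem concrete: the cache-disable request lives in CR0.CD (bit 30), with CR0.NW (bit 29) normally cleared alongside it. Any program that rewrites CR0 with CD cleared silently undoes the slowdown. A rough sketch of the bit handling, with illustrative function names and without the privileged MOV to/from CR0 or the WBINVD that would follow:

```c
#include <stdint.h>
#include <stdbool.h>

#define CR0_NW (1u << 29)   /* not write-through */
#define CR0_CD (1u << 30)   /* cache disable */

/* Value a CR0-based slowdown tool would write: CD = 1, NW = 0. */
uint32_t cr0_request_cache_off(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Check whether a later CR0 value still carries the request.
 * Returns false once some program has cleared CD again. */
bool cr0_cache_off_still_set(uint32_t cr0)
{
    return (cr0 & CR0_CD) != 0;
}
```

A tool relying on this would have to keep polling CR0 and re-applying the bit, which is part of why the approach is fragile.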
What p4tool does differently
p4tool avoids CR0-based mechanisms and instead relies on:
- MSR-based control
- MTRR memory policy manipulation
- Debug Store (DS) / Branch Trace Store (BTS) effects
These allow independent control over:
- CPU execution behavior
- Memory access characteristics
- Overall system responsiveness
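All three mechanisms are advertised in CPUID leaf 1 (EDX), so a tool can check for them before touching any MSR. The following is a minimal sketch under the assumption that the caller has already executed CPUID and passes in EDX. The helper name is hypothetical, and this is not necessarily how p4tool does its detection.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPUID leaf 1, EDX feature bits */
#define CPUID1_EDX_MSR  (1u << 5)    /* RDMSR/WRMSR available */
#define CPUID1_EDX_MTRR (1u << 12)   /* MTRRs available */
#define CPUID1_EDX_DS   (1u << 21)   /* Debug Store available */
#define CPUID1_EDX_ACPI (1u << 22)   /* software clock modulation (ODCM) */

/* Hypothetical helper: true only if every mechanism is present. */
bool p4_features_present(uint32_t edx)
{
    const uint32_t need = CPUID1_EDX_MSR | CPUID1_EDX_MTRR |
                          CPUID1_EDX_DS  | CPUID1_EDX_ACPI;
    return (edx & need) == need;
}
```

For example, an EDX value of 0xBFEBFBFF has all four bits set, so the check passes.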
Techniques implemented
- ODCM throttling (baseline reference)
- MSR-based full cache disable (true global uncached mode)
- Debug Store (DS) slowdown
- Debug Store + BTS slowdown
- MTRR manipulation (main RAM, base 0)
- IA32_MTRRdefType override (global memory type)
Key observations
These techniques behave very differently:
- ODCM -> global slowdown (CPU + memory + bus + video all degrade together)
- MSR full uncached mode -> strong, consistent system-wide slowdown
- DS / BTS -> CPU execution degradation without bus impact
- MTRR (range-based) -> memory behavior changes without affecting instruction flow
Some methods can significantly reduce CPU performance while keeping video throughput relatively stable, unlike ODCM.
NetBurst-specific notes
- CR0 does not fully disable cache effects
- Range-based MTRRs only affect the data path
- Trace cache (uop cache) remains active unless global policies are used
Because of this, p4tool does not rely on CR0.
Also, some DOS software (like Ultima VII) modifies CR0 at runtime, which can break traditional slowdown tools.
Using IA32_MTRRdefType instead provides a stable and consistent global uncached mode.
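As a rough illustration of what an IA32_MTRRdefType (MSR 0x2FF) override involves: bits 7:0 hold the default memory type, bit 10 (FE) enables the fixed-range MTRRs and bit 11 (E) enables MTRRs as a whole. With E cleared, all of physical memory is treated as uncacheable (UC). The helpers below only compute the new register value. Their names are illustrative, and which of the two variants p4tool actually uses is not something this sketch claims to know.

```c
#include <stdint.h>

#define IA32_MTRR_DEF_TYPE 0x2FFu        /* MSR address */
#define MTRR_DEF_TYPE_MASK 0xFFull       /* bits 7:0: default memory type */
#define MTRR_DEF_FE        (1ull << 10)  /* fixed-range MTRR enable */
#define MTRR_DEF_E         (1ull << 11)  /* MTRR enable */
#define MTRR_TYPE_UC       0u            /* uncacheable */
#define MTRR_TYPE_WB       6u            /* write-back */

/* Variant 1: clear E, so every physical address becomes UC. */
uint64_t deftype_disable_mtrrs(uint64_t v)
{
    return v & ~MTRR_DEF_E;
}

/* Variant 2: keep MTRRs enabled, change only the default type.
 * This affects just the addresses not covered by any MTRR range. */
uint64_t deftype_set_default(uint64_t v, unsigned type)
{
    return (v & ~MTRR_DEF_TYPE_MASK) | (type & 0xFFu);
}
```

With a typical BIOS value of 0xC06 (E + FE + default WB), variant 1 yields 0x406 and variant 2 with UC yields 0xC00. The actual WRMSR should be wrapped in the usual sequence (interrupts off, caches flushed with WBINVD) and the original value saved for restore.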
Planned comparisons (SpeedSys)
ODCM-only slowdown
- CPU performance reduced
- Memory throughput reduced
- Video bandwidth significantly degraded
- Fully proportional slowdown across the platform
Full cache disable (MSR-based)
- Strong CPU performance reduction
- Memory access significantly slower
- Consistent and predictable behavior
- Affects both data cache and trace cache
This produces a true global uncached state on NetBurst.
Debug Store / BTS slowdown
- CPU performance reduced
- Memory behavior affected differently
- Does not behave like a cache-level slowdown
- Video throughput remains comparatively stable
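For anyone unfamiliar with the mechanism: with BTS enabled, the CPU writes a 12-byte record (from, to, flags) into a memory buffer for every taken branch, which is where the execution overhead comes from. The buffer is described by the DS save area, whose address goes into IA32_DS_AREA (MSR 0x600), and tracing is switched on through the TR and BTS bits of MSR_DEBUGCTLA (0x1D9) on NetBurst. Below is a hedged sketch of the 32-bit setup. The struct and function names are illustrative, the exact end-of-buffer convention should be checked against the Intel SDM, and none of this is taken from p4tool's source.

```c
#include <stdint.h>

#define IA32_DS_AREA    0x600u      /* MSR: linear address of DS save area */
#define MSR_DEBUGCTLA   0x1D9u      /* MSR: debug control (NetBurst layout) */
#define DEBUGCTLA_TR    (1u << 2)   /* generate branch trace messages */
#define DEBUGCTLA_BTS   (1u << 3)   /* store them in the BTS buffer */
#define BTS_RECORD_SIZE 12u         /* 32-bit record: from, to, flags */

/* DS save area, 32-bit format (field order per the Intel SDM). */
struct ds_area32 {
    uint32_t bts_base;
    uint32_t bts_index;
    uint32_t bts_abs_max;
    uint32_t bts_int_threshold;
    uint32_t pebs_base;
    uint32_t pebs_index;
    uint32_t pebs_abs_max;
    uint32_t pebs_int_threshold;
    uint64_t pebs_counter_reset;
};

/* Point the BTS fields at a buffer of nrec records at linear address buf.
 * Assumption: BTINT stays clear, so the buffer is used as a ring and the
 * interrupt threshold is not acted on. */
void ds_setup_bts(struct ds_area32 *ds, uint32_t buf, uint32_t nrec)
{
    ds->bts_base          = buf;
    ds->bts_index         = buf;
    ds->bts_abs_max       = buf + nrec * BTS_RECORD_SIZE;
    ds->bts_int_threshold = ds->bts_abs_max;
}

/* New MSR_DEBUGCTLA value with branch tracing to the buffer enabled. */
uint32_t debugctla_enable_bts(uint32_t v)
{
    return v | DEBUGCTLA_TR | DEBUGCTLA_BTS;
}
```

The smaller the buffer, the less memory is touched, but every taken branch still costs a record write, so the slowdown scales with how branch-heavy the code is.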
MTRR (main RAM base 0)
- Strong impact on memory throughput
- CPU affected indirectly
- Different profile compared to full uncached mode
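A range-based override like this comes down to programming one variable-range MTRR pair, for example IA32_MTRR_PHYSBASE0 / IA32_MTRR_PHYSMASK0 (MSRs 0x200 / 0x201), with base 0 and type UC. The sketch below shows how the two values could be computed. It assumes a 36-bit physical address width and a power-of-two size of at least 4 KB, and the helper names are hypothetical rather than taken from p4tool.

```c
#include <stdint.h>

#define IA32_MTRR_PHYSBASE0 0x200u        /* MSR address */
#define IA32_MTRR_PHYSMASK0 0x201u        /* MSR address */
#define MTRR_PHYSMASK_VALID (1ull << 11)  /* V bit: range is active */
#define MTRR_TYPE_UC        0u            /* uncacheable */
#define PHYS_ADDR_BITS      36            /* assumption: 36-bit addressing */

/* PHYSBASE value: 4 KB aligned base address plus memory type in bits 7:0. */
uint64_t mtrr_physbase(uint64_t base, unsigned type)
{
    return (base & ~0xFFFull) | (type & 0xFFu);
}

/* PHYSMASK value for a power-of-two range size, with the valid bit set. */
uint64_t mtrr_physmask(uint64_t size)
{
    const uint64_t addr_mask = ((1ull << PHYS_ADDR_BITS) - 1) & ~0xFFFull;
    return (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
}
```

For a 256 MB range at base 0 this gives PHYSBASE = 0x0 and PHYSMASK = 0xFF0000800. Where the new range overlaps a WB range set up by the BIOS, UC takes precedence, while the first 1 MB is still governed by the fixed-range MTRRs as long as they are enabled.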
Combined techniques
- Fine-grained performance tuning
- Intermediate performance levels
- Better balance between CPU / memory / video
Goal
The goal is not just to slow down a Pentium 4 system, but to make it usable across performance ranges that are normally:
- Too fast with standard throttling
- Or far too slow with cache-based approaches
If there’s interest, I can also share more technical details about:
- MSR-based cache/memory control
- Debug Store / BTS behavior on NetBurst
- Practical differences between slowdown techniques

