How to Compare Program Versions on Windows: From Power Mode Setup to the Limits of Repeatability
Bottom line first
Six points for taking repeatable benchmarks:
- Decide what you want to compare before you start: the code difference, or the actual user experience?
- Treat power mode and power plan as separate things, and record both
- Separate the cold first run from the steady state after warm-up
- Alternate runs in A->B->A->B order
- Look at the median and the spread, not just the average
- If the difference is small, dig into the cause with ETW/WPR
Decide the type of comparison up front
Comparing the code itself
Use this when you want to know the impact of an algorithm change or compiler optimization on the implementation itself. Strip out as much environmental noise as you can (clean boot, fixed power mode, notifications off, and so on).
Comparing the real user experience
Use this when you want to know how fast the program actually feels to users after release. Do not strip out the noise. Compare in an everyday environment, including OneDrive sync, Defender, notifications, and the rest.
Mixing these two leads to misleading conclusions (“12% faster in the lab, but indistinguishable in real life”).
Main sources of variance
| Layer | Source of variance | Typical example |
|---|---|---|
| Hardware | CPU/GPU, memory, SSD, cooling | Thin laptops, with or without a cooling pad |
| Firmware | BIOS/UEFI, OEM controls | Power-saving policies, fan control |
| OS | Windows build, drivers, update state | Behavior changes after an update |
| Power | AC/DC, power mode, power plan | Battery operation is a different world |
| Thermals | Room temperature, fans, recent load | Turbo on the first run, then drops off |
| Background | Update, Defender, sync, notifications | A scan or sync kicks in mid-run |
| Data/cache | OS cache, app cache | Slow on the first run only, fast afterwards |
Pin down the power mode and power plan
These are two different things
- Power mode: chosen from the Settings app under “System > Power & battery” (Best power efficiency / Balanced / Best performance)
- Power plan: the classic power scheme (Balanced / High performance, etc.). Check it with powercfg
How to pin them
- Always run the comparison on AC power for laptops
- Pin the power mode (use Best performance for benchmarking)
- Record the active power plan:
powercfg /list
powercfg /getactivescheme
- Switch to High performance if needed:
REM High performance
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
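Recording the active plan by hand is easy to forget, so it can help to capture it from a script. The following is a minimal sketch, assuming the English output format of powercfg /getactivescheme (a GUID followed by the plan name in parentheses). The function names parse_active_scheme and read_active_scheme are hypothetical and not part of any Windows tooling.

```python
import re
import subprocess
from typing import Tuple

# Matches a GUID followed by a friendly name in parentheses, for example:
#   Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced)
_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s+\((.+?)\)"
)


def parse_active_scheme(output: str) -> Tuple[str, str]:
    """Return (guid, name) parsed from `powercfg /getactivescheme` output."""
    match = _SCHEME_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognized powercfg output: {output!r}")
    return match.group(1).lower(), match.group(2)


def read_active_scheme() -> Tuple[str, str]:
    """Run powercfg (Windows only) and parse the active power plan."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_active_scheme(result.stdout)
```

The parser only looks for the GUID and the parenthesized name, so it should tolerate a localized label in front of the GUID. The power mode chosen in the Settings app is a separate setting and is not covered by this sketch.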
Caveats
- High performance may not appear on Modern Standby devices (a design constraint of the model)
- If you cannot change the power mode, a custom power plan may be selected. Try switching the power plan to Balanced first
Crush background noise
- Reboot and wait a few minutes: right after boot, updates, indexing, sync, and Defender are all running wild
- For serious comparisons, do a clean boot: stop non-Microsoft services with msconfig. This is for lab comparisons that focus on the code difference
- Turn off notifications: enable Do not disturb
- Suppress search indexing and sync: exclude your benchmark directory from indexing. Stop OneDrive/Dropbox. Close browsers and Teams
Match thermals
A cold CPU/GPU and a warmed-up one are two different beasts. This is especially obvious on laptops.
- Keep the room temperature consistent
- Fix how the laptop is positioned
- Fix the AC adapter, dock, and external display setup
- Avoid heavy work right before benchmarking
- Measure the first run and the steady state separately
Alternate the run order
10 runs of A followed by 10 runs of B -> bad (thermal and cache bias gets baked in).
Recommended:
- A B A B A B ...
- A B B A A B B A ...
- Pre-generate a random order and run it
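The three orderings above can be generated ahead of time so that the run script only has to follow a list. This is a minimal sketch. The function names and the idea of seeding the shuffle (so the order can be written down next to the results) are assumptions for illustration.

```python
import random
from typing import List


def alternating(pairs: int) -> List[str]:
    """A B A B ...: the same A,B pair repeated `pairs` times."""
    return ["A", "B"] * pairs


def mirrored(pairs: int) -> List[str]:
    """A B B A A B B A ...: flip the pair order on every other pair."""
    order: List[str] = []
    for i in range(pairs):
        order += ["A", "B"] if i % 2 == 0 else ["B", "A"]
    return order


def shuffled(pairs: int, seed: int) -> List[str]:
    """Random order with equal counts of A and B, reproducible from `seed`."""
    order = ["A", "B"] * pairs
    random.Random(seed).shuffle(order)
    return order
```

All three keep the number of A and B runs equal, so neither version gets more runs on a cold or a hot machine by construction.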
What to measure
| Metric | How to obtain it | Use |
|---|---|---|
| Wall-clock time | QueryPerformanceCounter / Stopwatch | Closest to user-perceived speed |
| CPU time (user + kernel) | GetProcessTimes | Computational efficiency |
| Cycle count | QueryProcessCycleTime | Computational load excluding wait time |
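To show how the first two metrics in the table differ, here is a minimal in-process sketch in Python. On Windows, CPython's time.perf_counter is backed by QueryPerformanceCounter and time.process_time by GetProcessTimes. The measure function and its workload parameter are hypothetical names, and the standard library has no equivalent of QueryProcessCycleTime, so cycle counts are left out.

```python
import time
from typing import Callable, Dict


def measure(workload: Callable[[], object]) -> Dict[str, float]:
    """Time one call of `workload`, returning wall-clock and CPU milliseconds."""
    wall_start = time.perf_counter_ns()
    cpu_start = time.process_time_ns()
    workload()
    cpu_end = time.process_time_ns()
    wall_end = time.perf_counter_ns()
    return {
        # Wall-clock time: includes waiting on I/O, sleeps, and scheduling
        "elapsed_ms": (wall_end - wall_start) / 1e6,
        # CPU time: user + kernel time consumed by this process only
        "cpu_ms": (cpu_end - cpu_start) / 1e6,
    }
```

A workload that mostly waits (for example, one that sleeps or blocks on disk) shows a large elapsed_ms and a small cpu_ms, which is exactly the gap the table is meant to expose.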
Read the metrics in combination
- Only wall-clock got faster -> probably an I/O, wait-time, or cache improvement
- CPU time and cycles both dropped -> the implementation itself got lighter
- Only the first run is slow/fast -> a cold/warm difference (startup, initialization, JIT)
- Gets slower as runs accumulate -> thermals, throttling, or memory pressure
Priority and affinity are last resorts
If there is a difference at default settings, that difference itself is meaningful. Reaching for /high or /affinity from the start introduces conditions you would never see on real Windows.
start "" /high /wait myapp.exe --bench case1.json
start "" /affinity F /high /wait myapp.exe --bench case1.json
Do not use /realtime. It does not remove noise; a realtime-priority process can starve the system threads that handle input and disk flushing, which creates a different kind of accident.
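The value passed to /affinity is a hexadecimal bit mask in which bit n selects logical processor n, so F (binary 1111) means processors 0 to 3. The arithmetic can be sketched as follows. The helper name affinity_hex is hypothetical, and the sketch assumes a machine with a single processor group.

```python
from typing import Iterable


def affinity_hex(cpus: Iterable[int]) -> str:
    """Build the hex mask for `start /affinity` from logical CPU indexes."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # set the bit for this logical processor
    return format(mask, "X")
```

For example, affinity_hex(range(4)) gives F, matching the command above, and affinity_hex([4, 5, 6, 7]) gives F0.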
A practical measurement procedure
- Pin what you are comparing (commit hash, build number, Debug/Release, logging on/off)
- Pin the machine conditions (Windows build, BIOS version, AC connected, room temperature)
- Pin the power conditions (record power mode and active power plan)
- Reboot and wait a few minutes
- Do a clean boot if needed
- Add a warm-up
- Alternate A and B (with enough total runs)
- Keep median, min, max, and p95
- Save the raw data
- If the difference is small, capture an ETW/WPR trace
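The summary statistics named in the procedure can be computed from the raw samples with the standard library alone. This is a minimal sketch. The summarize function is a hypothetical name, and p95 uses the nearest-rank definition, which is one of several common percentile conventions.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return median, min, max, and nearest-rank p95 of elapsed times."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("need at least one sample")
    # Nearest-rank p95: ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises
    rank = (95 * len(ordered) + 99) // 100
    return {
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }
```

Run it once per version on the steady-state samples and compare the medians first. A large gap between the median and p95 or max suggests that something interfered with some of the runs.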
Fields worth recording
timestamp, version, scenario, elapsed_ms, user_ms, kernel_ms, cycles,
power_mode, power_plan, ac_or_dc, room_temp_c, notes
If possible:
cpu_package_temp_start_c, cpu_package_temp_end_c,
affinity_mask, priority_class, windows_build, driver_version
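One simple way to keep these fields together with the raw data is one CSV row per run. The sketch below assumes the first field list above as the column set. The open_log helper is a hypothetical name, and the optional fields could be appended to FIELDS in the same way.

```python
import csv
from typing import IO

FIELDS = [
    "timestamp", "version", "scenario", "elapsed_ms", "user_ms", "kernel_ms",
    "cycles", "power_mode", "power_plan", "ac_or_dc", "room_temp_c", "notes",
]


def open_log(stream: IO[str]) -> csv.DictWriter:
    """Write the header row and return a writer for per-run rows."""
    # restval="" leaves unrecorded fields blank; an unknown key raises
    # ValueError, which catches misspelled field names early
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    return writer
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, then call writer.writerow(...) with a dict for each run.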
Dig into “why is it faster?” with ETW/WPR
When the difference is small, or the reason is hard to read, reach for ETW (Event Tracing for Windows).
wpr -start CPU -filemode
REM run the benchmark here
wpr -stop trace.etl
From there you can explain the difference with evidence, like “B has less lock contention so ready time dropped” or “A has more file opens, so the cold start is slow.”
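If traces are captured often, the start and stop commands can be wrapped around the benchmark so the recording is always stopped. This is a minimal sketch that only uses the two wpr invocations shown above. The trace_benchmark function and its injectable run parameter are hypothetical, and it assumes the script runs from an elevated prompt on Windows.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], object]


def _run(cmd: Sequence[str]) -> object:
    """Default runner: execute the command and raise on a non-zero exit."""
    return subprocess.run(list(cmd), check=True)


def trace_benchmark(
    bench_cmd: Sequence[str], etl_path: str, run: Runner = _run
) -> None:
    """Wrap one benchmark invocation in `wpr -start` / `wpr -stop`."""
    run(["wpr", "-start", "CPU", "-filemode"])
    try:
        run(list(bench_cmd))
    finally:
        # Always stop the recording, even if the benchmark fails
        run(["wpr", "-stop", etl_path])
```

The run parameter exists so the sequence can be checked without Windows or WPR installed. In normal use it is left at its default.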
Wrap-up
What actually moves the needle when comparing versions on Windows is unglamorous discipline that helps repeatability:
- Pin AC / power mode / power plan, and record them
- Separate cold and warm
- Alternate A and B
- Look at the median and the distribution
- Do a clean boot if needed
- If the difference is small, dig into the reason with ETW/WPR
The most important thing is to write down what you fixed and what you did not, alongside the results. Benchmark results without the conditions written down cannot be reproduced, so they cannot be trusted.
References
- Microsoft Support: Change the power mode for your Windows PC
- Microsoft Learn: Power Policy Settings
- Microsoft Learn: Customize the Windows performance power slider
- Microsoft Learn: Powercfg command-line options
- Microsoft Support: How to perform a clean boot in Windows
- Microsoft Support: Notifications and Do Not Disturb in Windows
- Microsoft Support: Search indexing in Windows
- Microsoft Learn: Configure custom exclusions for Microsoft Defender Antivirus
- Microsoft Support: Device Security in the Windows Security App
- Microsoft Learn: QueryPerformanceCounter function
- Microsoft Learn: Acquiring high-resolution time stamps
- Microsoft Learn: GetProcessTimes function
- Microsoft Learn: QueryProcessCycleTime function
- Microsoft Learn: start command
- Microsoft Learn: SetPriorityClass function
- Microsoft Learn: SetProcessAffinityMask function
- Microsoft Learn: Processor Groups
- Microsoft Learn: Windows Performance Recorder
- Microsoft Learn: WPR Command-Line Options