A Practical Guide to Soft Real-Time on Windows - A Checklist for Reducing Latency

Tags: Windows Development, Soft Real-Time, Design, Measurement

1. Short version

  • On ordinary Windows (10/11), the goal is not hard real-time guarantees, but smaller latency and jitter and fewer deadline misses
  • The first thing to revisit is not the priority value, but what you put inside the periodic thread
  • Split periodic work into a fast path (must hit timing) and a slow path (some delay is OK)
  • In the fast path, avoid Sleep-based waiting, blocking I/O, per-iteration new/malloc, and unbounded queues
  • In real deployments, AC power, power mode, timer resolution, power throttling, and background load all matter
  • Don’t evaluate by averages alone. Look at p99 / p99.9 / max / miss count / queue depth / DPC / ISR / page faults

2. Scope

In scope | Out of scope
General-purpose PCs running Windows 10/11 | Dedicated RTOSes or commercial RT extensions
Normal user-mode apps (C++/C#) | Control logic that lives mostly in a kernel driver
Standard APIs and standard settings | Moving everything onto an FPGA or microcontroller

3. Basic terms

Term | Meaning
Latency | Work starting or finishing later than scheduled
Jitter | Variation in period or execution time. Not running at the same interval each time
Deadline miss | Work that does not finish by the deadline you set

4. Checklist: the big picture

Item | What to do | Typical anti-pattern
Waiting strategy | Drive on absolute deadlines. Use events or a high-resolution waitable timer | Periodic loops based on Sleep(1)
Splitting work | Separate the fast path from the slow path | Saving, sending, or UI work inside the fast path
Queue | Make it bounded and decide what happens on overflow | Unbounded queues that just postpone the problem
Inside the fast path | Move out allocations, heavy logging, blocking I/O, and heavy locks | new/malloc/synchronous I/O
Priority | Raise only the threads that need it. Use MMCSS for audio/video | Jumping straight to REALTIME_PRIORITY_CLASS
OS / power | Check AC power, power mode, and timer resolution | Evaluating on battery or in a power-saving mode

5. Each item in detail

5.1 Don’t leave the periodic loop to Sleep

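Sleep(1) does not mean “wait exactly 1 ms.” The actual wait depends on the system timer resolution and on when the scheduler runs the thread again, and it is usually longer than requested. In a periodic loop that error is added on every iteration, so the drift accumulates.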

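Switch to an absolute-deadline style. QpcNow, WaitUntil, and CpuRelax below are placeholder helpers rather than Windows APIs; for example, they could wrap QueryPerformanceCounter, a wait on a high-resolution waitable timer, and YieldProcessor: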

int64_t next = QpcNow() + periodTicks;

while (running)
{
    WaitUntil(next - wakeMarginTicks);
    while (QpcNow() < next) { CpuRelax(); }  // short spin only at the very end

    int64_t started = QpcNow();
    FastStep();  // no blocking, no alloc, no heavy lock
    int64_t finished = QpcNow();

    RecordTiming(next, started, finished);
    next += periodTicks;  // key point: do NOT use next = now + period

    while (finished > next)
    {
        ++missedDeadlines;  // record the slip
        next += periodTicks;
    }
}

Two points:

  • Don’t use next = now + period each time; advance with next += period so that the drift doesn’t accumulate
  • Decide in advance what to do when you do fall behind
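
The two points above can be checked with plain arithmetic, without any Windows API. The following is a minimal sketch: AdvanceDeadline, DeadlineUpdate, and DriftAfter are hypothetical names introduced here for illustration, and the simulation assumes every wake-up is late by the same fixed number of ticks, which a real system will not do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where the next deadline lands and how many
// periods were skipped to get there.
struct DeadlineUpdate {
    std::int64_t next;    // next absolute deadline, in ticks
    std::int64_t missed;  // periods skipped because the work finished late
};

// Absolute-deadline advance: step from the previous deadline, never from "now".
// If the work finished after one or more later deadlines, skip them and count.
inline DeadlineUpdate AdvanceDeadline(std::int64_t next, std::int64_t finished,
                                      std::int64_t periodTicks) {
    assert(periodTicks > 0);
    DeadlineUpdate r{next + periodTicks, 0};
    while (finished > r.next) {
        ++r.missed;
        r.next += periodTicks;
    }
    return r;
}

// Simulates `iterations` cycles in which every wake-up is `oversleepTicks`
// late, and returns how far the final deadline is from the ideal schedule
// (iterations * periodTicks).
inline std::int64_t DriftAfter(std::int64_t iterations, std::int64_t periodTicks,
                               std::int64_t oversleepTicks, bool relative) {
    std::int64_t next = periodTicks;  // first deadline
    for (std::int64_t i = 1; i < iterations; ++i) {
        const std::int64_t now = next + oversleepTicks;  // woke up late
        next = relative ? now + periodTicks              // next = now + period
                        : next + periodTicks;            // next += period
    }
    return next - iterations * periodTicks;
}
```

With a period of 1000 ticks and a constant 50-tick oversleep, the relative style ends 1000 iterations 49950 ticks behind the ideal schedule, while the absolute style ends exactly on it. AdvanceDeadline encodes one possible "fall behind" policy (skip the missed periods and count them); other policies, such as running the late iterations back to back, are equally valid choices.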

5.2 Split fast path and slow path

Device/timer -> [fast path: acquire / control / minimal copy] -> [bounded queue] -> [slow path: format / save / send / UI]

Only these belong in the fast path:

  • Data acquisition
  • Control value calculation
  • Minimal copying
  • Timestamping
  • Enqueueing
  • Recording misses and overruns

Everything else goes to the slow path.

5.3 Queues should be bounded

An unbounded queue just hides the latency.

Pick one of three policies for overflow:

  • Latest wins: drop older data
  • No loss allowed: error / stop / alert
  • Logging only: record the drop count
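
As one possible shape for such a queue, here is a minimal sketch of a fixed-capacity single-producer / single-consumer ring buffer. BoundedSpscQueue and its member names are hypothetical. It assumes exactly one fast-path thread pushes and exactly one slow-path thread pops, and it implements only the "reject the new item and record the drop count" behavior; a latest-wins policy needs a different design because the producer would have to discard items the consumer may be reading.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer / single-consumer ring buffer with a fixed capacity.
// Hypothetical type for illustration; T should be cheap to copy.
// The counters only ever grow; a 64-bit std::size_t is assumed so they do not wrap.
template <typename T, std::size_t Capacity>
class BoundedSpscQueue {
    static_assert(Capacity > 0, "capacity must be non-zero");

public:
    // Producer side (fast path). Never blocks, never allocates.
    // Returns false and counts a drop when the queue is full.
    bool TryPush(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) {
            drops_.fetch_add(1, std::memory_order_relaxed);
            return false;
        }
        slots_[tail % Capacity] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side (slow path). Returns false when the queue is empty.
    bool TryPop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[head % Capacity];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return true;
    }

    // Approximate depth for monitoring; it can be stale while both threads run.
    std::size_t Depth() const {
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        return tail - head;
    }

    std::uint64_t Drops() const { return drops_.load(std::memory_order_relaxed); }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};     // next index to read
    std::atomic<std::size_t> tail_{0};     // next index to write
    std::atomic<std::uint64_t> drops_{0};  // pushes rejected because the queue was full
};
```

Logging Depth() and Drops() alongside the timing data shows whether the slow path is keeping up, which an unbounded queue would hide.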

5.4 Don’t put heavy work in the fast path

Things to avoid:

  • File writes
  • Network sends
  • Database writes
  • Heavy log output
  • Per-iteration new / malloc / List<T>.Add
  • Per-iteration string concatenation or ToString()
  • Heavy locks
  • First-touch code that tends to trigger page faults

Watch out especially for:

  1. Allocation and deallocation: in the fast path, preallocate buffers and reuse them
  2. Blocking I/O: it may look fast on your dev machine, but it will jitter in production
  3. Page faults: touch the memory you need once at startup
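
Points 1 and 3 can be combined in a small pool that allocates everything at startup and touches each page once. This is only a sketch: BufferPool is a hypothetical type, the 4 KiB page size is an assumption, and the pool is not thread-safe (it is meant to be owned by the fast-path thread). Touching memory once does not pin it; on Windows, VirtualLock is one API to look at if pages must stay resident, which is outside this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Fixed set of equally sized buffers, allocated and touched once at startup.
// Hypothetical type; not thread-safe.
class BufferPool {
public:
    BufferPool(std::size_t bufferCount, std::size_t bufferBytes)
        : bufferCount_(bufferCount),
          bufferBytes_(bufferBytes),
          storage_(new std::uint8_t[bufferCount * bufferBytes]) {
        assert(bufferCount > 0 && bufferBytes > 0);
        // Write one byte per assumed 4 KiB page so the OS backs the memory now
        // rather than on first use inside the fast path.
        constexpr std::size_t kAssumedPageBytes = 4096;
        volatile std::uint8_t* touch = storage_.get();
        for (std::size_t i = 0; i < bufferCount * bufferBytes; i += kAssumedPageBytes) {
            touch[i] = 0;
        }
        // Reserve the free list up front so Release never allocates.
        freeList_.reserve(bufferCount);
        for (std::size_t i = bufferCount; i > 0; --i) {
            freeList_.push_back(i - 1);
        }
    }

    // Returns a buffer, or nullptr if the pool is exhausted. No allocation.
    std::uint8_t* Acquire() {
        if (freeList_.empty()) return nullptr;
        const std::size_t index = freeList_.back();
        freeList_.pop_back();
        return storage_.get() + index * bufferBytes_;
    }

    // Returns a buffer to the pool. push_back stays within the reserved capacity.
    void Release(std::uint8_t* buffer) {
        const std::size_t offset = static_cast<std::size_t>(buffer - storage_.get());
        assert(offset % bufferBytes_ == 0 && offset / bufferBytes_ < bufferCount_);
        freeList_.push_back(offset / bufferBytes_);
    }

    std::size_t Available() const { return freeList_.size(); }

private:
    std::size_t bufferCount_;
    std::size_t bufferBytes_;
    std::unique_ptr<std::uint8_t[]> storage_;
    std::vector<std::size_t> freeList_;  // indices of buffers not in use
};
```

An exhausted pool returns nullptr instead of allocating, so running out of buffers becomes a visible event that the overflow policy has to handle.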

5.5 Raise priority only for the threads that need it

Basic policy:

  • Keep UI and ordinary worker threads at normal priority
  • Raise only the fast-path thread, and only as needed
  • Consider background mode for save/send/log-aggregation threads
  • Think per thread, not per process

For continuous audio/video streams, consider MMCSS first:

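#include <windows.h>
#include <avrt.h>  // link with Avrt.lib

DWORD taskIndex = 0;
HANDLE hAvrt = AvSetMmThreadCharacteristicsW(L"Audio", &taskIndex);
if (hAvrt == nullptr) { /* registration failed: check GetLastError() and keep running unboosted */ }
// ... work ...
if (hAvrt != nullptr) { AvRevertMmThreadCharacteristics(hAvrt); }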

REALTIME_PRIORITY_CLASS has large side effects, so use it only on a dedicated machine, after thorough testing, and only when you really need it.

5.6 Power-settings checklist

  1. Run on AC power (on battery, results tend to stay unstable no matter how much you tune)
  2. Set the power mode toward “Best performance”
  3. Create a dedicated power plan if needed (Balanced for everyday use, the dedicated one only in production)
  4. Check minimum/maximum processor state (it’s worth trying 100%/100% on AC)
  5. Check process power throttling / EcoQoS
  6. Cut unnecessary background load (cloud sync, auto-update, resident monitors, and so on)
  7. Test minimized and hidden states too (on Windows 11, a raised timer resolution is not guaranteed while the app’s windows are minimized or fully hidden)
  8. Save BIOS/UEFI for last (C-states and quiet-mode settings vary a lot by hardware)

6. Measurement and evaluation

6.1 What to record

  • Scheduled start time of each period, actual start time, actual finish time
  • Lateness
  • Execution time
  • Missed deadline count / consecutive missed deadline count
  • Queue depth / drop count
  • DPC / ISR spikes
  • Page faults
  • Temperature / clock variation

6.2 How to read p99 / p99.9 / max

  • Average: the overall trend. Big delays are easily buried
  • p99: out of 1000 samples, the upper bound after dropping the 10 slowest
  • p99.9: out of 1000 samples, the upper bound after dropping the 1 slowest
  • max: the worst case

If you want to catch “usually fine, occasionally hangs,” p99 / p99.9 / max are essential.
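
The definitions above correspond to the nearest-rank method, which can be sketched as follows. Percentile is a hypothetical helper, and expressing the percentile in hundredths of a percent is a choice made here to keep the rank calculation in integers. It copies and reorders the samples, so it belongs in offline analysis or the slow path, not in the fast path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile. `permyriad` is the percentile in 1/100ths of a
// percent (9900 = p99, 9990 = p99.9, 10000 = max).
// Takes the samples by value because nth_element reorders them.
inline std::int64_t Percentile(std::vector<std::int64_t> samples,
                               std::uint32_t permyriad) {
    assert(!samples.empty() && permyriad > 0 && permyriad <= 10000);
    const std::size_t n = samples.size();
    // 1-based rank = ceil(permyriad / 10000 * n), computed in integers.
    const std::size_t rank =
        (static_cast<std::size_t>(permyriad) * n + 9999) / 10000;
    auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

For 1000 samples with the distinct values 1 to 1000, Percentile(samples, 9900) returns 990 (the 10 slowest dropped) and Percentile(samples, 9990) returns 999 (the single slowest dropped).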

6.3 Tools to use

  • In-app measurement: start by recording period, lateness, execution time, and queue depth yourself
  • ETW / WPR / WPA: look at CPU, context switches, DPC/ISR, page faults
  • LatencyMon: get a feel for driver-induced jitter
  • Temperature/clock monitoring: check the impact of thermal throttling

6.4 Test conditions

Test these conditions separately:

  • Right after startup (before warm-up) / after warm-up
  • Long continuous runs
  • UI in the foreground / UI minimized or hidden
  • AC power / battery
  • With network or disk load

7. Rough rules of thumb

Required level | Realistic approach
~10-20 ms range, occasional jitter is OK | Fast/slow split, bounded queue, normal priority, event-driven; that’s enough
~1-5 ms range, must keep up continuously | Allocation-free fast path, dedicated thread, MMCSS, high-resolution waitable timer, AC power
Sub-1 ms, sustained high load | Hard to do in user mode alone. Move the critical part elsewhere (FPGA, RTOS, etc.)
Lives alongside GUI / logging / networking / DB | Don’t cram it into one process and one loop. Separate responsibilities

8. Summary

The order to improve in practice:

  1. Is the periodic loop relying on Sleep?
  2. Are the fast path and slow path separated?
  3. Is the queue bounded, and is the overflow policy decided?
  4. Is there I/O, allocation, or a heavy lock inside the fast path?
  5. Are priority and MMCSS used only on the threads that need them?
  6. Are the power settings fixed, and are you measuring under the same conditions you will run in?

Even on ordinary Windows, if you align design, waiting strategy, power settings, and measurement, soft real-time becomes very practical.
