Reading Crash Dumps with WinDbg + SOS — A Practical Guide to Analysis After Collection
· Go Komura · WinDbg, SOS, Crash Dump, .NET, CSharp, Debugging, PDB, Bug Investigation, Technical Consulting
A previous article, “An Introduction to Collecting Windows Crash Dumps,” walked through collecting dumps using WER LocalDumps, ProcDump, and MiniDumpWriteDump. But a dump that has merely been collected tells you nothing on its own. It only becomes useful material for an investigation once you actually get your hands dirty and work out which thread crashed, why it crashed, or what is holding on to memory.
This article picks up where the collection article left off, focusing narrowly on actually reading a collected dump with WinDbg and the SOS extension. It covers installation and symbol configuration, loading the SOS extension that is essential for .NET applications, what to look at and how to judge it with representative commands like !clrstack and !dumpheap -stat, !analyze -v for native crashes, and when to reach for dotnet-dump analyze instead of WinDbg.
1. The short version first
- The main tool for dump analysis is WinDbg (the current version, formerly called WinDbg Preview). You can get it via winget install Microsoft.WinDbg or from the Microsoft Store, and it runs on x64/ARM64 for Windows 10 Anniversary Update (1607) and later, and Windows 11.[1]
- !clrstack, !dumpheap -stat, !gcroot, and similar commands work directly off CLR metadata and heap data, so they function even without symbols (PDBs) loaded. What you lose is managed source file names/line numbers and symbol names for native frames. If you want to trace things down to the source line, that’s a different story, and the standard practice is to point _NT_SYMBOL_PATH at both Microsoft’s public symbol server and your own PDB location.[2]
- For .NET (Framework / Core / 5+) application dumps, managed information only becomes visible once you load the SOS extension. You cannot trace C# code with the native k (stack display) command alone.[3]
- There are three representative investigation patterns: if it crashed on an exception, start with !clrstack → !pe; if memory keeps growing, start with !dumpheap -stat → !gcroot; for a native crash, start with !analyze -v.
- dotnet-dump analyze is an option if you want to skip WinDbg. Most SOS commands work as-is, but it cannot handle native stack frames. For managed-only investigations that don’t involve native DLLs or COM, it’s a lighter-weight option to set up.[4]
- The procedures described here are about how to read a dump, not how to collect one. For how dumps are collected (WER / ProcDump / MiniDumpWriteDump), see the collection article; for designing your crash-time setup so logs and dumps line up, see “Designing Log and Dump Retention for Windows App Crashes.”
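The three investigation patterns above can be written down as a small lookup table, which is handy if you keep triage notes or scripts next to your dumps. This is a minimal sketch; the symptom keys (managed_exception, memory_growth, native_crash) and the function name are hypothetical labels chosen for illustration, and only the command sequences come from this article.

```python
# Hypothetical triage table: symptom -> ordered first-pass debugger commands.
# The symptom keys are illustrative names, not WinDbg or SOS terminology.
FIRST_COMMANDS = {
    "managed_exception": ["!clrstack", "!pe"],
    "memory_growth": ["!dumpheap -stat", "!gcroot <address>"],
    "native_crash": ["!analyze -v"],
}


def first_commands(symptom: str) -> list[str]:
    """Return a copy of the commands to start with for a given symptom."""
    try:
        return list(FIRST_COMMANDS[symptom])
    except KeyError:
        raise ValueError(f"unknown symptom: {symptom!r}") from None


if __name__ == "__main__":
    for name in FIRST_COMMANDS:
        print(f"{name}: {' -> '.join(first_commands(name))}")
```

The `<address>` placeholder is left as-is on purpose: it has to come from a real !dumpheap listing in your own session.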
2. Installing WinDbg and configuring symbols
2.1 Installation
The current version of WinDbg can be installed from the Microsoft Store or with winget, using the following command.[1]
winget install Microsoft.WinDbg
Installing via the Microsoft Store gets you the same engine, with identical commands, extensions, and workflow. After installation it updates automatically (in the background for Store/direct installs, or via winget upgrade Microsoft.WinDbg for the winget install), so you won’t run into much behavioral variance across versions.[1]
2.2 Opening a dump
windbg -z C:\CrashDumps\MyApp\MyApp_20260702_101500.dmp
-z is the option that specifies a dump file to open at startup. You can do the same thing from the GUI via “File > Open Dump File.”
2.3 Configuring the symbol path
Where the Windows debugger looks for symbol files (PDBs) is controlled by the _NT_SYMBOL_PATH environment variable, or the .sympath command within a session.[2] In practice, the standard approach is to point it at both Microsoft’s public symbol server and your own PDB location.
.symfix C:\Symbols\Microsoft
.sympath+ C:\Symbols\MyApp
.reload
- .symfix is a shortcut that sets the path to Microsoft’s public symbol server (https://msdl.microsoft.com/download/symbols), together with a specified local cache. Symbols for standard OS DLLs are automatically downloaded from here.[5]
- .sympath+ appends your own PDB location to the existing path. You need to provide PDBs for your own code yourself; they are not hosted on Microsoft’s symbol server.
- .reload reloads symbol information, after which you can check the symbol status in the module list.
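To configure this through an environment variable instead of per session, use the following form. Note that set only affects the current command prompt; to make the value persistent, use setx or the system environment variable settings. This approach is better suited to automated analysis on CI or build servers.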
set _NT_SYMBOL_PATH=srv*C:\Symbols\Microsoft*https://msdl.microsoft.com/download/symbols;C:\Symbols\MyApp
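If you generate this value in a build or CI script, it helps to assemble it from its parts rather than hand-edit the string. The sketch below is one way to do that; the function name build_symbol_path and its parameters are assumptions made for illustration, while the srv*cache*server syntax and the server URL are the ones shown above.

```python
# Sketch: assemble an _NT_SYMBOL_PATH value from its parts.
# build_symbol_path and its parameter names are hypothetical helpers.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir: str, pdb_dirs: list[str],
                      server: str = MS_SYMBOL_SERVER) -> str:
    """Return 'srv*<cache>*<server>' followed by extra PDB directories."""
    parts = [f"srv*{cache_dir}*{server}"]
    # Skip empty entries so a missing optional directory does not
    # produce a stray ';;' in the resulting path.
    parts.extend(d for d in pdb_dirs if d)
    return ";".join(parts)


if __name__ == "__main__":
    print(build_symbol_path(r"C:\Symbols\Microsoft", [r"C:\Symbols\MyApp"]))
```

With the directories used in this article, the function reproduces the value from the set command above.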
To confirm symbols are being read correctly, list loaded modules with the lm (loaded modules) command and check whether the target module shows pdb symbols. If it’s still deferred, the symbols haven’t been resolved yet.
3. Loading the SOS extension
For .NET application dumps, native WinDbg commands alone cannot show you the contents of the managed heap, C# stack frames, or the contents of exception objects. This gap is filled by the SOS (Son of Strike) extension. It provides heap inspection, heap corruption detection, display of internal runtime data types, and insight into the state of running managed code, all via SOS commands.[3]
3.1 Differences by runtime
Which runtime you need to load, and where SOS comes from, depends on whether the target application is .NET Framework or .NET (Core) / .NET 5+.
| Target | Runtime module | Load command |
|---|---|---|
| .NET Framework | clr.dll | .loadby sos clr |
| .NET Core / .NET 5+ | coreclr.dll | .loadby sos coreclr |
.loadby is a command that looks in the same directory as a specified module (clr or coreclr) for the extension DLL (sos.dll) and loads it from there. Its advantage is that you get the SOS build that sits next to that runtime, without having to type a full path.[6] Because the lookup uses the module’s path as seen from the machine running the debugger, it only succeeds when sos.dll is actually present in that directory on the analysis machine.
In WinDbg/cdb version 10.0.18317.1001 and later, if the debugger detects that the target process has loaded coreclr.dll (or libcoreclr.so in the case of a Linux dump), it automatically loads the .NET extension from the Microsoft Extension Gallery.[6] The .loadby command above is meant for cases where automatic loading doesn’t kick in, or when you’re using an older version of the debugger.
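The runtime-to-command mapping in the table can be expressed as a small helper, for example when a script inspects the module list of a dump before deciding what to run. This is only a sketch: the function name sos_load_command is hypothetical, and it assumes you already have the loaded module names (as lm would list them) as plain strings.

```python
# Sketch: pick the .loadby command from the runtime module found in a dump.
# sos_load_command is a hypothetical helper; module names are assumed to be
# given as plain strings such as "clr", "coreclr", or "coreclr.dll".
from typing import Iterable, Optional


def _base_name(module: str) -> str:
    """Lower-case a module name and drop a trailing '.dll' if present."""
    name = module.strip().lower()
    return name[:-4] if name.endswith(".dll") else name


def sos_load_command(loaded_modules: Iterable[str]) -> Optional[str]:
    """Return the .loadby command for the detected runtime, or None."""
    names = {_base_name(m) for m in loaded_modules}
    if "coreclr" in names:
        return ".loadby sos coreclr"   # .NET Core / .NET 5+
    if "clr" in names:
        return ".loadby sos clr"       # .NET Framework
    return None                        # no managed runtime module found


if __name__ == "__main__":
    print(sos_load_command(["ntdll", "kernel32", "coreclr.dll"]))
```

A None result is a useful signal in itself: if neither runtime module is loaded, the dump is probably not from a managed process, or it was captured before the runtime started.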
3.2 When SOS can’t be found
In environments where automatic loading doesn’t work, you can install it locally with the dotnet-sos tool.
dotnet tool install --global dotnet-sos
dotnet-sos install
After installation, you can also load it manually within WinDbg as follows (this may be necessary with older debuggers).[7]
.load %USERPROFILE%\.dotnet\sos\sos.dll
3.3 Confirming it loaded
!sos.help
Alternatively, run !threads, or !sosstatus if the SOS build you loaded provides it. If the command returns information instead of an error, the load succeeded. If a command fails here with an error like Unable to find module, the cause is almost always a symbol path or runtime mismatch (e.g., a bitness or version mismatch between the environment the dump was captured in and the local runtime). It’s not unusual in practice to get stuck right here, before even reaching the commands in the next chapter.
4. Reading exceptions and stacks — !clrstack and !pe
This is the first move for a dump that crashed on an unhandled exception.
!threads
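First, use !Threads (aliased as clrthreads in lldb environments) to list managed threads and check the Exception column for each one.[8] If a thread has an exception, switch to it with ~<n>s, where <n> is the debugger thread number shown in the leftmost column of the !threads output (5 in the example below).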
~5s
!clrstack
!CLRStack displays a stack trace of managed code only.[9] If you want to see arguments and variables too, add -a (a shortcut combining -l and -p).
!clrstack -a
- If your own code’s methods show up, you can directly read off where and through what call path it crashed. Whether source file names and line numbers appear depends on symbols being loaded correctly (Chapter 2).
- CLRStack enumerates managed frames directly from CLR metadata, so whether symbols are present has no bearing on whether frames are displayed. If symbols are missing, all you lose is the source file name and line number; frames themselves are never omitted.[9]
- If none of your own code’s frames appear at all, suspect something other than missing symbols. Possible causes include: you’ve selected a different thread that doesn’t have the exception (a thread-selection mistake), the crash happened purely on the native side so managed frames simply don’t exist, or the dump type (e.g., Mini) doesn’t include enough stack information at that point in time.
Next, look at the exception object itself.
!pe
!PrintException (abbreviated !pe) displays the last exception thrown on the current thread if no address is specified. You get the type name, message, inner exceptions (shown with -nested), and even the stack trace string.[10] For exceptions like System.NullReferenceException, where the type name alone tells you nothing, you’ll need to cross-reference it against the local variable values visible via !clrstack -a.
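When you save debugger output to a log file, the headline fields of a !pe report are easy to pull out for a ticket or incident summary. The following sketch assumes the "Label: value" layout that !pe typically prints; the SAMPLE text is a hypothetical, shortened illustration and not output from a real dump, and parse_print_exception is a helper name invented here.

```python
# Sketch: extract headline fields from saved !pe output.
# SAMPLE is hypothetical, shortened text used only to exercise the parser.
SAMPLE = """\
Exception object: 000001a2b3c4d5e0
Exception type:   System.NullReferenceException
Message:          Object reference not set to an instance of an object.
InnerException:   <none>
StackTrace (generated):
HResult: 80004003
"""

WANTED = ("Exception type", "Message", "InnerException", "HResult")


def parse_print_exception(text: str) -> dict[str, str]:
    """Return the first occurrence of each wanted 'Label: value' field."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence so nested exceptions printed
        # later in the output do not overwrite the outer one.
        if sep and key in WANTED and key not in fields:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_print_exception(SAMPLE).items():
        print(f"{key}: {value}")
```

Real output can differ between SOS versions, so treat the field labels as something to verify against your own logs before relying on the parser.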
5. Tracking the heap and leaks — !dumpheap -stat and !gcroot
These commands are central to investigations of the “memory creeps up slowly and it crashes hours or days later” variety. The preliminary work of distinguishing GC backlog from a genuine leak was covered in detail in “Telling GC Backlog Apart from a Memory Leak in .NET.” This article picks up as a continuation of that, going into the part where you read a single dump and dig into what’s actually holding on to memory.
!dumpheap -stat
The -stat option displays only a statistical summary of the managed heap: one row per type with its instance count and total size. The rows are sorted by total size in ascending order, so the types that dominate by sheer volume appear at the bottom of the output.[11] There are two patterns commonly seen in practice:
- A domain class itself is growing (e.g., hundreds of thousands of MyApp.Models.Customer instances): there’s a strong reference somewhere holding them alive
- Only System.String or arrays are conspicuously numerous: this is often the internal data of one of the largest domain classes in the summary, so it’s usually faster to suspect the domain class first rather than examining individual instances right away
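For large heaps the summary can run to thousands of rows, so it can be convenient to save the output to a text file and rank the rows with a script. The sketch below assumes the usual four-column layout (MT, Count, TotalSize, Class Name); the SAMPLE rows and their numbers are hypothetical, and parse_dumpheap_stat and top_types are helper names made up for this example.

```python
# Sketch: rank types from saved "!dumpheap -stat" output by total size.
# SAMPLE is hypothetical data; the column layout is an assumption.
import re
from typing import NamedTuple

SAMPLE = """\
              MT    Count    TotalSize Class Name
00007ff9a1b2c3d0      120         2880 System.Object
00007ff9a1b2e4f8    51234      3279024 System.String
00007ff9a1c00a10   300000     14400000 MyApp.Models.Customer
Total 351354 objects
"""


class TypeStat(NamedTuple):
    method_table: str
    count: int
    total_size: int
    class_name: str


# A data row starts with a hex method-table address, followed by two
# numbers (possibly with thousands separators) and the type name.
_ROW = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+([\d,]+)\s+([\d,]+)\s+(\S.*?)\s*$")


def parse_dumpheap_stat(text: str) -> list[TypeStat]:
    """Parse data rows, ignoring the header and the trailing total line."""
    rows = []
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            mt, count, size, name = match.groups()
            rows.append(TypeStat(mt, int(count.replace(",", "")),
                                 int(size.replace(",", "")), name))
    return rows


def top_types(text: str, n: int = 5) -> list[TypeStat]:
    """Return the n types with the largest total size, biggest first."""
    rows = parse_dumpheap_stat(text)
    return sorted(rows, key=lambda r: r.total_size, reverse=True)[:n]


if __name__ == "__main__":
    for row in top_types(SAMPLE, 2):
        print(f"{row.total_size:>12} {row.count:>8} {row.class_name}")
```

Running the same script on two dumps taken some hours apart makes it easier to see which type is actually growing, instead of which one merely happens to be large.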
Once you’ve narrowed down the target, grab the addresses of individual instances.
!dumpheap -type MyApp.Models.Customer
Then investigate why that object hasn’t been collected by the GC.
!gcroot 000001a2b3c4d5e0
!GCRoot searches the entire managed heap and handle table, and enumerates the roots (stack variables, static fields, GC handles, etc.) that reach the specified object.[12] If the root chain runs through a static field used for caching, suspect a missing eviction path; if it runs through an event handler subscription, suspect a missing unsubscribe. The following kind of “left in a cache with no way to release it” code is a classic example:
using System.Collections.Generic;

public static class CustomerCache
{
    // A static dictionary with no eviction path: it only ever grows,
    // and every Customer it holds stays reachable from a GC root.
    private static readonly Dictionary<int, Customer> _cache = new();

    public static void Add(Customer c) => _cache[c.Id] = c;
}
If a static container like CustomerCache shows up in the !gcroot output, that’s material for considering an expiration policy, a size cap, or switching to a WeakReference on the code side.[11]
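To make the "size cap" option concrete, here is a language-neutral sketch of a cache that evicts its least recently used entry once a limit is reached. It is written in Python purely for brevity and is not a drop-in replacement for the C# class above; BoundedCache and max_entries are hypothetical names, and a real .NET fix would use whatever caching facility fits your codebase.

```python
# Sketch: a size-capped cache with least-recently-used eviction.
# BoundedCache is a hypothetical illustration of the "size cap" idea.
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max = max_entries
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def add(self, key: Hashable, value: Any) -> None:
        """Insert or update an entry, evicting the oldest when over the cap."""
        self._items[key] = value
        self._items.move_to_end(key)          # mark as most recently used
        while len(self._items) > self._max:
            self._items.popitem(last=False)   # drop least recently used

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value (refreshing its recency) or None."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = BoundedCache(max_entries=2)
    cache.add(1, "a")
    cache.add(2, "b")
    cache.get(1)        # touch key 1 so key 2 becomes the oldest
    cache.add(3, "c")   # exceeds the cap and evicts key 2
    print(len(cache), cache.get(2))
```

The point of the cap is that the cache can no longer keep an unbounded number of objects reachable, which is exactly the condition that !gcroot exposed.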
6. Automated analysis of native crashes — !analyze -v
For native crashes (access violations, etc.) involving C++ DLLs, COM, or vendor SDKs, this is the command to start with.
!analyze -v
!analyze is an extension command that performs automated crash/exception analysis, and -v gives verbose output.[13] Three groups of fields are particularly worth focusing on:
- EXCEPTION_CODE / BUGCHECK_STR: what kind of fault occurred (access violation, stack overflow, etc.)
- FAULTING_IP / FOLLOWUP_IP: the address of the instruction that actually crashed, and the corresponding module/function name
- MODULE_NAME / IMAGE_NAME: whether the crash location is your own module or a third-party DLL
If the crash is inside a vendor DLL rather than your own module, you’ll need the vendor’s PDB to trace further (which is usually unobtainable). In practice, the realistic landing point is to trace back to the caller (the arguments your own code last passed) and check whether those values were sane. Check the STACK_TEXT field of !analyze -v to see what your code called just before the crash.
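The "own module or vendor DLL" decision can be automated once !analyze -v output is saved to a log. The sketch below reads only single-line "KEY: value" fields; the SAMPLE text is hypothetical and shortened, VendorSdk.dll is an invented module name, and the helper functions are assumptions for illustration rather than part of any debugger API.

```python
# Sketch: read key fields from saved "!analyze -v" output and decide
# whether the faulting image is one of your own modules.
# SAMPLE and the module names are hypothetical.
import re

SAMPLE = """\
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - access violation (description shortened)
MODULE_NAME: VendorSdk
IMAGE_NAME:  VendorSdk.dll
"""

_FIELDS = ("EXCEPTION_CODE", "MODULE_NAME", "IMAGE_NAME")


def parse_analyze_fields(text: str) -> dict[str, str]:
    """Collect the first occurrence of each single-line field of interest."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in _FIELDS and key not in fields:
            fields[key] = value.strip()
    return fields


def exception_code(fields: dict[str, str]):
    """Return the hex code from EXCEPTION_CODE (e.g. '0xc0000005') or None."""
    match = re.search(r"0x[0-9a-fA-F]+", fields.get("EXCEPTION_CODE", ""))
    return match.group(0).lower() if match else None


def is_own_module(fields: dict[str, str], own_images: set[str]) -> bool:
    """True when IMAGE_NAME matches one of your image names, ignoring case."""
    image = fields.get("IMAGE_NAME", "").lower()
    return image in {name.lower() for name in own_images}


if __name__ == "__main__":
    parsed = parse_analyze_fields(SAMPLE)
    print(exception_code(parsed), is_own_module(parsed, {"MyApp.exe"}))
```

When is_own_module returns False, that is the cue to shift attention to the caller side, as described above, rather than waiting for vendor symbols.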
!analyze can also be used on dumps other than exception dumps. If you suspect a hang, select the thread in question and run the following, which analyzes blocking relationships between threads.
!analyze -hang
7. An option that skips WinDbg — dotnet-dump analyze
If you only need to examine managed code on .NET Core / .NET 5+ (with no native DLLs or COM involved), dotnet-dump is also an option — one that’s lighter than WinDbg.
dotnet tool install --global dotnet-dump
dotnet-dump analyze C:\CrashDumps\MyApp\MyApp_20260702_101500.dmp
The analyze subcommand opens an interactive session with SOS pre-installed, letting you use most of the commands introduced so far (clrstack, dumpheap, gcroot, and others) directly, without the ! prefix.[4]
Here’s a rough guide for choosing between the two:
| Aspect | WinDbg + SOS | dotnet-dump analyze |
|---|---|---|
| Native stack frames | Visible | Not visible (managed only)[4] |
| Automated native analysis via !analyze -v | Available | Not available |
| Linux dumps | Can be analyzed with WinDbg on Windows (use the x64 build for x64 and Arm64 dumps, and the x86 build for x86 dumps) | Supported (use the tool matching the same platform’s bitness)[14] |
| macOS dumps | Not supported (WinDbg’s Linux dump support does not include macOS) | Supported (.NET 5 and later)[4] |
| Ease of setup | Installer or winget | One dotnet global tool command |
| Integrating into cross-platform CI | Takes some effort | Easier |
The practical dividing line in production: use WinDbg when “COM, P/Invoke, or native DLLs might be involved,” and use dotnet-dump analyze when it’s “a pure managed-code memory leak investigation that also needs to run in CI or across multiple platforms.” Since both share the same SOS command set, whatever you learn in one carries over almost directly to the other. Note that if you’re handling a dump captured on macOS, WinDbg isn’t even an option, so dotnet-dump (or LLDB) is your only choice.
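The dividing line described here can be condensed into a small decision helper, which is sometimes useful in runbooks or CI scripts that route dumps to the right tool. This is a sketch of the article’s rule of thumb only; choose_tool, the dump_os values, and the returned labels are hypothetical names chosen for this example.

```python
# Sketch: route a dump to a tool following the rule of thumb above.
# choose_tool and its string labels are hypothetical.
def choose_tool(dump_os: str, needs_native: bool) -> str:
    """Pick a tool from where the dump was captured and what must be seen.

    dump_os: "windows", "linux", or "macos" (where the dump was captured).
    needs_native: True if COM, P/Invoke, or native DLLs may be involved.
    """
    os_name = dump_os.strip().lower()
    if os_name not in {"windows", "linux", "macos"}:
        raise ValueError(f"unsupported dump_os: {dump_os!r}")
    if os_name == "macos":
        # WinDbg cannot open macOS dumps, so only dotnet-dump or LLDB remain.
        return "LLDB" if needs_native else "dotnet-dump analyze"
    if needs_native:
        return "WinDbg + SOS"
    return "dotnet-dump analyze"


if __name__ == "__main__":
    print(choose_tool("windows", needs_native=True))
    print(choose_tool("linux", needs_native=False))
```

The helper deliberately returns a single recommendation; in practice both tools can open many of the same dumps, so treat the result as a starting point.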
8. Without readable symbols, nothing gets off the ground
Some of the procedures above work reasonably well even without symbols (PDBs) loaded correctly. As noted in Chapter 4, !clrstack, !dumpheap -stat, and !gcroot read CLR metadata and heap data directly, so frames and type information themselves are displayed even without a PDB. What’s lost when a PDB is missing is source file names and line numbers for managed code, and symbol names for native frames and native modules (addresses show up instead of function names). In situations where all you have is the dump and the executable, and all you want at first is to grasp what happened, it’s perfectly fine to start with !threads → !clrstack before getting stuck hunting for PDBs. If you want to trace down to the source line to pin down the root cause, that’s a different matter. In Chapter 2 we added the path to your own PDBs via .sympath+, but in practice investigations often stall before reaching a source line simply because nobody knows where the PDB matching a distributed EXE/DLL lives, and this happens more often than the collection article’s discussion might suggest.
What a PDB actually contains, Portable PDBs, and Source Link (a mechanism that embeds source-control metadata into an assembly so a debugger can fetch the exact source at the commit the build came from) are all covered in one place in “What Is a PDB?”.[15] If you’re building dump analysis into ongoing operations, retaining a PDB for every build and enabling Source Link is preparation just as important as the dump collection setup itself. Skip this, and !clrstack output will never show a source line, leaving you to grope around with nothing but addresses and type names.
9. A case study — dump analysis in a handle-leak investigation
A previous article, “Investigating a Long-Running Industrial Camera Crash — the Handle Leak Case,” covered an investigation into an industrial camera control application that crashed abruptly after long hours of operation, where the culprit turned out to be a handle leak rather than a memory leak. Dumps prove effective in this kind of investigation when the following combination holds:
- Use !dumpheap -stat to confirm the managed heap side is healthy (per-type counts and sizes aren’t growing without bound)
- If the process’s handle count is still growing anyway, you can conclude the leak is not managed objects but OS handles (files, events, handles internally allocated by a camera SDK, and so on)
- If a managed wrapper object holding a SafeHandle is still alive, trace its GC root with !gcroot to identify the place where a reference is being kept that should have been released
In other words, !dumpheap -stat functions as the branch point for judging whether it’s a managed-heap growth issue, and once you’ve established that it’s not, the focus of the investigation shifts to a native-boundary anomaly-detection tool like Application Verifier. How to build that kind of abnormal-condition test foundation is covered in “Building a Windows Abnormal-Condition Test Foundation with Application Verifier.” Dump analysis addresses “the state things are in right now,” while Application Verifier addresses “reproducing an anomaly ahead of time” — using both together is standard practice for troubleshooting long-running failures.
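The branch point can be sketched as a comparison between two observations of the same process: per-type total sizes taken from two !dumpheap -stat summaries, plus the process handle count at the same two moments. Everything below is an illustrative assumption, including the function names, the input dictionaries, and the thresholds (1,000,000 bytes and 1,000 handles), which you would tune for your own application.

```python
# Sketch: decide which way an investigation should branch, based on two
# snapshots. Names and thresholds are hypothetical and need tuning.
def grown_types(before: dict[str, int], after: dict[str, int],
                min_bytes: int) -> dict[str, int]:
    """Return types whose total size grew by at least min_bytes."""
    growth = {}
    for name, size in after.items():
        delta = size - before.get(name, 0)
        if delta >= min_bytes:
            growth[name] = delta
    return growth


def classify(before: dict[str, int], after: dict[str, int],
             handles_before: int, handles_after: int,
             min_bytes: int = 1_000_000, min_handles: int = 1_000) -> str:
    """Classify the growth as managed heap, OS handles, or neither."""
    if grown_types(before, after, min_bytes):
        return "managed-heap growth"      # continue with !gcroot
    if handles_after - handles_before >= min_handles:
        return "os-handle leak"           # look at the native boundary
    return "no clear growth"


if __name__ == "__main__":
    heap_t0 = {"MyApp.Models.Customer": 480_000, "System.String": 3_200_000}
    heap_t1 = {"MyApp.Models.Customer": 482_000, "System.String": 3_250_000}
    print(classify(heap_t0, heap_t1, handles_before=1_200, handles_after=48_000))
```

In the example data the managed heap is essentially flat while the handle count climbs, which corresponds to the handle-leak situation this chapter describes.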
10. Summary
Reading a crash dump takes longer to master than collecting one, but there aren’t that many distinct patterns.
- Install WinDbg, and point the symbol path at both Microsoft’s public symbol server and your own PDBs (Chapter 2)
- Load the SOS extension for .NET applications (.loadby sos clr / .loadby sos coreclr, or rely on automatic loading; Chapter 3)
- Start with !clrstack → !pe for an exception, !dumpheap -stat → !gcroot for growing memory, and !analyze -v for a native crash (Chapters 4–6)
- If your task is purely a managed-code investigation, consider the lighter-weight dotnet-dump analyze (Chapter 7)
And underlying all of this is PDB and symbol management. Deciding, at the same time you set up dump collection, to retain a PDB for every build and enable Source Link makes a huge difference to how long an investigation takes once an incident actually happens. If in-house analysis is difficult or you don’t have the time, feel free to send us the dumps and logs and we can analyze them for you.
Related articles
- An Introduction to Collecting Windows Crash Dumps — WER/ProcDump/WinDbg
- Designing Log and Dump Retention for Windows App Crashes
- Telling GC Backlog Apart from a Memory Leak in .NET
- What Is a PDB (Program Database)?
- Investigating a Long-Running Industrial Camera Crash — the Handle Leak Case
- Building a Windows Abnormal-Condition Test Foundation with Application Verifier
Related consulting areas
Komura Soft LLC handles root-cause investigations of defects that combine crash dumps and logs, isolating failures that only occur after long-running operation, and design consultations for the dump/PDB retention and analysis process itself.
References
1. Microsoft Learn, Install the Windows debugger. Covers installing WinDbg via winget or the Microsoft Store, supported OS versions (Windows 10 1607 and later, Windows 11) and architectures (x64, ARM64), and automatic update behavior.
2. Microsoft Learn, Symbol path for Windows debuggers. Covers configuring the symbol path via the _NT_SYMBOL_PATH environment variable, and setting a default path to the public symbol server with the .symfix command.
3. Microsoft Learn, SOS debugging extension. Covers how the SOS extension can be used to gather managed heap information, detect heap corruption, and display internal runtime data types, and that the syntax in WinDbg is ![command].
4. Microsoft Learn, Dump collection and analysis utility (dotnet-dump). Covers that dotnet-dump analyze provides an interactive session where SOS commands work directly, that it cannot display native stack frames since it isn’t a native debugger, and that macOS support requires .NET 5 or later.
5. Microsoft Learn, Microsoft public symbol server. Covers the srv*DownstreamStore*https://msdl.microsoft.com/download/symbols symbol path syntax, and configuring it with a local cache via .symfix.
6. Microsoft Learn, Debugging Managed Code Using the Windows Debugger. Covers that the .NET Framework runtime is clr.dll and the .NET Core/.NET 5+ runtime is coreclr.dll, loading extensions from a nearby directory via .loadby, and automatic loading in WinDbg 10.0.18317.1001 and later.
7. Microsoft Learn, SOS installer (dotnet-sos). Covers installing the SOS extension locally via dotnet-sos install, and the manual load command needed with older debugger versions.
8. Microsoft Learn, SOS debugging extension - Commands. Covers that the Threads command (aliased clrthreads in lldb environments) lists each thread’s ID, domain, last exception thrown, and more.
9. Microsoft Learn, SOS debugging extension - Commands. Covers that the CLRStack command displays a stack trace of managed code only, that the -a option displays both local variables and arguments, and that symbols (SYMOPT_LOAD_LINES) only affect whether source file names/line numbers are shown, not whether frames themselves are displayed.
10. Microsoft Learn, SOS debugging extension - Commands. Covers that the PrintException (pe) command displays the last exception thrown on the current thread when no address is given, and that -nested also displays nested exceptions.
11. Microsoft Learn, Debug a memory leak in .NET and Dump collection and analysis utility (dotnet-dump) - Analyze memory leaks and allocations. Covers dumpheap -stat’s per-type count/total-size statistics display, and how to proceed with an investigation from there.
12. Microsoft Learn, SOS debugging extension - Commands. Covers that the GCRoot command searches the entire managed heap and handle table and enumerates references (roots) to the specified object.
13. Microsoft Learn, Using the !analyze Extension and !analyze (WinDbg). Covers automated crash/exception analysis via !analyze -v, the meaning of output fields such as FAULTING_IP and MODULE_NAME, and !analyze -hang for hang investigations.
14. Microsoft Learn, Debug Linux dumps. Covers that Linux dumps can be analyzed on Windows using WinDbg or dotnet-dump, and that you need to use a tool version matching the bitness (x64/Arm64/x86) of the capture environment.
15. Microsoft Learn, Source Link. Covers the mechanism by which source-control metadata is embedded into an assembly at NuGet package creation time, allowing a debugger to access the exact source code as of the build directly.
Go Komura
Representative of KomuraSoft LLC
Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.