Checklist for Unexpected Exceptions - A Quick Decision Table for Whether to Exit or Keep Running
The bottom line first
When an unexpected exception happens, the question to ask is not “can it be caught?” but “can the app’s state still be trusted afterward?”
- Swallowing the exception with `catch (Exception)` and continuing is almost always dangerous
- Continuing is acceptable only when three conditions line up: the failure unit can be discarded, shared state can be rolled back, and external side effects can be explained
- Lean toward exiting if there is inconsistent shared state, a broken parent loop, an anomaly at a native boundary, or a smell of memory corruption
- Do not assume continuation for exceptions that put the whole process’s health in doubt, such as `StackOverflowException` or `AccessViolationException`
Three levels of choice
“Keep the app running” actually has three distinct levels.
| Choice | Meaning |
|---|---|
| Fail just this operation and continue | Keep the screen open but treat this save or import as failed |
| Stop just the subsystem and continue | Reinitialize only the connection, screen, or worker |
| Terminate the process | The blast radius of the state damage cannot be determined, so restart |
Decision table
| Situation | Choice | Reason |
|---|---|---|
| Only one operation, screen, or job failed and the state can be discarded | Lean toward continue | The failure unit can be contained |
| The target objects or connections can be disposed and recreated after the exception | Lean toward subsystem reinit | The damaged area can be localized |
| Shared state was partially updated and the scope of the change is unclear | Lean toward exit | Invariants may be broken |
| External side effects on DB, files, or devices were left half-done | Lean toward exit | External consistency cannot be reasoned about |
| A monitoring loop or parent loop died from an unexpected exception | Lean toward exit | When only some features die, zombification is likely |
| Failure during startup, configuration loading, or DI composition | Fail startup and exit | Booting halfway is more dangerous |
| AccessViolationException, StackOverflowException, severe OutOfMemoryException | Lean toward immediate exit | The whole process’s health is in doubt |
| The dangerous work is isolated in a separate process and the parent is intact | Continue the parent, restart the child | The failure domain is properly separated |
Decision flowchart
    Unexpected exception occurs
      -> Smell of memory corruption / stack exhaustion / resource exhaustion?
         -> YES -> Exit / FailFast / restart
         -> NO  -> Can the failure unit be discarded?
            -> NO  -> Lean toward exit
            -> YES -> Can shared state be rolled back / reinitialized?
               -> NO  -> Stop the subsystem or exit
               -> YES -> Can external side effects be explained?
                  -> NO  -> Stop the subsystem or exit
                  -> YES -> Mark just this operation as failed and continue
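The flowchart can also be written down as a small pure function, which makes the order of the questions explicit and testable. This is only a sketch: `FailureContext`, `Decision`, and `ExceptionTriage` are hypothetical names invented for illustration, and in a real app the four booleans would have to come from your own knowledge of where the exception occurred and how far the operation got.

```csharp
using System;

// Hypothetical types for illustration only; none of these come from a library.
public enum Decision { ContinueFailOperation, StopSubsystemOrExit, Exit }

public sealed record FailureContext(
    bool ProcessHealthInDoubt,     // memory corruption, stack or resource exhaustion
    bool FailureUnitDiscardable,   // one operation, screen, or job can be thrown away
    bool SharedStateRecoverable,   // rollback or reinitialization is possible
    bool SideEffectsExplainable);  // sent / not sent / safe to resend is known

public static class ExceptionTriage
{
    // Asks the questions in the same order as the flowchart and stops
    // at the first answer that rules out continuing.
    public static Decision Decide(FailureContext c)
    {
        if (c.ProcessHealthInDoubt) return Decision.Exit;
        if (!c.FailureUnitDiscardable) return Decision.Exit;
        if (!c.SharedStateRecoverable) return Decision.StopSubsystemOrExit;
        if (!c.SideEffectsExplainable) return Decision.StopSubsystemOrExit;
        return Decision.ContinueFailOperation;
    }
}
```

The function never inspects the exception type. That is deliberate: the inputs describe the state of the app, which is the thing the decision actually depends on.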
Look at this before the exception type
- Where it happened: A UI event? A parent loop? Startup code? A native boundary?
- How far it got: Has memory state, the DB, files, or device state changed midway?
- The blast radius: Just that object, the whole screen, or the whole process?
- Whether rollback is possible: Can it be discarded and rebuilt, or rolled back via a transaction?
- External side effects: Was it sent or not, and is a duplicate execution safe?
- Monitoring and restart: Is there an automatic restart or recovery path after the crash?
High-risk exceptions
| Exception | Choice | Reason |
|---|---|---|
| StackOverflowException | Lean toward immediate exit | The call stack is exhausted; the runtime normally terminates the process before a `catch` can run |
| AccessViolationException | Lean toward immediate exit | Possible memory corruption |
| OutOfMemoryException | Lean toward exit | Recovery code that needs more allocations is unreliable |
| NullReferenceException / InvalidOperationException (unexpected) | Depends on context, but lean toward exit | Your assumptions are broken; partial changes may remain |
| Unexpected exception that escaped the parent loop | Lean toward exit | Risk that the core died but the process lingers |
| Anomalies at COM / P/Invoke boundaries | Immediate exit, or at least a strong lean toward exit | Hard to judge safety from managed code alone |
Decisions by scenario
UI events (button click, screen navigation)
Conditions that favor continuing:
- The operation failed before loading, and business state has not been touched
- Only transient state inside a dialog is broken, and closing the screen discards it
- The ViewModel or connection can be rebuilt after the exception
Conditions that favor exiting:
- Both screen state and domain state were partially updated
- Shared state such as a static field, a singleton, or a cache was touched
Per-item job processing (messages, files, requests)
This is the easiest place to continue. Mark just that one item as failed and move on. However, you still need a transaction or compensation for partial changes.
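A minimal sketch of that per-item boundary follows. `BatchRunner` is a hypothetical name, and treating `FormatException` and `IOException` as the "expected" failures is an assumption made only for this example. It also assumes `processItem` is itself transactional or compensated, so a failed item leaves no partial changes behind.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of any library.
public static class BatchRunner
{
    // Runs every item independently. Expected, item-local failures are
    // recorded and the loop moves on. Anything else is NOT caught here
    // and propagates to the caller, who decides whether to exit.
    public static List<(string Item, string Error)> ProcessAll(
        IEnumerable<string> items, Action<string> processItem)
    {
        var failures = new List<(string Item, string Error)>();
        foreach (var item in items)
        {
            try
            {
                processItem(item);
            }
            // Assumption: only these two types count as expected failures.
            catch (Exception ex) when (ex is FormatException or IOException)
            {
                failures.Add((item, ex.Message));
            }
        }
        return failures;
    }
}
```

The exception filter keeps the catch narrow, so an unexpected `NullReferenceException` in one item is not silently converted into "item failed, carry on".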
Resident loops, monitors, queue workers
This is the most dangerous place. The scary case is a single unexpected exception killing the parent loop while the process itself lingers on.
- At each item-processing boundary, catch expected exceptions
- If an unexpected exception escapes the parent loop, lean toward terminating the process
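The two rules above might look like the following sketch. `ResidentLoop` and its parameters are hypothetical, and `TimeoutException` stands in for whatever your worker really considers an expected failure. The fatal path is passed in as a delegate so the loop can be exercised in a test. In production you would pass `Environment.FailFast`, whose `(string, Exception)` overload matches the delegate.

```csharp
#nullable enable
using System;

// Hypothetical worker loop for illustration; not part of any library.
public static class ResidentLoop
{
    // dequeue returns null when there is no more work (assumption for this sketch).
    public static void Run(
        Func<string?> dequeue,
        Action<string> handle,
        Action<string, Exception> onFatal)
    {
        try
        {
            while (dequeue() is string item)
            {
                try
                {
                    handle(item);
                }
                catch (TimeoutException ex)
                {
                    // Expected, item-local failure: record it and keep looping.
                    Console.Error.WriteLine($"Item '{item}' failed: {ex.Message}");
                }
            }
        }
        catch (Exception ex)
        {
            // Unexpected exception escaped the item boundary: do not let the
            // process linger with a dead loop. Production: Environment.FailFast.
            onFatal("Resident loop died from an unexpected exception.", ex);
        }
    }
}
```

A call such as `ResidentLoop.Run(source.Next, Process, Environment.FailFast)` (with `source` and `Process` being your own code) ends the process immediately when the loop dies, which leaves a crash record for a watchdog or service manager to act on instead of a zombie.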
Startup
Required configuration unreadable, migration failure, missing required folders or certificates -> fail startup and exit.
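As a rough sketch, the startup check can be a function that returns a non-zero exit code before any part of the app is brought up. `StartupCheck`, the specific exit codes, and the two checked inputs are assumptions chosen for illustration. A real app would check whatever it actually requires (configuration, migrations, folders, certificates).

```csharp
using System;
using System.IO;

// Hypothetical startup gate for illustration; not part of any library.
public static class StartupCheck
{
    // Returns 0 when every required input is present, otherwise a
    // non-zero exit code. The caller should exit with that code
    // instead of booting halfway.
    public static int Validate(string configPath, string dataDirectory, TextWriter log)
    {
        if (!File.Exists(configPath))
        {
            log.WriteLine($"Startup failed: required configuration not found: {configPath}");
            return 2;
        }
        if (!Directory.Exists(dataDirectory))
        {
            log.WriteLine($"Startup failed: required folder not found: {dataDirectory}");
            return 3;
        }
        return 0;
    }
}
```

In `Main`, something like `int code = StartupCheck.Validate(configPath, dataDir, Console.Error); if (code != 0) return code;` keeps the failure visible to whatever launched the process.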
Native boundaries (COM, P/Invoke, unsafe)
Apply stricter rules. AccessViolationException, heap corruption, and handle anomalies should lean toward immediate exit.
Conditions that allow continuation
- The failure unit is clear (one operation, one screen, one job)
- State can be discarded (disposable / recreatable)
- Shared state is protected (no contamination of other features)
- External side effects can be explained (sent / not sent / safe to resend is known)
- You can be honest with the user
- It is observable (traceable via logs, metrics, dumps)
Common anti-patterns
- `catch (Exception)` that only logs and continues: Hides the root cause and keeps the broken state alive
- Trying to recover inside an unhandled exception handler: It is useful as a last-chance record, but not a recovery point
- Casual retry despite external side effects: Recipe for double-execution incidents
- Leaving the UI alive after the monitoring loop dies: A zombie app that only looks alive
- Saying “I don’t want it to crash” without designing for crashing: Automatic restart, session restore, and idempotency come first
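To illustrate the point about unhandled exception handlers, here is a sketch of a handler that only records and then gets out of the way. `LastChance` and the marker-file format are hypothetical. `AppDomain.CurrentDomain.UnhandledException` is the real .NET event, and it does not let you keep the process alive, so the handler makes no attempt to recover.

```csharp
using System;
using System.IO;

// Hypothetical last-chance recorder for illustration; not part of any library.
public static class LastChance
{
    // Registers a handler that appends one record to markerPath and does
    // nothing else: no retry, no state repair, no attempt to continue.
    public static void Install(string markerPath)
    {
        AppDomain.CurrentDomain.UnhandledException += (_, e) =>
        {
            try
            {
                File.AppendAllText(markerPath, Format(e.ExceptionObject, DateTime.UtcNow));
            }
            catch (Exception)
            {
                // The process is already going down; a failed write here
                // must not mask the original crash.
            }
        };
    }

    // Builds the record text; kept separate so it can be checked in isolation.
    public static string Format(object exceptionObject, DateTime utcNow) =>
        $"{utcNow:O} UNHANDLED {exceptionObject}{Environment.NewLine}";
}
```

Because the process state is suspect at this point, an in-process write like this is best treated as a hint rather than the only evidence. Automatic restart and anything that must survive the crash belong outside the process.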
Summary
Decision order for unexpected exceptions:
- Can the failure unit be discarded?
- Can shared state be rolled back or rebuilt?
- Can external side effects be explained?
- Can the health of memory, threads, and native boundaries be trusted?
Confident on all four -> continue. Not confident -> exit. In long-running apps, staying alive while broken is often more dangerous than crashing honestly.