Where Exceptions Should Be Caught, Logged, and Handled - A Practical Guide to Boundaries and Responsibilities in the Call Hierarchy

Exception Handling, Logging, Architecture, Reliability, Windows Development

Contents

  1. The short answer
  2. Catching, logging, and error handling are different things
    • 2.1. Catching
    • 2.2. Logging
    • 2.3. Error handling
    • 2.4. Translating exceptions
  3. The first decision table to look at
  4. What to do at each level of the call hierarchy
    • 4.1. The deepest helper / utility / private method
    • 4.2. External I/O boundaries: Repository / Gateway / SDK wrapper
    • 4.3. Application Service / UseCase
    • 4.4. UI / HTTP / Job / Message boundaries
    • 4.5. The final unhandled-exception handler
    • 4.6. Looking at one call hierarchy end to end
      • Save button → SaveOrderUseCase → PaymentGateway → HTTP
  5. Separate expected failures from unexpected exceptions
  6. Where to log, and how many times
  7. Common anti-patterns
    • 7.1. catch (Exception) in a deep layer that returns null / false
    • 7.2. Writing Error in every layer and then rethrowing
    • 7.3. Library layers or shared components directly showing UI
    • 7.4. Logging OperationCanceledException as an outage
    • 7.5. Retrying lightly when there are external side effects
    • 7.6. Trying to recover from everything in the final unhandled-exception handler
  8. Review checklist
  9. A rough quick guide
  10. Summary

1. The short answer

  • As a rule, do not catch broadly in deep layers. Place catch blocks closer to the boundary where the unit of failure can be defined.
  • For logs, the default should be one primary log per failure. If every layer writes Error for the same exception, the person reading the logs loses track of what actually happened.
  • Lower layers are responsible for cleanup, local rollback, exception translation, and, only where appropriate, limited retry. If they rethrow, they normally do not write the primary log.
  • Processing boundaries such as one screen action, one HTTP request, one job run, or one message handling unit are usually the most natural places for the primary log.
  • Expected failures should be converted into results at the use-case level. There is no need to keep propagating every failure upward as an exception.
  • AppDomain.UnhandledException, WPF’s DispatcherUnhandledException, WinForms ThreadException, ASP.NET Core exception handlers, and the host’s final exception handling are last-resort recording points, not the main place for recovery.
  • An OperationCanceledException caused by user cancellation or shutdown is normally not logged as Error.
  • When in doubt, check in this order.
    1. Can this place really make the decision?
    2. Is the failed unit visible here?
    3. Can the state be rolled back or rebuilt here?
    4. If I log here, will the same exception also be logged above?

The core idea is simple: do not catch just because you can; catch where you can make a responsible decision.

2. Catching, logging, and error handling are different things

2.1. Catching

A catch receives an exception and changes control flow in response to it.
That alone is not recovery.

For example, even if a lower method catches an exception, it may still not know:

  • what should be shown to the user
  • whether the whole screen should stop or only this operation should fail
  • whether the current request or job may continue

If it cannot answer those questions, that place is often not a good place for catch.

2.2. Logging

Logging is not only about recording that an exception happened. It is about recording which piece of work failed so you can trace it later.

That is why good logging points usually have access to one or more of these:

  • requestId / traceId
  • userId
  • orderId / fileId / batchId
  • which input item failed
  • which UI action it was
  • which queue or which message it was

Deep helpers and common functions often know the technical details but lack that operational context.
The place that knows the technical details and the place that knows the operational context are often different places.

2.3. Error handling

Here, “error handling” means things like:

  • showing an error message on the screen
  • returning 4xx / 5xx in HTTP
  • failing just one item and moving on to the next
  • reinitializing a subsystem
  • stopping the process and letting restart policy take over
  • releasing resources and exiting safely

In other words, it means deciding what the failure looks like to the caller or the user.

2.4. Translating exceptions

In real systems, there is one more important step between catch and “handle it.”
That step is translation.

For example:

  • HttpRequestException
  • IOException
  • JsonException
  • database-driver-specific exceptions
  • vendor-SDK-specific exceptions

If those leak directly into a UI or Controller, higher layers start depending on lower-layer implementation details.

So at a boundary, it is often better to translate them into failures that make sense at that layer, such as:

  • “Could not connect to the payment service”
  • “The CSV format was invalid”
  • “Could not write to the save destination”
  • “The device response was invalid”

The key point is that translation is not the same thing as logging.
If you only translate and rethrow, that is normally not the place for the main log.
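As a minimal sketch of translation without logging, assume a file-save boundary. SaveDestination and SaveDestinationException are hypothetical names, not part of any library. Only I/O and permission failures are caught, the original exception travels along as InnerException, and nothing is logged because the caller's boundary owns the primary log.

```csharp
using System;
using System.IO;

// Hypothetical layer-level exception: upper layers only learn that the save
// destination could not be written, not which file API failed or why.
public sealed class SaveDestinationException : Exception
{
    public string Destination { get; }

    public SaveDestinationException(string destination, Exception inner)
        : base($"Could not write to the save destination '{destination}'.", inner)
    {
        Destination = destination;
    }
}

// Hypothetical boundary class that wraps file I/O.
public static class SaveDestination
{
    public static void Write(string destination, string content)
    {
        try
        {
            File.WriteAllText(destination, content);
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            // Translate only. The original exception is kept as InnerException;
            // the boundary that catches SaveDestinationException writes the primary log.
            throw new SaveDestinationException(destination, ex);
        }
    }
}
```

Anything else, such as an ArgumentException from a malformed path, is deliberately left untouched, because it points to a bug rather than an I/O condition.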

3. The first decision table to look at

It becomes much easier to stay organized if you first decide the broad policy with a table like this.

| Place | Basic policy | Primary log | Main responsibility |
|---|---|---|---|
| helper / utility / private method | As a rule, do not broadly catch | none | cleanup in finally, local rollback, minimum context addition |
| Repository / Gateway / SDK wrapper | catch only specific exceptions | usually no | exception translation, limited retry, disposing broken connections or handles |
| Application Service / UseCase | turn expected failures into results | if swallowed, only as needed here | define the unit of failure, allow partial failure, make use-case-level decisions |
| UI / Controller / API / Job / Message boundary | main receiver of unexpected exceptions | often here | user response, HTTP response, continue-next-item or abort decision |
| unhandled-exception handler / host final boundary | last line of defense for missed cases | Critical | final recording, flush, dump, exit / restart path |

Visually, it usually looks like this:

```mermaid
flowchart TD
    A["An exception happened"] --> B{"Can this place decide retry / result conversion / continue-or-stop?"}
    B -- "No" --> C["As a rule, do not catch it here; send it upward"]
    B -- "Yes" --> D{"Is this a layer boundary?"}
    D -- "No" --> E["Only local cleanup"]
    D -- "Yes" --> F["Translate into a meaningful exception if needed"]
    E --> G{"Can the unit of failure and operational context be identified here?"}
    F --> G
    G -- "No" --> H["Do not write the primary log here; send upward"]
    G -- "Yes" --> I["Write one primary log and decide the response"]
    I --> J["Exit / reinitialize / continue next item if needed"]
```

There are two key ideas in that diagram.

  1. The first reason to catch is recovery or cleanup, not logging
  2. The first reason to log is that the operational context is complete, not simply that you noticed an exception

4. What to do at each level of the call hierarchy

4.1. The deepest helper / utility / private method

At this level, the default rule is: do not catch broadly.

For places such as string conversion, parsing, calculations, internal formatting, or shared helpers, the code usually cannot decide:

  • what screen action this belonged to
  • what request it belonged to
  • whether only this operation should fail
  • whether the whole screen should stop

What this layer is allowed to do is mostly:

  • release resources in finally
  • roll back local state that it partially broke
  • add only minimal context to the exception message
  • replace it with a more suitable exception type
  • dispose objects that are no longer reusable

Things better avoided here are:

  • catch (Exception) and return null / false / an empty array
  • show a MessageBox here
  • write an Error log here and then rethrow
  • “just continue somehow” when the state cannot really be restored

An especially dangerous pattern is continuing to use an object after a failure that occurred partway through mutating its internal state.
If the state can be restored locally, restore it. If not, treat the object as something to discard and recreate.
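A minimal sketch of local rollback in a low-level method, using a hypothetical in-memory PriceTable. ApplyAll snapshots the state it is about to mutate, restores it if any update fails, and then rethrows the original exception unchanged with throw; instead of logging, showing UI, or returning false.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory price table used to illustrate local rollback.
public sealed class PriceTable
{
    private readonly Dictionary<string, decimal> _prices = new();

    public decimal this[string sku] => _prices[sku];

    public void Set(string sku, decimal price)
    {
        if (price < 0)
            throw new ArgumentOutOfRangeException(nameof(price), price, $"Negative price for '{sku}'.");
        _prices[sku] = price;
    }

    // Applies every update or none of them.
    public void ApplyAll(IEnumerable<KeyValuePair<string, decimal>> updates)
    {
        // Snapshot only the state this method is about to mutate.
        var snapshot = new Dictionary<string, decimal>(_prices);
        try
        {
            foreach (var (sku, price) in updates)
                Set(sku, price);
        }
        catch
        {
            // Local rollback: undo what this method partially changed...
            _prices.Clear();
            foreach (var (sku, price) in snapshot)
                _prices[sku] = price;

            // ...then rethrow unchanged. No log, no MessageBox, no "return false":
            // this method cannot decide what the failure means for the caller.
            throw;
        }
    }
}
```

When the state is external (a device, a socket, a vendor handle) and cannot be snapshotted cheaply, local rollback is not an option, and discard-and-recreate is the safer assumption.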

4.2. External I/O boundaries: Repository / Gateway / SDK wrapper

This is a layer where the reasons to catch are much clearer.

Why? Because implementation-specific details from lower layers surface here:

  • database-driver exceptions
  • HTTP communication exceptions
  • file I/O exceptions
  • COM / P/Invoke / vendor SDK specific exceptions
  • parser or serializer exceptions

Typical responsibilities at this layer are these four:

  1. Catch specific exceptions
    Catch meaningful concrete exceptions rather than a wide Exception.

  2. Translate them into meaningful failures
    So that upper layers do not need to know lower-layer implementation details directly.

  3. If local retry is appropriate, do it here
    Only under fairly strict conditions, all of which must hold:
    • the failure is known to be transient
    • the operation is idempotent
    • retry count and delay policy are defined
    • the final behavior after the last failure is clear

  4. Throw away broken connections or handles
    In many cases, “recreate the connection” is safer than “keep using the same object next time.”
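The discard-and-recreate idea can be sketched as follows. IDeviceConnection, DeviceClient, DeviceCommunicationException, and the FakeConnection test double are all hypothetical. After an IOException, the client disposes the connection instead of reusing it, translates the failure, and lazily opens a fresh connection on the next call.

```csharp
using System;
using System.IO;

// Hypothetical connection abstraction (socket, serial port, vendor SDK handle, ...).
public interface IDeviceConnection : IDisposable
{
    string Send(string command);
}

// Hypothetical layer-level exception.
public sealed class DeviceCommunicationException : Exception
{
    public DeviceCommunicationException(string message, Exception inner) : base(message, inner) { }
}

public sealed class DeviceClient : IDisposable
{
    private readonly Func<IDeviceConnection> _connect;
    private IDeviceConnection? _connection;

    public DeviceClient(Func<IDeviceConnection> connect) => _connect = connect;

    public string Send(string command)
    {
        // Open lazily; after a failure this creates a fresh connection.
        var connection = _connection ??= _connect();
        try
        {
            return connection.Send(command);
        }
        catch (IOException ex)
        {
            // The connection may be half-broken: throw it away rather than reuse it.
            connection.Dispose();
            _connection = null;
            throw new DeviceCommunicationException($"Device command '{command}' failed.", ex);
        }
    }

    public void Dispose() => _connection?.Dispose();
}

// Hypothetical test double for trying the sketch without hardware.
public sealed class FakeConnection : IDeviceConnection
{
    public bool FailNext { get; set; }
    public bool Disposed { get; private set; }

    public string Send(string command) =>
        FailNext ? throw new IOException("link dropped") : $"ok:{command}";

    public void Dispose() => Disposed = true;
}
```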

For logging at this layer, the following policy helps avoid confusion:

  • If you rethrow upward, normally do not write the primary log here
  • If this layer swallows the exception and converts it to a result, then this layer owns the necessary log or metric
  • During retries, log individual attempts at Debug, Information, or Warning, and record only the final failure at a higher level

This layer is usually where translation happens, not where the final visible decision is made.
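A minimal sketch of retry under the conditions listed for this layer. LimitedRetry and its parameters are hypothetical, and the log delegate stands in for a real logging framework. It retries only when the caller declares the operation idempotent and the exception is classified as transient, keeps each attempt at Debug, and lets the final failure propagate unchanged so the caller records it once.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LimitedRetry
{
    // Hypothetical helper. Retry happens only if every condition holds:
    // idempotent operation, transient failure, attempts remaining, no cancellation.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        bool isIdempotent,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan baseDelay,
        Action<string, string> log, // (level, message); stands in for a real logger
        CancellationToken ct = default)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(ct);
            }
            catch (Exception ex) when (isIdempotent && attempt < maxAttempts
                                       && isTransient(ex) && !ct.IsCancellationRequested)
            {
                // Individual attempts stay quiet; this is not the primary log.
                log("Debug", $"Attempt {attempt}/{maxAttempts} failed transiently: {ex.Message}");
                await Task.Delay(baseDelay * attempt, ct); // simple linear backoff
            }
            // Otherwise (last attempt, non-transient, or non-idempotent) the exception
            // propagates unchanged, and the caller decides how the final failure is recorded.
        }
    }
}
```

For a non-idempotent call such as billing, passing isIdempotent: false makes the first failure surface immediately. Libraries such as Polly offer richer policies; the sketch only shows where the retry decision lives.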

4.3. Application Service / UseCase

This is the layer that decides how this unit of work should fail.

Examples include:

  • a save operation
  • finalizing an order
  • importing a CSV
  • processing one batch item
  • applying one message

These are coherent use-case-level units.

This layer can make decisions such as:

  • a validation error should fail only this operation
  • NotFound should become something equivalent to 404
  • a business-rule violation should be returned for user correction
  • one invalid CSV row should be recorded as Warning and processing should continue
  • a temporary external-service failure should fail the whole operation
  • partial work should be discarded and retried from the beginning

In other words, this is where the unit of failure can often be defined.

Typical good uses for this layer are:

  • converting expected failures into Result objects or failure DTOs
  • aggregating partial failures
  • deciding how many failures may be tolerated before continuing
  • converting to error codes or user-message keys

This layer should usually avoid taking on UI rendering or HTTP response body construction.
It is cleaner for it to decide what the failure means for the use case and leave the final presentation to the outer boundary.
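A minimal sketch of a use case that turns expected failures into a result, assuming a hypothetical two-column CSV (sku,quantity). ImportQuantitiesUseCase, ImportResult, RowError, and the message key are illustrative names. A malformed row is recorded as Warning and skipped; once more than maxRowErrors rows are bad, the whole import fails as a result that carries a message key rather than UI text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical result types: data the UI or API layer can render, not exceptions.
public sealed record RowError(int RowNumber, string Reason);

public sealed record ImportResult(bool Succeeded, int ImportedCount, IReadOnlyList<RowError> RowErrors)
{
    // A user-message key for the outer boundary to localize; not final UI text.
    public string? FailureKey { get; init; }
}

public static class ImportQuantitiesUseCase
{
    public static ImportResult Run(IEnumerable<string> lines, int maxRowErrors, Action<string> logWarning)
    {
        var errors = new List<RowError>();
        var imported = 0;
        var rowNumber = 0;

        foreach (var line in lines)
        {
            rowNumber++;
            var cols = line.Split(',');
            var valid = cols.Length == 2
                && int.TryParse(cols[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out var qty)
                && qty >= 0;
            if (!valid)
            {
                // Expected failure: record it and move on to the next row.
                errors.Add(new RowError(rowNumber, "expected 'sku,quantity' with a non-negative quantity"));
                logWarning($"Skipped invalid row {rowNumber}");
                if (errors.Count > maxRowErrors)
                    // Too many bad rows: the whole import fails (staged rows would be discarded).
                    return new ImportResult(false, imported, errors) { FailureKey = "import.too_many_invalid_rows" };
                continue;
            }
            imported++; // a real use case would stage the row and commit at the end
        }
        return new ImportResult(true, imported, errors);
    }
}
```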

4.4. UI / HTTP / Job / Message boundaries

This is where the primary log point often lives in real applications.

Examples include:

  • one click on a Save button in WinForms / WPF
  • one HTTP request in ASP.NET Core
  • one worker message
  • one input item in a batch
  • one scheduled job run

This location knows:

  • what operation it was
  • who initiated it
  • which item number it was
  • which request / batch / message it was
  • what should be returned to the user or caller if it fails

That makes it a natural place to:

  • catch unexpected exceptions broadly
  • write one primary log with context
  • convert into an error dialog, HTTP 500, Problem Details, job failure, or continue-next-item behavior

The important point is not merely that it catches broadly, but that it already knows what to return after catching.
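A minimal sketch of such a boundary, assuming hypothetical OperationContext and BoundaryResponse records and a logError delegate in place of a real logger. The wrapper catches broadly, writes exactly one log line with the context only it knows, and returns a generic response. OperationCanceledException is excluded by the filter so that cancellation is not recorded as an Error.

```csharp
using System;

// Hypothetical context that only the boundary knows.
public sealed record OperationContext(string Operation, string RequestId, string UserId, string? EntityId);

// Hypothetical outward-facing shape: dialog text, HTTP status, job status, ...
public sealed record BoundaryResponse(int StatusCode, string Message);

public static class RequestBoundary
{
    public static BoundaryResponse Run(OperationContext ctx, Func<BoundaryResponse> handler,
                                       Action<string> logError)
    {
        try
        {
            return handler();
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // The single primary log for this failure, with operational context.
            logError($"{ctx.Operation} failed (requestId={ctx.RequestId}, userId={ctx.UserId}, " +
                     $"entityId={ctx.EntityId ?? "-"}): {ex}");

            // What to return is already defined here; internals are not leaked.
            return new BoundaryResponse(500, "The operation could not be completed.");
        }
    }
}
```

In ASP.NET Core this role is usually played by exception-handling middleware rather than a per-action wrapper; in WinForms or WPF, the body of the button handler plays it.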

For batch or queue processing, it often helps to separate two levels:

  • Catch at the one-item boundary
    Decide whether one failed item should be skipped and the next item should continue
  • Do not broadly swallow at the parent loop
    If the parent loop dies from an unexpected exception, prefer process-level restart

“Fail one item and continue” and “the parent loop survives every unexpected exception silently” are very different designs.
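The two levels can be sketched like this, assuming a hypothetical MessageWorker with handle and logError delegates. ProcessOne is the one-item boundary that logs and continues; ProcessAll has no broad catch, so a failure outside the item boundary (for example, the message source itself breaking) escapes to the host and its restart policy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class MessageWorker
{
    // Parent loop: no broad catch. If something outside the per-item boundary
    // breaks, let it escape so the host / restart policy can take over.
    public static int ProcessAll(IEnumerable<string> messages, Action<string> handle,
                                 Action<string> logError, CancellationToken ct)
    {
        var failed = 0;
        foreach (var message in messages)
        {
            ct.ThrowIfCancellationRequested();
            if (!ProcessOne(message, handle, logError))
                failed++;
        }
        return failed;
    }

    // One-item boundary: the unit of failure is a single message.
    private static bool ProcessOne(string message, Action<string> handle, Action<string> logError)
    {
        try
        {
            handle(message);
            return true;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Primary log for this item, then continue with the next one.
            // A real worker might also move the message to a dead-letter queue here.
            logError($"Message '{message}' failed: {ex.Message}");
            return false;
        }
    }
}
```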

4.5. The final unhandled-exception handler

This is the last line of defense.
It is not a magical recovery point.

Typical examples are:

  • AppDomain.UnhandledException
  • WPF Application.DispatcherUnhandledException
  • WinForms Application.ThreadException
  • ASP.NET Core exception middleware or handlers
  • the final exception handling around Generic Host / worker / BackgroundService

Its main responsibilities are things like:

  • final logging
  • flushing
  • arranging dump collection
  • storing session or near-last context
  • setting exit codes and restart paths

It is also better not to expect too much from it:

  • an exception that reaches this point often means something was missed further up
  • the state may already be corrupted
  • locks may still be held, so heavy work is dangerous
  • even if the app appears able to continue, that does not mean it is safe to continue

There are also practical .NET-specific points worth keeping in mind:

  • AppDomain.UnhandledException is for notification and recording of an unhandled exception. Putting too much recovery logic after that point is risky.
  • In WPF DispatcherUnhandledException, there is a path where Handled = true keeps the app alive, but the first question is whether recovery is actually safe.
  • WinForms ThreadException can also leave the application in an unknown state even after handling.
  • ASP.NET Core exception middleware needs to be placed early enough in the pipeline to receive downstream exceptions.
  • Since .NET 6, an unhandled exception in a BackgroundService is logged and, by default, stops the host (HostOptions.BackgroundServiceExceptionBehavior defaults to StopHost). In many cases, stopping and letting the restart policy take over is safer than broadly swallowing everything in the parent loop.

Desktop applications in particular often do have a path that appears to “catch and continue” after an unhandled exception.
But being able to continue and being safe to continue are not the same thing.
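A minimal sketch of a last-chance recorder on AppDomain.CurrentDomain.UnhandledException. LastChanceHandler and its writeCritical and flush delegates are hypothetical stand-ins for a real logger and its flush call. The handler records and flushes, never throws, and makes no attempt to recover; the runtime normally terminates the process after the handlers run. WPF's DispatcherUnhandledException and WinForms' Application.ThreadException are UI-framework specific and are not shown.

```csharp
using System;

public static class LastChanceHandler
{
    private static Action<string>? _writeCritical;
    private static Action? _flush;

    // Call once at startup. This only records; it is not a recovery point.
    public static void Install(Action<string> writeCritical, Action flush)
    {
        _writeCritical = writeCritical;
        _flush = flush;
        AppDomain.CurrentDomain.UnhandledException += OnUnhandledException;
    }

    internal static void OnUnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        try
        {
            // ExceptionObject is typed as object; it is not guaranteed to be an Exception.
            var text = e.ExceptionObject is Exception ex ? ex.ToString() : $"{e.ExceptionObject}";
            _writeCritical?.Invoke($"Unhandled exception (terminating={e.IsTerminating}): {text}");
            _flush?.Invoke();
        }
        catch
        {
            // Never throw from the last line of defense; keep the work here minimal.
        }
    }
}
```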

4.6. Looking at one call hierarchy end to end

For example, imagine a flow like this:

```mermaid
flowchart LR
    A["UI / Controller / Job boundary"] --> B["Application Service / UseCase"]
    B --> C["Domain / business logic"]
    C --> D["Repository / Gateway / SDK wrapper"]
    D --> E["DB / HTTP / File / Vendor SDK"]
```

In that case, the roles usually separate roughly like this.

Save button → SaveOrderUseCase → PaymentGateway → HTTP

  • PaymentGateway
    • catches communication failures and invalid response formats
    • translates them into something like “payment service connection failed” or “invalid payment service response”
    • performs retry here only when the conditions justify it
    • if it rethrows, it normally does not write the primary log
  • SaveOrderUseCase
    • turns expected failures such as payment rejection into results
    • treats the failure as “only this order finalization failed”
    • shapes the failure so UI or API layers can return it cleanly
  • UI button handler / Controller
    • broadly catches unexpected exceptions
    • writes the primary log with orderId, userId, and requestId
    • converts the failure into a dialog, 500, or 503 response
  • Unhandled-exception handler
    • records only what leaked that far
    • performs dump collection or final flush
    • prioritizes the exit path rather than recovery

With that split, technical details stay closed lower down, operational context is attached higher up, and decisions are made at boundaries.
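The gateway and use-case roles in this walkthrough can be sketched as follows. Every name (PaymentGateway, SaveOrderUseCase, PaymentServiceException, PaymentReply, the "charges" endpoint, and the StubHandler test double) is hypothetical; only HttpClient, HttpMessageHandler, and System.Text.Json are real .NET APIs. The gateway translates transport and format failures without logging, and the use case turns a payment decline into a result while letting PaymentServiceException reach the outer boundary.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public sealed class PaymentServiceException : Exception
{
    public PaymentServiceException(string message, Exception inner) : base(message, inner) { }
}

public sealed record PaymentReply(bool Approved, string? DeclineReason);

public sealed class PaymentGateway
{
    private static readonly JsonSerializerOptions Json = new(JsonSerializerDefaults.Web);
    private readonly HttpClient _http;

    public PaymentGateway(HttpClient http) => _http = http;

    // Translates transport and format failures. No primary log: it rethrows.
    public async Task<PaymentReply> ChargeAsync(string orderId, decimal amount, CancellationToken ct)
    {
        try
        {
            var body = JsonSerializer.Serialize(new { orderId, amount }, Json);
            using var response = await _http.PostAsync("charges",
                new StringContent(body, Encoding.UTF8, "application/json"), ct);
            response.EnsureSuccessStatusCode();
            var json = await response.Content.ReadAsStringAsync(ct);
            return JsonSerializer.Deserialize<PaymentReply>(json, Json)
                   ?? throw new JsonException("Empty payment reply.");
        }
        catch (HttpRequestException ex)
        {
            throw new PaymentServiceException("Payment service call failed.", ex);
        }
        catch (JsonException ex)
        {
            throw new PaymentServiceException("Invalid payment service response.", ex);
        }
    }
}

public enum SaveOrderOutcome { Saved, PaymentDeclined }

public sealed record SaveOrderResult(SaveOrderOutcome Outcome, string? Reason = null);

public sealed class SaveOrderUseCase
{
    private readonly PaymentGateway _payments;

    public SaveOrderUseCase(PaymentGateway payments) => _payments = payments;

    // A decline is an expected failure and becomes a result. PaymentServiceException
    // is not caught: the button handler or controller logs it once with orderId,
    // userId, and requestId, and turns it into a dialog or a 503 response.
    public async Task<SaveOrderResult> ExecuteAsync(string orderId, decimal amount, CancellationToken ct)
    {
        var reply = await _payments.ChargeAsync(orderId, amount, ct);
        if (!reply.Approved)
            return new SaveOrderResult(SaveOrderOutcome.PaymentDeclined, reply.DeclineReason);
        // ... persist the order here ...
        return new SaveOrderResult(SaveOrderOutcome.Saved);
    }
}

// Test double so the sketch can run without a real payment service.
public sealed class StubHandler : HttpMessageHandler
{
    private readonly Func<HttpRequestMessage, HttpResponseMessage> _respond;

    public StubHandler(Func<HttpRequestMessage, HttpResponseMessage> respond) => _respond = respond;

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken ct)
        => Task.FromResult(_respond(request));
}
```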

5. Separate expected failures from unexpected exceptions

The most important point in this whole topic is not to treat every failure as the same kind of “exception.”

It helps to separate them roughly like this:

| Kind of failure | First place to handle it | Typical treatment |
|---|---|---|
| validation issue | UseCase / request boundary | return as an input error |
| NotFound / Conflict | UseCase / Controller | 404 / 409 or screen message |
| user cancel / shutdown | operation boundary | cancellation; usually not an Error |
| one invalid CSV row | one-item boundary | record as Warning and continue |
| transient timeout that still ends in failure | I/O boundary to request boundary | fail after retry |
| NullReferenceException, broken assumptions | request / job boundary | primary log and failure response |
| AccessViolationException, severe OutOfMemoryException, or signs of memory corruption at a native boundary | final boundary | treat as Critical and move toward shutdown |

Expected failures are failures you can design for ahead of time.
Unexpected exceptions are failures where it is questionable whether the state should still be trusted afterward.

Separating those two alone reduces problems like:

  • logging NotFound as Error every time
  • treating user cancellation as an outage
  • letting truly dangerous broken-assumption failures continue as if they were only “this request failed”
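The split can be encoded as a small classification step, sketched here with hypothetical FailureKind, ExpectedFailureException, and FailureClassifier types. Only a cancellation that was actually requested counts as control flow, anticipated domain failures are Expected, a few severe types are Fatal, and everything else is Unexpected. The level mapping is a starting point, not a replacement for the per-situation tables in this article.

```csharp
using System;

public enum FailureKind { Cancellation, Expected, Unexpected, Fatal }

// Hypothetical base type for failures the design anticipates (NotFound, Conflict, ...).
public class ExpectedFailureException : Exception
{
    public ExpectedFailureException(string message) : base(message) { }
}

public static class FailureClassifier
{
    public static FailureKind Classify(Exception ex, bool cancellationRequested) => ex switch
    {
        // Only a cancellation someone actually requested counts as control flow.
        OperationCanceledException when cancellationRequested => FailureKind.Cancellation,
        ExpectedFailureException => FailureKind.Expected,
        // Real memory corruption usually terminates the process before any catch runs;
        // this arm only covers what does reach managed code.
        OutOfMemoryException or AccessViolationException => FailureKind.Fatal,
        // NullReferenceException, broken assumptions, unrequested cancellation, ...
        _ => FailureKind.Unexpected,
    };

    // Default level for the one primary log of each kind.
    public static string DefaultLogLevel(FailureKind kind) => kind switch
    {
        FailureKind.Cancellation => "Information",
        FailureKind.Expected => "Information",
        FailureKind.Fatal => "Critical",
        _ => "Error",
    };
}
```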

6. Where to log, and how many times

When designing logs, it is often more important to decide who owns the primary log than to decide the exact place of catch.

The basic rules are:

  1. One primary Error / Critical log for one failure
  2. Lower layers add translation and context only when needed
  3. Upper boundaries write the primary log with the unit of failure and operational context
  4. Only a layer that swallows the failure fully owns the responsibility to record that swallowed failure
  5. Expected failures should not always become Error
  6. OperationCanceledException should be separated from ordinary failure logs

A rough table of logging points looks like this:

| Situation | Main logging place | Typical level | Note |
|---|---|---|---|
| validation error | request / use-case boundary | Information or no log | not an outage, but a contract failure |
| user cancel / shutdown | operation boundary | Debug / Information | normally not Error |
| transient failure during retry | the layer that owns retry | Debug / Warning | do not make noise before final failure |
| failure after all retries are exhausted | request / job boundary, or the layer that swallows it | Warning / Error | record with the unit of failure |
| one bad row and continue | item boundary | Warning | include fileId and rowNumber |
| unexpected exception that kills a whole request | request / UI / job boundary | Error | include requestId, userId, entityId |
| process-ending severity | unhandled-exception boundary | Critical | flush, dump, restart path |

In practice, one of the most common problems is duplicate logging like this:

  • Repository writes Error
  • Service writes Error for the same exception
  • Controller writes Error again
  • the final unhandled-exception handler also writes Critical

Then one outage produces multiple copies of the same stack trace.
What the operator actually needs is not four copies of the same stack trace, but one primary log and, at most, a small number of supporting logs.

Another way to put it: one failure, one log, with as much context as needed.
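One way for a lower layer to add context without logging is Exception.Data, a dictionary every .NET exception carries. In this minimal sketch, the hypothetical TaxCalculator (with made-up region rates) attaches orderId and region and rethrows with throw;, and the hypothetical FailureLog.Describe builds the single primary log line at the boundary.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

public static class TaxCalculator
{
    // Hypothetical rates for illustration only.
    private static readonly Dictionary<string, decimal> Rates = new() { ["JP"] = 0.10m, ["DE"] = 0.19m };

    // Lower layer: it cannot decide anything, so it only adds context and rethrows.
    public static decimal Compute(string orderId, decimal subtotal, string region)
    {
        try
        {
            return subtotal * Rates[region];
        }
        catch (KeyNotFoundException ex)
        {
            ex.Data["orderId"] = orderId;
            ex.Data["region"] = region;
            throw; // no log here; the original stack trace is preserved
        }
    }
}

public static class FailureLog
{
    // Boundary: one primary log line that includes context attached on the way up.
    public static string Describe(string operation, Exception ex)
    {
        var sb = new StringBuilder(operation).Append(" failed");
        foreach (DictionaryEntry entry in ex.Data)
            sb.Append(' ').Append(entry.Key).Append('=').Append(entry.Value);
        return sb.Append(": ").Append(ex.GetType().Name).Append(": ").Append(ex.Message).ToString();
    }
}
```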

7. Common anti-patterns

7.1. catch (Exception) in a deep layer that returns null / false

This tends to erase the real cause.
It also makes the caller unable to tell whether “the data truly was not there” or “something broke halfway through.”
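The difference can be sketched with a hypothetical ProfileStore that reads a file. The swallowing version returns null for every failure, so a permission problem looks exactly like a missing file; the narrower version returns null only for "the file is not there" and lets everything else propagate to a boundary that can decide.

```csharp
using System;
using System.IO;

public static class ProfileStore
{
    // Anti-pattern: "no file" and "something broke halfway" both become null.
    public static string? LoadOrNull_Swallowing(string path)
    {
        try { return File.ReadAllText(path); }
        catch (Exception) { return null; }
    }

    // Better: only the case the caller can act on becomes null; everything else
    // (permissions, sharing violations, bad paths, ...) propagates to a boundary.
    public static string? LoadOrNull(string path)
    {
        try { return File.ReadAllText(path); }
        catch (Exception ex) when (ex is FileNotFoundException or DirectoryNotFoundException)
        {
            return null;
        }
    }
}
```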

7.2. Writing Error in every layer and then rethrowing

This is one of the biggest causes of duplicate logs.

If you split responsibilities into:

  • lower layers translate
  • upper boundaries write the main log

the noise drops a lot.

In C#, rethrow with a bare "throw;" rather than "throw ex;". The bare form keeps the original stack trace; "throw ex;" resets it to the rethrow point.
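A small demonstration of the difference, using a hypothetical RethrowDemo class. After throw; the outer catch still sees DeepHelper in the stack trace; after throw ex; the trace starts at the rethrowing method and the real origin is gone.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class RethrowDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void DeepHelper() => throw new InvalidOperationException("broken assumption");

    // Returns the exception as seen by an outer catch after one of the two rethrow forms.
    public static Exception Capture(bool preserveStackTrace)
    {
        try
        {
            try
            {
                DeepHelper();
            }
            catch (InvalidOperationException ex)
            {
                if (preserveStackTrace)
                    throw;   // keeps the original trace, including DeepHelper
                throw ex;    // resets the trace to this line: avoid
            }
        }
        catch (Exception outer)
        {
            return outer;
        }
        throw new InvalidOperationException("unreachable: DeepHelper always throws");
    }
}
```

When an exception has to be rethrown later from a different place, for example after crossing a thread, System.Runtime.ExceptionServices.ExceptionDispatchInfo.Capture(ex).Throw() also preserves the original trace.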

7.3. Library layers or shared components directly showing UI

If a shared component opens a MessageBox or directly decides an HTTP response body, both reuse and responsibility boundaries collapse.
Lower layers are safer when they stop at returning or throwing a meaningful failure.

7.4. Logging OperationCanceledException as an outage

Cancellation is part of control flow.
If you write Error every time, real failures get buried.
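A minimal sketch of keeping cancellation out of the Error stream, assuming a hypothetical CancellationAwareBoundary with a (level, message) log delegate. The exception filter checks whether the caller's token was actually canceled: if so, the outcome is logged at Information. An OperationCanceledException that appears without such a request, as with an HttpClient timeout (which surfaces as TaskCanceledException), is treated as a real failure.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationAwareBoundary
{
    // Returns a short status; other exception types propagate unchanged.
    public static async Task<string> RunAsync(Func<CancellationToken, Task> work,
        Action<string, string> log, CancellationToken ct)
    {
        try
        {
            await work(ct);
            return "completed";
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The caller asked to stop (user cancel, shutdown): control flow, not an outage.
            log("Information", "Operation canceled by request.");
            return "canceled";
        }
        catch (OperationCanceledException ex)
        {
            // Canceled although nobody asked, e.g. a timeout: a real failure.
            log("Error", $"Operation timed out or was canceled unexpectedly: {ex.Message}");
            return "failed";
        }
    }
}
```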

7.5. Retrying lightly when there are external side effects

For things like email sending, billing, device commands, or file moves, repeating the operation can easily cause duplicate side effects, such as a second email or a second charge.
Retry belongs only where transient failure and idempotency are both visible.

7.6. Trying to recover from everything in the final unhandled-exception handler

That handler is only a last-resort safety net.
It should not become the center of the design.

Recovery strategy is usually safer when it lives one step earlier, at the request / job / subsystem boundary.

8. Review checklist

When reviewing exception handling, it helps to look through the following in order:

  • Can this catch be explained in one sentence as what decision it exists to make?
  • Can this place really decide retry, result conversion, continue-or-stop behavior, or user response?
  • If it logs here, will the same failure also be logged as Error above?
  • Are lower-layer-specific exceptions translated into meaningful failures at the boundary?
  • Can this place restore corrupted state? If not, is discard-and-recreate the assumption?
  • Is OperationCanceledException separated from normal failure?
  • Is it clear whether continuation happens per item, per request, or only after process restart?
  • Is the final unhandled-exception handler treated as a recording point rather than a recovery point?
  • Does the log include the failure-unit context such as requestId, userId, batchId, fileId, or rowNumber?
  • Are “expected failures” and “broken assumptions” being treated differently?

An especially effective question is: “What exactly does this catch decide?”
If that cannot be answered clearly, the catch is often unnecessary or too deep.

9. A rough quick guide

Finally, in a much shorter form, the split usually looks like this:

| Situation | catch | Logging | Error handling |
|---|---|---|---|
| helper / utility | usually no | no | no |
| Repository / Gateway / SDK wrapper | catch only specific exceptions | usually not the primary log | translation, local retry, disposing connections |
| UseCase / Application Service | catch expected failures | only if swallowing, as needed | result conversion, partial failure handling |
| UI / Controller / request / item / job boundary | catch unexpected exceptions broadly | primary log | response, message, continue / abort |
| unhandled-exception handler | only what leaked through | Critical | final recording, exit path |

When you are unsure, these five rules are usually enough:

  1. Do not broadly swallow in deep layers
  2. Catch at boundaries
  3. Write the primary log once
  4. The layer that swallows owns the responsibility
  5. The final unhandled exception is for recording and exit routing

10. Summary

Exception handling is not a story of “you can catch anywhere, so you should catch anywhere.”

In practice, this order of questions is usually enough:

  1. Can this place really make the decision?
  2. Is the unit of failure visible here?
  3. Can the state be restored or rebuilt here?
  4. Will logging here create duplicates?
  5. Is this a recovery point, or only the last recording point?

Once you look in that order, the call hierarchy becomes much easier to organize.

The three most important ideas are these:

  • deep layers mainly translate and clean up
  • boundaries mainly decide and write the primary log
  • the final unhandled-exception handler mainly records and routes termination

Put differently,
exceptions should be caught at boundaries, enriched with context, and only fully handled where recovery is actually possible.

Once that is decided, both code reviews and incident investigations become much more consistent.
