Which language is fastest: C#, C++, Java, or Go?

There is no single answer, and trying to decide a winner with one number is the worst thing you can do. CPU computation, memory allocation, parallel processing, and startup time each make different languages and runtimes look strong. C++ tends to shine in tight loops, Go looks favorable on startup time and distribution, and C# and Java can catch up or overtake at steady state once the JIT kicks in. Real-world performance is determined by the combination of language, runtime, libraries, build conditions, data, OS, and hardware, so the meaningful question is which workload, under which conditions, on which metric.

Why do C# and Java need warm-up before benchmarking?

C# and Java are normally JIT-compiled, so a first run measures not only the program's speed but also runtime startup, class loading, and JIT preparation, while C++ and Go are compiled ahead of time. Mixing comparisons that include the first run with steady-state comparisons after warm-up twists the whole discussion. Treat cold and warm as separate measurements: cold matters for CLI tools and short-lived batch jobs, warm matters for servers, resident processes, and long-running work.

What benchmark tools should I use for each language?

For per-language measurement, use the harness suited to that language: BenchmarkDotNet for C#, JMH for Java, go test -bench with benchstat for Go, and Google Benchmark for C++. These absorb each runtime's quirks and handle statistics. For cross-language comparison, however, placing BenchmarkDotNet results next to JMH results is risky because the harnesses follow different conventions; instead, turn each implementation into an executable with the same CLI contract and drive them all from a common external runner under identical conditions.

What are the most common mistakes in cross-language benchmarks?

The classics are: mixing Debug and Release builds; not solving the same problem (different inputs, outputs, or error handling); running once and drawing a conclusion when a single run is mostly noise from JIT, heat, GC, or background tasks; skipping correctness checks, when every implementation should produce the same checksum from the same input; and in C++, letting the optimizer delete the work entirely so the code is fast because it is doing nothing. Also look at the median and distribution rather than the mean, and record the experimental conditions along with the numbers.

C# vs C++ vs Java vs Go: How to Benchmark Fairly

“C++ is supposed to be fast.” “Go is lightweight in production.” “Java gets really fast on long-running workloads.” “C# is surprisingly strong too, thanks to the .NET JIT.”

You hear claims like these all the time. But the single worst thing you can do here is take numbers measured by different people in different environments, line them up, and declare a winner among languages.

C# and Java are heavily affected by JIT and warm-up, while C++ and Go are normally compiled ahead of time. The presence and characteristics of GC differ too. Differences in standard library and ecosystem library implementations matter quite a lot. And even on the same machine, results easily wobble due to power settings, heat, background activity, and skew in the input data. It is a rather messy world.

In this article, we lay out how to measure C# / C++ / Java / Go as fairly as possible. To give away the conclusion up front: the most important thing is not trying to decide “which language is fastest” with a single number.

The main subject of this article is, strictly, how to structure the comparison. Lining up environment-dependent numbers that merely look plausible turns into fortune-telling, so we will not publish a measured ranking here. Instead, we focus on how to design the comparison so it actually has value.

The Conclusion First

In a C# / C++ / Java / Go speed comparison, these seven things are what really matter.

1. Decide first what kind of speed you want to compare. Whether it is startup time, steady-state throughput, p95 latency, or memory efficiency changes how you measure.
2. Never draw conclusions from a single benchmark. CPU computation, memory allocation, parallel processing, and startup time each make different languages and runtimes look strong.
3. Separate cold and warm for C# and Java. Mixing comparisons that include the first run with steady-state comparisons after warm-up twists the whole discussion.
4. Measure with the same algorithm, the same input, and the same correctness check. “It wasn’t a faster implementation, it was just solving a different problem” is a classic benchmark failure.
5. Separate per-language microbenchmarks from cross-language end-to-end benchmarks. Each language’s dedicated harness is convenient, but cross-language comparisons are better run by a common external runner.
6. Look at the median and the distribution, not just the mean. A single GC pause or background task hitting one run is enough to wreck the average.
7. Record the conditions, not just the numbers. A benchmark result is a record of the experimental conditions just as much as a record of speed. Results without documented conditions become quite painful later.

What to Decide First

If you treat “fast” as one undifferentiated word, things usually go wrong. Start by deciding what you will call fast.

Even for the same program, what you want to look at can differ considerably.

1. Do you want to look at startup time?

For CLI tools, short-lived batch jobs, and helper tools that start once and exit immediately, cold start and process startup matter. On this axis, results change dramatically depending on whether JIT and class-loading initialization costs are included.

2. Do you want to look at long-running throughput?

For servers, resident processes, workers, and long-running conversion jobs, steady-state throughput is what counts. In that case, being slow only on the first run is not the point; the question is how stable and how high it gets after warm-up.

3. Do you want to look at tail latency?

For APIs, UI, and near-real-time processing, p95 / p99 can matter more than the mean. Even if the average is fast, occasional long stalls hurt from a user experience and SLA perspective.

4. Do you want to include memory efficiency?

If you only look at CPU time and ignore peak RSS, allocation volume, GC count, and GC pauses, you will misjudge the real operational weight. “Fast but eats a lot of memory” versus “a bit slower but consistently lightweight” can flip in ranking depending on the use case.

In short, the question you should settle first is this:

What this comparison should answer is not which language is fast, but which workload, under which conditions, on which metric, can be processed faster.

If you start collecting numbers while this is still vague, nothing will hold together at the end.

Why Comparing Languages Is Hard

Mixing JIT and AOT turns it into a different experiment

C# and Java are normally affected by JIT. C++ and Go, on the other hand, are normally compiled ahead of time.

That means if you measure the first run, you are measuring not only the speed of the program itself but also runtime startup, class loading, and JIT preparation. Conversely, if you only look at fully warmed-up runs, the comparison becomes how far steady-state optimization can go.

Both are meaningful. But they do not mean the same thing.

Implementation differences routinely outweigh language differences

Even for the same “sort”,

one side uses the standard library
one side is hand-rolled
one side does extra copies
one side regenerates the input every time

That alone changes results considerably.

Moreover, once you get into JSON, compression, cryptography, or regular expressions, library implementation differences matter far more than the language itself. Unless you make explicit what you are measuring, what you intended as a “language comparison” becomes a “library comparison”.

C++ has a trap where optimization deletes the work

Especially in microbenchmarks, when the compiler decides “nobody is using this result”, it can eliminate the computation entirely. Then it is not that the code is fast — it is that it is doing nothing at all, which makes for a small horror story.

This problem tends to show up especially blatantly in C++, so consuming the result, printing a checksum, or using the benchmark framework’s optimization-suppression facilities is quite important.
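The same precaution applies to any hand-rolled timing loop, whatever the language. Below is a minimal sketch in Go, used here only because it is compact; `work` and `timeWithChecksum` are hypothetical names, not part of any framework. Every result is folded into a checksum that is printed at the end, so the computation has an observable effect. In C++ with Google Benchmark, `benchmark::DoNotOptimize` plays the corresponding role.

```go
package main

import (
	"fmt"
	"time"
)

// work is a hypothetical stand-in for the code under test.
func work(n uint64) uint64 {
	var acc uint64
	for i := uint64(1); i <= n; i++ {
		acc += i * i
	}
	return acc
}

// timeWithChecksum calls work iters times and folds every result into a
// checksum, so each call contributes to a value the caller can observe.
func timeWithChecksum(iters int, n uint64) (time.Duration, uint64) {
	var checksum uint64
	start := time.Now()
	for i := 0; i < iters; i++ {
		checksum ^= work(n + uint64(i))
	}
	return time.Since(start), checksum
}

func main() {
	elapsed, checksum := timeWithChecksum(1000, 10000)
	// Printing the checksum keeps the result live and doubles as a
	// value to compare across implementations.
	fmt.Printf("elapsed=%v checksum=%d\n", elapsed, checksum)
}
```

If the elapsed time stays near zero no matter how large the input gets, that is a hint the work may have been removed.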

GC is neither an “advantage” nor a “disadvantage” — it is a characteristic

C#, Java, and Go have a GC. Reducing this to “it has GC, therefore it is slow” is far too crude.

In practice, what matters more is

how large numbers of short-lived objects are handled
heap size configuration
GC frequency and pauses
object layout
the allocation habits of libraries

Conversely, C++ allows fine-grained control via manual management and RAII, but that means design and implementation differences show up more easily. In other words, a difference in memory management strategy is not, by itself, a verdict of good or bad.

What Not to Do in a Comparison

1. Mixing Debug and Release

This is out of the question. Always align the comparison targets on production-grade optimized builds.

2. Not solving the same problem

Different input formats, different output, error handling present on only one side, different memory reuse policies. Leave these unaddressed and you end up measuring requirement differences, not speed.

3. Running once and drawing a conclusion

A single run is mostly noise.

JIT
page cache
CPU boost
heat
background tasks
GC
first-time file reads

All of these mix together in a single run.

4. Blurring warm-up

When measuring C# and Java, if you are vague about whether the first run is included or only post-warm-up runs count, the discussion collapses. Treat cold and warm as separate things.

5. Skipping correctness checks

Before “fast”, a benchmark needs “returns the same result”. Always verify that every implementation under comparison produces the same checksum or the same output from the same input.

6. Building a worldview from a single microbenchmark

Winning a tight loop does not mean winning across a real service. Conversely, losing on startup time can still mean being plenty strong on long-running workloads.

The Basic Approach When Comparing C# / C++ / Java / Go

This part is quite important. Our recommendation is a two-layer structure.

1. For per-language measurement, use the harness suited to that language

Each language has benchmark tools that absorb that language’s particular quirks.

C#: BenchmarkDotNet
Java: JMH
Go: go test -bench and benchstat
C++: Google Benchmark

These take care of each runtime’s quirks, the statistical processing, and the common measurement traps to a reasonable degree. They are quite effective for comparisons within a language and for drilling into an implementation.

2. For cross-language comparison, put a common runner on the outside

On the other hand, placing C# BenchmarkDotNet results next to Java JMH results as-is is a bit dangerous. The harnesses themselves follow different conventions.

So for cross-language work, we recommend turning each implementation into an executable that can be invoked with the same CLI contract and driving them all from outside under identical conditions.

For example, prepare an executable of this shape in each language.

bench --scenario sort_int32 --dataset data/sort_10m.bin --mode warm
bench --scenario group_words --dataset data/words_100mb.txt --mode cold
bench --scenario parallel_hash --dataset data/blob_1gb.bin --threads 8
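On the implementation side, the contract can stay very small. The following Go sketch parses the flags shown in the invocations above; the `Config` type, the defaults (`warm`, one thread), and the validation rules are assumptions made for illustration, not a specification.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config mirrors the flags used in the example invocations.
type Config struct {
	Scenario string
	Dataset  string
	Mode     string // "cold" or "warm"
	Threads  int
}

// parseArgs parses the shared CLI contract. The defaults (mode "warm",
// threads 1) are assumptions of this sketch.
func parseArgs(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("bench", flag.ContinueOnError)
	fs.StringVar(&c.Scenario, "scenario", "", "scenario name")
	fs.StringVar(&c.Dataset, "dataset", "", "path to the input dataset")
	fs.StringVar(&c.Mode, "mode", "warm", "cold or warm")
	fs.IntVar(&c.Threads, "threads", 1, "worker thread count")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.Scenario == "" || c.Dataset == "" {
		return c, fmt.Errorf("--scenario and --dataset are required")
	}
	if c.Mode != "cold" && c.Mode != "warm" {
		return c, fmt.Errorf("unknown mode %q", c.Mode)
	}
	if c.Threads < 1 {
		return c, fmt.Errorf("--threads must be at least 1")
	}
	return c, nil
}

func main() {
	args := os.Args[1:]
	if len(args) == 0 {
		// Demo invocation so the sketch runs without arguments.
		args = []string{"--scenario", "sort_int32",
			"--dataset", "data/sort_10m.bin", "--mode", "warm"}
	}
	cfg, err := parseArgs(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	// A real implementation would dispatch on cfg.Scenario here.
	fmt.Printf("scenario=%s dataset=%s mode=%s threads=%d\n",
		cfg.Scenario, cfg.Dataset, cfg.Mode, cfg.Threads)
}
```

The C#, C++, and Java executables would accept exactly the same flags, so the runner never needs language-specific arguments.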

Then, on the common runner side,

randomize the execution order
separate cold / warm
pass the same dataset
verify the checksum
collect wall-clock time and memory
keep the raw data in CSV / JSON

This makes it much easier to handle best practices within each language and cross-language fairness as separate concerns.
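A rough sketch of such a runner in Go follows. It assumes each executable prints a line of the form `checksum=<value>` on standard output; that convention, the `Result` type, and the function names are hypothetical choices for this example. Wall-clock time is taken around the whole process, and CPU time comes from the child's user and system time.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// Result is one raw measurement of one implementation.
type Result struct {
	Name     string
	Wall     time.Duration
	CPU      time.Duration // user + system time of the child process
	Checksum string
}

// parseChecksum finds a "checksum=<value>" line in the program output.
// That output convention is an assumption of this sketch.
func parseChecksum(stdout string) (string, bool) {
	for _, line := range strings.Split(stdout, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "checksum=") {
			return strings.TrimPrefix(line, "checksum="), true
		}
	}
	return "", false
}

// runOnce starts one process and records wall-clock and CPU time.
func runOnce(name string, argv []string) (Result, error) {
	var out bytes.Buffer
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdout = &out
	start := time.Now()
	err := cmd.Run()
	wall := time.Since(start)
	if err != nil {
		return Result{}, fmt.Errorf("%s: %w", name, err)
	}
	sum, ok := parseChecksum(out.String())
	if !ok {
		return Result{}, fmt.Errorf("%s: no checksum line in output", name)
	}
	cpu := cmd.ProcessState.UserTime() + cmd.ProcessState.SystemTime()
	return Result{Name: name, Wall: wall, CPU: cpu, Checksum: sum}, nil
}

// verifyChecksums reports the first implementation that disagrees with
// the first result.
func verifyChecksums(results []Result) error {
	for i := 1; i < len(results); i++ {
		if results[i].Checksum != results[0].Checksum {
			return fmt.Errorf("checksum mismatch: %s=%s, %s=%s",
				results[0].Name, results[0].Checksum,
				results[i].Name, results[i].Checksum)
		}
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: runner <bench-executable>...")
		return
	}
	var results []Result
	for _, exe := range os.Args[1:] {
		argv := []string{exe, "--scenario", "sort_int32",
			"--dataset", "data/sort_10m.bin", "--mode", "warm"}
		r, err := runOnce(exe, argv)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		results = append(results, r)
	}
	if err := verifyChecksums(results); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Raw per-run rows: name, wall ms, CPU ms, checksum.
	for _, r := range results {
		fmt.Printf("%s,%d,%d,%s\n", r.Name,
			r.Wall.Milliseconds(), r.CPU.Milliseconds(), r.Checksum)
	}
}
```

Because the timer wraps the whole process, this sketch naturally measures cold runs. For warm numbers, one option is to have each executable warm up internally and report its own steady-state timing, which the runner then records alongside the process-level figures.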

Concrete Example: What Benchmark Scenarios to Prepare

When asked to compare C# / C++ / Java / Go, our recommendation is: if you can only run one, pick a simple CPU-bound scenario that is hard to misinterpret; if you can run several, prepare 3-4 workloads with different characters.

Recommended lineup

1. `sort_int32_10m`

Purpose: observe CPU + memory bandwidth + use of temporary storage

Input: 10 million int32 values generated with a fixed seed
Processing: sort the array and return a checksum
Caveat: restore the same unsorted input every iteration

This one is relatively easy to understand. However, it includes differences in standard sort implementations, so it is a comparison including the standard library rather than the language itself.
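The caveat about restoring the input is easy to get wrong, so here is a small Go sketch of one way to handle it. The function name `sortScenario` is hypothetical, and using `sort.Slice` is simply one choice of standard sort; whichever routine you pick is part of what gets measured. The copy that restores the unsorted data sits outside the timed region.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// sortScenario sorts a fresh copy of input on every iteration and returns
// the per-iteration durations plus the last sorted result.
func sortScenario(input []int32, iters int) ([]time.Duration, []int32) {
	work := make([]int32, len(input))
	times := make([]time.Duration, 0, iters)
	for i := 0; i < iters; i++ {
		copy(work, input) // restore the unsorted input; not timed
		start := time.Now()
		sort.Slice(work, func(a, b int) bool { return work[a] < work[b] })
		times = append(times, time.Since(start))
	}
	return times, work
}

func main() {
	// Small illustrative input; the scenario itself uses 10 million values.
	input := []int32{42, -7, 19, 0, 3, 19, -100, 8}
	times, sorted := sortScenario(input, 5)
	fmt.Println("sorted:", sorted)
	for i, d := range times {
		fmt.Printf("iteration %d: %v\n", i, d)
	}
}
```

Without the `copy`, every iteration after the first would sort already-sorted data, which many sort implementations handle much faster than random input.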

2. `hash_group_count`

Purpose: observe hash tables, string processing, allocations, and GC tendencies

Input: a fixed text dataset
Processing: count occurrences of each word
Output: top N entries plus a checksum

This is close to real-world work, but string library and map implementation differences also matter considerably. In exchange, it is a more realistic comparison.

3. `parallel_sha256`

Purpose: observe parallelism, the scheduler, worker pools, and synchronization habits

Input: a sequence of fixed-size binary chunks
Processing: hash them across N threads and return a final checksum
Conditions: step the thread count through 1 / 2 / 4 / 8 and so on

Compared to a simple tight loop, this makes scaling behavior under parallel execution much easier to see.
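One detail that matters for the correctness check is that the final checksum must not depend on which worker finished first. The Go sketch below shows one possible structure, under assumptions of our own: a shared job channel, one digest slot per chunk, and a final SHA-256 over the digests in chunk order. The name `hashChunks` and the chunk sizes in `main` are illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// hashChunks hashes every chunk with SHA-256 on the given number of
// workers and combines the digests in chunk order, so the final checksum
// does not depend on scheduling or on the thread count.
func hashChunks(chunks [][]byte, threads int) string {
	if threads < 1 {
		threads = 1
	}
	digests := make([][sha256.Size]byte, len(chunks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < threads; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				digests[i] = sha256.Sum256(chunks[i]) // each index written once
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	final := sha256.New()
	for i := range digests {
		final.Write(digests[i][:])
	}
	return hex.EncodeToString(final.Sum(nil))
}

func main() {
	// 64 chunks of 4 KiB with deterministic contents (illustrative sizes).
	chunks := make([][]byte, 64)
	for i := range chunks {
		chunks[i] = make([]byte, 4096)
		for j := range chunks[i] {
			chunks[i][j] = byte(i + j)
		}
	}
	for _, threads := range []int{1, 2, 4, 8} {
		start := time.Now()
		sum := hashChunks(chunks, threads)
		fmt.Printf("threads=%d elapsed=%v checksum=%s\n",
			threads, time.Since(start), sum)
	}
}
```

If the checksum changes with the thread count, the implementations are no longer solving the same problem, and the timing comparison should stop there.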

4. `startup_noop` or `startup_parse_small`

Purpose: observe startup time

noop: start and exit immediately
parse_small: process one small input and exit

Here the JIT and initialization costs of C# / Java become visible, and the picture differs quite a bit from C++ / Go. Conversely, even if a gap shows up here, it is a separate question from who wins on long-running workloads.

What about JSON and HTTP benchmarks?

JSON and HTTP are close to real-world work, so of course they are meaningful. However, in that case it becomes a comparison including libraries, frameworks, and the ecosystem, rather than a language comparison.

That in itself is not bad. In fact, in practice that is often the more important question. But in an article or report, it causes less misunderstanding if you state explicitly:

This is not a comparison of languages but a comparison of typical implementations together with the major libraries.

Conditions to Align per Language

C++

Align on optimized builds
Pin the compiler
Pin the standard library implementation
Document conditions such as -O3 / /O2, LTO, and PGO
Watch out for results being optimized away
Suspect undefined behavior whenever something looks suspiciously fast

C++ has a lot of freedom, which means differences in conditions show up directly. Therefore, which compiler, with which flags, against which STL you measured is quite important.

C#

Align on Release builds
Pin the .NET version
Record conditions such as Server GC / Workstation GC
Document whether Tiered Compilation, ReadyToRun, and Native AOT are in play
Separate cold and warm

For C#, differences in .NET configuration change how things look. In particular, JIT-compiled C# and Native AOT C# are different axes even though both are “C#”. Mix them and what you are comparing is no longer the language but the deployment form.

Java

Pin the JDK vendor and version
Document the GC
Pin warm-up / measurement / fork settings
Record the heap size and JVM options
Separate cold start and steady state

Java benefits readily from the JIT, but in exchange, how it looks on the first run varies considerably. Therefore, separating short-lived process comparisons from long-running comparisons is mandatory.

Go

Pin the Go version
Pin GOMAXPROCS
Document CGO_ENABLED
If you touch GOGC, always record it
Keep benchmark-format output if possible

Go is relatively easy to handle, but in parallel benchmarks the impact of GOMAXPROCS is large. Also, whether or not you use cgo changes the whole picture, so always record that in the conditions.

How to Align the Execution Environment

In any language, a comparison without an aligned environment is mostly a comparison of environments.

Things to align

Same CPU / memory / storage
Same OS version
Same power conditions
Similar room temperature
Same input data
Same process priority
Same core-count conditions
Same container-or-bare-metal conditions

Things that matter especially

Power settings and CPU frequency

On a laptop, AC power versus battery alone puts you in a different world. If the CPU governor or power mode is not aligned, comparison results wobble considerably.

For how to align power conditions, notifications, background noise, heat, and execution order on Windows, see our other article, “How to Compare the Execution Speed of Different Versions of a Program on Windows”, which covers this in detail. If you measure on Windows, this matters a lot.

Heat

If only the first few runs are fast and later runs degrade, suspect heat and throttling. Rather than running all of A and then all of B, alternating like A / B / A / B reduces the bias.

Background activity

Updates, indexing, sync, virus scans, browsers, chat tools. These are unglamorous, but they routinely interfere.

What to Measure

For language comparisons, we recommend looking at at least these four separately.

1. Wall-clock time

The real time the user waits. This is the first metric to look at.

2. CPU time

“How much CPU was actually consumed.” If only the wall-clock time is faster while CPU time stays the same, the difference may come from waiting or I/O.

3. Memory / allocations

peak RSS
total allocation volume
allocation count
GC count
GC pauses

Looking at these reveals the cost behind the speed.

4. Distribution

median
p95 / p99
min / max
standard deviation and spread

If you talk in averages only, you never see the true nature of the runs that occasionally spike.
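To make this concrete, the Go sketch below summarizes ten made-up timings in which a single run stalls. The numbers are illustrative, not measurements, and the nearest-rank percentile definition is one convention among several, chosen here for simplicity.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile of samples for p in
// (0, 100]. It returns NaN for an empty slice.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), samples...) // sort a copy, keep raw order
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(s) {
		rank = len(s)
	}
	return s[rank-1]
}

// mean returns the arithmetic mean of samples.
func mean(samples []float64) float64 {
	var sum float64
	for _, v := range samples {
		sum += v
	}
	return sum / float64(len(samples))
}

func main() {
	// Illustrative timings in ms (not measurements): nine steady runs, one stall.
	runs := []float64{101, 99, 100, 102, 98, 100, 103, 97, 100, 900}
	fmt.Printf("mean=%.1f median=%.1f p95=%.1f\n",
		mean(runs), percentile(runs, 50), percentile(runs, 95))
	// prints "mean=180.0 median=100.0 p95=900.0"
}
```

The mean suggests every run took about 180 ms, which describes none of them. The median shows the typical run, and p95 exposes the stall.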

Recommended Execution Procedure

The flow that works well in practice goes roughly in this order.

1. Decide the workload

First, make explicit what you want to compare.

startup time
steady-state throughput
tail latency
memory efficiency
parallel scaling

2. Fix a common dataset

Align the input data using a fixed seed or a fixed file. If you include data generation in the measurement, it too must run under the same conditions in every language.
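One caution on “fixed seed”: the built-in random number generators of C#, C++, Java, and Go do not produce the same sequence for the same seed, so a seed only fixes the data if the generator itself is fixed too. One way around this is to write the generator by hand in each language. The Go sketch below uses xorshift32 because it is a few lines of integer arithmetic that ports bit for bit; that choice, and the names `xorshift32` and `generate`, are our assumptions. Generating the data once and shipping it as a file works just as well.

```go
package main

import "fmt"

// xorshift32 is a tiny PRNG that is easy to reproduce exactly in any
// language with 32-bit unsigned arithmetic. The state must be non-zero.
type xorshift32 struct{ state uint32 }

// next advances the generator and returns the new state.
func (g *xorshift32) next() uint32 {
	x := g.state
	x ^= x << 13
	x ^= x >> 17
	x ^= x << 5
	g.state = x
	return x
}

// generate returns n int32 values derived from a non-zero seed.
func generate(seed uint32, n int) []int32 {
	g := xorshift32{state: seed}
	out := make([]int32, n)
	for i := range out {
		out[i] = int32(g.next()) // reinterpret the 32 bits as signed
	}
	return out
}

func main() {
	data := generate(1, 5)
	fmt.Println(data)
}
```

Printing the first few values in every language and comparing them by eye is a cheap sanity check before any timing starts.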

3. Pass correctness checks first

Confirm that all implementations return the same result on both small and large data. Having them emit a checksum or hash makes this easy to handle.
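For numeric output, the checksum only works across languages if everyone agrees on the byte encoding. As a sketch, the Go code below hashes int32 values as little-endian bytes with 64-bit FNV-1a; both the hash choice and the name `checksumInt32` are assumptions for illustration, and any hash that all four implementations can reproduce will do.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// checksumInt32 hashes the values in order, each encoded as 4
// little-endian bytes, using 64-bit FNV-1a.
func checksumInt32(values []int32) uint64 {
	h := fnv.New64a()
	var buf [4]byte
	for _, v := range values {
		binary.LittleEndian.PutUint32(buf[:], uint32(v))
		h.Write(buf[:])
	}
	return h.Sum64()
}

func main() {
	sorted := []int32{-100, -7, 0, 3, 8, 19, 19, 42}
	fmt.Printf("checksum=%016x\n", checksumInt32(sorted))
}
```

Because the hash is order-sensitive, it catches both wrong values and results that were returned in the wrong order.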

4. Pin the build conditions

Produce Release / optimized executables in each language, and record versions and flags.

5. Separate cold and warm

This is especially important for C# and Java.

cold: includes the moment right after process startup
warm: the stable state after several runs

It is cleaner to keep these two in separate tables.

6. Alternate or randomize the execution order

Example:

cpp -> csharp -> java -> go
go -> java -> cpp -> csharp
csharp -> go -> java -> cpp
...

This reduces bias from heat and noise.
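A schedule like this can be generated rather than written by hand. The Go sketch below shuffles the order independently for each round; the seed is passed in so the schedule itself can be recorded with the results. The name `buildSchedule` and the seed value are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// buildSchedule returns `rounds` independently shuffled orderings of
// names. The same seed always yields the same schedule.
func buildSchedule(names []string, rounds int, seed int64) [][]string {
	rng := rand.New(rand.NewSource(seed))
	schedule := make([][]string, 0, rounds)
	for r := 0; r < rounds; r++ {
		order := append([]string(nil), names...) // copy; keep names intact
		rng.Shuffle(len(order), func(i, j int) {
			order[i], order[j] = order[j], order[i]
		})
		schedule = append(schedule, order)
	}
	return schedule
}

func main() {
	names := []string{"cpp", "csharp", "java", "go"}
	for i, round := range buildSchedule(names, 3, 42) {
		fmt.Printf("round %d: %v\n", i+1, round)
	}
}
```

Every implementation still runs exactly once per round, so the run counts stay equal while the position within each round varies.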

7. Run enough iterations

For lightweight microbenchmarks, run a great many iterations; for end-to-end runs, aim for at least 10. When the difference is small and the iteration count is low, any interpretation is shaky.

8. Save the raw data

Keep the raw data of every run, not just the aggregates. Looking back later, you can read outliers and warm-up quirks out of it.

9. Profile when a difference appears

Only when a difference appears do you start digging into the cause.

CPU profile
allocation profile
GC logs
flame graphs
OS-level traces

Once you get this far, you can discuss why it happens, not just “fast / slow”.

How to Read the Results

Even after the numbers are in, misreading them is still dangerous.

C# / Java are slow only on the first run

Suspect JIT, class loading, and initialization. In that case,

if startup time matters, it is a meaningful difference
if long-running operation is the subject, it is a difference that belongs in a separate table

C++ is strong in tight loops

Low-level optimization, object layout, and minimal runtime overhead may be paying off. However, looking only at that and concluding “therefore it is the fastest in a real service too” is a leap.

Go looks favorable on startup time and ease of distribution

The single binary, the relatively light startup, and the approachable concurrency model can pay off. However, that does not mean it is favorable for every CPU-bound workload.

C# / Java catch up considerably — or overtake — at steady state

JIT optimization may be kicking in. This is not a rare story either. That is why it is important not to mix startup-inclusive comparisons with steady-state comparisons.

Large differences on allocation-heavy workloads

In this case, more than the language name, what usually matters is

memory layout
how strings and maps are handled
GC behavior
extra copies

Recording Template

Keep at least these fields with every benchmark result and you will thank yourself later.

timestamp,language,scenario,run_kind,cold_or_warm,elapsed_ms,cpu_ms,max_rss_mb,alloc_bytes,gc_count,checksum
compiler_or_runtime,compiler_version,flags,os,cpu,threads,input_id,notes

For example, run_kind can be split like this.

micro
macro
startup
parallel

For cold_or_warm, you definitely want to make explicit which one it is.

cold
warm

With benchmarks, being interpretable later often matters more than the act of measuring.
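As a sketch of how the template might be enforced in code, the Go example below writes records as CSV and rejects rows whose `run_kind` or `cold_or_warm` is not one of the listed values. Treating the two template lines as a single header row, the map-based `Record` type, and the function names are our assumptions; the values in `main` are placeholders, not measurements.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// fields lists the template columns as one header row (joining the two
// template lines into a single row is an assumption of this sketch).
var fields = []string{
	"timestamp", "language", "scenario", "run_kind", "cold_or_warm",
	"elapsed_ms", "cpu_ms", "max_rss_mb", "alloc_bytes", "gc_count", "checksum",
	"compiler_or_runtime", "compiler_version", "flags", "os", "cpu",
	"threads", "input_id", "notes",
}

// Record maps a column name to its value; missing columns stay empty.
type Record map[string]string

// validate rejects records whose run_kind or cold_or_warm is not one of
// the values listed in the template.
func validate(r Record) error {
	switch r["run_kind"] {
	case "micro", "macro", "startup", "parallel":
	default:
		return fmt.Errorf("unknown run_kind %q", r["run_kind"])
	}
	switch r["cold_or_warm"] {
	case "cold", "warm":
	default:
		return fmt.Errorf("unknown cold_or_warm %q", r["cold_or_warm"])
	}
	return nil
}

// toCSV renders the header plus one row per record, in template order.
func toCSV(records []Record) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(fields); err != nil {
		return "", err
	}
	for _, r := range records {
		if err := validate(r); err != nil {
			return "", err
		}
		row := make([]string, len(fields))
		for i, f := range fields {
			row[i] = r[f]
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	// Placeholder values for illustration only, not measurements.
	out, err := toCSV([]Record{{
		"timestamp": "2026-01-01T00:00:00Z", "language": "go",
		"scenario": "sort_int32_10m", "run_kind": "micro",
		"cold_or_warm": "warm", "elapsed_ms": "812",
	}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Rejecting a row with a missing or misspelled `cold_or_warm` at write time is cheaper than discovering months later that nobody knows which kind of run a number came from.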

Summary

What really matters in a C# / C++ / Java / Go speed comparison is taking the crude question of which language is fastest and turning it into the shape of an experiment: which workload, under which conditions, on which metric, are we comparing?

The points you especially should not miss are these.

Separate startup time from steady state
Measure with the same algorithm, the same input, and the same correctness check
Never draw conclusions from a single benchmark
Separate per-language benchmarks from cross-language benchmarks
Look at the median and the distribution rather than the mean
Keep the conditions and the raw data

And the most important thing of all: do not try too hard to decide winners and losers by language name. Real-world performance is determined by the combination of language, runtime, libraries, build conditions, data, OS, and hardware.

“C++ is fast”, “Java is strong”, “Go is lightweight”, “C# is plenty fast too”: in some sense, all of these are true. But once you leave out the conditions under which the claim holds, it mostly turns into a fistfight in the fog.

Align the conditions, use multiple workloads, separate cold / warm, and look all the way down to the distribution. Unglamorous, but in the end this is what wins.

References

BenchmarkDotNet Getting Started https://benchmarkdotnet.org/articles/guides/getting-started.html
OpenJDK JMH Project https://openjdk.org/projects/code-tools/jmh/
JMH GitHub Repository / README https://github.com/openjdk/jmh
Go testing package https://pkg.go.dev/testing
Go benchstat https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
Google Benchmark User Guide https://google.github.io/benchmark/user_guide.html
How to Compare the Execution Speed of Different Versions of a Program on Windows https://comcomponent.com/en/blog/2026/03/16/002-windows-benchmark-comparing-program-versions/



Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
