How Should You Measure and Compare the Speed of Programming Languages? A Practical Guide to Benchmarking C# / C++ / Java / Go on Equal Terms
The single most important point
Do not try to settle “which language is fastest” with a single number. Instead, be explicit about which workload, under which conditions, measured by which metric you are comparing.
Decide these things first
The word “fast” is not enough on its own. Pin down the following.
1. Is it startup time you care about?
For CLI tools and short-lived batch jobs, cold start and process startup matter. Whether you include the cost of JIT initialization changes the result dramatically.
2. Is it long-running throughput?
For servers and resident processes, the steady state after warm-up is the real subject. The slowness of the first run is not the point.
3. Is it tail latency (p95/p99)?
For APIs and UIs, p95/p99 matters more than the average. Even if the mean is fast, occasional long stalls hurt.
4. Is memory efficiency also in scope?
If you only look at CPU time and ignore peak memory usage, allocation count, GC count, and GC pauses, you will misread how heavy the workload really is in production.
Why language-to-language comparison is hard
JIT vs AOT
- C# and Java are normally JIT-compiled (machine code is generated at runtime)
- C++ and Go are normally AOT-compiled (machine code is generated ahead of time)
If you measure the very first run, you are including runtime startup costs. If you only measure after warm-up, steady-state optimizations dominate. Treat cold and warm as two different things.
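The bookkeeping behind "treat cold and warm as two different things" can be sketched as follows. This is a minimal in-process illustration, not a replacement for BenchmarkDotNet or JMH; the helper name `time_calls` and its parameters are hypothetical. For a cross-language cold measurement you would normally start a fresh process per run rather than time the first call inside one process.

```python
import time

def time_calls(fn, warmup, iterations):
    """Return (first_call_seconds, warm_samples_seconds) for a zero-arg callable.

    The first call is always executed and reported on its own, because it
    carries one-time costs. It counts as the first warm-up call.
    """
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    # Remaining warm-up calls run but are not recorded.
    for _ in range(max(warmup - 1, 0)):
        fn()

    # Only these samples belong in the "warm" table.
    warm = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        warm.append(time.perf_counter() - start)
    return first, warm

if __name__ == "__main__":
    first, warm = time_calls(lambda: sorted(range(100_000, 0, -1)), warmup=5, iterations=20)
    print(f"first call: {first * 1000:.3f} ms, warm samples: {len(warm)}")
```

Keeping `first` and `warm` as separate return values makes it harder to average them together by accident.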
Implementation differences often outweigh language differences
Even for the same “sort,” results vary depending on whether you use the standard library or a hand-rolled implementation, and on how many extra copies happen. For JSON or crypto, the library implementation matters more than the language itself.
C++ can have its work optimized away
The compiler may decide “nobody uses this result” and eliminate the computation entirely. Consuming the result and emitting a checksum matters.
How GC (garbage collection) is handled
“GC makes it slow” is too crude. What matters much more is how large numbers of short-lived objects are handled, how the heap size is configured, and how often and for how long the GC pauses.
Things you must not do in a comparison
- Mixing Debug and Release builds - always use a production-equivalent optimized build
- Not solving the same problem - if input format or output differs, you are measuring requirement gaps
- Drawing conclusions from a single run - JIT, CPU boost, heat, and GC all blur together
- Mixing cold and warm runs - keep them in separate tables
- Skipping correctness checks - confirm that every implementation produces the same checksum
- Deciding everything from one microbenchmark - winning a tight loop does not guarantee winning a real service
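For the correctness check, every implementation needs a checksum that is easy to reproduce in all four languages. The article does not prescribe an algorithm; the sketch below assumes 64-bit FNV-1a over the little-endian bytes of the int32 output, chosen only because it is a few lines in any language. The function names are hypothetical.

```python
import struct

FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64  # keep the state at 64 bits
    return h

def checksum_int32(values) -> int:
    """Order-sensitive checksum of a sequence of int32 values."""
    values = list(values)
    # "<i" = little-endian signed 32-bit, so every language hashes the same bytes.
    return fnv1a_64(struct.pack(f"<{len(values)}i", *values))

if __name__ == "__main__":
    print(f"{checksum_int32([1, 2, 3]):016x}")
```

Because the hash depends on element order, a sort that returns the right values in the wrong order produces a different checksum.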
The basic strategy: a two-layer comparison
Layer 1: in-language measurement (use each language’s tools)
| Language | Recommended tool |
|---|---|
| C# | BenchmarkDotNet |
| Java | JMH |
| Go | go test -bench + benchstat |
| C++ | Google Benchmark |
These take care of each language’s runtime quirks and statistical handling. They are useful for comparing within a language and for digging into an implementation.
Layer 2: cross-language comparison (use a shared runner)
Lining up BenchmarkDotNet output side by side with JMH output is risky (the harnesses follow different conventions).
Wrap each implementation in an executable that responds to the same CLI contract and drive them all from outside under identical conditions.
bench --scenario sort_int32 --dataset data/sort_10m.bin --mode warm
bench --scenario group_words --dataset data/words_100mb.txt --mode cold
On the shared runner side:
- Randomize execution order
- Separate cold and warm runs
- Pass the same dataset
- Verify the checksum
- Capture wall-clock time and memory
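A minimal sketch of such a runner, under stated assumptions: each executable prints only its checksum on stdout (a hypothetical contract, not something the article specifies), and timing is taken from outside the process, so it includes process startup. Memory capture is left out here because it is platform-specific. The names `run_once` and `run_all` are illustrative.

```python
import random
import subprocess
import sys
import time

def run_once(command):
    """Run one benchmark process; return (elapsed_seconds, stdout_text)."""
    start = time.perf_counter()
    proc = subprocess.run(command, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, proc.stdout.strip()

def run_all(commands, repeats, seed=None):
    """commands: dict of label -> argv list. Returns dict of label -> [seconds]."""
    rng = random.Random(seed)
    jobs = [label for label in commands for _ in range(repeats)]
    rng.shuffle(jobs)  # randomize execution order across implementations

    results = {label: [] for label in commands}
    checksums = set()
    for label in jobs:
        elapsed, out = run_once(commands[label])
        results[label].append(elapsed)
        checksums.add(out)

    # Every run of every implementation must report the same checksum.
    if len(checksums) != 1:
        raise RuntimeError(f"checksum mismatch: {sorted(checksums)}")
    return results

if __name__ == "__main__":
    # Stand-in commands; real ones would be the per-language bench executables.
    fake = [sys.executable, "-c", "print('c0ffee')"]
    print(run_all({"impl_a": fake, "impl_b": fake}, repeats=3, seed=1))
```

Since the timer wraps the whole process, these numbers are cold-style measurements. A warm mode would need each executable to time its own inner loop and report it.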
Recommended benchmark scenarios (3-4 of them)
1. sort_int32_10m
Goal: CPU plus memory bandwidth. Sort 10 million int32 values generated with a fixed seed.
2. hash_group_count
Goal: hash table behavior, string handling, allocation, and GC tendencies. Count word occurrences in a text file.
3. parallel_sha256
Goal: parallelism and scheduler behavior. Hash with N threads, stepping through 1 / 2 / 4 / 8 threads.
4. startup_noop or startup_parse_small
Goal: startup time. Two variants: noop (start and immediately exit), and a single small input parsed before exit.
JSON or HTTP benchmarks are closer to real work, but they end up comparing libraries and frameworks alongside the language itself. Make that explicit when you publish them.
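For the sort scenario, "generated with a fixed seed" is safest when the generator itself is spelled out, since each language's built-in random number generator produces a different sequence. The sketch below assumes a 32-bit xorshift generator (shifts 13, 17, 5) and a raw little-endian file layout; both are illustrative choices, as are the function names.

```python
import struct

def xorshift32_stream(seed, count):
    """Yield `count` 32-bit values from a xorshift32 generator."""
    x = seed & 0xFFFFFFFF
    if x == 0:
        raise ValueError("seed must be non-zero for xorshift32")
    for _ in range(count):
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        yield x

def dataset_chunks(seed, count, chunk=65536):
    """Yield the dataset as little-endian 32-bit words, `chunk` values at a time."""
    block = []
    for value in xorshift32_stream(seed, count):
        block.append(value)
        if len(block) == chunk:
            yield struct.pack(f"<{len(block)}I", *block)
            block = []
    if block:
        yield struct.pack(f"<{len(block)}I", *block)

def write_dataset(path, seed, count):
    """Write the dataset to `path`; readers interpret each word as an int32."""
    with open(path, "wb") as f:
        for data in dataset_chunks(seed, count):
            f.write(data)

if __name__ == "__main__":
    print(next(xorshift32_stream(1, 1)))
```

Writing the file once and passing the same file to every implementation also keeps data generation out of the measured time.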
Conditions to align per language
C++
- Optimized build (-O3 or /O2, LTO, PGO)
- Pin the compiler and STL implementation
- Watch out for the optimizer eliminating your work
- Suspect undefined behavior if a result looks too fast
C#
- Release build
- Pin the .NET version
- Record Server GC vs Workstation GC
- Note Tiered Compilation, ReadyToRun, and whether Native AOT is used
- JIT C# and Native AOT C# belong on different axes
Java
- Pin the JDK vendor and version
- State the GC explicitly (G1, ZGC, etc.)
- Record heap size and JVM options
- Fix the warm-up / measurement / fork counts
Go
- Pin the Go version
- Pin GOMAXPROCS
- If you tune GOGC, record it
- State whether cgo is used
How to align the execution environment
- Same CPU / memory / storage
- Same OS version
- Same power settings (a laptop on AC vs on battery is a different world)
- Comparable room temperature
- Suppress background activity (updates, antivirus scans)
- Interleave runs A/B/A/B (this reduces heat and throttling bias)
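The A/B/A/B interleaving in the last item can be generated rather than typed by hand. This is a small sketch; the function name and the optional `rotate` flag (which shifts the starting implementation each round so that no one always runs first) are additions for illustration, not part of the article's recipe.

```python
def interleaved_schedule(labels, rounds, rotate=False):
    """Return a run order that cycles through all labels once per round."""
    labels = list(labels)
    order = []
    for r in range(rounds):
        shift = r % len(labels) if rotate else 0
        order.extend(labels[shift:] + labels[:shift])
    return order

if __name__ == "__main__":
    print(interleaved_schedule(["A", "B"], rounds=2))
    # prints ['A', 'B', 'A', 'B']
```

Compared with running all of A and then all of B, this spreads thermal drift and background noise across both implementations.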
Metrics worth measuring
- Wall-clock time - the real time the user waits. Look at this first
- CPU time - the time the CPU was actually busy
- Memory / allocation - peak RSS, total allocated bytes, GC count, GC pause
- Distribution - median, p95/p99, min/max, standard deviation
If you only talk about averages, you cannot see what is causing the occasional spike.
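A sketch of a summary that reports the distribution alongside the mean. The percentile here uses the nearest-rank definition with integer arithmetic; other definitions (interpolating ones, for example) give slightly different values on small samples, so record which one you use. The function name `summarize` is hypothetical.

```python
import statistics

def summarize(samples_ms):
    """Summarize elapsed-time samples; percentiles use the nearest-rank method."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("no samples")

    def percentile(p):
        # p is an integer in 1..100; rank = ceil(p * n / 100), at least 1.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p95": percentile(95),
        "p99": percentile(99),
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }

if __name__ == "__main__":
    # 98 fast runs and 2 stalls: the mean moves, the median does not.
    print(summarize([10.0] * 98 + [500.0, 900.0]))
```

With few samples, p99 is effectively the maximum, so tail percentiles are only meaningful when the run count is large enough to support them.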
How to read the results
- C# / Java are slow only on the first run - that is the JIT effect. Whether it matters depends on whether startup time or steady-state throughput is the focus
- C++ wins tight loops - that is low-level optimization paying off. It does not guarantee that the whole service is faster
- C# / Java catch up or pull ahead in steady state - JIT optimization paying off. Not unusual
- Differences widen on allocation-heavy workloads - more often than not, memory layout and GC behavior are doing the work, not the language name
A logging template
Keep at least the following per row:
timestamp, language, scenario, run_kind, cold_or_warm, elapsed_ms,
cpu_ms, max_rss_mb, alloc_bytes, gc_count, checksum,
compiler_or_runtime, compiler_version, flags, os, cpu, threads, input_id, notes
Being able to interpret a benchmark later matters more than running it once.
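One way to keep those columns consistent is to fix them in code and let the CSV writer reject anything else. A minimal sketch using the standard `csv` module; the column list is the one from the template above, while `append_row` and the sample values are hypothetical.

```python
import csv
import io

FIELDS = [
    "timestamp", "language", "scenario", "run_kind", "cold_or_warm", "elapsed_ms",
    "cpu_ms", "max_rss_mb", "alloc_bytes", "gc_count", "checksum",
    "compiler_or_runtime", "compiler_version", "flags", "os", "cpu", "threads",
    "input_id", "notes",
]

def append_row(stream, row, write_header=False):
    """Write one result row; missing columns stay empty, unknown keys raise ValueError."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    if write_header:
        writer.writeheader()
    writer.writerow(row)

if __name__ == "__main__":
    buf = io.StringIO()
    # Illustrative values only, not measured results.
    append_row(buf, {"language": "go", "scenario": "sort_int32",
                     "cold_or_warm": "warm", "elapsed_ms": 0}, write_header=True)
    print(buf.getvalue())
```

Rejecting unknown keys catches typos such as `elapsed_msec` at write time instead of during analysis.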
Summary
- Separate startup time from steady state
- Measure with the same algorithm, the same input, and the same correctness check
- Do not draw conclusions from a single benchmark
- Keep in-language and cross-language benchmarks separate
- Look at medians and distributions, not just averages
- Preserve conditions and raw data
- Do not lean too hard on the language name to decide a winner
Real-world performance is decided by the combination of “language + runtime + library + build settings + data + OS + hardware.”