Target: .NET 10 (LTS) · C# 14 · Last updated March 2026
This guide is a curated set of 100 performance rules, tips, and idioms every developer on this project must know. Each entry is concise and actionable. Entries marked 🆕 leverage features new in .NET 10 / C# 14.
- Async & Task Parallelism
- Thread & Concurrency Primitives
- LINQ Optimization
- Loops & Iteration
- Collections & Data Structures
- Memory, Span & Allocation
- String Processing
- Caching & Lazy Loading
- Serialization & I/O
- JIT, Runtime & GC
- EF Core & Data Access
- ASP.NET Core & HTTP
- General C# Idioms
- Benchmarking & Profiling
Never call .Result, .Wait(), or .GetAwaiter().GetResult() on hot paths. These block the calling thread and can cause thread-pool starvation and deadlocks (especially under SynchronizationContext).
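A minimal sketch of the fix - the blocking call is replaced by awaiting all the way up the chain (`GetUserAsync` and `User` are hypothetical names):

```csharp
// Bad: blocks a thread-pool thread and can deadlock under a SynchronizationContext.
User user = GetUserAsync(id).Result;

// Good: async all the way - the thread is released while the I/O is in flight.
public async Task<User> LoadUserAsync(int id)
{
    User user = await GetUserAsync(id);
    return user;
}
```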
ValueTask avoids the Task heap allocation when the result is already available. Ideal for cache hits, buffered reads, or fast-path returns.
```csharp
public ValueTask<int> GetCountAsync()
{
    if (_cache.TryGet("count", out int val))
        return ValueTask.FromResult(val);

    return new ValueTask<int>(FetchCountFromDbAsync());
}
```
⚠️ Never await a ValueTask more than once, and never use .Result on one that has not yet completed.
In library/service code (anything that doesn't touch UI), always append .ConfigureAwait(false) to avoid capturing SynchronizationContext. This prevents unnecessary thread marshalling.
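A sketch of the idiom in library code (`ReadPayloadAsync` is a hypothetical helper):

```csharp
public async Task<byte[]> ReadPayloadAsync(Stream stream)
{
    var buffer = new byte[1024];
    // ConfigureAwait(false): resume on any thread-pool thread instead of
    // marshalling back to the captured SynchronizationContext.
    int read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
    return buffer[..read];
}
```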
When you need results from N independent I/O calls, fire them all then await the batch:
```csharp
var (users, orders) = (GetUsersAsync(), GetOrdersAsync());
await Task.WhenAll(users, orders);
```

.NET 6+ provides Parallel.ForEachAsync, which respects MaxDegreeOfParallelism and async delegates natively. Prefer it over manually spawning tasks in a loop.

```csharp
await Parallel.ForEachAsync(urls, new ParallelOptions { MaxDegreeOfParallelism = 8 },
    async (url, ct) => await ProcessAsync(url, ct));
```

Wrapping synchronous code in Task.Run inside a library only hides the blocking. Push async down to the real I/O boundary. Task.Run is appropriate only at the application boundary (e.g., offloading CPU work from a UI thread).
Pass CancellationToken through every async call chain. Cancelled tasks free resources early and prevent wasted compute. Check token.ThrowIfCancellationRequested() in CPU loops.
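A sketch of cooperative cancellation in a CPU-bound loop (`Process` is a hypothetical worker):

```csharp
public async Task CrunchAsync(int[] data, CancellationToken token)
{
    foreach (var chunk in data.Chunk(1000))
    {
        // CPU-bound loops never observe the token automatically - check it yourself.
        token.ThrowIfCancellationRequested();
        Process(chunk);     // hypothetical CPU-bound work
        await Task.Yield(); // yield between chunks to stay responsive
    }
}
```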
Use SemaphoreSlim to limit concurrent access to scarce resources (DB connections, third-party APIs). Create once, reuse always.
```csharp
private static readonly SemaphoreSlim _gate = new(10, 10);

await _gate.WaitAsync(cancellationToken);
try { /* work */ }
finally { _gate.Release(); }
```

System.Threading.Channels are allocation-lean, backpressure-aware, and fully async. Use Channel.CreateBounded<T> for natural flow control.
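A minimal bounded-channel producer/consumer sketch; the bound provides natural backpressure when the consumer falls behind (`jobs` and `Handle` are hypothetical):

```csharp
var channel = Channel.CreateBounded<string>(100);

// Producer: WriteAsync suspends when the channel is full (backpressure).
_ = Task.Run(async () =>
{
    foreach (var job in jobs)
        await channel.Writer.WriteAsync(job);
    channel.Writer.Complete(); // signal no more items
});

// Consumer: drains until the writer completes.
await foreach (var job in channel.Reader.ReadAllAsync())
    Handle(job);
```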
async void swallows exceptions and cannot be awaited. The only acceptable use is event handlers in UI frameworks. Everywhere else: async Task.
Use Task.Run, ThreadPool.QueueUserWorkItem, or Parallel.*. The thread pool manages sizing and reuse. Raw threads bypass pool economics.
.NET 9+ introduced System.Threading.Lock - a lightweight, purpose-built lock type. Using lock (myLockObj) with a Lock instance emits optimized code paths.
```csharp
private readonly Lock _lock = new();

lock (_lock)
{
    // critical section
}
```

If reads vastly outnumber writes, ReaderWriterLockSlim allows concurrent readers and exclusive writers, beating plain lock on throughput.
Interlocked.Increment, .CompareExchange, and .Exchange are lock-free and dramatically cheaper than taking a lock for simple counters and flags.
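For example, a shared counter and a set-once flag with no lock object at all:

```csharp
private static long _requests;
private static int _initialized;

// Lock-free counter increment - safe across threads.
Interlocked.Increment(ref _requests);

// Lock-free "set once": only the first caller sees 'first == true'.
bool first = Interlocked.CompareExchange(ref _initialized, 1, 0) == 0;
```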
Closures capture local variables into a compiler-generated class on the heap. Pass state via static lambdas + explicit state parameter where possible.
```csharp
ThreadPool.QueueUserWorkItem(static state =>
{
    var ctx = (MyContext)state!;
    ctx.Process();
}, context);
```

LINQ allocates iterators, delegates, and closures. On hot paths (millions of iterations, tight game loops, packet parsing), use hand-written for/foreach with spans.
Don't re-evaluate .Where(...).Select(...) chains multiple times. Call .ToList() or .ToArray() once and reuse.
Before calling .Count() (which may enumerate the full sequence), use TryGetNonEnumeratedCount to check if the collection already knows its length.
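A sketch, assuming `source` is some `IEnumerable<int>`:

```csharp
// Succeeds without enumerating when the source is an array, List<T>, ICollection<T>, etc.
int count = source.TryGetNonEnumeratedCount(out var known)
    ? known
    : source.Count(); // falls back to a full enumeration only when necessary
```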
.Any() short-circuits after the first element. .Count() may enumerate the entire sequence.
.NET 10 includes System.Linq.AsyncEnumerable in the core libraries. Use it for composable async pipelines without third-party packages.
If the consumer only iterates, return IEnumerable<T> (deferred) or pass the query directly. Materializing into a List<T> allocates an array that may not be needed.
Order() and OrderDescending() (.NET 7+) avoid the overhead of a key-selector delegate when sorting by the element itself.
When combining the two, filter first (.Where before .Select) to reduce the number of projections executed.
Enumerable.Chunk(n) (introduced .NET 6) splits a sequence into arrays of size n with zero custom code.
The JIT elides bounds checks when it can prove i < array.Length. foreach on List<T> incurs an enumerator struct copy.
.NET 10's JIT can devirtualize and inline array interface methods - IEnumerable<T> iteration on arrays is now close to raw for loop performance. Still, if you control the code, prefer Span<T> or for.
foreach on a span compiles to efficient pointer-arithmetic code with no heap allocation.
```csharp
ReadOnlySpan<int> data = collection.AsSpan();
foreach (var item in data) { /* zero alloc */ }
```

Move allocations, lookups, and computations that don't change per-iteration before the loop.
Calling .FirstOrDefault(), .Any(), or .Where() inside a for/foreach creates hidden O(N×M) complexity and per-iteration allocations.
When iterating over a large struct[], use ref to avoid copying:
```csharp
for (int i = 0; i < items.Length; i++)
{
    ref var item = ref items[i];
    item.Value += 1; // mutate in place, no copy
}
```

CollectionsMarshal.AsSpan gives you a Span<T> over the list's internal buffer - no copy, no enumerator, fully bounds-checked.

```csharp
var span = CollectionsMarshal.AsSpan(myList);
for (int i = 0; i < span.Length; i++) { /* fast */ }
```

Manual unrolling rarely beats the JIT's own loop optimizations in .NET 10. Profile first.
new List<T>(capacity), new Dictionary<K,V>(capacity), new HashSet<T>(capacity) - prevents repeated resize/re-hash.
System.Collections.Frozen collections (.NET 8+) optimize for read-heavy workloads after construction. Lookups are significantly faster than Dictionary once the collection is built, at the cost of slower one-time construction.
```csharp
FrozenDictionary<string, Config> config = data.ToFrozenDictionary(x => x.Key, x => x.Value);
```

The overhead of ConcurrentDictionary's striped locking isn't free. If access is single-threaded or guarded externally, use a plain Dictionary.
ContainsKey + [key] performs two lookups. TryGetValue performs one.
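For example (`prices` / `sku` are hypothetical):

```csharp
// One hash lookup instead of two (ContainsKey followed by the indexer):
if (prices.TryGetValue(sku, out decimal price))
    total += price;
```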
```csharp
int[] nums = [1, 2, 3, 4, 5]; // compiler picks optimal backing
```

The compiler may use ReadOnlySpan<T> or inline arrays internally.
PriorityQueue<TElement, TPriority>, introduced in .NET 6, is a proper min-heap with O(log n) enqueue/dequeue.
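A quick sketch - elements come out in ascending priority order:

```csharp
var queue = new PriorityQueue<string, int>();
queue.Enqueue("low", 5);
queue.Enqueue("urgent", 1);
queue.Enqueue("normal", 3);

// Dequeues smallest priority first: "urgent", then "normal", then "low".
string next = queue.Dequeue();
```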
Avoid repeated allocation of large arrays. Pool them.
```csharp
var buffer = ArrayPool<byte>.Shared.Rent(8192);
try { /* use buffer */ }
finally { ArrayPool<byte>.Shared.Return(buffer); }
```

ImmutableArray<T> has lower overhead than ImmutableList<T> (no tree, just an array wrapper).
Stack<T> and Queue<T> have purpose-optimized internal storage and avoid the temptation of random access.
Never store value types in ArrayList, List<object>, or non-generic collections. Each element boxes to the heap. Use generic List<T> or typed arrays.
C# 14 adds implicit conversions between T[], Span<T>, and ReadOnlySpan<T>. You no longer need .AsSpan() in many cases - just pass the array where a span is expected.
```csharp
void Process(ReadOnlySpan<byte> data) { }

byte[] buffer = GetData();
Process(buffer); // implicit conversion in C# 14
```

C# 13+ supports params ReadOnlySpan<T>. Callers pass inline args with no hidden array allocation:

```csharp
void Log(params ReadOnlySpan<string> messages) { }

Log("start", "processing", "done"); // no array allocated
```

For buffers under ~512 bytes whose lifetime doesn't escape the method, stackalloc avoids heap allocation entirely:

```csharp
Span<byte> buf = stackalloc byte[256];
```

.NET 10's escape analysis can stack-allocate small arrays that don't escape the method. Write idiomatic code and let the JIT decide:

```csharp
int[] temp = [1, 2, 3]; // may be stack-allocated in .NET 10
var sum = temp.Sum();
```

Span<T> is stack-only. When you need to pass a slice across async boundaries, use Memory<T> / ReadOnlyMemory<T>.
Every heap allocation eventually costs GC time. Use BenchmarkDotNet's [MemoryDiagnoser] to track Allocated bytes.
For expensive-to-create objects (regex engines, StringBuilder, custom contexts) that are needed repeatedly, pool them.
MemoryMarshal.Cast<TFrom, TTo>(span) performs zero-copy reinterpret casts between blittable types. Powerful but unsafe - use only when profiling proves necessity.
String concatenation in a loop allocates a new string per iteration. StringBuilder amortizes.
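A sketch of the contrast (`fields` is a hypothetical `IEnumerable<string>`):

```csharp
// O(n^2): each += copies the entire accumulated string.
// string csv = ""; foreach (var f in fields) csv += f + ",";

// Amortized O(n): one growing buffer, one final string.
var sb = new StringBuilder();
foreach (var f in fields)
    sb.Append(f).Append(',');
string csv = sb.ToString();
```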
Modern string interpolation compiles to stack-based Span<char> builders when assigned to string. For custom targets, accept ref DefaultInterpolatedStringHandler.
Allocates once, writes directly into the buffer:
```csharp
string result = string.Create(10, seed, (span, s) =>
{
    // fill span directly
});
```

Substring allocates a new string. Slicing a ReadOnlySpan<char> is allocation-free:

```csharp
ReadOnlySpan<char> name = fullName.AsSpan()[..5];
```

Ordinal comparisons are ~5× faster than culture-aware ones. Always specify the comparison type explicitly.
SearchValues pre-computes a vectorized lookup table. Reuse the instance across calls.
```csharp
private static readonly SearchValues<char> Vowels = SearchValues.Create("aeiouAEIOU");

bool hasVowel = input.AsSpan().ContainsAny(Vowels);
```

Pre-parse the format string once, reuse many times:

```csharp
private static readonly CompositeFormat Fmt = CompositeFormat.Parse("Hello, {0}! You have {1} items.");

string msg = string.Format(null, Fmt, name, count);
```

Encode directly into a buffer instead of allocating a byte[].
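A sketch using the span overload of Encoding.UTF8.GetBytes with a stackalloc destination (`Send` is a hypothetical consumer):

```csharp
ReadOnlySpan<char> text = "hello";
Span<byte> dest = stackalloc byte[Encoding.UTF8.GetMaxByteCount(text.Length)];
int written = Encoding.UTF8.GetBytes(text, dest); // no byte[] allocation
Send(dest[..written]);
```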
```csharp
[GeneratedRegex(@"\d{4}-\d{2}-\d{2}")]
private static partial Regex DatePattern();
```

Source-generated regex is compiled at build time - no runtime compilation cost, better throughput, and AOT-compatible (RegexOptions.Compiled is redundant here; the generator already emits compiled code).
For 2–4 parts, string.Concat overloads are faster than interpolation or + because they calculate exact length upfront.
```csharp
private readonly Lazy<ExpensiveResource> _resource = new(() => new ExpensiveResource());
```

LazyInitializer.EnsureInitialized, unlike Lazy<T>, doesn't require a wrapper object.
Always set SizeLimit on MemoryCache and assign Size to entries. Unbounded caches cause memory leaks.
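For example, with Microsoft.Extensions.Caching.Memory (`user` and the size units are hypothetical - Size is in whatever units you choose, as long as they're consistent):

```csharp
var cache = new MemoryCache(new MemoryCacheOptions { SizeLimit = 1024 });

cache.Set("user:42", user, new MemoryCacheEntryOptions
{
    Size = 1, // every entry must declare a size once SizeLimit is set
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
});
```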
```csharp
var value = await cache.GetOrCreateAsync("key", async entry =>
{
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
    return await FetchDataAsync();
});
```

ConditionalWeakTable<TKey, TValue> entries are collected when the key is collected.
HybridCache (Microsoft.Extensions.Caching.Hybrid) coalesces concurrent requests for the same key, preventing cache stampedes. It supports L1 (in-memory) + L2 (distributed) out of the box.
```csharp
[JsonSerializable(typeof(MyDto))]
internal partial class AppJsonContext : JsonSerializerContext { }
```

Source-generated serialization avoids runtime reflection, is NativeAOT-compatible, and is up to 40% faster.
.NET 10 improves JsonSerializer.DeserializeAsyncEnumerable<T> performance. For large payloads, stream instead of buffering:
```csharp
await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<Record>(stream))
{
    Process(item);
}
```

Returning IAsyncEnumerable<T> from a controller streams results to the client without buffering the full collection in memory.
System.IO.Pipelines provides backpressure, pooled buffers, and zero-copy parsing. Use for protocol parsers, file processing, and socket servers.
BinaryFormatter is removed in .NET 10. Use System.Text.Json, MessagePack, Protobuf, or MemoryPack.
```csharp
await using var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
    FileShare.Read, bufferSize: 4096, FileOptions.Asynchronous | FileOptions.SequentialScan);
```

RecyclableMemoryStream (Microsoft.IO) avoids LOH allocations from large MemoryStream buffers.
The .NET 10 JIT delivers 15–30% raw perf gains on modern hardware through AVX-512/AVX10.2, ARM SVE, improved loop inversion, and better inlining. Upgrade to .NET 10 and recompile - no code changes needed.
.NET 10 can stack-allocate delegates whose target doesn't escape, eliminating GC pressure from lambdas in tight loops. Write idiomatic code; the JIT optimizes.
The GC automatically tunes heap sizing based on workload and container memory limits. For most apps, do not manually tune GC settings unless profiling proves otherwise.
```xml
<PropertyGroup>
  <ServerGarbageCollection>true</ServerGarbageCollection>
</PropertyGroup>
```

Prevent the GC from consuming the full container memory. Set via runtimeconfig.json:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.HeapHardLimit": 209715200
    }
  }
}
```

Calling GC.Collect() in production disrupts the GC's adaptive heuristics. Acceptable only in known memory-spike scenarios with careful measurement.
Ahead-of-time compilation eliminates JIT warmup and reduces working set. .NET 10 further reduces NativeAOT binary sizes and improves compile-time optimization.
Skips zero-initialization of locals when you know you'll write before reading:
```csharp
[SkipLocalsInit]
static void ProcessBuffer(Span<byte> data) { /* ... */ }
```

sealed lets the JIT devirtualize method calls and then inline them. Mark every class sealed unless it's explicitly designed for inheritance.
Disabling change tracking eliminates the overhead of identity resolution and snapshot creation.
```csharp
var products = await db.Products.AsNoTracking().ToListAsync();
```

AsSplitQuery prevents cartesian explosion from multiple Include joins:

```csharp
var orders = await db.Orders
    .Include(o => o.Items)
    .AsSplitQuery()
    .ToListAsync();
```
Avoid loading entities just to update/delete them:
```csharp
await db.Products
    .Where(p => p.IsDiscontinued)
    .ExecuteDeleteAsync();
```

Compiled queries pay the query-translation cost once and reuse it on every execution:

```csharp
private static readonly Func<AppDbContext, int, Task<Product?>> GetById =
    EF.CompileAsyncQuery((AppDbContext db, int id) =>
        db.Products.FirstOrDefault(p => p.Id == id));
```

Named filters (HasQueryFilter("name", ...)) let you selectively ignore specific filters per query, avoiding the global filter all-or-nothing problem.
Never fetch full entities when you need three columns. Projection reduces I/O, memory, and deserialization cost.
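A sketch of a projection, assuming `db` is your DbContext and `IsActive` is a hypothetical column - EF translates this to a SELECT of just the two columns:

```csharp
var rows = await db.Products
    .Where(p => p.IsActive)
    .Select(p => new { p.Name, p.Price }) // only these columns cross the wire
    .ToListAsync();
```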
Open connections late, close early. Let the pool manage reuse. Configure MaxPoolSize to match your concurrency model.
Profile with EF Core logging or interceptors. N+1 is the single most common data-access performance killer.
```csharp
builder.Services.AddResponseCompression(opts =>
    opts.MimeTypes = ResponseCompressionDefaults.MimeTypes.Concat(["application/json"]));
```

```csharp
app.MapGet("/products", GetProducts).CacheOutput(p => p.Expire(TimeSpan.FromMinutes(5)));
```

Use IHttpClientFactory - direct instantiation of HttpClient causes socket exhaustion. The factory manages handler lifetimes and DNS rotation.
```csharp
builder.Services.AddRateLimiter(opts =>
    opts.AddFixedWindowLimiter("api", o => { o.Window = TimeSpan.FromSeconds(10); o.PermitLimit = 100; }));
```

Explicit IResult return avoids runtime content negotiation overhead.

```csharp
app.MapGet("/health", static () => Results.Ok("healthy"));
```

readonly struct prevents defensive copies when accessed through in parameters or readonly fields.

```csharp
public readonly record struct Point(double X, double Y);
```

record struct auto-generates efficient Equals and GetHashCode without boxing.
The C# 14 field keyword adds validation to an auto-property without a manual backing field:

```csharp
public string Name
{
    get;
    set => field = value ?? throw new ArgumentNullException(nameof(value));
}
```

```csharp
[InlineArray(8)]
public struct EightInts
{
    private int _element0;
}
```

InlineArray gives you a fixed-size, stack-allocated, span-compatible buffer.
You cannot optimize what you do not measure.
| Tool | Purpose |
|---|---|
| BenchmarkDotNet | Micro-benchmarks with statistical rigor. Use [MemoryDiagnoser]. |
| dotnet-counters | Real-time runtime metrics (GC, thread pool, exception rate). |
| dotnet-trace | Collect EventPipe traces, open in PerfView / Speedscope. |
| dotnet-dump | Capture and analyze heap dumps for memory leaks. |
| Visual Studio 2026 Profiler Agents 🆕 | AI-powered profiler that generates performance recommendations. |
| JetBrains dotMemory / dotTrace | Commercial but powerful memory and CPU profiling. |
1. Identify - use `dotnet-counters` / APM to find the hot service or endpoint.
2. Trace - collect a `dotnet-trace` or VS profiler session under realistic load.
3. Micro-bench - isolate the hot method and benchmark candidate fixes with BenchmarkDotNet.
4. Ship & verify - deploy the fix and confirm improvement in production metrics.
| Anti-Pattern | Fix |
|---|---|
| `.Result` / `.Wait()` | `await` with async all the way |
| `new HttpClient()` in a loop | `IHttpClientFactory` |
| `string +=` in a loop | `StringBuilder` or `string.Create` |
| `list.Count() > 0` (LINQ) | `list.Count > 0` (property) or `.Any()` |
| `Dictionary` for static config | `FrozenDictionary` |
| Large `byte[]` allocations | `ArrayPool<byte>.Shared` |
| `new List<T>()` with known size | `new List<T>(capacity)` |
| Regex in a loop | `[GeneratedRegex]` static field |
| EF: loading full entity for 2 fields | `.Select(x => new { x.A, x.B })` |
| Unsealed classes everywhere | `sealed class` by default |
- Performance Improvements in .NET 10 - Stephen Toub
- What's new in C# 14 - Microsoft Learn
- What's new in .NET 10 Runtime - Microsoft Learn
- Writing High-Performance C# - Microsoft Docs
- BenchmarkDotNet Documentation
Maintained by the engineering team. PRs welcome - keep entries concise and evidence-backed.