Production Incident

Three Weeks in the Trenches: Hunting a 4GB Native Memory Leak That .NET Couldn't See

Our pods were dying every few hours during business hours. GC.Collect() did nothing. The managed heap was clean. The leak was real, invisible, and buried four layers below our application code.

20 min read · ASP.NET Core · SignalR · Linux · Kubernetes
TL;DR

Our ASP.NET Core pods on Kubernetes were OOM-killed every few hours during a pre-launch pilot. The managed heap was clean—95% of the memory was native, invisible to every .NET diagnostic tool. The cause: unstable mobile connections created thousands of zombie SignalR connections, each carrying per-connection native overhead from the runtime—connection state, native interop structures, and the many small malloc() calls the runtime makes per connection—whose allocations landed below glibc's 128KB mmap threshold, fragmenting the allocator's internal heap across dozens of thread arenas. Freed memory was never returned to the OS. Kernel TCP send buffers for dead sockets added another invisible layer. The fix was four changes at four layers: aggressive SignalR timeouts (5 min → 10 sec), ALB idle timeout reduction, stream deduplication, and replacing glibc's allocator with jemalloc via LD_PRELOAD. Pods stabilised under 1GB. Zero OOM-kills since.

Day Zero

It started with an alert mid-morning. One of our production pods had been OOM-killed. Nothing unusual for a system under load—except it kept happening. During business hours—roughly 9 AM to 9 PM, when our users were active—pods would balloon to 4GB and die. Kubernetes would restart them, our mobile clients would auto-reconnect in three seconds, and the cycle would begin again.

I'm the founder and CTO of BasicHomeLoan. The system in question handles real-time lead management—every customer interaction, call log, WhatsApp message, and status update on a home loan application is streamed live to the assigned fulfilment team member and their management hierarchy via SignalR. The goal is that when a customer calls, everyone in the fulfilment team who needs to know already knows. Real-time visibility across the organisation drives faster response times and higher customer satisfaction. This was a pre-launch pilot. Around 200 connected devices, our fulfilment team testing before a full rollout. Because it was a controlled pilot, we were able to trace client IPs and network calls back to specific devices. That's how we discovered the irony: roughly half the unstable connections weren't coming from some remote rural location—they were coming from our own office basement and ground floor, where cellular reception was terrible. Once we identified the pattern, we positioned testers and developers physically in the basement with their phones to deliberately replicate the connection drops while we watched the server-side metrics in real time.

The architecture: when a home loan lead changes—a new WhatsApp message, a status update, a call log entry—we read the full lead entity and its related tables, then push it to every client that needs to see it. Because we run multiple Kubernetes pods, SignalR uses a Redis backplane—every broadcast goes through Redis so that all pods can deliver messages to their connected clients, regardless of which pod the originating request hit. More on why that matters later.

The pod memory told a simple story: something was eating 4GB during working hours. Finding what that something was would take three weeks—and we couldn't launch until it was fixed.

Memory at crash: 4 GB · Time to OOM-kill: 6 hrs · Connected devices: ~200 · On unstable networks: 50%
· · ·

Week 1: The Obvious Suspect

The first instinct of any .NET developer facing a memory problem is to look at the managed heap. I took a full memory dump from a pod sitting at 3.7GB and opened it with dotnet-dump.

# The managed heap breakdown
eeheap -gc

# Result:
GC Heap Size:    ~207 MB

207MB. Out of 3.7GB. The managed heap was using five percent of the total memory.

I ran dumpheap -stat anyway and found a real issue: our EF Core second-level cache interceptor was configured to cache all queries by default, including the streaming queries that fire every two seconds. 172,000 EFTableRow objects. 31 million strings. I added .NotCached() to every query in our stream notifier, redeployed, and watched the metrics.
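
The change itself was a single extension call per query. A minimal sketch, assuming the second-level cache library's NotCached() opt-out that we use; the entity names and query shape here are illustrative:

// Streaming queries fire every two seconds; caching their results only piles
// rows into the interceptor's cache. NotCached() opts this query out.
var lead = await dbContext.Leads
    .AsNoTracking()
    .Include(l => l.CallLogs)
    .Include(l => l.WhatsAppMessages)
    .Where(l => l.AppGuid == appGuid)
    .NotCached()                      // skip the second-level cache for this query
    .FirstOrDefaultAsync(cancellationToken);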

Managed heap dropped. RSS barely moved.

This became the pattern that made the investigation so gruelling. We'd work through the night deploying fixes and running load tests—sleepless nights, one after another. Every time, the overnight tests passed. Memory stayed flat, pods hummed along. We'd finally go to bed thinking we'd cracked it. Then 9 AM would hit, 200 real users from our fulfilment team would come online with their phones on spotty cellular connections, and within hours the pods would start dying again. The problem only manifested under real mobile traffic from real devices on unstable networks—something no synthetic overnight test could reproduce.

I forced a full GC collection. Nothing. Called GC.Collect(2, GCCollectionMode.Aggressive, true, true). Nothing. The memory was still there, growing steadily, completely indifferent to the garbage collector.

The realisation

Our health check endpoint confirmed it. Native memory was at 2,577 MB against a budget of 656 MB—nearly 4× what it should have been. The managed heap was 145 MB. The garbage collector had nothing to collect because 95% of the memory wasn't managed memory at all.

A brief detour: what is "native" memory?

If you've spent your career writing C#, you might never have needed to think about this. In .NET, your objects live on the managed heap—a region of memory that the garbage collector (GC) controls. When you create an object, the GC allocates space. When nothing references it anymore, the GC reclaims it. This is the world of dumpheap, gcroot, and GC.Collect().

But your .NET application doesn't run in a vacuum. It runs on top of a runtime, which runs on top of an operating system. And the OS, the runtime, and many libraries your app depends on allocate memory outside the managed heap. This is native memory—allocated directly from the operating system using C-level functions like malloc() and free(). The .NET garbage collector cannot see it, cannot track it, and cannot collect it. Examples include: database connection buffers (MySQL reads data off the wire into native buffers), runtime bookkeeping (JIT compiler state, thread stacks), and—critically for us—per-connection overhead from the runtime. Every WebSocket connection triggers native allocations as the .NET runtime manages its lifecycle: native interop structures, connection-tracking state, and the many small internal malloc() calls that accumulate across hundreds of concurrent connections. Individually small, but at scale they add up—and none of them are visible to the garbage collector.
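
If you want to see the split for yourself, here is a minimal standalone sketch (not from our codebase) that allocates native memory directly and watches the two counters diverge. The GC total never moves; the working set does:

using System.Runtime.InteropServices;

// Allocate 200MB of native memory that the GC knows nothing about.
IntPtr native = Marshal.AllocHGlobal(200 * 1024 * 1024);

// Touch every page so the OS actually commits it (RSS only counts resident pages).
for (int offset = 0; offset < 200 * 1024 * 1024; offset += 4096)
    Marshal.WriteByte(native, offset, 1);

Console.WriteLine($"GC heap:     {GC.GetTotalMemory(forceFullCollection: true) / 1024 / 1024} MB");
Console.WriteLine($"Working set: {Environment.WorkingSet / 1024 / 1024} MB");

Marshal.FreeHGlobal(native);   // the GC would never have reclaimed this for you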

On Linux, there's a number called RSS—Resident Set Size. This is the total physical memory your process is actually using, as reported by the operating system. It includes your managed heap, all native allocations, thread stacks, and memory-mapped files. But Kubernetes doesn't just look at RSS—it uses cgroup memory accounting, which tracks RSS plus kernel memory consumed on your behalf, like TCP socket buffers. When Kubernetes decides whether to OOM-kill your pod, it looks at this combined figure, not your managed heap size. When your monitoring dashboard shows a pod at 4GB, that's cgroup memory.
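
Inside a pod you can read the figure Kubernetes actually judges you by. A rough sketch, assuming cgroup v2 (the memory.current file); on cgroup v1 the path and file names differ:

// What the GC sees vs what the OS sees vs what the OOM-killer watches.
long gcHeap     = GC.GetTotalMemory(forceFullCollection: false);
long workingSet = Environment.WorkingSet;   // RSS, roughly
long cgroupUsed = long.Parse(File.ReadAllText("/sys/fs/cgroup/memory.current").Trim());

Console.WriteLine($"GC heap: {gcHeap / 1_048_576} MB, " +
                  $"RSS: {workingSet / 1_048_576} MB, " +
                  $"cgroup: {cgroupUsed / 1_048_576} MB");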

The gap between managed heap and RSS is your native memory footprint. In a healthy .NET application, native memory might be 200-400MB—runtime overhead, connection pools, JIT compiler state. In our case, it was 3.5GB.

Where 3.7GB actually lived
RSS (total process memory from the dump): 3.7 GB
Managed heap (the part the GC controls): 207 MB (5.5%)
Native memory (OS-level; invisible to GC, dotnet-dump, GC.Collect()): ~3.5 GB (94.5%)

This is the moment that divides .NET debugging into two worlds. Everything I'd done so far—dumpheap, gcroot, forced GC—only operates on the managed heap. The memory eating our pods was native memory, and it existed below the .NET runtime, in a layer most C# developers never have to think about.

I switched to LLDB—the only debugger that can see both managed and native memory in a Linux dump. And I started reading about how memory actually works on Linux. Specifically, about something called glibc.

· · ·

Week 2: The Zombie Problem

The dump analysis pointed nowhere definitive. No single native allocation was large enough to explain 3.5GB. This wasn't a classic leak—a forgotten malloc with no matching free. Every allocation had a matching deallocation. The memory was being freed. It just wasn't being returned.

To understand why, I had to understand what was actually happening on the wire. Here's our architecture:

Data Flow: Entity Change → Client
REST API call (new message / update) → EF Core transaction (INSERT/UPDATE lead + related tables) → stream notifier (ConcurrentQueue, timer fires every 2 sec) → process batch (read full entity + relations) → two SignalR streams: the entity stream (Wa_{appGuid}, the fulfilment team on this lead) and the lead stream (manager hierarchy, dashboard users). Roughly 100 team members on mobile, plus managers in the reporting hierarchy.

Every lead change triggered a full database read and a broadcast to connected clients. The clients only used SignalR for receiving—all data mutations went through REST APIs. And here's where the problem lived: our SignalR configuration had ClientTimeoutInterval set to five minutes.

Five minutes. On mobile devices with unstable cellular connections. With auto-reconnect set to retry every three seconds.

Think about what happens when a mobile device enters a tunnel, switches from WiFi to cellular, or just hits a dead zone. The network connection dies silently—no graceful goodbye, no notification to the server, nothing. The server doesn't know the client is gone. It keeps the connection alive, keeps pushing entity updates into the void, for up to five minutes before the client timeout (ClientTimeoutInterval) finally fires and kills the zombie.

Meanwhile, the client has already reconnected. Three seconds after losing signal, it establishes a new connection. The server is now pushing data to both—the new live connection and the old zombie. Three seconds later, if the network blips again, another zombie. Each one alive for up to five minutes.
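
For context, the client side looked roughly like this. A sketch of the .NET SignalR client with an illustrative retry policy and URL; the point is that every disconnect, silent or not, produces a brand-new connection three seconds later:

using Microsoft.AspNetCore.SignalR.Client;

var connection = new HubConnectionBuilder()
    .WithUrl("https://api.example.com/hubs/leads")        // illustrative URL
    .WithAutomaticReconnect(new ThreeSecondRetryPolicy())
    .Build();

await connection.StartAsync();
// On a silent network drop the server keeps the old connection alive for the
// full ClientTimeoutInterval; the client is already back on a fresh one.

// Reconnect forever, every 3 seconds, regardless of how many attempts have failed.
sealed class ThreeSecondRetryPolicy : IRetryPolicy
{
    public TimeSpan? NextRetryDelay(RetryContext retryContext) => TimeSpan.FromSeconds(3);
}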

The zombie math

# With 100 unstable clients:
Client timeout:       300 seconds (5 minutes)
Client retry:         3 seconds (linear)
Silent drop rate:     ~20-30% of disconnects

# Worst case per unstable client:
300s timeout / 3s retry = up to 100 zombies stacking up
                          before the first one even times out

# Conservative estimate across fleet:
100 clients × ~20-30 zombies at peak = 2,000-3,000 zombie connections

Each zombie held native memory that the GC couldn't touch: per-connection runtime structures (native interop state, connection-tracking metadata) and kernel TCP send buffers filling up with entity updates that would never be acknowledged. The per-connection native footprint was small—but each zombie lived for up to five minutes, and during that window its allocations acted as pins in the allocator's heap, preventing the memory around them from being returned to the OS. The managed-side transport buffers (Kestrel's System.IO.Pipelines) were pooled and reusable, but the native overhead per connection was not. None of it showed up in dotnet-dump.

The Redis backplane multiplier

Here's where running in Kubernetes added a twist. Because we have multiple pods, SignalR uses a Redis backplane to synchronise messages across them. When a pod broadcasts an entity update, it publishes to Redis, and every pod receives it and delivers it to their locally connected clients—including their zombie connections.

This means a single entity change didn't just hit the zombies on one pod. It flowed through Redis and hit zombies on every pod. Each pod maintained its own set of zombie connections, each carrying its own native overhead. And the Redis backplane itself added load—StackExchange.Redis uses managed buffers for its read/write operations, but the increased pub/sub throughput from broadcasting to zombies drove more allocation churn through the native allocator as the runtime processed each message.
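
The backplane wiring itself is a single line of registration; roughly what ours looks like (the connection string variable is a placeholder):

// Every pod subscribes to the same Redis channels. A broadcast published by any
// pod is delivered by every pod to its local clients, zombies included.
services.AddSignalR()
        .AddStackExchangeRedis(redisConnectionString);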

With three pods running, we weren't managing one zombie army—we were managing three, all fed by the same Redis stream.

The compounding factor

Every entity change pushed more data into every zombie's kernel TCP send buffers and kept its native connection state pinned in the allocator's heap. A zombie that lived for five minutes accumulated every broadcast during that window in its kernel send buffer. With entities changing every few seconds, the kernel memory per zombie grew continuously—and the longer the zombie lived, the more normal allocation churn got interleaved around its pinned native structures.

· · ·

The Fragmentation Trap

Finding the zombies explained where the memory was going. But it didn't explain why it never came back, even after quiet periods when all zombies had been cleaned up. To understand that, I had to go one layer deeper—below .NET, below Kestrel, into the Linux memory allocator itself.

What is glibc, and why should a C# developer care?

When your .NET application runs on Linux, it doesn't talk to the operating system directly. Between the .NET runtime and the Linux kernel sits a foundational library called glibc (the GNU C Library). You've probably never thought about it, but it's there in every standard .NET Docker container image, quietly handling the low-level plumbing: file I/O, threading, networking, and—most relevant here—memory allocation.

Every time anything in your process needs memory that isn't managed by .NET's garbage collector, it ends up calling malloc(), which glibc handles. When that memory is no longer needed, free() gives it back to glibc. Your database driver, the .NET runtime's internal structures, Kestrel's native interop layer—they all go through glibc for their native memory. Think of glibc as the memory landlord for everything that isn't on the managed heap.

The specific memory allocator inside glibc is called ptmalloc2. It was designed in the early 2000s as a general-purpose allocator, and it makes certain assumptions about how applications use memory—assumptions that are reasonable for short-lived programs but fall apart for a modern ASP.NET Core server handling hundreds of WebSocket connections in a container that never restarts.

The threshold that ruined us

Every native allocation in the process—runtime interop, JSON serialization temporaries, database driver operations, connection lifecycle management—goes through malloc(). All of these are small, well under 128KB individually. That's the problem.

ptmalloc2 has an internal threshold—128KB by default—that determines how memory is allocated. This is the most important detail in this entire story:

Allocations above 128KB get their own dedicated block of memory from the OS (via a mechanism called mmap()). Think of it like renting a separate storage unit. When you're done, you hand the keys back, and the OS reclaims it immediately. Clean, simple, no residue.

Allocations below 128KB go onto the allocator's internal heap—growing regions of memory that the allocator manages internally (the main arena extends via brk(); additional thread arenas carve up their own mmap() regions). Think of it like claiming space in a shared warehouse. When you free() that memory, the allocator marks it as available for reuse. But here's the critical part: the OS doesn't get that space back. As far as the operating system is concerned, your process is still using it. It still shows up in RSS. Kubernetes still sees it.

ptmalloc2: The Allocation Threshold
Below the 128KB MMAP_THRESHOLD: allocated on the allocator's internal heap; free() hands it back to the allocator, NOT to the OS. (Our per-connection overhead lived here.)
Above the threshold: allocated via mmap(); free() triggers munmap() and the memory is returned to the OS.

Here's the critical insight: the zombie allocations themselves weren't large enough to fill 3.5GB. A few KB of native overhead per connection, times 3,000 zombies, is maybe 10-20MB of live data. But those allocations lived for five minutes each, scattered across the heap. During that window, every other native allocation in the process—JSON serialization, database reads, connection setup and teardown for healthy clients—landed in the same arenas, interleaved with zombie memory. When the zombie finally died, its freed slots left holes that fragmented the heap. The zombies weren't the contents of the 3.5GB. They were the pins that prevented the heap from ever shrinking back.

A note on what we could and couldn't observe: we didn't have malloc tracing running during the original incident—you don't instrument libc internals in a live production fire. The exact native allocation profile is reconstructed from what we could measure (RSS, managed heap size, connection counts, cgroup memory) and what we could verify (jemalloc eliminated the retention, proving ptmalloc2 fragmentation was the mechanism). The specific interplay between the runtime's internal malloc() calls and the zombie lifecycle is our best model of why it worked—built from symptoms, fixes, and the parts of the stack we could see. When you're debugging across layers you don't own—the .NET runtime, Kestrel, glibc, the kernel—that's the reality of systems work.

Arenas: why threads make it worse

ptmalloc2 has another design choice that compounded our problem. To reduce contention when multiple threads need memory at the same time, it gives each thread its own separate memory pool—called an arena. On a system with 8 CPU cores, it can create up to 64 arenas. The idea is sound: threads don't block each other when allocating.

The problem: arenas don't share free space with each other. Think of them as separate warehouses. If warehouse A has empty shelves and warehouse B is full, B can't use A's space—it has to expand instead. Kestrel uses a thread pool, and connections get handled by different threads during their lifetime. A zombie connection might have its native structures allocated in arena 1, later get served by a thread using arena 3, and when it dies, leave holes in both arenas that neither can share with the other.

The allocation pattern looked like this:

# Thread pool thread 1 → arena 1
Handles connection A, native allocations: interop + state + metadata
Handles connection B, native allocations: interop + state + metadata

# Connection A goes zombie — lives for minutes, pinning its allocations
# Meanwhile, normal churn (serialization, DB, HTTP) fills around it
Arena 1:  [churn] [A: pinned] [churn] [B: alive] [churn]

# Connection A finally times out and is freed
Arena 1:  [churn] [hole] [churn] [B] [churn]

# Repeat thousands of times across dozens of arenas
# → Heap only grows. RSS never returns.

The zombie connection lifecycle was the worst possible pattern for this allocator. Long-lived zombies (up to five minutes) meant their memory got deeply interleaved with memory from healthy connections. When the zombie finally died, the freed memory was Swiss cheese—scattered holes across multiple arenas, reusable for same-sized-or-smaller requests, but never returnable to the OS.

And there's one final detail that seals the trap. The heap can only shrink from the top. Imagine a stack of boxes—you can only remove boxes from the top. If the topmost box is still in use, every box below it is trapped, even if they're all empty. With hundreds of interleaved connection lifetimes across dozens of arenas, there was always something alive near the top of each arena's heap. The entire region below it—potentially hundreds of megabytes of free holes—remained locked in place, counted as "used memory" by the OS and by Kubernetes.

This is what makes it feel like a memory leak even though technically every malloc() had a matching free(). The memory was freed from the application's perspective. But the operating system never got it back.

· · ·

The Layer Below That

The allocator fragmentation explained why freed memory wasn't returned to the OS. But there was yet another layer of native memory accumulation that was even more invisible: TCP send buffers managed by the Linux kernel.

When your application sends data over a network socket, the data doesn't go directly onto the wire. It passes through several buffers along the way. In our case: the serialised entity data went from .NET into Kestrel's I/O pipe buffers, then down into a buffer that the Linux kernel manages for each TCP socket. The kernel is responsible for actually putting bytes on the wire, retransmitting if packets are lost, and waiting for acknowledgements from the other end.

When we pushed entity updates to a zombie connection, the kernel dutifully tried to send them. No acknowledgement came back—the client was gone. TCP doesn't give up easily. It retransmits with exponential backoff, doubling the wait between each attempt. During this entire retry window, the kernel holds onto the data because it might need to resend it.

These buffers live in kernel space—memory managed by the OS itself, completely outside your process's address space. No application-level diagnostic tool can see them. dotnet-dump can't see them. LLDB can't see them. pmap can't see them—they don't exist in the process's virtual address space at all. But the kernel's cgroup memory accounting tracks them. Kubernetes sees them. The OOM-killer sees them.

With a five-minute client timeout, the kernel would hold each zombie's send buffer for that entire window. These buffers typically range from 64KB to 4MB per socket, depending on how much data is queued and how the system is configured.

# Estimated kernel memory from zombies:
2,000 zombie connections × ~1MB TCP send buffer = ~2GB in kernel space
# Completely invisible to any application-level diagnostic
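
There is one place you can see them, if you have a shell on the node or an ephemeral debug container: the kernel itself. A hedged sketch; the exact skmem fields vary by kernel version:

# Per-socket kernel buffer usage, which no .NET tool will ever show you:
ss -tm
# Each socket prints an skmem:(...) block; t is the send-buffer memory currently
# allocated and tb is that socket's send-buffer limit.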
· · ·

Why Kubernetes Made It Worse

In Kubernetes, each pod runs inside an isolation boundary (called a cgroup) that the Linux kernel uses to track and limit resource usage per container. The memory metric that Kubernetes watches includes everything: your managed heap, the fragmented native heap that the allocator was hoarding, and all the kernel TCP buffers for zombie sockets. It doesn't distinguish between "memory your code is actively using" and "memory the allocator is holding onto but not using."

We use the Horizontal Pod Autoscaler (HPA)—a Kubernetes feature that automatically adds or removes pod replicas based on resource metrics. When pods hit high memory, HPA scaled up: more pods, more capacity, more cost. But here's the twist—each new pod that joined the cluster immediately started receiving broadcasts from the Redis backplane and accepting reconnections from mobile clients. Within minutes, the new pod had its own zombie army. HPA was scaling up to handle a memory problem, and each new pod it launched made the memory problem worse.

But when the load calmed down, the pods' RSS never dropped. The managed heap shrank—the garbage collector did its job. Kernel TCP buffers released as zombie sockets closed. But the fragmented native heap, still held by the allocator, kept the RSS high. Memory that had been genuinely in use during peak load left behind a landscape of empty holes that the OS couldn't reclaim. HPA saw pods still consuming far more than the 600MB they actually needed, and never scaled down.

We were paying for ghost memory—empty arena fragments that the OS couldn't reclaim, that the GC couldn't touch, and that Kubernetes couldn't see through. Pods that should have been running at 600MB were bloated with fragmented native heap, and HPA dutifully kept them all alive.

The full cascade

Unstable mobile connections (our own office basement) → zombie SignalR connections living for minutes → Redis backplane broadcasting lead updates to zombies on every pod → per-connection native allocations landing on the shared heap instead of getting dedicated memory → allocator fragmentation across thread pool threads → kernel TCP buffers for dead sockets → cgroup memory grows to 4GB → OOM-kill → pod restart → clients auto-reconnect in 3 seconds → HPA scales up → new pods inherit the same zombie army → repeat.

· · ·

Week 3: Fixing Every Layer

There was no single fix. The problem existed at every layer, and each layer needed its own solution.

Layer 1: Kill the zombies faster

The most impactful change was reducing the SignalR timeout from five minutes to ten seconds. This single change reduced the maximum zombie lifetime by 97%.

Setting                       MS Default   Before       After        Why
ClientTimeoutInterval         30 sec       5 minutes    10 seconds   Kill zombies in seconds, not minutes
KeepAliveInterval             15 sec       10 seconds   7 seconds    Below Microsoft's recommended 2:1 ratio — aggressive, for fastest dead-client detection
MaximumReceiveMessageSize     32 KB        128 KB       32 KB        Matched actual 99th-percentile payload
StatefulReconnectBufferSize   ~98 KB       50 KB        0            Clients refetch on reconnect anyway

var hubConfig = services.AddSignalR(options =>
{
    options.ClientTimeoutInterval = TimeSpan.FromSeconds(10);
    options.KeepAliveInterval = TimeSpan.FromSeconds(7);
    options.MaximumReceiveMessageSize = 32 * 1024;
    options.HandshakeTimeout = TimeSpan.FromSeconds(15);
    options.StreamBufferCapacity = 10;
    options.MaximumParallelInvocationsPerClient = 1;
    options.StatefulReconnectBufferSize = 0;
});

Our "before" values weren't accidental. We set ClientTimeoutInterval to five minutes to be generous with mobile clients on unreliable cellular networks—Microsoft's documentation recommends allowing time for pings to arrive over high-latency connections, and we erred on the side of tolerance. MaximumReceiveMessageSize was raised to 128 KB as headroom for client-to-hub commands (JoinGroup, GetData, and similar pull requests)—generous for payloads that rarely exceed a few hundred bytes. Both were reasonable choices on paper. On unstable mobile networks with silent disconnects, the timeout was catastrophic—and the oversized receive buffer meant each zombie's inbound pipe reserved more memory than it ever needed.

The StatefulReconnectBufferSize deserves a note. This .NET 8 feature retains a per-connection buffer so clients can resume after brief disconnects without missing messages. Sounds perfect for unstable connections—except 60% of our disconnects were network switches that create entirely new TCP connections. The buffer couldn't help with those. We were spending ~10-17MB on a feature that had zero benefit for our workload.

Layer 2: Reduce the blast radius

Our ALB (Application Load Balancer) had an idle timeout of 3,600 seconds—one hour, raised from the AWS default of 60 seconds to prevent the load balancer from closing long-lived WebSocket connections. Idle HTTP connections stayed open for an hour, each holding native memory. We dropped it back to 60 seconds and aligned Kestrel's keepalive timeout to 70 seconds (always slightly above the ALB to avoid 502 errors).

// Kestrel (defaults: KeepAliveTimeout=130s, MaxConcurrentConnections=unlimited,
//          MaxConcurrentUpgradedConnections=unlimited)
options.Limits.KeepAliveTimeout = TimeSpan.FromSeconds(70);
options.Limits.MaxConcurrentConnections = 550;
options.Limits.MaxConcurrentUpgradedConnections = 350;

# ALB (default idle timeout is 60s; ours was set to 3600s for WebSocket longevity)
Connection idle timeout: 60
HTTP client keepalive:   65

Layer 3: Fix the streaming pipeline

Our SystemWaStreamNotifier was processing entity changes every two seconds via a timer-driven queue. But without deduplication, the same entity could be queued and processed multiple times during rapid updates. More processing meant more database reads, more serialization, more data pushed into zombie buffers.

We added global deduplication using a ConcurrentDictionary. If entity App123 was already pending, subsequent updates just refreshed the timestamp—no new queue item. Processing count dropped by 60-80% during burst traffic.
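
A stripped-down sketch of the idea; the class name here is illustrative, and readAndBroadcast stands in for the real entity read plus SignalR push:

using System.Collections.Concurrent;

public sealed class StreamNotifierQueue
{
    // One pending entry per entity. Re-queuing the same appGuid just refreshes
    // the timestamp instead of scheduling another full read-and-broadcast cycle.
    private readonly ConcurrentDictionary<string, DateTime> _pending = new();

    public void Enqueue(string appGuid) => _pending[appGuid] = DateTime.UtcNow;

    // Called by the 2-second timer: drain whatever is pending right now.
    public async Task DrainAsync(Func<string, Task> readAndBroadcast)
    {
        foreach (var appGuid in _pending.Keys)
        {
            if (_pending.TryRemove(appGuid, out _))
                await readAndBroadcast(appGuid);
        }
    }
}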

Layer 4: Replace the allocator

All the fixes above reduced the rate of native memory accumulation. But glibc's allocator fragmentation is structural—it's baked into how ptmalloc2 works. Even with shorter zombie lifetimes, the heap would still fragment over days and weeks. Just slower.

The definitive fix for the fragmentation layer was replacing glibc's default allocator entirely. One line in our Dockerfile:

FROM mcr.microsoft.com/dotnet/aspnet:8.0

RUN apt-get update && \
    apt-get install -y libjemalloc2 && \
    rm -rf /var/lib/apt/lists/*

ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

LD_PRELOAD is a Linux environment variable that tells the system to load a shared library before all others. By pointing it at jemalloc's library, every call to malloc() and free() in the entire process—from .NET's runtime to MySQL's driver to every native library—goes through jemalloc instead of glibc's default allocator. No code changes, no recompilation. The application doesn't even know it's using a different allocator.
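
One thing worth doing after a change this low-level: verify at startup that the preload actually took effect. A small sketch that reads the process's own memory map:

// /proc/self/maps lists every shared library mapped into the process.
// If libjemalloc isn't there, LD_PRELOAD didn't do what you think it did.
bool jemallocLoaded = File.ReadLines("/proc/self/maps")
                          .Any(line => line.Contains("libjemalloc"));

Console.WriteLine($"jemalloc loaded: {jemallocLoaded}");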

jemalloc was built for exactly this workload—long-running servers with concurrent allocations of varying sizes across multiple threads. It's what Redis, Meta, and Firefox use in production. Instead of per-thread arenas with fragmented free lists, jemalloc organises memory into size-class bins (like pre-sorted shelves for different-sized items) and aggressively tells the OS to reclaim unused pages. It does this through a kernel call (madvise) that essentially says: "I'm done with this memory page—take it back, but keep the address reservation in case I need it again." The result: freed memory actually reduces your RSS.

Behaviour                   ptmalloc2 (glibc default)         jemalloc
Thread isolation            Per-thread arenas, no sharing     Shared size-class bins
Return to OS                Only from top of heap             Aggressively returns unused pages
Varying allocation sizes    Fragmentation (Swiss cheese)      Size-class bins minimise waste
Long-running servers        RSS grows, never returns          RSS stays bounded

A note on jemalloc's tradeoffs

jemalloc is not magic dust. Its aggressive page purging uses slightly more CPU than glibc's allocator—the kernel has to do more work reclaiming and re-mapping pages. Some monitoring tools may report confusing memory profiles because jemalloc's internal bookkeeping differs from what tools expect. And it's an extra dependency: you need to install it in your Docker image and keep it updated. Note too that jemalloc's upstream development pace has slowed since Meta reduced its investment, though stable branches continue to receive fixes and it remains the default allocator in FreeBSD. For our workload the tradeoff was obvious—we were burning gigabytes of RAM to save a few CPU cycles on page management—but you should always benchmark with your own traffic before committing.

Why not another allocator?

jemalloc wasn't the only option. We considered two alternatives but didn't pursue either:

gperftools tcmalloc (Google) — optimised for fast small-object allocations with per-thread caches. Strong at moderate thread counts, but its central free lists and page heap are unsharded, which can become a bottleneck under heavy cross-thread allocation patterns. Not the profile we wanted for a thread-pool-heavy ASP.NET Core server with high connection churn.

mimalloc (Microsoft Research) — a compact, high-performance allocator with excellent benchmark numbers. However, its v2 design keeps per-thread memory pools and is deliberately not eager to share memory between threads—memory freed by one thread stays reserved for that thread. In a thread pool with variable workload patterns, that's a known path to memory accumulation. Great for raw allocation speed; wrong profile for our memory-constrained containers.

jemalloc won on pedigree and fit. It originated as FreeBSD's libc allocator in 2005—making it the oldest battle-tested alternative in the lot—and has been hardened under extreme production loads at Redis, Meta, and Firefox ever since. Its specific design choice sealed it for us: multiple arenas with cross-thread sharing, size-class bins that minimise fragmentation, and aggressive return of unused pages to the OS via madvise. For a long-running server with high connection churn, variable message sizes, and strict container memory limits, that combination is exactly what we needed.

· · ·

After

Steady-state RSS: ~600 MB · Stable over 24+ hours · OOM-kills: 0 · HPA scales down: yes

Pods stabilised under 1GB. No more OOM-kills. HPA started scaling down for the first time. The midday restarts stopped.

Remember—this was a pre-launch pilot. 200 devices, half of them our own fulfilment team struggling with signal in the office basement. We had been blocked from launching for three weeks while this issue ate every pod we threw at it. The week after the fix landed, we went to full production rollout. The system that couldn't survive 200 pilot users now runs at scale without a single OOM-kill.

Three weeks of deploying fixes by night and watching them fail by day—for a bug that no .NET diagnostic tool could see. But those three weeks taught me more about how software actually runs on Linux than the previous several years of writing C# ever did.

Why production never went down

Three weeks of pods dying every few hours, and not a single customer-facing outage. That wasn't luck—it was three layers of defence we'd built before we understood the root cause: a readiness probe that pulled pods off the load balancer at 80% memory (while keeping them alive for existing connections), a liveness probe that forced a controlled restart if memory stayed above 93% for too long, and a SmartMemoryTrimService that aggressively ran GC.Collect with LOH compaction to keep the managed heap in check. The key insight: by the time a pod approached OOM, it was already off production traffic—only serving zombie connections. The OOM-kill, when it happened, was invisible to customers.
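
The trim service is nothing exotic. A condensed sketch of the idea; the threshold and interval here are illustrative, not our production values:

using System.Runtime;
using Microsoft.Extensions.Hosting;

// Periodically compacts the LOH and trims the managed heap when the working set
// drifts too high. It cannot touch native memory, which is exactly why it kept
// the managed heap tidy but never stopped the OOM-kills.
public sealed class SmartMemoryTrimService : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);

            if (Environment.WorkingSet > 2L * 1024 * 1024 * 1024)   // illustrative threshold
            {
                GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Aggressive, blocking: true, compacting: true);
            }
        }
    }
}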

The side effect was cost. Pods that were alive but not ready kept HPA at maximum replicas. We were paying for stability with infrastructure spend—exactly the trade-off you want while you're still hunting the root cause.

· · ·

What Three Weeks of Lost Sleep Taught Me

Most .NET developers have never needed to think about native memory. The managed heap, the garbage collector, IDisposable—these abstractions are so good that you can build an entire career without knowing how the OS allocates memory beneath your runtime. Until you run on Linux containers with WebSocket connections to unstable mobile devices, and suddenly the abstraction leaks.

dotnet-dump can only see half the picture. On Linux, if your RSS is growing but your managed heap is stable, the leak is in native memory—and dotnet-dump is blind to it. You need LLDB for analysing dumps and pmap on the live process to see memory regions. A quick diagnostic: run dotnet-counters and watch two numbers—GC Heap Size and Working Set. If Working Set keeps climbing while GC Heap stays flat, you have a native memory problem. The .NET tools will tell you everything is fine while Kubernetes is preparing to kill your pod.
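
The check is a single command; the counter names below are as .NET 8 reports them:

# Watch the two numbers side by side on a live process:
dotnet-counters monitor --process-id <pid> --counters System.Runtime

# GC Heap Size (MB)   flat      → managed heap is healthy
# Working Set (MB)    climbing  → the growth is native, outside the GC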

The default Linux memory allocator was not designed for your workload. glibc's ptmalloc2 is a general-purpose allocator from an era before containers, before WebSockets, before thread pools handling hundreds of concurrent connections in long-running processes. It assumes that most applications are short-lived, that memory usage patterns are predictable, and that the heap can grow freely because virtual memory is abundant. In a container with a 4GB memory limit running a server that never restarts, every one of those assumptions fails. If you're running a long-lived ASP.NET Core service on Linux, adding LD_PRELOAD=libjemalloc.so to your Dockerfile should be as routine as setting DOTNET_gcServer=1. And this isn't specific to .NET—Rust, Node.js, Python, and any language that uses malloc under the hood can hit the same wall on glibc, to varying degrees.

SignalR's timeout defaults are built for stable networks—and raising them for mobile tolerance makes it worse. We set ClientTimeoutInterval to five minutes thinking it would help unreliable clients. It created a zombie army instead. If your clients are on mobile, go lower than the defaults, not higher. Set ClientTimeoutInterval aggressively. Your clients are already handling reconnection.

The 128KB boundary matters. On Linux, the default allocator treats allocations below 128KB very differently from those above it. In a long-running server, every native allocation—runtime interop, serialization, connection management—lands below this threshold on a shared heap that never shrinks. Long-lived connections act as fragmentation anchors that prevent the heap from being trimmed, even after those connections close.

AI is a force multiplier, not a replacement for engineering judgement. I used Claude extensively—as a research workhorse, not an oracle. It explained ptmalloc2 arena mechanics, estimated zombie memory overhead, helped me reason through buffer configuration tradeoffs, and surfaced allocator alternatives I hadn't considered. But it got things wrong too. I spent as much time challenging its hypotheses as accepting them—correcting memory estimates, disproving buffer assumptions, and forcing it to reconcile with what production metrics actually showed. The direction came from understanding the system. The acceleration came from a research partner that could context-switch from kernel internals to SignalR semantics in the same conversation.

The hardest bugs aren't in your code. They're in the assumptions your platform makes about how your code will behave.

The Timeline

Week 1
Chased the managed heap
Found EF Core cache issue, fixed it. Managed heap dropped. RSS didn't move. Forced GC. Nothing. Realised 95% of memory was native.
Week 2
Discovered the zombie army
Mapped SignalR connection lifecycle. Found 5-minute timeout + 3-second retry creating thousands of overlapping zombie connections. Traced per-connection native overhead and kernel TCP send buffers as the memory sinks.
Week 2-3
Went below the runtime
Learned how the Linux memory allocator works under the hood. Discovered per-connection native allocations were landing on a heap that never returns memory to the OS. Understood how thread-pool fragmentation across arenas compounded the problem. Finally explained why freed memory never came back.
Week 3
Fixed every layer
SignalR timeouts (5min → 10s), ALB timeout (3600s → 60s), stream deduplication, and jemalloc. Pods stabilised under 1GB. OOM-kills stopped. HPA finally scaled down.

If you're running ASP.NET Core on Linux containers with real-time connections, check your native memory. Run dotnet-counters and compare GC Heap Size against Working Set. If Working Set keeps growing while GC Heap stays stable, the leak is in native memory—outside the garbage collector's reach. The fix starts with understanding where your memory actually lives.

#dotnet #aspnetcore #signalr #kubernetes #linux #memory #performance #devops #fintech #claude