eager to remove async_trait?
introduction
rust 1.75 shipped native async fn in traits in december 2023, and the ecosystem collectively exhaled. the async_trait crate had been the workaround since 2019, and it was always framed as a workaround, a temporary bridge until the language caught up. now that the bridge is no longer necessary, i’ve watched PRs roll in across dozens of crates removing async_trait as if it were technical debt. sometimes it is. sometimes removing it introduces problems that the crate was quietly solving for you.
while working on a service mesh library last year, i spent a few weeks migrating async trait boundaries and came away with a more nuanced view than “native good, macro bad.” the boxing overhead that everyone fixates on is real, but it’s measured in nanoseconds, and whether those nanoseconds matter depends entirely on where your trait sits in the call stack.
what async_trait actually does
the crate is a proc macro that transforms async methods into regular methods returning Pin<Box<dyn Future + Send + 'async_trait>>. when you write:
#[async_trait]
trait Storage {
    async fn get(&self, key: &str) -> Option<Vec<u8>>;
}
the macro rewrites it to roughly:
trait Storage {
    fn get<'async_trait>(
        &'async_trait self,
        key: &'async_trait str,
    ) -> Pin<Box<dyn Future<Output = Option<Vec<u8>>> + Send + 'async_trait>>
    where
        Self: Sync + 'async_trait;
}
three things happen here: a heap allocation via Box::pin, dynamic dispatch through dyn Future, and an implicit + Send bound on the returned future. the framing of these as pure overhead misses the point, because each one also provides a capability that native async traits currently lack.
the overhead everyone talks about
let’s put actual numbers on it. a Box::pin call costs roughly 20-30 ns per invocation for small futures with a warm allocator. that includes the malloc, the memcpy of the future state machine onto the heap, vtable setup, and the eventual free on drop. under adversarial conditions (i.e. large futures, cold allocator, deeply nested call chains), benchmarks from the async working group showed costs up to 1.2 µs per call.
sounds bad until you contextualize it against the operations these futures actually wrap. a single TCP round-trip is around 500,000 ns, a database query takes 1-10 ms, and if your async trait method is get_user_by_id hitting postgres, the boxing overhead represents 0.003% of the total call time, which is well below what any profiler will surface as a bottleneck.
that said, if your trait is poll_next_packet in a network stack processing millions of packets per second, 30 ns per call accumulates to 30 ms per million packets. the async working group explicitly noted that the boxing overhead was “not good enough for writing a network stack capable of sustaining gigabit-level throughput.” context matters.
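the two regimes can be sanity-checked with back-of-envelope arithmetic. a minimal sketch, using the estimates above as inputs (they are estimates, not measurements):

```rust
// back-of-envelope check on the two regimes above; the 30 ns figure is
// the rough per-call cost of Box::pin for a small future, not a measurement

/// boxing cost as a percentage of the wrapped operation's duration
fn overhead_pct(per_call_ns: f64, op_ns: f64) -> f64 {
    per_call_ns / op_ns * 100.0
}

/// total boxing cost in milliseconds across many calls
fn accumulated_ms(per_call_ns: f64, calls: f64) -> f64 {
    per_call_ns * calls / 1_000_000.0
}
```

on a 1 ms database query, 30 ns is 0.003% of the call; across a million packets per second, the same 30 ns is 30 ms of pure boxing. same constant, opposite conclusions.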
where native async traits win
for purely generic usage (i.e. you never put dyn in front of the trait), native async fn in traits is strictly better. the compiler desugars async fn method(&self) -> T into a return-position impl Trait backed by an anonymous generic associated type. each impl gets its own concrete future type, sized exactly, inlined, stack-allocated. no heap, no vtable, no indirection.
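a minimal sketch of the native form, reusing the Storage trait from earlier with a hypothetical InMemory backend. since these futures complete without ever returning Pending, a single poll with a no-op waker stands in for a real executor here (Waker::noop is stable std; everything else is illustrative):

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// native async fn in trait: each impl gets its own concrete, exactly-sized future
trait Storage {
    async fn get(&self, key: &str) -> Option<Vec<u8>>;
}

struct InMemory(HashMap<String, Vec<u8>>);

impl Storage for InMemory {
    async fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

// generic bound, no dyn: monomorphized per impl, no Box, no vtable
async fn fetch<S: Storage>(store: &S, key: &str) -> Option<Vec<u8>> {
    store.get(key).await
}

// these futures never return Pending, so one poll with a no-op waker resolves them
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut Context::from_waker(Waker::noop())) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```

note what’s absent: no Pin<Box<...>> in any signature, and the whole call chain can be inlined.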
axum 0.8 migrated FromRequestParts and FromRequest away from async_trait, and the results were measurable: P99 latency for single-connection scenarios dropped by approximately 50%. axum’s extractors are called on every request, often multiple times per handler, so the per-call savings compounded across the entire request lifecycle, which is exactly the kind of hot-path scenario where the migration pays for itself.
embedded and no_std environments benefit even more. the async_trait crate requires alloc for Box, which is unavailable in many embedded contexts. embassy never used async_trait at all; it statically allocates task futures at compile time. native async traits are the only viable path here.
compile times improve too. removing a proc macro dependency eliminates one layer of syn/quote expansion for every annotated trait and impl block. in a crate with 40+ async trait impls, that adds up. i noticed a 12% reduction in incremental build times after removing async_trait from one of my internal crates, though the absolute savings were only about 3 seconds on a 25-second build.
where removing it breaks things
dyn compatibility
native async traits are not dyn compatible (what older docs call object-safe). you cannot write Box<dyn MyAsyncTrait> if MyAsyncTrait has native async methods. the compiler rejects it because each impl produces a different anonymous future type, and there’s no single type to erase behind dyn. niko matsakis has an entire blog series on solving this, spanning 2021 through 2025, and there is still no stabilized language-level solution.
i found this out the hard way when i removed async_trait from a storage backend trait that had five implementations behind a dyn pointer. the compiler immediately rejected every call site. if your codebase uses dyn Trait with async methods (i.e. plugin systems, strategy patterns, heterogeneous collections of backends), removing async_trait means you now have to box the futures yourself. you end up writing the same Pin<Box<dyn Future>> return types that the macro was generating, except now it’s manual and scattered across every method signature, which is strictly worse ergonomics for identical runtime behavior.
the Send bound problem
async_trait adds + Send to the returned future by default. native async traits don’t. if you try to spawn a task using a native async trait method:
trait MyService {
    async fn call(&self) -> Response;
}

fn spawn_task(svc: impl MyService + Send + 'static) {
    tokio::spawn(async move {
        svc.call().await // ERROR: future may not be Send
    });
}
this fails because the compiler can’t prove the opaque future type is Send. the trait-variant crate from the rust-lang org provides a workaround via #[trait_variant::make(SendMyService: Send)], which generates a parallel trait with Send bounds. it works, but it’s another proc macro, which undercuts the “removing proc macros” motivation. the return_type_notation feature that would solve this natively is still unstable as of march 2026.
a practical migration attempt confirmed this: removing async_trait from axum handlers broke specifically because of missing Send bounds. the author concluded that “sticking with the async-trait macros is still the simplest option” for most application code.
testing and mocking
many rust testing patterns rely on dyn Trait for dependency injection. mockall with #[automock] generates mock structs that implement the trait. this works with async_trait because the boxed return type makes the trait dyn-safe. remove the attribute, and mockall can no longer generate mocks for your async trait. you either rewrite your test infrastructure around concrete types or manually reintroduce the boxing you were trying to eliminate, and in practice most teams choose the latter because rewriting test harnesses mid-migration is a recipe for regressions.
the decision framework
i started the migration thinking i’d remove async_trait from everything. by the end, i’d put it back in roughly half the traits i’d touched. the codebase had 23 async traits total; 11 were used via dyn somewhere, and 4 more were mocked in tests. i ended up with a simple heuristic:
remove async_trait when the trait is used only with concrete generic types (never dyn), sits on a hot path where per-call nanoseconds matter, targets no_std or embedded, or the trait is internal to your crate and you control all impls.
keep async_trait when you need dyn Trait anywhere, your trait is part of a public API where callers might need object safety, you use tokio::spawn or similar with the trait’s futures, your test suite relies on mocking the trait, or the trait is on a cold path doing network or disk io where 30 ns is irrelevant.
tower’s Service trait is the canonical example of a trait that cannot migrate yet. it needs dyn safety, Send bounds, and broad ecosystem compatibility. the async working group’s Send bound problem writeup explicitly calls out tower as blocked.
what i’d recommend
for most application code (i.e. HTTP handlers, database access layers, business logic services), the boxing overhead of async_trait is noise. your bottleneck is the network, the database, or the disk. removing async_trait from these paths saves you nothing measurable and potentially costs you dyn compatibility, Send bounds, and testability.
for library code on hot paths, profile first. if Box::pin shows up in your flamegraph, migrate. if it doesn’t, you have better things to optimize. jkarneges’ benchmarks showed that with real syscalls in the loop, boxed async trait methods were only 1.3% slower than native ones. that’s within measurement noise for most applications.
the ecosystem is moving toward native async traits, and that direction is correct, but ecosystem momentum alone is a poor justification for migrating your own code. async_trait is battle-tested, well-understood, and still the most ergonomic option when you need dyn compatibility or Send bounds. removing it prematurely optimizes for a cost you probably haven’t measured while simultaneously giving up capabilities you probably depend on.
the language will eventually solve the dyn async trait and Send bound problems. RFC 3245 and the ongoing work on return_type_notation are steps in that direction. when those land, migration will be painless and complete. until then, reaching for async_trait on cold-path, dyn-heavy, or test-mocked traits reflects a mature understanding of what the language can and cannot do today.
thanks for reading!