Rust compiler optimization
  1. Dead Code Elimination (DCE) Isn't Perfect (or happens late):

    • Ideal Scenario: You'd expect the compiler's DCE pass to recognize that the Deserialize implementations generated by the derive macro are never called and to eliminate them from the final binary entirely. If DCE worked perfectly and early, removing the derive manually shouldn't change the size much, since the code was already being discarded (see the sketch after this list).
    • Reality: The analysis for DCE, especially across crates and with complex features like generics and macros (which serde uses heavily), can be intricate. Sometimes, code might appear reachable during earlier compilation stages or might have subtle linkages that prevent easy removal until LTO.
  2. The Butterfly Effect of Optimization Heuristics:

    • Compilers use complex heuristics to decide how to optimize code: when to inline functions, how to allocate registers, how to arrange code blocks for cache efficiency, etc.
    • Removing the #[derive(Deserialize)] changes the input to these optimization passes. Even though the derived code itself was unused in the final program logic, its mere presence during compilation could have subtly influenced the optimizer's decisions about other, unrelated but used code.
    • Example: Perhaps the presence of the Deserialize code nudged the compiler toward a particular inlining decision elsewhere that happened to be slightly more size-efficient. With the derive removed, the optimizer, facing slightly different input, made a different inlining choice for some other function, and that choice produced slightly larger code.
  3. Link Time Optimization (LTO):

    • LTO performs optimizations across the entire program after individual crates are compiled to an intermediate representation (like LLVM IR).
    • Removing the derive changes the total pool of code and metadata fed into LTO. LTO might make different global decisions about function placement, outlining, or other whole-program optimizations based on this slightly different input, leading to a marginally larger binary. It's as if removing one small piece changed how the rest of the puzzle fits together.
  4. Code Layout and Alignment:

    • The exact layout of functions and data in the final binary can affect size due to alignment requirements and padding. Removing some code changes the layout of everything that follows it, potentially leading to slightly different padding overall, which could increase the size. 752 bytes (the difference you saw) is small enough that alignment/padding changes could plausibly contribute.
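
To make the DCE point concrete, here is a minimal sketch (assuming a hypothetical Config struct and a serde dependency with the derive feature enabled): the derived Deserialize impl is never called, so an ideal DCE/LTO pass would strip it, and deleting the derive by hand would only remove code that was already being discarded.

```rust
// Assumes in Cargo.toml: serde = { version = "1", features = ["derive"] }
use serde::Deserialize;

// The derive expands to an `impl<'de> Deserialize<'de> for Config`
// that nothing in this program ever calls.
#[derive(Deserialize)]
struct Config {
    name: String,
    retries: u32,
}

fn main() {
    // `Config` is only constructed directly; `Config::deserialize` is never
    // reachable, so ideally DCE/LTO drops the derived impl from the binary.
    let cfg = Config { name: "demo".to_string(), retries: 3 };
    println!("{} ({} retries)", cfg.name, cfg.retries);
}
```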

Conclusion:

You're right, it's not intuitive! It's a prime example of how compiler optimization is not a simple matter of "less source code = smaller binary." It's a complex interplay of analysis, transformation, and heuristics. Removing code can change the optimization landscape in subtle ways that cause the compiler to generate slightly less optimal code for other parts of the program, leading to a net increase in size.

While frustrating, it often means the compiler was already doing a decent job removing the unused code via DCE/LTO in the first build, and the slight increase is just a side effect of perturbing the delicate balance of optimization decisions. It doesn't necessarily mean the compiler is "bad," just that achieving the absolute minimum size is incredibly hard and subject to these kinds of non-local effects.

--- follow-up question ---

  1. Monomorphization Effects (If Generics Are Involved):

    • If your struct or enum is generic (e.g., MyStruct<T>), the #[derive(Deserialize)] generates a generic implementation (impl<'de, T> Deserialize<'de> for MyStruct<T> where T: Deserialize<'de>); a sketch of this impl follows the list.
    • Even if your code never explicitly calls deserialize on MyStruct<SpecificType>, the potential for this implementation to exist might slightly alter how the compiler handles the monomorphization (creating concrete versions like MyStruct<i32>, MyStruct<String>) of other generic functions that use MyStruct<T>.
    • Removing the derive removes this generic impl. This changes the "landscape" of implementations associated with MyStruct<T>, potentially causing the compiler to make different choices about code sharing, inlining, or specialization for other code related to MyStruct<T>, sometimes resulting in slightly larger code for those other functions.
  2. Trait Implementation Caching/Lookup:

    • The compiler maintains tables and structures internally to quickly find trait implementations for types.
    • The presence of the Deserialize implementation adds entries to these structures. Removing it changes them.
    • While seemingly minor, this change could subtly influence the compiler's internal state during optimization passes that query trait implementations, perhaps leading to slightly different optimization paths being taken for unrelated code that does involve trait lookups (though typically not this specific trait). It's a long shot, but internal compiler data structures can have surprising impacts.
  3. Interaction with Other Proc Macros:

    • Are there other procedural macros operating on the same struct or enum? Proc macros often run sequentially or in a specific order.
    • Removing #[derive(Deserialize)] changes the Abstract Syntax Tree (AST) that subsequent proc macros see. It's possible, though perhaps unlikely, that another macro generates slightly different (and larger) code based on the absence of the Deserialize derive or the code it would have generated.
  4. LLVM Optimization Quirks:

    • Rust lowers code to LLVM Intermediate Representation (IR), and LLVM performs most of the heavy lifting for optimization.
    • LLVM itself has myriad optimization passes with complex interactions. The specific pattern of LLVM IR generated by rustc with the Deserialize impl (even if unused and later DCE'd by LLVM) might enable or disable certain LLVM optimizations, or cause them to behave differently, compared to the IR generated without the Deserialize impl.
    • A classic example is vectorization: the presence of certain code patterns, even if unused, might inhibit or enable the vectorizer for nearby code, affecting performance and size. Removing the derive changes the patterns LLVM sees.
  5. Linker Behavior Details:

    • COMDAT Folding: Linkers try to merge identical functions or data sections (COMDATs). Removing the derived code changes the set of sections the linker sees. This might alter how other, unrelated identical sections are folded or laid out, potentially increasing padding or changing relative jump/call offsets slightly.
    • Symbol Table / Relocations: Removing the symbols associated with the Deserialize impl changes the symbol table and potentially the number or type of relocations needed. While the code is unused, its symbols exist until LTO/linking. This change might subtly influence the linker's layout decisions or metadata size.
  6. Build Profile / Toolchain Version Specifics:

    • Different optimization levels (opt-level = 'z' vs 's' vs 3), LTO settings (fat, thin, off), or codegen-units settings can drastically change compiler and linker behavior (see the Cargo.toml sketch after this list). The effect you're seeing might be specific to your exact build profile.
    • It could even be a minor, specific behavior or regression in the exact version of rustc or LLVM you are using. Sometimes these things get fixed or change in later releases.
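
To illustrate point 1, here is roughly the shape of the generic impl the derive contributes for a hypothetical MyStruct<T>. This is a hand-written simplification, not the actual macro expansion (the real derive generates a field-by-field visitor), but it shows why the impl is only a template until monomorphization instantiates it.

```rust
use serde::{Deserialize, Deserializer};

// Hypothetical generic type; the `T` field is what forces the
// `T: Deserialize<'de>` bound on the generated impl.
struct MyStruct<T> {
    value: T,
}

// Roughly the shape of what `#[derive(Deserialize)]` would emit: a generic
// impl that is only a template until some call site instantiates it with a
// concrete `T` (monomorphization).
impl<'de, T> Deserialize<'de> for MyStruct<T>
where
    T: Deserialize<'de>,
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // Simplified body: delegate to `T`'s impl instead of the real
        // derive's field-by-field visitor.
        Ok(MyStruct { value: T::deserialize(deserializer)? })
    }
}

fn main() {
    // No concrete `MyStruct<i32>::deserialize` is ever instantiated here, so
    // the generic impl above contributes nothing to the binary on its own.
    let s = MyStruct { value: 42_i32 };
    println!("{}", s.value);
}
```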
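And for point 6, these are the size-relevant knobs as they would appear in a Cargo.toml release profile. The values below are just one plausible size-focused combination, not a recommendation for any particular project.

```toml
[profile.release]
opt-level = "z"    # optimize aggressively for size; "s" and 3 trade size for speed differently
lto = "fat"        # whole-program LTO; "thin" or false change what LTO gets to see
codegen-units = 1  # fewer units give the optimizer a wider view, often producing smaller code
```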

In essence, the generated binary is the result of a highly complex, non-linear process. Changing any input, even seemingly irrelevant code, can ripple through the system and lead to unexpected variations in the output size due to the intricate heuristics and interdependencies within the compiler and linker. Your observation is a testament to this complexity!
