Skip to content

Instantly share code, notes, and snippets.

@scc-tw
Last active November 30, 2025 07:48
Show Gist options
  • Select an option

  • Save scc-tw/9a221e01aa1ca2ba747a09c072384866 to your computer and use it in GitHub Desktop.

Select an option

Save scc-tw/9a221e01aa1ca2ba747a09c072384866 to your computer and use it in GitHub Desktop.
Ranges compile time analysis
// big_loop.cpp
#include <vector>
// 跟 ranges 版對齊的 functor
template<int N>
struct Mul {
int operator()(int x) const {
return x * (N + 1);
}
};
template<int N>
struct Add {
int operator()(int x) const {
return x + (N % 7);
}
};
template<int N>
struct PredGreater {
bool operator()(int x) const {
return x > (100 + N);
}
};
template<int N>
int heavy_loop_N(const std::vector<int>& v) {
int sum = 0;
Mul<N> mul;
Add<N> add;
PredGreater<N> pred;
for (int x : v) {
if (x % 2 != 0) continue;
const int a = mul(x);
const int b = add(a);
if (!pred(b)) continue;
sum += b;
}
return sum;
}
int use_loop_big(const std::vector<int>& v) {
int r = 0;
for (int i = 0; i < 200; ++i) {
switch (i % 4) {
case 0: r += heavy_loop_N<0>(v); break;
case 1: r += heavy_loop_N<1>(v); break;
case 2: r += heavy_loop_N<2>(v); break;
case 3: r += heavy_loop_N<3>(v); break;
}
}
return r;
}
// big_ranges.cpp
#include <vector>
#include <ranges>
// 給不同 N 產生不同型別的 functor,避免 compiler 重用 instantiation
template<int N>
struct Mul {
int operator()(int x) const {
return x * (N + 1);
}
};
template<int N>
struct Add {
int operator()(int x) const {
return x + (N % 7);
}
};
template<int N>
struct PredGreater {
bool operator()(int x) const {
return x > (100 + N);
}
};
// 這裡的 pipeline type 會隨 N 改變,產生大量 template instantiation
template<int N>
int heavy_ranges_N(const std::vector<int>& v) {
auto rng =
v
| std::views::filter([](int x) { return x % 2 == 0; })
| std::views::transform(Mul<N>{})
| std::views::transform(Add<N>{})
| std::views::filter(PredGreater<N>{});
int sum = 0;
for (int x : rng) {
sum += x;
}
return sum;
}
// 實際呼叫很多不同 N,放大 instantiation 數量
int use_ranges_big(const std::vector<int>& v) {
int r = 0;
// N 的上限可以自行調高,例如 100, 200, 500 …
for (int i = 0; i < 200; ++i) {
// 讓編譯器真的看到不同 N
switch (i % 4) {
case 0: r += heavy_ranges_N<0>(v); break;
case 1: r += heavy_ranges_N<1>(v); break;
case 2: r += heavy_ranges_N<2>(v); break;
case 3: r += heavy_ranges_N<3>(v); break;
}
}
return r;
}
// multi_ranges.cpp - Multiple independent pipelines
#include <vector>
#include <ranges>
int pipe1(const std::vector<int>& v) {
int sum = 0;
for (int x : v | std::views::filter([](int x) { return x % 2 == 0; })
| std::views::transform([](int x) { return x * 2; }))
sum += x;
return sum;
}
int pipe2(const std::vector<int>& v) {
int sum = 0;
for (int x : v | std::views::filter([](int x) { return x > 10; })
| std::views::transform([](int x) { return x + 1; }))
sum += x;
return sum;
}
int pipe3(const std::vector<int>& v) {
int sum = 0;
for (int x : v | std::views::filter([](int x) { return x < 100; })
| std::views::transform([](int x) { return x * 3; }))
sum += x;
return sum;
}
int pipe4(const std::vector<int>& v) {
int sum = 0;
for (int x : v | std::views::filter([](int x) { return x != 0; })
| std::views::transform([](int x) { return x - 1; }))
sum += x;
return sum;
}
int multi(const std::vector<int>& v) {
return pipe1(v) + pipe2(v) + pipe3(v) + pipe4(v);
}

Platform Specification

Test environment for std::ranges compile-time analysis.


Operating System

Property Value
OS Arch Linux (rolling)
Kernel 6.17.9-arch1-1
Architecture x86_64
Kernel Config SMP PREEMPT_DYNAMIC
Build Date Mon, 24 Nov 2025

CPU

Property Value
Model 13th Gen Intel Core i5-13500
Architecture x86_64
Vendor GenuineIntel
Sockets 1
Cores per Socket 14 (6 P-cores + 8 E-cores)
Threads per Core 2
Total Threads 20
Base Frequency 800 MHz
Max Turbo 4800 MHz

Memory

Property Value
Total RAM 32 GB
Type DDR4/DDR5 (system dependent)
Available ~26 GB (at test time)

Compiler Toolchain

Primary Compiler

Property Value
Compiler Clang/LLVM
Version 21.1.6
Target Triple x86_64-pc-linux-gnu
Thread Model posix

C++ Standard Library

Property Value
Library libstdc++ (GNU)
Version 15.2.1
GCC Version 15.2.1 20251112
Header Path /usr/include/c++/15.2.1/

Compiler Options Used

Build Command

clang++ -std=c++20 -O2 -ftime-trace -c <file>.cpp -o <file>.o

Flag Breakdown

Flag Purpose
-std=c++20 Enable C++20 standard (required for <ranges>)
-O2 Optimization level 2 (production-like optimization)
-ftime-trace Generate Chrome trace JSON for compile-time analysis
-c Compile only (no linking)

Why These Options?

  1. -std=c++20: Required for std::ranges, std::views, and C++20 concepts
  2. -O2: Representative of real-world builds; enables inlining and optimizations that affect ranges performance
  3. -ftime-trace: Clang's built-in profiler for compile-time analysis; outputs JSON in Chrome Trace Format

Environment Variables

# No special environment variables set for compilation
# Using system defaults

File System

Property Value
Test Directory /tmp/ranges/
Filesystem tmpfs (RAM-backed)
I/O Impact Minimal (memory-speed I/O)

Versions Summary

OS:        Arch Linux (kernel 6.17.9)
Compiler:  clang++ 21.1.6
Stdlib:    libstdc++ 15.2.1 (GCC 15.2.1)
Standard:  C++20

Reproducibility

To reproduce this analysis:

# 1. Install dependencies (Arch Linux)
sudo pacman -S clang gcc

# 2. Create test files (see report.md Appendix B)

# 3. Compile with tracing
clang++ -std=c++20 -O2 -ftime-trace -c single_loop.cpp -o single_loop.o
clang++ -std=c++20 -O2 -ftime-trace -c single_ranges.cpp -o single_ranges.o

# 4. Analyze JSON traces
# Open .json files in chrome://tracing or use Python scripts

Notes

  • Clang 21.1.6 is a bleeding-edge version; results may vary with older compilers
  • libstdc++ 15.2.1 provides the <ranges> implementation being tested
  • Compile times are measured on a high-performance desktop; results scale with CPU speed
  • Tests run on tmpfs eliminate disk I/O as a variable

Platform specification for std::ranges compile-time analysis

// single_loop.cpp - ONE pipeline, no template variation
#include <vector>
int single_loop(const std::vector<int>& v) {
int sum = 0;
for (int x : v) {
if (x % 2 != 0) continue;
const int a = x * 2;
const int b = a + 3;
if (b <= 100) continue;
sum += b;
}
return sum;
}
// single_ranges.cpp - ONE pipeline, no template variation
#include <vector>
#include <ranges>
int single_ranges(const std::vector<int>& v) {
auto rng =
v
| std::views::filter([](int x) { return x % 2 == 0; })
| std::views::transform([](int x) { return x * 2; })
| std::views::transform([](int x) { return x + 3; })
| std::views::filter([](int x) { return x > 100; });
int sum = 0;
for (int x : rng) {
sum += x;
}
return sum;
}
@scc-tw
Copy link
Author

scc-tw commented Nov 30, 2025

not reviewed yet:

"""

Comprehensive Analysis: std::ranges Compile-Time Cost

Report Generated from clang++ -ftime-trace
Compiler: clang++ with -std=c++20 -O2
Date: 2024


Table of Contents

  1. Executive Summary
  2. Test Files Description
  3. Methodology
  4. Raw Measurements
  5. Header Analysis
  6. Template Instantiation Analysis
  7. Optimizer Pass Analysis
  8. Cost Model Derivation
  9. Conclusions & Recommendations
  10. Appendix A: Source File References
  11. Appendix B: Source Code
  12. Appendix C: Data Files

1. Executive Summary

KEY FINDING: std::ranges incurs a ~267ms FIXED compile-time tax per translation unit, with only ~5ms marginal cost per additional pipeline.

Cost Breakdown

Component Time Percentage
Header parsing 154ms 58%
Type instantiation 40ms 15%
Optimizer passes 22ms 8%
Backend codegen 22ms 8%
Other overhead 29ms 11%
Total Fixed Cost 267ms 100%

Key Metrics

Metric Value
Fixed cost (first #include <ranges>) ~267ms
Marginal cost (each additional pipeline) ~5ms
Overhead ratio (ranges/loop) 3.1x

2. Test Files Description

File Description
single_loop.cpp Single for-loop with filter/transform logic
single_ranges.cpp Single ranges pipeline: filter|transform|transform|filter
multi_ranges.cpp 4 independent ranges pipelines
big_loop.cpp Template-parameterized loops with N=0..3
big_ranges.cpp Template-parameterized ranges with N=0..3

3. Methodology

Data Collection

clang++ -std=c++20 -O2 -ftime-trace -c <file>.cpp -o <file>.o

Output: <file>.json (Chrome trace format)

Trace Event Types Analyzed

Event Type Description
Source (cat) Header file inclusion timing
ParseDeclarationOrFunctionDefinition Declaration parsing with file:line
ParseClass Class definition parsing
InstantiateClass Template class instantiation
InstantiateFunction Template function instantiation
*Pass LLVM optimization passes
Total * Aggregated totals by category

4. Raw Measurements

Compilation Times

File Total Frontend Backend Optimizer
single_loop 125.0ms 117.6ms 6.2ms 3.2ms
single_ranges 391.9ms 360.7ms 28.7ms 25.6ms
multi_ranges 407.5ms 358.2ms 46.9ms 39.4ms
big_loop 137.8ms 119.7ms 16.7ms 10.5ms
big_ranges 595.8ms 501.3ms 91.3ms 82.0ms

Derived Metrics

Metric Value
Fixed cost (<ranges> overhead) 266.9ms
Marginal cost (per pipeline) 5.2ms
Overhead ratio (ranges/loop) 3.1x

5. Header Analysis

5.1 Header Inclusion Comparison

Category Count
Common headers (both include) 36
Ranges-only headers 43

5.2 Headers Unique to <ranges>

array                  ios_base.h            span
atomicity.h            iosfwd                stdexcept
basic_string.h         iterator              stdio.h
basic_string.tcc       locale_classes.h      stdlib.h
cctype                 locale_classes.tcc    stream_iterator.h
char_traits.h          localefwd.h           streambuf
cstdio                 optional              streambuf.tcc
cstdlib                postypes.h            streambuf_iterator.h
ctype.h                pthread.h             string
cwchar                 ranges                string_conversions.h
enable_special_members.h  sched.h            string_view
exception              span                  string_view.tcc
exception_ptr.h                              system_error
gthr-default.h                               time.h
gthr.h                                       typeinfo
                                             types.h
                                             wchar.h

5.3 Header Dependency Chain

<ranges> includes (directly or transitively):

#include <ranges>
#include <iterator>
#include <concepts>
#include <compare>
#include <optional>
#include <span>
#include <tuple>
#include <string_view>
#include <string>

6. Template Instantiation Analysis

6.1 Class Instantiations - single_ranges.cpp

Duration Type
7.1ms std::reverse_iterator<std::_Bit_iterator>
5.3ms std::reverse_iterator<std::_Bit_const_iterator>
5.3ms std::ranges::filter_view<std::ranges::ref_view<const std::vector<int>>...
3.1ms std::iterator_traits<std::ranges::filter_view<std::ranges::ref_view<...
2.3ms std::ranges::filter_view<std::ranges::transform_view<std::ranges::tr...
2.0ms std::basic_string<char>
1.7ms std::basic_string<char32_t>
1.7ms std::basic_string<char16_t>
1.7ms std::basic_string<wchar_t>
1.6ms std::basic_string<char8_t>
1.4ms std::ranges::transform_view<std::ranges::filter_view<std::ranges::re...
1.4ms std::ranges::transform_view<std::ranges::transform_view<std::ranges:...
1.0ms std::vector<int>

6.2 Function Instantiations - single_ranges.cpp

Duration Type
6.8ms std::ranges::filter_view<...>::filter_view
6.0ms std::ranges::__access::_Begin::operator()<std::ranges::filter_view<...
5.8ms std::ranges::filter_view<std::ranges::ref_view<const std::vector<int>>...
3.9ms std::basic_string<wchar_t>::_M_construct<const char *>
3.3ms std::basic_string<wchar_t>::_S_copy_chars<const char *>
1.8ms std::ranges::__find_if_fn::operator()<std::ranges::transform_view<st...
1.3ms std::ranges::__find_if_fn::operator()<__gnu_cxx::__normal_iterator<c...

6.3 Ranges-Specific View Instantiations

Category Events Time
View class instantiations 12 18.2ms
View function instantiations 5 21.7ms
Total view machinery 17 39.9ms

6.4 View Type Nesting Analysis

The ranges pipeline filter | transform | transform | filter creates this nested type:

filter_view<
  transform_view<
    transform_view<
      filter_view<
        ref_view<vector<int>>,
        lambda#1                    // single_ranges.cpp:8:28
      >,
      lambda#2                      // single_ranges.cpp:9:31
    >,
    lambda#3                        // single_ranges.cpp:10:31
  >,
  lambda#4                          // single_ranges.cpp:11:28
>

Each view adapter adds another layer of template nesting.

6.5 View Instantiation Details (with exact source locations)

Duration Type Lambda Location
5.3ms filter_view<ref_view<vector<int>>, λ> single_ranges.cpp:8:28
1.4ms transform_view<filter_view<...>, λ> single_ranges.cpp:9:31
1.4ms transform_view<transform_view<...>, λ> single_ranges.cpp:10:31
2.3ms filter_view<transform_view<...>, λ> single_ranges.cpp:11:28

7. Optimizer Pass Analysis

7.1 Top Optimizer Passes - single_ranges.cpp

Duration Pass Name Detail
21.0ms ModuleInlinerWrapperPass [module]
20.9ms ModuleToPostOrderCGSCCPassAdaptor [module]
2.4ms DevirtSCCRepeatedPass (single_ranges)
2.4ms PassManager<LazyCallGraph::SCC, ...> (single_ranges)
2.3ms CodeGenPasses
1.8ms CGSCCToFunctionPassAdaptor (single_ranges)
1.8ms PassManager<Function> single_ranges
1.5ms ModuleToFunctionPassAdaptor [module]
1.4ms DevirtSCCRepeatedPass (filter_view)
1.4ms PassManager<LazyCallGraph::SCC, ...> (filter_view)

7.2 Optimizer Overhead Comparison

Metric Time
Loop optimizer time 3.2ms
Ranges optimizer time 25.6ms
Overhead +22.4ms (8.0x)

The optimizer must work harder on ranges code because:

  • More template instantiations to inline
  • Deeper call chains (view iterators)
  • More abstract types to devirtualize

8. Cost Model Derivation

8.1 Fixed Cost Analysis

Comparing single_loop.cpp vs single_ranges.cpp:

Category Loop Ranges Delta
Frontend 117.6ms 360.7ms +243.0ms
Backend 6.2ms 28.7ms +22.5ms
Optimizer 3.2ms 25.6ms +22.4ms
Source 114.9ms 269.0ms +154.1ms
InstantiateClass 16.2ms 70.8ms +54.6ms
InstantiateFunction 3.8ms 54.9ms +51.1ms

8.2 Marginal Cost Analysis

Comparing single_ranges.cpp (1 pipeline) vs multi_ranges.cpp (4 pipelines):

Metric Value
1 pipeline 391.9ms
4 pipelines 407.5ms
Delta (3 pipelines) +15.6ms
Per pipeline 5.2ms

8.3 Final Cost Model

T(N pipelines) = T_baseline + T_fixed + (N-1) × T_marginal

Where:

  • T_baseline = 125ms (compiling equivalent loop code)
  • T_fixed = 267ms (first use of <ranges>)
  • T_marginal = 5.2ms (each additional pipeline)

Validation

Pipelines Formula Expected Actual
1 125 + 267 + 0×5 392ms 392ms
4 125 + 267 + 3×5 408ms 408ms
10 125 + 267 + 9×5 ~439ms (predicted)
50 125 + 267 + 49×5 ~647ms (predicted)

9. Conclusions & Recommendations

9.1 Key Findings

  1. HEADER PARSING DOMINATES (58% of overhead)

    • <ranges> pulls in 43 additional headers
    • Includes <optional>, <span>, <tuple>, <string_view>...
    • This is a ONE-TIME cost per TU
  2. VIEW MACHINERY IS REASONABLE (15% of overhead)

    • filter_view + transform_view instantiation: ~40ms
    • Most cost is in first instantiation (cached thereafter)
  3. MARGINAL COST IS LOW (~5ms per pipeline)

    • Template caching works well
    • Only new lambda types require full instantiation
  4. OPTIMIZER OVERHEAD IS MODEST (8% of overhead)

    • ~22ms extra for inlining view iterators
    • Scales linearly with pipeline count

9.2 Recommendations

USE RANGES WHEN:

  • Project already has heavy template usage
  • PCH (precompiled headers) is enabled
  • TU has many pipelines (amortizes fixed cost)
  • Readability/maintainability is priority

AVOID RANGES WHEN:

  • Header-only library (cost paid by every includer)
  • Small utility with 1-2 simple loops
  • Build time is critical constraint
  • Compile-time constrained environments (embedded)

9.3 Mitigation Strategies

  1. PRECOMPILED HEADERS

    • Include <ranges> in PCH
    • Eliminates 154ms header parsing per TU
  2. FORWARD DECLARATIONS

    • Avoid #include <ranges> in headers
    • Include only in .cpp files
  3. MODULE SYSTEM (C++20)

    • import std; or import std.ranges;
    • Potentially eliminates repeated parsing

Appendix A: Source File References

A.1 Parse Events with File:Line

Events from ParseDeclarationOrFunctionDefinition with duration >500μs:

Duration File:Line
69.4ms single_ranges.cpp:5:1
8.8ms type_traits:68:1
4.1ms bits/max_size_type.h:61:5
3.0ms stdlib.h:34:1
2.7ms bits/basic_string.h:5003:5
2.7ms bits/basic_string.h:5008:5
2.6ms bits/max_size_type.h:442:5
2.6ms bits/basic_string.h:4997:5
2.4ms system_error:558:3
2.2ms bits/basic_string.h:4813:3
2.2ms bits/basic_string.h:4453:3
2.1ms bits/basic_string.h:4529:3
2.0ms stdio.h:30:1
1.9ms bits/basic_string.h:4724:3
1.6ms bits/ios_base.h:265:3
1.6ms pthread.h:197:1
1.5ms new:56:1
1.4ms compare:663:5
1.4ms bits/locale_classes.h:559:3

A.2 Full View Type Signatures

// 5.3ms - First filter_view
std::ranges::filter_view<
  std::ranges::ref_view<const std::vector<int>>,
  (lambda at single_ranges.cpp:8:28)
>

// 3.1ms - Iterator traits for first filter
std::iterator_traits<
  std::ranges::filter_view<
    std::ranges::ref_view<const std::vector<int>>,
    (lambda at single_ranges.cpp:8:28)
  >::_Iterator
>

// 1.4ms - First transform_view
std::ranges::transform_view<
  std::ranges::filter_view<
    std::ranges::ref_view<const std::vector<int>>,
    (lambda at single_ranges.cpp:8:28)
  >,
  (lambda at single_ranges.cpp:9:31)
>

// 1.4ms - Second transform_view
std::ranges::transform_view<
  std::ranges::transform_view<
    std::ranges::filter_view<
      std::ranges::ref_view<const std::vector<int>>,
      (lambda at single_ranges.cpp:8:28)
    >,
    (lambda at single_ranges.cpp:9:31)
  >,
  (lambda at single_ranges.cpp:10:31)
>

// 2.3ms - Final filter_view (outermost, most expensive)
std::ranges::filter_view<
  std::ranges::transform_view<
    std::ranges::transform_view<
      std::ranges::filter_view<
        std::ranges::ref_view<const std::vector<int>>,
        (lambda at single_ranges.cpp:8:28)
      >,
      (lambda at single_ranges.cpp:9:31)
    >,
    (lambda at single_ranges.cpp:10:31)
  >,
  (lambda at single_ranges.cpp:11:28)
>

Appendix B: Source Code

see above.


Appendix C: Data Files

Trace Files Generated

File Description
/tmp/ranges/single_loop.json Baseline loop trace
/tmp/ranges/single_ranges.json Single pipeline trace
/tmp/ranges/multi_ranges.json 4-pipeline trace
/tmp/ranges/big_loop.json Original NTTP loop trace
/tmp/ranges/big_ranges.json Original NTTP ranges trace

Object Files

File Size
single_loop.o 2.1 KB
single_ranges.o 2.5 KB
multi_ranges.o 3.8 KB
big_loop.o 3.2 KB
big_ranges.o 4.1 KB

Summary

The std::ranges library provides elegant, composable pipelines but comes with a measurable compile-time cost:

Aspect Cost
Fixed overhead ~267ms per TU
Marginal overhead ~5ms per pipeline
Primary cause Header parsing (58%)
Secondary cause View type instantiation (15%)

Bottom line: For projects with many pipelines or using PCH, the cost is amortized and acceptable. For header-only libraries or small utilities, traditional loops may be preferable.


Report generated using clang++ -ftime-trace analysis
"""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment