@kebien6020
Last active June 30, 2023 04:01
Benchmark testing vector vs list for different sized elements
// Compile:
// g++ -O3 -std=c++20 main.cpp -lbenchmark -lpthread -o main
// (library flags go after the source file so the linker can resolve symbols)
// You need Google Benchmark installed to compile: https://github.com/google/benchmark
// Run:
// ./main
#include <benchmark/benchmark.h>
#include <array>
#include <iterator>
#include <list>
#include <memory>
#include <random>
#include <ranges>
#include <vector>
using std::advance;
using std::array;
using std::begin;
using std::list;
using std::make_unique;
using std::mt19937;
using std::ssize;
using std::uniform_int_distribution;
using std::unique_ptr;
using std::vector;
using std::views::iota;
template <int N>
using Blob = array<char, N>;
auto gen = mt19937{42}; // fixed seed for reproducible results
auto dist = uniform_int_distribution{};
auto rand_idx(int size) -> int {
    using D = decltype(dist);
    return dist(gen, D::param_type{0, size - 1});
}
template <int N> // N is the size of each element
static void VectorInsert(benchmark::State& state) {
    auto vec = vector<Blob<N>>(100);
    for (auto _ : state) {
        for (auto _ : iota(0, state.range(0))) {
            auto pos = begin(vec);
            advance(pos, rand_idx(ssize(vec)));
            // Inserting instead of emplacing would incur an additional copy or a
            // move. For the Blob type (array under the hood) a move is exactly
            // the same as a copy.
            vec.emplace(pos);
        }
    }
    benchmark::DoNotOptimize(vec); // Makes the compiler assume the vector's contents are used in some way
}
BENCHMARK_TEMPLATE(VectorInsert, 8)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(VectorInsert, 64)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(VectorInsert, 512)->Range(8, 8 << 10);
template <int N>
static void ListInsert(benchmark::State& state) {
    auto lst = list<Blob<N>>(100);
    for (auto _ : state) {
        for (auto _ : iota(0, state.range(0))) {
            auto pos = begin(lst);
            advance(pos, rand_idx(ssize(lst))); // O(n) for lists, O(1) for vectors
            lst.emplace(pos);
        }
    }
    benchmark::DoNotOptimize(lst);
}
BENCHMARK_TEMPLATE(ListInsert, 8)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(ListInsert, 64)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(ListInsert, 512)->Range(8, 8 << 10);
template <int N>
static void VecPtrInsert(benchmark::State& state) {
    auto vec = vector<unique_ptr<Blob<N>>>(100);
    for (auto _ : state) {
        for (auto _ : iota(0, state.range(0))) {
            auto pos = begin(vec);
            advance(pos, rand_idx(ssize(vec)));
            vec.emplace(pos, make_unique<Blob<N>>());
        }
    }
    benchmark::DoNotOptimize(vec);
}
BENCHMARK_TEMPLATE(VecPtrInsert, 8)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(VecPtrInsert, 64)->Range(8, 8 << 10);
BENCHMARK_TEMPLATE(VecPtrInsert, 512)->Range(8, 8 << 10);
BENCHMARK_MAIN();
kebien6020 commented Jun 30, 2023

Results

[Charts: time vs. number of insertions for 8-byte, 64-byte, and 512-byte elements]

As expected, list is mostly trash even at larger element sizes. No idea what's happening with VectorInsert<512>/4096; maybe it hit bad RNG with the insertion indices, since VectorInsert<512>/8192 is strictly more work no matter how you look at it. You might want to test with a different random seed.

What surprised me the most is that VectorInsert didn't really outperform VecPtrInsert at 8-byte elements. Maybe with a more chaotic memory allocation pattern the vector of pointers would become less predictable while the vector of elements keeps its predictable performance, but I'm not sure. One important caveat is that we're not doing a linear search in any of these cases; we just pick an index and insert there. With a linear search we would have to chase each pointer in the vector up to the insertion point in order to inspect the contents, and in that case the plain vector would probably outperform the vector of pointers.

[Charts: only vector, only list, only vector of pointers]

As you might expect, neither the linked list nor the vector of pointers really cares about the size of the elements, and both appear to scale more or less linearly with the number of insertions; the linked list just has a much steeper slope. The plain vector does struggle as the element size increases.

Raw output

Run on (4 X 3100.14 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.15, 0.59, 0.82
-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
VectorInsert<8>/8           74545 ns        74495 ns        15717
VectorInsert<8>/64         782783 ns       782281 ns         2530
VectorInsert<8>/512       8292463 ns      8286817 ns          411
VectorInsert<8>/4096     88339220 ns     88211642 ns           67
VectorInsert<8>/8192     75936176 ns     75838765 ns           15
VectorInsert<64>/8         411961 ns       411397 ns        10000
VectorInsert<64>/64       2617072 ns      2613818 ns         1000
VectorInsert<64>/512     16870942 ns     16848009 ns          100
VectorInsert<64>/4096   106039178 ns    105905133 ns           10
VectorInsert<64>/8192   429650398 ns    429225193 ns           10
VectorInsert<512>/8        781486 ns       779554 ns         2123
VectorInsert<512>/64      8541715 ns      8517097 ns          326
VectorInsert<512>/512    93736835 ns     93453340 ns           50
VectorInsert<512>/4096 1097361294 ns   1093809941 ns            8
VectorInsert<512>/8192  772711384 ns    770818075 ns            2
ListInsert<8>/8           2266679 ns      2264575 ns        11044
ListInsert<8>/64         11301978 ns     11291161 ns          859
ListInsert<8>/512        83774594 ns     83714394 ns          100
ListInsert<8>/4096      533302633 ns    532842145 ns           10
ListInsert<8>/8192     1073722640 ns   1072725718 ns            5
ListInsert<64>/8          1313283 ns      1310716 ns         5082
ListInsert<64>/64        12544258 ns     12436616 ns          811
ListInsert<64>/512       89133467 ns     88692786 ns          100
ListInsert<64>/4096     528846351 ns    527814077 ns           10
ListInsert<64>/8192    1069490810 ns   1066938751 ns            5
ListInsert<512>/8         1041730 ns      1035414 ns         4808
ListInsert<512>/64       12827315 ns     12777015 ns          899
ListInsert<512>/512      95470433 ns     94790813 ns          100
ListInsert<512>/4096    573666184 ns    571230090 ns           10
ListInsert<512>/8192   1111161562 ns   1108023310 ns            5
VecPtrInsert<8>/8          111164 ns       111021 ns        10000
VecPtrInsert<8>/64         709260 ns       708513 ns         1000
VecPtrInsert<8>/512       7014860 ns      7005140 ns          153
VecPtrInsert<8>/4096     70104617 ns     70022264 ns           24
VecPtrInsert<8>/8192    116833191 ns    116664935 ns           10
VecPtrInsert<64>/8         111792 ns       111652 ns        10000
VecPtrInsert<64>/64        720380 ns       718173 ns         1000
VecPtrInsert<64>/512      7089906 ns      7078501 ns          155
VecPtrInsert<64>/4096    67241787 ns     67128039 ns           23
VecPtrInsert<64>/8192   117147299 ns    116983334 ns           10
VecPtrInsert<512>/8        112072 ns       111842 ns        10000
VecPtrInsert<512>/64       710448 ns       709179 ns         1000
VecPtrInsert<512>/512     7080990 ns      7067280 ns          155
VecPtrInsert<512>/4096   70184347 ns     70025292 ns           24
VecPtrInsert<512>/8192  117501952 ns    117158188 ns           10
