Skip to content

Instantly share code, notes, and snippets.

@johngray1965
Created December 5, 2025 13:23
Show Gist options
  • Select an option

  • Save johngray1965/5553e030b9a61ae095802a4b5956d04d to your computer and use it in GitHub Desktop.

Select an option

Save johngray1965/5553e030b9a61ae095802a4b5956d04d to your computer and use it in GitHub Desktop.
# From 981ns to 82ns: A UUIDv7 Optimization Story
I recently read an article on how the random nature of standard UUIDs can kill database performance due to index fragmentation. In the comments, someone mentioned UUIDv7 was designed to solve that very problem by being time-ordered and sortable.
Intrigued, I did a Google search for "kotlin uuidv7" and found an open-source implementation. I was currently using the standard `UUID.randomUUID()` (v4) combined with the `fast-uuid` library for fast string conversion. I swapped in the new UUIDv7 library, ran my benchmarks, and was immediately disappointed. It was slower.
But the library was open-source. And that’s where our story begins. What followed was a deep dive into the guts of UUID generation, peeling back layers of abstraction and overhead to chase every last nanosecond. This is the journey of how we made a UUIDv7 generator **~12x faster** than the native `java.util.UUID` on Android.
### The Baseline: Where We Started
All benchmarks were run on a Pixel 10 Pro using Jetpack Microbenchmark, giving us stable, real-world numbers on the Android Runtime (ART).
The initial situation was bleak. Our "new and improved" ID was slower than the problem it was meant to solve. Profiling was the only way forward.
| Implementation | Time (Total) | Speedup |
| :--- |:-------------|:------------------|
| Initial UUIDv7 | ~330 ns | **Slower** |
| `java.util.UUID` (v4) | 981 ns | 1x (Baseline) |
### The Optimization Journey
#### Round 1: The Obvious Culprit (`SecureRandom`)
The first profile immediately pointed to the entropy source. The library was using `java.security.SecureRandom`. While essential for cryptography, it's incredibly slow for generating simple IDs due to OS-level entropy gathering and thread synchronization.
* **The Fix:** Swap `SecureRandom` for `java.util.concurrent.ThreadLocalRandom`.
* **The Result:** Performance instantly jumped from ~330ns to **~95ns**. A huge win, proving that for non-security use cases, the choice of random number generator (RNG) is critical.
#### Round 2: Chasing Allocations
At 95ns, the profiler started showing "death by a thousand cuts." The code was creating intermediate `ByteArray` and `Pair` objects. Each allocation adds GC pressure and slows down execution in a tight loop.
* **The Fix:** Refactor the generator to work directly on primitive `Long`s, eliminating the `ByteArray` and `Pair`.
* **The Result:** We shaved off another ~40ns, bringing the generation time down to **~55ns**.
#### Round 3: The `toString()` Bottleneck
With generation optimized, the new bottleneck was converting the UUID to a `String`. A benchmark of the native `java.util.UUID.toString()` on Android revealed a shocking truth: it took **~675ns** on its own.
Why? Unlike the desktop JVM which uses a fast native intrinsic, the AOSP implementation is often pure Java and creates **at least 7 temporary objects** (5 strings for the parts, a `StringBuilder`, and the final `String`) for every single call.
* **The Fix:** Write a custom `fastUuidString` that uses a pre-allocated `CharArray` from a `ThreadLocal`. This avoids all intermediate allocations.
* **The Result:** Our custom formatting logic clocked in at a mere **~48ns**. A **14x speedup** over the native method.
#### Round 4: The Final Squeeze
The last step was pure micro-optimization. The profiler showed that even the 32 `Long` to `Int` conversions inside the formatting function were adding up.
* **The Fix:** Split the two 64-bit `Long`s (`msb`, `lsb`) into four 32-bit `Int`s *once* at the start of the function. All subsequent bit-shifting operates on the faster `Int` primitives.
* **The Result:** This shaved off the final few nanoseconds, giving us our final, blazing-fast implementation.
### The Final Code
Here is the culmination of that journey. It combines a single `AtomicLong` for monotonic state, zero-allocation generation logic, and the hyper-optimized `toString` implementation.
kotlin @Suppress("MagicNumber") object UUIDv7 { private val HEX_DIGITS = charArrayOf('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f')
private val state = AtomicLong(0L)
private val threadLocalBuffer =
object : ThreadLocal<CharArray>() {
override fun initialValue() = CharArray(36)
}
fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }
private inline fun <T> generate(block: (Long, Long) -> T): T {
val provider = ThreadLocalRandom.current()
val now = System.currentTimeMillis()
var ts: Long
var newState: Long
while (true) {
val current = state.get()
val lastTs = current ushr 16
val lastSeq = current and 0xFFF
ts = if (now >= lastTs) now else lastTs
val sequenceOverflow = lastSeq == 0xFFFL
val needsRandom = (ts != lastTs) || sequenceOverflow
val nextSeq =
if (needsRandom) {
if (sequenceOverflow && ts == lastTs) {
ts++
}
provider.nextInt(4096).toLong()
} else {
lastSeq + 1
}
newState = (ts shl 16) or 0x7000L or nextSeq
if (state.compareAndSet(current, newState)) {
break
}
}
val msb = newState
val randomLow = provider.nextLong()
val lsb = (randomLow ushr 2) or Long.MIN_VALUE
return block(msb, lsb)
}
private val state = AtomicLong(0L)
private val threadLocalBuffer =
object : ThreadLocal<CharArray>() {
override fun initialValue() = CharArray(36)
}
fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }
private inline fun <T> generate(block: (Long, Long) -> T): T {
val provider = ThreadLocalRandom.current()
val now = System.currentTimeMillis()
var ts: Long
var newState: Long
while (true) {
val current = state.get()
val lastTs = current ushr 16
val lastSeq = current and 0xFFF
ts = if (now >= lastTs) now else lastTs
val sequenceOverflow = lastSeq == 0xFFFL
val needsRandom = (ts != lastTs) || sequenceOverflow
val nextSeq =
if (needsRandom) {
if (sequenceOverflow && ts == lastTs) {
ts++
}
provider.nextInt(4096).toLong()
} else {
lastSeq + 1
}
newState = (ts shl 16) or 0x7000L or nextSeq
if (state.compareAndSet(current, newState)) {
break
}
}
val msb = newState
val randomLow = provider.nextLong()
val lsb = (randomLow ushr 2) or Long.MIN_VALUE
return block(msb, lsb)
}
fun fastUuidString(msb: Long, lsb: Long): String {
val buf = threadLocalBuffer.get()!!
val digits = HEX_DIGITS
val msbHi = (msb ushr 32).toInt()
val msbLo = msb.toInt()
val lsbHi = (lsb ushr 32).toInt()
val lsbLo = lsb.toInt()
buf[0] = digits[(msbHi ushr 28) and 0xF]
buf[1] = digits[(msbHi ushr 24) and 0xF]
buf[2] = digits[(msbHi ushr 20) and 0xF]
buf[3] = digits[(msbHi ushr 16) and 0xF]
buf[4] = digits[(msbHi ushr 12) and 0xF]
buf[5] = digits[(msbHi ushr 8) and 0xF]
buf[6] = digits[(msbHi ushr 4) and 0xF]
buf[7] = digits[msbHi and 0xF]
buf[8] = '-'
buf[9] = digits[(msbLo ushr 28) and 0xF]
buf[10] = digits[(msbLo ushr 24) and 0xF]
buf[11] = digits[(msbLo ushr 20) and 0xF]
buf[12] = digits[(msbLo ushr 16) and 0xF]
buf[13] = '-'
buf[14] = digits[(msbLo ushr 12) and 0xF]
buf[15] = digits[(msbLo ushr 8) and 0xF]
buf[16] = digits[(msbLo ushr 4) and 0xF]
buf[17] = digits[msbLo and 0xF]
buf[18] = '-'
buf[19] = digits[(lsbHi ushr 28) and 0xF]
buf[20] = digits[(lsbHi ushr 24) and 0xF]
buf[21] = digits[(lsbHi ushr 20) and 0xF]
buf[22] = digits[(lsbHi ushr 16) and 0xF]
buf[23] = '-'
buf[24] = digits[(lsbHi ushr 12) and 0xF]
buf[25] = digits[(lsbHi ushr 8) and 0xF]
buf[26] = digits[(lsbHi ushr 4) and 0xF]
buf[27] = digits[lsbHi and 0xF]
buf[28] = digits[(lsbLo ushr 28) and 0xF]
buf[29] = digits[(lsbLo ushr 24) and 0xF]
buf[30] = digits[(lsbLo ushr 20) and 0xF]
buf[31] = digits[(lsbLo ushr 16) and 0xF]
buf[32] = digits[(lsbLo ushr 12) and 0xF]
buf[33] = digits[(lsbLo ushr 8) and 0xF]
buf[34] = digits[(lsbLo ushr 4) and 0xF]
buf[35] = digits[lsbLo and 0xF]
return String(buf)
}
}
### The Final Tally
| Implementation | Time (Total) | Allocations | Speedup |
| :--- |:-------------|:------------|:------------------|
| **Legere UUIDv7** | **82.0 ns** | **1** | **~11.9x Faster** |
| `java.util.UUID` (v4) | 981 ns | ~20 | 1x (Baseline) |
From a disappointing start to a result that's an order of magnitude faster than the platform default, this journey was a powerful reminder: for performance-critical code, you have to measure, profile on your target platform, and never be afraid to challenge the "standard" way of doing things.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment