johngray1965 · December 5, 2025 13:23
diff --git a/gistfile1.txt b/gistfile1.txt
 # From 981ns to 82ns: A UUIDv7 Optimization Story

 I recently read an article on how the random nature of standard UUIDs can kill database performance due to index fragmentation. In the comments, someone mentioned UUIDv7 was designed to solve that very problem by being time-ordered and sortable.

 Intrigued, I did a Google search for "kotlin uuidv7" and found an open-source implementation. I was currently using the standard `UUID.randomUUID()` (v4) combined with the `fast-uuid` library for fast string conversion. I swapped in the new UUIDv7 library, ran my benchmarks, and was immediately disappointed. It was slower.

 But the library was open-source. And that’s where our story begins. What followed was a deep dive into the guts of UUID generation, peeling back layers of abstraction and overhead to chase every last nanosecond. This is the journey of how we made a UUIDv7 generator **~12x faster** than the native `java.util.UUID` on Android.

 ### The Baseline: Where We Started

 All benchmarks were run on a Pixel 10 Pro using Jetpack Microbenchmark, giving us stable, real-world numbers on the Android Runtime (ART).

 The initial situation was bleak. Our "new and improved" ID was slower than the problem it was meant to solve. Profiling was the only way forward.

 | Implementation | Time (Total) | Speedup |
 | :--- |:-------------|:------------------|
 | Initial UUIDv7 | ~330 ns | **Slower** |
 | `java.util.UUID` (v4) | 981 ns | 1x (Baseline) |

 ### The Optimization Journey

 #### Round 1: The Obvious Culprit (`SecureRandom`)

 The first profile immediately pointed to the entropy source. The library was using `java.security.SecureRandom`. While essential for cryptography, it's incredibly slow for generating simple IDs due to OS-level entropy gathering and thread synchronization.

 *   **The Fix:** Swap `SecureRandom` for `java.util.concurrent.ThreadLocalRandom`.
 *   **The Result:** Performance instantly jumped from ~330ns to **~95ns**. A huge win, proving that for non-security use cases, the choice of random number generator (RNG) is critical.

 #### Round 2: Chasing Allocations

 At 95ns, the profiler started showing "death by a thousand cuts." The code was creating intermediate `ByteArray` and `Pair` objects. Each allocation adds GC pressure and slows down execution in a tight loop.

 *   **The Fix:** Refactor the generator to work directly on primitive `Long`s, eliminating the `ByteArray` and `Pair`.
 *   **The Result:** We shaved off another ~40ns, bringing the generation time down to **~55ns**.

 #### Round 3: The `toString()` Bottleneck

 With generation optimized, the new bottleneck was converting the UUID to a `String`. A benchmark of the native `java.util.UUID.toString()` on Android revealed a shocking truth: it took **~675ns** on its own.

 Why? Unlike the desktop JVM which uses a fast native intrinsic, the AOSP implementation is often pure Java and creates **at least 7 temporary objects** (5 strings for the parts, a `StringBuilder`, and the final `String`) for every single call.

 *   **The Fix:** Write a custom `fastUuidString` that uses a pre-allocated `CharArray` from a `ThreadLocal`. This avoids all intermediate allocations.
 *   **The Result:** Our custom formatting logic clocked in at a mere **~48ns**. A **14x speedup** over the native method.

 #### Round 4: The Final Squeeze

 The last step was pure micro-optimization. The profiler showed that even the 32 `Long` to `Int` conversions inside the formatting function were adding up.

 *   **The Fix:** Split the two 64-bit `Long`s (`msb`, `lsb`) into four 32-bit `Int`s *once* at the start of the function. All subsequent bit-shifting operates on the faster `Int` primitives.
 *   **The Result:** This shaved off the final few nanoseconds, giving us our final, blazing-fast implementation.

 ### The Final Code

 Here is the culmination of that journey. It combines a single `AtomicLong` for monotonic state, zero-allocation generation logic, and the hyper-optimized `toString` implementation.
 kotlin @Suppress("MagicNumber") object UUIDv7 { private val HEX_DIGITS = charArrayOf('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f')
 private val state = AtomicLong(0L)

 private val threadLocalBuffer =
    object : ThreadLocal<CharArray>() {
        override fun initialValue() = CharArray(36)
    }

 fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }

 private inline fun <T> generate(block: (Long, Long) -> T): T {
    val provider = ThreadLocalRandom.current()
    val now = System.currentTimeMillis()
    var ts: Long
    var newState: Long

    while (true) {
        val current = state.get()
        val lastTs = current ushr 16
        val lastSeq = current and 0xFFF
        ts = if (now >= lastTs) now else lastTs

        val sequenceOverflow = lastSeq == 0xFFFL
        val needsRandom = (ts != lastTs) || sequenceOverflow

        val nextSeq =
            if (needsRandom) {
                if (sequenceOverflow && ts == lastTs) {
                    ts++
                }
                provider.nextInt(4096).toLong()
            } else {
                lastSeq + 1
            }

        newState = (ts shl 16) or 0x7000L or nextSeq
        if (state.compareAndSet(current, newState)) {
            break
        }
    }

    val msb = newState
    val randomLow = provider.nextLong()
    val lsb = (randomLow ushr 2) or Long.MIN_VALUE
    return block(msb, lsb)
 }
 private val state = AtomicLong(0L)

 private val threadLocalBuffer =
    object : ThreadLocal<CharArray>() {
        override fun initialValue() = CharArray(36)
    }

 fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }

 private inline fun <T> generate(block: (Long, Long) -> T): T {
    val provider = ThreadLocalRandom.current()
    val now = System.currentTimeMillis()
    var ts: Long
    var newState: Long

    while (true) {
        val current = state.get()
        val lastTs = current ushr 16
        val lastSeq = current and 0xFFF
        ts = if (now >= lastTs) now else lastTs

        val sequenceOverflow = lastSeq == 0xFFFL
        val needsRandom = (ts != lastTs) || sequenceOverflow

        val nextSeq =
            if (needsRandom) {
                if (sequenceOverflow && ts == lastTs) {
                    ts++
                }
                provider.nextInt(4096).toLong()
            } else {
                lastSeq + 1
            }

        newState = (ts shl 16) or 0x7000L or nextSeq
        if (state.compareAndSet(current, newState)) {
            break
        }
    }

    val msb = newState
    val randomLow = provider.nextLong()
    val lsb = (randomLow ushr 2) or Long.MIN_VALUE
    return block(msb, lsb)
 }

 fun fastUuidString(msb: Long, lsb: Long): String {
    val buf = threadLocalBuffer.get()!!
    val digits = HEX_DIGITS

    val msbHi = (msb ushr 32).toInt()
    val msbLo = msb.toInt()
    val lsbHi = (lsb ushr 32).toInt()
    val lsbLo = lsb.toInt()

    buf[0] = digits[(msbHi ushr 28) and 0xF]
    buf[1] = digits[(msbHi ushr 24) and 0xF]
    buf[2] = digits[(msbHi ushr 20) and 0xF]
    buf[3] = digits[(msbHi ushr 16) and 0xF]
    buf[4] = digits[(msbHi ushr 12) and 0xF]
    buf[5] = digits[(msbHi ushr 8) and 0xF]
    buf[6] = digits[(msbHi ushr 4) and 0xF]
    buf[7] = digits[msbHi and 0xF]
    buf[8] = '-'
    buf[9] = digits[(msbLo ushr 28) and 0xF]
    buf[10] = digits[(msbLo ushr 24) and 0xF]
    buf[11] = digits[(msbLo ushr 20) and 0xF]
    buf[12] = digits[(msbLo ushr 16) and 0xF]
    buf[13] = '-'
    buf[14] = digits[(msbLo ushr 12) and 0xF]
    buf[15] = digits[(msbLo ushr 8) and 0xF]
    buf[16] = digits[(msbLo ushr 4) and 0xF]
    buf[17] = digits[msbLo and 0xF]
    buf[18] = '-'
    buf[19] = digits[(lsbHi ushr 28) and 0xF]
    buf[20] = digits[(lsbHi ushr 24) and 0xF]
    buf[21] = digits[(lsbHi ushr 20) and 0xF]
    buf[22] = digits[(lsbHi ushr 16) and 0xF]
    buf[23] = '-'
    buf[24] = digits[(lsbHi ushr 12) and 0xF]
    buf[25] = digits[(lsbHi ushr 8) and 0xF]
    buf[26] = digits[(lsbHi ushr 4) and 0xF]
    buf[27] = digits[lsbHi and 0xF]
    buf[28] = digits[(lsbLo ushr 28) and 0xF]
    buf[29] = digits[(lsbLo ushr 24) and 0xF]
    buf[30] = digits[(lsbLo ushr 20) and 0xF]
    buf[31] = digits[(lsbLo ushr 16) and 0xF]
    buf[32] = digits[(lsbLo ushr 12) and 0xF]
    buf[33] = digits[(lsbLo ushr 8) and 0xF]
    buf[34] = digits[(lsbLo ushr 4) and 0xF]
    buf[35] = digits[lsbLo and 0xF]

    return String(buf)
 }
 }


 ### The Final Tally

 | Implementation | Time (Total) | Allocations | Speedup |
 | :--- |:-------------|:------------|:------------------|
 | **Legere UUIDv7** | **82.0 ns** | **1** | **~11.9x Faster** |
 | `java.util.UUID` (v4) | 981 ns | ~20 | 1x (Baseline) |

 From a disappointing start to a result that's an order of magnitude faster than the platform default, this journey was a powerful reminder: for performance-critical code, you have to measure, profile on your target platform, and never be afraid to challenge the "standard" way of doing things.
	# From 981ns to 82ns: A UUIDv7 Optimization Story

	I recently read an article on how the random nature of standard UUIDs can kill database performance due to index fragmentation. In the comments, someone mentioned UUIDv7 was designed to solve that very problem by being time-ordered and sortable.

	Intrigued, I did a Google search for "kotlin uuidv7" and found an open-source implementation. I was currently using the standard `UUID.randomUUID()` (v4) combined with the `fast-uuid` library for fast string conversion. I swapped in the new UUIDv7 library, ran my benchmarks, and was immediately disappointed. It was slower.

	But the library was open-source. And that’s where our story begins. What followed was a deep dive into the guts of UUID generation, peeling back layers of abstraction and overhead to chase every last nanosecond. This is the journey of how we made a UUIDv7 generator ~12x faster than the native `java.util.UUID` on Android.

	### The Baseline: Where We Started

	All benchmarks were run on a Pixel 10 Pro using Jetpack Microbenchmark, giving us stable, real-world numbers on the Android Runtime (ART).

	The initial situation was bleak. Our "new and improved" ID was slower than the problem it was meant to solve. Profiling was the only way forward.

	\| Implementation \| Time (Total) \| Speedup \|
	\| :--- \|:-------------\|:------------------\|
	\| Initial UUIDv7 \| ~330 ns \| Slower \|
	\| `java.util.UUID` (v4) \| 981 ns \| 1x (Baseline) \|

	### The Optimization Journey

	#### Round 1: The Obvious Culprit (`SecureRandom`)

	The first profile immediately pointed to the entropy source. The library was using `java.security.SecureRandom`. While essential for cryptography, it's incredibly slow for generating simple IDs due to OS-level entropy gathering and thread synchronization.

	* The Fix: Swap `SecureRandom` for `java.util.concurrent.ThreadLocalRandom`.
	* The Result: Performance instantly jumped from ~330ns to ~95ns. A huge win, proving that for non-security use cases, the choice of random number generator (RNG) is critical.

	#### Round 2: Chasing Allocations

	At 95ns, the profiler started showing "death by a thousand cuts." The code was creating intermediate `ByteArray` and `Pair` objects. Each allocation adds GC pressure and slows down execution in a tight loop.

	* The Fix: Refactor the generator to work directly on primitive `Long`s, eliminating the `ByteArray` and `Pair`.
	* The Result: We shaved off another ~40ns, bringing the generation time down to ~55ns.

	#### Round 3: The `toString()` Bottleneck

	With generation optimized, the new bottleneck was converting the UUID to a `String`. A benchmark of the native `java.util.UUID.toString()` on Android revealed a shocking truth: it took ~675ns on its own.

	Why? Unlike the desktop JVM which uses a fast native intrinsic, the AOSP implementation is often pure Java and creates at least 7 temporary objects (5 strings for the parts, a `StringBuilder`, and the final `String`) for every single call.

	* The Fix: Write a custom `fastUuidString` that uses a pre-allocated `CharArray` from a `ThreadLocal`. This avoids all intermediate allocations.
	* The Result: Our custom formatting logic clocked in at a mere ~48ns. A 14x speedup over the native method.

	#### Round 4: The Final Squeeze

	The last step was pure micro-optimization. The profiler showed that even the 32 `Long` to `Int` conversions inside the formatting function were adding up.

	* The Fix: Split the two 64-bit `Long`s (`msb`, `lsb`) into four 32-bit `Int`s once at the start of the function. All subsequent bit-shifting operates on the faster `Int` primitives.
	* The Result: This shaved off the final few nanoseconds, giving us our final, blazing-fast implementation.

	### The Final Code

	Here is the culmination of that journey. It combines a single `AtomicLong` for monotonic state, zero-allocation generation logic, and the hyper-optimized `toString` implementation.
	kotlin @Suppress("MagicNumber") object UUIDv7 { private val HEX_DIGITS = charArrayOf('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f')
	private val state = AtomicLong(0L)

	private val threadLocalBuffer =
	object : ThreadLocal<CharArray>() {
	override fun initialValue() = CharArray(36)
	}

	fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }

	private inline fun <T> generate(block: (Long, Long) -> T): T {
	val provider = ThreadLocalRandom.current()
	val now = System.currentTimeMillis()
	var ts: Long
	var newState: Long

	while (true) {
	val current = state.get()
	val lastTs = current ushr 16
	val lastSeq = current and 0xFFF
	ts = if (now >= lastTs) now else lastTs

	val sequenceOverflow = lastSeq == 0xFFFL
	val needsRandom = (ts != lastTs) \|\| sequenceOverflow

	val nextSeq =
	if (needsRandom) {
	if (sequenceOverflow && ts == lastTs) {
	ts++
	}
	provider.nextInt(4096).toLong()
	} else {
	lastSeq + 1
	}

	newState = (ts shl 16) or 0x7000L or nextSeq
	if (state.compareAndSet(current, newState)) {
	break
	}
	}

	val msb = newState
	val randomLow = provider.nextLong()
	val lsb = (randomLow ushr 2) or Long.MIN_VALUE
	return block(msb, lsb)
	}
	private val state = AtomicLong(0L)

	private val threadLocalBuffer =
	object : ThreadLocal<CharArray>() {
	override fun initialValue() = CharArray(36)
	}

	fun randomUUIDString(): String = generate { msb, lsb -> fastUuidString(msb, lsb) }

	private inline fun <T> generate(block: (Long, Long) -> T): T {
	val provider = ThreadLocalRandom.current()
	val now = System.currentTimeMillis()
	var ts: Long
	var newState: Long

	while (true) {
	val current = state.get()
	val lastTs = current ushr 16
	val lastSeq = current and 0xFFF
	ts = if (now >= lastTs) now else lastTs

	val sequenceOverflow = lastSeq == 0xFFFL
	val needsRandom = (ts != lastTs) \|\| sequenceOverflow

	val nextSeq =
	if (needsRandom) {
	if (sequenceOverflow && ts == lastTs) {
	ts++
	}
	provider.nextInt(4096).toLong()
	} else {
	lastSeq + 1
	}

	newState = (ts shl 16) or 0x7000L or nextSeq
	if (state.compareAndSet(current, newState)) {
	break
	}
	}

	val msb = newState
	val randomLow = provider.nextLong()
	val lsb = (randomLow ushr 2) or Long.MIN_VALUE
	return block(msb, lsb)
	}

	fun fastUuidString(msb: Long, lsb: Long): String {
	val buf = threadLocalBuffer.get()!!
	val digits = HEX_DIGITS

	val msbHi = (msb ushr 32).toInt()
	val msbLo = msb.toInt()
	val lsbHi = (lsb ushr 32).toInt()
	val lsbLo = lsb.toInt()

	buf[0] = digits[(msbHi ushr 28) and 0xF]
	buf[1] = digits[(msbHi ushr 24) and 0xF]
	buf[2] = digits[(msbHi ushr 20) and 0xF]
	buf[3] = digits[(msbHi ushr 16) and 0xF]
	buf[4] = digits[(msbHi ushr 12) and 0xF]
	buf[5] = digits[(msbHi ushr 8) and 0xF]
	buf[6] = digits[(msbHi ushr 4) and 0xF]
	buf[7] = digits[msbHi and 0xF]
	buf[8] = '-'
	buf[9] = digits[(msbLo ushr 28) and 0xF]
	buf[10] = digits[(msbLo ushr 24) and 0xF]
	buf[11] = digits[(msbLo ushr 20) and 0xF]
	buf[12] = digits[(msbLo ushr 16) and 0xF]
	buf[13] = '-'
	buf[14] = digits[(msbLo ushr 12) and 0xF]
	buf[15] = digits[(msbLo ushr 8) and 0xF]
	buf[16] = digits[(msbLo ushr 4) and 0xF]
	buf[17] = digits[msbLo and 0xF]
	buf[18] = '-'
	buf[19] = digits[(lsbHi ushr 28) and 0xF]
	buf[20] = digits[(lsbHi ushr 24) and 0xF]
	buf[21] = digits[(lsbHi ushr 20) and 0xF]
	buf[22] = digits[(lsbHi ushr 16) and 0xF]
	buf[23] = '-'
	buf[24] = digits[(lsbHi ushr 12) and 0xF]
	buf[25] = digits[(lsbHi ushr 8) and 0xF]
	buf[26] = digits[(lsbHi ushr 4) and 0xF]
	buf[27] = digits[lsbHi and 0xF]
	buf[28] = digits[(lsbLo ushr 28) and 0xF]
	buf[29] = digits[(lsbLo ushr 24) and 0xF]
	buf[30] = digits[(lsbLo ushr 20) and 0xF]
	buf[31] = digits[(lsbLo ushr 16) and 0xF]
	buf[32] = digits[(lsbLo ushr 12) and 0xF]
	buf[33] = digits[(lsbLo ushr 8) and 0xF]
	buf[34] = digits[(lsbLo ushr 4) and 0xF]
	buf[35] = digits[lsbLo and 0xF]

	return String(buf)
	}
	}


	### The Final Tally

	\| Implementation \| Time (Total) \| Allocations \| Speedup \|
	\| :--- \|:-------------\|:------------\|:------------------\|
	\| Legere UUIDv7 \| 82.0 ns \| 1 \| ~11.9x Faster \|
	\| `java.util.UUID` (v4) \| 981 ns \| ~20 \| 1x (Baseline) \|

	From a disappointing start to a result that's an order of magnitude faster than the platform default, this journey was a powerful reminder: for performance-critical code, you have to measure, profile on your target platform, and never be afraid to challenge the "standard" way of doing things.
No results found