@sdrapkin
.NET: Avoid using Guid.CreateVersion7

TL;DR: Guid.CreateVersion7 in .NET 9+ claims RFC 9562 compliance but violates its big-endian requirement for binary storage. This causes the same database index fragmentation that v7 UUIDs were designed to prevent. Testing with 100K PostgreSQL inserts shows rampant fragmentation (35% larger indexes) versus properly-implemented sequential GUIDs.

The Guid.CreateVersion7 method was introduced in .NET 9 and now ships for the first time in a long-term-supported release, .NET 10. The Microsoft docs for Guid.CreateVersion7 state “Creates a new Guid according to RFC 9562, following the Version 7 format.” We will see about that.

RFC 9562

RFC 9562 defines a UUID as a 128-bit/16-byte structure (which System.Guid is, so far so good). RFC 9562 requires UUIDv7 to store a 48-bit/6-byte big-endian Unix timestamp in milliseconds in the most significant 48 bits. Guid.CreateVersion7 does not do that, and hence violates its RFC 9562 claim.

RFC 9562 UUIDv7 Expected Byte Order:
┌─────────────────┬──────────────────────┐
│  MSB first:     │                      │
│  Timestamp (6)  │  Mostly Random (10)  │
└─────────────────┴──────────────────────┘
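To make this concrete, here is a minimal illustrative sketch (not production code, and with no Section 6.2 monotonicity handling) of assembling the v7 wire layout by hand:

using System;
using System.Security.Cryptography;

// Illustrative only: build the 16 bytes of a UUIDv7 in RFC 9562 wire order.
byte[] bytes = new byte[16];
RandomNumberGenerator.Fill(bytes.AsSpan(6)); // rand_a + rand_b: 10 random bytes

long ms = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

// 48-bit Unix-ms timestamp, most significant byte first (big-endian):
bytes[0] = (byte)(ms >> 40);
bytes[1] = (byte)(ms >> 32);
bytes[2] = (byte)(ms >> 24);
bytes[3] = (byte)(ms >> 16);
bytes[4] = (byte)(ms >> 8);
bytes[5] = (byte)ms;

bytes[6] = (byte)((bytes[6] & 0x0F) | 0x70); // version = 7
bytes[8] = (byte)((bytes[8] & 0x3F) | 0x80); // variant = 10xx

Console.WriteLine(BitConverter.ToString(bytes)); // timestamp bytes lead, big-endian

Sorting these 16 bytes bytewise sorts them by creation time, which is the entire point of v7.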

Let’s test it out (the snippet below is a LINQPad script: Dump() prints values, and Unsafe comes from System.Runtime.CompilerServices):

// helper structures
Span<byte> bytes8 = stackalloc byte[8];
Span<byte> bytes16 = stackalloc byte[16];

var ts = DateTimeOffset.UtcNow; // get UTC timestamp
long ts_ms = ts.ToUnixTimeMilliseconds(); // get Unix milliseconds
ts_ms.Dump(); // print out ts_ms - for example: 1762550326422

// convert ts_ms to 8 bytes
Unsafe.WriteUnaligned(ref bytes8[0], ts_ms);

// print the hex bytes of ts_ms, for example: 96-A4-2F-60-9A-01-00-00
BitConverter.ToString(bytes8.ToArray()).Dump();

// We now expect that Guid.CreateVersion7() will start with the above 6 bytes in reverse order:
// specifically: 01-9A-60-2F-A4-96 followed by 10 more bytes

var uuid_v7 = Guid.CreateVersion7(ts); // creating v7 version from previously generated timestamp
BitConverter.ToString(uuid_v7.ToByteArray()).Dump(); // print the .ToByteArray() conversion of uuid_v7

// Print out the 16 in-memory uuid_v7 bytes directly, without any helper conversions:
Unsafe.WriteUnaligned(ref bytes16[0], uuid_v7);
BitConverter.ToString(bytes16.ToArray()).Dump();

// Output (2 lines):
// 2F-60-9A-01-96-A4-2C-7E-8B-BF-68-FB-69-1C-A8-03
// 2F-60-9A-01-96-A4-2C-7E-8B-BF-68-FB-69-1C-A8-03

// 1. We see that both in-memory and .ToByteArray() bytes are identical.
// 2. We see that the byte order is *NOT* what we expected above,
//    and does not match RFC 9562 v7-required byte order.

// Expected big-endian: 01-9A-60-2F-A4-96-...
// Actual in-memory:    2F-60-9A-01-96-A4-...
// ❌ First 6 bytes are NOT in big-endian order

uuid_v7.ToString().Dump(); // 019a602f-a496-7e2c-8bbf-68fb691ca803
// The string representation of uuid_v7 does match the expected left-to-right byte order.

Note that RFC 9562 is first and foremost a byte-order specification. The .NET implementation of Guid.CreateVersion7 does not store the timestamp in big-endian order - neither in memory nor in the result of .ToByteArray().

The .NET implementation instead makes the v7 string representation of the Guid appear correct by storing the underlying bytes in a (v7-incorrect) non-big-endian way. However, this string "correctness" is mostly useless, since storing UUIDs as strings is an anti-pattern (RFC 9562: "where feasible, UUIDs SHOULD be stored within database applications as the underlying 128-bit binary value").
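For completeness: .NET 8+ does expose explicit-endianness overloads on System.Guid, so RFC-ordered bytes can be produced when you control the serialization yourself (a minimal sketch; this changes neither the in-memory layout nor what a driver like Npgsql sends when handed a Guid):

using System;

Guid g = Guid.CreateVersion7();

// RFC 9562 wire order (big-endian, timestamp first), via the .NET 8+ overload:
byte[] beBytes = g.ToByteArray(bigEndian: true);

// Round-trip from big-endian bytes:
Guid same = new Guid(beBytes, bigEndian: true);

// Allocation-free variant:
Span<byte> buffer = stackalloc byte[16];
g.TryWriteBytes(buffer, bigEndian: true, out int written);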

Also note that this problem is unrelated to RFC 9562 Section 6.2 which deals with optional monotonicity in cases of multiple UUIDs generated within the same Unix timestamp.

Who cares? Why this matters

This issue is not just a technicality or a minor documentation omission. The primary purpose of Version 7 UUIDs is to create sequentially ordered IDs that can be used as database keys (e.g., PostgreSQL) to prevent index fragmentation.

Databases sort UUIDs based on their 16-byte order, and the .NET implementation of Guid.CreateVersion7 fails to provide the correct big-endian sequential ordering over the first 6 bytes. As implemented, the first in-memory byte holds bits 16-23 of the millisecond timestamp, so it increments roughly every 65 seconds (every 2^16 ms) and wraps around every ~4.7 hours (2^24 ms). This improper behavior leads to the exact database fragmentation that Version 7 UUIDs were designed to prevent.

The only thing worse than a lack of sequential-GUID support in .NET is a Microsoft-blessed, supposedly trustworthy implementation that does not deliver. Let's see this failure in action. Npgsql is the de facto standard OSS .NET client for PostgreSQL, with 3.6k stars on GitHub. Npgsql v10 adopted Guid.CreateVersion7 as the implementation of NpgsqlSequentialGuidValueGenerator more than a year ago.

We'll test PostgreSQL 18 by inserting 100_000 UUIDs as primary keys using the following UUID-creation strategies:

  1. uuid = Guid.NewGuid(); which is mostly random, and we expect lots of fragmentation (no surprises).
  2. uuid = Guid.CreateVersion7(); which is supposedly big-endian ordered on the first 6 bytes, and should reduce fragmentation.
  3. uuid = instance of NpgsqlSequentialGuidValueGenerator.Next(); which is identical to #2 (just making sure).
  4. uuid = FastGuid.NewPostgreSqlGuid(); from FastGuid, which not only reduces fragmentation, but is also very fast (see benchmarks).
The PostgreSQL table:

-- PostgreSQL:
-- DROP TABLE IF EXISTS public.my_table; 
CREATE TABLE IF NOT EXISTS public.my_table 
( 
    id uuid NOT NULL, 
    name text, 
    CONSTRAINT my_table_pkey PRIMARY KEY (id) 
)

C# code (a LINQPad script) to populate the above table:

async Task Main()
{
	string connectionString = "Host=localhost;Port=5432;Username=postgres;Password=postgres;Database=testdb";

	if (true) // toggle to false to skip the insert phase
	{
		const int N_GUIDS = 100_000;
		var guids = new Guid[N_GUIDS];

		var entityFrameworkCore = new Npgsql.EntityFrameworkCore.PostgreSQL.ValueGeneration.NpgsqlSequentialGuidValueGenerator();

		for (int i = 0; i < guids.Length; ++i)
		{
			//guids[i] = Guid.NewGuid();
			//guids[i] = Guid.CreateVersion7();
			//guids[i] = SecurityDriven.FastGuid.NewPostgreSqlGuid();
			guids[i] = entityFrameworkCore.Next(null);
		}

		for (int i = 0; i < guids.Length; ++i)
		{
			using var conn = new NpgsqlConnection(connectionString);
			conn.Open();
			using var comm = new NpgsqlCommand("INSERT INTO public.my_table(id, name) VALUES(@id, @name);", conn);

			var p_id = comm.Parameters.Add("@id", NpgsqlTypes.NpgsqlDbType.Uuid);
			p_id.Value = guids[i];

			var p_name = comm.Parameters.Add("@name", NpgsqlTypes.NpgsqlDbType.Text);
			p_name.Value = i.ToString(); // the "name" column is text

			comm.ExecuteNonQuery(); // INSERT returns no result set
		}
	}

	using var conn2 = new NpgsqlConnection(connectionString);
	conn2.Open();
	using var command = new NpgsqlCommand("SELECT * FROM public.my_table ORDER BY id ASC LIMIT 100", conn2);

	using var reader = await command.ExecuteReaderAsync();
	while (await reader.ReadAsync()) // Iterate through the results and display table details
	{
		// Fetch column values by index or column name
		Guid id = reader.GetGuid(0);
		string name = reader.GetString(1);

		// Display the information (using Dump for LINQPad or Console.WriteLine for other environments)
		$@"{id,-50} [{name}]".Dump();
	}
}//main

We'll run the database inserts and then check index fragmentation via the pgstattuple extension (shipped with PostgreSQL contrib; enable it once with CREATE EXTENSION IF NOT EXISTS pgstattuple;):

SELECT * FROM pgstattuple('my_table_pkey');

Case-1: using Guid.NewGuid() (no surprises) ↓

[screenshot: pgstattuple output for Case-1]

After VACUUM FULL my_table;: [screenshot: pgstattuple output after VACUUM]

Case-2: using Guid.CreateVersion7() ↓

[screenshot: pgstattuple output for Case-2]

After VACUUM FULL my_table;: [screenshot: pgstattuple output after VACUUM]

Case-3: using NpgsqlSequentialGuidValueGenerator.Next(); (should be identical to #2) ↓

[screenshot: pgstattuple output for Case-3]

After VACUUM FULL my_table;: [screenshot: pgstattuple output after VACUUM]

Case-4: using FastGuid.NewPostgreSqlGuid(); ↓

[screenshot: pgstattuple output for Case-4]

After VACUUM FULL my_table;: [screenshot: pgstattuple output after VACUUM]

Understanding the results:

  • table_len is the total physical size (in bytes) of the index file on disk.
  • tuple_percent is the percentage of the index file used by live tuples. This is roughly equivalent to page density.
  • free_space is the total amount of unused space within the allocated pages.
  • free_percent is free_space as a percentage of table_len (free_space / table_len).

Note that tuple_percent and free_percent do not add up to 100% because ~15% of this index is occupied by internal metadata (page headers, item pointers, padding, etc).

Key observations:

  • In Cases-1/2/3 the index size (and page count) was ~35% higher than for Case-4.
  • In Case-4 the page density was already optimal (i.e. VACUUM FULL had no effect).
  • Cases-2/3 (which use Guid.CreateVersion7) were virtually identical to Case-1 (which used random Guids): Guid.CreateVersion7 showed zero improvement over random Guid.NewGuid().

Findings: Cases 1-3 produce identical fragmentation patterns (before and after VACUUM). Guid.CreateVersion7 provides zero benefit over random GUIDs. FastGuid requires no VACUUM as insertions are already optimal.

Microsoft's perspective

This issue was already raised and discussed with Microsoft in January 2025. Microsoft's implementation of Guid.CreateVersion7 is intentional and by design. They will not be changing the byte-order behavior or updating the documentation.

Summary and Conclusion

Problem:

Microsoft's Guid.CreateVersion7 (introduced in .NET 9) claims to implement RFC 9562's Version 7 UUID specification, but it violates the core big-endian byte-order requirement, which causes real database performance problems:

  • 35% larger indexes compared to properly-implemented sequential identifiers
  • 20% worse page density
  • Zero improvement over Guid.NewGuid() for preventing fragmentation

The irony: Version 7 UUIDs were specifically designed to prevent the exact fragmentation that Guid.CreateVersion7 still causes. Millions of developers will be tempted to use it, believe they are solving fragmentation, and actually make it just as bad as with random identifiers, all while burning CPU to generate a "sequential" ID that isn't.

Solution:

  • Step-1: Avoid using Guid.CreateVersion7 for 16-byte database identifiers.
  • Step-2: Fix your database fragmentation with FastGuid: a lightweight, high-performance library that generates sequential 16-byte identifiers specifically optimized for database use.
    • .NewPostgreSqlGuid for PostgreSQL
    • .NewSqlServerGuid for SQL Server
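A minimal usage sketch, matching the calls from the test code above:

using SecurityDriven; // FastGuid NuGet package

Guid pgKey  = FastGuid.NewPostgreSqlGuid(); // ordered under PostgreSQL's uuid comparison rules
Guid sqlKey = FastGuid.NewSqlServerGuid();  // ordered under SQL Server's uniqueidentifier comparison rules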

Disclosure: I'm the author of FastGuid. This article presents reproducible benchmarks with verifiable results.

@tannergooding

@justinbrick you're using some fairly volatile wording, but that wording could be equally applied to people doing the simple thing of "reading the RFC which clearly covers the historical aspects and problems for this across a broad range of implementations and that they are not specific to Microsoft".

That is, the latest RFC 9562 explicitly details that UUIDs have evolved a lot since their original introduction and that there have been many errata clarifying common bugs, byte-ordering issues, decoupling of concepts, security fixes, and more. This isn't a Microsoft-specific issue; it is a general problem with UUIDs being a 35+ year old construct that originated long before the RFC existed and that multiple platforms have historically handled differently for many reasons.

The history is important and there are many resources on it. But the short version is that it started in the 1980s with the Network Computing System, later evolved into what was formally called UUIDs for the Distributed Computing Environment, and was then adopted by Windows. All of this happened over a decade before the first UUID RFC, RFC 4122, was published in 2005.

You can still see the original field layout in the latest RFC as part of UUID v1, but what it boils down to is that it was formally defined as a C struct that looked like this (using somewhat more modern C syntax to avoid ambiguity):

#include <stdint.h>

struct UUID
{
    uint32_t time_low;
    uint16_t time_mid;
    uint16_t time_hi_and_version;
    uint8_t clock_seq_hi_and_reserved;
    uint8_t clock_seq_low;
    uint8_t node[6];
};

The original DCE/RPC spec also explicitly calls out (because it was given a C definition):

Depending on the network data representation, the multi-octet unsigned integer fields are subject to byte swapping when communicated between different endian machines.

This was important because it was still the "earlier" days of networking. There was a lot of variance, and while RFC 1700 in 1994 started requiring that TCP/IP use big-endian as the standard, there was a monumental amount of code, RPC layers, and other things that still worked in "host order" and so were often little-endian, as that was the most prominent for CPUs at the time. -- A reader who is paying attention may also note that the Win32 definition of GUID remains compatible with this original C definition (largely just collapsing the 8 trailing uint8_t fields into a single Data4[8]).

Now shift forward more years: people have introduced several new versions (v2-v5) and, while there are various docs here and there, there isn't a single centralized body aggregating it all. So some people get together and publish RFC 4122 (https://www.rfc-editor.org/rfc/rfc4122). This initial spec is (to my knowledge) the first time the UUID specification put a heavier restriction on the binary representation of the fields, describing them as 16 octets and specifying that, unless another spec says otherwise, they should be big-endian (network byte order). This was explicitly done and called out because of the literal decade-plus of code doing it over other protocols or scenarios where host endianness was required; the attempt to normalize things moving forward recognized some of the primary use cases and the importance of having a well-defined standard for cross-computer communication, which particularly tends to happen over the network where big-endian is standard.

By this point, it is impossible to "fix" Win32, COM, .NET, or any of the dozens of other protocols that existed (many of which are still used to this day, including on Linux and Unix in general) to require big-endian. It would be a massive and detrimental binary break that would set everything back significantly. You also can't expose a new type, because that causes additional confusion and problems; it also wouldn't be compatible with all the scenarios using UUIDs that are documented (and allowed to be, per the RFC) to be using host-endianness (especially for back-compat).

We then move forward another 19 years and we get today's spec, RFC 9562 (https://www.rfc-editor.org/rfc/rfc9562), which attempts to further clarify the 35+ years of historical issues that exist across the industry by simplifying parts of the spec and declarations there-in; but which still calls out the nuance and importance that many things still deviate and many implementations get this stuff wrong.


This isn't some simple mistake, it isn't some Microsoft specific problem, it isn't something that can just be fixed with new types or people reading docs. It is a widespread and fairly pervasive issue that has existed for 35+ years and which will likely never go away. It is likewise no different from every other C-like struct in existence which has to be serialized as big-endian to be sent as part of a TCP/IP or general network packet. This is just part of programming due to most machines being little-endian and most networking being big-endian.

You have to ensure your data is correctly serialized and deserialized.

@justinbrick

@tannergooding Beautifully written - seeing this context laid out before everything else makes the reasons why things ended up this way much clearer.

One question that still lives from this, is why has this dated behavior not been fixed under breaking changes? Since new .NET versions are considered major releases under semantic versioning, it would be reasonable for an upgrade to require changes to existing usages. This has already been done for other areas in .NET.

The code shape as it stands seems perfectly reasonable to maintain exactly as it is, save for getting byte representations - fixing a minor issue such as default byte ordering might cause headaches for those using dated systems, but adhering to standards will make a much more cohesive system in the long run.

If the justification is to match COM, or any other system that might have the same arbitrary structure, I think that is ultimately an issue that should be posed to those still maintaining these legacy systems (most likely Microsoft w/ Windows, I imagine?). It seems to me a fool's errand to bend over backwards to support systems built on legacy; if they're still legacy at this point, they obviously need to be swapped out and are only liabilities.

Every other language has since fixed this behavior, if it was an issue for them before - I was not able to find examples like you mention where you've stated it is an ongoing problem, except for legacy systems like COM. It seems unreasonable to justify giving C# a pass, when the tech stacks everywhere else do not demonstrate this issue.

@tannergooding

One question that still lives from this, is why has this dated behavior not been fixed under breaking changes

Because it wouldn't fix anything; it would just cause more breaks, including introducing security vulnerabilities and other problems, particularly as it applies to existing databases.

What would actually happen is that the millions of lines of existing COM, RPC, and other code that rely on the existing little-endian behavior would silently break when they roll forward onto modern .NET. In practice this risks arbitrary code execution and potentially even security vulnerabilities due to buffer overruns, stack-return overwriting, and any number of other issues. You would equally find that existing databases that have been using Guid for their IDs for 25+ years could no longer resolve their keys, or worse, would resolve to the wrong key. Codebases that manually work around known bugs in a database provider would also be broken.

The only viable fix, both due to the sheer amount of existing code and due to the potential confusion that introducing a new type would cause, is to have overloads of the serialization/deserialization APIs that allow you to specify the endianness. This is also in line with how you need to consider serialization for every other type, including types like int or long, which are much more commonly serialized as part of databases, over the network, etc.

Every other language has since fixed this behavior, if it was an issue for them before

Every language has this same general issue where the user needs to be considerate of the serialization format. Newer languages that provide a built-in UUID type tend to default to big-endian, but you still have to be careful when interacting with things like RPC or various file systems. Older languages tend not to provide anything built-in, so what you get depends on the library/framework you're using. Where something is provided, you still have to consider what it's documented to use and what the target format is.

I was not able to find examples like you mention where you've stated it is an ongoing problem

One common example that every computer has to deal with is ACPI (Advanced Configuration and Power Interface). This is the fundamental power-management interface that allows a computer to sleep, hibernate, shut down, restart, etc. There are also various RPC (Remote Procedure Call), file system, and other specs (particularly those with origins in the early/mid 90's) that all document themselves as following the ISO/IEC 11578:1996 - OSI - RPC or DCE 1.1 spec, which maintains "host endianness". It's only newer specs, typically those started post-2005, which tend to instead use RFC 4122. One example of such a newer spec is UEFI, which is maintained alongside the ACPI spec; so any modern OS needs to handle both big- and little-endian UUIDs as part of its basic boot process and general hardware management.

These would be areas where other languages have the inverse problem to C#/.NET. That is, the default serialization is big-endian, but they must explicitly ensure they load/store as little-endian on most machines to correctly interface with these scenarios.

Serialization is always something code needs to be considerate of and binary serialization in particular is tricky. It is something where apps need to be explicit and where they can easily write bugs if they aren't. What works on an x64 machine (little-endian) may not work on an IBM Z/Architecture machine (big-endian). It may not work between the Xbox 360 (PowerPC - Big Endian) vs the Xbox One (x64 - Little Endian) or on Arm devices where the default is little-endian, but where they can support and sometimes even dynamically toggle the mode they target.

It is why we (and other ecosystems) typically provide both big- and little-endian APIs where binary serialization is a consideration, so developers can do the right thing based on host architecture and on the domain they are working with (networking, some spec for a file extension like elf/pe/png/jpg/zip/etc, file system, hardware, ...). It is a fundamental and very basic consideration that people can't get away from, no matter how the ecosystem is set up.
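As a concrete .NET example of such paired APIs, System.Buffers.Binary.BinaryPrimitives provides explicit big- and little-endian helpers:

using System;
using System.Buffers.Binary;

Span<byte> buf = stackalloc byte[4];
int value = 0x01020304;

BinaryPrimitives.WriteInt32BigEndian(buf, value);
Console.WriteLine(BitConverter.ToString(buf.ToArray())); // 01-02-03-04 (network order)

BinaryPrimitives.WriteInt32LittleEndian(buf, value);
Console.WriteLine(BitConverter.ToString(buf.ToArray())); // 04-03-02-01 (typical host order)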

@MilanNemeth-Eviden commented Dec 7, 2025

My 2 cents: MongoDB's C# driver perfectly handles byte order differences...
So in theory, network transmission and DB ops can benefit from big-endianness, while the app can leverage little-endianness at runtime.
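For context, a hedged sketch of what that opt-in looks like with the MongoDB C# driver (GuidSerializer/GuidRepresentation from MongoDB.Bson, as I understand the API):

using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Serializers;

// Opt into the RFC-style big-endian "Standard" binary representation (subtype 4)
// instead of the little-endian legacy representation (subtype 3):
BsonSerializer.RegisterSerializer(new GuidSerializer(GuidRepresentation.Standard));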
