@SchrodingerZhu
Last active November 26, 2024 05:53
UCRT Notes

Background

At LLVM libc, I am exploring a harder approach than MinGW's: implementing a brand-new CRT. To that end, I am reading the open-sourced portion of the UCRT code. This document records interesting findings I have come across in the codebase. I will also write down thoughts that might be useful for a future implementation; these are not restricted to the UCRT codebase.

The code is from https://www.nuget.org/packages/Microsoft.Windows.SDK.CRTSource.

For convenience, there is also a maintained mirror on GitHub, thanks to @huangqinjin.

For the time being, I will focus on approaches that are related to C23 APIs. POSIX compatibility is not my first priority.

General initialization order

The initialization order is defined in initialization.cpp. I guess I will eventually go through all the functions there, but my notes may not follow the exact order.

  • Notice that initialize_global_variables only initializes locale data, which is outside the scope of llvm-libc: we have argued that programs with serious concerns regarding locale should never use POSIX APIs to handle it.

How does UCRT maintain environment variables and command-line arguments?

The dirty part of the code is in env, which is rather complicated because the UCRT itself maintains a table layer before referring to the OS.

One can trace the code down to the lowest-level part, where the interaction with the OS goes through the APIs declared in processenv.h.

At startup, the environment and argument handles are populated by calling system APIs (see the related startup code). The process environment table is then maintained separately, in the env subdirectory.

One complication on Windows is that the CRT has to maintain the narrow (ANSI) and wide (Unicode) variants of the environment at the same time.
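
To make the "table layer with two variants" idea concrete, here is a minimal portable sketch of a table that keeps a narrow and a wide view in sync on every update. It is not the UCRT's data structure: `EnvTable`, `set`, `unset`, `get`, `wget`, and `widen` are hypothetical names, the widening is ASCII-only (a real implementation would need a proper code-page conversion), and the case-insensitivity of Windows variable names is ignored.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical two-view environment table (not UCRT code): every update is
// applied to both maps so the narrow and wide views never disagree.
class EnvTable {
public:
    void set(const std::string& name, const std::string& value) {
        narrow_[name] = value;
        wide_[widen(name)] = widen(value);
    }

    void unset(const std::string& name) {
        narrow_.erase(name);
        wide_.erase(widen(name));
    }

    // Returns a pointer to the stored value, or nullptr if the name is absent.
    const std::string* get(const std::string& name) const {
        auto it = narrow_.find(name);
        return it == narrow_.end() ? nullptr : &it->second;
    }

    const std::wstring* wget(const std::wstring& name) const {
        auto it = wide_.find(name);
        return it == wide_.end() ? nullptr : &it->second;
    }

private:
    // Assumption: inputs are plain ASCII, so widening is a per-character copy.
    static std::wstring widen(const std::string& s) {
        std::wstring out;
        out.reserve(s.size());
        for (unsigned char c : s) {
            assert(c < 0x80 && "sketch handles ASCII only");
            out.push_back(static_cast<wchar_t>(c));
        }
        return out;
    }

    std::map<std::string, std::string> narrow_;
    std::map<std::wstring, std::wstring> wide_;
};
```

A caller would write through one entry point, for example `env.set("PATH", "C:\\bin")`, and then read the same variable through either `env.get("PATH")` or `env.wget(L"PATH")`. Keeping the write path single is what stops the two views from drifting apart.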

How is TLS set up?

If you compile the following code with clang-cl:

inline thread_local long t {};
long foo() {
    return t;
}

You will get the following code for the x86-64 target:

long foo(void):                          # @"?foo@@YAJXZ"
        mov     eax, dword ptr [rip + _tls_index]
        mov     rcx, qword ptr gs:[88]
        mov     rax, qword ptr [rcx + 8*rax]
        mov     eax, dword ptr [rax + long t@SECREL32]
        ret
long t:

So TLS data for C programs on Windows is located via several indirections.
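
The chain of loads in the listing can be modeled in portable C++. This is only a sketch for following the indirections, not how the loader or the CRT is implemented: `FakeTeb`, `fake_tls_index`, and `load_tls_long` are hypothetical names. The slot array stands in for the pointer read through `gs:[88]`, and the byte offset stands in for the `SECREL32` relocation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the per-thread structure reached through gs:[88].
struct FakeTeb {
    // One slot per module with static TLS; each slot points to this thread's
    // private copy of that module's TLS data.
    std::vector<unsigned char*> tls_slots;
};

// Models `_tls_index`: the slot number assigned to the current module.
static std::uint32_t fake_tls_index = 0;

// Models the four loads: index, slot array, TLS block, then the variable.
static long load_tls_long(const FakeTeb& teb, std::uint32_t secrel_offset) {
    std::uint32_t index = fake_tls_index;               // mov eax, [rip + _tls_index]
    assert(index < teb.tls_slots.size());
    unsigned char* const* slots = teb.tls_slots.data(); // mov rcx, gs:[88]
    unsigned char* block = slots[index];                // mov rax, [rcx + 8*rax]
    long value;
    // mov eax, [rax + t@SECREL32]; the model reads sizeof(long) bytes,
    // whatever that is on the host.
    std::memcpy(&value, block + secrel_offset, sizeof value);
    return value;
}
```

With two fake blocks in `tls_slots`, changing `fake_tls_index` selects which module's block is read, and the offset selects the variable inside that block. That is the whole lookup: one global index, one per-thread array, one fixed offset.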

For now, I will focus only on how the _tls_index value is set up. If you refer to per_thread_data.cpp, you can see that such an index is initialized via TlsAlloc. Well, it actually uses FlsAlloc, which is due to historical limitations of Store apps. I think we can go straight to the TlsAlloc API, as we do not need to consider legacy platforms.

Security Cookie

Let's take a detour to learn about the security cookie first:
