

@CodingKoopa
Last active April 2, 2025 02:41
Virtual memory in practice


Hiya! This article will draw some connections between the mechanics of virtual memory and the funny numbers that you see in your operating system's task manager. We will talk about basic systemwide memory statistics as well as per-process properties. The main contribution of this article is illustrating the similarities between Windows and Linux in a way that can be observed using a couple of systems and some curiosity.

Terminology

TODO: Linux does use "commit" in the same way. also, talk about optimistic malloc

TODO: consider https://sw.kovidgoyal.net/kitty/faq/#i-opened-and-closed-a-lot-of-windows-tabs-and-top-shows-kitty-s-memory-usage-is-very-high

Modern operating systems provide programs with virtual memory. This includes a virtual address space which all memory accesses go through.

The program loader initializes the virtual address space by mapping the program image (including .text, .data, and the full .bss) into it. A region at the top of every address space (the upper half on x86-64) is reserved for the kernel.
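On Linux, this layout can be inspected for a running process through /proc/self/maps, where each line gives an address range, permissions, file offset, device, inode, and an optional label such as a file path, [heap], or [stack]. The following is a minimal sketch, assuming the line format described in proc(5); `region_label` is a hypothetical helper name, not an existing API.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the /proc/self/maps entry containing addr and
 * copy its label (a file path, "[heap]", "[stack]", ...) into out.
 * Anonymous mappings have an empty label.
 * Returns 0 on success, -1 if addr is unmapped or the file can't be read. */
int region_label(const void *addr, char *out, size_t out_len)
{
    if (out_len == 0)
        return -1;
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[4096 + 256];
    int result = -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start = 0, end = 0;
        int label_at = -1;
        /* Line format: start-end perms offset dev inode [label] */
        sscanf(line, "%lx-%lx %*s %*s %*s %*s %n", &start, &end, &label_at);
        if (label_at < 0)
            continue;               /* malformed or truncated line */
        if (target < start || target >= end)
            continue;

        const char *label = line + label_at;
        size_t len = strcspn(label, "\n");
        if (len >= out_len)
            len = out_len - 1;
        memcpy(out, label, len);
        out[len] = '\0';
        result = 0;
        break;
    }

    fclose(maps);
    return result;
}
```

For example, the address of a local variable in the main thread should land in the mapping labeled [stack], and the address of a function in the program should land in a mapping labeled with the executable's path.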

The program may create more mappings within the virtual address space by allocating memory (e.g. with malloc, which on Linux obtains memory from the kernel via brk and mmap; VirtualAlloc on Windows) or by mapping a file to a virtual memory address (mmap on Linux, CreateFileMapping plus MapViewOfFile on Windows).
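A minimal sketch of the Linux side of file mapping, assuming a POSIX system: `count_byte_mmap` (a hypothetical helper) maps a file read-only and scans it in place, with no read() calls and no user-space buffer. The Windows equivalent would go through CreateFileMapping and MapViewOfFile, which this sketch does not cover.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of byte c in the file at path by
 * mapping the file into the address space. Returns -1 on error. */
long count_byte_mmap(const char *path, unsigned char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() with length 0 fails with EINVAL */
        close(fd);
        return 0;
    }

    /* Only the virtual mapping is created here; file pages are brought in
     * from the page cache on first access, via page faults. */
    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == c)
            count++;

    munmap(data, (size_t)st.st_size);
    return count;
}
```

Closing the descriptor right after mmap is deliberate: the mapping holds its own reference to the file.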

Windows allows you to reserve a range of virtual addresses (VirtualAlloc with MEM_RESERVE) without committing anything to back it. Once the program commits pages (VirtualAlloc with MEM_COMMIT), the kernel charges them against the system commit limit and guarantees that backing storage, in physical memory or the pagefile, will be available for them. Linux does not have a distinguished state for virtual pages that are only reserved, but the effect can be emulated. Linux is also less consistent about the term "committed": the top man page seems to loosely use "reserved" for memory that is promised to exist, while the kernel's overcommit accounting reports CommitLimit and Committed_AS in /proc/meminfo.
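One common way to emulate the reserve/commit split on Linux is sketched below, assuming a private anonymous mapping: reserve with PROT_NONE, commit by making pages writable with mprotect, and decommit by mapping fresh PROT_NONE pages over the range with MAP_FIXED. The function names are hypothetical. As far as I understand the kernel's accounting, a private mapping that is not writable is not charged against the commit limit, and the charge happens when mprotect makes it writable; under strict overcommit (vm.overcommit_memory = 2) that call is where ENOMEM would show up, much like a failed MEM_COMMIT on Windows.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* "Reserve": claim address space with no access rights. Assumption: a
 * private mapping that is not writable is not charged to the commit limit. */
void *reserve_range(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit": make a page-aligned part of the range readable and writable.
 * Assumption: this is where the kernel charges the pages to Committed_AS;
 * under vm.overcommit_memory = 2 it can fail with ENOMEM.
 * Physical pages are still only allocated on first touch. */
int commit_range(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* "Decommit": replace the pages with a fresh PROT_NONE mapping at the same
 * address, dropping their contents and their commit charge while keeping
 * the address range reserved. */
int decommit_range(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* Release the whole reservation. */
int release_range(void *addr, size_t len)
{
    return munmap(addr, len);
}
```

Decommitting with mprotect(PROT_NONE) alone would not be equivalent: the pages would keep their contents and, as far as I can tell, their commit charge until unmapped.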

Even for committed virtual pages, Windows and Linux back them lazily, during page faults (Linux calls this demand paging). A virtual page is said to be resident (RES) if it resides in physical memory, as opposed to residing in swap or not yet being backed. On Windows, the working set is said to be the set of all resident pages, with the SetProcessWorkingSetSize API provided to indicate how much resident memory a program needs. Linux does not use the term "working set", opting to just speak of resident pages. If you see somebody talking about the working set size (WSS) outside of the context of paging on Windows, they are likely just talking about the size of the data that the program is actively operating on. In fairness, that working set would indeed inform the hint you give to Windows as to how many resident pages to maintain.
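Demand paging is easy to observe on Linux with mincore(2), which reports for each page of a range whether it is resident. In this sketch (hypothetical helper names), a fresh anonymous mapping starts with no resident pages, and each page becomes resident only once it is first written.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len) using mincore(2).
 * addr must be page-aligned. Returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long count = 0;
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;        /* bit 0 set = page is resident */
    free(vec);
    return count;
}

/* Map npages of anonymous memory, then write one byte into each page,
 * reporting the resident page count before and after the writes. */
int demand_paging_demo(size_t npages, long *before, long *after)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    *before = resident_pages(region, len);  /* nothing touched yet */
    for (size_t i = 0; i < len; i += page)
        region[i] = 1;                      /* first write faults the page in */
    *after = resident_pages(region, len);

    munmap(region, len);
    return 0;
}
```

The sketch writes rather than reads: reading an untouched anonymous page would typically map the shared zero page instead of allocating a private one.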

System stats

Before going into the process stats we want to decipher, let's talk about systemwide stats for a bit. Since we aren't going inside processes, everything here will be about physical memory and not virtual memory.

At a given moment, not all of your physical memory is going towards processes. Some memory is reserved by the hardware during boot, such as for shared graphics memory. If there is memory left over after allocating for processes, then it is almost entirely put towards caching. This leaves very little memory that is truly free, as explained by the popular site https://www.linuxatemyram.com/:

Like all modern operating systems, Linux is borrowing unused memory for disk caching. Disk caching makes the system much faster and more responsive! There are no downsides, except for confusing users who are new to computing, and unfamiliar with the concept of a filesystem cache. It doesn't generally take memory away from applications.

As explained by the site, the Linux utility free reports the following stats:

  • used: Amount of (resident) memory dedicated to processes.
  • buff/cache: Amount of memory dedicated to buffers and caches managed by the kernel.
  • free: Amount of memory that is completely unused.
  • available: Amount of memory that is, for practical purposes, available to processes. This is the sum of free and the part of buff/cache which is able to be freed (almost all of it!)

Because caching makes the system faster, the operating system in practice aims to minimize free, handing that memory to buff/cache instead.
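free reads these numbers from /proc/meminfo, so they are easy to pull out directly. A minimal sketch with a hypothetical helper name; values in that file are in KiB. As far as I know, recent procps-ng versions of free show MemAvailable as available and the sum of Buffers, Cached, and SReclaimable as buff/cache.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in KiB) of a field such as
 * "MemFree" or "MemAvailable" from /proc/meminfo, or -1 if missing. */
long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long value = -1;
    size_t field_len = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "MemAvailable:   12345678 kB". */
        if (strncmp(line, field, field_len) == 0 && line[field_len] == ':') {
            if (sscanf(line + field_len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

For example, meminfo_kb("MemFree") and meminfo_kb("MemAvailable") give the free and available columns; on a machine that has been running for a while, the first is usually much smaller than the second.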

All of this goes for Windows too. Task Manager exposes the following measurements:

  • In use: Sum of the resident memory for all processes. Analogous to used.
    • Recall that the definition of "resident" necessitates that the page live in physical memory, not swap.
  • Cached: Memory used by the OS for filesystem caches. Analogous to buff/cache.
  • Available: Memory that can (again, for practical purposes) be used by processes. Analogous to available.
  • Committed: Ratio of how much memory has been committed, to how much can be committed.
    • Recall that, by committing memory, the operating system is pledging to provide backing for it. However, the commit limit is the size of physical memory plus the size of the pagefile, so the promise is not that every committed page stays in RAM: the kernel may page less-used committed pages out to the pagefile to make room for the rest.


The percentage at the top of the Memory tab of Task Manager indicates the proportion of In use to the total physical memory (minus hardware-reserved). By the end, we'll understand the per-process amounts too.


Task Manager doesn't even expose the free memory, because it's a really confusing and useless number. For that, you'll have to go into the Memory tab of Resource Monitor:

  • In Use: Same as in Task Manager.
  • Standby: Cached pages that are not in use but can be repurposed immediately. Standby plus Free is what Task Manager calls Available.
  • Free: The small amount of truly free memory. Analogous to free.


Resource Monitor doesn't display systemwide information about committed pages, but it does have per-process info for us to look at.

Process stats

We will start with Windows because it's simpler. The official user-facing tools on Windows are only really concerned with consumption of physical pages, whereas Linux includes more info about how the pages are used and usage of non-resident pages.

Stats on Windows

In the view of Resource Monitor, there are two types of programs with respect to CPU usage:

  • Process: standalone programs with a 1:1 relationship between a program image and a process.
  • Service: background program managed by the Service Control Manager and often executed within an instance of svchost.exe. One svchost.exe process may host multiple services to reduce resource consumption.

For memory usage, however, Resource Monitor only works at the granularity of processes. This is perhaps because ascribing CPU cycles to a service is easier than understanding memory which could be shared by multiple services.

Here are the memory stats that Resource Monitor exposes for each process:

  • Commit: Total size of pages in this virtual address space in the "committed" state. These are pages that the program has requested backing for, that the operating system has committed to providing (by internal transaction of "memory charges").
  • Working Set: Total size of the working set from before. These are committed pages that are physically backed (as promised!). Conceptually partitioned into private and shared.
  • Shareable: Part of the working set which is eligible to be shared with other processes. This is likely referring to common DLLs, but it's unclear how often this takes effect with the nature of packaging on Windows.
  • Private: Part of the working set which is internal to this process.

Stats on Linux

Linux exposes virtual memory information via procfs, which can be viewed using the top command, e.g. from procps.

  • RES: Total amount of resident memory dedicated to this process, including shared memory.
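    • This is analogous to Working Set on Windows. There is no analogue to Commit.
    • The top man page says that this includes resident file-backed memory (RSfd). This refers to pages of memory-mapped files, including the program image and shared libraries, that have actually been touched: mmap doesn't take up physical memory up front, but once a mapped page is accessed it is backed by a page-cache page, and that page counts toward RES.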
  • SHR: Part of the resident memory which may be used by other processes, e.g. libraries. Analogous to Shareable on Windows. There does not appear to be a named analogue of Private.
  • SWAP: Amount of the process's memory that has been moved out to swap.
    • Windows' graphical tools do not seem to expose this.
  • USED: Sum of RES and SWAP.
  • VIRT: Total virtual memory used by the process, "including pages that have been mapped but not used".
    • This is kind of a ridiculous measurement, which will be expanded on during the interpretation section.
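These columns are computed from procfs. Below is a hedged sketch that reads the corresponding fields for the current process from /proc/self/status. The mapping of fields to columns (VmSize for VIRT, VmRSS for RES, RssFile plus RssShmem for SHR, VmSwap for SWAP) is my assumption based on proc(5); top may read some of them from /proc/[pid]/statm instead. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Read a KiB-valued field such as "VmRSS" from /proc/self/status.
 * Returns -1 if the field is missing (e.g. on older kernels). */
static long status_kb(const char *field)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[256];
    long value = -1;
    size_t n = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}

long virt_kb(void) { return status_kb("VmSize"); }  /* roughly VIRT */
long res_kb(void)  { return status_kb("VmRSS"); }   /* roughly RES  */
long swap_kb(void) { return status_kb("VmSwap"); }  /* roughly SWAP */

/* Roughly SHR: resident file-backed pages plus resident shared memory. */
long shr_kb(void)
{
    long file_pages = status_kb("RssFile");
    long shmem_pages = status_kb("RssShmem");
    return (file_pages < 0 || shmem_pages < 0) ? -1 : file_pages + shmem_pages;
}

/* Roughly USED: resident memory plus memory swapped out. */
long used_kb(void)
{
    long res = res_kb(), swap = swap_kb();
    if (res < 0)
        return -1;
    return res + (swap > 0 ? swap : 0);
}
```

VIRT being at least RES, and USED being at least RES, falls out of these definitions directly.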

Interpretation

There are two goals you may have when investigating memory usage on a system:

  • Determining how much of an impact a program is having on physical memory.
  • Determining how much memory a program is using.

Tools provided by the operating system, especially those on Linux, don't make it the easiest to interpret the information they provide.

How much impact is this program having on the system?

If we want to know how much of an impact a process has, we are concerned with how much of our finite physical memory it is using.

On Windows, this is Working Set. In the non-detail view, the Memory tab is populated with the working set sizes. Alternatively, Pavel Yosifovich advocates for Committed to assess system state, as committed memory determines when memory allocations will start to fail.

TODO: is "total physical memory" below including hardware-reserved memory?

On Linux's top, this is RES. You can also look at %MEM, which is RES divided by the total physical memory. With Linux's task managers, things are hazy. The Memory column of KDE's System Monitor reported a value that was sometimes nowhere near RES (or VIRT, or USED, or SHR, or DATA, or CODE). It's unclear where this value is coming from, but hovering over it indicates / <total physical memory>, so it's probably reasonable to use for purposes relating to system load.
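As a rough cross-check of %MEM for the current process, here is a sketch that divides VmRSS from /proc/self/status by MemTotal from /proc/meminfo. It assumes top uses MemTotal as its total; proc(5) describes MemTotal as usable RAM, i.e. physical RAM minus a few reserved bits and the kernel binary code, which suggests that hardware-reserved memory is not in the denominator. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a KiB-valued "Field:" line from a procfs file.
 * Returns -1 if the file or field is missing. */
static long proc_field_kb(const char *path, const char *field)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char line[256];
    long value = -1;
    size_t n = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}

/* Resident memory of this process as a percentage of MemTotal.
 * Assumption: this is the same ratio top reports as %MEM. */
double mem_percent_self(void)
{
    long rss = proc_field_kb("/proc/self/status", "VmRSS");
    long total = proc_field_kb("/proc/meminfo", "MemTotal");
    if (rss < 0 || total <= 0)
        return -1.0;
    return 100.0 * (double)rss / (double)total;
}
```

Inside a container with a memory limit, MemTotal usually still reflects the host, so this percentage is relative to the whole machine.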

How much memory does this program use?

If we want to know how much memory a program is using (e.g. as a programmer), then we might want to consider more than just the physical memory it's currently using. Failing this, however, the above methods are satisfactory. After all, if a virtual page isn't backed by a physical page, it could be for good reason: perhaps the program hasn't accessed it yet (and won't for a while), or perhaps the program has accessed it but the OS' swapping heuristic determined that it's not likely to be needed again.

On Windows, Commit is a good choice for this. It's not perfect, since it includes memory that has been requested but not used (which could be misleading, depending on how liberal the program is with preemptively allocating memory), but it's worth looking at.

On Linux, USED is precisely what we want: resident physical memory plus swapped-out memory. VIRT, on the other hand, is way too broad, since it includes not only allocations that haven't been backed (like Windows' Commit), but other kinds of virtual memory mappings. Google Chrome is a well-known case where VIRT is highly misleading: it reserves an enormous region of virtual address space for JavaScript objects, which inflates the process's virtual memory to over a terabyte.

Despite this, top's default behavior is to display VIRT first, and not display USED at all. Oh well!

Appendix

Here are a couple of things that were not touched on in this article:

The detail view in Task Manager introduces the term "active", pertaining to how UWP processes reduce memory usage of inactive processes. You can read more in this article.

Task Manager also includes information about paged pool and non-paged pool memory. You can read more about this in this Super User post and this MSDN page.

places where mmap shines:
- sharing memory between processes (easier for kernel to work with)
- when you can madvise
- can use structures from the file instead of copying
https://stackoverflow.com/a/9818473/5719930
https://stackoverflow.com/a/258097/5719930
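the first point is easy to demonstrate: a MAP_SHARED anonymous mapping created before fork(2) is backed by the same physical pages in parent and child, while a MAP_PRIVATE one is copy-on-write. a minimal sketch with a hypothetical helper name; the value 42 is arbitrary.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one int with the given sharing flag
 * (MAP_SHARED or MAP_PRIVATE), let a forked child write 42 into it, and
 * return what the parent sees afterwards (or -1 on error). */
int parent_sees_after_child_write(int sharing_flag)
{
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     sharing_flag | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {
        /* child: writes the shared page, or its own private CoW copy */
        *cell = 42;
        _exit(0);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    int seen = (waited == pid) ? *cell : -1;
    munmap(cell, sizeof *cell);
    return seen;
}
```

with MAP_SHARED the parent should see 42; with MAP_PRIVATE the child's write only touches its own copy, so the parent still sees 0.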

a couple of notes: (for anyone who has taken CS350)

  1. free() generally does not return memory to the OS (https://stackoverflow.com/a/1119334/5719930) [tl;dr: this is to prevent external fragmentation in the heap]

  2. you likely know that, roughly, malloc(3) on Linux does not guarantee the memory is available (https://man7.org/linux/man-pages/man3/malloc.3.html). you may also know that mmap(2) "implements" (!) demand paging / lazy loading of mmap'd files (https://en.wikipedia.org/wiki/Mmap)

something that is useful to realize, though, is that (2) is not really a property of malloc but rather the underlying virtual memory system. even if malloc wanted to not overcommit, that's not really in its jurisdiction! (https://stackoverflow.com/a/48593093/5719930)

naturally, the overcommitting in (2) is powered by page faults (that page in the requested memory upon access). this is easy to see in the mmap case, since mmap is aligned to pages. this also goes for big malloc allocations, since those use an anonymous mmap (read: not mapped to a file). small malloc allocations are less obvious, since those come from the heap grown with sbrk(2). sbrk, funnily enough, actually specifies no alignment/rounding! (https://stackoverflow.com/a/27340848/5719930). but the kernel still backs the heap with whole pages, so expanding the heap into a new page is subject to the same lazy, fault-driven backing
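the page-fault side of this can be measured with getrusage(2): mapping anonymous memory costs no faults by itself, and writing to each page afterwards costs roughly one minor fault per page. a sketch with a hypothetical helper name; the MADV_NOHUGEPAGE call is there on the assumption that transparent huge pages might otherwise fill many pages in a single fault.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Hypothetical helper: map npages of anonymous memory, write one byte to
 * each page, and return the number of minor faults the writes caused
 * (or -1 on error). */
long faults_for_touching(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* Prefer ordinary pages; failure (e.g. no THP support) is harmless. */
    (void)madvise(region, len, MADV_NOHUGEPAGE);

    long before = minor_faults();
    for (size_t i = 0; i < len; i += page)
        region[i] = 1;          /* first write to each page faults it in */
    long after = minor_faults();

    munmap(region, len);
    return after - before;
}
```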

this is another fun thread: portably changing the size of a shared non-anonymous mapping. https://stackoverflow.com/q/15684771/5719930

the MAP_NORESERVE flag is kind of funny because

  1. under Linux, one does NOT have "the guarantee that it is possible to modify the mapping" (which is acknowledged in the "bugs" section of mmap(2), far below the description of the flag)
  2. leaving aside mode 1 (always overcommit), VM is either heuristically overcommitting (mode 0) or never overcommitting (mode 2). in mode 2 it always performs the accounting + check described at https://www.kernel.org/doc/Documentation/vm/overcommit-accounting (as mentioned in this QEMU patch: https://patchwork.kernel.org/project/qemu-devel/patch/[email protected]/#24155115). in mode 0, it performs the accounting + reservation, unless MAP_NORESERVE is set (https://man7.org/linux/man-pages/man5/proc_sys_vm.5.html). but i thought we didn't reserve memory like that in this mode?
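to check which mode a system is in, and how close it is to the limit that mode 2 enforces, the relevant numbers are in procfs: /proc/sys/vm/overcommit_memory holds the mode, and /proc/meminfo reports Committed_AS (memory currently charged) and CommitLimit, which the overcommit-accounting document says is only enforced in mode 2. these are probably the closest Linux counterparts to Task Manager's Committed figure. a sketch with hypothetical helper names:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Current overcommit policy from /proc/sys/vm/overcommit_memory:
 * 0 = heuristic, 1 = always overcommit, 2 = strict accounting.
 * Returns -1 if the file cannot be read. */
int overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Read a KiB-valued field such as "CommitLimit" or "Committed_AS"
 * from /proc/meminfo; -1 if missing. */
long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    char line[256];
    long value = -1;
    size_t n = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

in modes 0 and 1 it is normal for Committed_AS to exceed CommitLimit, since the limit is not enforced there.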