Diverting trains of thought, wasting precious time
...if an mmap() could map memory? I've had to ask this question of Linux on 64-bit x86 platforms recently.
For reasons I will only hint at, I want to allocate a huge region of virtual address space for a structure a bit like a linear page table (called a “virtualized page table” on Wikipedia). We rely on a particular behaviour of Linux's mmap(): that mapping some memory isn't the same as committing to the availability of any underlying physical memory. Passing the MAP_NORESERVE flag means that memory will only be allocated when written to, hence allowing us to create a sparse table easily.
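To make that reliance concrete, here's a minimal sketch of the behaviour (assuming a 64-bit machine; the 4GB size and the particular offsets are arbitrary): we map a big region, touch only a few scattered bytes, and only the pages actually written consume physical memory. You can confirm by watching VmRSS in /proc/self/status stay tiny.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t size = 1UL << 32; /* 4GB of virtual address space */
    char *p = mmap(NULL, size, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch three widely-spaced locations: the kernel allocates
     * physical pages for just these, not for the whole 4GB. */
    p[0] = 1;
    p[1UL << 20] = 1;
    p[1UL << 30] = 1;

    return 0;
}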
I decided my table should have one word per 4KB of memory. For a 32-bit machine, which has 4-byte words and 2^20 such (aligned) regions, this means I need 4MB of virtual address space for my table (i.e. about a thousandth of the VAS). If we ask mmap() for such a region, it will clearly oblige us. On a 64-bit machine, which has 8-byte words and 2^52 such regions, I need 2^55 bytes of virtual address space for my table: that's 32 petabytes, or about eight billion times as much as in the 32-bit case, but again only a small fraction of the total address space (in this case a 512th, because words are twice as big).
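If you want to sanity-check that arithmetic, here's a throwaway snippet that just redoes the shifts from the previous paragraph.

#include <stdio.h>

int main(void)
{
    /* One metadata word per 4KB (2^12-byte) region of address space. */

    /* 32-bit: 2^20 regions, 4 bytes per word => 4MB. */
    unsigned long long t32 = (1ULL << (32 - 12)) * 4;

    /* 64-bit: 2^52 regions, 8 bytes per word => 2^55 bytes. */
    unsigned long long t64 = (1ULL << (64 - 12)) * 8;

    printf("32-bit table: %llu bytes\n", t32);
    printf("64-bit table: %llu bytes\n", t64);
    printf("ratio: %llu\n", t64 / t32);            /* 2^33: ~eight billion */
    printf("VAS fraction: 1/%llu\n", 1ULL << (64 - 55)); /* 512 */
    return 0;
}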
Here's a quick program you can run to test whether you can do an mmap() of a given size.
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <assert.h>

int main(int argc, char **argv)
{
    assert(argc > 1);
    errno = 0; /* strtol only sets errno on error, so clear it first */
    size_t mapping_size = strtol(argv[1], NULL, 0);
    assert(errno != ERANGE);
    assert(mapping_size > 0);
    assert(sizeof(size_t) == 8);
    /* MAP_NORESERVE: ask only for address space; physical memory
     * is allocated lazily, page by page, on first write. */
    void *ret = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE,
            MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    if (ret == MAP_FAILED)
    {
        fprintf(stderr, "error: %s\n", strerror(errno));
        return 1;
    }
    else
    {
        fprintf(stderr, "success!\n");
        return 0;
    }
}
And here's a shell script to drive it with powers of two until it fails.
#!/bin/bash
for exp in `seq 10 50`; do
    ./test $(( 2 ** $exp )) || break
    echo "Succeeded for 2 ^ $exp"
done
I'd be curious to know whether anyone on an x86-64 Linux machine maxes out anywhere different than 2^46 bytes. The kernel source will have the answer, but I can't be bothered wading through it right now. Interestingly, turning off the overcommit check (i.e. writing "1" to /proc/sys/vm/overcommit_memory) doesn't increase the limit for me.
By the way, I'm using strtol because atol seemed to be narrowing the result to 32 bits even though a long is 64 bits. Instead of 2^31 I got -2^31, which unsurprisingly made mmap() fail. This seems like a bug, but probably isn't for some reason (possibly including a stupid mistake by me).
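For what it's worth, the usual pattern for detecting overflow with strtol is to clear errno before the call and test it afterwards, since strtol only sets errno on failure. Something like this hypothetical helper (parse_size is my own invention, not part of the test program above):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a size argument, detecting overflow properly. */
static size_t parse_size(const char *s)
{
    errno = 0;                       /* clear any stale error */
    long val = strtol(s, NULL, 0);   /* base 0: accepts 0x... too */
    if (errno == ERANGE || val <= 0)
    {
        fprintf(stderr, "bad size: %s\n", s);
        exit(1);
    }
    return (size_t) val;
}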
As you might have guessed, I'm using this huge region of memory as a big flat structure to record metadata about memory. The main trick of a linear page table is that we can use virtual memory to encode large sparse arrays, without allocating memory for page-sized regions of the table that are empty. This generalises to sparse tables other than page tables. The one I'm building is for tracking allocated heap regions.
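In case the trick isn't obvious: the table is simply indexed by the address shifted right by the page-size bits, so writing an entry faults in just the one page of the table that holds it, and untouched stretches of the table cost nothing. A sketch, where word_t and record_alloc are hypothetical names and table is the huge MAP_NORESERVE'd region from earlier:

#include <stdint.h>

typedef uintptr_t word_t;   /* one metadata word per 4KB region */
static word_t *table;       /* the huge MAP_NORESERVE'd mapping */

/* Record a metadata word for the 4KB region containing addr.
 * Only the page of the table actually written gets backed by
 * physical memory; the rest of the table stays untouched. */
static void record_alloc(void *addr, word_t meta)
{
    table[(uintptr_t) addr >> 12] = meta;
}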
[Update, 2011-3-23: thanks to Malcolm Scott who pointed out that my problem might be more tractable, because current x86-64 processors only implement a 48-bit address space. This also means that the 46-bit limit makes more sense---my mmap() successfully allocated a quarter of the usable virtual address space! Now I'm wondering: are those 48 bits something I can rely on for the nominal x86-64 architecture, or will running the same binaries on future hardware silently issue addresses from the larger 64-bit space? For now it doesn't really matter, but send answers if you have them (on a postcard, if you like) please.]
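[If it ever does start to matter, one defensive option, sketched below rather than anything I've actually needed, is to check at run time that the addresses mmap() hands back fit under 2^48, rather than silently indexing off the end of a table sized for a 48-bit space. checked_mmap is a made-up wrapper:

#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Sketch: if the table is sized for a 48-bit address space, guard
 * against a future kernel/CPU handing out addresses above 2^48. */
static void *checked_mmap(size_t length)
{
    void *p = mmap(NULL, length, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    assert(p == MAP_FAILED || (uintptr_t) p < (1ULL << 48));
    return p;
}
]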