Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 22 Mar 2011

How much memory could an mmap() map...

...if an mmap() could map memory? I've had to ask this question of Linux on 64-bit x86 platforms recently.

For reasons I will only hint at, I want to allocate a huge region of virtual address space for a structure a bit like a linear page table (called a “virtualized page table” on Wikipedia). We rely on a particular behaviour of Linux's mmap(): that mapping some memory isn't the same as committing to the availability of any underlying physical memory. Passing the MAP_NORESERVE flag means that memory will only be allocated when written to, hence allowing us to create a sparse table easily.
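
To make this concrete, here's a minimal sketch (mine, not part of the original experiment) of the behaviour we're relying on: map a large region with MAP_NORESERVE, write to only a few pages, and only those pages acquire physical backing.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = (size_t) 1 << 30; /* 1GB of virtual address space */
    char *region = mmap(NULL, len, PROT_READ|PROT_WRITE,
        MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch three widely-spaced pages; only these (roughly 12KB, plus
     * page table overhead) become resident, not the whole gigabyte. */
    region[0] = 1;
    region[len / 2] = 1;
    region[len - 1] = 1;

    printf("mapped %zu bytes; compare VmSize and VmRSS in /proc/self/status\n", len);
    getchar(); /* pause so the process can be inspected */
    return 0;
}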

I decided my table should have one word per 4KB of memory. For a 32-bit machine, which has 4-byte words and 2^20 such (aligned) regions, this means I need 4MB of virtual address space for my table (i.e. about a thousandth of the VAS). If we ask mmap() for such a region, it will clearly oblige us. On a 64-bit machine, which has 8-byte words and 2^52 such regions, I need 2^55 bytes of virtual address space for my table---32 petabytes, or about eight billion times as much as in the 32-bit case, but again, only a small fraction of the total address space (in this case a 512th, because words are twice as big).
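
If you don't trust my mental arithmetic (I wouldn't), here's a throwaway check of those numbers:

#include <stdio.h>

int main(void)
{
    /* One table word per 4KB (2^12-byte) region of virtual address space. */
    unsigned long long entries_32 = 1ULL << (32 - 12); /* 2^20 regions */
    unsigned long long entries_64 = 1ULL << (64 - 12); /* 2^52 regions */
    printf("32-bit table: %llu bytes\n", entries_32 * 4); /* 4MB */
    printf("64-bit table: %llu bytes\n", entries_64 * 8); /* 2^55 bytes, 32PB */
    return 0;
}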

Here's a quick program you can run to test whether you can do an mmap() of a given size.

#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <assert.h>

int main(int argc, char **argv)
{
    assert(argc > 1);
    errno = 0; /* strtol reports overflow via errno but never clears it */
    size_t mapping_size = strtol(argv[1], NULL, 0);
    assert(errno != ERANGE);
    assert(mapping_size > 0);
    assert(sizeof(size_t) == 8); /* this experiment only makes sense on a 64-bit platform */
        
    void *ret = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, 
        MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);

    if (ret == MAP_FAILED)
    {
        fprintf(stderr, "error: %s\n", strerror(errno));
        return 1;
    }
    else 
    {
        fprintf(stderr, "success!\n");
        return 0;
    }
}

And here's a shell script to drive it with powers of two until it fails (assuming the program above is compiled to ./test).

#!/bin/bash

for exp in `seq 10 50`; do
    ./test $(( 2 ** $exp )) || break;
    echo "Succeeded for 2 ^ $exp"
done

I'd be curious to know whether anyone on an x86-64 Linux machine maxes out anywhere other than 2^46 bytes. The kernel source will have the answer, but I can't be bothered wading through it right now. Interestingly, turning off the overcommit check (i.e. writing "1" to /proc/sys/vm/overcommit_memory) doesn't increase the limit for me.

By the way, I'm using strtol because atol seemed to be narrowing the result to 32 bits even though a long is 64 bits. Instead of 2^31 I got -2^31, which unsurprisingly made mmap() fail. This seems like a bug, but probably isn't for some reason (possibly including a stupid mistake by me). One plausible culprit: without a prototype in scope, the compiler implicitly declares atol as returning int, which would truncate 2^31 to exactly -2^31.
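
For completeness, a more defensive way to do the parsing (a sketch of standard-C error handling, not the code above):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

static size_t parse_size(const char *s)
{
    char *end;
    errno = 0; /* strtoull sets errno on overflow but never clears it */
    unsigned long long v = strtoull(s, &end, 0);
    if (errno == ERANGE || end == s || *end != '\0')
    {
        fprintf(stderr, "bad size: %s\n", s);
        exit(1);
    }
    return (size_t) v;
}

int main(int argc, char **argv)
{
    if (argc > 1) printf("%zu\n", parse_size(argv[1]));
    return 0;
}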

As you might have guessed, I'm using this huge region of memory as a big flat structure to record metadata about memory. The main trick of a linear page table is that we can use virtual memory to encode large sparse arrays, without allocating memory for page-sized regions of the table that are empty. This generalises to sparse tables other than page tables. The one I'm building is for tracking allocated heap regions.
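
In case the trick isn't obvious, here's the shape of such a table (a sketch; the names and the 47-bit assumption are mine, not code from my project): the metadata word for an address lives at the slot indexed by that address's page number, and writing an entry faults in only the single 4KB page of table that holds it.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGE_SHIFT 12 /* log2 of the 4KB granularity */
#define ADDR_BITS  47 /* assume user addresses fit in 47 bits on x86-64 Linux */

static uintptr_t *table;

/* The metadata word covering address a, indexed by page number. */
static uintptr_t *entry_for_addr(const void *a)
{
    return &table[(uintptr_t) a >> PAGE_SHIFT];
}

int main(void)
{
    size_t table_size = ((size_t) 1 << (ADDR_BITS - PAGE_SHIFT)) * sizeof(uintptr_t);
    table = mmap(NULL, table_size, PROT_READ|PROT_WRITE,
        MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }

    void *p = malloc(42);               /* some heap allocation to track */
    *entry_for_addr(p) = (uintptr_t) p; /* record a word of metadata about it */
    printf("entry for %p lives at %p\n", p, (void *) entry_for_addr(p));
    return 0;
}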

[Update, 2011-3-23: thanks to Malcolm Scott who pointed out that my problem might be more tractable, because current x86-64 processors only implement a 48-bit address space. This also means that the 46-bit limit makes more sense---my mmap() successfully allocated a quarter of the usable virtual address space! Now I'm wondering: are those 48 bits something I can rely on for the nominal x86-64 architecture, or will running the same binaries on future hardware silently issue addresses from the larger 64-bit space? For now it doesn't really matter, but send answers if you have them (on a postcard, if you like) please.]
