Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 27 Jun 2012

32 bits should be enough for anyone

For a brief while, 32-bit Intel machines were the de facto standard in commodity hardware , and life was simple. Certainly, it's an ugly architecture, gigantically overcomplicated by backwards-compatibility. Its virtual addressing features are terrifying. But the subset of it which user-level programmers on modern OSes use is fairly comprehensible. There is an ISA-defined stack with its own registers and a well-defined calling convention. Pointers and most integers are both 32 bits, meaning that the “word” is a useful and well-understood unit of storage.

All this changed in the early 2000s as AMD's 64-bit extension of the ISA came into popularity. Upon us were forced bigger integers, bigger pointers, and a more complicated stack and calling convention (in the name of “performance”, but at huge cost in complexity). I believe these were completely retrograde steps. Now that pointers are 64 bits, our software's memory footprint and disk footprint are bloated considerably. To “alleviate” this, and to avoid certain paradoxes about the size relationships between short, int and long, an int in most C compilers stayed at 32 bits. Unfortunately, this is completely braindead, because int is very much supposed to be a word-sized type. This is the reason that C's “defaults to int” semantics, as applying to unprototyped functions and untyped variables, are sane.

Does this matter? Yes! Here is some code that was mysteriously segfaulting for me this morning. It's from DTrace, or more specifically, Paul Fox's Linux port of it.

if ((P->rap = rd_new(P, P->pid)) != NULL)
  (void) rd_loadobj_iter(P->rap, map_iter, P);

Somehow, the pointer returned by the rd_new---which just wraps a simple calloc() call---gets corrupted immediately after returning. Suspiciously, said corruption is that the top four bytes are 0xffffffff, whereas the lower four bytes are those of the pointer returned by calloc(). Inspecting the disassembly around the rd_new call, we see something suspicious.

   0x000000000047ed12 <+150>:   callq  0x462bc6 <rd_new>
=> 0x000000000047ed17 <+155>:   cltq

What's this cltq thing? Well, it takes the lower 32 bits of %rax (the 64-bit register holding the return value from rd_new()) and sign-extends them to fill the full 64 bits. This is exactly the corruption I was seeing. Why did the compiler insert this unwanted instruction? The answer is revealed if we recompile the file with -Wall.

Psymtab.c:645:3: warning: implicit declaration of function `rd_new' [-Wimplicit-function-declaration]

The function is not prototyped, so its return value defaults to int. But because int is now 32 bits wide, and the register holding the return value is 64 bits wide, the compiler helpfully obliterates the top 32 bits of the return value for us by sign-extending the lower 32 bits into them. If the compiler implementors had stuck with the intention of the int data type, that it be exactly a word in size, and therefore that “defaults to int” is sensible, this would not have arisen.

Now, this is clearly sloppy code. We should just fix it so that rd_new() is prototyped. It probably seems a bit of a nonsequitur that I am blaming this problem on 64-bit architectures. But on the other hand, how often in your code have you actually wanted integers that can store values in excess of 2³²? If you are a systems programmer, you might value the ability to encode large offsets. But we already had long long for this. In other cases, the vast bulk of our code deals with small integers, characters and pointers. Giving us an extra 32 bits of width in our ALU operations is a waste of transistors.

Why did we waste them this way? Well, we had to waste them somehow. In the early 2000s, we didn't really know what else to do with them, because (I suspect) there was little perceived demand for multiple cores in the commodity market (outside of servers). Nowadays, we have even more transistors, and even hardware guys realise that giving us 128-bit architectures would be a pointless waste. So, they spent some effort convincing us that we really did want multiple cores after all. And now we are busy complicating our software so that we can “exploit” these too. I have ranted before about how that risks generating a generation's worth of bad software. Perhaps I should say “another generation's worth”.

By the way, I'll concede that 64-bit address spaces can be useful, if they are used to support persistence or sharing. No need for pointer swizzling! But AMD's 64-bit x86 extensions do not provide the separation of protection from mapping to realise the sharing use-case. In other words, switching protection domains still means invalidating the TLB entries of shared mappings. Meanwhile, I haven't seen anyone using the extra address space for accessing persistent storage in a radically new way, although I'd love to see some approaches in this space.

I don't completely doubt the value of multiple cores either. The right way to see parallelism is as an enabler for radically more computation-intensive applications---likely to be in domains such as scientific computation, machine learning, and simulation---than what we can currently support. As I have also ranted about before, I am deeply disturbed by the fervour for mass rewriting of everyday software, and the disruption to the infrastructure it runs on, that is resulting from mindless multicore mania, in the same way that the 64-bit shift has disrupted our infrastructure. It's all in the name of performance, but it costs us far more of human beings' time and energy than it saves.

[/devel] permanent link contact

validate this page