Conventional arrays offer indexed storage that can be updated in place
A[k] := x
Purely functional languages do not have arrays implemented as a contiguous block of memory locations.
A functional array is a mapping from a finite range of integers, with lookup and update functions
lookup : 'a array -> int -> 'a update : 'a array -> int -> 'a -> 'a array
update
returns a new array, which is the same as
the old array except for the value at the given index.
We can implement an array as a binary tree. The binary digits of the subscript dictate the path down the tree
lookup
uses the least-significant bit at each step
to decide whether to go left or right.
- fun lookup Lf _ = raise Array | lookup (Br(elt, l, r)) i = if i = 1 then elt else if i mod 2 = 0 then lookup l (i div 2) else lookup r (i div 2); > val 'a lookup = fn : 'a tree -> int -> 'a
update
is similar, but it copies everything
on the way down. It can also create a new branch, but only if
no branches were already created in the ancestral path. This is
sufficient provided there are no gaps in the subscripting
- fun update Lf 1 v = Br (v, Lf, Lf) | update Lf _ _ = raise Array | update (Br (_, l, r)) 1 v = Br (v, l, r) | update (Br (elt, l, r)) i v = if i mod 2 = 0 then Br (elt, update l (i div 2) v, r) else Br (elt, l, update r (i div 2) v); > val 'a update = fn : 'a tree -> int -> 'a -> 'a tree
Consider a representation of binary numbers as a list of digits, with the least-significant bit first
- datatype digit = Zero | One;
We can write increment and decrement functions that carry and borrow as needed
- fun inc [] = [One] | inc (Zero :: rest) = One :: rest | inc (One :: rest) = Zero :: inc rest; > val inc = fn : digit list -> digit list - fun dec [One] = [] | dec (One :: rest) = Zero :: rest | dec (Zero :: rest) = One :: dec rest; > val dec = fn : digit list -> digit list
Another way to represent binary numbers is to skip zeros and only store ones. This requires marking each one, possibly with a weight.
Our representation can now be an int list
.
Carry and borrow operations work based on the weights
- fun carry w [] = [w] | carry w (ws as w' :: ws') = if w < w' then w :: ws else carry (2*w) ws'; > val carry = fn : int -> int list -> int list - fun borrow w (ws as w' :: ws') = if w = w' then ws' else w :: borrow (2*w) ws; > val borrow = fn : int -> int list -> int list
Increment and decrement operations are just special cases of carry and borrow
- val inc = carry 1; > val inc = fn : int list -> int list - val dec = borrow 1; > val dec = fn : int list -> int list
The increment and decrement operations on binary numbers can
ripple all the way through the number, taking
O(log n)
steps.
We can do better by changing the way we represent numbers. Instead of
the weight of each digit at position n
being
2n
, we set it to
(2n+1-1)
.
We further change our system to allow the digits 0, 1, and 2, but stipulate that a 2 is only allowed as the first non-zero digit.
In this representation, the decimal number 92
could be
written 002101
(least-significant digit first).
This system yields a single, canonical representation for every natural number.
Observe that
2·(2n+1-1) + 1 = (2n+2-1)
In other words, adding a 1 to a 2 yields a 1 in the next position.
We can increment a skew binary number that contains a two by resetting the two to zero and incrementing the next digit from zero to one, or from one to two (the second digit cannot be two).
To increment a skew binary number that doesn't contain a two, increment the lowest digit from zero to one or one to two. The result will still be in canonical form.
Note that there are no rippling carry operations. If we use a sparse representation, increment and decrement operations run in constant time.
The positions in skew binary numbers have weights:
1, 3, 7, 15, 31, 63, 127, 255, 1023, 2047, …
Counting looks like this (again with least-significant digit first):
1, 2, 01, 11, 21, 02, 001, 101, 201, 011, 111, 211, 021, 002, 0001, 1001, 2001, 0101, 1101, 2101, 0201, …
The code is simpler than the description. We represent numbers as a list of integer weights. The same weight repeated indicates a two
- fun inc (ws as w1 :: w2 :: rest) = if w1 = w2 then (1 + w1 + w2) :: rest else 1 :: ws | inc ws = 1 :: ws; > val inc = fn : int list -> int list - fun dec (1 :: ws) = ws | dec (w :: ws) = (w div 2) :: (w div 2) :: ws; > val dec = fn : int list -> int list
With this representation, counting looks like:
[1], [1,1], [3], [1,3], [1,1,3], [3,3], [7], [1,7], [1,1,7], [3,7], [1,3,7], [1,1,3,7], [3,3,7], [7,7], [15], …
Normal lists have O(1)
cons
,
head
, and tail
operations, but
looking up an element by subscript n
takes
n
operations. With a data structure based
on skew binary numbers, we can retain O(1)
complexity
for list operations, and permit lookup and update operations by subscript
with O(log n)
complexity.
The idea is to have a skew binary list, where each position holds a binary tree with as many elements as the weight of that position.
Here is how a skew binary random-access list would look with
13 elements, i.e., 021
or
[3, 3, 7]
:
Here is how a skew binary random-access list would look with
17 elements, i.e., 2001
or
[1, 1, 15]
:
We use a datatype for the trees. Each tree is a complete preorder binary tree with the following datatype
- datatype 'a tree = Lf of 'a | Br of 'a * 'a tree * 'a tree;
The skew binary random-access list is just a list of
(weight, tree)
pairs, where each tree
is a complete binary tree with weight
elements, and the
weights follow the rules outlined earlier for skew binary numbers.
The cons
, head
, and tail
operations are patterned after inc
and dec
fun cons (x, ts as (w1, t1) :: (w2, t2) :: ts') = if w1 = w2 then (1+w1+w2, Br(x, t1, t2)) :: ts' else (1, Lf x) :: ts | cons (x, ts) = (1, Lf x) :: ts fun head [] = raise Empty | head ((1, Lf x) :: ts) = x | head ((w, Br(x, t1, t2)) :: ts) = x fun tail [] = raise Empty | tail ((1, Lf x) :: ts) = ts | tail ((w, Br(x, t1, t2)) :: ts) = (w div 2, t1) :: (w div 2, t2) :: ts; > val 'a cons = fn : 'a * (int * 'a tree) list -> (int * 'a tree) list val 'a head = fn : (int * 'a tree) list -> 'a val 'a tail = fn : (int * 'a tree) list -> (int * 'a tree) list
Lookup operations have two parts: first scan the list to find the right tree, and then search the tree for the right element. Given an index, the first step is just a matter of skipping trees and subtracting their weights from the index
fun lookup (i, []) = raise Subscript | lookup (i, (w, t) :: ts) = if i < w then lookupTree (w, i, t) else lookup (i - w, ts); > val 'a lookup = fn : int * (int * 'a tree) list -> 'a
Searching the tree is similar to other tree searches we've seen, but it uses the subscript to search the tree in preorder, i.e., ordered root node first, then everything to the left, the everything to the right
fun lookupTree (1, 0, Lf x) = x | lookupTree (1, i, Lf x) = raise Subscript | lookupTree (w, 0, Br(x, t1, t2)) = x | lookupTree (w, i, Br(x, t1, t2)) = if i <= w div 2 then lookupTree (w div 2, i-1, t1) else lookupTree (w div 2, i-1 - w div 2, t2); > val 'a lookupTree = fn : int * int * 'a tree -> 'a
Update is very similar, but it makes copies as it goes
fun updateTree (1, 0, y, Lf x) = Lf y | updateTree (1, i, y, Lf x) = raise Subscript | updateTree (w, 0, y, Br(x, t1, t2)) = Br(y, t1, t2) | updateTree (w, i, y, Br(x, t1, t2)) = if i <= w div 2 then Br(x, updateTree (w div 2, i-1, y, t1), t2) else Br(x, t1, updateTree (w div 2, i-1 - w div 2, y, t2)) fun update (i, y, []) = raise Subscript | update (i, y, (w, t) :: ts) = if i < w then (w, updateTree (w, i, y, t)) :: ts else (w, t) :: update (i-w, y, ts); > val 'a updateTree = fn : int * int * 'a * 'a tree -> 'a tree val 'a update = fn : int * 'a * (int * 'a tree) list -> (int * 'a tree) list