Internet Draft -- Expires Nov. 20, 1992

PRELIMINARY DRAFT:

Pip: The `P' Internet Protocol

Paul F. Tsuchiya
Bellcore
tsuchiya@thumper.bellcore.com
May 19, 1992

Status

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress." Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Disclaimer: This text version does not contain the figures from the postscript version. As such, it is missing information essential to the paper, and so it is strongly suggested that the postscript version be read.

1.0 Purpose of this draft

Pip is an IP protocol that scales, encodes policy, and is high speed. The purpose of this draft is to explain the basic concepts behind Pip so that people can start thinking about potential pitfalls.

I am proposing Pip as an alternative to the two "medium term" proposals that emerged from the Road (Routing and Addressing) group to deal with the dual IP problems of scaling and address depletion. Because this proposal, which represents new ideas, is competing with old (and therefore well thought-out) ideas, I wish to circulate it (and get the process started) as quickly as possible, albeit in not as complete a form as I would like. I expect to have a complete proposal by the beginning of September. There will be a plenary presentation and a BOF covering this material at the Boston meeting of IETF.

2.0 Pip General

Pip has the following features:

1. Pip carries multiple address types in a common format.
As such, it is beneficial for transition from one address to another, and for future evolution (of routing techniques as well as of addressing schemes).
2. The Pip address is completely general (multiple levels of hierarchy, expands to any number of systems).
3. The Pip address is compact: it grows with the number of systems.
4. The Pip address efficiently encodes policy (source-based) routes, both in "long form" (explicit path) and "short form" (path identifier).
5. Because the Pip address can be a path identifier (multi-layer if desired, like the ATM VCI/VPI), Pip can be used in a connection-oriented fashion (this paper only briefly touches on mechanisms for controlling connections).
6. The Pip address includes multicasting (potentially substantially more sophisticated than what exists for IP multicast numbers, for instance, hierarchical multicast).
7. Pip efficiently encodes QOS (Quality-of-Service) information.
8. The routing table lookup with Pip is well-bounded (by the depth of the address hierarchy).
9. Pip accommodates "multiple defaults" routing from (multi-homed) stub domains.
10. Pip allows intra-domain routing and hosts to operate with no notion of the "inter-domain" parts of their address, if desired. This is equivalent to current IP hosts and intra-domain routers not needing to know their own network number.
11. Pip accommodates tunneling across transit domains.
12. By virtue of 8 and 9, Pip accommodates separation of interior and exterior routing.
13. Pip simplifies handling mobile systems (by having flat network layer identifiers).

In short, Pip is a "next generation" protocol, intended to allow the internet to evolve over the foreseeable future.

One of the design philosophies behind Pip is that it encodes all "routing" information (what is traditionally spread over the address and QOS fields) in a single structure (the Routing Directive). The rules for parsing the structure are simple, yet they provide a rich set of routing functions.
Therefore, it is possible to build a single forwarding engine that will accommodate many different types of routing styles, including traditional hierarchical addresses, policy, source route, and virtual circuit. This way, the forwarding engine can be built in hardware and can remain constant even while internet routing evolves.

Another design philosophy behind Pip is that it delays the definition of how internet packets should be composed and interpreted. The meaning of addresses and QOS information is dynamically determined by information in Directory Services, distributed protocols such as routing protocols, and MIBs, rather than in a protocol specification. Current internet protocols have continuously been moving towards this philosophy, but with header formats that are not conducive to late semantic definition. Pip facilitates late semantic definition of the internet protocol header. On one hand this makes it easier to evolve the internet incrementally; on the other it requires that all systems (hosts, routers, and directory servers) be a little smarter, and that algorithms be a little more complex. This, in a nutshell, is the trade-off being made by Pip.

3.0 Transition Approach

Like IP, Pip by itself is nothing more than a header format and some rules about how to forward the header. It is nothing without routing and addressing and related algorithms behind it. But since Pip can encode the semantics of existing internet headers (addresses, QOS, etc.), it can take advantage of existing routing protocols and addressing schemes. This is one of the main virtues of the proposal to move to CLNP [OSI2]: it takes advantage of an existing body of work. However, Pip will allow us to move forward into advanced features that CLNP will not handle, while still allowing us to take advantage of existing work (although not as easily as moving to CLNP will).
Since Pip can encode backbone-oriented "addresses" that are semantically equivalent to NSAP addresses, transition to Pip will be almost identical to the transition to CLNP already described by Callon [Ref]. Once most of IP has disappeared (and therefore scaling and address depletion are no longer concerns), we can evolve advanced features into the internet (policy, mobility, flow control) without having to change the internet protocol. (Of course not having to change the internet protocol doesn't mean not having to change routers. But not having to change the internet protocol is still better than having to change it, especially because it facilitates piece-wise evolution).

In the following sections, I show how Pip works outside of the context of interoperation with existing addressing and routing schemes.

4.0 Pip Header Structure

Figure 1 shows the Pip header structure. The Pip header has 5 parts not found (at least in this form) in current internet protocols. They are the Handling Directive (HD), the Tunnel, the Logical Router (LR), the Routing Hints (RH), and the IDs. While these parts are fundamental to Pip, the details of their layout, and the layout of other fields, are open to change.

The IDs field contains flat (non-hierarchical) values that do nothing more than identify the source and destination of a Pip packet. The Routing Directive (RD), which consists of the Tunnel, the LR and the RH, contains routing information. Either the Tunnel or the RH is used, but not both. The RH holds routing information such as (hierarchical) addressing, source-route (including policy), and virtual circuit information. The Tunnel simply marks entry and exit points of a domain, and is used to temporarily override the RH. The LR holds route-affecting QOS information (such as routing metrics), plus various information needed to make the RH operate properly.
The HD holds non-route-affecting QOS information, such as queueing directives, congestion avoidance and control, and priority.

This packet structure better represents internet protocol functions than traditional internet protocols do. For instance, traditional internet protocols combine the functions of identification and routing into the address fields. Doing this generally limits the flexibility of the protocol. For instance, host mobility is harder when the address combines these two functions. Traditional internet protocols also split the routing function over multiple fields (the address and the QOS fields). While this doesn't necessarily limit functionality, it generally complicates the routing table lookup function, or more accurately, it generally results in router implementations that ignore the QOS fields, thus making it harder to add QOS routing to the existing infrastructure. Traditional internet protocols must use self-encapsulation in order to tunnel through groups of routers. Pip has a specific field for this purpose, thus eliminating the overhead of replicating the entire header.

No Pip header checksum is shown in Figure 1. I am undecided as to whether or not one is necessary, particularly since the HD, Hop Count, Tunnel, and RH fields will commonly change values from router to router. In fact, of the first 5+ (32-bit) words, only the first word will potentially not be modified. No fragmentation/reassembly fields are shown. I am strongly inclined to leave these out, and just depend on dynamic MaxPDU discovery to handle this. Finally, no version number field is shown. Protocol identification (at the previous layer) can serve this function.

The following sections cover the various parts of the Pip header in detail.

4.1 Boring Parts

The "boring" parts of the Pip header are the ID Type field (4 bits), the Options Length field (4 bits), the Total Length field (24 bits), the Protocol field (8 bits), and the Hop Count field (8 bits).
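As a rough illustration of these field widths, the sketch below packs and unpacks the three fields whose sizes add up to one 32-bit word (4 + 4 + 24). Figure 1 is not available in this text version, so the bit order used here (ID Type in the most significant bits, then Options Length, then Total Length) is an assumption, and the function names are hypothetical.

```python
from typing import Tuple


def pack_first_word(id_type: int, options_len: int, total_len: int) -> int:
    """Pack ID Type (4 bits), Options Length (4 bits) and Total Length
    (24 bits) into one 32-bit word. The bit order is an assumption."""
    if not 0 <= id_type < (1 << 4):
        raise ValueError("ID Type must fit in 4 bits")
    if not 0 <= options_len < (1 << 4):
        raise ValueError("Options Length must fit in 4 bits")
    if not 0 <= total_len < (1 << 24):
        raise ValueError("Total Length must fit in 24 bits")
    return (id_type << 28) | (options_len << 24) | total_len


def unpack_first_word(word: int) -> Tuple[int, int, int]:
    """Inverse of pack_first_word: (id_type, options_len, total_len)."""
    return (word >> 28) & 0xF, (word >> 24) & 0xF, word & 0xFFFFFF


word = pack_first_word(id_type=0b0101, options_len=2, total_len=1500)
print(hex(word), unpack_first_word(word))
```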
The ID Type describes the length and type of the Source and Destination IDs. The IDs can be 0, 4, 6, or 8 octets each (the actual types, which are not so boring, are described in the separate section on IDs below). The Options Length field gives the number of 32-bit options that come after the RD. The Total Length field gives the total length of the Pip packet, including the Pip header, in octets. The maximum size Pip packet is 2^24 = 16,777,216 octets. This is substantially larger than the maximum allowed by the corresponding fields in IP or CLNP, both of which allow for maximum packet sizes of 65536 octets. These fields comprise the first 32-bit word.

The Protocol field indicates the higher layer protocol, and is equivalent to the IP Protocol field. The Hop Count field counts down the number of hops before the packet should be dropped. It is the same size as the corresponding fields in IP or CLNP, allowing for 256 hops. The Hop Count field falls on a 32-bit (and 64-bit) boundary, making it convenient to modify.

4.2 Tunnel and Routing Directive (RD)

The RD is the most novel and powerful aspect of Pip. The RD is general, compact, and fast. It is general in that it can accommodate any address type and any routing algorithm type, including source-based routing. It is compact in that it encodes hierarchical addresses efficiently. And, it is fast because 1) the number of steps required for the forwarding function is small, even in the worst case, and 2) the same steps are used for forwarding all types of routing, so an efficient and general forwarding engine can be built. The RD is composed of three parts: the Tunnel, the Logical Router (LR), and the Routing Hints (RH).

Because a router can be playing multiple roles, Pip models a router as multiple "Logical Routers". For instance, a router may be operating at multiple levels of the hierarchy, may be participating in multiple routing algorithms, including multicast, may be operating with multiple routing metrics, and so on.
While the function of logical routers is for most purposes a feature, it is required to make the RH mechanism work properly, as is described below.

The basic algorithm for finding a route is to 1) determine the forwarding table index, 2) determine which forwarding table to use (that is, which logical router is active for this packet), 3) index directly into the forwarding table (no search technique such as hashing or tree search is necessary) and retrieve the routing information, 4) modify the RD for the next-hop router. This is explained in more detail below (see Section 4.2.3).

Tunnel

The 32-bit Tunnel is composed of two 16-bit fields, the Source Exit ID (SEI) and the Destination Exit ID (DEI). The DEI comes after the SEI, and so falls on the least significant bits of a word boundary. When the DEI is 0, then the Tunnel is ignored and the RH is used to route the packet. Otherwise, the RH is ignored and the Tunnel is used.

The purpose of the Tunnel is as follows. Consider two routers, X and Y, both of which understand the RH (at the level at which the RH is operating). Between X and Y are a series of routers that do not understand the RH (at that level). Assume that a Pip packet (with a NULL Tunnel) arrives at X and should be routed to Y. In order to get the packet to Y, X fills the DEI field with a value that is understood by the intermediate routers to mean "route to Y". X fills the SEI field with a value that is understood by the intermediate routers to mean "route to X". The purpose of the SEI field is to handle the case where a return packet (an error packet or control packet of some sort) needs to be sent (either to X or to the original source host). When Y receives the packet, it recognizes the Tunnel as terminating at itself, writes the Tunnel field to 0, and forwards based on the RH.

Tunneling is traditionally useful for preventing external routing information from being required internally.
It is also used by the ISIS routing protocol for repairing area partitions. Pip tunneling can be used for both of these purposes. Because of the way "addresses" (called RH Numbers in Pip) are assigned in Pip, however, tunneling turns out to be necessary just to make Pip work.

There are no nested tunnels in Pip (that is, tunnels cannot have tunnels). While nested tunnels could be of some use, it seems that the usefulness of tunneling diminishes with the number of nested levels. By having only one level of tunneling, the packet format is simplified (and the size kept small). To make nested tunneling work, it would be necessary to either modify the size of the packet en route (to add and delete tunnels), or for the originating host to put in enough Tunnel fields for the deepest nesting. The former case is difficult because it requires changing the packet size, which doesn't work, for instance, with (cut-through) ATM switching. The latter requires extra complexity and overhead in informing the originating host how many Tunnel fields to include in the packet. For these reasons, I have chosen to limit tunneling to one level.

4.2.1 Logical Router (LR)

As described above, the LR field indicates which of multiple forwarding tables should be used when routing a packet. The many uses of the LR will become clear throughout the coming examples.

Note that in theory one can always use different indexing values, rather than different forwarding tables, as a means of distinguishing logical routers. This, however, couples "addressing" (RH numbering) between different logical domains, thus generally complicating things. For instance, one could use different RH values to indicate different QOSs (cost, delay, etc.), but that would require that each system have an RH Number indicating cost, another indicating delay, and so on. So, unless such coupling is convenient, it is best to decouple RH numbering using the LR field.
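The decoupling argument can be illustrated with a small sketch. It assumes a router keeps one forwarding table per LR value and indexes each table directly with the RH Number. The table contents, the LR names, and the interface labels are all made up for illustration and are not part of Pip.

```python
# Hypothetical forwarding tables for one router: one table per LR value,
# each indexed directly by RH Number.
FORWARDING_TABLES = {
    "low_cost":  ["if0", "if1", "if1"],
    "low_delay": ["if2", "if1", "if0"],
}


def next_hop(lr: str, rh_number: int) -> str:
    """Select the table named by the LR field, then index it with the RH Number."""
    return FORWARDING_TABLES[lr][rh_number]


# The same RH Number can yield a different route under a different LR value,
# so each system needs only one RH Number however many QOS types exist.
print(next_hop("low_cost", 0), next_hop("low_delay", 0))
```

The alternative described above would fold the QOS type into the index itself, which is what forces every system to carry one RH Number per QOS type.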
Even though the LR field can be treated as a flat field by a router, the individual bits have specific meaning. My goal is that most or all of the bits' meanings be determined dynamically (via system management or the routing protocol or some other distributed protocol), and not be specified in a standards document. This allows for the maximum flexibility in evolving the protocol (adding new features, purging old ones). For instance, upon booting, a host should, as part of its configuration process, contact a local router and learn the meaning of each bit of the LR field. A network debugger, even, could query attached routers for these definitions, so that meaningful information could be logged and displayed.

The following bits are likely to be required:

1. Level. This indicates what level of a hierarchical RH Number is being routed on at a given time. This use of the LR field is only necessary if hierarchical RH Numbers are being used.
2. Multicast. If multicast is used, at least one bit may be needed to indicate whether the packet should be multicast or unicast. If several multicast algorithms are in use, multiple bits may be needed.
3. Route-affecting QOS. This would be any QOS type that influences the route chosen, such as cost or high-bandwidth. Note that QOS need not be route-affecting. For instance, a QOS type of low delay might only influence how packets are queued (given priority in the queue), but not influence how they are routed. In this case, the HD would have certain bits set aside for "low delay" (actually, priority queueing), but the LR would not. In other cases, a given QOS might affect both routing and handling.

4.2.2 Routing Hints (RH)

The RH is the most interesting and novel aspect of Pip. It holds what is normally thought of as the "address" in a traditional internet header. It can also hold many other kinds of routing information, such as policy information.
The RH consists of the RH Descriptor and the Routing Hint Fields (RHF, see Figure 2). The RH Descriptor tells how to interpret the RHFs. The RHFs are a series of fields, listed in the order that they will be required by the routers in the path from source to destination. This should not be taken to mean that the RHFs necessarily specify a source route, in some conventional sense of the term. Most normally, the RHFs will simply contain a hierarchical source and destination RH Number, where each RHF denotes one level of the hierarchical RH Number. This and other uses of the RHFs (such as virtual circuit or path identifiers, true source routes, and Sirpent- or Paris-style source routes) are given later.

Each pair of RHFs is separated by an RHF Relator (RHFR). The RHFR is a two-bit field that shows the relationship between the field before it and the field after. It has three values: up, down, and none. If down, the previous RHF is hierarchically above the subsequent RHF. If up, the previous RHF is hierarchically below the subsequent RHF. If none, the two RHFs are not hierarchically related.

The RH Descriptor and RH are parsed as follows. The 6-bit RHF Offset field determines which RHF is currently active. The RHF Length field indicates the size of each RHF (all of which are the same length). The RHF sizes represented by each RHF Length value are given in the following table:

After this is a series of 1 or more RHFs. Where the actual values needed in the RHFs vary greatly (some small, some large), this structure will result in a larger RH than seems necessary. I don't know how to shrink each RHF to its smallest size and still make the header parsing simple (and therefore fast). After the RHFs comes enough padding to make the RD fall on a 32-bit word boundary.

The combined 10-bit RHF Offset/RHF Length, then, is used to isolate the current RHF that a router should be routing on.
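The isolation step can be sketched in a few lines. The sketch assumes that RHF Offset counts fields starting from 1, that the RHF/RHFR fields are packed back to back starting at the most significant bit of the first RH word, and that the field width in bits has already been looked up from the RHF Length value (the mapping itself comes from a table not available here). The function name is hypothetical, and a real forwarding engine would more likely precompute the shifts and masks, as the text goes on to describe.

```python
from typing import Sequence


def isolate_field(words: Sequence[int], rhf_offset: int, field_bits: int) -> int:
    """Return the RHF/RHFR at 1-based position rhf_offset.

    words      -- the RH as a sequence of 32-bit words
    field_bits -- width of one RHF/RHFR in bits (assumed already derived
                  from the RHF Length value)
    """
    start = (rhf_offset - 1) * field_bits      # first bit of the field
    end = start + field_bits                   # one past the last bit
    first, last = start // 32, (end - 1) // 32
    # Concatenate the one or two words that the field touches.
    chunk = 0
    for w in words[first:last + 1]:
        chunk = (chunk << 32) | (w & 0xFFFFFFFF)
    span_bits = 32 * (last - first + 1)
    shift = span_bits - (end - 32 * first)     # bits to the right of the field
    return (chunk >> shift) & ((1 << field_bits) - 1)


# A 14-bit field at offset 3 straddles the first two words.
print(hex(isolate_field([0x0000000A, 0xFFC00000], rhf_offset=3, field_bits=14)))
```

For 14-bit fields and an offset of 3, this reduces to taking the low 4 bits of the first word and the high 10 bits of the second.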
A typical implementation on a common CPU/RAM processor would be to use the full 10 bits as a direct index into an array of size 1024, each entry of which contains data on how to isolate the current field. For instance, if RHF Offset = 3 and RHF Length = 8 (meaning each RHF/RHFR is 14 bits long), the data would instruct the processor to fetch the first (32-bit) word of the RH, shift left 10, mask with 0x00003c00, fetch the second word, shift right 22, mask with 0x000003ff, and OR the two results. In this example, the RHF/RHFR straddles a 32-bit word boundary, and so two fetches are needed. (The RHF Relator should also be saved off at this time to be used later.)

Once the RHF is isolated, it is used as a direct index into a forwarding table. The forwarding table can be well populated because (as is discussed later in this paper) the RHF values are chosen not based on how many things might have to be encoded at a given level of the hierarchy, but on how many things are actually encoded at a given level. In other words, the "address" that is ultimately carried in packets is, unlike current internet protocol addresses, well-utilized.

In addition to the information in the forwarding table described above, the forwarding table entry must also indicate whether the RHF Offset needs to be incremented. The RHF Offset is usually incremented when a packet crosses a hierarchical boundary. For instance, if the packet was being forwarded based on the equivalent of "network number" through a backbone, the router bordering the indicated network would increment the RHF Offset so that the next router (the router in the indicated network) would automatically look at the "subnet number" field.

Often a single router is acting at two or more levels of the hierarchy, for instance a level 2 router in the ISIS routing protocol.
In this case, the forwarding table entry and RHFR would indicate that, instead of routing the packet to another router, the next RHF should also be examined (and another forwarding table used). It would be unusual to find a router operating at more than three levels of the hierarchy. Further, address hierarchies are shallow. Telephone numbers in the USA have only 4 levels of hierarchy (including the international code). Therefore, the number of iterations of this search is well-bounded.

Note that this "field indexing" style of lookup is not just a cute optimization. Pip derives most of its routing flexibility from it, and wouldn't be general without it.

4.2.3 Forwarding Algorithm

This section describes the algorithm for forwarding a packet, based on the contents of the Tunnel and the RD (see Figure 3). For expository reasons, the unicast algorithm is defined, followed by the modifications needed for multicast. The same algorithm is used no matter what kind of routing algorithm is being used (hierarchical, policy, source, virtual circuit). Getting the appropriate behavior, according to the routing algorithm used, requires configuring the tables shown in Figure 3 correctly.

1. If the Tunnel Field is not 0, index into the Tunnel Table using the value in the Tunnel Field, and go to step 2. Otherwise (the Tunnel Field is 0), index into the Logical Router Table (LR Table) with the value in the LR Field, and go to step 3.
2. If the Information column contains forwarding info, then modify the Tunnel Field value according to the instructions in the Information column, and forward the packet. Otherwise, if it contains a pointer to the LR Table, set the Tunnel Field to 0 and go to step 1. Otherwise, if it contains a pointer to a forwarding table, then go to step 4.
3. If the Information column contains forwarding info, then modify the LR Field and Tunnel Field values according to the instructions in the Information column, and forward the packet accordingly.
Otherwise, if it contains a pointer to another forwarding table, then go to step 4.
4. Using the RH Descriptor (RHF Offset/RHF Length), isolate the correct RHF and RHFR. Using the RHF, index into the correct forwarding table (determined by the pointer in the previous step). If the Information column contains forwarding info, then modify the RHF Offset field, the value of the isolated RHF, the Tunnel Field, and the LR Field value according to the instructions in the Information column, and forward the packet accordingly. Otherwise, if it contains a pointer to another forwarding table, modify the isolated RHF field value according to the instructions in the Information column, and repeat step 4 (using the new forwarding table).

If tunneling is being used, and the router receiving the Pip packet is not the last router of the tunnel, then the router will find the forwarding information in the Tunnel Table, and not index any other tables. If the router is the last router of the tunnel, and the Tunnel Field has not been set to zero by the previous router, then the router will find a pointer in the Tunnel Table, and forward according to the RH.

If tunneling is not being used, the router receiving the packet will normally find a pointer in the Logical Router Table. When a router finds a pointer in a forwarding table (thus pointing it to another forwarding table), it is normally the result of "routing down the hierarchy". That is, the router is operating at multiple levels of the hierarchy, and is parsing the hierarchical RH Number.

Section 5 gives examples of the algorithm described above.

Multicast Algorithm

For multicast, the tables in Figure 3 are modified such that the Information column in each table contains a set of information blocks, each one being a pointer or forwarding info. When there are multiple forwarding info blocks (either in the same table entry, or by virtue of multiple pointers reaching multiple tables), then multiple packets are transmitted.
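Returning to the unicast steps 1 to 4 above, the following is a minimal sketch of how they might fit together. It is not the author's implementation. Figure 3 is not available in this text version, so the table shapes are assumptions: each table is a dictionary whose entries are either forwarding info or a pointer, the RHF Offset is taken to be 1-based, and a pointer may carry a change to the RHF Offset. All class and function names are hypothetical. RHF Relators and rewriting of the isolated RHF value are left out to keep the sketch short.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Packet:
    tunnel: int                 # Tunnel Field; 0 means the RH is used
    lr: int                     # Logical Router field
    rhf_offset: int             # 1-based position of the active RHF (assumed)
    rhfs: Tuple[int, ...]       # RHF values; RHF Relators are left out


@dataclass(frozen=True)
class Forward:
    """Forwarding info: a next hop plus optional rewrites of the RD."""
    next_hop: str
    set_tunnel: Optional[int] = None
    set_lr: Optional[int] = None
    offset_delta: int = 0


@dataclass(frozen=True)
class Pointer:
    """Pointer to the LR Table ("LR") or to a named forwarding table."""
    table: str
    offset_delta: int = 0       # change to RHF Offset before the next lookup


Info = Union[Forward, Pointer]


def apply_forward(pkt: Packet, info: Forward) -> Tuple[str, Packet]:
    """Rewrite the fields named in the forwarding info; return the next hop."""
    return info.next_hop, replace(
        pkt,
        tunnel=pkt.tunnel if info.set_tunnel is None else info.set_tunnel,
        lr=pkt.lr if info.set_lr is None else info.set_lr,
        rhf_offset=pkt.rhf_offset + info.offset_delta,
    )


def forward(pkt: Packet,
            tunnel_table: Dict[int, Info],
            lr_table: Dict[int, Info],
            forwarding_tables: Dict[str, Dict[int, Info]]) -> Tuple[str, Packet]:
    # Step 1: a non-zero Tunnel Field selects the Tunnel Table.
    if pkt.tunnel != 0:
        info = tunnel_table[pkt.tunnel]
        # Step 2: a pointer to the LR Table ends the tunnel; clear the
        # Tunnel Field and restart from the LR Table.
        if isinstance(info, Pointer) and info.table == "LR":
            pkt = replace(pkt, tunnel=0)
            info = lr_table[pkt.lr]
    else:
        info = lr_table[pkt.lr]
    # Steps 3 and 4: follow pointers through forwarding tables, indexing each
    # one directly with the active RHF, until forwarding info is found.
    while isinstance(info, Pointer):
        pkt = replace(pkt, rhf_offset=pkt.rhf_offset + info.offset_delta)
        info = forwarding_tables[info.table][pkt.rhfs[pkt.rhf_offset - 1]]
    return apply_forward(pkt, info)


# Hypothetical tables for a router acting at levels 2 and 1.
lr_table = {2: Pointer("FT2")}
tables = {"FT2": {2: Pointer("FT1a", offset_delta=1)},
          "FT1a": {2: Forward("subnet 2.2")}}
hop, out = forward(Packet(0, 2, 3, (1, 2, 2, 2)), {}, lr_table, tables)
print(hop, out.rhf_offset)
```

A missing table entry raises `KeyError` in this sketch; a router would presumably answer with one of the error messages instead.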
Each packet may have the Tunnel or RD fields modified differently, so each information block contains these instructions.

4.3 Handling Directive (HD)

The HD is something of a catch-all field for any packet handling mechanisms that don't influence the route taken by a packet. Typical handling types would be queueing directives, such as priority queueing, security directives, such as encryption, and so on. The meaning of the specific bits is meant to be handled in the same way as the LR: that is, the meaning of the bits is defined dynamically through system management or configuration protocols, not through hard-coded definition in a standards document. Each domain autonomously determines what meaning is assigned to each bit. When different domains use different bits for the same purpose, the value of the HD must be modified when a packet crosses domain borders so that the next domain may correctly interpret the meaning of the HD. The border router determines the proper translation via protocol exchange with the neighboring domain or via system management.

Packing all of the handling bits together makes possible an implementation style whereby the HD is used as a direct index into RAM, thus retrieving the appropriate handling mechanisms and values.

This paper does not further discuss the HD. Most notably, it does not discuss how a dynamic routing protocol would propagate HD information.

4.4 IDs

When an ID is present, it alone is used to identify the source and destination hosts. However, IDs can be mapped to the associated RH, so that the RH implies a certain ID. The ID therefore need not be carried in most packets. This works as follows. When a packet is first sent from a source host X to a destination host Y, the ID is included. The destination host Y, upon receiving the packet, associates the source ID with the "Source RH Number". These are the RHFs that describe the "source address" of the source host (see example 1).
When Y returns a packet to X, it writes X's ID in the destination ID field, and X's Source RH Number in the RH (as the Destination RH Number). This indicates to X that Y has recorded the mapping between X's source RHs and X's ID, and subsequent packets from X that contain the same source RH need not include the ID field.

If the host is mobile, and changes RH Numbers while communicating with another host, then it includes the ID when it uses a new RH Number. This lets the destination host associate another Source RH Number with the ID, so that subsequent packets can again leave the ID off. An out-of-band message can be used to de-associate no-longer-valid RH Numbers. (If both hosts are mobile, then some kind of third party server will be necessary, so that current RH Numbers can be determined, in case both hosts get new RH Numbers simultaneously.) If the hosts get new RH Numbers often, then the ID can simply be included in every packet.

The ID Type field is interpreted as follows. The first two bits indicate the type (and length) of the source ID, and the second two bits indicate the type of the destination ID. The meanings of the four values are: 0 = no IDs; 1 = 32-bit IP number; 2 = 48-bit IEEE 802 number; 3 = 64-bit number. The 64-bit number can have multiple interpretations, including X.121 number, E.164 number, and so on. While the ID field never influences routing, the IP-type ID can be used during transition from IP to Pip to determine how to fill in parts of the RD as the packet traverses the internet.

The ID field is padded out to a 32-bit boundary. It may make sense to pad out to a 64-bit boundary, given the introduction of 64-bit word processors.

4.5 Options

No options are defined at this time. In the future there might be options to establish virtual paths in lieu of policy routes, reserve bandwidth, manage mobile hosts, manage multicast lists, or whatever.
In general, I would assume that, if options are present, the packet leaves the normal forwarding code (or hardware) path for special (and slower) processing. Options are not further discussed in this paper.

4.6 Messages

Pip requires the following "ICMP"-type messages:

- Use/don't use tunneling message
- Incorrect RH message (usually means not enough levels of RH Number given)
- Max PDU exceeded notification
- Received ID incorrect (used to flush old RH Number from sending host)
- Normal redirect
- Tunnel redirect
- ARP

The use of these messages is explained by the following examples.

5.0 Examples

Following are descriptions of how various routing and addressing styles are used with Pip. These will further explain the use of the RD.

5.1 Example 1: IP-style Hierarchical RH Numbers (Addresses)

The examples in this section are primarily for the purpose of introducing the various concepts of Pip, particularly the RD. None of these examples gives the complete algorithm, but they get successively more complex and complete. Later examples (Examples 2 and on) will be complete.

Consider the network of Figure 4. The RH Numbers shown correspond to IP-style addressing. The Pip analogue to existing IP and CLNP addressing styles is hierarchical RH Numbers. When plain hierarchical RH Numbers (plain means with no QOS or policy information) are used, the RHFs (and RHFRs) are structured as shown in Figure 5.

The first group of RHFs are called the "Source RHFs". These are separated by "up" RHFRs, and are roughly equivalent to the source address in a traditional IP packet. The second (and last) group of RHFs are called the "Destination RHFs". These are separated by "down" RHFRs, and are roughly equivalent to the destination address in a traditional IP packet. The Source RHFs are listed in order of lowest level of the hierarchy first. That is, this field will come in on the wire first. The Destination RHFs are listed in order of highest level of the hierarchy first.
Note that this is the order in which the fields (specifically the Destination RHFs in this case) will be used by routers. The RHFR between the source and destination RH Number indicates "none".

5.1.1 Example 1.1: No tunneling, no default routing.

Assume that no tunneling is needed, and that default routing is not being used. In other words, the forwarding tables of the routers within the network have network numbers for other networks. The Tunnel Table for router x consists of one entry, indicating that all non-zero tunnel values are invalid. If a Pip packet with a non-zero Tunnel were received, the "Don't use tunneling" message would be sent to the sender. The LR table for router x is as follows:

LR table = [ ]

For these examples, the only information in the LR Table is that concerning the hierarchical level at which the packet is operating. Since the bits denoting this do not necessarily need to be in the least significant positions of the LR Field, the "LR.level=X" notation implies the index into the LR table. The reason that LR.level=1 is ambiguous is that router x is attached to two level 1 areas (subnets), and therefore wouldn't know which level 1 table (FT1a or FT1b) to use. As seen from x's forwarding tables below, FT2 must first be indexed to determine whether FT1a or FT1b should be used. The forwarding tables for router x are as follows:

These tables are simplified in that they do not show, for pedagogical reasons, information relating to the RHF Relators. This will be shown in later examples.

Example 1.1a: From 2.2.1 to 2.2.2

First consider a packet from 2.2.1 to 2.2.2. Host 2.2.1 would initially make a directory service query and get back an RH Number in the following form: . By comparing its own RH Number with that for the destination, 2.2.1 would conclude that they share the same level 3 and level 2 (that is, are in the same network and subnet).
2.2.1 would then compose the following RD: RD = < Tunnel = 0; LR.level = 1; RHF Offset = 2; RH = 1 (none) 2 >, where "LR.level" indicates the bits in the LR field indicating the hierarchical level, and "RH = 1 (none) 2" means that the first RHF is value 1, the second RHF is value 2, and the RHFR between them is "none". The source knows to set Tunnel = 0 because of a local parameter indicating that tunneling is not in effect. Normally, a host will assume that tunneling is not in effect unless told otherwise (either by a configuration message or by a "Use tunneling" error message). The source host initially sets LR.level = 1 because that is the highest uncommon level between source and destination (and therefore a level at which routing must take place). The RH contains the level 1 value from the source (1) followed by the level 1 value from the destination (2). Because the host is setting the LR.level to 1, the host doesn't have to include any RH Number components higher than that in the RH. Since neither value is hierarchically above the other, the RHFR is set to "none". Finally, the RHF Offset is set to point to the beginning of the Destination RHF of the RH (value 2). In all examples, the RHF being pointed to by the RHF Offset will be printed in bold type. If the host knew that strict subnet-per-LAN IP-style RH Numbering were being used, it could deduce that the destination host is on the same LAN as itself, and ARP for the destination. But assuming that the source host doesn't know this, the source host would send the packet to its "default" router, which is x. When router x receives the packet, it goes into the LR table with LR.level=1, and determines that the LR is ambiguous in this case. It therefore sends an "LR ambiguous" message to the host. The host would label router x as being ambiguous at level 1, so that future packets (even to different destinations) would start at level 2.
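The host-side steps just described (find the highest uncommon level, list the source components lowest level first and the destination components highest level first, and point the RHF Offset at the first Destination RHF) can be sketched as follows. This is only an illustrative sketch, not a specified algorithm: the names compose_rd, show, and min_level are hypothetical, RH Numbers are assumed to be equal-depth tuples written highest level first, and min_level stands in for the host's memory that its router is ambiguous below some level.

```python
def compose_rd(src, dst, min_level=1, tunnel=0):
    """Compose a plain hierarchical RD (no QOS or policy information).

    src and dst are RH Numbers written highest level first, e.g. (2, 2, 1).
    min_level is a hypothetical knob: the lowest level the host may start at.
    """
    if len(src) != len(dst):
        raise ValueError("this sketch assumes equal-depth RH Numbers")
    depth = len(src)

    # Count the leading (highest-level) components the two numbers share.
    shared = 0
    while shared < depth - 1 and src[shared] == dst[shared]:
        shared += 1

    # Highest uncommon level, but never below the level the host was told to use.
    level = max(depth - shared, min_level)

    src_rhfs = list(reversed(src[depth - level:]))  # lowest level first
    dst_rhfs = list(dst[depth - level:])            # highest level first

    rh = []
    for i, value in enumerate(src_rhfs):
        if i:
            rh.append("up")
        rh.append(value)
    rh.append("none")  # relator between the Source and Destination RHFs
    for i, value in enumerate(dst_rhfs):
        if i:
            rh.append("down")
        rh.append(value)

    # The RHF Offset (1-based) points at the first Destination RHF.
    return {"Tunnel": tunnel, "LR.level": level,
            "RHF Offset": level + 1, "RH": rh}


def show(rd):
    """Render an RD in the notation used in this draft."""
    rh = " ".join(f"({x})" if isinstance(x, str) else str(x) for x in rd["RH"])
    return (f"RD = < Tunnel = {rd['Tunnel']}; LR.level = {rd['LR.level']}; "
            f"RHF Offset = {rd['RHF Offset']}; RH = {rh} >")


print(show(compose_rd((2, 2, 1), (2, 2, 2))))
print(show(compose_rd((2, 2, 1), (2, 2, 2), min_level=2)))
```

The first call reproduces the level 1 RD above. The second models a host that has recorded router x as ambiguous at level 1 and therefore starts at level 2.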
Normally, a configuration message from router x (as part of router discovery) would have prevented the need for the error message. The host composes another RH, this time with level 2 included: RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 1 (up) 2 (none) 2 (down) 2>. Now, the bottom two levels of the source RH Number (2.1) occupy the first two RHFs (but in reverse order), and the bottom two levels of the destination RH Number (2.2) occupy the last two RHFs. When router x received this packet, it would index the LR table with LR.level=2, and determine that forwarding table FT2 should be used. Using the RHF Offset, router x would isolate the third RHF (value 2) from the RH. Router x would index 2 into forwarding table FT2, and retrieve a result indicating that it needs to move to level 1, using forwarding table FT1a. Router x would increment RHF Offset, isolate the fourth RHF (value 2) from the RH, use this as an index into FT1a, and determine that the destination is on subnet 2.2. It would then use an ARP function to discover the LAN RH Number of 2.2.2. Router x would also redirect host 2.2.1. After the redirect, packets from 2.2.1 would go directly to 2.2.2, and would use an RH with only level 1. To form a return packet, 2.2.2 would reverse the order of the RHFs, and calculate the values of LR.level and RHF Offset similarly to the way that 2.2.1 calculated them. As such, 2.2.2 would copy the level of the incoming packet into the return packet. Note that the RH for level 1 packets (after the redirect) would only be 1 word long. Only putting as much of the RH Number in the RH as needed is one reason that Pip is compact. Since most traffic is local, most packets will be able to take advantage of this particular optimization. Example 1.1b: From 2.2.1 to 2.1.3 For a packet from 2.2.1 to 2.1.3, the directory service query would return . 
By comparing its own RH Number with that for the destination, 2.2.1 would conclude that they share the same level 3 (network), but not the same level 2 or level 1. 2.2.1 would then compose the following RD: RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 1 (up) 2 (none) 1 (down) 3>. The bottom two levels of the source RH Number (2.1) occupy the first two RHFs (but in reverse order), and the bottom two levels of the destination RH Number (1.3) occupy the last two RHFs. When router x receives this packet, it would parse the packet as described above, go into FT2 with index 1, then go into FT1b with index 3, and route the packet to subnet 2.1.

Example 1.1c: From 2.2.1 to 1.5.11

For a packet from 2.2.1 to 1.5.11 (a host in Net 1), host 2.2.1 would determine that there is no common level, and so would form an RD starting at level 3: RD = . The full three levels of the source address (2.2.1) occupy the first three RHFs (but in reverse order), and the full three levels of the destination address (1.5.11) occupy the last three RHFs. When router x receives this packet, it would go to forwarding table FT3 (based on the LR.level of 3) with an index of 1, and forward the packet to router z without incrementing the RHF Offset or changing the LR.level.

5.1.2 Example 1.2: With default routing, no tunneling

In the previous examples, the level 3 table (FT3), at least in the IP case, would be very large, because it must hold all active network numbers. One way to reduce forwarding table size in general is to use default routing. With current IP networks, default routing works best if there is only one exit point: with only one path out of a private network, default routing doesn't degrade the quality of paths found. If default routing to multiple exits is used, then sometimes a non-optimal exit point can be chosen. With Pip, tunneling would normally be used to handle default routing with multiple exits.
For pedagogical purposes, we give an example here where default routing is used without tunneling (again from the network of Figure 4). The level 1 and 2 forwarding tables for router x (FT1a, FT1b, and FT2) are the same as for Example 1.1. The forwarding table for level 3 (FT3), however, has a single entry of: FT3 (level 3, tunnel=0) = [ *, y, 3 ], where * means all possible index values, y means next hop router y, and 3 means the transmitted packet should operate at level 3 (LR.level = 3, RHF Offset = unchanged). Assume the same host pair as Example 1.1c above (2.2.1 to 1.5.11). Host 2.2.1 would form the same RD as shown in example 1.1.c. Upon receiving this packet, router x would not even need to isolate the RHF, because it knows that all packets at level 3 are routed to y. Assuming that y defaults level 3 packets to Backbone 1, the packet would take a longer path than necessary. 5.1.3 Example 1.3: With default routing and tunneling Now, we consider the case where tunneling is in use. The level 1 and 2 forwarding tables (FT1a, FT1b, and FT2) for router x are the same as in the first example. There is no level 3 forwarding table. The Tunnel Table (TT) is shown below: Note that there is a new column in the table (the 4th column). This is the value the Tunnel field gets written to upon transmission. Note that the Tunnel Table is small (just two entries, one for each exit point). Router x's LR table is modified as follows (to indicate the lack of a level 3 forwarding table): LR table = [ The Tunnel Table and level 3 Forwarding Table for router y are as follows: Example 1.3a: From 2.2.1 to 1.5.11, host fails to use tunnel Normally hosts would be configured to use or not use tunnels as appropriate (via some router-to-host configuration protocol). Assume for this example though that host 2.2.1 has somehow not been informed to use tunnels for inter-domain (level 3) traffic. Host 2.2.1 would generate an RD as shown in Example 1.1c. 
When router x receives this packet, it goes to the LR Table entry for LR.level=3. This results in the error shown. Router x sends an error message to 2.2.1 indicating that it must use tunneling for level 3 traffic. Example 1.3b: From 2.2.1 to 1.5.11, host uses tunnel value Now assume that either because of proper configuration or the error message of the previous example, host 2.2.1 knows to use a tunnel for level 3 traffic. Now, host 2.2.1 generates the following RD: RD = . In general, a host will know which Tunnel values are valid, via a configuration message. Barring this, it probably makes sense to have a convention where, lacking better information, a host simply chooses value 1. The routing algorithm could treat this value to mean "route to closest exit point", so that a single exit point doesn't get overloaded with default-tunneled packets. In this example, host 2.2.1 arbitrarily picks a Tunnel value of 1. Upon receiving this packet, router x indexes into TT by 1 (the Tunnel value), and forwards the packet to router y with no changes in the RD. When y receives the packet, it indexes 1 into its Tunnel Table TT. The resulting entry indicates that the appropriate exit point has been reached (which is y for Tunnel value 1), and that the level 3 (inter-domain) forwarding table FT3 should be consulted. (Alternatively, router x could have written the Tunnel Field to 0 upon transmission to y. In this case, y would go directly to the RH). For this, router y isolates the appropriate RHF in the RH, which is the 4th RHF (destination network number), value 1. The first entry in FT3 reveals that the appropriate exit point is actually z. Therefore, y puts z's tunnel value (2) in the Tunnel field and forwards the packet to z. Router y also sends a "Tunnel Redirect" message to 2.2.1, indicating that for this particular level 3 value (network number 1), the appropriate tunnel value is 2. 
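Router y's handling of the tunneled packet in Example 1.3b can be sketched as follows. The tables for y appear only in the figures missing from this text version, so the two dictionaries below are hypothetical stand-ins holding just the entries the example needs, plus an assumed entry for network 3. The function name, the field names, and the rule for when a Tunnel Redirect is sent are likewise illustrative assumptions rather than part of the proposal.

```python
# Hypothetical stand-in for router y's Tunnel Table:
# Tunnel value -> ("exit", None) if y is the exit point, else ("forward", next hop).
TT_Y = {
    1: ("exit", None),
    2: ("forward", "z"),
}

# Hypothetical stand-in for router y's level 3 table:
# destination network -> (next hop, Tunnel value written on transmission).
FT3_Y = {
    1: ("z", 2),           # network 1 is best reached via exit point z
    3: ("Backbone 1", 0),  # assumed entry: leave the stub directly
}


def process_at_y(rd):
    """Return (next_hop, rd_as_transmitted, messages_to_source)."""
    rd = dict(rd)  # do not modify the caller's copy
    messages = []

    action, hop = TT_Y[rd["tunnel"]]
    if action == "forward":
        # y is not the exit point for this Tunnel value; pass the RD on unchanged.
        return hop, rd, messages

    # y is the exit point: isolate the RHF under the RHF Offset (1-based)
    # and consult the inter-domain (level 3) table.
    network = rd["rhfs"][rd["rhf_offset"] - 1]
    next_hop, new_tunnel = FT3_Y[network]

    # Assumed rule: redirect the source when another exit's tunnel is the right one.
    if new_tunnel != 0 and new_tunnel != rd["tunnel"]:
        messages.append(("Tunnel Redirect", network, new_tunnel))

    rd["tunnel"] = new_tunnel
    return next_hop, rd, messages


# RD from host 2.2.1 to 1.5.11 with Tunnel value 1; relators omitted for brevity.
packet = {"tunnel": 1, "level": 3, "rhf_offset": 4, "rhfs": [1, 2, 2, 1, 5, 11]}
print(process_at_y(packet))
```

With these stand-in tables, the packet leaves y toward z with Tunnel = 2, and a Tunnel Redirect for network 1 is queued for the source host, as in the example.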
As a result, subsequent packets from 2.2.1 to 1.*.* (where "*" means "anything") will go via z.

Discussion

The "Tunnel Redirect" described in Example 1.3b, combined with use of the Tunnel Field, is what makes multiple defaults routing work. With multiple defaults routing, the host's relationship with the exit border routers is analogous to a host's relationship with its directly connected (next-hop) routers. In the latter case, the connected router sends a conventional redirect to the host to get it to use an alternate router attached to the same network. In the former case, the Tunnel Redirect serves the same purpose with respect to an alternate border router attached to the same stub domain. This is a powerful technique useful for isolating the internal stub routing from external routing. A few more comments about router y's level 3 forwarding tables are called for. Note first that if router y receives an RD with a tunnel of 2 (FT3a, second entry), it will forward that packet onto z. This would be necessary, for instance, if a host on subnet 2.4 tunneled a packet to z. If a packet is tunneled to y destined for network 3, y would write the tunnel to 0 (assuming that it didn't subsequently have to tunnel through Backbone 1), and forward the packet onto Backbone 1 (FT3b, third entry). As with router x, router y should never receive an RD at level 3 with a NULL tunnel (except from a mis-configured host). When router y receives a packet from Backbone 1, the RD should indicate level 2, as y's neighbor router in Backbone 1 would know to decrement the LR.level (and increment the RHF Offset) before forwarding a packet to y.

5.1.4 Example 1.4: Using tunneling for policy

This example shows how tunneling can be used as a limited policy mechanism. Later examples will show how full policy information can be encoded in the RD.
For this example, assume that x's and y's level 3 forwarding tables are as shown in Example 1.3, and that z's level 3 forwarding tables are structured similarly to y's, except that z uses Backbone 2 to get to Network 1, uses y to get to Network 3, and uses Backbone 2 to get to Network 4. Therefore, there are two ways to get to Network 4, either via Backbone 1 (via y), or via Backbone 2 (via z). Assume that Host 2.2.1 has a packet to send to a host on Network 4. If the host uses a tunnel value of 1, then the packet will travel via Backbone 1. If the host uses a tunnel value of 2, then the packet will travel via Backbone 2. In this manner, the tunnel value acts as a policy mechanism. Although it is not the best method for getting policy, note that, with the topology of Figure 4, it could be possible for Host 2.2.1 to choose between Backbone 1 and 2 even for sending packets to Networks 1 or 3. This could be done, for instance, by modifying y's and z's routing tables so that they didn't send tunnel redirects, but instead blindly forwarded the packet onto their connected backbones. (This is assuming that Network 2 does not advertise itself as a transit network, and therefore packets would not be routed back to 2, thus causing a loop.) A variation on this would be to define a bit in the LR to mean "force indicated tunnel", so that if this bit were off, the border routers (y or z) would pick the best path, but if this bit were on, it would override the router's better judgement and force the packet directly onto the backbone as described in the last paragraph. As with all host-initiated policy mechanisms, this requires that the host (or policy server) be knowledgeable about the route it is choosing.

5.2 Example 2: Backbone-oriented Hierarchical RH Numbers

It is well-known that IP-style addresses do not scale well. NSAP addresses (at least as defined by RFC 1237 [CGC]) scale better because the addresses are rooted at the backbones.
Figure 6 shows an example topology and backbone-oriented RH Numbers for use with this and subsequent examples. Each backbone has its own number, which is advertised in routing updates to all other backbones. (Hierarchically grouped backbones, for instance, where all backbones in a country are given the same RH Number prefix, are possible, but are not shown in Figure 6.) Note that stub network X has two levels of hierarchy internally, while stub Y only has one. One of the outstanding problems with the address assignment technique of RFC 1237 is how to handle stub networks that are attached to more than one backbone. One solution is to have multiple RH Numbers, one per attached backbone. This type of solution can be used for Pip. For instance, stub X (and its hosts) is shown to have two RH Number prefixes (1.14 and 26.81), one reflecting its attachment to A and the other its attachment to D. The negative aspects of the multiple addresses solution are not as bad with Pip as with CLNP. Indeed, with Pip, hosts can be completely isolated from inter-domain RH Numbering conventions. One reason that multiple RH Number prefixes are easier with Pip is the simple fact that "inter-domain" levels of the RH Number are not included in intra-domain RDs. For instance, the RD for a packet from host w to host y would be: RD = < Tunnel = 0; LR.level = 2; RHF Offset = 3; RH = 9 (up) 27 (none) 12 (down) 58>. Neither of the prefixes for stub domain X (1.14 or 26.81) is in the packet. Internal communications are not affected by backbone RH Numbering conventions. Hosts may (or may not) need to know their backbone RH Numbers for inter-domain traffic, and so the functions for reconfiguring these parts of all host RH Numbers may be required. This would be done alongside other host configuration (such as how to use tunnels, etc.), and is not particularly difficult.
Another reason why multiple RH Numbers are less of a problem with Pip is that the transport protocol uses only the ID field for the purpose of labeling connections. This means that the RH Number prefix (or any other part of the RD) can change arbitrarily during a transport connection without affecting the connection. Appendix A shows the forwarding tables for various routers in Figure 6.

5.2.1 Example 2.1: Inter-domain communications without backbone selection (with tunneling)

For these examples, host x wishes to send a packet to host z, and does not care which backbone (A or D) is used, but would like the routers to choose the best path. Assume that routing will find D as the best backbone for reaching Y from X.

Example 2.1a: Complete host isolation from external RH Numbering conventions.

This example describes a mode of operation where hosts (or internal routers) do not need to know the "inter-domain" components of their RH Numbers (although directory systems still must). This is the extreme case of isolating internal network operation from external influences. At a minimum, the host must initially know 1) that the stub-domain border routers will handle the inter-domain RH Numbers, and 2) which bit in the LR Field determines that so-called RH-Tunneling will be used to find exit routers. The host must eventually know 1) how many levels of inter-domain RH Number there are, and 2) the minimum RHF length for these levels. Initially, the host makes its best guess at the number of levels and the minimum RHF length. For example, if host x thought that there was only one level of RH Number above the stub domain, it might create the following RD: RD = . Note that the host is not using the Tunnel Field per se for this packet. Instead, the use of an "RH-Tunnel" is encoded in the LR Field. The RH-Tunnel number is placed in the third RHF. The entries in the RH-Tunnel forwarding table contain routes to stub exit points.
The purpose of using this method of tunneling, which only works for stubs, not for backbones, will become clear later in this example. The RH-Tunnel value of 1 is just a guess on the part of the host. Since the host has assumed only one level of hierarchy above its own RH Number, it puts one RHF above its known RH Number (12.96). Since this field will need to be written to its correct value by the border router, the RHF Offset initially points to this field. Through the tunneling mechanism similar to that already described, x will eventually discover a tunnel that will get the packet to router b. Looking at the forwarding tables for router b in Appendix A, we see that router b would first access forwarding table FTt with index 1. This entry contains a pointer rather than forwarding info (as can be seen by the fact that the "next-hop" column is empty). Since the RHFR preceding the third RHF is "none", the "none" column in the table is used. The exclamation point ("!") indicates that this is an error, and that an error message of some sort should be sent. In this case, it is an "Incorrect RH" message indicating to host x that it has not set the correct number of levels in the RH Number. Upon receiving this message, the host would assume 2 levels of RH Number above the stub domain, and create the following RD: RD = . This RD shows 4 levels of source RH Number instead of 3. Both source RH Number levels 3 and 4 are filled in with the RH-Tunnel value of 1. When router b receives this packet, it goes through the same steps as before up to the point where it accesses forwarding table FTt, index 1. This time, it refers to the "up" column, writes the RHF to value 14, and increments the RHF Offset (as indicated by the "+"). The question mark ("?") after the value 14 in the new-value field indicates that a check should be made at this point for sending an error message. In this case, the check is to make sure that the RHF Length is big enough to hold the new value.
If it weren't, an error message indicating the correct minimum RHF Length for the inter-domain parts of the RH Number would be sent to the host. At this point, the RH is as follows: RH = <96 (up) 12 (up) 14 (up) 1 (none) 61 (down) 92 (down) 7>. Next, router b goes to forwarding table FT4b index 1, writes the RHF to value 1, checks again for correct RHF Length, increments the RHF Offset, and goes to forwarding table FT4a, index 61. This entry indicates router j (backbone D) as the next hop. The "?" here refers to a check to see if an RH-Tunnel redirect should be sent. In this case the answer is yes, because the RH-Tunnel value of 1 indicates backbone A. The RH-Tunnel redirect would direct host x to subsequently use RH-Tunnel 2 to reach level 4 RH Number "61". When router b forwards the packet to router j, the RD is as follows: RD = . The RH-Tunnel bit is not relevant to router j, and its semantics no longer exist in the LR Field. The LR.level has been set at 4, as indicated in the "none" column for FT4a, index 61. The source RH Number has been filled in by router b. It is as though host x knew its full source RH Number. Note that, in a sense, the wrong source RH Number has been formed. This is because a return packet based on this source RH Number will come back via backbone A instead of backbone D, resulting in asymmetric paths. The source RH Number is composed according to the RH-Tunnel value, not according to the actual exit point. Because of the redirect to host x, however, subsequent packets will go via RH-Tunnel=2, and therefore the "correct" source RH Number of 26.81.12.96 will be formed.

Example 2.1b: Partial host isolation from external RH Numbering conventions

Any number of variations on this theme are possible. For instance, hosts could normally not know the inter-domain RH Numbers, but learn them on an as-needed basis.
In this mode of operation, a host could create an RD with RH-Tunnels, as in the previous example, but intentionally compose the RD incorrectly, for instance, by putting no levels above the intra-domain RH Numbers. The error message sent by the border router could include the proper inter-domain RH Numbers. In subsequent packets, the host would compose correct RDs, with LR.level = 4 and RHF Offset pointing to the highest-level destination RH Number. This saves the border router from having to work through two extra levels of hierarchy. The learned inter-domain RH Numbers would be used only for the appropriate destination(s), and would be flushed periodically. Or, the host could operate as in Example 2.1a, but when it receives a return packet from the destination host, it can learn the appropriate inter-domain RH Numbers from the Destination RHFs of the received packet. If the host later receives a tunnel redirect (implying that a different outgoing backbone was being used), the host could again write the inter-domain RH Numbers to zero, thus learning the new RH Number in subsequent return packets. Once the host learns and uses the correct inter-domain RH Numbers, it may use the Tunnel Field to exit the stub domain.

Example 2.1c: No host isolation from external RH Numbering conventions

This example is quite similar to the previous two examples, except that the function of the border router filling in the proper inter-domain RH Numbers is not used. Instead, hosts are configured with tuples, one for each exit backbone. All hosts in stub X would have two tuples: and . A Tunnel value of 1, then, represents exit points that reach backbone A (1), and a Tunnel value of 2 represents exit points that reach backbone D (26). Note that these tunnel values are not pointing to exit routers per se; they are pointing to exit backbones. Therefore, a Tunnel value of either 1 or 2 could cause a packet to go to router b, since it is connected to both backbone A and backbone D.
Since x wants routing to pick the appropriate exit backbone, it creates the following RD: RD = . The thing about this RD that will force routing to choose the best path (according to routers) is that the RHF Offset points to the destination backbone (61), and doesn't presuppose what exit point to use. Presumably, routing will have an opinion about what is the best way to get to backbone 61, and will do the right thing. The Tunnel value (1) was picked arbitrarily. Looking at the forwarding tables for b (in Appendix A), we see that b will access forwarding table TT (because the Tunnel is non-zero), and index 1. Router b would write the Tunnel value to 0, index the LR Table at LR.level=4, and go to table FT4a, index 61. (This is deduced from the tables shown in Appendix A because the other level 4 forwarding table, FT4b, is only reached via FT3.) At this point, the behavior is similar to that of Example 2.1a, where a tunnel redirect is sent. As a result of the tunnel redirect, host x subsequently composes the following RD: RD = .

Discussion

Note that both modes of operation (hosts that do not know the inter-domain RH Numbers and hosts that do) can operate in the same domain using the forwarding tables shown for router b. Note that the "dumb host" mode of operation (that of Example 2.1a) can work because the ID function has been partitioned from the "routing" function. This allows routers to change aspects of the routing information while still allowing hosts to recognize the source and destination of packets. I have mixed feelings about the "dumb host" mode of operation. On one hand, the notion of not having to administer inter-domain RH Numbers in machines other than border routers and directory service is appealing. On the other hand, it seems to me that, given the right protocols, it should be easy to manage inter-domain RH Numbers in all hosts and routers.
For instance, OSI is in the process of defining a means whereby all hosts in an area can be informed of new NSAP prefixes. This technique is tied to current ISIS and ESIS functions, and is actually quite simple. 5.2.2 Example 2.2: Inter-domain communications with backbone selection, with tunneling. For these examples, the source host wishes to manipulate the exit backbone chosen, rather than let the routers choose. Note that this use assumes that the host (or user) has the knowledge necessary to choose a backbone that makes sense. For instance, it might be silly for a host to choose backbone A over backbone D, when backbone A forwards the packet onto backbone D anyway. Example 2.2a: Punching holes, different hierarchy depths, and symmetric paths As with the previous examples (2.1), host x wishes to send a packet to host z. But, host x wants the packet to go through and return via backbone A. We assume that the hosts in X have the same information as with example 2.1c, that is, that they know which inter-domain RH Numbers are associated with which Tunnel values. Host x creates the following RD: RD = . The difference between this and the previous example is that the RHF Offset is set to 4 instead of 5, and is therefore pointing to the highest Source RH Number (1) instead of the highest Destination RH Number (61). As a result, when router b receives this packet, it replicates the actions of example 2.1c, except that it indexes FT4a with value 1 instead of value 61. This retrieves a next-hop of g, which matches the implied Tunnel value, and so no tunnel redirect is necessary. The RHFR in this case is "none", and so router b keeps the LR.level at 4. Since the next hop is in backbone A, router b increments the RHF Offset. Note that if router b had not incremented the RHF Offset, router g would have taken the extra step of determining that the RHF (1) indicated itself and incrementing the RHF Offset itself. 
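The effect of where the host points the RHF Offset can be sketched as follows. The real FT4a for router b is in Appendix A, which is not reproduced here, so the table below is a hypothetical reduction to the two entries mentioned in Examples 2.1 and 2.2a. The rule for advancing the RHF Offset (advance when the indexed RHF names the backbone the next hop sits in) is my reading of the text above, not a stated algorithm, and all names are illustrative.

```python
# RH Numbers of the backbones that router b attaches to (from the examples).
BACKBONE_NUMBER = {"A": 1, "D": 26}

# Hypothetical reduction of router b's FT4a: RHF value -> (next hop, its backbone).
FT4A_B = {
    1: ("g", "A"),   # the source's own backbone prefix: exit via backbone A
    61: ("j", "D"),  # destination backbone 61: routing prefers backbone D
}


def forward_at_b(rhfs, rhf_offset):
    """Return (next_hop, rhf_offset_on_transmission); rhf_offset is 1-based."""
    value = rhfs[rhf_offset - 1]
    next_hop, backbone = FT4A_B[value]
    if BACKBONE_NUMBER[backbone] == value:
        # The RHF is satisfied by entering that backbone, so step past it.
        rhf_offset += 1
    return next_hop, rhf_offset


# Source 1.14.12.96 (lowest level first), destination 61.92.7; relators omitted.
rhfs = [96, 12, 14, 1, 61, 92, 7]
print(forward_at_b(rhfs, 4))  # host selects backbone A (Example 2.2a)
print(forward_at_b(rhfs, 5))  # host lets routing choose (Example 2.1c)
```

With the Offset at 4 the packet is handed to g with the Offset advanced to the destination backbone. With the Offset at 5 it goes to j unchanged.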
Router b forwards the packet to backbone A (router g) rather than backbone D, as it otherwise would have. At this point, it is instructive to follow the packet through the internet to the destination. Router g receives the following RD: RD = . Router g accesses its forwarding table FT4, index 61, and routes the packet to router j. (Here we see that, from a purely topological perspective anyway, host x's choice of A as its backbone does nothing more than incur 2 extra hops.) The RD received by router j is the same as that shown above (router g does not change the semantics of the RD, although it could have modified the bit positions in the LR or HD, if backbone D interprets the bits differently than backbone A). Note that the appropriate exit point from backbone D to backbone J is router l. Within backbone D, however, two ways are shown to get from j to l. By inspecting the forwarding tables for router j, we see that a "QOS" metric determines which way is taken. This metric would be encoded in the LR (along with level), and is used to choose the "Logical Router" (that is, the appropriate forwarding table) for the metric type. Note that this metric example only influences the path inside a backbone. A metric could just as well influence the path of backbones. For this example, assume that the "QOS" metric bit is 0, and so forwarding table FT4a is used, indexed by 61. This returns a next hop of l, and the RD is not modified. Note that if there are routers between routers j and l, router j would have to tunnel to reach router l. When router l receives the packet, it indexes 61 into table FT4a. Instead of retrieving an entry indicating that the packet should be routed to backbone J, router l is instructed to look into a level 3 table (FT3b). This is surprising, as the destination stub is not under backbone D.
The reason for it in this case is that 1) there are two ways to enter backbone J, and 2) router l would like to pick the most appropriate entry point into backbone J for the given stub. This is analogous to the "east coast/west coast" problem found sometimes in the USA, where a neighbor backbone can be entered on either coast, and more detailed information about the location of the destination is desired to know which entry point to take. Router l increments the RHF Offset, and indexes 92 into forwarding table FT3b. This entry indicates that router p is the best next hop into backbone J. Note that router l has two level 3 forwarding tables, FT3a and FT3b. It is necessary to separate the forwarding tables for the level 3 destinations within backbone D from those in backbone J. And indeed, it would be necessary to have a separate level 3 table for every level 4 entity whose level 3 details were known. This is in order to distinguish between identical level 3 values in the different level 4 areas. This form of gathering detailed information about the internal structure of other domains is sometimes called "hole punching", and is a feature of the IDRP routing protocol. Router p receives a packet with the following RD: RD = . Router p indexes 92 into forwarding table FT3. This entry returns a next hop of q. Note that the LR.level of the RD transmitted by router p is set to 1, even though it came in as level 3 and even though the RHF Offset was incremented only once. This is necessary because stub domain Y only has one level of hierarchy, and therefore views the "top" of the hierarchy as level 3 rather than level 4. A host in a stub domain will view the top level of the RH Number hierarchy as being the number of levels in its RH Number. This is true whether or not the destination host has the same number of levels. A router can view the top level of the hierarchy as being any level equal to or greater than the number of levels it is aware of. 
As such, router g, for instance, could view the top level as level 2. The stub domains would then be level 1. As long as one router translates the level into the proper value for the next router, the level value can be chosen somewhat arbitrarily. To continue the example, router q receives the following RD: RD = . Router q transmits the packet to r, which transmits it to z (the forwarding tables for q and r are not shown). To form a return packet, host z reverses the order of RHFs, resulting in the following RD: RD = . The destination RHF pointed to by the RHF Offset (1) signifies backbone A. This means that the reverse path will be symmetric with the forward path (at least at the level of domains). 5.2.3 Example 2.3: General Policy Routing The previous example showed a small level of policy routing, in that the source host was able to choose the exit backbone. Recent work [BE, LS] indicates that policy routing in general can best be achieved with domain- level source routing. In this example, we show how this can be encoded with Pip. For general policy routing, but still with hierarchical RH Numbers, the RD is of the form shown in Figure 7. In between the source and destination RHFs are the intermediate RHFs. These designate the backbones on the path from source to destination. Example 2.3a: Choosing the inter-domain path For this example, assume that host x not only wants the packet to go via backbone A, but to traverse backbones B and C as well. To do this, host x forms the following RD: RD = . The packet would reach router g similarly to example 2.2a. Router g receives the following RD: RD = . Instead of pointing to the destination backbone (61), the RD points to backbone B (14). Therefore, router g forwards the packet to router f, which forwards it to router h. When router h receives the packet, it would point to backbone C (9), and so on. The domain path taken by the packet would be X-A-B-C-J-Y. 
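The RH layout for domain-level source routing, and the reversal a destination host performs to form a return packet, can be sketched as follows. Figure 7 is not available in this text version, so the layout here is an assumption: intermediate RHFs sit between the source and destination groups and are joined by "none" relators, and reversing an RH also swaps "up" and "down" so that the old destination becomes a properly ordered source. The function names are hypothetical.

```python
def build_rh(source, destination, intermediates=()):
    """Build an RH as a flat list of RHF values and relator strings.

    source and destination are RH Numbers written highest level first;
    intermediates are backbone numbers in the order they are to be traversed.
    """
    rh = []
    for i, value in enumerate(reversed(source)):  # Source RHFs, lowest level first
        if i:
            rh.append("up")
        rh.append(value)
    for value in intermediates:                   # assumed "none" between backbones
        rh += ["none", value]
    rh.append("none")
    for i, value in enumerate(destination):       # Destination RHFs, highest first
        if i:
            rh.append("down")
        rh.append(value)
    return rh


FLIP = {"up": "down", "down": "up", "none": "none"}


def reverse_rh(rh):
    """Reverse the RHF order for a return packet, swapping up and down relators."""
    return [FLIP[x] if isinstance(x, str) else x for x in reversed(rh)]


# Host x (1.14.12.96) to host z (61.92.7) via backbones B (14) and C (9).
forward = build_rh((1, 14, 12, 96), (61, 92, 7), intermediates=(14, 9))
print(forward)
print(reverse_rh(forward))
```

Reversing the forward RH keeps the intermediate backbones in place, in the opposite order.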
When host z receives this packet, it knows by inspecting the RD that 14 and 9 are intermediate backbones (because of the "none" RHFRs), and that strictly speaking they are not necessary for returning the packet to x. If z wants the return path to be symmetric with the forward path, it can form an RD by reversing the RHFs. However, if z doesn't care about the return path, or wishes a different return path, it can remove the intermediate RHFs (14 and 9), and potentially add some of its own.

Example 2.3b: Choosing the intra-domain path

In this example, host w is sending a packet to host z. Host w doesn't care about the inter-domain path, but wishes the intra-domain path to transit areas 19 and 14 before exiting the domain. To do this, host w forms the following RD: RD = .

When router c receives this RD, it will index 19 into its forwarding table FT2 (not shown, but analogous to router b's FT2), and route the packet to router a, which will forward it to b (based on an index of 12 into its FT2 forwarding table, also not shown). When router b receives the packet, it will have the following RD: RD = .

Router b will index 14 into its forwarding table FT3, which indicates that it should go to level 4 and route on the next RHF. Note that this is a case where, to save memory, the table could be implemented as a single "wildcard" entry rather than a full table to be indexed into.

When host z receives this packet, it can again either leave in the intermediate RHFs or take them out. In this case, however, the intermediate RHFs are interspersed between source RHFs. They can still be identified by inspection of the RHFRs. Assuming that host z leaves the intermediate RHFs in, it would form the following RD: RD = .

When router g receives this packet from backbone D, it forwards the packet to router b. Router b receives the following RD: RD = .
Since the RHFR after the 6th RHF (12) is "none", router b goes to the "none" column of index 12 in table FT2, increments the RHF Offset, and indexes again into FT2, but this time with 19. As a result, the return packet takes the reverse path of the forward packet.

Note that in general for this to work, since backbone A has two ways to reach stub X, backbone A should have hole punching information about stub X. For instance, if backbone A transmits the packet to stub X via router e, then router c would forward the packet to area 12, which would then return the packet to router c via area 19. The packet would not loop more than this once, but nonetheless it is clearly a non-optimal path. This is a natural consequence of doing policy routing without specifying the path adequately, and is not a bug with Pip per se. (Alternatively, to eliminate the need for hole punching information in A's routers, X could have two level 3 RH numbers under backbone 1. One number would indicate entry via e, and the other entry via g. Each could serve as an alternate route for the other if node or link crashes make the primary route impossible.)

5.2.4 Comments on Header Size

Even with some policy in the RD, the Pip headers are still relatively small compared to CLNP. For instance, assume that there are no more than 1000 top level backbones, and that any hierarchy element has no more than 1000 sub-elements. In this case, the largest RHF is 10 bits. Therefore, the RHFs of Example 2.2 require only 82 bits, or 3 words when padded out to 32-bit words. Including two 6-octet IDs, we get 6 words total (note that not all packets must include the IDs). This compares favorably with CLNP addresses, which require 10 words (two 5-word addresses). The RHFs of Example 2.3, which have a decent amount of policy information in them, require only 106 bits, or 4 words when padded out (7 words when IDs are considered).
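The arithmetic behind these header sizes can be checked with a short Python sketch. The bit counts (82 and 106) and the 6-octet ID size are taken from the text above; the function names and the padding helper are hypothetical and only illustrate the calculation.

```python
WORD_BITS = 32   # headers are padded out to 32-bit words
ID_OCTETS = 6    # each Pip ID is 6 octets


def rhf_bits(max_elements):
    """Bits needed to give each of max_elements siblings a distinct value."""
    return max(1, (max_elements - 1).bit_length())


def words(bits):
    """Number of 32-bit words needed to hold the given number of bits."""
    return -(-bits // WORD_BITS)  # ceiling division


def header_words(rhf_total_bits, include_ids=True):
    """Words for the padded RHFs, optionally plus the two IDs."""
    total = words(rhf_total_bits)
    if include_ids:
        total += words(2 * ID_OCTETS * 8)  # two 6-octet IDs = 96 bits
    return total


print(rhf_bits(1000))      # width of the largest RHF
print(header_words(82))    # RHFs of 82 bits, with IDs
print(header_words(106))   # RHFs of 106 bits, with IDs
```

The two IDs occupy exactly 3 words, so the totals of 6 and 7 words follow directly from padding 82 and 106 bits to 3 and 4 words.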
5.3 Example 3: Node-level Source Routing

Example 2.3 showed how Pip can do domain-level (or area-level) source routing for policy routing. Other literature [Per2, Che, CG] suggests that node-level source routing has advantages. In the case of Perlman, source routing is used to make a network more robust. In the case of Cheriton (Sirpent) and Cidon (Paris), it is to speed up the forwarding process. Perlman encodes node identifiers in the source route, Sirpent encodes outgoing link identifiers, and Paris encodes self-routing switch codes.

Consider a case where a stub domain wishes to use Perlman's Byzantine routing for internal communications, and to use normal hierarchical RH Numbering for external communications. For external communications, the RH numbering of Figure 6 would be used. For internal communications, a separate RH numbering scheme is used. In this scheme, each router is given an identifier, counting up from 1. For instance, if a network had 500 routers, they would be numbered 1 through 500, and the RHF for each router would be 9 bits long. Each host would have a number assigned by its connected router. Therefore, even if each router had 500 hosts (for a total of 250,000 hosts), each RHF would still be only 9 bits. The RD would be composed as follows:

A separate LR value would be used to distinguish RH numbers in this local scheme from hierarchical global RH numbers. Assuming 9 bits per RHF, Pip can encode a source route of 18 hops plus two 6-octet IDs in the same space required for two NSAP addresses.

5.4 Routing on a path identifier (or VCI number)

There are various advantages to setting up a dynamic path identifier rather than sending full RH Numbering information in each packet. Because part of the forwarding function is to modify the RHF, the RD can be used as a path or virtual circuit identifier. It can also be used as a hierarchical path identifier as with ATM cells.
It might be possible to use an option field in Pip to convey the information necessary to set up a path.

5.5 Multicast Routing

Pip provides enormous potential for increasing the sophistication and efficiency of multicast routing. For instance, Pip can encode hierarchical multicast routing, where one level of the RH indicates a multicast at the backbone level, while the next level down indicates multicast within a stub. This could be used, for instance, to allow a backbone to view the various stub locations of an international corporation as the group members of a single multicast tree (a single upper level multicast RHF), while in fact the corporation has many multicast groups (multiple lower level multicast RHFs).

Since different applications require different multicast trees (for instance, applications that don't require smallest possible delays could get away with a single multicast tree instead of multiple source-rooted multicast trees), multiple multicast algorithms could run in parallel, with bits in the LR Field distinguishing between them.

6.0 Transition from IP

This section outlines an approach for transitioning from IP to Pip. I presume that the target architecture for Pip is backbone-oriented hierarchical RH Numbers such as shown in Example 2.2. This RH Number structure is essentially the same as what is proposed in RFC 1237 [CGC]. I don't see any reason to use geographically-oriented RH Numbers, such as proposed by Deering [Ref?], given that 1) the inter-domain part of RH Numbers can be hidden from stubs, and 2) with Pip, it is straightforward to take advantage of backbone-oriented RH Numbers for policy routing. Nonetheless, geographically-oriented RH Numbers can be used with Pip, and so the issue remains open to debate.

Because the RH Numbers are semantically equivalent to RFC 1237 NSAPs, it should be possible to use the "CNAT" transition plan being developed by Callon almost as is.
The main difference is that Pip will be used instead of CLNP, and RH Numbers will be used instead of NSAPs. The transition, then, goes roughly as follows:

1. Start running Pip in the backbones.

   a. Modify BGP to carry RH Numbers. Once BGP has been modified for general masks as currently planned (BGP4), it will be relatively easy to add RH Numbers, as BGP4 will already have hole punching capability.

   b. An RH Number Authority (perhaps the same authority that assigns IP addresses, or perhaps the Internet Society) will assign RH Numbers to backbones. On one hand, this will result in fewer assignments than are currently done by the IP numbering authority, but on the other hand each assignment will require some screening to ensure that the recipient is a valid backbone.

2. Simultaneous with 1, populate border routers with mappings between IP network numbers and corresponding RH Numbers (i.e., IP net number <=> RH backbone.stub). This is to allow for translation between IP packets and Pip packets at the borders of stubs. These mappings can be distributed using a new BGP attribute.

3. Simultaneous with 1, modify the DNS root servers to issue RH Numbers in addition to IP numbers.

4. One-by-one, modify intra-domain routing to use Pip. Because Pip can use either the subnet/host model of IP or the area/host model of CLNP, and because inter-domain routing information need not be seen within stub domains, both IP and CLNP routing protocols can be modified to carry Pip RH Numbers.

5. Simultaneous with 4, modify the stub DNS servers to issue RH Numbers in addition to IP numbers.

6. One-by-one, modify hosts to run Pip.

   a. At the same time, higher layer protocols such as FTP or TCP that encode IP addresses should be modified to either not require internet-layer identifiers, or to handle multiple types, including Pip IDs. The TCP pseudo-header checksum could be made to include the whole Pip ID.

   b. While any host in a stub is an IP-only host, all Pip hosts should be able to run IP, in order to talk to that host without translation, and intra-domain routing must be able to handle IP or Pip.

   c. Once a stub domain becomes pure Pip (no IP boxes), that stub domain should never have to translate Pip packets into IP packets. The burden of all translations should fall on the stub that still runs IP.

7.0 Further Work

Obviously there is a great deal of work to be done: a detailed Pip specification; specification of modifications required to existing protocols, particularly routing but also DNS; development of a transition plan; specification of configuration protocols; establishment of a Pip addressing authority; and experimentation, among others. While I don't expect anybody to buy completely into Pip based on this paper alone, I hope that this paper convinces most that Pip is an alternative worth expending considerable resources on.

REFERENCES

[BE] Breslau, L., and Estrin, D., "Design of Inter-Administrative Domain Routing Protocols", Proceedings of ACM SIGCOMM '90, Philadelphia PA, September 1990.

[Che] Cheriton, D.R., "Sirpent: A High-Performance Internetworking Approach", Proceedings of ACM SIGCOMM '89, Austin Texas, September 1989.

[CG] Cidon, I., and Gopal, I., "Control Mechanisms for High-Speed Networks", Proceedings of IEEE International Conference on Communications '90, Atlanta Georgia, April 1990.

[Chi] Chiappa, J.N., "A New IP Routing and Addressing Architecture", IETF Internet Draft, draft-chiappa-routing-00.txt, available by anonymous FTP at nnsc.nsf.net.

[CGC] Colella, R., Gardner, E.P., and Callon, R.W., "Guidelines for OSI NSAP Allocation in the Internet", RFC-1237, USC/Information Sciences Institute, July 1991.

[LS] Lepp, M., and Steenstrup, M., "An Architecture for Inter-domain Policy Routing", IETF Internet Draft, draft-chiappa-routing-00.txt, available by anonymous FTP at nnsc.nsf.net.

[OSI2] International Organization for Standardization ISO8473, "Protocol for providing the Connectionless-mode Network Service".

[OSI3] International Organization for Standardization ISO10589, "Intermediate System to Intermediate System Intra-Domain routeing exchange protocol for use in Conjunction with the Protocol for providing the Connectionless-mode Network Service (ISO 8473)".

[Per1] Perlman, R., "Incorporation of Service Classes into a Network Architecture", Proceedings of the Seventh Data Communications Symposium ACM SIGCOMM, Vol. 11, No. 4, October 1981, pp. 204-210.

[Per2] Perlman, R., "Byzantine Routing", PhD Thesis, Department of Computer Science, MIT, 19??.

[Tsu] Tsuchiya, P.F., "Scaling and Policy Routing using Multiple Hierarchical Addresses", Proceedings of SIGCOMM '91, Zurich, September 1991.

Appendix A: Forwarding Tables for Routers of

Each table shown is a Forwarding Table or Tunnel Table. The first line gives the table label, followed by the criteria (LR.level, Tunnel, or previous forwarding table) under which the table is accessed. No LR Tables are shown, because the LR Table can be deduced from the criteria that each forwarding table is labeled with.

Within the body of each table, the first column is the index into the table. This index is either derived from the Tunnel or an RHF, depending on which applies for the given table. There are skips in the index values. The intervening index values are not shown when the corresponding network components are not shown in Figure 6. Normally, the forwarding tables are well-packed, and all index values are represented.

The action taken after any table access is either to route to the next-hop router, in which case the second column (next-hop) will have an entry, or to access another table, in which case one or more of the three "next-level or next-table" columns will have an entry. The next table chosen depends on the meaning of the RHF Relator after the RHF field.
The last column (new-value) is the value written into either the Tunnel or the RHF field, depending on which applies, upon transmission of the packet. In practice, both the Tunnel value and RHF may be modified, but for these examples, it is always only one or the other.

A plus (+) after any entry in these four columns means that the RHF Offset should be incremented (either before transmitting the packet or before accessing the next table). A blank entry simply means that the circumstances under which the entry has been reached should not occur. This may or may not result in an error message. An exclamation point "!" after any entry means that the entry might validly be reached, but that an error message should be sent. A "?" after any entry means that additional checks will be made to determine if an error message is necessary (the text will explain these as they are encountered). An entry of RH (in a tunnel forwarding table) means to evaluate the RH from scratch.
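
The entry notation described above can be read mechanically. The following Python sketch is one possible interpretation, not part of the Pip specification: the Entry type and parse_entry function are hypothetical, and the sketch assumes that markers appear only as trailing characters and that more than one marker may follow the same entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    value: str                      # next hop, next table, or new value
    increment_offset: bool = False  # "+": increment the RHF Offset
    send_error: bool = False        # "!": valid entry, but send an error
    extra_checks: bool = False      # "?": further checks decide on an error


# Trailing marker characters and the flag each one sets.
MARKERS = {"+": "increment_offset", "!": "send_error", "?": "extra_checks"}


def parse_entry(cell: str) -> Optional[Entry]:
    """Parse one table cell; a blank cell (should not occur) yields None."""
    text = cell.strip()
    if not text:
        return None
    flags = {}
    # Assumption: markers are suffixes and may be combined.
    while text and text[-1] in MARKERS:
        flags[MARKERS[text[-1]]] = True
        text = text[:-1]
    return Entry(value=text, **flags)


print(parse_entry("FT3+"))
print(parse_entry("q!"))
print(parse_entry(""))
```

A caller would treat a None result as the "should not occur" case and decide locally whether to emit an error message.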