
Application Data Units

Our choice of application data units (ADUs) is driven in part by the model for data distribution, and our choice of data distribution model is driven in part by the choice of ADUs. Although we present the two in separate sections for the sake of readability, they are closely interrelated.

The guiding factors in determining nt's data model come from the interactivity requirement listed above - many users must be able to work on the same document simultaneously - and from the observed usage modes, particularly the need to keep annotations separate from the primary text being worked on.

These guiding factors led us to choose a hierarchical data model based around blocks of text, each consisting of a number of lines of text. Each block is independent of other blocks; it can overlap them if required, although this does not aid readability. An example of blocks of text used for annotation, with each block in a different font, is given in figure 8.1.

Figure 8.1: An example of blocks of text used for annotation

As it is not expected that most annotations will be modified by multiple users simultaneously, this by itself allows a number of users to be working simultaneously on the document in separate blocks.

However, allowing multiple people only to annotate a document simultaneously imposes too great a constraint on the potential usage modes of the editor. Thus we also make each line of a block a separate entity. This allows users to work on separate lines in the same block without a potential conflict.

We could potentially have taken this model further, and treated each character of a line as an independent entity. There are, however, a number of reasons why this is undesirable. Firstly, the amount of state that needs to be kept for each separate entity to ensure eventual consistency is significant. Secondly, if we choose a line rather than a character as an ADU, we do not need to receive all the individual changes to the line as a user types; the most recent version of the line is sufficient, which gives us a large degree of redundancy in the face of packet loss. Lastly, there are potential transmission failure modes that, with either line ADUs or character ADUs, leave us with no globally consistent ordering for the ADUs. Due to the nature of the changes to text, however, these are significantly less likely to occur with line ADUs than with character ADUs. We shall discuss these failure modes, and the implications of two users attempting to modify the same line simultaneously, later in the light of the loose consistency model described below.
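To illustrate the second point above, that only the most recent version of a line needs to arrive, the following is a minimal sketch in Python. It is not taken from nt: the names `LineUpdate` and `LineStore` and the per-line `version` counter are assumptions made for illustration, since how versions of a line are ordered is not specified here.

```python
from dataclasses import dataclass


@dataclass
class LineUpdate:
    line_id: str
    version: int  # hypothetical per-line counter, assumed for this sketch
    text: str     # the complete current text of the line, not a delta


class LineStore:
    """Keeps only the newest version seen for each line."""

    def __init__(self):
        self.lines = {}  # line_id -> LineUpdate

    def receive(self, update):
        """Apply an update; return True if it replaced the stored version."""
        current = self.lines.get(update.line_id)
        if current is None or update.version > current.version:
            self.lines[update.line_id] = update
            return True
        return False  # stale or duplicate update, safely ignored

    def text(self, line_id):
        return self.lines[line_id].text


# A user types "hello" one character at a time, producing versions 1 to 5.
# Suppose versions 2, 3 and 4 are lost in transit.
store = LineStore()
store.receive(LineUpdate("line-1", 1, "h"))
store.receive(LineUpdate("line-1", 5, "hello"))
print(store.text("line-1"))
```

Because each update carries the whole line, the receiver converges on the sender's text as soon as any later update arrives; the lost intermediate versions never need to be retransmitted. With character ADUs, by contrast, every lost character would have to be recovered individually.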

It is perhaps not immediately obvious that either a line or a block is truly an independent entity. A block of lines is an independent entity because it has no interaction with, or dependency on, other blocks. However, within a block, lines are dependent on the block that contains them, as their location is dependent on its position. In addition, a line has a position within a block. This can be represented in a number of ways, the simplest being an absolute line number, or relative links to the next and previous lines. However, given that we wish to avoid any form of locking, it is possible that a message conveying the creation of a new line by one user crosses with a message conveying the deletion of an earlier line by another user. In this and other similar cases it is clear that an absolute line number is insufficient to uniquely place the line within a block. Positioning a line relative to neighbouring lines is more robust: even if a line creation crosses with the deletion of the line immediately previous to the new line, the location of the previous line can be retained after its deletion, and the location of the new line is still unambiguous. Perhaps as importantly, so long as lines are only created or deleted (never moved), relative location of lines means that only changes in the immediate vicinity of the line in question can cause possible confusion, which makes it much easier for the user interface to make the possible confusion obvious to the users involved.
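The crossing-messages case described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not nt's implementation: the `Block` class, its method names and the idea of keeping a deleted line's id in the ordering as a placeholder are hypothetical ways of "retaining the location" of a deleted line. It does not attempt to handle two users creating lines after the same predecessor.

```python
class Block:
    """A block whose lines are placed relative to the previous line."""

    def __init__(self):
        self.order = []       # line-ids in order, deleted lines included
        self.text = {}        # line-id -> text
        self.deleted = set()  # line-ids deleted but kept as placeholders

    def create(self, line_id, prev_id, text):
        # Place the new line immediately after prev_id (None = block start).
        # prev_id is found even if that line has already been deleted.
        # (Raises ValueError if prev_id was never received; not handled here.)
        index = 0 if prev_id is None else self.order.index(prev_id) + 1
        self.order.insert(index, line_id)
        self.text[line_id] = text

    def delete(self, line_id):
        # Hide the line but retain its location for later placements.
        self.deleted.add(line_id)

    def visible(self):
        return [self.text[i] for i in self.order if i not in self.deleted]


def make_replica():
    block = Block()
    block.create("a", None, "alpha")
    block.create("b", "a", "beta")
    block.create("c", "b", "gamma")
    return block


# One user deletes line "b" while another creates a line after "b".
# The two messages cross, so replicas see them in opposite orders.
replica_1 = make_replica()
replica_1.delete("b")
replica_1.create("x", "b", "new")

replica_2 = make_replica()
replica_2.create("x", "b", "new")
replica_2.delete("b")

print(replica_1.visible())
print(replica_2.visible())
```

Both replicas place the new line between "alpha" and "gamma" regardless of arrival order. Had the new line been addressed as "line 3" instead, its meaning would have depended on whether the deletion had already been applied.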

However, although relative location information for lines is more flexible, it is insufficient for displaying a block of text as it arrives when some packets have been lost, such as might happen when a file is loaded into the editor. In such a case, adding line numbering allows the parts of the block that have been received to be displayed immediately, irrespective of whether there are any missing lines at this stage. Whether this is an important requirement depends on the importance placed on timeliness of display. We took the viewpoint that it is important to be able to display any change as soon as it is received, and so added line numbers to the line ADUs; these are ignored once all previous lines in the block have been received.
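One way the line-number hint might be used is sketched below in Python. This is a simplified illustration rather than nt's algorithm: the `Line` fields and the `display_order` function are hypothetical, and the rule used (follow the links from the first line while they are intact, then fall back to line numbers for anything stranded beyond a gap) is an assumption about how the two kinds of position information could be combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    line_id: str
    next_id: Optional[str]  # link to the following line, None at block end
    number: int             # line-number hint, used only beyond a gap
    text: str


def display_order(first_id: Optional[str], received: Dict[str, Line]) -> List[Line]:
    """Return the received lines in the best display order available."""
    ordered = []
    seen = set()
    current = first_id
    # Follow the links for as long as the chain of received lines is intact.
    while current is not None and current in received and current not in seen:
        ordered.append(received[current])
        seen.add(current)
        current = received[current].next_id
    # Lines beyond a missing line cannot be reached via links yet, so
    # order them by their line-number hint instead.
    stranded = [ln for ln in received.values() if ln.line_id not in seen]
    stranded.sort(key=lambda ln: ln.number)
    return ordered + stranded


# A four-line block is loaded, but the packet carrying line 2 is lost.
received = {
    "l1": Line("l1", "l2", 1, "one"),
    "l3": Line("l3", "l4", 3, "three"),
    "l4": Line("l4", None, 4, "four"),
}
partial = [ln.text for ln in display_order("l1", received)]

# Once line 2 arrives, the links alone determine the order.
received["l2"] = Line("l2", "l3", 2, "two")
complete = [ln.text for ln in display_order("l1", received)]

print(partial)
print(complete)
```

In the partial case the third and fourth lines can be shown immediately in the right relative order, even though nothing received so far links them to the first line.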

Given the choice of ADUs as lines and blocks, there is a certain amount of meta-data that must be associated with each. Thus a block contains, amongst other things, the following:

  • the position of the block on the page
  • the line-id of the first line in the block

and a line contains, amongst other things:

  • the text of the line
  • the block-id of the block this line forms a part of
  • the line-id of the previous line
  • the line-id of the next line
  • the line number of the line within the block (ignored if all previous lines in the block are present)

Thus, although lines and blocks are not completely independent, blocks can be moved without modifying the lines contained in the block, and lines can be created, deleted and edited independently of other lines or blocks. There are, however, a number of desirable operations on lines that cannot be carried out independently; we shall discuss these and their consequences later.
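The meta-data listed above might be represented roughly as follows. This Python sketch is an assumption-laden illustration only: the field names, the types and the `move_block` helper are hypothetical, and the real ADUs contain further fields ("amongst other things") that are not shown.

```python
import copy
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Block:
    block_id: str
    position: Tuple[int, int]     # position on the page (units assumed)
    first_line_id: Optional[str]  # line-id of the first line in the block


@dataclass
class Line:
    line_id: str
    block_id: str                 # the block this line forms a part of
    text: str
    prev_line_id: Optional[str]
    next_line_id: Optional[str]
    line_number: int              # hint, ignored once earlier lines are present


def move_block(block: Block, new_position: Tuple[int, int]) -> None:
    """Moving a block changes only the block's own meta-data."""
    block.position = new_position


block = Block("b1", (10, 20), "l1")
lines = [
    Line("l1", "b1", "first line", None, "l2", 1),
    Line("l2", "b1", "second line", "l1", None, 2),
]

lines_before = copy.deepcopy(lines)
move_block(block, (200, 40))
print(block.position, lines == lines_before)
```

Because each line refers to its block only by block-id, and to its neighbours only by line-id, moving the block requires a single block update and no line updates.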
