Specification of the EBS file format for bio-signals
----------------------------------------------------


Markus Kuhn -- http://www.cl.cam.ac.uk/~mgk25/ebs/

$Id: ebs.txt,v 2.4 1994-03-11 08:30:15+00 msprosch Rel mgk25 $


1  Purpose

In the analysis of multi-channel bio-signal recordings (e.g.,
electrocardiograms, electroencephalograms, magnetocardiograms,
magnetoencephalograms, audio data), scientists often spend significant
time coding simple functions and programs that write data to files and
read it back. Programs for trivial tasks like extracting a single
channel or a short time sequence from a huge file, applying filters
and standard signal processing algorithms to the recordings, and
visualizing the data are rewritten and reinvented every day in many
institutions all over the world. Many existing formats offered by
recording equipment vendors are designed only for very special
applications and are inflexible and not extensible. Some of these
vendor formats are also optimized for one particular type of hardware
and are poorly documented or not documented at all. Scientific
applications require file formats that are not too complicated, easy
to understand and implement, highly flexible and fully documented, and
that allow researchers to cooperate by easily exchanging data and
tools among work groups.

The design goals of the EBS file format have been:

  - Implementation of software which supports the EBS format must not
    be very difficult and it must be possible for advanced programs to
    exchange data with very simple implementations that use only a few
    features of the format.

  - It must be possible to handle the data efficiently, because very
    large data sets often have to be processed. For this reason,
    different machine architectures (e.g. their native byte order)
    have to be considered. Modifications to header data must be
    possible without having to copy the entire file. Access to growing
    files while the recording is still in progress should be possible
    on multitasking systems.

  - The format must be as universal as possible. Only very few
    parameters (length of the file, number of channels, data format)
    should be mandatory. It must be possible to attach arbitrary
    further information, i.e. the format must be highly extensible in
    a way that won't prevent the use of existing tools for extended
    versions of this format. The data part of the format must be
    capable of containing different encodings of data (e.g. various
    precisions, fixed point or floating point types, compressed
    variable length encodings, etc.).

  - A number of common attributes that many different applications
    need to store together with the data (e.g. patient ID, description
    texts, common recording parameters) have to be predefined. In this
    way, extensions to the format will rarely be necessary.

The EBS file format has been designed for storing single-channel or
multi-channel signals that have been recorded simultaneously at
constant intervals of time with the same sample rate in each channel.
The channels need not all store signals from the same source, e.g., EEG,
ECG and trigger signals may very well be mixed in one file, but the
same encoding (e.g. 16-bit signed integers, floating point reals,
compressed or uncompressed) and the same sample frequency must be used
for all channels in a single file.

It is our hope that the EBS file format will motivate scientists
working on the analysis of bio-signals to exchange their tools and
data sets as public domain software, because similar positive
influences of standard file formats have been observed in other
scientific communities (e.g. computer graphics, astronomy and
operating systems) where well-known scientists have developed a great
deal of freely available, high-quality software.


2  File format

An EBS file is a linear sequence of 8-bit bytes of defined length.  If
a file system allows a file name extension, '.ebs' is recommended and
if a file type has to be specified, a transparent unstructured binary
type should be used.  Each EBS file consists of 3 or 4 different
parts: (1) the fixed header containing information that is needed by
every program reading EBS files, (2+4) the variable headers which
might contain additional data that is only needed by some programs and
may be simply ignored by others and (3) the encoded bio-signal
data. The normal position of the variable header information is
between the fixed header and the encoded data (2), but it is also
possible to put some or all parts of the variable header information
behind the encoded data (4).

[Note: Having two possible positions for the variable header
information makes it possible to change, insert or delete information
in the variable header without also having to move the encoded signal
data, and to read files while other programs are still appending data
to the end of part (3) (on-line processing).]


                   -----------------------------------
                   |     Fixed Header (32 bytes)     |       (1)
                   +---------------------------------+
                   |         Variable Header         |       (2)
                   +---------------------------------+
                   | Encoded Signal Data (4*d bytes) |       (3)
                   +---------------------------------+
                   | Optional Second Variable Header |       (4)
                   -----------------------------------
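As a rough illustration of the layout above, the following Python
sketch assembles the smallest useful EBS file: a fixed header (1), a
variable header part (2) that holds only its terminating tag, and a
data part (3), with no part (4). It assumes the CIB_16 encoding that
is described in section 2.3; the function name build_minimal_ebs is
hypothetical and not part of this specification.

```python
import struct
from typing import Sequence

EBS_MAGIC = bytes([0x45, 0x42, 0x53, 0x94, 0x0A, 0x13, 0x1A, 0x0D])
CIB_16 = 0x00000001                 # encoding ID, see section 2.3
NO_PART_4 = 0xFFFFFFFFFFFFFFFF      # value of d when part (4) is absent

def build_minimal_ebs(channels: Sequence[Sequence[int]]) -> bytes:
    """channels[c][t] is sample t of channel c+1; all channels have m samples."""
    n, m = len(channels), len(channels[0])
    # (1) fixed header: magic, encoding ID, n (32 bit), m (64 bit), d (64 bit)
    fixed = EBS_MAGIC + struct.pack(">IIQQ", CIB_16, n, m, NO_PART_4)
    # (2) variable header without attributes: only the 0x00000000 end tag
    variable = struct.pack(">I", 0x00000000)
    # (3) data: 16-bit signed big-endian samples, one channel after another
    data = b"".join(struct.pack(">%dh" % m, *ch) for ch in channels)
    return fixed + variable + data
```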


Most integer values in the fixed and variable headers are coded as
32-bit words stored in 4 bytes beginning with the most significant
byte (Bigendian format). If the value is a signed integer type, then
the usual two's complement representation of negative values will be
used.  E.g., the value -3 is stored as 0xff,0xff,0xff,0xfd and 1024 is
stored as 0x00,0x00,0x04,0x00 (in this text, the prefix '0x' indicates
a hexadecimal number as in the C programming language and two hex
digits form an 8-bit byte value). All 32-bit integer values in the
fixed and variable headers are aligned to 32-bit boundaries,
i.e. their start byte position relative to the first byte of the file
is always a multiple of 4.
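In Python, for example, the standard struct module can produce and
read this representation directly: the '>' prefix selects big-endian
byte order, 'i' a signed and 'I' an unsigned 32-bit integer. The
helper names below are only illustrative.

```python
import struct

def pack_u32(value: int) -> bytes:
    """Unsigned 32-bit integer, most significant byte first."""
    return struct.pack(">I", value)

def pack_s32(value: int) -> bytes:
    """Signed 32-bit integer in two's complement, most significant byte first."""
    return struct.pack(">i", value)

def unpack_s32(data: bytes, offset: int = 0) -> int:
    """Read a signed big-endian 32-bit integer starting at offset."""
    return struct.unpack_from(">i", data, offset)[0]

print(pack_s32(-3).hex(","))    # ff,ff,ff,fd
print(pack_u32(1024).hex(","))  # 00,00,04,00
```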


2.1  The Fixed Header
 
Each EBS file starts with a 32-byte data structure with the
following format:


                   -----------------------------------
                   |  Identification Code (8 bytes)  |
                   +---------------------------------+
                   |   Data Encoding ID (4 bytes)    |
                   +---------------------------------+
                   |  Number n of channels (4 bytes) |
                   +---------------------------------+
                   |  Number m of samples (8 bytes)  |
                   +---------------------------------+
                   | Length d of Data Part (8 bytes) |
                   -----------------------------------


	Byte |  Value        |  Meaning
      -------+---------------+---------------------------------
          0  |  0x45         | ASCII character 'E'
          1  |  0x42         | ASCII character 'B'
          2  |  0x53         | ASCII character 'S'
          3  |  0x94         | another ID character
          4  |  0x0a         |   "
          5  |  0x13         |   "
          6  |  0x1a         |   "
          7  |  0x0d         |   "
        8-11 |  see 2.3      | Encoding ID
       12-15 |  any          | number n of channels (unsigned)
       16-23 |  any          | number m of samples per channel (unsigned)
             |               | stored as a 64-bit value or all bytes are
             |               | 0xff if unspecified.
       24-31 |  any          | length d of the data part (3) in 32-bit words
             |               | (i.e. part (3) is 4*d bytes long) or all bytes
             |               | are 0xff if part (4) is not present.
             |
       32-   |  here begins the first variable header part (2) of an EBS file
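A reader might decode these 32 bytes roughly as in the following
Python sketch. The names FixedHeader and parse_fixed_header are
hypothetical; the sketch maps the all-0xff special values of m and d
to None and rejects files with a wrong identification code.

```python
import struct
from typing import NamedTuple, Optional

EBS_MAGIC = bytes([0x45, 0x42, 0x53, 0x94, 0x0A, 0x13, 0x1A, 0x0D])
ALL_FF_64 = 0xFFFFFFFFFFFFFFFF    # eight 0xff bytes read as a 64-bit value

class FixedHeader(NamedTuple):
    encoding_id: int
    channels: int                 # n
    samples: Optional[int]        # m, or None for 'unspecified length'
    data_words: Optional[int]     # d, or None if there is no part (4)

def parse_fixed_header(buf: bytes) -> FixedHeader:
    """Decode the first 32 bytes of an EBS file."""
    if len(buf) < 32:
        raise ValueError("fixed header is shorter than 32 bytes")
    if buf[:8] != EBS_MAGIC:
        raise ValueError("not an EBS file: wrong identification code")
    # bytes 8-31: two unsigned 32-bit and two unsigned 64-bit big-endian values
    enc, n, m, d = struct.unpack_from(">IIQQ", buf, 8)
    return FixedHeader(
        encoding_id=enc,
        channels=n,
        samples=None if m == ALL_FF_64 else m,
        data_words=None if d == ALL_FF_64 else d,
    )
```

A caller would typically pass it the result of f.read(32) on a file
opened in binary mode.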


Identification code:

The 'magic code' in the first 8 bytes identifies the file as an EBS
file. Programs that read EBS files should reject files that don't
start with these 8 bytes with an error message.

Encoding ID:

The number in the next 32-bit word indicates the format in which the
bio-signals are encoded in part (3) of the file. The possible
encodings and their ID values in this field are described later in
section 2.3.

Number of channels:

The 32-bit unsigned integer value n starting at byte 12 specifies how
many channels have been recorded.

Number of samples:

The 64-bit Bigendian unsigned integer value m in the next 8 bytes
indicates how many samples have been recorded in each channel.

[Note: Don't worry about the 64-bit values! Today, most
implementations just check whether bytes 16-19 all have the value
0x00 and read bytes 20-23 as the 32-bit number of samples, because
their operating system can't deal with 64-bit values or with files
longer than a few gigabytes. It is all right if your implementation
just gives a clear error message for EBS files with more than
4294967295 samples, but some applications might need files in which
the number of samples can't be represented in 32 bits (e.g. long-term
recordings), and new operating systems support files of this length.]

If all bytes from position 16 to 23 have the value 0xff, then this
indicates that the length of the whole file is NOT determined by the
fixed header.  Instead, the end of the data part (3) is determined by
the operating system.  This is called an EBS file with 'unspecified
length' and may be used when recorded data has to be accessed while
the recording is still in progress and part (3) is still growing. In
this case, the program can read sequences of n sample values until the
first end-of-file condition is signaled by the operating system. The
undefined length value is only allowed in combination with TIME-BASED
ORDER data encodings (see section 2.3) and no second variable header
can be present in files with unspecified length.
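The reading loop for a file of unspecified length might look like the
following Python sketch. It assumes the TIB_16 encoding from section
2.3 (16-bit signed, big-endian, TIME-BASED ORDER) and a stream that is
already positioned at the first byte of part (3). How to treat an
incomplete final group of samples is not defined by this text; the
sketch simply stops before it.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

def read_tib16_frames(stream: BinaryIO, n: int) -> Iterator[Tuple[int, ...]]:
    """Yield the n sample values of each sample time until end of file."""
    frame_size = 2 * n                # n samples of 2 bytes each
    fmt = ">%dh" % n                  # n big-endian signed 16-bit values
    while True:
        frame = stream.read(frame_size)
        if len(frame) < frame_size:
            # End of file; a partially written trailing frame is ignored here.
            return
        yield struct.unpack(fmt, frame)

# Example: three channels, two complete sample times held in memory
demo = io.BytesIO(bytes.fromhex("0014000d05d5000500070133"))
print(list(read_tib16_frames(demo, 3)))  # [(20, 13, 1493), (5, 7, 307)]
```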

Length of Data Part:

If a second variable header is present, then the 64-bit value starting
at byte 24 is the length d of part (3) counted in 32-bit words,
i.e. part (3) is 4*d bytes long and 4*d bytes have to be skipped after
the final tag of part (2) in order to get to the first byte of
part (4). If no part (4) is present, all bytes from byte 24 to byte 31
have the value 0xff. The only purpose of this value d is to define the
position of the second variable header. It cannot be used to determine
the number of samples stored in the data part (this information is
stored in m in the fixed header). The number of bytes needed to store
the n*m sample values in part (3) may be less than or equal to 4*d,
but not greater.

[Note: In some (often called 'compressed') variable length encoding
formats for the data part (3), the values n and m (number of channels
and number of samples) from the fixed header cannot be used to
predict the exact size of the data part, because in compressed
formats, the number of bits per sample is not always fixed. This makes
it impossible to find the start of the second variable header part (4)
quickly (i.e., without going through the whole data part). In order to
avoid this problem, the length of the data part d is stored separately
if a second variable header is present.]

If the number of samples is not specified in the fixed header (m =
0xffffffffffffffff), then no second part of the variable header is
allowed and d also has the value 0xffffffffffffffff.
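The arithmetic for locating part (4) is small enough to show directly.
In the following Python sketch (function names hypothetical),
part3_start is assumed to be the file offset of the first byte after
the 0x00000000 end tag of part (2). The second function checks the
"not greater than 4*d" rule for the uncompressed 16-bit encodings of
section 2.3, which need exactly 2*n*m bytes.

```python
from typing import Optional

ALL_FF_64 = 0xFFFFFFFFFFFFFFFF

def part4_offset(part3_start: int, d: int) -> Optional[int]:
    """File offset of the second variable header, or None if it is absent."""
    if d == ALL_FF_64:
        return None
    return part3_start + 4 * d        # d counts 32-bit words

def fits_uncompressed_16bit(n: int, m: int, d: int) -> bool:
    """For the uncompressed 16-bit encodings, 2*n*m bytes must fit into 4*d."""
    return 2 * n * m <= 4 * d
```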


2.2  The Variable Header

Parts (2) and (4) of EBS files contain a sequence of attributes
(e.g. patient name and age, sample rate, description texts, date and
time of recording, etc.) which a useful file format must be able to
carry, but which are only of interest to some application programs.
Other programs may simply ignore most or all attributes in this
header.

Each attribute in the variable header is stored as a TLV (tag, length,
value) sequence. A tag is a 32-bit unsigned Bigendian integer number
that identifies the type of information stored in the attribute
(e.g. patient name). Some tag numbers and the meaning and syntax of
the following attribute value are already defined in Appendix A, but
other new ones may be easily defined for special applications
according to the rules in Appendix B. The tag number is followed by an
unsigned 32-bit length indicator l that specifies the number of 32-bit
words (i.e. l*4 bytes) of the directly following value of the
attribute. The number of bytes in an attribute value is always a
multiple of four.

Both variable header parts end with the special tag 0x00000000. If
part (4) is present, its terminating tag normally forms the last four
bytes of the file. The final special tag 0x00000000 in part (2) is
directly followed by the first byte of the data part (3). The tag
0xffffffff is reserved and must not be used in any EBS file. The
format of both variable header parts is:


                   --------------------------
                   |      tag (4 bytes)     |
                   +------------------------+
                   |   length l (4 bytes)   |
                   +------------------------+
                   |    value (l*4 bytes)   |
                   +------------------------+
                   ... tag, length, value ...
                   +------------------------+
                   |       0x00000000       |
                   --------------------------
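A TLV walk over one variable header part could be sketched in Python
as follows. The function name is hypothetical; it returns the
attributes as (tag, value) pairs together with the offset just past
the end tag, which for part (2) is the first byte of part (3). It
treats the reserved tag 0xffffffff as an error.

```python
import struct
from typing import List, Tuple

END_TAG = 0x00000000
RESERVED_TAG = 0xFFFFFFFF

def parse_variable_header(buf: bytes, offset: int) -> Tuple[List[Tuple[int, bytes]], int]:
    """Parse TLV attributes from offset up to and including the end tag."""
    attrs: List[Tuple[int, bytes]] = []
    while True:
        (tag,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        if tag == END_TAG:
            return attrs, offset
        if tag == RESERVED_TAG:
            raise ValueError("reserved tag 0xffffffff at offset %d" % (offset - 4))
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        end = offset + 4 * length     # l counts 32-bit words
        if end > len(buf):
            raise ValueError("attribute value runs past the end of the buffer")
        attrs.append((tag, buf[offset:end]))
        offset = end
```

For example, a part (2) holding only a SAMPLE_RATE attribute (tag
0x00000010, Appendix A) with the 8-byte value '1024' occupies 20
bytes, so part (3) would start at offset 20 relative to its beginning.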


The interpretation of the value bytes depends completely on the value
of the tag number. Most values are simple data types like integer
numbers or text-strings or are sequences of these simple types.  If
not otherwise specified, the values of the attributes defined in
Appendix A use the following encodings for various simple types, and
it is recommended that new attributes use the same encodings where
appropriate. All simple types are encoded so that their length in
bytes is always a multiple of four. Simple data types without fixed
length (e.g. strings and floating point numbers) are self-delimiting
(e.g. with final zero bytes).

a) 32-bit integer number

Integer numbers are stored starting with the most significant byte.
Signed integer numbers are stored with the usual two's complement
encoding.

b) 64-bit integer numbers

They are also stored with the most significant byte first and use
two's complement encoding if the value is signed. In the variable header,
only a 32-bit, not a 64-bit alignment is guaranteed, i.e. it is NOT
guaranteed that 64-bit integer values start at an address relative to
the first byte of the file which is a multiple of 8.

c) floating point numbers

Floating point numbers are stored as ASCII strings in the usual
representation (e.g. as in the C programming language). These strings
may only contain the characters '+', '-', 'e', 'E', '.' and the digits
'0' to '9'. At the end of the string, between one and four 0x00 bytes
are appended, so that the length of the encoded floating point number
is always a multiple of 4. Examples of valid floating point numbers
are

  '3.14'        0x33,0x2e,0x31,0x34,0x00,0x00,0x00,0x00
  '-.1'         0x2d,0x2e,0x31,0x00
  '+0.910e+45'  0x2b,0x30,0x2e,0x39,0x31,0x30,0x65,0x2b,0x34,0x35,0x00,0x00

The Extended Backus-Naur Form (EBNF) grammar of all possible real
numbers (without the final 0x00 bytes) is

  ['-'|'+'] {digit} ['.' {digit}] [('e'|'E') ['-'|'+'] digit {digit}]

where digit is a character from '0' to '9', [] means optional, 
| describes a choice and {} means zero, one or several times. At least
one digit must be present before the optional exponential part.  The
special value "not-a-number" (NaN) is represented by the empty string
0x00,0x00,0x00,0x00.
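The following Python sketch encodes and decodes such floating point
strings. The regular expression is our own transcription of the EBNF
grammar above, including the rule that at least one digit must appear
before the optional exponent; the function names are hypothetical.

```python
import re

# Optional sign, digits with an optional fraction (at least one digit
# before the exponent), then an optional exponent.
_EBS_FLOAT = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")

def encode_float(text: str) -> bytes:
    """Encode a float string; the empty string encodes NaN."""
    if text and not _EBS_FLOAT.fullmatch(text):
        raise ValueError("not a valid EBS floating point string: %r" % text)
    raw = text.encode("ascii")
    return raw + b"\x00" * (4 - len(raw) % 4)   # always 1 to 4 zero bytes

def decode_float(data: bytes) -> float:
    """Decode up to the first 0x00 byte; an empty string yields NaN."""
    text = data.split(b"\x00", 1)[0].decode("ascii")
    return float("nan") if text == "" else float(text)
```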

d) single-line and multi-line text-strings

Text-strings are stored using the 16-bit character set UCS-2 (the
16-bit subset of ISO 10646, also known as 'Unicode') which covers all
other character sets on this planet. UCS-2 characters are stored as
sequences of 16-bit Bigendian values.

[Note: If you are unfamiliar with ISO 10646, it is sufficient to know
that ASCII and ISO 8859-1 (ISO Latin 1) characters have the same code
in this 16-bit character set, i.e. you get the correct 16-bit value by
prefixing each ASCII or Latin-1 byte with 0x00.  Check a copy of the
ISO 10646 standard or of the compatible Unicode Standard (Version 1.1
or higher) if you want to support other characters (e.g., Cyrillic,
Greek, Chinese, Japanese, IBM PC, etc.) and need to know their 16-bit
codes.]

If text-strings are allowed to span several lines, the code 0x000a
(LF, line feed) should be used as the only line separator between
these lines.  The last line is not followed by another 0x000a code.
Strings always end with one or two 0x0000 codes so that the number of
bytes in the string including the two or four 0x00-bytes at the end is
always a multiple of four.  If not otherwise specified, single-line
text-strings should not have more than 64 characters (not including
the 1 or 2 0x0000 codes at the end), but application programs must be
able to cope with longer lines, e.g. by truncating them. Multi-line
strings may have any number of lines but should also have no more
than 64 characters per line (not including the 0x000a line separation
code and the 0x0000 end markers) if not otherwise specified. An
example text-string is:

  'hello'       0x00,0x68,0x00,0x65,0x00,0x6c,0x00,0x6c,0x00,0x6f,0x00,0x00
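In Python, these text-strings can be produced with the standard
"utf-16-be" codec, which gives the same bytes as UCS-2 for all
characters up to U+FFFF; the sketch below rejects characters beyond
that range because UCS-2 cannot represent them. The function names
are hypothetical, and multi-line strings simply contain "\n" (0x000a)
between lines.

```python
def encode_text(text: str) -> bytes:
    """UCS-2 big-endian text followed by one or two 0x0000 end codes."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the 16-bit UCS-2 range")
    raw = text.encode("utf-16-be")
    # one end code if len(raw) is 2 mod 4, otherwise two end codes
    end = b"\x00\x00" if len(raw) % 4 == 2 else b"\x00\x00\x00\x00"
    return raw + end

def decode_text(data: bytes) -> str:
    """Decode up to the first 0x0000 code (which starts at an even offset)."""
    for i in range(0, len(data) - 1, 2):
        if data[i] == 0 and data[i + 1] == 0:
            return data[:i].decode("utf-16-be")
    raise ValueError("missing 0x0000 end code")
```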


Appendix A defines many commonly used attribute tags and the
semantics of their values, and Appendix B defines which tag values you
may use to define your own attribute types.

The least significant bit of each attribute tag specifies whether the
attribute value contains information about specific channels (bit is
1) or not (bit is 0). In this way, programs that add, remove or
rearrange channel data in EBS files can leave unknown attributes with
even tag numbers in the file. They should remove unknown attributes
with odd tag numbers and update the odd-numbered attributes that they
know about, because the content of these attributes might assume a
channel layout that no longer exists after the file has been
modified.
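The keep/update/drop rule can be sketched in Python as below. The
updaters table is a hypothetical stand-in for whatever code a program
has for rewriting the odd-numbered attributes it understands (for
example PREFERRED_INTEGER_RANGE or UNITS from Appendix A).

```python
from typing import Callable, Dict, List, Tuple

Attribute = Tuple[int, bytes]

def attributes_after_channel_edit(
    attrs: List[Attribute],
    updaters: Dict[int, Callable[[bytes], bytes]],
) -> List[Attribute]:
    """Apply the odd/even tag rule after channels were added, removed or moved."""
    kept: List[Attribute] = []
    for tag, value in attrs:
        if tag & 1 == 0:
            kept.append((tag, value))                 # channel-independent: keep
        elif tag in updaters:
            kept.append((tag, updaters[tag](value)))  # known, channel-specific: rewrite
        # unknown odd tags are dropped: they may describe the old layout
    return kept
```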

Each attribute tag shall appear at most once in the variable headers
(the only exception is IGNORE, see Appendix A).


2.3  The Data Part

The recorded data may consist of different types (e.g., signed 16-bit
integers, unsigned 32-bit integers, signed 12-bit integers, floating
point numbers) and these different types may be encoded in different
ways (e.g., Bigendian, Littleendian, various compression methods). The
values may also be ordered differently. The TIME-BASED sample ordering
starts with the values of all channels at the first sample time
followed by the values of all channels at the second sample time and
so on. The CHANNEL-BASED ORDER of samples begins with all values of
the first channel over the full recording time followed by all values
of the second channel, etc. If the CHANNEL-BASED ORDER is used, the
number m of samples MUST be indicated in bytes 16 to 23 of the fixed
header. Eight 0xff bytes in this field are only possible in
combination with TIME-BASED ORDER formats.

The Encoding ID number stored in bytes 8 to 11 of the fixed header
may indicate one of the following data types and data encodings
(others might be added in future versions of this specification):

TIB_16 (Encoding ID: 0x00000000):

This format stores 16-bit signed integer values with the high byte
first in TIME-BASED ORDER. This means that e.g. the recorded values

     time    channel 1      channel 2      channel 3      n = 3

       0        20             13            1493
       1         5              7             307
       2       -11              9             421
       3       ...
       ...
       m-1     

will be stored as 0x00,0x14,0x00,0x0d,0x05,0xd5,0x00,0x05,0x00,0x07,
0x01,0x33,0xff,0xf5,0x00,0x09,0x01,0xa5,... (length: 2*n*m bytes, i.e.
d >= (n*m*2)/4).

CIB_16 (Encoding ID: 0x00000001):

This format is very much like TIB_16 with the only difference that the
values are stored in CHANNEL-BASED ORDER, i.e. the above example
recording would be stored as 0x00,0x14,0x00,0x05,0xff,0xf5,...,
0x00,0x0d,0x00,0x07,0x00,0x09,...,0x05,0xd5,0x01,0x33,0x01,0xa5,...

TIL_16 (Encoding ID: 0x00000002):

Like TIB_16, this format is a TIME-BASED ORDER, 16-bit signed integer
encoding, except that the integer values are stored in the
Littleendian format (i.e. beginning with the low byte), which allows
efficient programming on systems that use Littleendian as their
native integer format (e.g., Intel processors, Transputers,
...). The example recording is then stored as: 0x14,0x00,0x0d,0x00,
0xd5,0x05,0x05,0x00,0x07,0x00,0x33,0x01,0xf5,0xff,0x09,0x00,0xa5,
0x01,...

CIL_16 (Encoding ID: 0x00000003):

Like CIB_16, this format is a CHANNEL-BASED ORDER, 16-bit signed
integer encoding, but in Littleendian format (i.e. beginning with the
low byte). The example recording is stored as 0x14,0x00,0x05,0x00,
0xf5,0xff,...,0x0d,0x00,0x07,0x00,0x09,0x00,...,0xd5,0x05,0x33,0x01,
0xa5,0x01,...
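The four uncompressed encodings differ only in byte order and sample
order, so a single Python sketch can cover them. The table FORMATS and
the function names are hypothetical; channels[c][t] stands for sample
t of channel c+1. Applied to the example recording, it reproduces the
byte sequences given above.

```python
import struct
from typing import List, Sequence

# encoding ID -> (struct byte order, sample order)
FORMATS = {
    0x00000000: (">", "time"),     # TIB_16
    0x00000001: (">", "channel"),  # CIB_16
    0x00000002: ("<", "time"),     # TIL_16
    0x00000003: ("<", "channel"),  # CIL_16
}

def encode_int16(channels: Sequence[Sequence[int]], encoding_id: int) -> bytes:
    byte_order, order = FORMATS[encoding_id]
    m = len(channels[0])
    if order == "time":   # all channels at t=0, then all channels at t=1, ...
        values = [ch[t] for t in range(m) for ch in channels]
    else:                 # all samples of channel 1, then channel 2, ...
        values = [v for ch in channels for v in ch]
    return struct.pack("%s%dh" % (byte_order, len(values)), *values)

def decode_int16(data: bytes, n: int, m: int, encoding_id: int) -> List[List[int]]:
    byte_order, order = FORMATS[encoding_id]
    values = struct.unpack_from("%s%dh" % (byte_order, n * m), data)
    if order == "time":   # value of channel c at time t sits at index t*n + c
        return [list(values[c::n]) for c in range(n)]
    return [list(values[c * m:(c + 1) * m]) for c in range(n)]
```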

TI_16D (Encoding ID: 0x00000010):

In this compressed TIME-BASED ORDER encoding, 16-bit signed integer
values are stored, but they are encoded in a way that in many
applications needs only slightly more than 50% of the storage space
of TIB_16 or TIL_16. The trick is that only the difference between two
consecutive samples in the same channel is stored as a signed
2-complement 8-bit value ranging from -127 (0x81) to +127 (0x7f). A
positive difference means that the next sample value in the same
channel has a higher value. If the value is the first sample of a
channel or if the difference is less than -127 or greater than +127,
then the absolute value will be stored in a 3 byte sequence starting
with -128 (0x80) followed by the full 16-bit signed integer value of
the sample with the high byte first. I.e., our example recording from
above would look like this: 0x80,0x00,0x14,0x80,0x00,0x0d,0x80,0x05,
0xd5,0xf1,0xfa,0x80,0x01,0x33,0xf0,0x02,0x72,... The length of the
data part in bytes can't be predicted with the parameters in the fixed
header if this compressed encoding is used (d >= n*(m+2)/4).
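An encoder for TI_16D might be sketched in Python as follows (function
name hypothetical). It emits an 8-bit difference whenever one lies in
-127..+127 and otherwise the escape byte 0x80 followed by the
absolute 16-bit value; it reproduces the example byte sequence above.

```python
from typing import Sequence

def encode_ti16d(channels: Sequence[Sequence[int]]) -> bytes:
    """TI_16D: per-channel differences in TIME-BASED ORDER.

    channels[c][t] is sample t of channel c+1 (16-bit signed values).
    """
    n, m = len(channels), len(channels[0])
    out = bytearray()
    for t in range(m):
        for c in range(n):
            value = channels[c][t]
            if not -32768 <= value <= 32767:
                raise ValueError("sample does not fit into 16 bits")
            diff = value - channels[c][t - 1] if t > 0 else None
            if diff is not None and -127 <= diff <= 127:
                out.append(diff & 0xFF)                     # 8-bit two's complement
            else:
                out.append(0x80)                            # escape code -128
                out += (value & 0xFFFF).to_bytes(2, "big")  # absolute value, high byte first
    return bytes(out)
```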

CI_16D (Encoding ID: 0x00000011):

The encoding is the same as TI_16D with the only difference that the
sample values (i.e. the differences between them) are stored in
CHANNEL-BASED ORDER. The example recording would look like this:
0x80,0x00,0x14,0xf1,0xf0,...,0x80,0x00,0x0d,0xfa,0x02,...,0x80,
0x05,0xd5,0x80,0x01,0x33,0x72,...
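Decoding the CHANNEL-BASED variant reverses these steps. In the
following Python sketch (function name hypothetical), n and m are
taken from the fixed header, since the length of the compressed data
cannot be derived from them; any bytes after the n*m decoded samples
(e.g. padding) are ignored.

```python
from typing import List

def decode_ci16d(data: bytes, n: int, m: int) -> List[List[int]]:
    """Decode CI_16D: per-channel differences in CHANNEL-BASED ORDER."""
    channels: List[List[int]] = []
    pos = 0
    for _ in range(n):
        samples: List[int] = []
        for _t in range(m):
            code = data[pos]
            if code == 0x80:                        # escape: absolute 16-bit value
                if pos + 3 > len(data):
                    raise ValueError("truncated absolute value")
                value = int.from_bytes(data[pos + 1:pos + 3], "big", signed=True)
                pos += 3
            elif not samples:
                raise ValueError("first sample of a channel must be absolute")
            else:
                diff = code - 256 if code > 0x7F else code  # signed 8-bit difference
                value = samples[-1] + diff
                pos += 1
            samples.append(value)
        channels.append(samples)
    return channels
```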

[Note: It is expected that CIB_16 will be the most popular format. If
you are confused by the many different encodings, just support CIB_16
and reject files with other encoding IDs with a clear error message.
There are tools available that allow easy conversion between the
different encodings. On some popular processors, you might prefer
CIL_16 if you operate on very large data sets with efficient methods
(e.g. memory-mapped files). Time will tell whether the uncompressed
TIME-BASED ORDER formats will be of use; among the compressed formats,
TI_16D will perhaps be the most popular version for archive and
transfer purposes until more efficient compression techniques are
available. If you have only a single channel, there is no difference
between a TIME-BASED ORDER format and the corresponding CHANNEL-BASED
ORDER format. Rather than tossing a coin to decide which of the two
IDs to indicate in that case, use the ID of the CHANNEL-BASED ORDER
encoding.]

If a second variable header is present, between 0 and 3 zero padding
bytes have to be appended after the above described encodings of the
recording in order to give the whole data part a length in bytes that
is a multiple of four. This will guarantee a 32-bit alignment for the
second variable header part.
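When a writer appends a second variable header, the padding and the
value d for bytes 24-31 of the fixed header follow from the length of
the encoded data, roughly as in this Python sketch (function name
hypothetical).

```python
from typing import Tuple

def pad_data_part(encoded: bytes) -> Tuple[bytes, int]:
    """Append 0 to 3 zero bytes to part (3) and return it with its length d.

    d is the number of 32-bit words in the padded data part.
    """
    padded = encoded + b"\x00" * (-len(encoded) % 4)
    return padded, len(padded) // 4
```

For the 17-byte TI_16D example recording, this yields 3 padding bytes
and d = 5.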

As a convention, program user interfaces should give the channels
numbers beginning with 1 and samples should be numbered beginning with
0.

[Note: Most people find it natural to start with 0 for points of time
(e.g. digital clocks count seconds from 0 to 59), but only computer
scientists find it obvious that the first channel might also have the
number 0. This convention makes user interfaces of programs operating
on EBS files more consistent. The numbering convention is only defined
for numbers visible to the user of a program and is not intended for
variables used internally within a program or for attributes in the
variable header.]

The Encoding IDs in the range from 0x80000000 to 0xfffffffe are
reserved for private additional encodings and the encoding ID
0xffffffff is reserved and must not be used in EBS files.

[Note: Please use random numbers for your private encoding IDs in the
range 0x80000000 to 0xfffffffe and don't simply start at 0x80000000 in
order to keep the odds of collisions with other people's private IDs
small.]
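One way to follow this advice is to draw the ID uniformly from the
private range, e.g. with Python's standard secrets module as sketched
below (constant and function names are hypothetical). The upper bound
excludes the reserved value 0xffffffff.

```python
import secrets

PRIVATE_MIN = 0x80000000
PRIVATE_MAX = 0xFFFFFFFE   # 0xffffffff is reserved

def random_private_encoding_id() -> int:
    """Pick an encoding ID uniformly from 0x80000000..0xfffffffe."""
    return PRIVATE_MIN + secrets.randbelow(PRIVATE_MAX - PRIVATE_MIN + 1)

print(hex(random_private_encoding_id()))
```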

If the need for a new standardized encoding arises, please contact the
EBS coordinator (see Appendix C) and it is likely that other standard
encodings will be added.


Appendix A -- Standardized Attribute Tags

This appendix defines a number of useful attribute tags and the
meaning of the corresponding attribute values. The attribute values
defined here are simple types with the encoding recommended in section
2.2, sequences of these simple types or other special types (e.g.
graphical diagrams or dates).

Attributes that do not refer to individual channels and thus have an
even tag number:

0x00000002 IGNORE (length: any)
           This attribute should just be ignored by any application.
           It makes it possible to remove an attribute without having
           to copy the whole file: the tag field of the attribute is
           simply overwritten with the tag number of IGNORE. This
           attribute may have any value, but applications which
           delete attributes should fill the value with 0x00 bytes so
           that critical information (e.g. patient names in published
           files) is actually destroyed and not merely made
           invisible.

           This is the only attribute that may appear several times
           in a variable header.
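           The in-place deletion described above might look like the
           following Python sketch, which works on a bytearray holding
           the header bytes; a real tool would write the changed bytes
           back at the same file offsets. attr_offset (the position of
           the attribute's tag field) and the function name are
           hypothetical.

```python
import struct

IGNORE_TAG = 0x00000002

def ignore_attribute(buf: bytearray, attr_offset: int) -> None:
    """Turn the attribute at attr_offset into IGNORE and zero its value.

    The length field is left untouched, so no other byte moves.
    """
    (length,) = struct.unpack_from(">I", buf, attr_offset + 4)
    struct.pack_into(">I", buf, attr_offset, IGNORE_TAG)
    start = attr_offset + 8
    buf[start:start + 4 * length] = bytes(4 * length)   # destroy the old value
```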

0x00000004 PATIENT_NAME (length: > 0 words, <= 33 words)
           This single-line text-string may contain the full name of
           the person of whom the signals have been recorded.

0x00000006 PATIENT_ID (length: > 0 words, <= 33 words)
           This single-line text-string may contain additional
           information that is used to identify the patient, e.g. a
           patient number in a hospital, etc.

0x00000008 PATIENT_BIRTHDAY (length: 2 words)
           This numeric string contains the birthday of the patient in
           the 'yyyymmdd' format stored as ASCII digits (not as 16-bit
           UCS-2 characters!). E.g., '19930210' (0x31,0x39,0x39,0x33,
           0x30,0x32,0x31,0x30) means February 10, 1993. (This format
           is one of the date/time formats defined in ISO 8601.)
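           A Python sketch of this value (function names
           hypothetical); note that the digits are plain ASCII bytes,
           not UCS-2 codes.

```python
import datetime

def encode_birthday(day: datetime.date) -> bytes:
    """PATIENT_BIRTHDAY value: 'yyyymmdd' as 8 ASCII bytes (2 words)."""
    return ("%04d%02d%02d" % (day.year, day.month, day.day)).encode("ascii")

def decode_birthday(value: bytes) -> datetime.date:
    text = value.decode("ascii")
    if len(text) != 8 or not text.isdigit():
        raise ValueError("PATIENT_BIRTHDAY must be 8 ASCII digits")
    return datetime.date(int(text[:4]), int(text[4:6]), int(text[6:]))
```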

0x0000000a PATIENT_SEX (length: 1 word)
           This 32-bit integer value is 1 for male and 2 for female
           patients. (The numbers are those specified by ISO 5218.)

0x0000000c SHORT_DESCRIPTION (length: > 0 words, <= 33 words)
           A single-line text-string that summarizes in a few words
           the contents of the file. This attribute is intended for
           listings of many EBS files where each EBS file is listed in
           a single line.

0x0000000e DESCRIPTION (length: > 0 words)
           A multi-line text-string that may tell the user of a file
           everything he/she might need to know in addition to the
           standardized attributes, e.g. the conditions under which
           the recording has been made, etc.

0x00000010 SAMPLE_RATE (length: > 0 words)
           The value is the sample rate in Hz stored as a
           floating point number.  E.g., a sample rate of 1024 per
           second (1024 Hz) might be stored as 0x31,0x30,0x32,
           0x34,0x00,0x00,0x00,0x00 ('1024').

0x00000012 INSTITUTION (length: > 0 words, <= 33 words)
           This single-line string may contain the name of the
           institution where the file has been recorded, processed,
           etc.

0x00000014 PROCESSING_HISTORY (length: > 0 words)
           This attribute is a sequence of multi-line strings. Each
           string may describe a processing step that has been
           performed in order to produce this file. This might e.g. be
           the command line that has been used to start a program or a
           list of parameters that have been applied.  A program may
           add its own processing description as another string to the
           end of the already existing sequence. Text information about
           the equipment used to record the data and about who did the
           recording or processing can also be stored here. The number of
           multi-line text-strings in this attribute is determined by
           the length of the attribute.
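           Because each text-string ends with one or two 0x0000 codes
           so that its length is a multiple of four, the sequence can
           be split without extra length fields. A Python sketch
           (function name hypothetical), using the "utf-16-be" codec
           as an equivalent of UCS-2 for characters up to U+FFFF:

```python
from typing import List

def split_string_sequence(value: bytes) -> List[str]:
    """Split an attribute value into its sequence of UCS-2 text-strings."""
    strings: List[str] = []
    start = 0
    while start < len(value):
        end = start
        while value[end:end + 2] != b"\x00\x00":   # find the first end code
            end += 2
            if end >= len(value):
                raise ValueError("text-string without 0x0000 end code")
        strings.append(value[start:end].decode("utf-16-be"))
        # one end code if the text length is 2 mod 4, otherwise two
        start = end + (2 if (end - start) % 4 == 2 else 4)
    return strings
```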

0x00000016 LOCATION_DIAGRAM (length: > 0 words)
           This attribute contains a graphical diagram of the object
           (e.g. brain, head, whole body, ...) from which the recorded
           data has originated or any other diagram that may be used
           to describe the positions of sensors/electrodes. The
           attribute CHANNEL_LOCATIONS may assign to channels
           coordinates in this diagram. In this way, software can
           generate pictures that indicate the position of
           electrodes/sensors on or in the body. This attribute
           contains the background graphic for these pictures and
           attribute CHANNEL_LOCATIONS contains the coordinates for
           channel markers.

           The value of LOCATION_DIAGRAM is a complete Computer
           Graphics Metafile (CGM) as defined in ISO 8632. Only the
           binary encoding of a CGM file as defined in ISO 8632-3 is
           used. The end of the CGM file is filled with 0x00 to a
           length in bytes divisible by 4. All coordinates are
           specified as 16-bit integer values (i.e. VDC TYPE is
           integer and INTEGER PRECISION is 16, which is the default
           for the binary CGM encoding).  The VDC EXTENT should be
           specified for each picture. The attribute may contain
           several pictures in the metafile. As most applications
           won't need the full power of the CGM format, the following
           subset of CGM elements is suggested as a minimum
           requirement for software that uses this attribute:

               BEGIN METAFILE, END METAFILE, BEGIN PICTURE, BEGIN
               PICTURE BODY, END PICTURE, METAFILE VERSION, METAFILE
               ELEMENT LIST, VDC EXTENT, POLYLINE

           Programmers may of course support more CGM functionality
           (e.g. colors, text, arcs, fill patterns, etc.) as defined
           in ISO 8632 and it is possible that later versions of this
           standard will add additional elements to this minimal
           subset if necessary. Programs may ignore additional
           elements and warn the user that the displayed diagram might
           be incomplete or may ignore the whole attribute if
           additional elements are present.  Appendix F gives a short
           introduction to the minimal CGM subset specified here.


Attributes that refer to a special channel layout and that have to be
changed by programs which change, add, move or delete channels:

0x00000001 PREFERRED_INTEGER_RANGE (length: (1+1)*n words)
           For integer data, this attribute gives display software a
           hint about which value range might be most interesting in the
           data. The value consists of a recommended display minimum
           (32-bit signed integer) followed by a recommended display
           maximum (32-bit signed integer) for each channel beginning
           with channel 1. E.g., if in 16-bit signed integer data most
           good values are in the range -2048 to +2047 in all
           channels, then, if the value of this attribute is 0xff,
           0xff,0xf8,0x00,0x00,0x00,0x07,0xff (repeated for each
           channel), it will be easy for a visualization program to
           find a nice default scaling factor. If both the minimum and
           the maximum value for a channel are equal (e.g. both are
           zero), then no preferred integer range is specified for
           this channel as it would be the case for all channels if
           this attribute were not present.
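The following ANSI C sketch shows how display software might read the preferred range of one channel from the raw attribute value. It assumes, as in the demo program of Appendix E, that 32-bit values are stored in Bigendian order and that the attribute value has already been loaded into memory; the function names are hypothetical.

```c
/* Hypothetical helper: decode a Bigendian two's-complement 32-bit
   integer without relying on implementation-defined conversions. */
static long ebs_get_i32(const unsigned char *p)
{
  unsigned long u = ((unsigned long) p[0] << 24) |
                    ((unsigned long) p[1] << 16) |
                    ((unsigned long) p[2] << 8) |
                     (unsigned long) p[3];

  if (u & 0x80000000UL)
    return -(long) (0xffffffffUL - u) - 1;   /* negative value */
  return (long) u;
}

/* Hypothetical helper: read the preferred display range of channel ch
   (0 = first channel) from a PREFERRED_INTEGER_RANGE value, which holds
   two 32-bit words (minimum, maximum) per channel. Returns 1 if a range
   is specified, 0 if minimum == maximum (no preferred range). */
int ebs_preferred_range(const unsigned char *value, unsigned long ch,
                        long *min, long *max)
{
  const unsigned char *p = value + 8 * ch;   /* 8 bytes per channel */

  *min = ebs_get_i32(p);
  *max = ebs_get_i32(p + 4);
  return *min != *max;
}
```

With the example value from the text, channel 0 yields the range -2048 to +2047.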

0x00000003 UNITS (length: >= (1+1)*n words)
           This attribute contains a sequence of physical unit
           specifications, one for each channel. It assigns each
           channel an SI unit (e.g. mA, mV, nT) and a quotient of a
           physical quantity and the encoded sample value that
           represents it. Each unit specification is a sequence of a
           floating point value and a single-line text-string. The
           floating point number is the number with which the sample
           value must be multiplied in order to get the physical value
           (e.g. '0.0025' if a sample value of 400 represents 1.0 mV
           and the specified unit in the text-string is 'mV'). The
           quotient is followed by a single-line text-string with the
           usual abbreviation for the SI unit (at most 8 characters,
           i.e. at most 20 bytes). E.g., the text-string for
           microvolts is 0x00,0xb5,0x00,0x56,0x00,0x00,0x00,0x00. Only
           linear relations between the physical quantity and the
           sample value in the encoded data can be described with this
           attribute. If the floating point number is 'not a number'
           (0x00,0x00,0x00,0x00), the physical unit and quantity are
           unspecified for this single channel, as they would be for
           all channels if the whole attribute were absent. In this
           case, the unit text-string should also be empty.

0x00000005 CHANNEL_DESCRIPTION (length: >= (1+1)*n words, <= (5+33)*n words)
           The attribute consists of a sequence of 2*n single-line
           text-strings, one pair for each channel.  The first string
           in a pair must not contain more than 8 characters (not
           including the 1 or 2 0x0000-words at the end of each
           string).  This string contains a very short name for the
           channel that might e.g. be used to label it in diagrams,
           etc.  E.g., in EEG recordings, this will often be the name
           of the electrode position in the usual 10-20-system, like
           "F4-A1", "C4-Cz", etc. The second single-line text-string
           in the pair, which directly follows each short label, may
           contain additional descriptive text for the channel that
           does not fit into the short 8-character label (e.g., in EEG
           recordings, information about electrodes with bad contact,
           etc.).

0x00000007 CHANNEL_GROUPS (length: >= 3 words)
           Each channel may belong to zero, one or several groups.  A
           channel group might e.g. be used to group channels from the
           same biological source (e.g., one group for EEG and one
           group for ECG channels) so that they can be more
           conveniently selected together or shown in different colors
           in interactive programs.  The CHANNEL_GROUPS attribute
           contains a sequence of group descriptions. A single group
           description consists of

             - a single-line text-string with a short name for the
               group (e.g. "EEG") with not more than 8 characters,
               followed by
             - a single-line text-string with a description of the
               group (this may of course be the empty string
               0x00000000 if no description is available), followed by
             - an unsigned 32-bit integer g giving the number of
               channels in this group, followed by
             - g unsigned 32-bit integer numbers with the numbers of
               the channels (with 0 being the first channel) that
               belong to this group.

           If groups are associated with numbers in a user interface,
           then the first group in this attribute should be assigned
           number 1.

0x00000009 EVENTS (length: any)
           This attribute allows events or time intervals in the
           recording to be marked, either for all channels together or
           for individual channels. Each event or interval belongs to
           one event list, and each event list has a short name and a
           description text. In addition, each single event or
           interval may have a description string. The attribute
           contains a sequence of event lists; the number of event
           lists is determined by the length of the attribute. Each
           event list consists of

             - a single-line text-string with the short name (not more
               than 8 characters), followed by
             - a multi-line description string, followed by
             - the number e (unsigned 32-bit integer) of
               events/intervals in this event list, followed by
             - a sequence of e events or intervals.

           Each single event or interval in an event list is described
           by the following sequence

             - An unsigned 32-bit integer channel number. The first
               channel is represented by number 0 and 0xffffffff
               indicates that this event or interval is not associated
               with a single channel.
             - An unsigned 64-bit integer number that represents the
               position (the first sample has position 0) of the event
               or the start position of an interval.
             - An unsigned 64-bit integer number that has the value
               0x0000000000000000 for events or represents the length
               of an interval if it has any other value.
             - A single-line text-string (as usual not more than 64
               characters long) may contain a textual description of
               the type of event or interval that has been marked or
               just an empty string.

           The whole event/interval sequence in each event list
           consists of these event/interval descriptions, sorted in
           ascending order of their start position (the second integer
           value).

0x0000000b RECORDING_TIME (length: 2 or 4 words) 
           This is the time when the recording of the physical signals
           started. Two different formats are allowed, either only the
           date (as in PATIENT_BIRTHDAY) or date and time.

           The date and time format is 'yyyymmddThhmmss' stored as
           ASCII digits (not 16-bit UCS-2 characters!), the ASCII
           character 'T' and one final 0-byte. E.g. '19930211T153159'
           stored as 0x31,0x39,0x39,0x33,0x30,0x32,0x31,0x31,0x54,
           0x31,0x35,0x33,0x31,0x35,0x39,0x00 means that the
           recording started on February 11, 1993, 3:31:59 pm local
           time.

           If no time is available, the date alone may be stored as
           '19930211' or in bytes 0x31,0x39,0x39,0x33,0x30,0x32,
           0x31,0x31.

           [Note: These attribute formats are two of the date/time
           formats specified in ISO 8601.  The ASCII 'T' has been
           inserted for compatibility with the ISO standard. This
           attribute has an odd tag number, because it has to be
           modified or removed if the beginning of a recording is
           removed from an EBS file, as the recording time of the
           first sample then changes.]

           If this attribute matches neither of these two forms, i.e.
           if it is neither exactly 4 words long with ASCII digits, the
           'T' and the final 0x00 at the specified positions, nor
           exactly 2 words long and consisting only of ASCII digits,
           then it should be ignored, because it could be another ISO
           8601 time format that might be specified as an alternative
           in a future version of this standard if necessary (e.g.
           with time zone, milliseconds, or several concatenated
           intervals of time).
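A reader can apply this rule with a check like the following minimal ANSI C sketch. The function name is hypothetical, and it assumes the attribute value and its length in bytes have already been read. It only checks the byte pattern; it does not verify that month, day, hour, etc. are in range.

```c
#include <stddef.h>

/* Returns 1 if c is an ASCII digit ('0' .. '9'). */
static int is_ascii_digit(unsigned char c)
{
  return c >= 0x30 && c <= 0x39;
}

/* Checks a RECORDING_TIME attribute value v of len bytes.
   Returns 2 for the date-only form 'yyyymmdd' (2 words),
   4 for 'yyyymmddThhmmss' followed by 0x00 (4 words), or 0 if the
   value matches neither form and should therefore be ignored. */
int ebs_recording_time_form(const unsigned char *v, size_t len)
{
  size_t i;

  if (len == 8) {
    for (i = 0; i < 8; i++)
      if (!is_ascii_digit(v[i]))
        return 0;
    return 2;
  }
  if (len == 16) {
    for (i = 0; i < 15; i++) {
      if (i == 8) {
        if (v[i] != 0x54)             /* ASCII 'T' */
          return 0;
      } else if (!is_ascii_digit(v[i])) {
        return 0;
      }
    }
    return v[15] == 0x00 ? 4 : 0;     /* final 0-byte */
  }
  return 0;
}
```

The 16-byte example '19930211T153159' from the text is accepted as the date-and-time form, and its first 8 bytes alone form a valid date-only value.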

0x0000000d CHANNEL_LOCATIONS (length: any)
           This attribute may only be present together with a
           LOCATION_DIAGRAM attribute. It defines the locations of
           sensors/electrodes in the coordinate space (VDC) of the
           graphical diagrams in LOCATION_DIAGRAM. Each channel may
           have zero, one or several positions, i.e. a channel may
           appear on several places in a diagram and in different
           diagrams. A channel may be associated with several single
           points or with pairs of points, which might be represented
           graphically as arrows from the first point to the second
           one.  The value of this attribute is a sequence of
           positions (each is a point or an arrow representing a
           channel) and each position is a sequence of the following
           six 32-bit integer values:

             - channel number (the first channel has number 0,
               unsigned value).
             - picture number (the first picture in the CGM file
               of LOCATION_DIAGRAM has number 0, unsigned value).
             - X1 coordinate (signed value)
             - Y1 coordinate (signed value)
             - X2 coordinate (signed value)
             - Y2 coordinate (signed value)

           Several positions can have the same channel number. For
           point positions, X1 and Y1 are the coordinates of the
           point, and X2 and Y2 both have the special value
           0x80000000. For arrow positions, X1 and Y1 are the
           coordinates of the tail and X2 and Y2 are those of the
           head. Arrows may e.g. be used to indicate that a channel
           represents the difference potential between two electrode
           positions. All coordinates lie inside the CGM VDC extent.
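Because every position occupies exactly six words (24 bytes), the number of positions is the attribute length in bytes divided by 24, and position k can be decoded directly. The following ANSI C sketch assumes Bigendian 32-bit values (as read by the demo program in Appendix E); the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one CHANNEL_LOCATIONS position. */
struct ebs_position {
  unsigned long channel;   /* 0 = first channel */
  unsigned long picture;   /* 0 = first picture of LOCATION_DIAGRAM */
  long x1, y1;             /* point, or tail of an arrow */
  long x2, y2;             /* head of an arrow (0x80000000 for points) */
  int is_arrow;            /* 1 for an arrow, 0 for a point */
};

/* Decode a Bigendian 32-bit word as an unsigned value. */
static unsigned long get_u32(const unsigned char *p)
{
  return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
         ((unsigned long) p[2] << 8) | (unsigned long) p[3];
}

/* Convert a 32-bit two's-complement bit pattern to a signed long. */
static long to_signed(unsigned long u)
{
  return (u & 0x80000000UL) ? -(long) (0xffffffffUL - u) - 1 : (long) u;
}

/* Decode position number k (0 = first) from the attribute value. */
void ebs_get_position(const unsigned char *value, size_t k,
                      struct ebs_position *pos)
{
  const unsigned char *p = value + 24 * k;
  unsigned long x2 = get_u32(p + 16), y2 = get_u32(p + 20);

  pos->channel = get_u32(p);
  pos->picture = get_u32(p + 4);
  pos->x1 = to_signed(get_u32(p + 8));
  pos->y1 = to_signed(get_u32(p + 12));
  pos->x2 = to_signed(x2);
  pos->y2 = to_signed(y2);
  /* Points mark X2 and Y2 with the special value 0x80000000. */
  pos->is_arrow = !(x2 == 0x80000000UL && y2 == 0x80000000UL);
}
```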

0x0000000f FILTERS (length: >= n words)
           Information about the filters that have been applied to
           each channel may be stored here. The attribute contains a
           sequence of filter lists, one for each channel. It may only
           be present if a SAMPLE_RATE attribute is also present. For
           each channel, the filter list consists of a sequence of
           filter specifications followed by 0xffffffff (i.e. the
           attribute value contains at least one final 0xffffffff for
           each channel). The following filter specifications may
           appear in a filter list:

             - lowpass filter: it is specified by a sequence of
               the following three values.

                 o The first 32-bit integer number 0x00000001
                   identifies the filter as a lowpass filter.

                 o The second parameter is the cutoff frequency of the
                   filter [the usual -3 dB limit, i.e. the frequency
                   where the output voltage has been decreased to
                   1/sqrt(2) (71%) of the input voltage] which is
                   stored as a positive floating point value in Hz.

                 o The third value describes the falloff after the
                   cutoff frequency. It stores the attenuation in dB
                   per decade as a negative floating point value. If
                   this value is not known, a not-a-number value
                   (0x00000000) may be used here.

                   [Note: A -20 falloff value represents a filter
                   where the output voltage has decreased by 20 dB
                   (i.e. to 10% of its input voltage) at a frequency
                   which is 10 times the cutoff frequency (one decade).
                   This is equivalent to the alternative description
                   that the filter has a -6 dB/octave falloff, i.e.
                   the output voltage has dropped to 50% (-6 dB) at
                   twice the cutoff frequency. In general, a p-pole
                   filter (also known as a filter of order p) is
                   stored as the value -20*p.]

             - highpass filter: it is specified by a sequence of
               the following three values.

                 o The first 32-bit integer number 0x00000002
                   identifies the filter as a highpass filter.

                 o The second parameter is the cutoff frequency of the
                   filter [the usual -3 dB limit, i.e. the frequency
                   where the output voltage has been decreased to
                   1/sqrt(2) (71%) of the input voltage] which is
                   stored as a positive floating point value in Hz.

                   [Note: The time constant t (in seconds) of a
                   first-order highpass or lowpass filter can be
                   computed from its cutoff frequency f (in Hz) as
                   t = 1 / (2*pi*f).]

                 o The third value describes the falloff below the
                   cutoff frequency. It stores the attenuation in dB
                   per decade as a negative floating point value. If
                   this value is not known, a not-a-number value
                   (0x00000000) may be used here.

             - notch filter: it is specified by a sequence of
               the following three values.

                 o The first 32-bit integer number 0x00000003
                   identifies the filter as a notch filter which
                   attenuates only the frequencies around a single
                   peak frequency.

                 o The second parameter is the peak frequency of the
                   filter (the most attenuated frequency) which is
                   stored as a positive floating point value in Hz.

                 o The third value describes the falloff around the
                   peak frequency. It stores the attenuation in dB per
                   decade as a negative floating point value.  If this
                   value is not known, a not-a-number value
                   (0x00000000) may be used here.
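The arithmetic in the notes on filter falloff and time constants can be expressed as a few ANSI C helpers. This is a sketch under the assumption that the floating point parameters have already been decoded into double values; the function and macro names are hypothetical. The constants are written out so that no math library has to be linked.

```c
/* pi and log10(2), written out so that no math library is needed. */
#define EBS_PI      3.14159265358979323846
#define EBS_LOG10_2 0.30102999566398119521

/* Time constant t (seconds) of a first-order filter with cutoff
   frequency f (Hz): t = 1 / (2*pi*f). */
double ebs_time_constant(double cutoff_hz)
{
  return 1.0 / (2.0 * EBS_PI * cutoff_hz);
}

/* Convert a falloff in dB per decade into dB per octave; an octave
   (a factor of 2 in frequency) is log10(2) decades. */
double ebs_falloff_per_octave(double db_per_decade)
{
  return db_per_decade * EBS_LOG10_2;
}

/* Filter order p for a falloff stored as -20*p, or 0 if the value is
   not a negative multiple of 20 (a NaN value also yields 0). */
int ebs_filter_order(double db_per_decade)
{
  int p;

  if (!(db_per_decade < 0.0) || db_per_decade < -2000.0)
    return 0;                       /* not negative, or implausibly steep */
  p = (int) (-db_per_decade / 20.0 + 0.5);   /* round to nearest */
  return (p > 0 && -20.0 * p == db_per_decade) ? p : 0;
}
```

For example, a highpass filter with a cutoff frequency of 0.53 Hz corresponds to a time constant of roughly 0.3 s, and a stored falloff of -40 identifies a 2-pole filter.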


Feel free to use those of the attributes you need, to use none at all
or to define your own attribute tags as described in the next
appendix.


Appendix B -- Tag Number Ranges for Your Own Tags

The standardized attribute tags from Appendix A already cover many
applications, but some people need their own special additional
attributes. This appendix describes how they should select their
attribute numbers so that collisions are unlikely if they later
exchange their files and software with other institutions, where
carelessly selected private tag numbers might already have a different
meaning.

In order to avoid collisions, the tag number space is divided into
four areas (see the table below). This allows the following methods
of assigning new tags:

  - The EBS coordinator (see Appendix C) may assign additional new
    attributes in this text that will have numbers in the STANDARD
    AREA when the need for new common and well-known standard
    attributes arises.

  - The EBS coordinator may reserve intervals in the RESERVATION AREA
    for people or institutions that request them. These can then
    assume that nobody else will use tag numbers in this range with
    different meanings, and may in turn reserve subranges within their
    range for other people.

  - Everybody may define his/her own attribute tag without prior
    communication with the EBS coordinator or with someone possessing
    an interval in the RESERVATION AREA by using a tag number in the
    FREE AREA. In order to keep the odds of a collision small, you
    should use a truly random tag number in the FREE AREA [i.e. fix
    the top 5 bits as 10000, choose the least significant bit
    according to its meaning (see below) and throw a coin for each of
    the remaining 26 bits. Calculating the probability of at least 2
    people having selected the same random tag number if c people
    selected one is left as an exercise for the reader.]

  - As many private attribute types are expected to contain
    single-line or multi-line text-strings (e.g. like in DESCRIPTION),
    these private attributes should use numbers from the FREE STRING
    AREA instead of the FREE AREA, so that programs that can display
    even unknown attributes know that they can display them correctly
    as strings and not only as, e.g., hexadecimal numbers.
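The exercise mentioned above is the classical birthday problem. The following ANSI C sketch computes the collision probability under the assumption that each of c people independently picks one of 2^bits equally likely tag numbers; the function name is hypothetical.

```c
/* Probability that at least two of c people, each independently picking
   one of 2^bits equally likely tag numbers, pick the same number
   (the birthday problem). bits must be between 0 and 31. */
double ebs_tag_collision_probability(unsigned long c, int bits)
{
  double n = (double) (1UL << bits);   /* number of possible tags */
  double p_distinct = 1.0;             /* P(all c picks are different) */
  unsigned long i;

  /* The (i+1)-th person must avoid the i tags already taken. */
  for (i = 1; i < c; i++)
    p_distinct *= 1.0 - (double) i / n;
  return 1.0 - p_distinct;
}
```

The FREE AREA from 0x80000000 to 0x87ffffff spans 2^27 values, and the least significant bit is not chosen at random, so bits = 26 is a reasonable choice here. Even with c = 1000 people, the result is then below 1%.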

The ranges of the tag number space are:

  0x00000000                    FINAL TAG
                                must not be used as an attribute tag
                                number.

  0x00000001 - 0x0000ffff       STANDARD AREA
                                attribute tags defined in Appendix A
                                of this text

  0x00010000 - 0x7fffffff       RESERVATION AREA
                                attribute tags defined within
                                intervals that the EBS coordinator
                                has reserved exclusively for
                                individual people or institutions.
                                These may in turn reserve
                                subintervals of their tag area for
                                other people, etc. Thus no one has
                                to fear that someone else will
                                accidentally use his attribute tag
                                with a different meaning, which
                                might later cause confusion.
                                Contact the EBS coordinator if you
                                need your own interval.

  0x80000000 - 0x87ffffff       FREE AREA
                                attribute tags that may be freely used
                                by everyone with the risk that the same
                                attribute is also used by someone else
                                for a different purpose. Please use a
                                random number within this interval and
                                do not simply start at 0x80000000.

  0x88000000 - 0xfffffffe       FREE STRING AREA
                                These tag numbers may be used as
                                freely as those in the FREE AREA, but
                                universal programs that can
                                display even unknown attributes may
                                assume that the values of attributes
                                with tags in the FREE STRING AREA may
                                be interpreted as single displayable
                                multi-line text-strings.

  0xffffffff                    ILLEGAL TAG
                                must not be used as an attribute tag
                                number.

Please remember that the least significant bit of the tag number
indicates whether the attribute contents might have to be changed if
the data part has been modified; this bit therefore can't be selected
at random.
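A program that processes or displays attributes might classify tag numbers according to the table above, for example to show values with tags in the FREE STRING AREA as text, or to find the attributes (odd tags) that need attention after the data part has been changed. The following ANSI C sketch holds tags in an unsigned long as the demo program in Appendix E does; the enum and function names are hypothetical.

```c
/* Tag number areas of this appendix. */
enum ebs_tag_area {
  EBS_FINAL_TAG,          /* 0x00000000 */
  EBS_STANDARD_AREA,      /* 0x00000001 - 0x0000ffff */
  EBS_RESERVATION_AREA,   /* 0x00010000 - 0x7fffffff */
  EBS_FREE_AREA,          /* 0x80000000 - 0x87ffffff */
  EBS_FREE_STRING_AREA,   /* 0x88000000 - 0xfffffffe */
  EBS_ILLEGAL_TAG         /* 0xffffffff */
};

/* Classify a 32-bit tag number. */
enum ebs_tag_area ebs_classify_tag(unsigned long tag)
{
  tag &= 0xffffffffUL;      /* ignore any bits above 32 on 64-bit longs */
  if (tag == 0x00000000UL) return EBS_FINAL_TAG;
  if (tag <= 0x0000ffffUL) return EBS_STANDARD_AREA;
  if (tag <= 0x7fffffffUL) return EBS_RESERVATION_AREA;
  if (tag <= 0x87ffffffUL) return EBS_FREE_AREA;
  if (tag <= 0xfffffffeUL) return EBS_FREE_STRING_AREA;
  return EBS_ILLEGAL_TAG;
}

/* Nonzero if the least significant bit is set, i.e. the attribute may
   have to be changed when the data part is modified (e.g. channels
   added, moved or deleted, or the beginning of the recording removed). */
int ebs_tag_depends_on_data(unsigned long tag)
{
  return (int) (tag & 1UL);
}
```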


Appendix C -- EBS coordinator

The EBS coordinator is a person or a committee that coordinates the
definition of new standard encoding IDs and attribute tags. The EBS
coordinator may assign new standard encoding IDs in the range
0x00000000 to 0x7fffffff, new attribute tags in the STANDARD AREA (see
Appendix B) and may reserve attribute tag ranges in the RESERVATION
AREA for organizations or individuals. The latest version of this EBS
standard with all defined attributes in the STANDARD AREA and a list
of reserved intervals in the RESERVATION AREA are available from the
EBS coordinator.

The current EBS coordinator is the author of this text,

     Dr Markus Kuhn
     Computer Laboratory
     University of Cambridge
     15 JJ Thomson Avenue
     Cambridge CB3 0FD
     England

     http://www.cl.cam.ac.uk/~mgk25/


Appendix D -- Rationale of the Format Design

The primary design goal of this file format has been to make it just
as complex as necessary, but not too complex. A classical design rule
is that systems which are suitable for 80% of all possible
applications cost only 20% of the price of systems that are suitable
for 99% of all possible applications. We therefore accepted the
following limitations in order to keep implementation costs low:

  - all channels have the same data type and encoding
  - all channels have to be recorded with the same sample rate
  - all channels have equal length in time (i.e. have the same number
    of samples).

These restrictions seem to be acceptable for most kinds of scientific
applications of a bio-signal format, because most recording devices
have similar limits.  Where these fundamental limitations of the EBS
format are not appropriate, several EBS files can be used to store the
complete data set.
 
The overall structure of the file format is dominated by the
separation into three parts: the fixed header, one or two variable
headers, and the data part.

We decided not to use a pure ASCII format, because encoding and
decoding the data part as ASCII numbers separated by space, tab or new
line codes is extremely inefficient in both required storage space and
coding time. E.g. 16-bit signed integers need 48 bits in a fixed
length ASCII decimal encoding (like in '-03445') and e.g. about 28-35
bits for typical 12-bit EEG data if a format with separating spaces
and without leading zeros is used (which is a variable length format
unsuitable for direct addressing of sample values). Even a hexadecimal
format would have doubled the memory requirements and would have made
some very efficient implementation techniques impossible. The fact
that computer systems with word sizes that are not powers of two
(e.g. the old 12-bit PDPs) have nearly completely disappeared in the
scientific environment allows an efficient binary format to be used in
a portable way.

We could have decided to encode at least the headers as ASCII
text. This would have been seen by many people as very easy to
understand, but would have had the following disadvantages:

  - A 7-bit or 8-bit character set (e.g. ISO 8859-1 Latin 1) is only
    acceptable in English speaking countries and perhaps western
    Europe, but not (especially not in clinical environments) in the
    rest of the world. A binary format allows us to use UCS-2 which
    won't make implementation more difficult if the conversion to the
    local character set is performed in the string read/write
    procedures.

  - Some people try to modify and fix ASCII header formats with their
    editors and often destroy data in this way, either because they
    haven't read the specification and don't know exactly what they
    change or because the editor corrupts the binary data part. A
    binary header format discourages these efforts and changes can
    only easily be made with programs where the developers must in any
    case be familiar with the specification and where consistency
    checks are possible.

  - If length indicators instead of line feeds are used to separate
    attribute values, arbitrary attribute values (e.g. even digitized
    photos, voice annotations, etc.) can be stored as attributes
    without problems. In an ASCII notation, awkward encodings would be
    necessary for these attribute types.

  - ASCII attribute notations would have made it very difficult to add
    a second variable header ('footer') after the data part.

  - No portable standard exists for ASCII files. At least four line
    separation conventions are known (CR+LF, CR, LF, NL).

We did not use data format specification languages like ISO 8824
(ASN.1) and complex binary data format syntaxes like ISO 8825 (BER).
These standards have been designed for much more complicated
applications. Implementing them requires a significant amount of time
(the ASN.1 standard is over 100 pages long) and experience, which
would make an ASN.1 based format inappropriate in a scientific
environment (at least until good ASN.1/BER tools are widely
available).  Consequently, we designed a much simpler header format
that won't force a programmer to learn complex and difficult universal
format specifications that will never be fully exploited in this
special application field.

The fixed header contains only the information needed by all programs
in order to read in the data set and in order to determine whether the
data can be read in at all or if the file is encoded in an unsupported
way. The purpose of the first 8 bytes is to allow programs that can
read in other formats in addition to EBS to detect if the current
input file has been stored in EBS format or not.  We obviously
selected the name of the format in ASCII characters as the first 3
bytes. The remaining 5 bytes have been selected so that they will most
likely be altered if something goes wrong during a file transmission.
These bytes are:

  0x94: An arbitrary byte with the most significant bit set to 1.
        Channels that are not 8-bit clean and character set translation
        functions will likely change this byte. It should also be
        changed as a version indicator if incompatible changes are
        made to this specification.
  0x0a: ASCII control character line feed (LF). File transfer
        programs sometimes add a 0x0d (CR) after this byte.
  0x13: ASCII flow control character Ctrl-S stops transmission
        on some channels and is removed on others.
  0x1a: Ctrl-Z is the MS-DOS end-of-file marker and will cause
        problems if the file has not been opened in binary mode.
  0x0d: ASCII control character carriage return (CR) will be
        removed by some file transfer programs.
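Because each of these bytes reacts to a different kind of mishandling, a reader can give the user a more helpful message than just "not an EBS file". The following ANSI C sketch is one possible heuristic; the function name and the diagnostic texts are hypothetical and only reflect the byte descriptions above.

```c
#include <stddef.h>
#include <string.h>

/* The first 8 bytes of an EBS file: "EBS", 0x94, LF, Ctrl-S, Ctrl-Z, CR. */
static const unsigned char ebs_magic[8] =
  { 0x45, 0x42, 0x53, 0x94, 0x0a, 0x13, 0x1a, 0x0d };

/* Returns NULL if buf (len bytes) starts with the EBS magic bytes,
   otherwise a short guess at what went wrong. */
const char *ebs_magic_diagnosis(const unsigned char *buf, size_t len)
{
  if (len < 3 || memcmp(buf, ebs_magic, 3) != 0)
    return "not an EBS file";
  if (len < 8)
    return "file truncated";
  if (buf[3] != 0x94)
    return "byte 0x94 altered: channel not 8-bit clean, character set "
           "translation, or incompatible format version";
  if (buf[4] == 0x0a && buf[5] == 0x0d)
    return "CR inserted after LF: file transferred in text mode";
  if (memcmp(buf + 4, ebs_magic + 4, 4) != 0)
    return "control characters altered or removed: "
           "file not transferred or opened in binary mode";
  return NULL;
}
```

These checks, like the magic bytes themselves, only detect common handling errors; they do not establish data integrity.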

These additional test bytes have been added only because they are
very easy to implement and might help to detect common file handling
errors more quickly. They do NOT guarantee data integrity. We felt
that mechanisms for data integrity like checksums, digital signatures
and forward error correction codes should be applied to complete EBS
files with more general packing/encryption tools where this is
necessary and should not be included in the EBS specification.

Some system tools like graphical file managers detect file types by
characteristic first bytes. In this way, EBS files can easily be
represented with a suitable icon.

In order to make it easier to read in the file headers as memory
mapped files with processors that can only read 32-bit integer values
starting on 32-bit boundaries in the memory, all 32-bit values in the
EBS file start on 32-bit boundaries. In addition, the two 64-bit
values in the fixed header start on 64-bit boundaries. The consequence
of this layout is that all strings, etc. in the headers have to be
padded with 0x00 bytes to the next 32-bit boundary, but this can
easily be done once and for all (together with the UCS-2 translation)
in the string read/write routines, etc.
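A write routine of this kind might look like the following ANSI C sketch. It assumes, consistent with the examples in Appendix A such as the microvolt unit string, that a text-string is stored as Bigendian UCS-2 characters followed by one or two 0x0000 words so that its length is a multiple of 4 bytes. It handles only ISO 8859-1 input, whose character codes are identical to the first 256 UCS-2 code points; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes an ISO 8859-1 string s occupies as an EBS text-string:
   2 bytes per character plus a 0x0000 terminator, rounded up to a
   multiple of 4 (so one or two 0x0000 words end the string). */
size_t ebs_string_size(const char *s)
{
  return (2 * strlen(s) + 2 + 3) & ~(size_t) 3;
}

/* Write s to out (which must hold ebs_string_size(s) bytes) as
   Bigendian UCS-2 and return the number of bytes written. */
size_t ebs_put_string(unsigned char *out, const char *s)
{
  size_t size = ebs_string_size(s);
  size_t i = 0;

  for (; *s != '\0'; s++) {
    out[i++] = 0x00;                  /* high byte is 0 for ISO 8859-1 */
    out[i++] = (unsigned char) *s;    /* low byte is the character code */
  }
  while (i < size)
    out[i++] = 0x00;                  /* terminator and padding */
  return size;
}
```

Called as ebs_put_string(buf, "\xb5V"), it produces the 8-byte microvolt unit string 0x00,0xb5,0x00,0x56,0x00,0x00,0x00,0x00, and an empty string becomes the single word 0x00000000.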

The number of samples must be specified in the fixed header, because
it cannot be determined from the file length for all encodings,
because some applications need to know it in advance for memory
allocation, and because it is necessary to find the first sample
value of each channel in CHANNEL-BASED ORDER encodings. All integer
values in the fixed and variable headers are stored with at least
32 bits, because today's computers can easily operate on such values
and because additional integer formats (e.g. 8-bit and 16-bit) would
need more read/write functions and would make 32-bit alignment more
difficult.

The variable header is one of the reasons for the flexibility of the
format.  Arbitrary information can be stored in it, but programs only
have to pick out the attributes which they are interested in. It would
have been possible to specify the length of the first variable header
part or the start of the data part in the fixed header. But this would
have required either calculating the length of the variable header in
advance, which is quite clumsy to implement, or jumping back to the
fixed header after the variable header had been written, which makes
pipeline processing and sequential file access impossible. Skipping
the variable header by following the attribute length indicators, on
the other hand, is quite easy to implement.

It is better to have the variable header stored in front of the data
part if it should be readable while the data is still written or if
pipeline processing is used. A variable header at the end of the file
has the advantage that modifications to it are possible without having
to make a copy of the whole file in order to move the encoded data
(which might often comprise many hundreds of megabytes and would need a
lot of time and temporary storage to copy). Consequently both places
are available for variable header information.

In the variable header, one of the simple types must be able to
represent real numbers. Among the alternatives

  - a fraction of two 32-bit integer numbers
  - an 8-byte double precision floating point number according to
    IEEE 754 (the representation used by most floating point hardware
    today).
  - a string representation of the written floating point number
    (e.g. '3.14E-9')

we decided to use the string representation, because the value range
of the fraction is quite limited and some programmers might find it
difficult to implement a correct read/write procedure for IEEE 754
floating point numbers if the internal representation used by the
system is a different one.  The string representation seemed to be the
easiest and most portable alternative and allows arbitrary
precision. Efficient coding is important in the data part, but not
for most attributes in the relatively small variable header.
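Reading such a value needs little more than the C library function strtod(). The following ANSI C sketch assumes the value is stored as a Bigendian UCS-2 text-string (like the other strings in the headers) and treats the empty string as the 'not a number' value used e.g. in the UNITS and FILTERS attributes; the function name is hypothetical. Note that strtod() honours the current C locale, so a program that calls setlocale() should make sure the decimal point is '.'.

```c
#include <stddef.h>
#include <stdlib.h>

/* Decode a floating point value (e.g. '3.14E-9') stored as a Bigendian
   UCS-2 text-string of len bytes. Returns 1 and stores the value in
   *value on success, or 0 if the string is empty ('not a number') or
   not a plain ASCII number. */
int ebs_get_float(const unsigned char *buf, size_t len, double *value)
{
  char ascii[64];
  size_t i, n = 0;
  char *end;

  for (i = 0; i + 1 < len; i += 2) {
    unsigned int c = ((unsigned int) buf[i] << 8) | buf[i + 1];
    if (c == 0)
      break;                            /* 0x0000 terminates the string */
    if (c > 0x7f || n + 1 >= sizeof ascii)
      return 0;                         /* not a plain ASCII number */
    ascii[n++] = (char) c;
  }
  ascii[n] = '\0';
  if (n == 0)
    return 0;                           /* empty string: value unspecified */
  *value = strtod(ascii, &end);
  return *end == '\0';                  /* reject trailing garbage */
}
```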

The currently defined signed 16-bit integer data type for the data
part seems to be suitable for nearly all applications, because it
allows efficient processing of data from 12-bit A/D converters and
because converters with more than 16 bits are used by very few
people. A 12-bit data type would have made processing a little more
difficult, and the 8-bit difference encoding of 16-bit values saves
even more storage. However, further data types such as 8-bit signed
integers and 4-byte floating point numbers can easily be added to EBS.

The TIME-BASED ORDER format is the natural choice for recording
equipment and other applications where the number of samples is not
known in advance. The CHANNEL-BASED ORDER is much more efficient for
processing applications that use only the data of one channel at a
time, because then, only the bytes for this channel have to be fetched
from mass storage devices. As there are good reasons for both
alternatives and as they can easily be converted, both are supported
in the EBS format. In a typical EBS usage scenario, a program that
converts data from vendor-specific recording equipment into EBS is
needed anyway, and it is a good idea to perform the conversion from
TIME-BASED ORDER to the more efficient CHANNEL-BASED ORDER in this
program.
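The difference between the two orders shows up in where a given sample lives in the data part. The following ANSI C sketch computes byte offsets relative to the start of the data part, assuming 2-byte samples (as in a 16-bit encoding), n channels with m samples each, and samples stored without gaps, channel by channel (CHANNEL-BASED) or sample instant by sample instant (TIME-BASED). These layout details and the function names are assumptions for illustration; with 32-bit unsigned long the offsets only cover data parts below 4 GB.

```c
/* Hypothetical helper: byte offset of sample s (0 = first) of channel
   c (0 = first) in CHANNEL-BASED ORDER with m samples per channel. */
unsigned long ebs_offset_channel_based(unsigned long c, unsigned long s,
                                       unsigned long m)
{
  /* all m samples of channel 0, then all of channel 1, ... */
  return 2UL * (c * m + s);
}

/* Hypothetical helper: byte offset of sample s of channel c in
   TIME-BASED ORDER with n channels. */
unsigned long ebs_offset_time_based(unsigned long c, unsigned long s,
                                    unsigned long n)
{
  /* samples of all n channels at instant 0, then instant 1, ... */
  return 2UL * (s * n + c);
}
```

In CHANNEL-BASED ORDER, the samples of one channel occupy one contiguous range of 2*m bytes, so a single-channel program needs one seek and one sequential read; in TIME-BASED ORDER, they are spread over the whole data part at intervals of 2*n bytes.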

The only compatibility problem for binary formats is that two
different integer byte orders exist on the hardware market: Bigendian
and Littleendian. Both alternatives are supported in EBS, because they
can easily be converted and because this allows an institution to
store all its data in the byte order of its local hardware. However,
the performance gain of a suitable byte order is not as significant as
that of a binary encoding or of the CHANNEL-BASED ORDER, so using the
Bigendian format (i.e. CIB_16) as the preferred format is encouraged.
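Converting between the two byte orders is cheap if samples are assembled byte by byte, which also makes the code independent of the host's own byte order. The following ANSI C sketch decodes one 16-bit two's-complement sample; the function name and its big_endian flag are hypothetical (a real reader would derive the byte order from the encoding ID in the fixed header).

```c
/* Decode a 16-bit two's-complement sample stored at p, in Bigendian
   (big_endian != 0) or Littleendian (big_endian == 0) byte order,
   independently of the byte order of the host. */
int ebs_get_i16(const unsigned char *p, int big_endian)
{
  long v = big_endian ? ((long) p[0] << 8) | p[1]
                      : ((long) p[1] << 8) | p[0];

  if (v >= 0x8000L)
    v -= 0x10000L;                    /* restore the sign */
  return (int) v;
}
```

For example, the bytes 0xff,0xfe decode to -2 in Bigendian and to -257 in Littleendian order.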

The number of predefined attributes has been kept as small as
possible, because this makes it more likely that most of them will
actually be implemented. It would have been possible to add many more
text attributes (e.g. who did the recording, type of equipment,
diagnosis, ...), but all of this information can easily be included in
the DESCRIPTION or in the PROCESSING_HISTORY attribute. The
INSTITUTION attribute has been added as an exception to this rule,
because some people prefer to have this string printed or displayed
separately at a prominent place by their software. The attributes
PROCESSING_HISTORY, CHANNEL_GROUPS and EVENTS contain no integer field
with the number of processing steps, channel groups or events. As a
result, a generic attribute management function that simply appends a
few bytes to the end of an attribute value can be used to add another
item to any of these lists.
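To illustrate this design choice, here is a sketch of such a generic
append function. The in-memory structure struct attribute and the
function attribute_append are hypothetical, and the sketch assumes the
caller passes an already encoded item whose length is a multiple of
four bytes, since attribute values always have such a length:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one EBS attribute. */
struct attribute {
    uint32_t tag;            /* attribute tag number */
    size_t length;           /* value length in bytes, always a multiple of 4 */
    unsigned char *value;    /* attribute value (NULL while empty) */
};

/* Append an already encoded list item to the end of an attribute value.
 * No item count has to be updated, so the same function works for any
 * list attribute. Assumption: item_len is a multiple of 4.
 * Returns 0 on success, -1 on error (the old value is left unchanged). */
int attribute_append(struct attribute *a, const void *item, size_t item_len)
{
    unsigned char *p;

    if (item_len % 4 != 0)
        return -1;                      /* would break the 4-byte granularity */
    if (item_len == 0)
        return 0;                       /* nothing to add */
    p = realloc(a->value, a->length + item_len);
    if (p == NULL)
        return -1;
    memcpy(p + a->length, item, item_len);
    a->value = p;
    a->length += item_len;
    return 0;
}
```

Because no count field exists, the function needs no knowledge of
which list attribute it is extending.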


Appendix E -- Implementation Hints

A program reading an EBS file might look like the following example,
written in ANSI C. This fragment reads the fixed header and both parts
of the variable header. It prints the patient name if present and
ignores all other attributes. The final fseek() call jumps to the
beginning of the recorded sample values of a selected channel.


/* Demo program for reading EBS files */

#include <stdio.h>
#include <stdlib.h>

/* for old (non ANSI C) versions of stdio.h */
#ifndef SEEK_SET
#define SEEK_SET 0
#endif

/* Read in a Bigendian 32-bit integer from a file. The result is
   unsigned, so that values such as 0xffffffff cannot overflow. */
unsigned long fgeti32(FILE *f)
{
  unsigned long i;

  i =  ((unsigned long) getc(f) & 0xff) << 24;
  i |= ((unsigned long) getc(f) & 0xff) << 16;
  i |= ((unsigned long) getc(f) & 0xff) << 8;
  i |= ((unsigned long) getc(f) & 0xff);

  return i;
}

int main(int argc, char **argv)
{
  FILE *fin;
  unsigned long samples_hi, samples;
  unsigned long length_hi, length;
  unsigned long channels;
  unsigned long channel;          /* channel of interest (1-based) */
  unsigned long tag;
  unsigned long attribute_length;
  unsigned long n;
  long pos, data_start = 0;
  int second_part, ready;
  int b1, b2;
  unsigned int c;

  if (argc < 3) {
    fprintf(stderr, "usage: %s <file> <channel>\n", argv[0]);
    exit(1);
  }
  if ((fin = fopen(argv[1], "rb")) == NULL) {
    perror(argv[1]);
    exit(1);
  }
  channel = strtoul(argv[2], NULL, 10);

  /* read fixed header */
  if ((fgeti32(fin) != 0x45425394UL) ||
      (fgeti32(fin) != 0x0a131a0dUL) ||
      (fgeti32(fin) != 0x00000001UL) ||
      feof(fin)) {
    fprintf(stderr, "Input file is not in EBS CIB-16 format!\n");
    exit(1);
  }
  channels   = fgeti32(fin);
  samples_hi = fgeti32(fin);     /* number of samples: 2x32-bit */
  samples    = fgeti32(fin);
  length_hi  = fgeti32(fin);     /* length of data part: 2x32-bit */
  length     = fgeti32(fin);
  if (samples_hi != 0 ||
      (length_hi != 0 &&
       !(length_hi == 0xffffffffUL && length == 0xffffffffUL))) {
    fprintf(stderr, "Input file is too long for this program!\n");
    exit(1);
  }
  if (channel < 1 || channel > channels) {
    fprintf(stderr, "There is no channel %lu in this file!\n", channel);
    exit(1);
  }

  /* read variable header */
  second_part = 0;
  ready = 0;
  do {
    /* read attributes until the final tag (or end of file) appears */
    while ((tag = fgeti32(fin)) != 0 && !feof(fin)) {
      attribute_length = fgeti32(fin);   /* in units of 4 bytes */
      pos = ftell(fin);
      switch (tag) {
      case 4: /* PATIENT_NAME */
        printf("patient name is ");
        /* read 16-bit Unicode characters until a zero character
           or the end of the attribute value */
        for (n = 0; n < attribute_length * 2; n++) {
          b1 = getc(fin);
          b2 = getc(fin);
          if (b1 == EOF || b2 == EOF) break;
          c = ((unsigned int) b1 << 8) | (unsigned int) b2;
          if (c == 0) break;
          if (c < 127) putchar(c);  /* print only ASCII characters and */
          else putchar('?');        /* '?' for other Unicode characters */
        }
        printf(".\n");
        break;
      default:
        /* just ignore other attributes */
        break;
      }
      /* jump to the next attribute */
      fseek(fin, pos + (long) attribute_length * 4, SEEK_SET);
    }
    if (!second_part) {
      /* remember the start of the data part; if there is a second
         variable header part, jump over the data part to read it */
      data_start = ftell(fin);
      if (length_hi != 0xffffffffUL || length != 0xffffffffUL) {
        second_part = 1;
        fseek(fin, data_start + (long) length * 4, SEEK_SET);
      } else ready = 1;
    } else ready = 1;
  } while (!ready);

  /* read data: jump to the first sample value of the selected channel
     (CHANNEL-BASED ORDER, 2 bytes per sample value) */
  fseek(fin, data_start + (long) ((channel - 1) * samples * 2), SEEK_SET);
  /* ... process 'samples' Bigendian 16-bit sample values ... */

  fclose(fin);
  return 0;
}


A library of functions for reading, writing and modifying EBS files
would make EBS file management much easier.


Appendix F -- The CGM Format
----------------------------

This appendix contains only a very brief introduction to the CGM
graphics file format, which is used to store graphical diagrams in the
LOCATION_DIAGRAM attribute. This description might be sufficient for a
primitive implementation of the minimal subset defined for
LOCATION_DIAGRAM, but implementors are encouraged to read the official
standard (ISO 8632-1 for the specification of the functionality and
ISO 8632-3 for the binary encoding) or at least a book about CGM.
Where this appendix is ambiguous, the standard takes precedence.

A binary encoded CGM file consists of a sequence of CGM elements very
similar to the attributes in the EBS variable headers. Most integer
values are 16 bits long, are stored with the most significant byte
first (Bigendian) and have a 16-bit alignment. Each element has a
class number and an identifier number (which together are used like
the EBS tag number) and a length indicator. Two forms are possible: a
short-form element for parameter data lengths between 0 and 30 bytes
and a long-form element for arbitrary parameter lengths.

A short-form element starts with a 16-bit header of the form


       15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0    bit
      -------------------------------------------------
      |   class   |     identifier     |    length    |
      -------------------------------------------------


and is followed by the parameter data bytes of this element, whose
number is given in the lower 5 bits. If the number of data bytes is
odd, a single zero padding byte follows, which gives the whole element
including the two header bytes an even number of bytes and preserves
the 16-bit alignment. The data length in a short-form element may be
between 0 and 30 bytes; the length value 31 (all five bits set)
indicates a long-form element.
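A reader can split the header word into its three fields with shifts
and masks. A minimal C sketch (the names cgm_header, cgm_decode_header
and cgm_short_element_size are hypothetical):

```c
/* Fields of a CGM binary element header word:
 * bits 15-12 class, bits 11-5 identifier, bits 4-0 parameter length. */
struct cgm_header {
    unsigned cls;      /* element class, 0..15 */
    unsigned id;       /* element identifier, 0..127 */
    unsigned length;   /* 0..30 for short form; 31 means long form */
};

/* Decode the 16-bit header word stored Bigendian in b[0], b[1]. */
struct cgm_header cgm_decode_header(const unsigned char b[2])
{
    unsigned word = ((unsigned)b[0] << 8) | b[1];
    struct cgm_header h;

    h.cls    = (word >> 12) & 0x0f;
    h.id     = (word >> 5) & 0x7f;
    h.length = word & 0x1f;
    return h;
}

/* Total size in bytes of a short-form element: the header word, the
 * parameter bytes and a padding byte if the parameter length is odd. */
unsigned cgm_short_element_size(unsigned length)
{
    return 2 + length + (length & 1);
}
```

For example, the BEGIN METAFILE header 0x0021 listed later in this
appendix decodes to class 0, identifier 1 and a parameter length of
1 byte, so with padding the element occupies 4 bytes.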

Long-form elements start with a 32-bit header of the form


       15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0    bit
      -------------------------------------------------
      |   class   |     identifier     | 1  1  1  1  1|   word 1
      +-----------------------------------------------+
      | P|               partial length               |   word 2
      -------------------------------------------------


followed by between 0 and 32767 bytes. If the bit P (partition flag)
is 1, then the indicated number of data bytes is followed by another
word with a partition flag and a 15-bit partial length field, which is
in turn followed by the indicated number of data bytes. If its P bit
is again 1, another length word follows after these data bytes, and so
on until a partition with P = 0 ends the element. A very long
long-form element might look like this:


       15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0    bit
      -------------------------------------------------
      |   class   |     identifier     | 1  1  1  1  1|   word 1
      +-----------------------------------------------+
      | 1|               partial length               |   word 2
      +-----------------------------------------------+
      ...  'partial length' bytes ...
      +-----------------------------------------------+
      | 1|               partial length               |
      +-----------------------------------------------+
      ... 'partial length' bytes ...
      +-----------------------------------------------+
      | 0|               partial length               |
      +-----------------------------------------------+
      ... 'partial length' bytes ...


As with short-form elements, a zero padding byte is added after the
element if its number of bytes is odd, in order to preserve the 16-bit
alignment.
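Putting the short and long forms together, a reader can skip over any
element without understanding it. The sketch below (cgm_element_size
is a hypothetical name) returns the number of bytes an element
occupies in a buffer and its total parameter length. It assumes that
every partition except the last has an even length, so that the length
words stay 16-bit aligned and only the end of the element can need a
padding byte; elements that violate this assumption are rejected.

```c
#include <stddef.h>

/* Size in bytes (header, all partitions, padding) of the CGM element
 * starting at buf[0]; the total parameter length is stored in
 * *param_len. Returns 0 if the buffer is too short or the element is
 * malformed. Assumption: non-final partitions have even lengths. */
size_t cgm_element_size(const unsigned char *buf, size_t n, size_t *param_len)
{
    size_t pos, total = 0;
    unsigned word, plen = 0;
    int more;

    if (n < 2)
        return 0;
    word = ((unsigned)buf[0] << 8) | buf[1];
    if ((word & 0x1f) != 0x1f) {              /* short form */
        plen = word & 0x1f;
        pos = 2 + plen + (plen & 1);          /* header, data, padding */
        if (pos > n)
            return 0;
        *param_len = plen;
        return pos;
    }
    pos = 2;                                  /* long form: partitions follow */
    do {
        if (pos + 2 > n)
            return 0;
        word = ((unsigned)buf[pos] << 8) | buf[pos + 1];
        more = (word & 0x8000) != 0;          /* partition flag P */
        plen = word & 0x7fff;                 /* 15-bit partial length */
        if (more && (plen & 1))
            return 0;                         /* violates the assumption above */
        pos += 2 + plen;
        total += plen;
    } while (more);
    pos += plen & 1;                          /* padding after the final partition */
    if (pos > n)
        return 0;
    *param_len = total;
    return pos;
}
```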

The following elements are used in the minimal subset for
LOCATION_DIAGRAM:


      element name          class      identifier

      no-op                   0            0
      BEGIN METAFILE          0            1       *
      END METAFILE            0            2       *
      BEGIN PICTURE           0            3
      BEGIN PICTURE BODY      0            4
      END PICTURE             0            5
      METAFILE VERSION        1            1       *
      METAFILE ELEMENT LIST   1           11       *
      VDC EXTENT              2            6
      POLYLINE                4            1

      * these elements must be present in every CGM file

A CGM file (and consequently also a LOCATION_DIAGRAM value) starts
with a BEGIN METAFILE element which is followed by a part called
'metafile descriptor'. After the metafile descriptor elements follow
zero, one or several pictures and finally an END METAFILE element.
No-op elements can have any parameter length and have to be ignored.

   ----------------------------------------------------------------------
   | BEGIN METAFILE | metafile descriptor | pictures ... | END METAFILE |
   ----------------------------------------------------------------------

Reading applications may ignore the data part of BEGIN METAFILE, and
simple writing applications should put a single zero byte into the
data part of this first element (followed by a padding byte). The END
METAFILE element has no parameters; its length field is always zero.
The metafile descriptor must contain at least the two elements
METAFILE VERSION and METAFILE ELEMENT LIST. Simple reading
applications may just ignore them, and simple writing applications
should give METAFILE VERSION a single 16-bit integer value 1 as its
parameter. The parameter of METAFILE ELEMENT LIST is a list of the
class and identifier codes of the non-mandatory elements that might
appear in the file (which allows a reader to determine quickly which
subset of CGM is supported by the application that wrote the file).
Programs that write CGM files using only this minimal subset should
use the 11 16-bit integer numbers 5 (the number of elements listed),
0, 3, 0, 4, 0, 5, 2, 6, 4 and 1 as parameters of METAFILE ELEMENT
LIST.

The BEGIN METAFILE element and the suggested metafile descriptor look
like this:

  0x00,0x21,0x00,0x00,
  0x10,0x22,0x00,0x01,
  0x11,0x76,0x00,0x05,0x00,0x00,0x00,0x03,0x00,0x00,0x00,0x04,
  0x00,0x00,0x00,0x05,0x00,0x02,0x00,0x06,0x00,0x04,0x00,0x01

The END METAFILE element is

  0x00,0x40.
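Rather than copying these bytes, a writer can compute them from the
class, identifier and length fields. The following sketch (all
function names are hypothetical) produces the 32-byte prologue for the
minimal subset. Note that the METAFILE ELEMENT LIST header works out
to (1 << 12) | (11 << 5) | 22 = 0x1176, since its 11 parameters occupy
22 bytes.

```c
#include <stddef.h>

/* Append a Bigendian 16-bit word to buf at *pos. */
static void put16(unsigned char *buf, size_t *pos, unsigned v)
{
    buf[(*pos)++] = (unsigned char)((v >> 8) & 0xff);
    buf[(*pos)++] = (unsigned char)(v & 0xff);
}

/* Short-form element header: class in bits 15-12, identifier in
 * bits 11-5, parameter length (0..30 bytes) in bits 4-0. */
static unsigned cgm_short_header(unsigned cls, unsigned id, unsigned len)
{
    return (cls << 12) | (id << 5) | len;
}

/* Write BEGIN METAFILE and the suggested metafile descriptor into buf
 * (which must hold at least 32 bytes); returns the bytes written. */
size_t cgm_write_prologue(unsigned char *buf)
{
    static const unsigned element_list[11] = {
        5,       /* number of non-mandatory elements listed */
        0, 3,    /* BEGIN PICTURE */
        0, 4,    /* BEGIN PICTURE BODY */
        0, 5,    /* END PICTURE */
        2, 6,    /* VDC EXTENT */
        4, 1     /* POLYLINE */
    };
    size_t pos = 0;
    int i;

    put16(buf, &pos, cgm_short_header(0, 1, 1));    /* BEGIN METAFILE */
    buf[pos++] = 0x00;                              /* single zero byte */
    buf[pos++] = 0x00;                              /* padding byte */
    put16(buf, &pos, cgm_short_header(1, 1, 2));    /* METAFILE VERSION */
    put16(buf, &pos, 1);                            /* version 1 */
    put16(buf, &pos, cgm_short_header(1, 11, 22));  /* METAFILE ELEMENT LIST */
    for (i = 0; i < 11; i++)
        put16(buf, &pos, element_list[i]);
    return pos;
}
```

The END METAFILE element is then simply cgm_short_header(0, 2, 0),
i.e. 0x0040.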

After the metafile descriptor elements, a sequence of pictures
follows. Each picture has the following structure:

   -----------------------------------------------------------------------
   | BEGIN PIC. | pic. descr. | BEGIN PIC. BODY | pict. elem. | END PIC. |
   -----------------------------------------------------------------------

Each picture starts with a BEGIN PICTURE element and ends with an END
PICTURE element. Reading applications may ignore the parameter of
BEGIN PICTURE, and simple writing applications can just use a single
zero byte (as with BEGIN METAFILE). The elements BEGIN PICTURE BODY
and END PICTURE have no parameters (i.e., their length field is always
zero). The BEGIN PICTURE BODY element separates the picture
descriptor elements from the elements that represent the graphical
objects (here only lines) of the picture.

The only required picture descriptor element in this minimal subset of
CGM is VDC EXTENT. It has four 16-bit signed integer values as
parameters (length 8 bytes): the X coordinate of the lower left
corner, the Y coordinate of the lower left corner, the X coordinate of
the upper right corner and the Y coordinate of the upper right corner.
These two points define the VDC extent, a rectangular area of the
coordinate space that contains the diagram. Display software must be
capable of scaling the VDCs (virtual device coordinates) used in the
picture elements so that the VDC extent is always mapped to a suitable
size on the output device. This scaling should use the same scaling
factor for both axes in order to preserve the aspect ratio. The
positive directions of the X and Y axes are also determined by the
VDCs of the lower left and the upper right points given in the VDC
EXTENT element.
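One way to implement this scaling is sketched below, assuming a raster
output area of w by h pixels whose Y axis points downwards (as on most
screens); the structure vdc_map and both functions are hypothetical.
The smaller of the two per-axis scale factors is used for both axes,
the diagram is centred in the remaining space, and the signs of the
scale factors follow the direction from the lower left to the upper
right VDC EXTENT point.

```c
/* Precomputed mapping from VDCs to device coordinates. */
struct vdc_map {
    double x1, y1;   /* VDC of the lower left corner of the extent */
    double sx, sy;   /* signed scale factors (same magnitude) */
    double ox, oy;   /* device position of the lower left corner */
};

/* Prepare a mapping of the VDC extent (x1,y1)-(x2,y2) into a w x h
 * device area whose y axis points downwards. Returns -1 if the extent
 * has zero width or height, else 0. */
int vdc_map_init(struct vdc_map *m, double x1, double y1,
                 double x2, double y2, double w, double h)
{
    double dx = x2 - x1, dy = y2 - y1;
    double adx = dx < 0 ? -dx : dx, ady = dy < 0 ? -dy : dy;
    double s;

    if (adx == 0 || ady == 0)
        return -1;
    s = (w / adx < h / ady) ? w / adx : h / ady;  /* uniform scale */
    m->x1 = x1;
    m->y1 = y1;
    m->sx = dx > 0 ? s : -s;     /* device x grows towards the upper right point */
    m->sy = dy > 0 ? -s : s;     /* device y grows downwards */
    m->ox = (w - adx * s) / 2;           /* centre horizontally */
    m->oy = h - (h - ady * s) / 2;       /* lower left corner at the bottom */
    return 0;
}

/* Map one VDC point (vx, vy) to device coordinates (*px, *py). */
void vdc_to_device(const struct vdc_map *m, double vx, double vy,
                   double *px, double *py)
{
    *px = m->ox + (vx - m->x1) * m->sx;
    *py = m->oy + (vy - m->y1) * m->sy;
}
```

For the extent (0,0)-(100,50) on a 200 by 200 pixel area, the scale
factor is 2, so (0,0) maps to pixel (0,150) and (100,50) to (200,50).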

The only required graphical picture element in this subset that may
appear between BEGIN PICTURE BODY and END PICTURE is POLYLINE. This
element represents a sequence of connected lines. Its parameters are
2*p 16-bit signed integer values (length field: 4*p), which are the
VDCs of p points stored as pairs of X and Y coordinates. The line is
drawn from the first point to the second, from the second point to the
third, ..., and from point p-1 to point p.
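A writer could encode a POLYLINE element as sketched below
(cgm_encode_polyline is a hypothetical name). It chooses the short
form when the 4*p parameter bytes fit into 30 bytes, i.e. for at most
7 points, and otherwise uses a long-form element with a single
partition, which limits it to 8191 points; longer polylines would need
several partitions and are not handled here. Since 4*p is always even,
no padding byte is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Append a Bigendian 16-bit word to buf at *pos. */
static void put16(unsigned char *buf, size_t *pos, unsigned v)
{
    buf[(*pos)++] = (unsigned char)((v >> 8) & 0xff);
    buf[(*pos)++] = (unsigned char)(v & 0xff);
}

/* Encode POLYLINE (class 4, identifier 1) for p points given as
 * interleaved X,Y VDC pairs xy[0..2p-1]. buf needs room for 4 + 4*p
 * bytes. Returns the number of bytes written, or 0 if p is out of
 * the range 2..8191 supported by this sketch. */
size_t cgm_encode_polyline(unsigned char *buf, const int16_t *xy, size_t p)
{
    size_t len = 4 * p, pos = 0, i;
    unsigned header = (4u << 12) | (1u << 5);

    if (p < 2 || p > 8191)
        return 0;
    if (len <= 30) {
        put16(buf, &pos, header | (unsigned)len);  /* short form */
    } else {
        put16(buf, &pos, header | 0x1fu);          /* long-form marker */
        put16(buf, &pos, (unsigned)len);           /* P = 0: only partition */
    }
    for (i = 0; i < 2 * p; i++)
        put16(buf, &pos, (uint16_t)xy[i]);         /* two's complement bits */
    return pos;
}
```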

If unknown elements appear in a CGM file, the application should
either ignore them and warn the user that it might not be able to
display the full diagram correctly, or ignore the whole CGM file.

[Note: Using the CGM standard as the format for the LOCATION_DIAGRAM
attribute allows easy extension of the graphical capabilities of this
attribute, because only the used subset of CGM has to be enlarged and
no new graphic format extensions have to be invented. In addition, it
allows existing CGM tools to be used for designing the diagrams.]


Appendix G -- Glossary
----------------------

attribute -- An information field, identified by an attribute tag
number and delimited by a length indicator, that may contain an
arbitrary sequence of bytes with additional information describing the
bio-signal data stored in an EBS file. The one or two variable header
parts of an EBS file contain the attributes.

attribute value -- This is the sequence of bytes contained in an
attribute.  Its length is always a multiple of four bytes and may be
up to 16 gigabytes.

Bigendian -- In 'Gulliver's Travels' by Jonathan Swift, a politician
who insists on opening an egg at the big end first. In computer
architecture, the property of a microprocessor to store the more
significant bytes of a word at the lower addresses in
memory. Littleendians do it the other way.

CGM (computer graphics metafile) -- A file format for storage of
pictures as collections of graphical elements (e.g. lines, text,
circles, etc.) defined in ISO 8632.

channel-based order -- A data part layout in which the sample values
of a single channel for the complete recording time are stored
together sorted by the recording time. All these channel recordings
are stored together sorted by their channel number.

compressed encoding -- A storage representation of sample values that
is more efficient in storage capacity than the natural encoding of
using equally sized machine words for each sample value independently
of all other sample values.

data part -- This is the part of an EBS file that contains nothing but
encoded bio-signal data values (and up to 3 zero padding bytes at the
end if a second variable header is present).

EBS (extensible bio-signal file format) -- The type of computer file
specified in this text suitable for the exchange, processing and
storage of bio-signal recordings and additional information.

first variable header part -- The attributes and the first final tag
that are located directly after the fixed header and before the data
part.

fixed header -- The first 32 bytes of every EBS file form the fixed
header, which contains information needed by all programs that process
EBS files.

ISO -- Short name for the 'International Organization for
Standardization' in Geneva. You can order ISO standards from your
local national standards body (e.g. ANSI, DIN, BSI, AFNOR, etc.).

Littleendian -- see Bigendian.

multi-line text-string -- A simple data type that is used as a part of
many attribute value syntaxes. If not otherwise specified, it should
not contain more than 64 characters per line encoded in the UCS-2
character set. Lines are separated by the line feed control character
0x000a.

recording -- A complete collection of all sample values within a
certain interval of time measured at a certain sample frequency.

sample value -- A numeric representation of a physical or other
quantity at a point of time associated with a channel.

second variable header part -- The attributes and the final tag that
are located directly after the data part. This part of the variable
header may be absent.

single-line text-string -- A simple data type that is used as a part
of many attribute value syntaxes. If not otherwise specified, it
contains up to 64 characters encoded in the UCS-2 character set and no
line feed control characters.

tag -- An attribute tag is a 32-bit number that identifies the type of
an attribute, i.e. it indicates the syntax and semantics of an
attribute value.

time-based order -- A data part layout in which the sample values of a
single point in time are stored together sorted by the number of their
channel. All collections of these samples for a single point in time
are stored together sorted by their recording time.

UCS-2 -- The 2-byte encoding of the 'Universal Multiple-Octet Coded
Character Set' (UCS) defined in ISO 10646. This character set is also
known under the more popular name 'Unicode'.

variable header -- The part of an EBS file that contains the
information which is only needed by some applications. This
information is stored in attributes.


Appendix H -- Revision history
------------------------------

Revision history of this document:

1993-10-19  - First published version (presented at the 24th conference
              of the Deutsche Gesellschaft für Medizinische Physik in
              Erlangen)

1994-03-02  - Minor editorial corrections

2008-02-??  - EBS co-ordinator contact address updated
            - Appendix H added
            - Minor editorial corrections