The xilinx DMA controller




The xilinx chip, which controls the network part of the card, consists of five main sections: the processor interface, the SRAM interface, reception from the network, injection into the fabric and transmission to the network. The general design of each section is outlined below; more specific information appears in the individual design documents.

SRAM interface

The xilinx is the only device which drives the SRAM, arbitrating between its own DMA use and the processor. The arbiter controls a multiplexor which drives the SRAM address bus, and also controls the enables of the SRAM data bus drivers. The address used in DMA operations is generated from registers internal to the xilinx chip.

Part of the SRAM data bus is used directly by the xilinx, to load its internal registers and to output status information. The width of the data bus to the xilinx depends on the part used: a 3042 has data lines 0..11 and 30..31 available, a 3064 also has lines 12..15 connected, and a 3090 has all the data lines connected. When a cell header is read the VCI will be on the bottom 16 bits. If the SAR1A or FAS protocols are used then the start and end bits will be on bits 30 and 31.
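The effect of the part choice on what the xilinx can see of a header word can be sketched as follows. This is an illustrative model only: the names DATA_LINES, visible_mask and decode_header are hypothetical, and because the text does not say which of bits 30 and 31 is the start bit and which is the end bit, the sketch reports them by position rather than by meaning.

```python
# Data lines wired to the xilinx for each part, as listed in the text.
DATA_LINES = {
    "3042": set(range(0, 12)) | {30, 31},
    "3064": set(range(0, 16)) | {30, 31},
    "3090": set(range(0, 32)),
}


def visible_mask(part):
    """Return a 32-bit mask of the SRAM data lines the given part can see."""
    mask = 0
    for line in DATA_LINES[part]:
        mask |= 1 << line
    return mask


def decode_header(word, part):
    """Split a cell header word into the fields the xilinx can observe.

    Assumption: bits 30 and 31 are returned by position, since the text
    does not say which is the start bit and which is the end bit.
    """
    seen = word & visible_mask(part)
    return {
        "vci": seen & 0xFFFF,        # VCI occupies the bottom 16 bits
        "bit30": (seen >> 30) & 1,
        "bit31": (seen >> 31) & 1,
    }
```

Under this model a 3042 sees only the low 12 bits of the 16-bit VCI field, while a 3064 or 3090 sees all 16.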

Injection to the fabric

The xilinx will start sending cells into the fabric if there are buffers in the transmit queue and the processor has issued a RamRdGo command since reset or a fabric NACK. Transmission continues until the queue is empty, or a cell is NACKed by the fabric or destination port controller. If transmission stops because the queue becomes empty then it will restart as soon as a new item is put in the queue - strobing of the RamRdGo line is only required after NACK or xilinx reset.
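The start and stop conditions above amount to a small state machine, sketched here as a behavioural model. The class and method names are hypothetical, the fabric is represented by a caller-supplied function that returns False for a NACK, and the sketch assumes that a NACKed cell stays at the head of the queue, which the text does not state.

```python
from collections import deque


class FabricInjector:
    """Behavioural sketch of the injection start/stop rules."""

    def __init__(self):
        self.queue = deque()   # transmit queue of cell buffers
        self.armed = False     # False after reset or NACK, True after RamRdGo

    def ram_rd_go(self):
        """Processor strobes RamRdGo."""
        self.armed = True

    def enqueue(self, cell):
        self.queue.append(cell)

    def step(self, fabric_accepts):
        """Try to send one cell; return the cell sent, or None."""
        if not self.armed or not self.queue:
            return None        # an empty queue does not disarm the sender
        if fabric_accepts(self.queue[0]):
            return self.queue.popleft()
        # NACK: stop until the processor issues RamRdGo again.
        # Assumption: the NACKed cell is left at the head of the queue.
        self.armed = False
        return None
```

Note that running out of cells leaves the model armed, so a later enqueue restarts transmission without a further RamRdGo.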

Fabric output to network

Data from the fabric is moved, under control of the xilinx "fab2fifo" section, into one (or both) of the two fifos. The cell arriving from the fabric has had the fabric routeing information removed but still contains the port controller route byte. The route byte is shown in the port controller route byte table. The cell Active bit is always a 1, and is used to indicate the start of the full cell in the same way as in the fabric. The loopback bit indicates whether the cell is destined for transmission or for looping back into the port controller. Using the Xi5 design, insertion into both fifos is possible by setting the SMcast bit in the route byte.

Bytes arriving from the fabric are latched externally to provide a cycle delay for decoding the routeing information. On recognition of a cell the loopback bit is used to indicate which fifo should be used for the duration of this cell. Bytes are then strobed into the fifo for the number of bytes in a cell. The fifo contains 256 nine-bit words. The ninth bit is used as a start of cell marker and is set as the port controller route byte is strobed into the fifo. This byte will be removed by the transmission card and replaced by the appropriate start of cell marker for the transmission medium. The data associated with this byte is ignored by the transmission system.
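A minimal sketch of the fifo selection and the nine-bit word format follows. The fifo names "transmit" and "loopback" and both function names are hypothetical, and the loopback and SMcast flags are passed in as booleans because their positions in the route byte are not given in this section.

```python
SOC_BIT = 1 << 8   # ninth bit of each fifo word: start of cell marker


def target_fifos(loopback, smcast=False):
    """Choose the fifo(s) a cell is written to for its whole duration."""
    if smcast:                       # Xi5 design only: insert into both
        return ("transmit", "loopback")
    return ("loopback",) if loopback else ("transmit",)


def cell_to_fifo_words(route_byte, payload):
    """Encode one cell as nine-bit fifo words.

    Only the first word, carrying the port controller route byte,
    has the start of cell marker set.
    """
    words = [SOC_BIT | (route_byte & 0xFF)]
    words.extend(b & 0xFF for b in payload)
    return words
```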

The Half Full signal from the selected fifo is used to generate the Ack signal to the fabric. Thus as soon as the fifo becomes half full back pressure is applied. Since the transmitting port controller only senses the Ack signal for a single byte time it is possible for almost an entire cell to be put in the fifo past its half full point.
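The worst-case overshoot can be worked through with a small model. It assumes that Half Full is asserted once the fifo holds 128 or more of its 256 words, and it takes the cell length as a parameter (cell_words, hypothetical) because the number of bytes in a cell is not stated in this section.

```python
FIFO_DEPTH = 256                 # nine-bit words per fifo
HALF_FULL = FIFO_DEPTH // 2      # assumed Half Full threshold


def ack(fill):
    """Ack is given to the fabric while the fifo is below half full."""
    return fill < HALF_FULL


def worst_case_fill(cell_words):
    """Highest fill level reachable when Ack is sampled only once per cell.

    The sender sees Ack with the fifo one word short of half full,
    then writes the entire cell.
    """
    fill_when_sampled = HALF_FULL - 1
    assert ack(fill_when_sampled)
    return fill_when_sampled + cell_words
```

In this model a cell of n words can carry the fifo n - 1 words past its half full point, which matches the "almost an entire cell" described above.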

Reception from network or loopback fifo

The reception system expects to read from a fifo, so a reception card must either use a fifo or present the same interface as a fifo. The start of cell control sequence will have been replaced by a byte with the SOC bit set. The data in this byte is currently not defined and the byte is discarded. Data transmitted through the loopback fifo will already be in this format.

If either fifo is seen to contain data then it is read from; if both contain data then they are read alternately. When a SOC bit is seen from one of the fifos, that fifo is selected. Bytes are read from the selected fifo and assembled into words in the latches. When a word has been assembled it is written to the SRAM. If a new SOC is seen part way through reading a cell then the first cell is discarded and the new one is read into the same cell buffer; a status bit is set to indicate that this has happened.
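The assembly of bytes into SRAM words, including the restart on an unexpected SOC, can be sketched as below. The function name and the cell_bytes parameter are hypothetical, a word is taken to be four bytes to match the 32 data lines, and the byte order within a word (first byte most significant) is an assumption that the text does not confirm.

```python
SOC_BIT = 1 << 8       # ninth fifo bit: start of cell marker
BYTES_PER_WORD = 4     # the SRAM data bus has lines 0..31


def receive_cell(fifo_words, cell_bytes):
    """Assemble one cell from nine-bit fifo words.

    Returns (sram_words, restarted), where restarted mirrors the status
    bit set when a new SOC arrives part way through a cell.
    """
    data = []
    in_cell = False
    restarted = False
    for word in fifo_words:
        if word & SOC_BIT:
            restarted = restarted or in_cell   # SOC in mid-cell
            in_cell = True
            data = []          # reuse the buffer; SOC byte data is discarded
            continue
        if not in_cell:
            continue           # ignore bytes before the first SOC
        data.append(word & 0xFF)
        if len(data) == cell_bytes:
            break
    # Assumption: the first byte received is the most significant in the word.
    sram_words = [
        int.from_bytes(bytes(data[i:i + BYTES_PER_WORD]), "big")
        for i in range(0, len(data) - BYTES_PER_WORD + 1, BYTES_PER_WORD)
    ]
    return sram_words, restarted
```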

If there are no buffers available for reception then only the receive fifo is read, and any cells found will be read out of the fifo and discarded. This ensures that the fifo does not fill, so cells are dropped safely as they emerge from the fifo rather than in an uncontrolled way on the way into the fifo. The loopback fifo will apply back pressure across the fabric when it fills, so it is safe not to discard cells from it. When cells are discarded a status bit is set.
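The choice of which fifos to service can be summarised in a short decision function. The function name, the fifo names and the boolean inputs are hypothetical; the alternation between fifos when both hold data is left to the caller.

```python
def fifos_to_service(receive_has_data, loopback_has_data, buffers_available):
    """Return (fifos, discard): which fifos may be read, and whether
    cells read from them are thrown away."""
    if not buffers_available:
        # Drain only the receive fifo; the loopback fifo is left to
        # fill and apply back pressure across the fabric.
        return (["receive"] if receive_has_data else []), True
    fifos = []
    if receive_has_data:
        fifos.append("receive")
    if loopback_has_data:
        fifos.append("loopback")
    return fifos, False
```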

This section of the chip caused many of the problems found in the implementation. It has been reimplemented many times to improve the operating speed of the xilinx. The problems stem from the asynchronous nature of the interface: the line speed is slower than the port controller clock speed, so care has to be taken with the fifo going empty. In both FPC1 and FPC2 the fifos used required read pulses with a 25:75 duty cycle and generated the empty signal very late in the read. To avoid some of these problems the FPC3 uses clocked fifos, which take a free running clock and an enable signal. Despite these reimplementations the overall design of this section did not change between versions.

The processor interface

The processor control signals are decoded and I/O accesses to the xilinx space (0x3180000 - 0x31FFFFF) are detected. The use of I/O space causes the access cycle to be asynchronous, allowing the xilinx to arbitrate the memory access. The xilinx decodes addresses in this range to allow processor access to internal registers and the buffer SRAM. Addresses 0x3180000 and 0x31C0000 can be encoded as ARM immediate constants, so it is advantageous to place things at these addresses. The result of performing a byte write is undefined, but it usually results in all 4 bytes in the addressed word being set to the value given.
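The point about immediate constants can be checked directly. An ARM data-processing immediate is an 8-bit value rotated right by an even number of bits, so a constant qualifies if some even left rotation brings it back into 8 bits. The function and constant names below are illustrative only.

```python
XILINX_BASE = 0x3180000
XILINX_TOP = 0x31FFFFF


def in_xilinx_space(addr):
    """True if an I/O address falls in the range decoded by the xilinx."""
    return XILINX_BASE <= addr <= XILINX_TOP


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False
```

Both 0x3180000 (0xC6 shifted left by 18) and 0x31C0000 (0xC7 shifted left by 18) pass this check, so each can be loaded with a single instruction, whereas most other addresses in the range cannot.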

Since an asynchronous memory access is used, it will take at least 2 cycles of the ARM REF8M 8 MHz clock to perform the access. There are two designs of the ARM interface section, "newarm.hdl" and "slowarm.hdl". The slow version goes to extreme lengths to ensure synchronisation between the ARM REF8M clock and the fabric clock; the faster version makes the assumption that [...] and therefore needs less synchronisation. The slow interface takes a minimum of three REF8M cycles, and the faster a minimum of two.
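As a worked figure, the minimum cycle counts translate into access times as follows. This is nominal arithmetic from the 8 MHz clock rate alone; the names are hypothetical and any extra wait for arbitration or synchronisation is ignored.

```python
REF8M_HZ = 8_000_000   # ARM REF8M clock rate


def min_access_ns(cycles):
    """Duration in nanoseconds of the given number of REF8M cycles."""
    return cycles * 1_000_000_000 // REF8M_HZ


FAST_MIN_NS = min_access_ns(2)   # faster interface: two cycles minimum
SLOW_MIN_NS = min_access_ns(3)   # slow interface: three cycles minimum
```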






Mark Hayter and Richard Black