Software

Fairisle Port Controllers

Types of bus error: A ``Barfbarf'' is a particular type of bus error generated voluntarily on a port controller when the Xilinx receive buffer overflows. It indicates that you are flooding the port controller with more data than it can handle. All other bus errors are more serious and should be investigated.

Kernel panics with message: ``Kernel contains wrong hardware! Xin detected'' : The Xilinx programming bits are linked into the kernel. This message is produced if the hardware and software are inconsistent. For FPC2, n will be 2 or 3. There is currently no test for Xi6 bits.

Kernel hangs after printing: ``If it hangs here check the fabric clock'' : This message is printed just before the first access through the Xilinx to the SRAM. The most likely reason for a hang at this point is a missing or bad crystal in the fabric or clock board.

Kernel hangs after programming the Xilinx chip: As above, but for kernels built before the message was put in (e.g. many boot ROMs).

When a port controller halts, it sits in a loop flashing its lowest red LED at about 1 Hz and polling the serial line input register. Sending Control-C to a machine in this state will cause it to re-enter its boot ROMs (except with old kernels, e.g. some boot ROMs).

All Machines

ARP host %I cannot be active on pod %d. : Machine %I has been seen active on one pod, but this Wanda machine has been instructed (via its routeing information) that it should be on a different pod. The active request has been ignored by the kernel. There are some cases in which this scenario can legally occur (typically when there are multiple routeing paths between networks, with different routers load balancing). The check was put in because people were booting machines with random routeing tables and then grumbling or getting confused when weird things happened (i.e. multiple active gateways leading to bad context requests).

GATEWAY-controls : This gives a comma-separated list of entries that this gateway will search to decide whether to gateway a particular connection request. Each entry is of the form (Bool=net-mask->net-mask). Bool is one of ``T'', ``F'' or ``S''. ``S'' implies true both in the forward direction and in the opposite direction. An entry should be pronounced as ``It is true / false / symmetrically-true that net under mask may connect to net under mask''.

Note that one shouldn't be generating Gateway controls by hand anyway - use the sc tool.
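As an aid to reading an entry (not to writing one by hand), the ``net under mask'' test can be sketched roughly as below. This is only an illustration: the struct, the enum names and gw_check() are hypothetical, addresses are assumed to be 32-bit values, and the text above does not say how the gateway combines several entries, so the sketch evaluates a single entry only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one (Bool=net-mask->net-mask) entry.
 * 32-bit addresses are an assumption made for this sketch. */
enum gw_kind { GW_TRUE = 'T', GW_FALSE = 'F', GW_SYMMETRIC = 'S' };
enum gw_verdict { GW_NO_MATCH, GW_ALLOW, GW_DENY };

struct gw_control {
    enum gw_kind kind;
    uint32_t src_net, src_mask;   /* left-hand net under mask  */
    uint32_t dst_net, dst_mask;   /* right-hand net under mask */
};

/* True if addr lies in net once both are reduced by mask. */
static int under_mask(uint32_t addr, uint32_t net, uint32_t mask)
{
    return (addr & mask) == (net & mask);
}

/* Evaluate one entry against a connection request from -> to. */
enum gw_verdict gw_check(const struct gw_control *c, uint32_t from, uint32_t to)
{
    int forward, reverse;

    assert(c != NULL);
    forward = under_mask(from, c->src_net, c->src_mask) &&
              under_mask(to, c->dst_net, c->dst_mask);
    reverse = under_mask(to, c->src_net, c->src_mask) &&
              under_mask(from, c->dst_net, c->dst_mask);

    switch (c->kind) {
    case GW_TRUE:      return forward ? GW_ALLOW : GW_NO_MATCH;
    case GW_FALSE:     return forward ? GW_DENY : GW_NO_MATCH;
    case GW_SYMMETRIC: return (forward || reverse) ? GW_ALLOW : GW_NO_MATCH;
    }
    return GW_NO_MATCH;
}
```

With an ``S'' entry the same request matches whichever end originates it; with ``T'' or ``F'' only the forward direction is considered.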

GATEWAY-redirectors : This provides a hack for moving a service from one machine to another. It also allows deliberate loop introduction because the target address will be transformed. It is a comma-separated list of entries of the form "%I-%I->%I-%I". The values are host and port, not host and net, so an explicit entry is needed for each service.
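To make the shape of one redirector entry concrete, here is a rough parsing sketch. It assumes that %I renders an address as four dotted octets (as the name server port 0.0.0.223 is written elsewhere in this chapter) and that each side of the arrow is host first, then port. The struct and parse_redirector() are hypothetical names, and the caller is assumed to have split the list on commas already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded form of one "%I-%I->%I-%I" entry. */
struct redirector {
    uint32_t from_host, from_port;
    uint32_t to_host, to_port;
};

/* Pack four octets into one 32-bit value; -1 if any octet exceeds 255. */
static int pack(const unsigned *o, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (o[i] > 255)
            return -1;
        v = (v << 8) | o[i];
    }
    *out = v;
    return 0;
}

/* Parse a single entry; returns 0 on success, -1 on malformed input. */
int parse_redirector(const char *s, struct redirector *r)
{
    unsigned o[16];
    int used = 0;

    assert(s != NULL && r != NULL);
    if (sscanf(s, "%u.%u.%u.%u-%u.%u.%u.%u->%u.%u.%u.%u-%u.%u.%u.%u%n",
               &o[0], &o[1], &o[2], &o[3], &o[4], &o[5], &o[6], &o[7],
               &o[8], &o[9], &o[10], &o[11], &o[12], &o[13], &o[14], &o[15],
               &used) != 16)
        return -1;
    if (s[used] != '\0')          /* reject trailing characters */
        return -1;
    if (pack(o, &r->from_host) || pack(o + 4, &r->from_port) ||
        pack(o + 8, &r->to_host) || pack(o + 12, &r->to_port))
        return -1;
    return 0;
}
```

Because each entry names one host and one port, moving several services from the same machine needs one such entry per service.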

Hostinfo : When machines boot they sometimes fail to get their hostinfo file. This is usually apparent because they don't know who they are. It is usually due to one or more of the following:

The hostinfo file is not executable by UID wanda.
The hostinfo file contains a bug, such as a reference to a non-existent file.
The kernel has not been stripped.
The kernel was built without user processes configured.
The current directory for the bootserver is not $W/boot/hostinfo.

The MSRPC library (not MSRPC2)

Client side of MSRPC: If a client call returns MSRPC_CANT_ENCODE_ARGS, check whether the MSDR format of the arguments needs a buffer in excess of 2 KB. If it does, you have no option but to reduce the size of the arguments you are encoding. The same applies at the server side, but there the error may be MSRPC_CANT_ENCODE_RESULTS.

The MSRPC library looks for a buffer of size WandaIPCMaxMsg(PROTOCOL_MSNL) by default. This, of course, may be less than 2 KB if you are working on a kernel which has no large IO bufs. If the MSRPC library can't get hold of a buffer of the largest size then it will return the error MSRPCSTAT_CANT_SEND. You then need to call clnt_control() to set the buffer size to the size of the largest IO buf which the library can expect to get its hands on.

If you get things like MSRPCSTAT_NO_MEMORY then you have run out of memory, possibly because you are neglecting to call clnt_freeres() after finishing with the results returned to you from a server.

If the library crashes, your name is probably Paul Barham.

Name Server

Symptom: Wanda: gethostbyname() or WandaGetHostName() or WandaGetHostByName() fails. Unix: gethostname() or getmsnlhostname() fails.

Problem: On Unix, if you're using gethostname() to obtain the MSNL address of a host, then you're a bozo. Use getmsnlhostbyname() instead. Make sure you link with the MSRPC library (see the chapter on MSRPC).

If you are attempting to get the MSNL address of a host using the appropriate routine, and are getting the reply ``no such host'', then the following could be the reason: When you do a gethostbyname() (Wanda) / getmsnlhostbyname() (Unix) the libraries attempt to obtain information from the MSNL name server. The name server is normally up and running on a service machine in the lab. See the chapter on the MSNL Name Server. If the name server has crashed, then report the error to someone responsible and it will be restarted.

You can see if it has crashed by using the program hostlookup <name> where <name> is the symbolic name of the host which you are trying to look up. Hostlookup will return the appropriate results if an entry is in the name server. It will return ``no such host'' if there is no host entry for the name, and it will report an error if it cannot find the name server.

Both the libraries (Wanda & Unix) and the hostlookup program use the environment variable MSNL_NAME_SERVERS to obtain the address of the name server to use. This environment variable MUST be present to permit MSNL name resolution. Any number of alternative name servers can be listed in this variable. The format is:

Where h is the MSNL address of host h, written as octets in internet dotted notation. The port 0.0.0.223 is the MSNL name server port.

Information returned from the name server is returned to the calling process by means of a static area of memory. Results of such calls should thus be copied to ensure they remain valid. On Wanda there is no concurrency protection for this static memory area. Session bindings to the name server are cached.
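The copying advice can be illustrated with a small self-contained sketch. Nothing here is the real library interface: struct host_entry, fake_lookup() and lookup_copy() are hypothetical stand-ins that only mimic a call returning a pointer into one static area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result record; the real library's layout is not shown here. */
struct host_entry {
    char name[32];
    unsigned char addr[4];
};

/* Stand-in for a lookup call: every call overwrites the same static area. */
const struct host_entry *fake_lookup(const char *name, unsigned char last_octet)
{
    static struct host_entry result;

    assert(name != NULL);
    snprintf(result.name, sizeof result.name, "%s", name);
    result.addr[0] = 10;
    result.addr[1] = 0;
    result.addr[2] = 0;
    result.addr[3] = last_octet;
    return &result;
}

/* Take a private copy immediately, before anything can call the lookup again. */
int lookup_copy(const char *name, unsigned char last_octet, struct host_entry *out)
{
    const struct host_entry *e;

    assert(out != NULL);
    e = fake_lookup(name, last_octet);
    if (e == NULL)
        return -1;
    *out = *e;
    return 0;
}
```

A plain structure assignment is enough here only because the hypothetical record contains no pointers; if the real result structure holds pointers into the static area, whatever they point to must be copied as well. On Wanda, where the static area has no concurrency protection, the copy should be taken before any other thread can issue a lookup.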
