Fairisle Port Controllers
Types of bus error: A ``Barfbarf'' is a particular type of bus error generated voluntarily on a port controller when the Xilinx receive buffer overflows. It indicates that you are flooding the port controller with more data than it can handle. All other bus errors are more serious and should be investigated.
Kernel panics with message:
Kernel contains wrong hardware! Xi<n> detected
The Xilinx programming bits are linked into the kernel. This message
is produced if the hardware and software are inconsistent. For FPC2
n will be 2 or 3. There is currently no test for Xi6 bits.
Kernel hangs after printing:
If it hangs here check the fabric clock
This message is printed just before the first access through the Xilinx
to the SRAM. The most likely reason for this failing is a missing or bad
crystal in the fabric or clock board.
Kernel hangs after programming the Xilinx chip: As above, but for kernels built before the message was added (e.g. many boot ROMs).
When a port controller halts, it sits in a loop flashing its lowest red LED at about 1 Hz and polling the serial line input register. Sending Control-C to a machine in this state will cause it to re-enter its boot ROMs (except on old kernels, e.g. those in some boot ROMs).
All Machines
ARP host %I cannot be active on pod %d. :
Machine %I has been seen active on one pod, although this Wanda machine's
routeing information says it should be on a different pod. The kernel has
ignored the active request. There are some cases in which this scenario can
legally occur (typically when there are multiple routeing paths between
networks, with different routers load balancing). The check was put in because
people were booting machines with random routeing tables and then grumbling or
getting confused when weird things happened (i.e. multiple active gateways
leading to bad context requests).
GATEWAY-controls : This gives a comma-separated list of entries that this gateway searches to decide whether to gateway a particular connection request. Each entry is of the form (Bool=net-mask->net-mask). Bool is ``T'', ``F'' or ``S''. ``S'' implies true in both the forward and the opposite direction. An entry should be read as ``It is true / false / symmetrically-true that net under mask may connect to net under mask''.
Note that one shouldn't be generating Gateway controls by hand anyway - use the sc tool.
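For reading an existing GATEWAY-controls value (rather than writing one by hand), the entry form described above can be sketched as a small parser. This is only an illustration: the names Rule and parse_gateway_controls are hypothetical, and it assumes that net and mask are written without a ``-'' inside them, so that the first ``-'' on each side separates net from mask. The real notation for nets and masks is not given here.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """One directional rule: may src (net under mask) connect to dst?"""
    allow: bool
    src_net: str
    src_mask: str
    dst_net: str
    dst_mask: str


def _split_net_mask(text: str) -> Tuple[str, str]:
    # Assumption: the first "-" separates the net from its mask.
    net, sep, mask = text.strip().partition("-")
    if not sep or not net or not mask:
        raise ValueError(f"expected net-mask, got {text!r}")
    return net, mask


def parse_gateway_controls(value: str) -> List[Rule]:
    """Parse a comma-separated list of (Bool=net-mask->net-mask) entries."""
    rules: List[Rule] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if not (entry.startswith("(") and entry.endswith(")")):
            raise ValueError(f"entry not in parentheses: {entry!r}")
        flag, sep, body = entry[1:-1].partition("=")
        if not sep or flag not in ("T", "F", "S"):
            raise ValueError(f"Bool must be T, F or S: {entry!r}")
        left, arrow, right = body.partition("->")
        if not arrow:
            raise ValueError(f"missing '->' in {entry!r}")
        src = _split_net_mask(left)
        dst = _split_net_mask(right)
        # "T" and "S" are both true in the forward direction.
        rules.append(Rule(flag != "F", *src, *dst))
        if flag == "S":
            # "S" is also true in the opposite direction.
            rules.append(Rule(True, *dst, *src))
    return rules
```

An ``S'' entry expands into two rules, one per direction, which matches the ``symmetrically-true'' reading given above.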
GATEWAY-redirectors :
This provides a hack for moving a service from one machine to another.
It also allows deliberate loop introduction because the target address will be
transformed. It is a comma-separated list of the form "%I-%I->%I-%I".
The values are host and port, not host and net, so explicit entries are
needed per service.
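The per-service nature of redirector entries can be illustrated with a short sketch. The function names are hypothetical, the addresses in the usage note are made up, and the code assumes that each %I prints as a dotted value containing no ``-''. It shows a single table lookup only and does not model how the gateway itself applies the list.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, str]  # (host, port), both kept as printed strings


def _split_pair(text: str) -> Endpoint:
    # Assumption: "-" separates host from port within each half.
    host, sep, port = text.strip().partition("-")
    if not sep or not host or not port:
        raise ValueError(f"expected host-port, got {text!r}")
    return host, port


def parse_redirectors(value: str) -> Dict[Endpoint, Endpoint]:
    """Parse a comma-separated list of host-port->host-port entries."""
    table: Dict[Endpoint, Endpoint] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        left, arrow, right = entry.partition("->")
        if not arrow:
            raise ValueError(f"missing '->' in {entry!r}")
        table[_split_pair(left)] = _split_pair(right)
    return table


def redirect(table: Dict[Endpoint, Endpoint], host: str, port: str) -> Endpoint:
    """Return the transformed target, or the original if no entry matches."""
    return table.get((host, port), (host, port))
```

With a table built from "1.2.3.4-0.0.0.223->1.2.3.5-0.0.0.223", a request for port 0.0.0.223 on host 1.2.3.4 is redirected, but a request for any other port on the same host is left alone, which is why each service needs its own entry.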
Hostinfo : When machines boot they sometimes fail to get their hostinfo file. This is usually apparent because they don't know who they are. It is usually due to one or more of the following:
The MSRPC library (not MSRPC2)
Client side of MSRPC:
If a client call returns MSRPC_CANT_ENCODE_ARGS, check that you haven't tried
to encode arguments which require a buffer in excess of 2 KB to store the MSDR
format of the arguments. If you have, then you have no option but to reduce the
size of the arguments you are encoding. The same applies at the server side,
but the error may be MSRPC_CANT_ENCODE_RESULTS.
The MSRPC library looks for a buffer of size WandaIPCMaxMsg(PROTOCOL_MSNL) by
default. This, of course, may be less than 2 KB if you are working on a kernel
which has no large IO bufs. If the MSRPC library can't get hold of a buffer of
the largest size then it will return the error MSRPCSTAT_CANT_SEND. You then
need to do a clnt_control() to set the buffer size to the size of the largest
IO buf which the library can expect to get its hands on.
If you get things like MSRPCSTAT_NO_MEMORY then you have run out of memory,
possibly because you are neglecting to call clnt_freeres() after finishing
with the results returned to you from a server.
If the library crashes, your name is probably Paul Barham.
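The MSRPC error cases described above can be collected into a lookup table for quick reference. This is a sketch only: the error names and remedies are taken from the text, while the table and the helper advise() are hypothetical and not part of the MSRPC library.

```python
# Remedies transcribed from the troubleshooting text; nothing here calls MSRPC.
MSRPC_ADVICE = {
    "MSRPC_CANT_ENCODE_ARGS":
        "Arguments need more than 2 KB in MSDR format; reduce their size.",
    "MSRPC_CANT_ENCODE_RESULTS":
        "Server-side results need more than 2 KB in MSDR format; reduce their size.",
    "MSRPCSTAT_CANT_SEND":
        "No buffer of the largest size was available; use clnt_control() to set "
        "the buffer size to the largest IO buf the library can expect to get.",
    "MSRPCSTAT_NO_MEMORY":
        "Out of memory; check that clnt_freeres() is called after finishing "
        "with results returned from a server.",
}


def advise(error_name: str) -> str:
    """Return the remedy for a known error name, or a generic fallback."""
    return MSRPC_ADVICE.get(error_name, "Not covered here; investigate further.")
```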
Name Server
Symptom:
Wanda: gethostbyname() or WandaGetHostName() or WandaGetHostByName() fails.
Unix: gethostname() or getmsnlhostname() fails.
Problem:
On Unix, if you're using gethostname() to obtain the MSNL address of a
host, then you're a bozo. Use getmsnlhostbyname() instead.
Make sure you link with the MSRPC library (see the chapter on MSRPC).
If you are attempting to get the MSNL address of a host using the appropriate
routine, and are getting the reply ``no such host'', then the following could
be the reason:
When you do a gethostbyname() (Wanda) / getmsnlhostbyname() (Unix) the
libraries attempt to obtain information from the MSNL name server.
The name server is normally up and running on a service machine in the lab.
See the chapter on the MSNL Name Server. If the name server has crashed, then
report the error to someone responsible and it will be restarted. You can see
if it has crashed by using the program hostlookup <name>, where <name> is the
symbolic name of the host which you are trying to look up.
Hostlookup will return the appropriate results if an entry is in the name
server. It will return ``no such host'' if there is no host entry for the name,
and it will report an error if it cannot find the name server.
To find the name server to use, both the libraries (Wanda & Unix) and the
hostlookup program use the environment variable MSNL_NAME_SERVERS to
obtain the address of the name server. This environment variable MUST be
present to permit MSNL name resolution.
Any number of alternative name servers can be listed
in this variable. The format is:
Where h is the octet in internet dotted notation of the MSNL address of
host h. The port 0.0.0.223 is the MSNL name server port.
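Since name resolution cannot work without MSNL_NAME_SERVERS, a quick check that the variable is set can rule out one cause of lookup failures. The sketch below tests only for presence; it does not interpret the value, and the helper name is hypothetical.

```python
import os
from typing import Mapping, Optional


def name_servers_configured(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if MSNL_NAME_SERVERS is set to a non-blank value."""
    env = os.environ if environ is None else environ
    return bool(env.get("MSNL_NAME_SERVERS", "").strip())


if __name__ == "__main__":
    if name_servers_configured():
        print("MSNL_NAME_SERVERS is set")
    else:
        print("MSNL_NAME_SERVERS is missing: MSNL name resolution will fail")
```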
Information returned from the name server is returned to the calling process by means of a static area of memory. Results of such calls should thus be copied to ensure they remain valid. On Wanda there is no concurrency protection for this static memory area. Session bindings to the name server are cached.