This section on network communication in Condor discusses which network ports are used, how Condor behaves on machines with multiple network interfaces and IP addresses, and how to facilitate functionality in a pool that spans firewalls and private networks.
The security section of the manual contains information relevant to network communication that will not be duplicated here; please see section 3.6 as well.
Firewalls, private networks, and network address translation (NAT) pose special problems for Condor. There are currently two main mechanisms for dealing with firewalls within Condor: restricting Condor to use a specific range of ports which are then opened on the firewall (section 3.7.1), and Generic Connection Brokering, or GCB (section 3.7.3).
Each method has its own advantages and disadvantages, as described below.
Every Condor daemon listens on a network port for incoming commands. Most daemons listen on a dynamically assigned port. In order to send a message, Condor daemons and tools locate the correct port to use by querying the condor_collector and extracting the port number from the daemon's ClassAd; one of the attributes included in every daemon's ClassAd is the full IP address and port number upon which the daemon is listening.
To access the condor_collector itself, all Condor daemons and tools must know the port number where the condor_collector is listening. The condor_collector is the only daemon with a well-known, fixed port. By default, Condor uses port 9618 for the condor_collector daemon. However, this port number can be changed (see below).
As an optimization for daemons and tools communicating with another daemon that is running on the same host, each Condor daemon can be configured to write its IP address and port number into a well-known file. The file names are controlled using the <SUBSYS>_ADDRESS_FILE configuration variables, as described in section 3.3.5.
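For example, a hypothetical local configuration might direct the condor_schedd to write its address file into the log directory (the file name here is illustrative):

SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address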
NOTE: In the 6.6 stable series, and in Condor versions earlier than 6.7.5, the condor_negotiator also listened on a fixed, well-known port (the default was 9614). However, beginning with version 6.7.5, the condor_negotiator behaves like all other Condor daemons, publishing its own ClassAd (which includes its dynamically assigned port) to the condor_collector. All Condor tools and daemons that need to communicate with the condor_negotiator either use the NEGOTIATOR_ADDRESS_FILE or query the condor_collector for the condor_negotiator's ClassAd.
Sites that configure any checkpoint servers will introduce other fixed ports into their network. Each condor_ckpt_server listens on 4 fixed ports: 5651, 5652, 5653, and 5654. There is currently no way to configure alternative values for any of these ports.
To change the port, append the desired port number to the COLLECTOR_HOST definition. For example, where the configuration had been

CONDOR_HOST = machX.cs.wisc.edu
COLLECTOR_HOST = $(CONDOR_HOST)

to use port 9650 instead, the configuration might be

CONDOR_HOST = machX.cs.wisc.edu
COLLECTOR_HOST = $(CONDOR_HOST):9650
If a non-standard port is defined, the same value of COLLECTOR_HOST (including the port) must be used for all machines in the Condor pool. Therefore, this setting should be modified in the global configuration file (condor_config), or, if a single configuration file is not being shared, the value must be duplicated across all configuration files in the pool.
When querying the condor_collector of a remote pool that is running on a non-standard port, any Condor tool that accepts the -pool argument can optionally be given a port number. For example:
% condor_status -pool foo.bar.org:1234
On single-machine pools, it is permitted to configure the condor_collector daemon to use a dynamically assigned port, as given out by the operating system. This prevents port conflicts with other services on the same machine. However, a dynamically assigned port is only to be used on single-machine Condor pools, and only if the COLLECTOR_ADDRESS_FILE configuration variable has also been defined. This mechanism allows all of the Condor daemons and tools running on the same machine to find the port upon which the condor_collector daemon is listening, even when this port is not defined in the configuration file and is not known in advance.
To enable the condor_collector daemon to use a dynamically assigned port, the port number is set to 0 in the COLLECTOR_HOST variable. The COLLECTOR_ADDRESS_FILE configuration variable must also be defined, as it provides a known file where the IP address and port information will be stored. All Condor clients know to look at the information stored in this file. For example:
COLLECTOR_HOST = $(CONDOR_HOST):0
COLLECTOR_ADDRESS_FILE = $(LOG)/.collector_address
NOTE: Using a port of 0 for the condor_collector and specifying a COLLECTOR_ADDRESS_FILE only works in Condor version 6.6.8 or later in the 6.6 stable series, and in version 6.7.4 or later in the 6.7 development series. Do not attempt to do this with older versions of Condor.
The configuration variable COLLECTOR_ADDRESS_FILE is defined in section 3.3.5, and COLLECTOR_HOST is defined in section 3.3.3.
If a Condor pool is completely behind a firewall, then no special consideration or port usage is needed. However, if there is a firewall between the machines within a Condor pool, then configuration variables may be set to force the usage of specific ports, and to utilize a specific range of ports.
By default, Condor uses port 9618 for the condor_collector daemon, and dynamic (apparently random) ports for everything else. See section 3.7.1 if a dynamically assigned port is desired for the condor_collector daemon.
The configuration variables HIGHPORT and LOWPORT restrict the dynamically assigned ports Condor uses to a specified range, which may be useful when some machines are behind a firewall. These configuration variables are fully defined in section 3.3.3. All of these ports must be greater than 0 and less than 65,536. Note that in Condor version 6.6.8, both HIGHPORT and LOWPORT must be at least 1024. In general, use ports greater than 1024 in order to avoid port conflicts with standard services on the machine; another reason is that daemons and tools are often not run as root, and only root may listen on a port lower than 1024. Also, the range must include enough unused ports, or Condor cannot work.
The range of ports assigned may be restricted separately for incoming (listening) and outgoing (connecting) ports with the configuration variables IN_HIGHPORT, IN_LOWPORT, OUT_HIGHPORT, and OUT_LOWPORT. See section 3.3.6 for complete definitions of these configuration variables. A range of ports lower than 1024 is appropriate for incoming ports on daemons running as root, but not for outgoing ports. The use of ports below 1024 (versus above 1024) has security implications; therefore, it is inappropriate to assign a range that crosses the 1024 boundary.
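As an illustrative sketch (the exact range is site-specific), a pool whose firewall has opened ports 9600 through 9700 might restrict Condor with:

LOWPORT = 9600
HIGHPORT = 9700

Or, to restrict only the incoming (listening) ports while leaving outgoing connections unconstrained:

IN_LOWPORT = 9600
IN_HIGHPORT = 9700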
NOTE: Setting HIGHPORT and LOWPORT will not automatically force the condor_collector to bind to a port within the range. The only way to control which port the condor_collector uses is by setting COLLECTOR_HOST (as described above).
The total number of ports needed depends on the size of the pool, the usage of the machines within the pool (which machines run which daemons), and the number of jobs that may execute at one time. Here we discuss how many ports are used by each participant in the system.
The central manager of the pool needs 5 + NEGOTIATOR_SOCKET_CACHE_SIZE ports for daemon communication, where NEGOTIATOR_SOCKET_CACHE_SIZE is specified in the configuration or defaults to the value 16.
Each execute machine (those machines running a condor_startd daemon) requires 5 + (5 * number of virtual machines advertised by that machine) ports. By default, the number of virtual machines advertised will equal the number of physical CPUs in that machine.
Submit machines (those machines running a condor_schedd daemon) require 5 + (5 * MAX_JOBS_RUNNING) ports. The configuration variable MAX_JOBS_RUNNING limits (on a per-machine basis, if desired) the maximum number of jobs. Without this configuration macro, the maximum number of jobs that could be simultaneously executing at one time is a function of the number of reachable execute machines.
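To sketch the arithmetic with hypothetical numbers: a submit machine configured with

MAX_JOBS_RUNNING = 75

would need 5 + (5 * 75) = 380 ports, so a range such as

LOWPORT = 20000
HIGHPORT = 20400

(401 ports) would be sufficient for that machine. The variable names are real; the numbers are illustrative only.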
Also be aware that HIGHPORT and LOWPORT only impact dynamic port selection used by the Condor system; they do not impact port selection used by jobs submitted to Condor. Thus, jobs submitted to Condor that create network connections may not work in a port-restricted environment. For this reason, specifying HIGHPORT and LOWPORT will not produce the expected results if a user submits jobs to be executed under the PVM or MPI job universes.
Where desired, a local configuration for machines not behind a firewall can override the usage of HIGHPORT and LOWPORT, such that the ports used for these machines are not restricted. This can be accomplished by adding the following to the local configuration file of those machines not behind a firewall:
HIGHPORT = UNDEFINED
LOWPORT = UNDEFINED
If the maximum number of ports allocated using HIGHPORT and LOWPORT is too few, socket binding errors of the form

failed to bind any port within <$LOWPORT> - <$HIGHPORT>

are likely to appear repeatedly in log files.
Beginning with Condor version 6.1.5, Condor can run on machines with multiple network interfaces. Starting with Condor version 6.7.13, new functionality allows even better support for multi-homed machines, using the configuration variable BIND_ALL_INTERFACES. A multi-homed machine is one that has more than one NIC (Network Interface Card). Future improvements to this functionality will remove the need for any special configuration in the common case; for now, care must still be given to machines with multiple NICs, even when using this new configuration variable.
Starting with Condor version 6.7.13, machines can be configured such that whenever Condor daemons or tools call bind(), the daemons or tools use all network interfaces on the machine. This means that outbound connections will always use the appropriate network interface to connect to a remote host, instead of being forced to use an interface that might not have a route to the given destination. Furthermore, sockets upon which a daemon listens for incoming connections will be bound to all network interfaces on the machine. This means that so long as remote clients know the right port, they can use any IP address on the machine and still contact a given Condor daemon.
To enable this functionality, the boolean configuration variable BIND_ALL_INTERFACES is defined and set to True:
BIND_ALL_INTERFACES = TRUE
This functionality has limitations, and therefore has a default value of False. Here are descriptions of the limitations.
Currently, Condor daemons can only advertise a single IP address in the ClassAd they send to their condor_collector. Condor tools and other daemons only know how to look up a single IP address, which they then use when connecting to the daemon. So, even if a daemon is listening on two or more interfaces, each with a separate IP address, it must choose which IP address to publicly advertise so that other daemons and tools can locate it.
By default, Condor advertises the IP address of the network interface used to contact the collector, since this is the most likely to be accessible to other processes that query the same collector. The NETWORK_INTERFACE setting can still be used to specify the IP address Condor should advertise, even if BIND_ALL_INTERFACES is set to True. Therefore, some of the considerations described below regarding what interface should be used in various situations still apply when deciding what interface is to be advertised.
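For example, a multi-homed machine could listen on all interfaces while advertising a specific one (the address shown is a placeholder):

BIND_ALL_INTERFACES = TRUE
NETWORK_INTERFACE = 123.123.123.123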
Sites that make heavy use of private networks and multi-homed machines should consider whether Generic Connection Brokering, GCB, is right for them. More information about GCB and Condor can be found in section 3.7.3.
Often users of Condor wish to set up ``compute farms'' where there is one machine with two network interface cards (one for the public Internet, and one for the private network). In most cases it is convenient to set up the ``head'' node as the central manager, and the instructions to do so follow.
Setting up the central manager on a machine with more than one NIC can be confusing, because a few external factors can complicate the process. The most common mistakes are that one of the interfaces is not active, or that the host/domain names associated with the interfaces are incorrectly configured.
Given that the interfaces are up and functioning, and that they have correct host/domain names associated with them, here is how to configure Condor:
In this example, farm-server.farm.org maps to the private interface.
On the central manager's global (to the cluster) configuration file:
CONDOR_HOST = farm-server.farm.org
On your central manager's local configuration file:
NETWORK_INTERFACE = ip address of farm-server.farm.org
NEGOTIATOR = $(SBIN)/condor_negotiator
COLLECTOR = $(SBIN)/condor_collector
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
If the central manager and farm machines all run Windows NT, then only the vanilla universe is available, and this setup is complete. Under UNIX, however, standard universe jobs should now be able to run in the pool, but if the UID_DOMAIN macro was not configured to be homogeneous across the farm machines, the standard universe jobs will run as user nobody on the farm machines.
In order to get vanilla jobs working, and file server load balancing for standard universe jobs (under Unix), some more work is needed, both in the cluster itself and in Condor. First, a file server (which could also be the central manager) is needed to serve files to all of the farm machines. This could be NFS or AFS; it does not really matter to Condor. The mount point of the directories the users will use must be the same across all of the farm machines. Next, configure UID_DOMAIN and FILESYSTEM_DOMAIN to be homogeneous across the farm machines and the central manager. Finally, inform Condor that an NFS or AFS file system exists by placing the following in the global (to the farm) configuration file:
# If you have NFS
USE_NFS = True
# If you have AFS
HAS_AFS = True
USE_AFS = True
# If you want both NFS and AFS, then enable both sets above
If the cluster is set up such that a machine name may lack a domain name (for example, there is a machine name but no fully qualified domain name in /etc/hosts), configure DEFAULT_DOMAIN_NAME to be the domain to append to the end of the host name.
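Continuing the farm-server.farm.org example above, this might be:

DEFAULT_DOMAIN_NAME = farm.org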
If a client machine has two or more NICs, then there might be a specific network interface with which the client machine should communicate with the rest of the Condor pool. In this case, place the following in the local configuration file for that machine:
NETWORK_INTERFACE = ip address of interface desired
If the checkpoint server is on a machine with multiple interfaces, it will only work if the different interfaces have different host names associated with them, and CKPT_SERVER_HOST is set (in the global configuration file for the pool) to the host name that corresponds to the desired IP address. NETWORK_INTERFACE must still be specified in the local configuration file for the checkpoint server machine.
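As a sketch with hypothetical host names, where ckpt-public.farm.org is a name that resolves to the desired public interface of the checkpoint server:

# In the global configuration file for the pool
CKPT_SERVER_HOST = ckpt-public.farm.org

# In the local configuration file on the checkpoint server machine
NETWORK_INTERFACE = ip address of ckpt-public.farm.org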
Generic Connection Brokering, or GCB, is a system for managing network connections across private network and firewall boundaries. Starting with Condor version 6.7.13, Condor's Linux releases are linked with GCB, and can use GCB functionality to run jobs (either directly or via flocking) on pools that span public and private networks.
While GCB provides numerous advantages over restricting Condor to use a range of ports which are then opened on the firewall (see section 3.7.1), GCB is also a very complicated system, with major implications for Condor's networking and security functionality. Therefore, sites must carefully weigh the advantages and disadvantages of attempting to configure and use GCB before making a decision.
Advantages:
Disadvantages:
Given the increased complexity, use of GCB requires a careful read of this entire manual section, followed by a thorough installation.
Details of GCB and how it works can be found at the GCB homepage:
http://www.cs.wisc.edu/condor/gcb
This information is useful for understanding the technical details of how GCB works, and the various parts of the system. While some of the information is partly out of date (especially the discussion of how to configure GCB), most of the sections are accurate and worth reading. Ignore the section on ``GCBnize'', which describes how to convert a given application to use GCB; the Linux ports of all Condor daemons and tools have already been converted.
The rest of this section gives the details for configuring a Condor pool to use GCB. It is divided into the following topics:
At the heart of GCB is a logical entity known as a broker or inagent. In reality, this entity is made up of daemon processes running on the same machine: the gcb_broker and a set of gcb_relay_server processes, each spawned by the gcb_broker.
Every private network using GCB must have at least one broker to arrange connections. The broker must be installed on a machine that nodes in both the public and the private (firewalled) networks can directly talk to. The broker need not be able to initiate connections to the private nodes; however, when it can, GCB takes advantage of this and performance improves. The broker is generally installed on a machine with multiple network interfaces (on the network boundary), or just outside of a network that allows outbound connections. If the private network contains many hosts, sites can configure multiple GCB brokers and partition the private nodes so that different subsets of the nodes use different brokers.
For a more thorough explanation of what a GCB broker is, check out: http://www.cs.wisc.edu/~sschang/firewall/gcb/mechanism.htm
A GCB broker should generally be installed on a dedicated machine. These are machines that are not running other Condor daemons or services. If running any other Condor service (for example, the central manager of the pool) on the same machine as the GCB broker, all other machines attempting to use this Condor service (for example, to connect to the condor_collector or condor_negotiator) will incur additional connection costs and latency. It is possible that future versions of GCB and Condor will be able to overcome these limitations, but for now, we recommend that a broker is run on a dedicated machine with no other Condor daemons (except perhaps a single condor_master used to spawn the gcb_broker daemon, as described below).
In principle, a GCB broker is a network element that functions almost like a router: it allows certain connections through the firewall by redirecting or forwarding them. In general, it is not a good idea to run many other services on network elements, especially not services like Condor, which can spawn arbitrary jobs. Furthermore, the GCB broker relies on listening on many network ports; if other applications run on the same host, the broker may not have enough network ports available to forward all the connections required of it. Also, all nodes inside a private network rely on the GCB broker for all incoming communication. For performance reasons, avoid forcing the GCB broker to contend with other processes for system resources, so that it is always available to handle communication requests. Nothing in GCB or Condor requires the broker to run on a separate machine, but that is the recommended configuration.
The gcb_broker daemon listens on two hard-coded, fixed ports (65432 and 65430). A future version of Condor and GCB will remove this limitation. However, for now, to run a gcb_broker on a given host, ensure that ports 65432 and 65430 are not already in use.
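One way to check is to look for existing listeners on those ports before starting the broker (netstat output and flags vary by platform; this is a sketch):

% netstat -an | grep -E ':(65432|65430) '

No output suggests the ports are free.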
If root access is available on the machine where a GCB broker is planned, one good option is to have init configured to spawn (and re-spawn) the gcb_broker binary (located in the <release_dir>/libexec directory). This way, the gcb_broker will be automatically restarted on reboots, or in the event that the broker itself crashes or is killed. Without root access, use a condor_master to manage the gcb_broker binary.
Since the gcb_broker and gcb_relay_server are not Condor daemons, they do not read the Condor configuration files. Therefore, they must be configured by other means, namely through environment variables and command-line arguments.
There is one required command-line argument to the gcb_broker: the public IP address this broker will use to represent itself and any private network nodes configured to use this broker. It is given as -i xxx.xxx.xxx.xxx on the command line when the gcb_broker is executed. If the broker is set up outside the private network, the machine will likely have only one IP address, which is clearly the one to use. However, if the broker runs on a machine on the network boundary (a multi-homed machine with interfaces into both the private and public networks), be sure to use the IP address of the interface on the public network.
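For example, using the placeholder public address that appears elsewhere in this section:

% gcb_broker -i 123.123.123.123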
Additionally, environment variables control how the gcb_broker (and the gcb_relay_server processes it spawns) behave. Some of these settings can also be given as command-line arguments to the gcb_broker. All have reasonable defaults if not defined.
GCB_ACTIVE_TO_CLIENT Valid values are yes or no, case sensitive. GCB_ACTIVE_TO_CLIENT should be set to yes only if this GCB broker is running on a network boundary and can connect to both the private and public nodes. If the broker is running in the public network, it should be left undefined or set to no.

GCB_LOG_DIR The directory for GCB daemon logs. The broker will write to $GCB_LOG_DIR/BrokerLog, and the relay server will write to $GCB_LOG_DIR/RelayServerLog.<pid>, where <pid> is replaced with the process id of the corresponding gcb_relay_server.

GCB_RELAY_SERVER_LOG An explicit log file path for relay servers; each will write to $GCB_RELAY_SERVER_LOG.<pid>, where <pid> is replaced with the process id of the corresponding gcb_relay_server. When defined, this setting overrides GCB_LOG_DIR.

GCB_DEBUG_LEVEL Valid values are fulldebug (more verbose) or basic. This defines logging behavior for all GCB daemons, unless the daemon-specific settings (GCB_BROKER_DEBUG and GCB_RELAY_SERVER_DEBUG) are defined.
There are two ways to spawn the GCB broker: with a condor_master, or via the system's init.
To spawn the GCB broker with a condor_master, here are recommended condor_config settings:
# Specify that you only want the master and the broker running
DAEMON_LIST = MASTER, GCB_BROKER
# Define the path to the broker binary for the master to spawn
GCB_BROKER = $(RELEASE_DIR)/libexec/gcb_broker
# Define the path to the relay_server binary for the broker to use
GCB_RELAY = $(RELEASE_DIR)/libexec/gcb_relay_server
# Set up the gcb_broker's environment. We use a macro to build up the
# environment we want in pieces, and then finally define
# GCB_BROKER_ENVIRONMENT, the setting that condor_master uses.
# Initialize an empty macro
GCB_BROKER_ENV =
# (recommended) Provide the full path to the gcb_relay_server
GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_RELAY_SERVER=$(GCB_RELAY)
# (recommended) Tell GCB to write all log files into the Condor log
# directory (the directory used by the condor_master itself)
GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_LOG_DIR=$(LOG)
# Or, you can specify a log file separately for each GCB daemon:
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_BROKER_LOG=$(LOG)/GCB_Broker_Log
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_RELAY_SERVER_LOG=$(LOG)/GCB_RS_Log
# (optional -- only set if true) Tell the GCB broker that it can
# directly connect to machines in the private network which it is
# handling communication for. This should only be enabled if the GCB
# broker is running directly on a network boundary and can open direct
# connections to the private nodes.
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_ACTIVE_TO_CLIENT=yes
# (optional) turn on verbose logging for all of GCB
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_DEBUG_LEVEL=fulldebug
# Or, you can turn this on separately for each GCB daemon:
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_BROKER_DEBUG=fulldebug
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_RELAY_SERVER_DEBUG=fulldebug
# (optional) specify the maximum log file size (in bytes)
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_MAX_LOG=640000
# Or, you can define this separately for each GCB daemon:
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_BROKER_MAX_LOG=640000
#GCB_BROKER_ENV = $(GCB_BROKER_ENV);GCB_RELAY_SERVER_MAX_LOG=640000
# Finally, set the value the condor_master really uses
GCB_BROKER_ENVIRONMENT = $(GCB_BROKER_ENV)
# If your Condor installation on this host already has a public
# interface as the default (either because it is the first interface
# listed in this machine's host entry, or because you've already
# defined NETWORK_INTERFACE), you can just use Condor's special macro
# that holds the IP address for this.
GCB_BROKER_IP = $(ip_address)
# Otherwise, you could define it yourself with your real public IP:
#GCB_BROKER_IP = 123.123.123.123
# (required) define the command-line arguments for the broker
GCB_BROKER_ARGS = -i $(GCB_BROKER_IP)
Once those settings are in place, either spawn or restart the condor_master, and the gcb_broker should be started. Ensure the broker is running by reading the log file specified with GCB_BROKER_LOG, or $(LOG)/BrokerLog if using the default.
The system's init may be used to manage the gcb_broker without running a condor_master on the broker node, but this requires root access. Generally, this involves adding a line to the /etc/inittab file. Some sites use other means to manage and generate /etc/inittab, such as cfengine or other system configuration management tools, so check with the local system administrator to be sure. An example line might look like:
GB:23:respawn:/path/to/gcb_broker -i 123.123.123.123 -r /path/to/relay_server
It may be easier to wrap the gcb_broker binary in a shell script, so that the command-line arguments (and environment variables) can be changed without editing /etc/inittab each time. The inittab entry would then be similar to:
GB:23:respawn:/opt/condor-6.7.13/libexec/gcb_broker.sh
Then, create the wrapper script, similar to:
#!/bin/sh
libexec=/opt/condor-6.7.13/libexec
ip=123.123.123.123
relay=$libexec/gcb_relay_server
exec $libexec/gcb_broker -i $ip -r $relay
You will probably also want to set some environment variables to tell the GCB daemons where to write their log files (GCB_LOG_DIR), and possibly some of the other variables described above.
Either way, after updating /etc/inittab, send the init process (always PID 1) a SIGHUP signal, and it will re-read the inittab and spawn the gcb_broker.
In general, before configuring a node in a Condor pool to use GCB, the GCB broker node(s) for the pool must be set up and running. Set up, configure, and spawn the broker first.
To enable the use of GCB on a given Condor host, set the following Condor configuration variables:
# Tell Condor to use a network remapping service (currently only GCB
# is supported, but in the future, there might be other options)
NET_REMAP_ENABLE = true
NET_REMAP_SERVICE = GCB
Only GCB clients within a private network need to define the following variable, which specifies the IP address of the broker serving this network. Note that this IP address must be the same as the IP address that was specified on the broker's command-line with the -i option.
# Public IP address (in standard dot notation) of the GCB broker
# serving this private node.
NET_REMAP_INAGENT = xxx.xxx.xxx.xxx
Because the NET_REMAP_INAGENT setting is only valid on private nodes, it should not be defined in a global Condor configuration file (condor_config). Furthermore, with a large number of hosts in a given private network, if multiple brokers are run to alleviate scalability issues, each subset of the private nodes that uses a specific broker will need a different value for this variable.
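As a sketch with a hypothetical broker address, the local configuration of the private nodes served by one broker might contain:

NET_REMAP_INAGENT = 123.123.123.123

while private nodes served by a second broker would name that broker's public IP address instead.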
Finally, if setting up the recommended (but optional) GCB routing table, tell the Condor daemons where to find it by defining the following variable:
# The full path to the routing table used by GCB
NET_REMAP_ROUTE = /full/path/to/GCB-routing-table
Setting NET_REMAP_ENABLE causes the BIND_ALL_INTERFACES variable to be automatically set; more information about that setting can be found in section 3.7.2. It would not hurt to place the following in the configuration file near the other GCB-related settings, as a reminder:
# Tell Condor to bind to all network interfaces, instead of a single
# interface.
BIND_ALL_INTERFACES = true
Once a GCB broker is set up and running to manage connections for each private network, and the Condor installations on all nodes in both the private and public networks are configured to enable GCB, restart the Condor daemons, and all of the machines should be able to communicate with each other.
By default, a GCB-enabled application will always attempt to directly connect to a given IP/port pair. In the case of a private node being represented by a GCB broker, the IP/port will be a proxy socket on the broker node, not the real address of the private node. When the GCB broker receives a direct connection to one of its proxy sockets, it notifies the corresponding private node, which establishes a new connection to the broker. The broker then forwards packets between these two sockets, establishing a communication pathway into the private node. This allows clients which are not linked with the GCB libraries to communicate with private nodes using a GCB broker.
This mechanism is expensive in terms of latency (time between messages) and total bandwidth (how much data can be moved in a given time period), as well as expensive in terms of the broker's system resources such as network I/O, processor time, and memory. This expensive mechanism is unnecessary in the case of GCB-aware clients trying to connect to private nodes that can directly communicate with the public host. The alternative is to contact the GCB broker's command interface (the fixed port where the broker is listening for GCB management commands), and use a GCB-specific protocol to request a connection to the given IP/port. In this case, the GCB broker will notify the private node to directly connect to the public client (technically, to a new socket created by the GCB client library linked in with the client's application), and a direct socket between the two is established, removing the need for packet forwarding between the proxy sockets at the GCB broker.
On the other hand, in cases where a direct connection from the client to a given server is possible (for example, two GCB-aware clients in the same public network attempting to communicate with each other), it is expensive and unnecessary to attempt to contact a GCB broker, and the client should connect directly.
To allow a GCB-enabled client to know if it should make a direct connection (which might involve packet forwarding through proxy sockets), or if it should use the GCB protocol to communicate with the broker's command port and arrange a direct socket, GCB provides a routing table. Using this table, an administrator can define what IP addresses should be considered private nodes where the GCB connection protocol will be used, and what nodes are public, where a direct connection (without incurring the latency of contacting the GCB broker, only to find out there is no information about the given IP/port) should be made immediately.
If the attempt to contact the GCB broker for a given IP/port fails, or if the desired port is not being managed by the broker, the GCB client library making the connection will fall back and attempt a direct connection. Therefore, configuring a GCB routing table is not required for communication to work within a GCB-enabled environment. However, the GCB routing table can significantly improve performance for communication with private nodes being represented by a GCB broker.
One confusing aspect of GCB is that all of the nodes on a private network believe that their own IP address is the address of their GCB broker. Due to this, all the Condor daemons on a private network advertise themselves with the same IP address (though the broker will map the different ports to different nodes within the private network). Therefore, a given node in the public network needs to be told that if it is contacting this IP address, it should know that the IP address is really a GCB broker representing a node in the private network, so that the public network node can contact the broker to arrange a single socket from the private node to the public one, instead of relying on forwarding packets between proxy sockets at the broker. Any other addresses, such as other public IP addresses, can be contacted directly, without going through a GCB broker. Similarly, other nodes within the same private network will still be advertising their address with their GCB broker's public IP address. So, nodes within the same private network also have to know that the public IP address of the broker is really a GCB broker, yet all other public IP addresses are valid for direct communication.
In general, all connections can be made directly, except to a host represented by a GCB broker. Furthermore, the default behavior of the GCB client library is to make a direct connection. The routing table is a (somewhat complicated) way to tell a given GCB installation what GCB brokers it might have to communicate with, and that it should directly communicate with anything else. In practice, the routing table should have a single entry for each GCB broker in the system. Future versions of GCB will be able to make use of more complicated routing behavior, which is why the full routing table infrastructure described below is implemented, even if the current version of GCB is not taking advantage of all of it.
Format of the GCB routing table
The routing table is a plain ASCII text file. Each line of the file contains one rule. Each rule consists of a target and a method. The target specifies the destination IP address(es) to match, and the method defines what mechanism must be used to connect to the given target. The target must be a valid IP address string in the standard dotted notation, followed by a slash character (/) and an integer mask. The mask specifies how many bits of the destination IP address and the target IP address must match. The method must be one of the strings GCB or direct.

GCB stops searching the table as soon as it finds a matching rule. Therefore, place more specific rules (rules with a larger value for the mask and without wildcards) before generic rules (rules with wildcards or smaller mask values). The default when no rule is matched is to use direct communication. Some examples and the corresponding routing tables may help clarify this syntax.
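The first-match lookup described above can be sketched in a few lines of Python. This is an illustrative model of the rule-matching semantics, not GCB's actual implementation:

```python
def ip_to_int(ip):
    """Convert a dotted-quad IP string to a 32-bit integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def lookup(rules, dest_ip):
    """Return the method ('GCB' or 'direct') for dest_ip.

    rules is an ordered list of (target, mask_bits, method) tuples,
    e.g. parsed from lines such as '123.123.123.123/32 GCB'.
    Rules are checked in order; the first match wins.  A target of '*'
    (or a mask of 0) matches every address.
    """
    dest = ip_to_int(dest_ip)
    for target, mask_bits, method in rules:
        if target == "*" or mask_bits == 0:
            return method
        # Keep only the top mask_bits bits when comparing.
        mask = (0xFFFFFFFF << (32 - mask_bits)) & 0xFFFFFFFF
        if (dest & mask) == (ip_to_int(target) & mask):
            return method
    return "direct"  # default when no rule matches

# The two-broker example table from this section:
rules = [("123.123.123.65", 32, "GCB"), ("123.123.123.66", 32, "GCB")]
print(lookup(rules, "123.123.123.65"))  # GCB
print(lookup(rules, "123.123.123.7"))   # direct
```

Note that placing a generic rule such as `*/0 direct` first would shadow every later rule, which is why specific rules must come before generic ones.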
Simple GCB routing table example (1 private, 1 public)
Consider an example with a private network that has a set of nodes whose IP addresses are 192.168.2.*. Other nodes are in a public network whose IP addresses are 123.123.123.*. A GCB broker for the 192.168.2.* network is running on IP address 123.123.123.123. In this case, the routing table for both the public and private nodes should be:

123.123.123.123/32 GCB

This rule states that for IP addresses where all 32 bits exactly match the address 123.123.123.123, first communicate with the GCB broker. Since the default is to connect directly when no rule in the routing table matches a given target IP, this single rule is all that is required. However, to illustrate how the routing table syntax works, the following routing table is equivalent:

123.123.123.123/32 GCB
*/0 direct

Any attempt to connect to 123.123.123.123 uses GCB, as it is the first rule in the file. All other IP addresses will connect directly. This table explicitly defines GCB's default behavior.
More complex GCB routing table example (2 private, 1 public)
As a more complicated case, consider a single Condor pool that spans one public network and two private networks. The two separate private networks each have machines with private addresses like 192.168.2.*. Identify one of these private networks as A, and the other one as B. The public network has nodes with IP addresses like 123.123.123.*. Assume that the GCB broker for nodes in the A network has IP address 123.123.123.65, and the GCB broker for nodes in the B network has IP address 123.123.123.66. All of the nodes need to be able to talk to each other.

In this case, nodes in private network A advertise themselves as 123.123.123.65, so any node, regardless of being in A, B, or the public network, must treat that IP address as a GCB broker. Similarly, nodes in private network B advertise themselves as 123.123.123.66, so any node must treat that IP address as a GCB broker as well. All other connections from any node can be made directly. Therefore, here is the appropriate routing table for all nodes:

123.123.123.65/32 GCB
123.123.123.66/32 GCB
When a message is received at a Condor daemon's command socket, Condor authenticates based on the IP address of the incoming socket. For more information about this host-based security in Condor, see section 3.6.8.
Because of the way GCB changes the IP addresses that are used and advertised by GCB-enabled clients, and since all nodes being represented by a GCB broker are represented by different ports on the broker node (a process known as address leasing), using GCB has implications for this process.
Depending on the communication pathway used by a GCB-enabled Condor client (either a tool or another Condor daemon) to connect to a given Condor server daemon, and on where in the network each side of the connection resides, the IP address of the resulting socket will vary considerably. In the case of a private client (that is, a client behind a firewall, which may or may not be using NAT and a fully private, non-routable IP address) attempting to connect to a server, there are three possibilities:
Therefore, any public server that wants to allow a command from a specific client must have any or all of the various IP addresses mentioned above within the appropriate HOSTALLOW settings. In practice, that means opening up the HOSTALLOW settings to include not just the actual IP addresses of each node, but also the IP address of the various GCB brokers in use, and potentially, the public IP address of the NAT host for each private network.
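As a hedged sketch of what this might look like in the configuration file, using the hypothetical broker address 123.123.123.65 from the routing table examples in this section, plus a placeholder NAT host address of 123.123.123.80:

```
## Hypothetical sketch: in addition to the real hosts being granted
## write access, private clients may appear to connect from the GCB
## broker's public address (123.123.123.65) or from the private
## network's NAT host (assumed here to be 123.123.123.80).
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), 123.123.123.65, 123.123.123.80
```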
However, given that all private nodes which are represented by a given GCB broker could potentially make connections to any other host using the GCB broker's IP address (whenever proxy socket forwarding is being used), if a single private node is being granted a certain level of permission within the Condor pool, all of the private nodes using the same GCB broker will have the same level of permission. This is particularly important in the consideration of granting HOSTALLOW_ADMINISTRATOR or HOSTALLOW_CONFIG privileges to a private node represented by a GCB broker.
In the case of a public client attempting to connect to a private server, there are only two possible cases:
This second case is particularly troubling. Since there are legitimate circumstances where a private server would need to use a forwarded proxy socket from its GCB broker, in general, the server should allow requests originating from its GCB broker. But, precisely because of the proxy forwarding, that implies that any client that can connect to the GCB broker would be allowed into the private server (if IP-based authorization were the only defense).
The final host-based security setting that requires special mention is HOSTALLOW_NEGOTIATOR . If the condor_ negotiator for the pool is running on a private node being represented by a GCB broker, there must be modifications to the default value. For the purposes of Condor's host-based security, the condor_ negotiator acts as a client when communicating with each condor_ schedd in the pool which has idle jobs that need to be matched with available resources. Therefore, all the possible cases of a private client attempting to connect to a given server apply to a private condor_ negotiator. In practice, that means adding the public IP address of the broker, the real private IP address of the negotiator host, and possibly the public IP address of the NAT host for this private network to the HOSTALLOW_NEGOTIATOR setting. Unfortunately, this implies that any host behind the same NAT host or using the same GCB broker will be authorized as if it was the condor_ negotiator.
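As an illustration, assume the condor_negotiator runs on a private host with real address 192.168.2.10, represented by the GCB broker at 123.123.123.65, behind a NAT host assumed to be at 123.123.123.80; all three addresses here are placeholders, not values from a real installation:

```
## Hypothetical sketch of an expanded HOSTALLOW_NEGOTIATOR setting for a
## negotiator on a private node represented by a GCB broker.
HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST), 123.123.123.65, 192.168.2.10, 123.123.123.80
```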
Future versions of GCB and Condor will hopefully add some form of authentication and authorization to the GCB broker itself, to help alleviate these problems. Until then, sites using GCB are encouraged to use GSI strong authentication, whose authorization decisions are not affected by address leasing; Kerberos is not an option, since it also depends on IP addresses and is therefore incompatible with GCB. This is especially true for sites that (foolishly) choose to run their central manager on a private node.
Using GCB and address leasing has implications for Condor configuration settings beyond the host/IP-based security settings. Each is described below.
However, because the condor_ collector is listening on a fixed port, and that single port is reserved on the GCB broker node, no two private nodes using the same broker can attempt to use the same port for their condor_ collector. Therefore, any site that is attempting to set up multiple pools within the same private network is strongly encouraged to set up separate GCB brokers for each pool. Otherwise, one or both of the pools must use a non-standard port for the condor_ collector, which adds yet more complication to an already complicated situation.
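For example, one way to move a pool's condor_collector to a non-standard port is to append the port number to the COLLECTOR_HOST setting; the host name and port number below are placeholders:

```
## Hypothetical sketch: a second pool sharing the same GCB broker moves
## its condor_collector off the default port 9618.
COLLECTOR_HOST = cm2.example.org:9650
```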
KERBEROS may not be used for authentication on a GCB-enabled pool. The IP addresses used in various circumstances will not be the real IP addresses of the machines. Since Kerberos stores the IP address of each host as part of the Kerberos ticket, authentication will fail on a GCB-enabled pool.
Due to the complications and security limitations that arise from running a central manager on a private node represented by GCB (both regarding the COLLECTOR_HOST and HOSTALLOW_NEGOTIATOR), we recommend that sites avoid locating a central manager on a private host whenever possible.
TCP sockets are reliable, connection-based sockets that guarantee the delivery of any data sent. However, TCP sockets are fairly expensive to establish, and there is more network overhead involved in sending and receiving messages.
UDP sockets are datagram-based, and are not reliable. There is very little overhead in establishing or using a UDP socket, but there is also no guarantee that the data will be delivered. All previous Condor versions used UDP sockets to send updates to the condor_collector, and at most sites this does not cause problems.
Beginning with version 6.5.0, Condor can be configured to use TCP sockets, instead of UDP datagrams, to send updates to the condor_collector. This feature is not intended for most sites; it is targeted at sites where UDP updates are lost because of the underlying network. Most Condor administrators who believe this is a good idea for their site are wrong. Do not enable this feature just because it sounds like a good idea. The only case where an administrator would want this feature is when ClassAd updates are consistently not getting to the condor_collector. An example where this may happen is a pool comprised of machines across a wide area network (WAN), where UDP packets are frequently dropped.
Configuration variables are set to enable the use of TCP sockets. There are two variables that an administrator must define to enable this feature:
The work to establish a TCP connection may be lengthy, including authentication and setting up encryption. The use of a cache allows Condor to leave established TCP sockets open, so that subsequent updates can reuse an already open socket; this gives much better performance. Therefore, Condor requires that a socket cache be defined if TCP updates are to be used; the condor_collector daemon will refuse TCP updates if a cache is not enabled.
Each Condor daemon will have 1 socket open to the condor_ collector. So, in a pool with N machines, each of them running a condor_ master, condor_ schedd, and condor_ startd, the condor_ collector would need a socket cache that has at least 3*N entries. Machines running Personal Condor in the pool need an additional two entries (for the condor_ master and condor_ schedd) for each Personal Condor installation.
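As a sketch of the sizing arithmetic, consider a hypothetical pool of 100 machines, each running a condor_master, condor_schedd, and condor_startd. The cache variable name shown here follows the collector configuration variables described in this manual, but should be checked against the configuration reference for the installed version:

```
## Enable TCP updates to the condor_collector.
UPDATE_COLLECTOR_WITH_TCP = True
## 3 daemons per machine * 100 machines = 300 sockets, plus headroom
## for Personal Condor installations and tools.
COLLECTOR_SOCKET_CACHE_SIZE = 350
```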
Every cache entry utilizes a file descriptor within the condor_ collector daemon. Therefore, be careful not to define a cache that is larger than the number of file descriptors the underlying operating system allocates for a single process.
NOTE: At this time, UPDATE_COLLECTOR_WITH_TCP only affects the main condor_collector for the site, not any sites that a condor_schedd might flock to.