condor_glidein
add a Globus resource to a Condor pool
condor_glidein
[-help]
[-admin address]
[-anybody]
[-archdir dir]
[-basedir basedir]
[-count CPUcount]
[-genconfig]
[-genstartup]
[-gensubmit]
[-idletime minutes]
[-localdir dir]
[-memory MBytes]
[-project name]
[-queue name]
[-runtime minutes]
[-runonly]
[-setuponly]
[-setup_here]
[-setup_jobmanager jobmanager]
[-scheduler name]
[-suffix suffix]
[-useconfig filename]
[-usestartup filename]
[-vms VMcount]
{-contactfile filename } | Globus contact string
condor_glidein allows the temporary addition of a Globus resource to
a local Condor pool.
The addition is accomplished by installing and executing some of the Condor
daemons on the Globus resource.
A glidein_startup job appears in the queue of the local
Condor pool for each glidein request.
To remove the Globus resource from the local Condor pool,
use condor_rm to remove the glidein_startup job from
the job queue.
You must have an X.509 certificate and access
to the Globus resource to use condor_glidein.
The Globus software must also be installed.
Globus is a software system that provides uniform access to
different high-performance computing resources.
When specifying a machine to use with Globus,
you provide a Globus contact string.
Often, the contact string can be just the hostname of the machine.
Sometimes, a more complicated contact string is required.
For example, if a machine has multiple schedulers (ways to run a job),
the contact string may need to specify which to use.
See the Globus home page, http://www.globus.org/
for more
information about Globus.
condor_glidein works in two steps: setup and execution.
During setup, a configuration file, a startup file, and the Condor daemons
master, startd and starter are installed on the Globus
resource.
Binaries for the correct architecture are copied from a central server.
To obtain access to the server,
or to set up your own server, follow
instructions on the Glidein Server Setup page,
at http://www.cs.wisc.edu/condor/glidein.
Setup need only be done once per site.
The execution step starts the Condor daemons running through
the resource's Globus interface.
By default, all files placed on the remote machine are placed in
$(HOME)/Condor_glidein (or whatever -basedir is
defined to be). It is assumed that this directory is shared by all of
the machines that will be running the glidein Condor daemons. By
default, the daemon log files will also be written into this area, but
you are encouraged to change this (e.g. with -localdir) to make
them write to local scratch space on the execution machine. However,
for debugging initial problems, it may be convenient to have the log
files in a more accessible place. If you do leave the default
setting alone, you should at least occasionally clean up old
log and execute directories left behind by glideins or you may eventually
run out of space.
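One way to find stale glidein directories is to list those that have not been modified for some number of days. The sketch below is only an illustration and is not part of condor_glidein: the function name list_old_glidein_dirs and the 30-day default are hypothetical, and it assumes the leftover log and execute directories are immediate subdirectories of the directory you pass in. It prints candidates and deletes nothing, so the list can be reviewed first.

```shell
# Hypothetical helper: print subdirectories of a glidein directory that
# have not been modified for more than DAYS days. Nothing is deleted.
list_old_glidein_dirs() {
    dir=${1:-"$HOME/Condor_glidein"}   # default -basedir location
    days=${2:-30}                      # age threshold in days (assumed value)
    find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -print
}
```

Pass the -localdir location instead of the default if the daemons write their log and execute directories elsewhere.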
To set up and run 10 glideins under PBS on a grid site with a
gatekeeper named gatekeeper.site.edu:
% condor_glidein -count 10 gatekeeper.site.edu/jobmanager-pbs
If you try something like the above and condor_glidein is not able to
automatically determine everything it needs to know about the remote site,
it will ask you to provide more information. A typical result of this
process is something like the following command:
% condor_glidein \
    -count 10 \
    -arch 6.6.7-i686-pc-Linux-2.4 \
    -setup_jobmanager jobmanager-fork \
    gatekeeper.site.edu/jobmanager-pbs
You may use condor_q to see the glidein jobs that have been
submitted. Once they successfully run, you may see them join your
Condor pool by using condor_status.
See the list of common problems and solutions near the end of this section
if you have trouble getting the system to work.
- -help
- Display brief usage information and exit
- -basedir basedir
- Specifies the base directory on the Globus resource
used for placing files.
The default directory is $(HOME)/Condor_glidein
on the Globus resource.
- -archdir dir
- Specifies the directory on the Globus resource for placement
of the executables.
The default value for -archdir ,
given according to version information on the Globus resource, is
basedir/<condor-version>-<Globus canonicalsystemname>
An example of the directory
(without the base directory on the Globus resource)
for Condor version 6.1.13
running on a Sun Sparc machine with Solaris 2.6 is
6.1.13-sparc-sun-solaris-2.6
- -localdir dir
- Specifies the directory on the Globus resource
in which to create log and execution
subdirectories needed by Condor.
If limited disk quota in the home or base directory
on the Globus resource is a problem,
set -localdir to a large temporary space,
such as /tmp or /scratch. If the batch system
makes glidein start up in a temporary scratch directory,
you can use '.' for -localdir.
- -contactfile filename
- Allows the use of a file of Globus contact strings,
rather than the single
Globus contact string given in the command line.
For each of the contacts listed in the file,
the Globus resource is added to the local Condor pool.
- -runonly
- Starts execution of the Condor daemons on the Globus
resource.
If any of the files are missing, condor_glidein exits with an error code.
This option cannot be used together with -setuponly.
- -run_here
- Runs condor_master directly rather than submitting it to
Condor-G for remote execution. To instead generate a script that
does this, use -run_here in combination with
-gensubmit. This may be useful for running Glidein on
resources that are not directly accessible to Condor-G.
- -setuponly
- Performs only the placement of files on the Globus resource.
This option cannot be used together with -runonly.
- -setup_here
- Runs the setup process directly instead of submitting a setup
job to the remote Globus resource. For example, this may
be used to install glidein in an AFS area that is read-only
from the remote Globus resource.
- -setup_jobmanager jobmanager
- The jobmanager to use for running the Glidein setup process (for example, jobmanager-fork). If a
reasonable default can be discovered through MDS, this is optional.
- -arch architecture
- Identifies the glidein tarball to download and install. If a
reasonable default can be discovered through MDS, this is
optional. A list of possible values may be found here:
http://www.cs.wisc.edu/condor/glidein/binaries. The architecture
name is the same as the tarball name minus the tar.gz. For
example: 6.6.5-i686-pc-Linux-2.4
- -scheduler name
- Selects the Globus job scheduler type.
Defaults to fork.
NOTE: Contact strings which already contain the scheduler
type will not be overridden by this option.
- -queue name
- The argument name is a string which specifies which
job queue is to be used for submission on the Globus resource.
- -project name
- The argument name is a string which specifies which
project is to be used for submission on the Globus resource.
- -memory MBytes
- The maximum memory size to request from the Globus resource
(in megabytes).
- -count CPUcount
- Number of CPUs to request, default is 1.
- -vms VMcount
- For machines with multiple CPUs, the CPUs may be divided
up into virtual machines. VMcount is the number
of virtual machines that results.
By default, Condor divides multiple-CPU resources such that
each CPU is a virtual machine, each with an equal share of RAM,
disk, and swap space.
This option configures the number of virtual machines, so that
multi-threaded jobs can run in a virtual machine with multiple
CPUs.
For example, if 4 CPUs are requested and
-vms is not specified, Condor will
divide the request up into 4 virtual machines with 1 CPU each.
However, if -vms 2 is specified,
Condor will divide the request up into 2 virtual machines with
2 CPUs each, and if -vms 1 is
specified, Condor will put all 4 CPUs into one virtual
machine.
- -idletime minutes
- How long the Condor daemons on the Globus resource
can remain idle before the resource reverts back to its
former state of not being part of the local Condor pool.
If the value is 0 (zero), the resource will not
revert back to its former state.
In this case,
the Condor daemons will run until the time given by -runtime expires,
or until they are killed by the resource or
with condor_rm.
The default value is 20 minutes.
- -runtime minutes
- How long the Condor daemons on the Globus resource
will run before shutting themselves down. This option is useful
for resources with enforced maximum run times. Setting
runtime to be a few minutes shorter than the allowable
limit gives the daemons time to perform a graceful shutdown.
- -anybody
- Sets the Condor START expression to TRUE
to allow any user job which meets the job's requirements to run on
the Globus resource added to the local Condor pool.
Without this option, only jobs owned by the user executing
condor_glidein can execute on the Globus resource. WARNING:
Using this option may violate the usage policies of many
institutions.
- -admin address
- Where to send e-mail about problems.
The default is the login of the user running
condor_glidein at the UID domain of the local Condor pool.
- -genconfig
- This option creates a local copy of the configuration file used
on the Globus resource.
The file is called glidein_condor_config{suffix}.
You may edit this file and use -useconfig to install your
modified config.
- -useconfig config_file
- This option makes the setup process copy the config file
you specify, rather than generating one from scratch.
- -genstartup
- This option creates a local copy of the startup script used
on the Globus resource when condor_master runs.
The file is called glidein_startup{suffix}.
You may edit this file and use -usestartup to install your
modified startup script.
- -usestartup startup_file
- This option makes the setup process copy the startup script
you specify, rather than generating one from scratch.
- -suffix suffix
- Suffix to use when generating files. The default is the process ID.
- -gsi_daemon_name cert_name
- Using this option turns on GSI authentication in the glidein
configuration. The argument to this option is the GSI
certificate name that the glidein daemons will use to authenticate
themselves. It should be set to whatever certificate name
you will use to execute the glideins.
- -install_gsi_trusted_ca_dir path
- Using this option turns on GSI authentication in the glidein
configuration. The argument to this option is the path
to the trusted CA certificates that you wish the glidein
daemons to use (e.g. /etc/grid-security/certificates). The
contents of this directory will be installed at the remote
site in -basedir/grid-security.
- -install_gsi_gridmap file
- Using this option turns on GSI authentication in the glidein
configuration. The argument to this option is the filename
of the grid-mapfile that you wish the glidein daemons to
use. The file will be installed at the remote site
in -basedir/grid-security. The file should contain
entries mapping grid-certificates to user names. At the
very least, it must contain an entry for the certificate
named by -gsi_daemon_name. If your other Condor
daemons use different certificates, then this file should
also mention any certificates that the glidein daemons
will encounter (schedd, collector, and negotiator). See
section 3.6.3 for more information.
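The arithmetic behind the -vms and -runtime options described above can be sketched in shell. The helper names cpus_per_vm and safe_runtime are hypothetical, as is the 5-minute default margin; they only show how values for these options might be derived and do not call condor_glidein. The text above does not say how an uneven division of CPUs is handled, so the integer division here (which truncates) is an assumption.

```shell
# Hypothetical helper: CPUs in each virtual machine when CPUCOUNT CPUs
# are divided into VMCOUNT virtual machines. VMCOUNT defaults to CPUCOUNT,
# matching the default of one CPU per virtual machine.
cpus_per_vm() {
    cpus=$1
    vms=${2:-$1}
    echo $(( cpus / vms ))
}

# Hypothetical helper: a -runtime value a few minutes shorter than the
# resource's enforced limit, leaving time for a graceful shutdown.
safe_runtime() {
    limit=$1          # enforced maximum run time, in minutes
    margin=${2:-5}    # shutdown allowance in minutes (assumed value)
    echo $(( limit - margin ))
}
```

With these helpers, a request for 4 CPUs in 2 virtual machines on a resource with a 120-minute limit might pass -count 4 -vms 2 and use the output of safe_runtime 120 as the argument to -runtime.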
condor_glidein will exit with a status value of 0 (zero) upon
complete success.
The script exits with non-zero values upon failure.
The status value will be 1 (one) if
condor_glidein encountered an error making a directory,
was unable to copy a tar file,
encountered an error in parsing the command line,
or was not able to gather required information.
The status value will be 2 (two) if
there was an error in the remote set up.
The status value will be 3 (three) if
there was an error in remote submission.
The status value will be -1 (negative one) if
no resource was specified in the command line.
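A wrapper script can turn these status values into readable messages. The following is a minimal sketch; the function name describe_glidein_status is hypothetical. Note that a POSIX shell sees exit statuses as values from 0 to 255, so a status of -1 is reported in $? as 255.

```shell
# Hypothetical helper: translate a condor_glidein exit status into the
# failure category described in this section.
describe_glidein_status() {
    case $1 in
        0)   echo "success" ;;
        1)   echo "local error: directory, tar file copy, command line, or missing information" ;;
        2)   echo "error in the remote setup" ;;
        3)   echo "error in remote submission" ;;
        255) echo "no resource specified on the command line" ;;  # -1 as seen by the shell
        *)   echo "unrecognized status $1" ;;
    esac
}
```

A script would typically call describe_glidein_status "$?" immediately after running condor_glidein.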
Common problems are listed below. Many of these are best discovered by
looking in the remote StartLog in the glidein "localdir".
- WARNING: The file xxx is not writable by condor
- This happens if you
run condor_glidein from a directory that does not have the right
permissions for Condor to access files. If you are in an AFS directory,
keep in mind that Condor does not have your AFS ACLs.
- Glideins fail to run due to GLIBC errors
- Check the list of
available glidein binaries
(http://www.cs.wisc.edu/condor/glidein/binaries) and try setting
up glidein with an architecture name that includes the correct glibc
version for the remote site.
- Glideins join pool but no jobs run on them
- One common
cause of this problem is that the glidein machines are in a different
filesystem domain and your jobs have been submitted with an implicit
requirement that they must run in the same filesystem domain. If this
is your problem, see section 2.5.4 for details on
using Condor's file-transfer capabilities. Another cause of this
problem is a communication failure. For example, a firewall may be
preventing condor_negotiator or condor_schedd from connecting to
the glidein condor_startd. Although work is being done to remove
this requirement in the future, it is currently necessary to have full
bi-directional connectivity, at least over a restricted range of
ports. See page
for more information on
configuring a port range.
- Glideins run but fail to join the pool
- This may be caused by
your pool's security settings or by a communication failure. Check
that the security settings in your pool's Condor config file allow
write access to the glidein machines. If you do not wish to modify
the security settings for the pool, you can run a separate pool
specifically for the glideins and use flocking to balance jobs across
the two pools of resources. If instead the glidein daemon log files
indicate a communication failure, then see the next item.
- The startd cannot connect to the collector
- This may be caused
by several things. One is a firewall. Another is when the compute
nodes do not have even outgoing network access. Configuring Glidein
to work without full network access to and from the compute nodes is
still in the experimental stages, so for now, the short answer is that
you must at least have a range of open (bi-directional) ports and set
up the glidein config file as described on
page
. (Use -genconfig, edit the file,
and then use -useconfig.)
Another possible cause of connectivity problems may be the use of UDP by
the condor_startd to register itself with the collector. You can
force it to use TCP as described on
page
.
Yet another possible cause of connectivity problems is when the glidein
machines have more than one network interface and the default one chosen
by Condor is not the correct one. One way to fix this is to modify
the glidein startup script (using -genstartup and -usestartup).
The script simply needs to determine the IP address associated with
the correct network interface and assign this to the environment
variable _condor_NETWORK_INTERFACE.
- NFS file locking problems
- If you have the -localdir
configured to be on NFS (not recommended, but sometimes convenient
for testing), the Condor daemons may have trouble manipulating file
locks. You may insert the following into the Glidein config file:
IGNORE_NFS_LOCK_ERRORS = True
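For the case above where glidein machines have more than one network interface, the addition to the glidein startup script might look like the following sketch. It assumes a Linux execution machine that provides the iproute2 ip command; the helper name ipv4_of and the interface name eth1 are hypothetical and must be adapted to the site.

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the first IPv4 address of the named interface, without the prefix length.
ipv4_of() {
    awk -v ifname="$1" '$2 == ifname && $3 == "inet" {
        split($4, addr, "/")
        print addr[1]
        exit
    }'
}

# In the startup script (eth1 is an assumed interface name):
#   _condor_NETWORK_INTERFACE=$(ip -o -4 addr show | ipv4_of eth1)
#   export _condor_NETWORK_INTERFACE
```

Generate the startup script with -genstartup, add lines like these before the daemons are started, and install the result with -usestartup.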
Condor Team, University of Wisconsin-Madison
Copyright © 1990-2006 Condor Team, Computer Sciences Department,
University of Wisconsin-Madison, Madison, WI. All Rights Reserved.
No use of the Condor Software Program is authorized
without the express consent of the Condor Team. For more information
contact: Condor Team, Attention: Professor Miron Livny,
7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685,
(608) 262-0856 or miron@cs.wisc.edu.
U.S. Government Rights Restrictions: Use, duplication, or disclosure
by the U.S. Government is subject to restrictions as set forth in
subparagraph (c)(1)(ii) of The Rights in Technical Data and Computer
Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and
(2) of Commercial Computer Software-Restricted Rights at 48 CFR
52.227-19, as applicable, Condor Team, Attention: Professor Miron
Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison,
WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu.
See the Condor Version 6.8.3 Manual for
additional notices.