condor_submit
Queue jobs for execution under Condor
condor_submit
[-verbose]
[-name schedd_name]
[-remote schedd_name]
[-pool pool_name]
[-disable]
[-password passphrase]
[-debug]
[-append command ... ]
[-spool]
[submit description file]
condor_submit is the program for submitting jobs for execution
under Condor.
condor_submit requires a submit description file which contains commands
to direct the queuing of jobs.
One submit description file may contain
specifications for the queuing of many Condor jobs at once.
A single invocation of condor_submit may create one or
more clusters.
A cluster is a set of jobs
specified in the submit description file
between queue commands for which the executable is not changed.
It is advantageous to submit
multiple jobs as a single cluster because:
- Only one copy of the checkpoint file is needed to
represent all jobs in a cluster until they begin execution.
- There is much less overhead involved for Condor to start the next
job in a cluster than for Condor to start a new cluster. This can make
a big difference when submitting lots of short jobs.
Multiple clusters may be specified within a single
submit description file.
Each cluster must specify a single executable.
The job ClassAd attribute ClusterId identifies a cluster.
See section 2.5.2 for specifics on this attribute.
Note that submission of jobs from a Windows machine requires
a stashed password to allow Condor to impersonate the user submitting
the job.
To stash a password, use the condor_store_cred command.
See its manual page for details.
SUBMIT DESCRIPTION FILE COMMANDS
Each submit description file describes one cluster of jobs to be
placed in the Condor execution pool. All jobs in a cluster must share
the same executable, but they may have different input and output files,
and different program arguments. The submit description file is
the only command-line argument to condor_submit.
The submit description file must contain one executable command and at least one
queue command. All of the other commands have default actions.
The commands which can appear in the submit description file are
numerous. They are listed here in alphabetical order by category.
BASIC COMMANDS
- arguments = <argument_list>
-
List of arguments to be supplied
to the program on the command line. In the Java Universe, the first
argument must be the name of the class containing main.
There are two permissible formats for specifying arguments. The new
syntax supports uniform quoting of spaces within arguments; the old
syntax supports spaces in arguments only in special circumstances.
In the old syntax, arguments are delimited (separated) by space
characters. Double-quotes must be escaped with a backslash (i.e. put
a backslash in front of each double-quote).
Further interpretation of the argument string differs depending on the
universe and operating system. On Windows, your argument string is
simply passed verbatim (other than the backslash in front of
double-quotes) to the Windows application. Most Windows applications
will allow you to put spaces within an argument value by surrounding
the argument with double-quotes. In the grid universe, arguments are
passed verbatim (other than the backslash in front of double-quotes)
into the RSL string used by Globus. See
section 5.3.2 for further details. In all
other cases, there is no further interpretation of the arguments.
Example:
arguments = one \"two\" 'three'
Produces in the Unix vanilla universe:
argument 1: one
argument 2: "two"
argument 3: 'three'
Here are the rules for using the new syntax:
- Put double quotes around the entire argument string. This
distinguishes the new syntax from the old, because these double-quotes
are not escaped with backslashes, as required in the old syntax. Any
literal double-quotes within the string must be escaped by repeating
them.
- Use whitespace (e.g. spaces or tabs) to separate arguments.
- To put any whitespace in an argument, you must surround the
space and as much of the surrounding argument as you like with
single-quotes.
- To insert a literal single-quote, you must repeat it anywhere
inside of a single-quoted section.
Example:
arguments = "one ""two"" 'spacey ''quoted'' argument'"
Produces:
argument 1: one
argument 2: "two"
argument 3: spacey 'quoted' argument
Notice that in the new syntax, backslash has no special meaning. This
is for the convenience of Windows users.
- environment = <parameter_list>
- List of environment
variables.
There are two different formats for specifying the environment
variables: the old format and the new format. The old format is
retained for backward-compatibility. It suffers from a
platform-dependent syntax and the inability to insert some special
characters into the environment.
The new syntax for specifying environment values:
- Put double quote marks around the entire environment string. This
distinguishes the new syntax from the old.
The old syntax does not have double quote marks around it.
Any literal double quote marks within the string
must be escaped by repeating the double quote mark.
- Each environment entry has the form
<name>=<value>
- Use whitespace (space or tab characters) to separate environment entries.
- To put any whitespace in an environment entry, surround
the space and as much of the surrounding entry as desired with
single quote marks.
- To insert a literal single quote mark, repeat the single quote mark
anywhere inside of a section surrounded by single quote marks.
Example:
environment = "one=1 two=""2"" three='spacey ''quoted'' value'"
Produces the following environment entries:
one=1
two="2"
three=spacey 'quoted' value
Under the old syntax, there are no double quote marks surrounding the
environment specification. Each environment entry remains of the form
<name>=<value>
Under Unix, list multiple environment entries by separating them with
a semicolon (;). Under Windows, separate multiple entries with a
vertical bar (|). There is no way to insert a literal semicolon
under Unix or a literal vertical bar under Windows. Note that spaces
are accepted, but rarely desired, characters within parameter names
and values, because they are treated as literal characters, not
separators or ignored whitespace. Place spaces within the parameter
list only if required.
A Unix example:
environment = one=1;two=2;three="quotes have no 'special' meaning"
This produces the following:
one=1
two=2
three="quotes have no 'special' meaning"
- error = <pathname>
-
A path and file name used by Condor to capture any
error messages the program would normally write to the screen
(that is, this file becomes stderr).
If not specified, the default value of
/dev/null is used for submission to a Unix machine.
If not specified, error messages are ignored
for submission to a Windows machine.
More than one job should not use the same error file, since
this will cause one job to overwrite the errors of another.
The error file and the output file should not be the same file
as the outputs will overwrite each other or be lost.
For grid universe jobs, error may be a URL that the Globus
tool globus_url_copy understands.
- executable = <pathname>
-
An optional path and a required file name of the executable file for this
job cluster. Only one executable command within a
submit description file is guaranteed to work properly.
More than one often works.
If no path or a relative path is used, then the executable file
is presumed to be relative
to the current working directory of the user when the
condor_submit command is issued.
If submitting into the standard universe (the default),
then the named executable must have been re-linked with the Condor
libraries (such as via the condor_compile command). If submitting into
the vanilla universe, then the named executable need not be re-linked and
can be any process which can run in the background (shell scripts work
fine as well). If submitting into the Java universe, then the argument
must be a compiled .class file.
- getenv = <True | False>
- If getenv is set to
True, then condor_submit will copy all of the user's current
shell environment variables at the time of job submission into the job
ClassAd. The job will therefore execute with the same set of environment
variables that the user had at submit time. Defaults to False.
- input = <pathname>
-
Condor assumes that its jobs are
long-running, and that the user will not wait at the terminal for their
completion. Because of this, the standard files which normally access
the terminal, (stdin, stdout, and stderr),
must refer to files. Thus,
the file name specified with input should contain any keyboard
input the program requires (that is, this file becomes stdin).
If not specified, the default value
of /dev/null is used for submission to a Unix machine.
If not specified, input is ignored
for submission to a Windows machine.
For grid universe jobs, input may be a URL that the Globus
tool globus_url_copy understands.
Note that this command does not refer to the command-line
arguments of the program. The command-line arguments are specified by
the arguments command.
- log = <pathname>
- Use log to specify a file name where
Condor will write a log file of what is happening with this job cluster.
For example, Condor will place a log entry into this file
when and where the job begins running,
when the job produces a checkpoint, or moves (migrates) to another machine,
and when the job completes.
Most users find specifying a log file to be handy;
its use is recommended. If no log command is specified,
Condor does not create a log for this cluster.
- log_xml = <True | False>
-
If log_xml is True,
then the log file will be written in ClassAd XML.
If not specified, XML is not used.
Note that the file is an XML fragment; it is
missing the file header and footer.
Do not mix XML and non-XML within a single file.
If multiple jobs write to a
single log file, ensure that all of the jobs specify
this option in the same way.
- notification = <Always | Complete | Error | Never>
-
Owners of Condor jobs are notified by
e-mail when certain events occur.
If defined by Always, the owner will be notified
whenever the job produces a checkpoint, as well as when the job completes.
If defined by Complete (the default), the owner will
be notified when the job terminates.
If defined by Error, the owner will only be notified
if the job terminates abnormally.
If defined by Never, the owner will not receive e-mail,
regardless of what happens to the job.
The statistics included in the e-mail are documented in
section 2.6.7.
- notify_user = <email-address>
-
Used to specify the e-mail
address to use when Condor sends e-mail about a job. If not specified,
Condor defaults to using the e-mail address defined by
job-owner@UID_DOMAIN
where the configuration variable UID_DOMAIN
is specified by the Condor site administrator.
If UID_DOMAIN has not been specified,
Condor sends the e-mail to:
job-owner@submit-machine-name
- output = <pathname>
-
The output file captures
any information the program would ordinarily write to the screen
(that is, this file becomes stdout).
If not specified, the default value of
/dev/null is used for submission to a Unix machine.
If not specified, output is ignored
for submission to a Windows machine.
Multiple jobs should not use the same output
file, since this will cause one job to overwrite the output of
another.
The output file and the error file should not be the same file
as the outputs will overwrite each other or be lost.
For grid universe jobs, output may be a URL that the Globus
tool globus_url_copy understands.
Note that if a program explicitly opens and writes to a file,
that file should not be specified as the output file.
- priority = <integer>
-
A Condor job priority
can be any integer, with 0 being the default.
Jobs with higher numerical priority will
run before jobs with lower numerical priority. Note that this priority
is on a per user basis.
One user with many jobs may use this command
to order his/her own jobs,
and this will have no effect on whether or
not these jobs will run ahead of another user's jobs.
- queue [number-of-procs]
- Places one or more
copies of the job into the Condor queue.
The optional
argument number-of-procs specifies how many times to submit the
job to the queue, and it defaults to 1.
If desired, any commands may be placed
between subsequent queue commands, such as new input,
output, error, initialdir,
or arguments commands.
This is handy when submitting multiple runs into one cluster with
one submit description file.
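For example, here is a sketch (the file names are hypothetical) that
queues two jobs in one cluster, each with its own input and output
files:
input = test_case_0.in
output = test_case_0.out
queue
input = test_case_1.in
output = test_case_1.out
queue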
- universe = <vanilla | standard | pvm | scheduler
| local | grid | mpi | java>
-
Specifies which Condor Universe to use when running this job. The Condor
Universe specifies a Condor execution environment. The standard
Universe is the default (except where the configuration variable
DEFAULT_UNIVERSE defines it otherwise),
and tells Condor that this job has been re-linked
via condor_compile with the Condor libraries and therefore supports
checkpointing and remote system calls. The vanilla Universe is an
execution environment for jobs which have not been linked with the
Condor libraries. Note: Use the vanilla Universe to
submit shell scripts to Condor. The pvm Universe is for a
parallel job written with PVM 3.4. The scheduler universe is for a job
that should act as a metascheduler.
The grid universe forwards the job to an external job
management system.
Further specification of the grid universe is done with the
grid_resource command.
The mpi universe is
for running mpi jobs made with the MPICH package.
The java Universe is for programs written to the Java Virtual Machine.
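Putting the basic commands together, a minimal submit description file
for a single vanilla universe job might look like the following sketch
(the executable and file names are hypothetical):
universe = vanilla
executable = myprogram
input = myprogram.in
output = myprogram.out
error = myprogram.err
log = myprogram.log
queue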
COMMANDS FOR MATCHMAKING
- rank = <ClassAd Float Expression>
-
A ClassAd Floating-Point
expression that states how to rank machines which have already met the requirements
expression. Essentially, rank expresses preference. A higher numeric value
equals better rank. Condor will give the job the machine with the
highest rank. For example,
requirements = Memory > 60
rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory
and give the job the machine with the most memory.
See section 2.5.2
within the Condor Users
Manual for complete information on the syntax and available attributes
that can be used in the ClassAd expression.
- requirements = <ClassAd Boolean Expression>
-
The requirements
command is a boolean ClassAd expression which uses C-like operators. In
order for any job in this cluster to run on a given machine, this
requirements expression must evaluate to true on the given machine. For
example, to require that whatever machine executes a Condor job has at
least 64 Mbytes of RAM and has a MIPS performance rating greater than 45,
use:
requirements = Memory >= 64 && Mips > 45
Only one requirements command may be present in a
submit description file.
By default, condor_submit appends the following clauses to
the requirements expression:
- Arch and OpSys are set equal to the Arch and OpSys of the
submit machine. In other words: unless you request otherwise, Condor will give your
job machines with the same architecture and operating system version as
the machine running condor_ submit.
- Disk >= DiskUsage.
The DiskUsage attribute is initialized to the size of the
executable plus the size of any files specified in a
transfer_input_files command.
It exists to ensure there is enough disk space on the
target machine for Condor to copy over both the executable
and needed input files.
The DiskUsage attribute represents the maximum amount of
total disk space required by the job in kilobytes.
Condor automatically updates the DiskUsage attribute
approximately every 20 minutes while the job runs with the
amount of space being used by the job on the execute machine.
- (Memory * 1024) >= ImageSize. To ensure the target machine
has enough memory to run your job.
- If Universe is set to Vanilla, FileSystemDomain is set equal to
the submit machine's FileSystemDomain.
View the requirements of a job
which has already been submitted (along with everything else about the
job ClassAd) with the command condor_q -l; see the manual page for
condor_q. Also, see the Condor Users
Manual for complete information on the syntax and available attributes
that can be used in the ClassAd expression.
FILE TRANSFER COMMANDS
- should_transfer_files = <YES | NO | IF_NEEDED >
-
The should_transfer_files setting is used to define if Condor
should transfer files to and from the remote machine where the job
runs.
The file transfer mechanism is used to run jobs which are not in the
standard universe (and therefore cannot use remote system calls for file
access) on machines which do not have a shared file system with the
submit machine.
should_transfer_files equal to YES will cause Condor to
always transfer files for the job.
NO disables Condor's file transfer mechanism.
IF_NEEDED will not transfer files for the job if it is matched
with a resource in the same FileSystemDomain as the submit
machine (and therefore, on a machine with the same shared file
system).
If the job is matched with a remote resource in a different
FileSystemDomain, Condor will transfer the necessary files.
If defining should_transfer_files you must also
define when_to_transfer_output (described below).
For more information about this and other settings related to
transferring files, see section 2.5.4.
Note that should_transfer_files is not supported
for jobs submitted to the grid universe.
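As a sketch of typical usage (the file names are hypothetical), a
vanilla universe job that needs its files moved to and from the execute
machine might specify:
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = data.in, params.cfg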
- stream_error = <True | False>
-
If True, then stderr is streamed back to
the machine from which the job was submitted.
If False, stderr is stored locally
and transferred back when the job completes.
This command is ignored if the job ClassAd attribute
TransferErr is
False.
The default value is True in the grid
universe and False otherwise.
- stream_input = <True | False>
-
If True, then stdin is streamed from the
machine on which the job was submitted.
The default value is False.
The command is only relevant for jobs submitted to
the vanilla or java universes, and
it is ignored by the grid
universe.
- stream_output = <True | False>
-
If True, then stdout is streamed back to
the machine from which the job was submitted.
If False, stdout is stored locally
and transferred back when the job completes.
This command is ignored if the job ClassAd attribute
TransferOut is
False.
The default value is True in the grid
universe and False otherwise.
- transfer_executable = <True | False>
-
This command is applicable to jobs submitted to the grid,
vanilla, and MPI universes.
If transfer_executable is set to
False, then Condor looks for the executable on the remote machine, and
does not transfer the executable over.
This is useful for an already pre-staged
executable; Condor behaves more like rsh.
The default value is True.
- transfer_input_files = < file1,file2,file... >
-
A comma-delimited list of all the files to be transferred into the
working directory for the job before the job is started.
By default, the file specified in the
executable command and any file specified in the input
command (for example, stdin) are transferred.
Only the transfer of files is available; the transfer of
subdirectories is not supported.
For more information about this and other settings related to
transferring files, see section 2.5.4.
- transfer_output_files = < file1,file2,file... >
-
This command forms an explicit list of output files to be transferred
back from the temporary working directory on the execute machine to
the submit machine.
Most of the time, there is no need to use this command.
Other than for grid universe jobs,
if transfer_output_files is not specified,
Condor will automatically transfer back all files in the job's
temporary working directory which have been
modified or created by the job.
This is usually the desired behavior.
Explicitly listing output files is typically only done when the job creates
many files, and the user wants to keep a subset of
those files.
If there are multiple files, they must be delimited with commas.
WARNING: Do not specify transfer_output_files in the
submit description file unless there is a really good reason - it is
best to let Condor figure things out by itself based upon what
the job produces.
For grid universe jobs,
to have files other than standard output and standard error transferred
from the execute machine back to the submit machine,
do use transfer_output_files, listing
all files to be transferred.
These files are found on the execute machine in the
working directory of the job.
For more information about this and other settings related to
transferring files, see section 2.5.4.
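As an illustration of the grid universe case described above, a job
producing two result files (the names here are hypothetical) would list
them explicitly:
transfer_output_files = results.dat, summary.txt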
- transfer_output_remaps = < "name = newname ; name2 = newname2 ..." >
-
This specifies the name (and optionally path) to use when downloading output
files from the completed job. Normally, output files are transferred back
to the initial working directory with the same name they had in the execution
directory. This gives you the option to save them with a different path
or name. If you specify a relative path, the final path will be relative
to the job's initial working directory.
name describes an output file name produced by your job, and
newname describes the file name it should be downloaded to.
Multiple remaps can be specified by separating each with a semicolon.
If you wish to remap file names that contain equals signs or
semicolons, these special characters may be escaped with a backslash.
- when_to_transfer_output = < ON_EXIT | ON_EXIT_OR_EVICT >
-
Setting when_to_transfer_output equal to ON_EXIT will
cause Condor to transfer the job's output files back to the submitting
machine only when the job completes (exits on its own).
The ON_EXIT_OR_EVICT option is intended for fault tolerant
jobs which periodically save their own state and can restart where
they left off.
In this case, files are spooled to the submit machine any time the
job leaves a remote site, either because it exited on its own, or was
evicted by the Condor system for any reason prior to job completion.
The files spooled back are placed in a directory defined by
the value of the SPOOL configuration variable.
Any output files transferred back to the submit machine are
automatically sent back out again as input files if the job restarts.
For more information about this and other settings related to
transferring files, see section 2.5.4.
POLICY COMMANDS
- hold = <True | False>
- If hold is set to
True, then the job will be submitted in the hold state. Jobs in
the hold state will not run until released by condor_release.
Defaults to False.
- leave_in_queue = <ClassAd Boolean Expression>
-
When the ClassAd Expression evaluates to True, the job is
not removed from the queue upon completion.
The job remains in the queue until the user runs condor_rm
to remove the job from the queue.
This allows the user of a remotely spooled job to retrieve output
files in cases where Condor would have removed them as part of
the cleanup associated with completion.
Defaults to False.
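As a hedged sketch, a remotely spooled job could be kept in the queue
until its output has been retrieved, using the job ClassAd attributes
JobStatus (the value 4 denotes a completed job) and StageOutFinish
(assumed here to record when output spooling finished):
leave_in_queue = (JobStatus == 4) && ((StageOutFinish =?= UNDEFINED) || (StageOutFinish == 0))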
- on_exit_hold = <ClassAd Boolean Expression>
- This expression
is checked when the job exits and if true, places the job on hold. If false
then nothing happens and the on_exit_remove expression is
checked to determine if that needs to be applied.
For example:
Suppose a job is known to run for a minimum of an hour.
If the job exits after less than an hour, the job should be placed on
hold and an e-mail notification sent,
instead of being allowed to leave the queue.
on_exit_hold = (CurrentTime - JobStartDate) < (60 * $(MINUTE))
(this assumes the submit description file defines the macro MINUTE,
for example MINUTE = 60).
This expression places the job on hold if it exits for any reason
before running for an hour. An e-mail will be sent to the user explaining
that the job was placed on hold because this expression became True.
periodic_* expressions take
precedence over on_exit_* expressions,
and *_hold expressions take
precedence over *_remove expressions.
If left unspecified, this will default to False.
This expression is available for the vanilla, java, and scheduler universes.
It is additionally available, when submitted from a Unix machine,
for the standard universe.
- on_exit_remove = <ClassAd Boolean Expression>
- This expression
is checked when the job exits and if true, then it allows the job to leave the
queue normally. If false, then the job is placed back into the Idle state.
If the user job runs under the vanilla universe,
then the job restarts from the beginning.
If the user job runs under the standard universe,
then it continues from where it left off, using the last checkpoint.
For example,
suppose you have a job that occasionally segfaults,
but you know if you run the job again with the same data,
chances are that it will finish successfully.
This is
how you would represent that with on_exit_remove
(assuming the signal identifier for segmentation fault is 11 on the
platform where your job will be running):
on_exit_remove = (ExitBySignal == False) || (ExitSignal != 11)
This expression will only let the job leave the queue if the job was
not killed by a signal (it exited normally on its own) or if it was
killed by a signal other than 11 (representing segmentation fault).
So, if it was killed by signal 11, it will stay in the job queue.
In any other case of the job exiting,
the job will leave the queue as it normally would have done.
As another example,
if your job should only leave the queue if it exited on its own with
status 0,
you would use this on_exit_remove expression:
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
If the job was killed by a signal or exited with a non-zero exit
status, Condor would leave the job in the queue to run again.
If left unspecified, the on_exit_remove expression will
default to True.
periodic_* expressions take
precedence over on_exit_* expressions,
and *_hold expressions take
precedence over *_remove expressions.
This expression is available for the vanilla, java, and scheduler universes. It
is additionally available, when submitted from a Unix machine, for the
standard universe. Note that the condor_schedd daemon,
by default, only checks
these periodic expressions once every 300 seconds. The period of
these evaluations can be adjusted by setting the
PERIODIC_EXPR_INTERVAL configuration macro.
- periodic_hold = <ClassAd Boolean Expression>
-
This expression is checked periodically at an interval of
the number of seconds set by
the configuration variable PERIODIC_EXPR_INTERVAL.
If it becomes true, the job will be placed on hold.
If unspecified, the default value is False.
See the Examples section for an example of a periodic_*
expression.
periodic_* expressions take
precedence over on_exit_* expressions,
and *_hold expressions take
precedence over *_remove expressions.
This expression is available for the vanilla, java, and grid universes.
It is additionally available, when submitted from a Unix machine,
for the standard universe. Note that the schedd, by default, only checks
periodic expressions once every 300 seconds. The period of
these evaluations can be adjusted by setting the
PERIODIC_EXPR_INTERVAL configuration macro.
- periodic_release = <ClassAd Boolean Expression>
-
This expression is checked periodically at an interval of
the number of seconds set by
the configuration variable PERIODIC_EXPR_INTERVAL
while the job is in the Hold state.
If the expression becomes True, the job will be released.
This expression is available for the vanilla, java, and grid universes.
It is additionally available, when submitted from a Unix machine,
for the standard universe. Note that the condor_schedd daemon,
by default, only checks
periodic expressions once every 300 seconds. The period of
these evaluations can be adjusted by setting the
PERIODIC_EXPR_INTERVAL configuration macro.
- periodic_remove = <ClassAd Boolean Expression>
-
This expression is checked periodically at an interval of
the number of seconds set by
the configuration variable PERIODIC_EXPR_INTERVAL.
If it becomes True, the job is removed from the queue.
If unspecified, the default value is False.
See the Examples section for an example of a periodic_*
expression.
periodic_* expressions take
precedence over on_exit_* expressions,
and *_hold expressions take
precedence over *_remove expressions.
So, the periodic_remove expression takes precedence over
the on_exit_remove expression,
if the two describe conflicting actions.
This expression is available for the vanilla, java, and grid universes.
It is additionally available, when submitted from a Unix machine,
for the standard universe. Note that the schedd, by default, only checks
periodic expressions once every 300 seconds. The period of
these evaluations can be adjusted by setting the
PERIODIC_EXPR_INTERVAL configuration macro.
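As a sketch of a periodic_* expression (QDate is the job ClassAd
attribute recording the time of submission), the following removes a
job that has remained in the queue for more than one week:
periodic_remove = (CurrentTime - QDate) > (7 * 24 * 60 * 60)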
COMMANDS SPECIFIC TO THE STANDARD UNIVERSE
- allow_startup_script = <True | False>
-
If True, a standard universe job will execute a script
instead of submitting the job,
and the consistency check to see if the executable has
been linked using condor_compile is omitted.
The executable command within the submit description
file specifies the name of the script.
The script is used to do preprocessing before the
job is submitted.
The shell script ends with an exec of the
job executable, such that the process id of the executable is the
same as that of the shell script.
Here is an example script that gets a copy of a machine-specific
executable before the exec.
#! /bin/sh
# get the host name of the machine
host=`uname -n`
# grab a standard universe executable designed specifically
# for this host
scp elsewhere@cs.wisc.edu:${host} executable
# The PID MUST stay the same, so exec the new standard universe process.
exec executable ${1+"$@"}
If this command is not present, then the value
defaults to False.
- append_files = file1, file2, ...
-
If your job attempts to access a file mentioned in this list,
Condor will force all writes to that file to be appended to the end.
Furthermore, condor_submit will not truncate it.
This list uses the same syntax as compress_files, shown below.
This option may yield some surprising results. If several
jobs attempt to write to the same file, their output may be intermixed.
If a job is evicted from one or more machines during the course of its
lifetime, such an output file might contain several copies of the results.
This option should only be used when you wish a certain file to be
treated as a running log instead of a precise result.
This option only applies to standard-universe jobs.
- buffer_files = < "name = (size,block-size) ; name2 = (size,block-size) ..." >
-
- buffer_size = <bytes-in-buffer>
-
- buffer_block_size = <bytes-in-block>
-
Condor keeps a buffer of recently-used data for each file a job accesses.
This buffer is used both to cache commonly-used data and to consolidate small
reads and writes into larger operations that get better throughput.
The default settings should produce reasonable results for most programs.
These options only apply to standard-universe jobs.
If needed, you may set the buffer controls individually for each file using
the buffer_files option. For example, to set the buffer size to 1 Mbyte and
the block size to 256 KBytes for the file input.data, use this command:
buffer_files = "input.data=(1000000,256000)"
Alternatively, you may use these two options to set
the default sizes for all files used by your job:
buffer_size = 1000000
buffer_block_size = 256000
If you do not set these, Condor will use the values given by these
two configuration file macros:
DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000
Finally, if no other settings are present, Condor will use
a buffer of 512 Kbytes
and a block size of 32 Kbytes.
- compress_files = file1, file2, ...
-
If your job attempts to access any of the files mentioned in this list,
Condor will automatically compress them (if writing) or decompress them (if reading).
The compress format is the same as used by GNU gzip.
The files given in this list may be simple file names or complete paths and may
include * as a wildcard. For example, this list causes the file /tmp/data.gz,
any file named event.gz, and any file ending in .gzip to be automatically
compressed or decompressed as needed:
compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files must only
be accessed sequentially. Random access reading is allowed but is very slow,
while random access writing is simply not possible. This restriction may be
avoided by using both compress_files and fetch_files at the same time. When
this is done, a file is kept in the decompressed state at the execution
machine, but is compressed for transfer to its original location.
This option only applies to standard universe jobs.
- fetch_files = file1, file2, ...
-
If your job attempts to access a file mentioned in this list,
Condor will automatically copy the whole file to the executing machine,
where it can be accessed quickly. When your job closes the file,
it will be copied back to its original location.
This list uses the same syntax as compress_files, shown above.
This option only applies to standard universe jobs.
- file_remaps = < "name = newname ; name2 = newname2 ..." >
-
Directs Condor to use a new file name in place of an old one. name
describes a file name that your job may attempt to open, and newname
describes the file name it should be replaced with.
newname may include an optional leading
access specifier, local: or remote:. If left unspecified,
the default access specifier is remote:. Multiple remaps can be
specified by separating each with a semicolon.
This option only applies to standard universe jobs.
If you wish to remap file names that contain equals signs or semicolons,
these special characters may be escaped with a backslash.
- Example One:
- Suppose that your job reads a file named dataset.1.
To instruct Condor
to force your job to read other.dataset instead,
add this to the submit file:
file_remaps = "dataset.1=other.dataset"
- Example Two:
- Suppose that you run many jobs which all read in the same large file,
called very.big.
If this file can be found in the same place on
a local disk on every machine in the pool
(say, /bigdisk/bigfile), you can
instruct Condor of this fact by remapping very.big to
/bigdisk/bigfile and specifying that the file is to be read locally,
which will be much faster than reading over the network.
file_remaps = "very.big = local:/bigdisk/bigfile"
- Example Three:
- Several remaps can be applied at once by separating each with a semicolon.
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"
- local_files = file1, file2, ...
-
If your job attempts to access a file mentioned in this list,
Condor will cause it to be read or written at the execution machine.
This is most useful for temporary files not used for input or output.
This list uses the same syntax as compress_files, shown above.
local_files = /tmp/*
This option only applies to standard universe jobs.
- want_remote_io = <True | False>
-
This option controls how a file is opened and manipulated in a standard
universe job.
If this option is true, which is the default, then the condor_shadow
makes all decisions about how each and every file should be opened by
the executing job.
This entails a network round trip (or more) from the job to the
condor_shadow and back again for every single open(),
in addition to other needed information about the file.
If set to false, then when the job queries the condor_shadow for the
first time about how to open a file, the condor_shadow will inform the
job to automatically perform all of its file manipulation on the local
file system on the execute machine, and any file remapping will be ignored.
This means that there must be a shared file system (such
as NFS or AFS) between the execute machine and the submit machine and that
ALL paths that the job could open on the execute machine must be valid.
The ability of the standard universe job to checkpoint, possibly to a
checkpoint server, is not affected by this attribute.
However, when the job resumes it will be expecting the same file system
conditions that were present when the job checkpointed.
COMMANDS FOR THE GRID
- globus_rematch = <ClassAd Boolean Expression>
-
This expression is evaluated by the condor_gridmanager whenever:
- the globus_resubmit expression evaluates to True
- the condor_gridmanager decides it needs to retry a submission
(as when a previous submission failed to commit)
If globus_rematch evaluates to True,
then before the job is submitted again to Globus,
the condor_gridmanager will request that the condor_schedd daemon
renegotiate with the matchmaker (the condor_negotiator).
The result is that this job will be matched again.
- globus_resubmit = <ClassAd Boolean Expression>
-
The expression is evaluated by the condor_gridmanager each time
the condor_gridmanager gets a job ad to manage.
Therefore, the expression is evaluated:
- when a grid universe job is first submitted to Condor-G
- when a grid universe job is released from the hold state
- when Condor-G is restarted (specifically, whenever the
condor_gridmanager is restarted)
If the expression evaluates to True,
then any previous submission to the grid universe will be
forgotten and this job will be submitted again as a fresh submission to
the grid universe.
This may be useful if there is a desire to give up on a
previous submission and try again.
Note that this may result in the same job running more than
once. Do not treat this operation lightly.
- globus_rsl = <RSL-string>
-
Used to provide any additional Globus RSL
string attributes which are not covered by other submit description
file commands or job attributes. Used for grid universe
jobs, where the grid resource has a grid-type-string of
gt2 or gt3.
- globus_xml = <XML-string>
-
Used to provide any additional attributes in the GRAM XML job description
that Condor writes which are not covered by regular submit description
file parameters. Used for grid type gt4 jobs.
- grid_resource = <grid-type-string> <grid-specific-parameter-list>
-
For each grid-type-string value,
there are further type-specific values that must be specified.
This submit description file command allows each to
be given in a space-separated list.
Allowable grid-type-string values are
gt2, gt3, gt4,
condor, lsf, nordugrid, pbs, and unicore.
See section 5.3 for details on the variety of
grid types.
For a grid-type-string of condor,
the first parameter is the name of the remote condor_schedd
daemon.
The second parameter is the name of the pool to which the remote
condor_schedd daemon belongs.
See section 5.3.1 for details.
For a grid-type-string of gt2,
the single parameter is the name of the pre-WS GRAM resource to be used.
See section 5.3.2 for details.
For a grid-type-string of gt3,
the single parameter is the name of the OGSA GRAM service to be used.
See section 5.3.2 for details.
For a grid-type-string of gt4,
the first parameter is the name of the WS GRAM service to be used.
The second parameter is the name of the WS resource to be used (usually the
name of the back-end scheduler).
See section 5.3.2 for details.
For a grid-type-string of lsf, no additional
parameters are used.
See section 5.3.6 for details.
For a grid-type-string of nordugrid,
the single parameter is the name of the NorduGrid resource to be used.
See section 5.3.3 for details.
For a grid-type-string of pbs, no additional
parameters are used.
See section 5.3.5 for details.
For a grid-type-string of unicore,
the first parameter is the name of the Unicore Usite to be used.
The second parameter is the name of the Unicore Vsite to be used.
See section 5.3.4 for details.
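For example (the host and pool names here are hypothetical), a pre-WS
GRAM resource could be named with
grid_resource = gt2 gatekeeper.example.com/jobmanager-pbs
while a job forwarded to a remote condor_schedd daemon could use
grid_resource = condor schedd.example.com pool.example.com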
- keystore_alias = <name>
-
A string to locate the certificate in a Java keystore file,
as used for a unicore job.
- keystore_file = <pathname>
-
The complete path and file name of the Java keystore file
containing the certificate to be used for a unicore job.
- keystore_passphrase_file = <pathname>
-
The complete path and file name
to the file containing the passphrase protecting a Java keystore
file containing the certificate.
Relevant for a unicore job.
- MyProxyCredentialName = <symbolic name>
-
The symbolic name that identifies a credential to the MyProxy server.
This symbolic name is set as the credential is
initially stored on the server (using myproxy-init).
- MyProxyHost = <host>:<port>
-
The Internet address of the host that is the MyProxy server.
The host may be specified by either a host name
(as in head.example.com) or an IP address
(of the form 123.45.67.89).
The port number is an integer.
- MyProxyNewProxyLifetime = <number-of-minutes>
-
The new lifetime (in minutes) of the proxy after it is refreshed.
- MyProxyPassword = <password>
-
The password needed to refresh a credential on the MyProxy server.
This password is set when the user initially stores
credentials on the server (using myproxy-init).
As an alternative to using MyProxyPassword in the
submit description file,
the password may be specified as a command line argument to condor_submit
with the -password argument.
- MyProxyRefreshThreshold = <number-of-seconds>
-
The time (in seconds) before the expiration of a proxy
that the proxy should be refreshed.
For example, if MyProxyRefreshThreshold is set to the
value 600, the proxy will be refreshed 10 minutes before
it expires.
- MyProxyServerDN = <credential subject>
-
A string that specifies the expected Distinguished Name (credential subject,
abbreviated DN)
of the MyProxy server.
It must be specified when the MyProxy server
DN does not follow the
conventional naming scheme of a host credential.
This occurs, for
example, when the MyProxy server DN begins with a user credential.
- nordugrid_rsl = <RSL-string>
-
Used to provide any additional RSL
string attributes which are not covered by regular submit description
file parameters. Used when the universe is grid,
and the type of grid system is nordugrid.
- transfer_error = <True | False>
-
For jobs submitted to the grid universe only.
If True, then the error output (from stderr) from the job
is transferred from the remote machine back to the submit machine.
The name of the file after transfer is given
by the error command.
If False, no transfer takes place (from the remote machine
to the submit machine),
and the name of the file is given
by the error command.
The default value is True.
- transfer_input = <True | False>
-
For jobs submitted to the grid universe only.
If True, then the job input (stdin) is transferred
from the machine where the job was submitted to the remote machine.
The name of the file that is transferred is given by the
input command.
If False, then the job's input is taken from a pre-staged
file on the remote machine, and
the name of the file is given by the input command.
The default value is True.
For transferring files other than stdin,
see transfer_input_files.
- transfer_output = <True | False>
-
For jobs submitted to the grid universe only.
If True, then the output (from stdout) from the job
is transferred from the remote machine back to the submit machine.
The name of the file after transfer is given
by the output command.
If False, no transfer takes place (from the remote machine
to the submit machine),
and the name of the file is given
by the output command.
The default value is True.
For transferring files other than stdout,
see transfer_output_files.
- x509userproxy = <full-pathname>
- Used to override the default
path name for X.509 user certificates. The default location for X.509 proxies
is the /tmp directory,
which is generally a local file system.
Setting
this value would allow Condor to access the proxy in a shared file system
(for example, AFS).
Condor will use the proxy specified in the submit description file first.
If nothing is specified in the submit description file,
it will use the environment variable X509_USER_PROXY.
If that variable is not present,
it will search in the default location.
x509userproxy is relevant when
the universe is grid,
and the type of grid system is one of gt2, gt3,
gt4, or nordugrid.
COMMANDS FOR PARALLEL, JAVA, SCHEDULER, and PVM UNIVERSES
- hold_kill_sig = <signal-number>
- For the scheduler universe only,
signal-number is the signal delivered
to the job when the job is put on hold
with condor_hold.
signal-number may be either the platform-specific name or value
of the signal.
If this command is not present,
the value of kill_sig is used.
- jar_files = <file_list>
-
Specifies a list of additional JAR files to include when using
the Java universe. JAR files will be transferred along with
the executable and automatically added to the classpath.
- java_vm_args = <argument_list>
-
Specifies a list of additional arguments to the Java VM itself.
When Condor runs the Java program, these are the arguments that
go before the class name. This can be used to set VM-specific
arguments like stack size, garbage-collector arguments
and initial property values.
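A sketch of a Java universe fragment using these commands (the class,
file, and argument names are hypothetical):
universe = java
executable = Example.class
arguments = Example input.txt
jar_files = helper.jar
java_vm_args = -Xmx256m
queue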
- machine_count = <min..max> | <max>
-
For the PVM universe,
both min and max or just
max may be defined.
If machine_count is
specified, Condor will not start the job until it can simultaneously
supply the job with min machines. Condor will continue to try
to provide up
to max machines, but will not delay starting of the job to do so.
If the job is started with fewer than max machines, the job
will be notified via a usual PvmHostAdd notification as additional
hosts come on line.
For the parallel (and therefore, the mpi) universe,
a single value (max) is required.
It is neither a maximum nor a minimum, but
the number of machines to be dedicated to running the job.
- remove_kill_sig = <signal-number>
- For the scheduler universe only,
signal-number is the signal delivered
to the job when the job is removed
with condor_rm.
signal-number may be either the platform-specific name or value
of the signal.
This example shows it both ways for a Linux signal:
remove_kill_sig = SIGUSR1
remove_kill_sig = 10
If this command is not present,
the value of kill_sig is used.
ADVANCED COMMANDS
- copy_to_spool = <True | False>
- If copy_to_spool is set to
True, then condor_submit will copy the executable to the local spool
directory before running it on a remote host. Oftentimes this can be quite
time consuming and unnecessary. By setting it to False, condor_submit
will skip this step. The default is False for grid universe jobs
or when the -spool or -remote options are used; the default is True for
all other jobs.
- coresize = <size>
- Should the user's program abort and produce
a core file, coresize specifies the maximum size in bytes of the
core file which the user wishes to keep. If coresize is not
specified in the command file, the system's user resource limit
"coredumpsize" is used.
This limit is not used on the HP-UX and DUX operating systems.
- deferral_time = <Unix Epoch Timestamp>
-
This option allows a job to begin execution at a specific time
instead of executing as soon as it arrives at the execution
machine. The deferral time is an expression that must
evaluate to a Unix Epoch timestamp (the number of
seconds elapsed since 00:00:00 on January 1, 1970, Coordinated
Universal Time). A job using this option will be delayed for
execution by the condor_starter until the deferral
time arrives. If the job misses its execution time, that is, if
the deferral time is in the past, the job will be aborted and
removed from the queue. The time that the job will start at is
based on the execution machine's system clock.
The following example will set a job to run at exactly
12:00 pm on January 1st, 2006:
DeferralTime = 1136138400
This example will cause a job to always wait 60 seconds after
it arrives at the execution machine before executing:
DeferralTime = (CurrentTime + 60)
To allow for jobs to run even if the deferral time is missed,
please refer to deferral_window.
Please note that scheduler universe jobs are unable to use this
feature because they are not executed by the condor_starter.
A scheduler universe job will fail to be submitted if deferral_time
is defined in its submit description file.
- deferral_window = <number-of-seconds>
-
The deferral window is used in conjunction with the
deferral_time command to allow jobs that
miss their execution time to run anyway. The window is
the number of seconds in the past that Condor is willing
to run a job if the deferral time is missed.
In the example below, the deferral_time always
evaluates to 60 seconds in the past from the current time, but
the job is still allowed to execute because the
deferral_window is 120 seconds:
DeferralWindow = 120
DeferralTime = (CurrentTime - 60)
- image_size = <size>
- This command tells Condor the maximum
virtual image size to which you believe your program will grow during
its execution. Condor will then execute your job only on machines which
have enough resources, (such as virtual memory), to support executing
your job. If you do not specify the image size of your job in the
description file, Condor will automatically make a (reasonably accurate)
estimate about its size and adjust this estimate as your program runs.
If the image size of your job is underestimated, it may crash due to
inability to acquire more address space, e.g. malloc() fails. If the image
size is overestimated, Condor may have difficulty finding machines which
have the required resources. size must be in kbytes, e.g. for
an image size of 8 megabytes, use a size of 8000.
- initialdir = <directory-path>
-
Used to give jobs a directory with respect to file input and output.
Also provides a directory
(on the machine from which the job is submitted)
for the user log, when a full path is not specified.
For vanilla or MPI universe jobs where there is a shared file system,
it is the current working directory on the machine where the
job is executed.
For vanilla, grid, or MPI universe jobs where file transfer mechanisms are
utilized (there is not a shared file system),
it is the directory on the machine from which the job is submitted
where the input files come from, and where the job's output
files go to.
For standard universe jobs,
it is the directory on the machine from which the job is submitted
where the condor_shadow daemon runs;
the current working directory for file input and output accomplished
through remote system calls.
For scheduler universe jobs,
it is the directory on the machine from which the job is submitted
where the job runs;
the current working directory for file input and output with
respect to relative path names.
Note that the path to the executable is not relative to
initialdir; if it is a relative path, it is relative to the
directory in which the condor_submit command is run.
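A common sketch (the directory names are hypothetical, and the
run_* directories are assumed to exist before submission) uses the
pre-defined $(Process) macro so that each job in a cluster does its
file input and output in its own subdirectory:
initialdir = run_$(Process)
queue 10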
- job_lease_duration = <number-of-seconds>
- For vanilla
and java universe jobs only, the duration (in seconds) of a
job lease. The default value is undefined.
See section 2.14.4 for details of job leases.
- kill_sig = <signal-number>
- When Condor needs to kick a job
off of a machine, it will send the job the signal specified by
signal-number. signal-number needs to be an integer which
represents a valid signal on the execution machine. For jobs submitted
to the standard universe, the default value is the number for
SIGTSTP
which tells the Condor libraries to initiate a checkpoint
of the process. For jobs submitted to the vanilla universe,
the default
is SIGTERM
which is the standard way to terminate a program in Unix.
- match_list_length = <integer value>
-
Defaults to the value zero (0).
When match_list_length is defined with an integer value
greater than zero (0),
attributes are inserted into the job ClassAd.
The maximum number of attributes defined is given by the integer
value.
The job ClassAd attributes introduced are given as
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"
The value for each introduced attribute is given by the
value of the Name attribute
from the machine ClassAd of a previous execution (match).
As a job is matched, the definitions for these attributes
will roll, with LastMatchName1 becoming LastMatchName2,
LastMatchName0 becoming LastMatchName1,
and LastMatchName0 being set by the most recent
value of the Name attribute.
An intended use of
these job attributes is in the requirements expression.
The requirements can allow a job to prefer a match with either the same
or a different resource than a previous match.
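As a sketch, a job that prefers not to return to the machine of its
most recent match could pair this command with a requirements clause
along these lines:
match_list_length = 5
requirements = (TARGET.Name =!= LastMatchName0)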
- max_job_retirement_time = <integer expression>
-
An integer-valued expression (in seconds) that
does nothing unless the machine that runs the job has been configured
to provide retirement time
(see section 3.5.9).
Retirement time is a
grace period given to a job to finish naturally
when a resource claim is about to be preempted.
No kill signals are sent during a retirement time.
The default behavior in many cases is to take as much
retirement time as the machine offers,
so this command will rarely appear in a submit description file.
When a resource claim is to be preempted, this expression in the
submit file specifies the maximum run time of the job (in seconds, since
the job started).
This expression has no effect
if it is greater than the maximum retirement time provided
by the machine policy.
If the resource claim is not preempted,
this expression and the machine retirement policy are irrelevant.
If the resource claim is preempted
and the job finishes sooner than the maximum time,
the claim closes gracefully and all is well.
If the resource claim is preempted
and the job does not finish in time,
the usual preemption
procedure is followed (typically a soft kill signal, followed by some
time to gracefully shut down, followed by a hard kill signal).
Standard universe jobs and any jobs running with nice_user
priority have a default max_job_retirement_time of 0,
so no retirement time is utilized by default.
In all other cases,
no default value is provided,
so the maximum amount of retirement time is utilized by default.
Setting this expression does not affect the job's resource
requirements or preferences.
For a job to only run on
a machine offering at least a certain amount of retirement time,
or to preferentially run on such machines, explicitly
specify this in the requirements and/or rank expressions.
- nice_user = <True | False>
- Normally, when a machine
becomes available to Condor, Condor decides which job to run based upon
user and job priorities. Setting nice_user equal to True
tells Condor not to use your regular user priority, but that this job
should have last priority among all users and all jobs. So jobs
submitted in this fashion run only on machines which no other
non-nice_user job wants -- a true "bottom-feeder" job! This is very
handy if a user has some jobs they wish to run, but do not wish to use
resources that could instead be used to run other people's Condor jobs. Jobs
submitted in this fashion have "nice-user." prepended to
the owner name when viewed from condor_q or condor_userprio. The
default value is False.
- noop_job = <ClassAd Boolean Expression>
-
When this boolean expression is True,
the job is immediately removed from the queue,
and Condor makes no attempt at running the job.
The log file for the job will show a
job submitted event and a job terminated event,
along with an exit code of 0,
unless the user specifies a different signal or exit code.
- noop_job_exit_code = <return value>
-
When noop_job is in the submit description file
and evaluates to True,
this command allows the job
to specify the return value as shown in the job's log file
job terminated event.
If not specified, the job will show as having terminated with status 0.
This overrides any value specified with noop_job_exit_signal.
- noop_job_exit_signal = <signal number>
-
When noop_job is in the submit description file
and evaluates to True,
this command allows the job
to specify the signal number that the job's log event will show
the job having terminated with.
- remote_initialdir = <directory-path>
-
The path specifies the directory in which the job is to be
executed on the remote machine. This is currently supported in all
universes except for the standard universe.
- rendezvousdir = <directory-path>
- Used to specify the
shared file system directory to be used for file system authentication
when submitting to a remote scheduler. Should be a path to a preexisting
directory.
- +<attribute> = <value>
- A line which begins with a '+'
(plus) character instructs condor_submit to insert the
following attribute into the job ClassAd with the given
value.
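For example (the attribute name here is hypothetical, chosen by the
submitter), the following line inserts a custom attribute that policy
expressions could then reference:
+Department = "physics"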
In addition to commands, the submit description file can contain macros
and comments:
- Macros
- Parameterless macros in the form of $(macro_name)
may be inserted anywhere in Condor submit description files. Macros can be
defined by lines in the form of
<macro_name> = <string>
Three pre-defined macros are supplied by the submit description file parser.
The third of the pre-defined macros is only relevant to MPI universe
jobs.
The
$(Cluster) macro supplies the value of the
ClusterId job
ClassAd attribute, and the
$(Process) macro supplies the value of the
ProcId job
ClassAd attribute.
These macros are
intended to aid in the specification of input/output files, arguments,
etc., for clusters with lots of jobs, and/or could be used to supply a
Condor process with its own cluster and process numbers on the command
line. The $(Process) macro should not be used for PVM jobs.
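For example, to hand each process its own cluster and process
numbers on the command line while keeping output files distinct:
arguments = $(Cluster) $(Process)
output = out.$(Cluster).$(Process)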
The
$(Node) macro is defined only for MPI universe jobs.
It is a unique value assigned for the duration of the job
that essentially identifies the machine on which a program is
executing.
If the dollar sign ($) is desired as a literal character,
then use
$(DOLLAR)
In addition to the normal macro, there is also a special kind of macro
called a substitution macro
that allows the substitution of
expressions defined on the resource machine itself (obtained after a match
to the machine has been made) into specific expressions within the
submit description file. The substitution macro is of the form:
$$(attribute)
A common use of this macro is for the heterogeneous
submission of an executable:
executable = povray.$$(opsys).$$(arch)
Values for the opsys and arch attributes are substituted at
match time for any given resource. This allows Condor to automatically
choose the correct executable for the matched machine.
An extension to the syntax of the substitution macro provides an
alternative string to use if the machine attribute within the
substitution macro is undefined.
The syntax appears as:
$$(attribute:string_if_attribute_undefined)
An example using this extended syntax provides a path name to a
required input file.
Since the file can be placed in different locations on
different machines, the file's path name is given as an argument
to the program.
arguments = $$(input_file_path:/usr/foo)
On the machine, if the attribute input_file_path is not
defined, then the path /usr/foo is used instead.
The environment macro, $ENV, allows the evaluation of an environment
variable to be used in setting a submit description file command.
The syntax used is
$ENV(variable)
An example submit description file command that uses this functionality
evaluates the submittor's home directory in order to set the
path and file name of a log file:
log = $ENV(HOME)/jobs/logfile
The environment variable is evaluated when the submit description
file is processed.
The $RANDOM_CHOICE macro allows a random choice to be made
from a given list of parameters at submission time.
Where an expression needs a random value,
the macro may appear as
$RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the listed parameter values will be chosen.
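For example, to hand each submission one of several seed values
chosen at random (the -seed program option is a made-up
illustration):
arguments = -seed $RANDOM_CHOICE(101,102,103)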
- Comments
- Blank lines and lines beginning with a
pound sign
('#')
character are ignored by the submit description file parser.
OPTIONS
- -verbose
- Verbose output - display the created job ClassAd.
- -name schedd_name
- Submit to the specified condor_ schedd.
Use this option to submit to a condor_ schedd other than the default local one.
- -remote schedd_name
- Submit to the specified
condor_ schedd, spooling all required input files over the network connection.
This option is equivalent to using both -name and -spool.
- -pool pool_name
- Look in the specified pool for the
condor_ schedd to submit to.
This option is used with -name or -remote.
- -disable
- Disable file permission checks.
- -password passphrase
- Specify a password to the
MyProxy server.
- -debug
- Cause debugging information to be sent to stderr,
based on the value of the configuration variable SUBMIT_DEBUG.
- -append command
- Augment the commands in the submit
description file with the given command.
The appended command is treated as if it immediately preceded the Queue
command within the submit description file,
coming after all other commands.
The submit description file itself is not modified.
Specify multiple commands by repeating the -append option,
one command per option.
Commands containing spaces must be enclosed in double quote
marks.
- -spool
- Spool all required input files, user log, and
proxy over the connection to the condor_ schedd.
After submission, modify
local copies of the files without affecting your jobs. Any output
files for completed jobs need to be retrieved with condor_ transfer_data.
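A sketch of the round trip, submitting with spooling and later
retrieving output (the -all option of condor_ transfer_data fetches
files for all of the user's jobs):
condor_submit -spool mysubmitfile
# later, once the jobs have completed:
condor_transfer_data -all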
- submit description file
- The pathname to the submit description
file. If this optional argument is missing or equal to ``-'', then the commands
are taken from standard input.
EXIT STATUS
condor_ submit will exit with a status value of 0 (zero) upon success, and a
non-zero value upon failure.
EXAMPLES
- Submit Description File Example 1: This example queues three jobs for
execution by Condor. The first will be given command line arguments of
15 and 2000, and it will write its standard output
to foo.out1.
The second will be given command line arguments of
30 and 2000, and it will
write its standard output to foo.out2.
Similarly, the third will have
arguments of
45 and 6000, and it will use foo.out3 for its standard
output. Standard error output (if any) will be written to
foo.err1, foo.err2, and foo.err3, respectively.
####################
#
# submit description file
# Example 1: queuing multiple jobs with differing
# command line arguments and output files.
#
####################
Executable = foo
Universe = standard
Arguments = 15 2000
Output = foo.out1
Error = foo.err1
Queue
Arguments = 30 2000
Output = foo.out2
Error = foo.err2
Queue
Arguments = 45 6000
Output = foo.out3
Error = foo.err3
Queue
- Submit Description File Example 2: This submit description file
example queues 150
runs of program foo which must have been compiled and linked for
Sun workstations running Solaris 8.
Condor will not attempt
to run the processes on machines which have less than 32 Megabytes of
physical memory, and it will run them on machines which have at least 64
Megabytes, if such machines are available.
Stdin, stdout, and stderr will
refer to in.0, out.0, and err.0 for the first run
of this program (process 0).
Stdin, stdout, and stderr will refer to
in.1, out.1, and err.1 for process 1, and so forth.
A log file containing entries
about where and when Condor runs, takes checkpoints, and migrates processes
in this cluster will be written into file foo.log.
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################
Executable = foo
Universe = standard
Requirements = Memory >= 32 && OpSys == "SOLARIS28" && Arch == "SUN4u"
Rank = Memory >= 64
Image_Size = 28 Meg
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
- Command Line example: The following command uses the
-append option to add two commands before the jobs are queued.
A log file and an error log file are specified.
The submit description file is unchanged.
condor_submit -a "log = out.log" -a "error = error.log" mysubmitfile
Note that each of the added commands is contained within quote marks
because there are space characters within the command.
- periodic_remove example:
A job should be removed from the queue,
if the total suspension time of the job
is more than half of the run time of the job.
Including the command
periodic_remove = CumulativeSuspensionTime >
((RemoteWallClockTime - CumulativeSuspensionTime) / 2.0)
in the submit description file causes this to happen.
GENERAL REMARKS
- For security reasons, Condor will refuse to run any jobs submitted
by user root (UID = 0) or by a user whose default group is group wheel
(GID = 0). Jobs submitted by user root or a user with a default group of
wheel will appear to sit forever in the queue in an idle state.
- All pathnames specified in the submit description file must be
less than 256 characters in length, and command line arguments must be
less than 4096 characters in length; otherwise, condor_ submit gives a
warning message but the jobs will not execute properly.
- Behavior is unpredictable if the user makes
the mistake of directing multiple Condor jobs to write to the
same file, or alters files that still need to be accessed
by a Condor job in the queue.
For example, compressing data or
output files before a Condor job has completed is a common mistake.
- To disable checkpointing for Standard Universe jobs, include the
line:
+WantCheckpoint = False
in the submit description file before the queue command(s).