8.4 Development Release Series 6.7
This is the development release series of Condor.
The details of each version are described below.
- Condor no longer supports SGI IRIX platforms. No further
releases for this platform will be built or distributed.
- condor_submit on Windows no longer checks that the schedd has
access to the submitter's credential if invoked with the -n or
-r option. It is therefore necessary to make sure ahead of time
that the credential is correctly stored with condor_store_cred
before doing a remote submit.
- Version 1.3.2 of the Generic Connection Broker (GCB) library is
now used for building Condor, and it is the 1.3.2 versions of the
gcb_broker and gcb_relay_server programs that are
included in this release.
For more information about GCB, see section 3.7.3.
- Added a variety of built-in functions to ClassAds. Examples of
new functionality include the ability to express conditionals, string
operations, and regular expression matching.
- Condor can now map authenticated names (e.g. an X509 subject
name or Kerberos principal) to canonical Condor user names via a
- condor_stats and the view server are now aware of the new backfill
state for machines, and record and report statistics on it.
- Condor now supports running backfill jobs on Windows machines.
See section 3.13.9 for
more information about running backfill jobs with Condor.
- Condor-C is now supported on Windows. When using Condor-C to
direct a job to a Windows remote schedd, one must be careful to ensure
that their credential is accessible to the remote schedd and that the
NTDomain attribute in the remote job ClassAd is set
correctly. In particular, if the local schedd resides in a different
Windows domain from that of the remote schedd, it is necessary to
include a line like the following in the submit file:
+remote_NTDomain = "OTHERDOMAIN"
- Added a SUBMIT_MAX_PROCS_IN_CLUSTER configuration
parameter to allow administrators to limit the number of jobs that can
be submitted in a single cluster when using condor_submit. This
parameter defaults to 0, which implies no limit.
- Added the configuration file parameter QUEUE_ALL_USERS_TRUSTED, which can
be used to disable authorization checks to the job queue. See
section 3.3.11.
- condor_dagman now re-checks immediately before job submission
that every node job submit file defines a log file.
- condor_dagman now requires that all Stork submit files used
in a DAG define a log file.
- New macro functionality in job ClassAds: $$([ClassAd Expression]). The
contained ClassAd expression is evaluated when the job is matched. "My"
refers to attributes in the job's ClassAd, and "Target" refers to attributes
in the machine ClassAd.
- Condor can now run a program to obtain its configuration.
If a configuration filename (such as the environment variable CONDOR_CONFIG
or the configuration parameter LOCAL_CONFIG_FILE) ends with a vertical
bar ("|"), it is executed and its standard output is parsed
for configuration parameters. If LOCAL_CONFIG_FILE is used in this way,
then it can only contain a single item, and spaces in the value will be
interpreted as part of the command to be executed.
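The trailing-bar convention described above can be illustrated with a minimal Python sketch. It is not Condor's implementation: the function names `parse_config_text` and `load_config_source` are hypothetical, and the parser assumes only simple `NAME = value` lines with `#` comments, which is far less than the real configuration syntax supports.

```python
import shlex
import subprocess


def parse_config_text(text):
    """Parse simple NAME = value lines, skipping blank lines and # comments.

    Simplifying assumption: no macros, continuations, or other syntax.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params


def load_config_source(source):
    """Load parameters from a file, or from a command if source ends with '|'."""
    source = source.strip()
    if source.endswith("|"):
        # Everything before the bar, spaces included, is the command to run.
        command = source[:-1].strip()
        result = subprocess.run(
            shlex.split(command), capture_output=True, text=True, check=True
        )
        return parse_config_text(result.stdout)
    with open(source) as handle:
        return parse_config_text(handle.read())
```

Because the whole value before the bar is treated as one command line, a source written this way cannot also list additional files, which matches the single-item restriction noted for LOCAL_CONFIG_FILE.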
- Added the condor_dagman configuration parameter
DAGMAN_PROHIBIT_MULTI_JOBS, which prohibits condor_dagman
from running a DAG that references node job submit files that
queue multiple jobs (other than parallel universe).
- A number of types of failures to run a job now result in the job
going on hold, rather than immediately being returned to the idle
state to be tried again. This currently does not apply to
standard universe jobs. The types of errors that now result in the
job going on hold are failure to execute the specified program,
failure to transfer files, failure to open input or output files,
and failure to access the job's initial working directory.
In all such cases, a specific hold reason is specified in the job
ClassAd, along with a numeric hold code and subcode. If you wish to
automatically retry in such cases (the old behavior), then you can
specify a PeriodicRelease expression that checks for
specific hold states.
- Strong authentication using SSL is now available for web-service clients
using the SOAP (BirdBath) interface commands. The Condor daemons can
communicate via HTTPS on a specified port, and clients must present a
client-side SSL certificate.
- Previously, the condor_schedd would communicate with only one
web-service client at a time. This restriction has now been removed, so the
schedd can handle multiple simultaneous transactions via the SOAP (BirdBath)
interface.
- condor_submit will now issue a warning if the user/job log
is on an NFS-mounted file system.
- When a job terminates or is removed and its working directory is
on an NFS-mounted file system, the condor_schedd creates and removes a
file in the working directory to force the NFS client to sync with the
NFS server and see any files written by the job.
- Non-blocking connect operations are now used in two cases:
sending ClassAd updates from Condor daemons to the collector and
sending match information from the negotiator to the startd. Both of
these operations are UDP-based (unless you enable TCP updates to the
collector), so non-blocking connects would not be an issue, except
that TCP connections are required whenever it is necessary to
establish a new security session. An example where the new non-blocking
behavior is helpful is when a machine is down and TCP connections to it
time out. Daemons that try to connect to it using non-blocking connections
will no longer stop everything they are doing for the full duration of the
timeout.
This feature may result in a greater number of sockets being open at
one time than previously (especially in the negotiator). There is not
yet support for placing a limit on the number of simultaneous
connection attempts. Therefore, if you need to turn off the use of
non-blocking connects, you may do so with the following
configuration settings:
NONBLOCKING_COLLECTOR_UPDATE = False
NEGOTIATOR_USE_NONBLOCKING_STARTD_CONTACT = False
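For readers unfamiliar with the technique, the following Python sketch shows the general shape of a non-blocking TCP connect on a POSIX-style system. It is a generic illustration and not taken from Condor's source; the function names `start_nonblocking_connect` and `connect_finished` are hypothetical.

```python
import errno
import select
import socket


def start_nonblocking_connect(address):
    """Begin a TCP connect without waiting for it to complete."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    code = sock.connect_ex(address)
    # 0 means already connected; EINPROGRESS/EWOULDBLOCK mean "still pending".
    if code not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(code, "connect failed immediately")
    return sock


def connect_finished(sock, wait_seconds=0.0):
    """Return True once connected, False if still pending; raise on failure.

    Assumes POSIX semantics, where a finished connect (successful or not)
    makes the socket writable and SO_ERROR holds the outcome.
    """
    _, writable, _ = select.select([], [sock], [], wait_seconds)
    if not writable:
        return False
    error = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if error:
        raise OSError(error, "connect failed")
    return True
```

A caller can poll `connect_finished` with a zero wait between other tasks, so an unreachable host costs nothing beyond an open socket until the attempt fails. That is also why many pending attempts translate into many simultaneously open sockets.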
- condor_quill now includes an additional "schema version" table.
If the database was created prior to 6.7.20, the new table is automatically
added by the 6.7.20 Quill daemon.
- A condor_schedd can now submit jobs directly to a local PBS
or LSF installation. To do this, submit the job with a
universe of grid and a grid_resource of pbs or lsf.
- Implemented all of the functions from new classads into the
"old" classads now in Condor.
- Condor's format for storing the history file has been improved
so that some queries will now go much faster. In particular,
condor_history now accepts the -backwards option, which will take
advantage of this change. Queries that only reference the job's
cluster id and proc id will be able to take advantage of this speed
increase, and in the near future, more fast queries will be
supported. You need to make no changes in order to deal with this new
history file format, unless you want to be able to search your entire
history file backwards, in which case you should run the new
condor_convert_history program.
- Condor can now delegate a job's GSI X509 credentials when
transferring them over the wire, instead of copying them. This is
much more secure when communications are not encrypted. As this can
be a major performance hit when submitting large numbers of jobs
remotely, the old behavior can be forced by setting
DELEGATE_JOB_GSI_CREDENTIALS to False in the configuration
- Added configuration parameter NO_DNS, which allows Condor
to work on machines with no DNS. When this option is set to True, Condor
will use pseudo-hostnames constructed from a machine's IP address and
DEFAULT_DOMAIN_NAME, rather than attempting to resolve hostnames
into IP addresses and vice-versa.
- The JobLeaseDuration now defaults to 20 minutes for all
jobs that support this feature (everything except standard and PVM
universe, and jobs that request streaming I/O).
This way, by default, if the submit host crashes or there is a short
network outage, the condor_schedd will be able to reconnect to jobs
that were executing at the time of the problem.
- Condor daemons now touch their daemon log file periodically. When
a daemon starts up, it prints to the log the last time the log file was
modified. This lets an admin estimate when a daemon stopped running.
The configuration parameter TOUCH_LOG_INTERVAL sets the time
between touches (in seconds) and defaults to 60 seconds.
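As a rough illustration of how the periodic touch helps, the Python sketch below estimates the window in which a daemon stopped from the log file's modification time. The helper names `touch_log` and `estimate_stop_window` are hypothetical, and the sketch assumes nothing else modified the file after the daemon stopped.

```python
import os


def touch_log(log_path):
    """Update the log file's access and modification times to now."""
    os.utime(log_path, None)


def estimate_stop_window(log_path, touch_interval=60):
    """Return (earliest, latest) timestamps at which the daemon likely stopped.

    The daemon was alive at the last touch and, had it kept running, would
    have touched the file again within touch_interval seconds.
    """
    last_touch = os.path.getmtime(log_path)
    return last_touch, last_touch + touch_interval
```

With the default interval of 60 seconds, the estimate is accurate to within about a minute; a larger TOUCH_LOG_INTERVAL widens the window accordingly.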
- Added the ability to pass a specific condor_config_val program
to the cron/Hawkeye "modules". If "HAWKEYE_CONFIG_VAL" is
specified in the configuration, an environment variable with the same
name and the same value will be added to all cron job environments.
This change has no effect if the above macro is not specified in the
configuration. The above name "HAWKEYE_CONFIG_VAL" is
derived from the cron name (i.e. STARTD_CRON_NAME or
- condor_submit_dag now generates a submit file with
copy_to_spool set to false. This reduces the load and
saves file space on the submit machine, especially if you are running
multiple instances of condor_dagman.
- The authorization levels in Condor's security system now form a
hierarchy. A client with DAEMON or ADMINISTRATOR access also has WRITE
access. A client with WRITE, NEGOTIATOR, or CONFIG access also has
- Added configuration parameter
GRIDMANAGER_EMPTY_RESOURCE_DELAY, which sets how long the
condor_gridmanager retains information about a grid resource after it
has no active jobs to that resource.
- Added configuration parameter JOB_PROXY_OVERRIDE_FILE,
which lets an admin force a particular X509 proxy to be used for all
grid universe jobs, overriding whatever proxy may be specified in the
- condor_dagman no longer uses the popen() system
call when running commands; this provides better security and
allows it to run on Windows without being a service.
- Added a new version of DRMAA which includes fixes and updates per
DRMAA spec finalization.
- For grid-type gt4 jobs, the resource lifetime on the remote server
will be based on job_lease_duration, if it's set.
- Improved the error message in condor_dagman for
pclose() failures after submitting a node job.
- Added new job attribute GridResourceUnavailableTime, which
is equivalent to GlobusResourceUnavailableTime, but is used for
all grid universe jobs. One benefit of this new attribute is that grid
resource up/down user log events are logged correctly when the gridmanager
crashes and restarts.
- Added the ability to set GROUP_AUTOREGROUP on a per-group
basis, using the syntax GROUP_AUTOREGROUP_<groupname> = True/False.
- Added configuration variable SYSAPI_GET_LOADAVG to control
whether Condor should attempt to fetch the system load average.
See section 3.3.3.
- Added configuration variable SCHEDD_ROUND_ATTR_<xxxx>. See the
description in section 3.3.11.
- The password authentication method now works on all
platforms. It was previously only available on Windows. UNIX platforms
will store the pool password in the file defined by the configuration
parameter SEC_PASSWORD_FILE. This file will be owned by the
real UID that Condor runs as and will only be accessible by that user.
- A new tool, condor_userlog_job_counter, has been added.
Given a userlog file as an argument, it determines the number of
queued (i.e., submitted but not yet terminated or aborted) jobs
recorded in that userlog, and returns that value as an exit code. It
returns 255 if there are more than 254 queued jobs or to indicate an
error (e.g., a userlog reading/parsing error, no events found, a job
count <0 or >254, improper usage, etc.).
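The exit-code convention above can be summarized in a short Python sketch. The function `job_counter_exit_code` is hypothetical and only models the mapping from a job count to an exit code; it does not read a userlog. Passing `None` stands in for any error condition.

```python
def job_counter_exit_code(queued_jobs):
    """Map a queued-job count to the exit code described above.

    Counts 0..254 are returned unchanged; anything else, including None
    for "could not determine the count", becomes 255.
    """
    if queued_jobs is None or queued_jobs < 0 or queued_jobs > 254:
        return 255
    return queued_jobs
```

Note that a caller seeing 255 cannot tell from the exit code alone whether the log holds too many queued jobs or an error occurred.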
- The condor_chirp tool has been added to the Windows
The environment variable X509_USER_PROXY is now set to the
full path of the proxy for scheduler universe jobs if a proxy is
associated with the job.
- Fixed a bug where condor_q would exit with a non-zero exit status
even though it found and displayed the requested information or job queue.
- Fixed a bug in the dedicated scheduler where parallel and mpi
jobs with more than one proc in a cluster would only have the
Scheduler attribute set in the first proc.
- Fixed a bug in the condor_collector that could cause it to
crash if it's configured as a view collector
(i.e. KEEP_POOL_HISTORY is TRUE). In particular, machine
ads with a State value of "Backfill" could trigger this crash.
- Fixed a related bug in condor_stats that could cause a crash
when encountering a machine state of "Backfill".
- Disconnected starter-shadow connections (job leases) now work for flocked jobs.
- Fixed numeric value wrap-around bug for the totals in condor_status.
- Grid universe jobs sent to Globus Toolkit 2 resources now
generate an evict user log event when the job transitions from Running
to Idle, along with another execute event when the job restarts. Previously
no events were logged in these cases, leading to the potentially confusing
situation where a job would be Idle in the queue, but the last job log entry
would indicate that the job was Running.
- All grid universe jobs now properly handle the new job attributes
Arguments, Environment, and TransferOutputRemaps.
- In some cases, a restart of Condor was required to properly handle
a change in the undocumented configuration parameter
SIGNIFICANT_ATTRIBUTES. Now, a condor_reconfig is sufficient.
- Fixed a permissions problem that would cause automatic X509 proxy renewal for vanilla universe jobs to fail.
- Fixed a bug introduced in 6.7.17 that caused the configuration parameter
ENABLE_GRID_MONITOR to be ignored. The value would always be
- Improved fault recovery of gt2 grid jobs. This includes a work-around
for Globus bugzilla ticket 871.
- When the condor_gridmanager cancels a job after
GlobusResubmit evaluates to true, it will no longer put the job
on hold if the cancel fails.
- Fixed the default COLLECTOR_QUERY_WORKERS entry in the example
central manager config_config; due to a cut and paste error it was
- In some cases, CondorLoadAvg was reporting a different
result, depending on the setting of NUM_CPUS, even with
everything else, such as the actual number of CPUs, being the same.
The specific case in which this effect was noticeable was when the
machine load was greater than NUM_CPUS. CondorLoadAvg
is now independent of the setting of NUM_CPUS.
- When the Grid Monitor encounters problems, Condor will now try to
restart the Globus JobManagers for the affected grid universe jobs,
limited by GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE. The
previous behavior caused problems with sites that don't have a fork
JobManager, and Condor wouldn't react when a job's proxy expired.
- Fixed a bug that could cause extra Grid Monitor files to accumulate
under /tmp until the condor_gridmanager exited.
- Fixed a problem in which preempting claims waiting on retiring
jobs (i.e. waiting on MaxJobRetirementTime) could get preempted
without sufficient rank or priority (because the new preemption only
had to beat the retiring job, not the preempting claim). Furthermore,
both the new preempting claim and the original preempting claim had
the same claim id, so they collided in a way that ultimately caused
both to be removed, and the respective jobs would go back into
unmatched state. The result was unnecessary negotiation churn and
slower convergence of resource usage to the desired distribution.
Now, preemption of preempting claims during long job retirement is
- Fixed a bug that caused the shadow to transfer a job's files
twice to the starter if the files were stored in Condor's spool
- On some systems, when Condor starts a gridftp server for gt4 grid
jobs, all transfers to or from the server will fail if it's not told
where its executable is located (using '-exec' on the command line).
Condor now gives this option to the gridftp server.
- Fixed a bug in condor_dagman that could cause DAGMan to crash
if all submits fail for several nodes that have POST scripts.
This bug existed in versions 6.7.17 and 6.7.18.
- Fixed a bug in how Condor determines the version of a Condor
executable. This was preventing the grid universe from working on
Tru64 5.1 on Alpha.
- Fixed a bug in the condor_gridmanager that could cause gt2 grid
jobs with an invalid proxy to become stuck (the condor_gridmanager
would do nothing with the jobs and not acknowledge a hold or removal).
- Fixed a bug on Win32 that caused a failure when sending a WM_CLOSE
message to a job when the Condor daemons are running as a normal user (i.e. not
running as LocalSystem). Also, fixed a thread handle leak when sending a
- On Win32, fixed a bug that would cause the condor_master to exit upon a
condor_restart command when started as a service with
a service name other than "condor" (the default name used by the installer).
- Fixed a bug that could cause the condor_master to crash when
sending a shutdown fast to a child process after the
SHUTDOWN_GRACEFUL_TIMEOUT timeout expired.
- Fixed a bug with automatically setting the undocumented
SIGNIFICANT_ATTRIBUTES configuration parameter in order to
speed up negotiation. Previously, the job's Requirements
expression was not correctly considered. With certain scheduling policy
expressions, this bug could have resulted in jobs staying idle in the
queue when they should have been launched.
- It used to be impossible to use the SUBMIT_EXPRS
configuration setting to provide default values for job submit file
keywords that were recognized by condor_submit.
For example, administrators could define a default value for a
custom job attribute, but not something like Notification or
Now, administrators can use SUBMIT_EXPRS for any
settings, whether they are regular condor_submit keywords or
custom job attributes.
- Fixed a bug in the condor_startd that could cause resources to
get stuck in the Backfill/Killing state if both the
START_BACKFILL and EVICT_BACKFILL expressions
evaluated to TRUE at the same time.
Security Bugs Fixed:
- Bugs in previous versions of Condor could allow any user who can
submit jobs on a machine to gain access to the "condor" account
(or whatever non-privileged user the Condor daemons are running as).
This bug cannot be exploited remotely, only by users already logged
onto a submit machine in the Condor pool.
- The security of the "condor_config_val -set" feature was
found to be insufficient, so this feature is now disabled by default.
There are new configuration settings to enable this feature in a
Please read the descriptions of ENABLE_RUNTIME_CONFIG,
ENABLE_PERSISTENT_CONFIG and PERSISTENT_CONFIG_DIR
in the example configuration file shipped with the latest Condor
releases, or in section 3.3.5.
- Added a new LOCAL_CONFIG_DIR configuration setting.
This now allows entire directories of files to be included as though they
were configuration files.
See section 3.3.3 for more information.
- You can now put extra information into the notification
email. The information is a list of attributes, which you provide. For
example, if your submit file has +EmailAttributes = "RemoteHost, Requirements",
then RemoteHost and Requirements will be listed in
the notification email.
- Added a new clipped port of Condor to HP-UX 11 running on
the HP-PA architecture.
- Condor is now much better at recognizing when a grid-type gt2 grid
universe job failure is unrecoverable and at cleaning up failed or canceled
job submissions. This should reduce the number of jobs
that perpetually return to held state when released.
- When job attribute GlobusResubmit evaluates to true for
grid-type gt2 jobs, the
condor_gridmanager will try to cancel the existing job before starting
the new submission. If the cancel attempt fails, the condor_gridmanager
will proceed with the new submission anyway.
- When BIND_ALL_INTERFACES is enabled, Condor daemons
now advertise their IP address as that of the network interface used
to contact the collector. This makes it possible, for example, to
have a schedd on a multi-homed machine flock jobs to Condor pools in
two separate networks, because the schedd can advertise a different IP
address to the two collectors. condor_cod also benefits in the case
where the startd is reachable through a network interface other than
the default one that would normally be advertised. This change also
produces improved default behavior in cases such as condor_glidein
where the startd lands on a dual-homed machine with both public
and private IP addresses.
- In condor_dagman, the informational messages about hitting
the -maxidle, -maxjobs, -maxpre, and -maxpost
limits are no longer printed to the dagman.out file by default.
To see these messages, add -debug 4 to the condor_submit_dag
command line. A summary of the total number of job and script deferrals
is now printed by default each time the node status is printed and at
the end of the dagman.out file. This can be turned off by
setting the debug level to 2 or lower on the condor_submit_dag
command line.
- Added support for a new configuration setting,
For more information, see section 3.3.10.
- The number of CPUs Condor detects may now have an upper bound.
The MAX_NUM_CPUS configuration setting controls this.
- When preempting a claim, the condor_negotiator now prints the
startd rank of the job that is being preempted and the startd rank of the
job that is causing the preemption.
- Improved the error messages from condor_check_userlogs,
especially if it fails because it doesn't have write permission
on the log files (unfortunately, the log reading code requires write
locks to avoid collisions between multiple readers and writers).
- Improved error messages in condor_dagman when getcwd()
fails (this is only relevant if the -UseDagDir flag is used).
- Added QUILL_MANAGE_VACUUM to determine whether Quill needs to
perform vacuuming tasks or not. In the latter case,
vacuuming tasks can be automatically managed by PostgreSQL
version 8.1 onwards. Please see Quill's section in the Administrator's
Manual for more details.
- The Grid Monitor now works at sites where
/etc/grid-security/certificates is out of date, but
$(GLOBUS_LOCATION)/share/certificates is not.
- A new authentication method, PASSWORD, has been added; it
provides mutual authentication between a client and server using a
shared secret. Password authentication currently only works on
Windows, and only for daemon-to-daemon communication.
- Condor on Windows now supports running jobs as the submitting
user. This feature requires the use of a central daemon for storing
users' passwords (the condor_credd). See the example configuration
file condor_config.local.credd included with the Condor
distribution for more information.
- Added a -n option to condor_store_cred to allow for
storing a password to a remote host.
- Support for DRMAA on Windows has been added.
- Kerberos support has been upgraded to use version 1.4.3 of the
Kerberos library. This adds support for Kerberos as an authentication
method on Windows.
- Added the new condor_replication daemon which works with
condor_had to enable replication of data for daemons configured
for high availability. In particular, condor_replication can be
configured to replicate the accountant log so that a fail-over
condor_negotiator can share the user priority state from the
primary condor_negotiator.
- The condor_collector now has the ability to receive ClassAds
via its SOAP interface.
- The default output for condor_status was changed between
6.7.16 and 6.7.17 to support the new Backfill state which
Condor resources can now enter (described below in more detail).
Added two new columns to the Quill database schema to support
historical job queue logs (see MAX_JOB_QUEUE_LOG_ROTATIONS
in the New Features section below). These are log_seq_num and
log_creation_time.
For a description of those two columns, see the schema of
the JobQueuePollingInfo table in section 3.12.3.
Databases created by versions of Quill prior to 6.7.17 must be updated to
reflect these two new columns. This can be achieved either by dropping the
database and letting Quill recreate it on the next polling cycle, or by manually
adding the two columns and initializing their values via the following SQL commands:
alter table jobqueuepollinginfo add column log_seq_num bigint;
alter table jobqueuepollinginfo add column log_creation_time bigint;
update jobqueuepollinginfo set log_seq_num = 0, log_creation_time=0;
If the schema is being changed manually, this must be done before the
condor_quill daemon is started.
- Added support for Condor resources to perform backfill
computations when there are no Condor jobs to run.
Condor can be configured such that whenever a machine is in the
Unclaimed/Idle state and otherwise has nothing else to do, the
condor_startd will automatically spawn backfill jobs to continue
to perform useful work.
Currently, Condor only supports using the Berkeley Open
Infrastructure for Network Computing (BOINC) to provide the backfill
jobs (see http://boinc.berkeley.edu
for more information about BOINC).
See section 3.13.9 for
more information about running backfill jobs with Condor.
At this time, backfill jobs are not supported on Windows machines.
- The history file, which is a flat file for each submitting
computer that stores information about all jobs completed on that
computer, is now rotated automatically. By default, the file will be
rotated when it is more than 20MB and two backup files will be allowed
(for a total of three history files with 60MB of data). This means
that older history will be lost once it is rotated out. You can
disable the history file rotation if you like, and you can change the
number and size of the backup files. condor_history has been updated
to understand these backup history files.
- Added parallel universe support to condor_dagman (condor_dagman
can now handle submit files that submit more than one Condor job proc).
- Added a -format option to the condor_history command which
behaves just like the -format option to condor_status and condor_q.
- Added remove and get_job_attr options to the condor_chirp
command line tool. Changed parallel universe script to use them.
- When the Grid Monitor encounters problems, Condor no longer tries
to restart the Globus JobManagers for all of the affected grid universe
jobs. Restarting the JobManagers can easily bring down a remote headnode.
Condor will attempt to restart the Grid Monitor, but there will be
no update of job status in the meantime.
- When started as root on a Linux 32-bit x86 machine, Condor daemons will
leave core files in the log directory when they crash. Recent changes to the
Linux kernel default to blocking these core files. This change means
Condor behaves more consistently across different Unix-like operating systems.
- Made several changes to make Condor-G much less likely to overload a
pre-WS GRAM server for grid-type gt2 jobs. Added configuration parameter
GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE, which limits the number of
globus-job-manager processes Condor will let run on the server at a time.
Streaming of output for gt2 jobs is disabled if
GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE isn't set to unlimited.
If the Grid Monitor encounters problems, the condor_gridmanager doesn't
restart the globus-job-managers of the affected jobs. Fixed a couple of bugs
in the Grid Monitor that could cause it to spawn extra polling processes
on the server.
- Added support for Parallel scheduling groups for the parallel
universe. This is useful if you have machines connected by InfiniBand
switches, and want to constrain your parallel jobs to never run across
two different switches.
- Added a new suite of tools to dynamically deploy Condor. The
most important of these tools are condor_cold_start and
condor_cold_stop. Another significant subset of this suite consists of
tools to determine whether a process is alive or dead, the most
advanced of which are uniq_pid_midwife and
uniq_pid_undertaker. Currently these programs are only
supported on Linux.
- Added MAX_JOB_QUEUE_LOG_ROTATIONS to control how
many historical job queue logs are kept when the job queue log is
rotated. These historical logs are used by Quill to avoid missing
information in Quill's job history information when the schedd rotates
to a new log. The default value for this configuration setting is 1,
so one old copy of the job queue log file will be kept.
- Added support for DRMAA on the Mac OS X platform.
- Enabled COLLECTOR_QUERY_WORKERS in the default
condor_collector configuration, and set this value to 16. This
replaces the previous implicit default of 0 and will result in a more
responsive condor_collector in the common case.
COLLECTOR_QUERY_WORKERS has no effect on non-UNIX systems (Windows).
- HIGHPORT and LOWPORT can now specify ports
below 1024 when Condor is started as root on Unix systems. This
always worked on Windows.
- It is now possible to specify separate port ranges for
binding incoming (listen) sockets and outgoing (connect) sockets
by using IN_LOWPORT/IN_HIGHPORT and
OUT_LOWPORT/OUT_HIGHPORT. If these are not set, Condor
still falls back to the regular LOWPORT/HIGHPORT range.
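The fallback between the direction-specific and general port settings can be pictured with the Python sketch below. It is illustrative only: `resolve_port_range` is a hypothetical helper, the configuration is modeled as a plain dictionary, and the sketch assumes the fallback applies to the pair as a whole rather than to each bound separately, which the text above does not specify.

```python
def resolve_port_range(config, direction):
    """Pick the port range for direction "IN" or "OUT".

    Uses <direction>_LOWPORT/<direction>_HIGHPORT when both are set,
    otherwise LOWPORT/HIGHPORT; returns None if neither pair is complete.
    """
    specific = (config.get(direction + "_LOWPORT"),
                config.get(direction + "_HIGHPORT"))
    general = (config.get("LOWPORT"), config.get("HIGHPORT"))
    for low, high in (specific, general):
        if low is not None and high is not None:
            return int(low), int(high)
    return None
```

For example, with only IN_LOWPORT/IN_HIGHPORT and LOWPORT/HIGHPORT defined, listen sockets would use the IN range while outgoing connections would use the general range.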
- Port ranges from LOWPORT, HIGHPORT,
IN_LOWPORT, IN_HIGHPORT, OUT_LOWPORT, and
OUT_HIGHPORT are now passed to Globus through the correct
- Previously, the condor_startd would not recompute the
CurrentRank attribute each time a new job was spawned, but
only computed it whenever a new claim was made.
Now, the condor_startd correctly recomputes CurrentRank
each time a new job starts running.
- When running a gridftp server for grid-type gt4 jobs, Condor will now
start the server so as to ignore /etc/grid-security/gridftp.conf and
$GLOBUS_LOCATION/etc/gridftp.conf. These files may contain options that
would cause the gridftp server to fail when not run as root. Also, Condor's gridftp server is started to ensure that it does not erroneously try to load libraries from an existing Globus installation, causing the gridftp server to crash.
- Fixed a bug where jobs using the grid (or globus) universe that specified
an AccountingGroup would never run because the condor_gridmanager would fail
- Fixed a bug introduced in 6.7.14 where the job attributes RemoteUserCpu
and RemoteSysCpu were incorrectly reported as 0 in the history file and the job queue
for non-standard universe jobs.
- Fixed a physical memory reporting bug for the Mac OS X port of Condor.
- Since the addition of the "new" cron syntax (introduced in
version 6.7.11), the condor_startd has (silently) ignored any jobs
defined with the "old" syntax if any jobs are defined with the
"new" syntax. Now, the condor_startd will honor both definitions,
but will log a warning to its log file if any jobs with the "old"
syntax are found (whether or not any new jobs are found).
The condor_schedd (which also has the "cron" logic) will behave in
the same way.
- Fixed the bug that caused the "Cron" job command lines to have
the name added on each invocation.
- Fixed some messages about keyboard and mouse idle time that had been logged
too often in the condor_startd logs under certain conditions to be logged
- Fixed the -dag option to condor_q. Previously, this did not
print DAG node names as it should have. (This bug has existed since
- Fixed a bug that could cause the condor_gridmanager to crash
if the GridJobId attribute for a gt2 job became mangled. The cause of
mangling seen by some users is still unknown.
- Submission from 6.7.15 or 6.7.16 condor_submit to a 6.7.14 or
earlier condor_schedd was not working unless the submit file
explicitly set both arguments and environment using the old syntax.
Now condor_submit automatically converts the environment and
argument syntax when necessary. If the conversion is not possible,
due to limitations in the old syntax, condor_submit will generate
an error message and refuse to complete the submission.
- condor_submit now returns an error if the executable file
specified in the submit file exists but is zero length.
- Support for running a personal Condor on Windows using
condor_ master -f
- Support for NorduGrid jobs was accidentally left out of the
condor_ gridmanager in previous releases. This has been corrected.
- The condor_ starter was refusing to run jobs if it could not
perform a reverse-DNS lookup of the submit-host. Now that this is
fixed, when the reverse-DNS lookup fails, the job can still run, but
Condor will not be able to verify the authenticity of the
submit-host's uid domain. In this case, if you enable
TRUST_UID_DOMAIN , everything will function as normal, minus
the verification of the domain; if you do not enable
TRUST_UID_DOMAIN, the starter will treat the job as being
from a different uid domain, regardless of what uid domain the job
- Fixed a few bugs with transfer_output_remaps that caused
files to be remapped while in a temporary sandbox. Now, the remapping
occurs only when the files are returned to the job submitter.
- Fixed some minor memory leaks in the condor_ gridmanager.
- Fixed a bug in 6.7.15 that was causing startd cron jobs to fail to run
if the old-style configuration setting STARTD_CRON_JOBS was used
instead of the new-style configuration setting STARTD_CRON_JOBLIST .
- If you have not used the undocumented configuration setting
SIGNIFICANT_ATTRIBUTES , there is no need to read the rest of
this paragraph. For sites that have been using
SIGNIFICANT_ATTRIBUTES in the config file, we suggest
removing that setting, because Condor now automatically selects the
list of attributes that are used to cluster job ClassAds into distinct
ads for negotiation. In 6.7.15, any setting of
SIGNIFICANT_ATTRIBUTES will be combined with the automated
list of attributes that Condor produces. In the future, this behavior
may change (e.g. it might override the automated behavior rather than
combining with it). If you know in advance that your use of Condor
heavily depends on SIGNIFICANT_ATTRIBUTES not
including some attributes that are used in requirements
expressions (e.g. ImageSize), then you should be aware that 6.7.15
provides no way for you to suppress such attributes. In
that case, we recommend that you wait for this issue to be addressed
This should not concern most users, especially anyone who is not even
using SIGNIFICANT_ATTRIBUTES , or who has defined
SIGNIFICANT_ATTRIBUTES to include all attributes that are
used in requirements expressions (which is the normal usage case).
- Added a clipped port of Condor to YellowDog Linux 3.0 on the
- ``Cron'' jobs defined with the ``old'' configuration syntax
(usually through ``STARTD_CRON_JOBS'' or ``HAWKEYE_CRON_JOBS'' -
see the condor_ startd manual section for more details) are broken.
Using the ``new'' syntax (``STARTD_CRON_JOBLIST'') will work around
- For those platforms which support it, libcondorapi.so is now
produced and available in the lib/ directory after installing Condor.
- The negotiation protocol between the condor_ schedd and
the condor_ negotiator daemons has been improved for both scalability and
correctness. In general, most sites will see faster negotiation
cycles when many jobs are submitted after upgrading both the negotiator
and all schedd daemons to version 6.7.15. This means the scheduling overhead
per job is reduced. If you have used the undocumented macro
SIGNIFICANT_ATTRIBUTES , please read the note above in the release
notes, because this new automated behavior affects the use of that
configuration setting, in most cases making it unnecessary.
- Due to kernel bugs between the Linux 2.4.x and 2.6.x kernels,
Condor now implements "checkpointing signatures" which allow more fine
grained and automatic control over whether or not a particular machine
is willing to resume a job using a previously created checkpoint. This functionality
is homogenized across all platforms which provide the standard universe
- Grid matchmaking ads are now aged and replaced by the negotiator
based on a configurable classad expression from the condor config file. This
configuration parameter is called STARTD_AD_REEVAL_EXPR .
In previous versions, this was done strictly based on the
UpdateSequenceNumber field in the ad. The default value for the new
parameter behaves the same as the older, hard-coded algorithm.
- Condor can now dynamically start its own gridftp server to handle
file transfers for grid-type gt4 jobs. The gridftp server appears
as a job in the queue and disappears when it's no longer needed.
- Automatic renewal of job proxies from a MyProxy server now works for
all grid universe jobs. Before, it only worked for grid-type gt2 jobs.
- condor_ dagman now reports to its POST scripts uniquely
distinguishable return codes for non-exe job failures (e.g.,
condor_ dagman, batch-system, or other external errors such as failed
batch job submission, or batch job removal). In the past these errors
were reported as various signals (e.g., SIGABRT for job removal or
SIGUSR1 for failed job submission), making it impossible to
distinguish them from the real signals as which they were
masquerading. We now represent these errors using the
previously-unused return-code space below -64 (we start below -1000,
in fact). As before, 0-255 reflect normal exe return codes, and -1 to
-64 represent signals 1 to 64 - but now -1000 and below represent
DAGMan, batch-system, or other external errors.
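The return-code ranges described above can be sketched as a small classifier that a POST script might use. This is a minimal illustration based only on the ranges stated in this note; the function name and the labels it returns are hypothetical, and values outside the described ranges are reported as unassigned rather than guessed at.

```python
def classify_return_code(code: int) -> str:
    """Classify a node return value using the ranges given in the note above."""
    if 0 <= code <= 255:
        return "exit"        # normal executable return code
    if -64 <= code <= -1:
        return "signal"      # job died on signal number -code
    if code <= -1000:
        return "external"    # DAGMan, batch-system, or other external error
    return "unassigned"      # range not described in the note


for value in (0, 3, -9, -1001):
    print(value, classify_return_code(value))
```

A POST script could branch on the "external" case to tell a failed submission apart from a job that was really killed by a signal.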
- Added the DAGMAN_RETRY_NODE_FIRST configuration macro to
condor_ dagman to control whether failed nodes are retried before
or after other ready nodes. The default is FALSE (condor_ dagman's
previous behavior), which means that failed nodes will be retried
after other ready nodes.
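The effect of DAGMAN_RETRY_NODE_FIRST on ordering can be pictured with a simple queue model. This is only a sketch of the behavior described above, not condor_ dagman's actual data structure; the function and node names are hypothetical.

```python
from collections import deque


def requeue_failed(ready: deque, node: str, retry_node_first: bool) -> None:
    """Put a failed node back into the queue of ready nodes."""
    if retry_node_first:
        ready.appendleft(node)   # retried before other ready nodes
    else:
        ready.append(node)       # default: retried after other ready nodes


ready = deque(["B", "C"])
requeue_failed(ready, "A", retry_node_first=False)
print(list(ready))
```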
- Added a new (backward compatible) syntax for job arguments and
environment, allowing special characters to be escaped in a uniform
way. The old limit of 4096 characters in the job arguments has also
been removed. See condor_ submit manual for details of the new
- Added more configuration parameters to the condor_ master's
restart / backoff mechanism. You can now configure the initial value
of the backoff time (via MASTER_BACKOFF_CONSTANT ).
Additionally, you can now set daemon specific values for all of these
parameters. See the condor_ master entry in the manual for more
- condor_ userprio now supports -setaccum -setbegin
-setlast options to set the Accumulated Usage, Begin Usage Time, and
Last Usage time of a submitter. This is in addition to the existing
-setprio and -setfactor options.
These options can be used to safely reconstruct priority information if
the only backup data available is the output from condor_ userprio -l
- An updated DRMAA version is available on supported platforms. The
previous DRMAA implementation has been removed.
- Added new per-job Stork user logs. Stork user logs are now optional, and
specified in the job submit file. Stork now uses Condor user log output
format, including optional XML format. The previous per-server Stork user log in
LOG/Stork.user_log is now deprecated, and will be removed in a future
- condor_ dagman now supports the new, per-job Stork user logs.
"Old-style" Stork logs (specified with -Storklog on the
condor_ submit_dag command line) are supported for now, but this
support will probably be eliminated in the 6.7.16 release.
- Added new per-job Stork input, output and error output file
specifications. Stork job output is now optional, and
specified in the job submit file. Previous, per-server Stork user log in
LOG/Stork-module.stderr and LOG/Stork-module.stdout has been
- The Condor installer for Windows is now MSI compliant.
- Fixed a bug introduced in Condor 6.7.14 that caused the GT2 GAHP
server to ignore configuration parameters LOWPORT and
HIGHPORT and the GT4 GAHP to fail at startup.
- condor_ status -any now reports quill ads when quill is enabled.
- condor_ restart -peaceful was causing condor_ master to only
do a graceful shutdown, rather than a peaceful one. This means that
GRACEFUL_SHUTDOWN_TIMEOUT would come into effect if jobs running
under the startd took too long to finish. However, -peaceful restart
did work in the case where a specific subsystem (e.g. -startd) was
- When run from a privileged (root) Stork server, modules lose
LD_LIBRARY_PATH and other key environment variables, for security
reasons. This is not actually a Stork bug, but a feature of glibc.
When run with a dynamically linked globus-url-copy, the
contributed modules for the HTTP, FTP and GSIFTP transfer protocols
will fail. To compensate, these modules can now restore their
environment via the pre-existing STORK_ENVIRONMENT
configuration macro. Unprivileged (user level) Storks are not
affected by this behavior.
- Jobs that are placed on hold because on_exit_hold
evaluated to TRUE, or jobs that stay in the queue after finishing because
on_exit_remove evaluated to FALSE, again correctly report the
expression as being a "job attribute", not "UNKNOWN (never set)".
- condor_ glidein was creating a default configuration with
UPDATE_INTERVAL =20, which causes unnecessary scaling problems in large
glidein pools. It now simply leaves this value undefined so that
the default behavior may be assumed.
- Fixed a bug that could cause the condor_ gridmanager to crash when
a grid-type condor grid universe job left the queue.
- When using job leases with the condor grid-type, a completed job will
now leave the remote condor_ schedd's queue when the lease expires.
- Fixed a bug in the fullpath() function that tests whether
a file path is a full path - paths of the form "c:/" were not
recognized as full paths, which could lead to something being prepended
to what was already a full path, thereby creating an invalid path.
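The kind of check involved can be sketched as follows. This is not Condor's fullpath() implementation, only a minimal illustration of a test that accepts both Unix-style paths and Windows drive-letter paths with either slash direction; the function name is hypothetical.

```python
def is_full_path(path: str) -> bool:
    """Return True if path looks like a full (absolute) path."""
    # Unix-style root, or a Windows path starting with a backslash.
    if path.startswith(("/", "\\")):
        return True
    # Windows drive letter followed by ':' and either kind of slash,
    # so that "c:/" and "c:\" are both recognized.
    return (
        len(path) >= 3
        and path[0].isascii()
        and path[0].isalpha()
        and path[1] == ":"
        and path[2] in ("/", "\\")
    )


for candidate in ("c:/", "C:\\temp\\job.out", "/usr/local", "job.out"):
    print(candidate, is_full_path(candidate))
```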
- Fixed a problem with WhenToTransferOutput=ALWAYS. The
bug affected jobs that were evicted after producing one or more
intermediate files that were removed by the job before finally running
to completion in a subsequent run. Condor was treating the missing
intermediate files as an error and the job would typically keep
running and failing until the user intervened. In addition to fixing
this bug, file transfer error messages are now propagated back to the
shadow log and the user log, making it easier to debug problems
related to file-transfers.
- condor_ submit was not paying attention to
transfer_output_remaps when doing permissions checks on
- The Condor grid universe can now be used to submit jobs to
Nordugrid and Unicore resources.
- The Condor daemons now automatically restart when the
system clock jumps more than 20 minutes in either
direction. This may happen if the machine running Condor entered
a "sleep" state. This resolves a variety of minor problems.
- Added a -direct debugging option to condor_ q which, when
using or querying a quill installation, allows talking directly to the
rdbms, the quill daemon, or the schedd without performing the queue
location discovery algorithm.
- condor_ schedd provides more flexibility in how local and
scheduler universe jobs are started. The new configuration macros
START_LOCAL_UNIVERSE and START_SCHEDULER_UNIVERSE
allow administrators to control whether condor_ schedd will start
an idle local or scheduler universe job. If a job's respective universe
macro evaluates to true, condor_ schedd will then evaluate the
Requirements expression for the job. Only if both conditions are
met will a job be allowed to begin execution.
- condor_ schedd advertises how many local and scheduler
universe jobs are currently running or idle in its ClassAd. The
total number of running jobs is denoted by the
TotalLocalJobsRunning and TotalSchedulerJobsRunning
attributes. The total number of idle jobs is denoted by the
TotalLocalJobsIdle and TotalSchedulerJobsIdle.
- A job submission can now specify the exact time that it should be
executed at using the DeferralTime attribute. The time is specified
as the number of seconds since the Unix epoch (00:00:00 UTC, Jan 1, 1970).
An additional attribute DeferralWindow can be specified along with
the deferral time that will allow a job to run even if it misses the
execution time. The window is the number of seconds in the past that
Condor will allow for a missed job to execute. This feature is not
supported for scheduler universe jobs.
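As an illustration of the arithmetic involved, the sketch below converts a calendar time to seconds since the Unix epoch and models how the deferral window relates to a missed start time. It is an assumption-laden model of the description above, not Condor's implementation; the function names and the "wait", "run", and "missed" labels are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(when: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(when.timestamp())


def deferral_status(now: int, deferral_time: int, deferral_window: int = 0) -> str:
    """Model the relationship between the current time and the deferral time."""
    if now < deferral_time:
        return "wait"      # the requested execution time has not arrived yet
    if now - deferral_time <= deferral_window:
        return "run"       # on time, or late but still inside the window
    return "missed"        # later than the window allows


start = datetime(2006, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
deferral_time = to_epoch_seconds(start)
print(deferral_time)
print(deferral_status(deferral_time + 30, deferral_time, deferral_window=60))
```

The value printed first is the kind of number that would be given for DeferralTime.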
- Added the concept of a ``controlling'' daemon to the
condor_ master. This feature is currently used only for ``High
Availability'' (HA) configurations involving the condor_ had daemon.
To properly use these Condor HA features you must set this macro.
To configure the condor_ negotiator daemon to be controlled by the
condor_ had, you should add an entry to your condor_config:
MASTER_NEGOTIATOR_CONTROLLER = HAD
This will cause the condor_ master to treat the condor_ had as the
``controller'' of the condor_ negotiator.
- Grid-type condor grid universe jobs now respect configuration
parameters GRIDMANAGER_MAX_PENDING_SUBMIT_PER_RESOURCE and
- Grid universe jobs can now determine their grid_type
in addition to which resource they will be submitted to.
A grid universe job may become any grid_type job,
depending on what resource ad it is matched with.
- Added support for a new configuration value,
This setting can be used to tell the condor_ startd to
automatically publish a new update to the condor_ collector
whenever any of the cron modules it is configured to run have
For more information, see the description of
section 3.3.10 on
- Reduced delay in negotiation when a job is released. A reschedule
request is sent to the negotiator when a job is released from hold. This
reduces the delay in several cases, most notably when using Condor-C or
"condor_submit -s". Previously the negotiator would not be notified and
would normally wait until the next scheduled negotiation cycle.
- Added three new user log events: GridResourceUp, GridResourceDown,
and GridSubmit. They are equivalent to the existing Globus-specific log
events, but are used for all grid universe jobs.
- When known, CPU-usage information will be reflected in the Terminated
user log event for grid universe jobs.
- Changed ClassAd expression evaluation so that logical AND and
logical OR are short-circuited. This means that an expression like
TARGET.foo && TARGET.bar will not evaluate TARGET.bar if
TARGET.foo evaluates to false. This will speed up some
expressions, particularly those involving user-defined
functions. Although this was thoroughly tested, this is the sort of
change that could have subtle, unexpected behavior, so please be on
the lookout for problems that might be caused by it.
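Short-circuit evaluation can be illustrated with operands that are only computed on demand. The sketch below is a generic model in Python, not the ClassAd evaluator: it handles plain true/false values only and ignores any other values a ClassAd expression may produce. All names are hypothetical.

```python
from typing import Callable, List


def logical_and(left: Callable[[], bool], right: Callable[[], bool]) -> bool:
    """Evaluate right only if left is true."""
    if not left():
        return False
    return right()


def logical_or(left: Callable[[], bool], right: Callable[[], bool]) -> bool:
    """Evaluate right only if left is false."""
    if left():
        return True
    return right()


def make_operand(name: str, value: bool, log: List[str]) -> Callable[[], bool]:
    """Build an operand that records its name each time it is evaluated."""
    def operand() -> bool:
        log.append(name)
        return value
    return operand


evaluated: List[str] = []
result = logical_and(make_operand("foo", False, evaluated),
                     make_operand("bar", True, evaluated))
print(result, evaluated)
```

Because the left operand is false, the log shows that only it was evaluated; an expensive right operand would never have been computed.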
- Added the condor_ check_userlogs command, which checks user log
files for "illegal" events.
- New settings
SYSTEM_PERIODIC_RELEASE , and
These expressions behave identically to the job expressions
periodic_remove, but are evaluated for all jobs in the
queue. If not present, they default to FALSE.
- An improved version of the DRMAA C library is available for download from
- Added CLAIM_WORKLIFE configuration option. The startd
will not allow claims older than the specified number of seconds to
run more jobs. Any existing job that is running when the worklife expires,
however, is allowed to continue to run as normal.
- The condor_ dagman log file path is converted to an absolute
path inside condor_ dagman itself, so that the logging works for
multi-directory rescue DAGs (which it didn't before), but the
.condor.sub files are still portable.
- Added the Stork log file (if any) to the list of log files that
condor_ dagman lists in the dagman.out file.
- condor_ dagman now reports the node return value for all failed
- Attribute names forced into the job ad via '+' are no longer
converted to lower-case. This conversion was a side-effect of a bug-fix
in 6.7.11 and caused problems with code that assumed that Condor would
preserve the case of attribute names.
- Job policy expressions are now evaluated on COMPLETED and REMOVED
jobs in the schedd.
- The NEGOTIATOR_MATCHLIST_CACHING setting is broken.
It should not be used.
This setting is
FALSE by default, but if set to
the condor_ negotiator will crash.
- Jobs that are placed on hold because on_exit_hold
evaluated to TRUE, or jobs that stay in the queue after finishing because
on_exit_remove evaluated to FALSE, will erroneously report the reason
as "UNKNOWN (never set)".
- Added a new natively compiled clipped port for the Red Hat
Enterprise Linux 3 IA64 distribution.
- Added complete support for Quill on Windows, so job queues can
now be accessed via a relational database. Quill is now available on all Condor
supported platforms. See page for more information.
- Added support in Condor for the Generic Connection Broker
This is a system for managing network connections across public and
More information about GCB can be found in section 3.7.3 on
- Added a new configuration option, BIND_ALL_INTERFACES
This is a boolean value that controls if Condor should bind and
listen to all the network interfaces on a multi-homed machine.
If set to TRUE, the value of NETWORK_INTERFACE will only
control what IP address is published by Condor daemons, even though
they will still be listening on all interfaces.
The default is FALSE.
- Added a -pool option to condor_ submit. It lets you submit
jobs to a condor_ schedd in a different pool. The other options to
condor_ submit now have long names, but the single-character versions
- ``grid_resource'' can now be used to directly set the new grid
universe job attribute ``GridResource.'' The old attributes still work,
but they will be ignored if ``grid_resource'' is present. As a
side-effect, ``stream_output'' and ``stream_error'' will default to
``False'' for all jobs.
- X509 user proxies are now updated for vanilla universe jobs. If
a job specifically sets x509userproxy and is using file transfer, when
the proxy file is updated, it will be transferred to the running job.
- If a cycle is detected in the DAG while
running, condor_ dagman now prints (in the dagman.out file)
the status of all DAG nodes.
- BeginTransaction call in condor_ schedd's SOAP interface now
notifies the caller if too many transactions are currently running
via an error code of FAIL. Previous behavior was to abort a running
transaction in order to allow the BeginTransaction call to succeed.
- MAX_SOAP_TRANSACTION_DURATION config option added so that a
single transaction cannot take up too many condor_ schedd
resources. This option specifies an optional maximum duration
between SOAP calls in a single transaction.
- If a machine is acting as both a submit and an execute node, and it
cannot communicate with the central manager, it will attempt to run jobs
locally. In Condor-specific terms, if the condor_ schedd fails to hear
from the central manager, it will attempt to run jobs on a locally running
condor_ startd. The SCHEDD_ASSUME_NEGOTIATOR_GONE config
macro was added to support this feature; see
page for details.
- You can now specify per-subsystem entries in your condor_config file
by prepending the subsystem name and a period to the normal name. The
per-subsystem settings take precedence over the regular settings.
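The precedence rule can be sketched as a two-step lookup. This is a simplified model, not Condor's configuration code: the configuration is represented as a plain dictionary, and the setting name EXAMPLE_SETTING is a hypothetical placeholder.

```python
def lookup(config: dict, subsystem: str, name: str, default=None):
    """Look up a setting, preferring the per-subsystem entry if one exists."""
    specific = f"{subsystem}.{name}"   # subsystem name, a period, then the name
    if specific in config:
        return config[specific]
    return config.get(name, default)


config = {
    "EXAMPLE_SETTING": "10",           # regular setting (hypothetical name)
    "SCHEDD.EXAMPLE_SETTING": "20",    # per-subsystem override
}
print(lookup(config, "SCHEDD", "EXAMPLE_SETTING"))
print(lookup(config, "STARTD", "EXAMPLE_SETTING"))
```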
- condor_ dagman now recovers automatically after being abruptly
killed by something other than Condor itself (e.g., by Unix initd
during a ``fast'' system shutdown). This is accomplished through the
use of a default OnExitRemove expression inserted by
condor_ submit_dag which instructs the condor_ schedd not to treat
death by SIGKILL as a valid exit condition for condor_ dagman.
- Added submit attribute globus_xml, for use with grid-type
gt4 jobs. The given XML text will be inserted at the end of the XML job
description written by Condor for submission to the WS-GRAM server.
- For grid-type gt4 jobs, if a URL scheme is missing from the resource
name, ``https://'' will be inserted automatically.
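The defaulting rule amounts to a simple string check. The sketch below assumes that a scheme is recognized by the presence of "://" in the resource name, which is an assumption made for illustration; the function name and host names are hypothetical.

```python
def with_default_scheme(resource: str, scheme: str = "https://") -> str:
    """Prepend the default scheme when the resource name has none."""
    if "://" in resource:
        return resource          # a scheme is already present; leave it alone
    return scheme + resource


print(with_default_scheme("host.example.org:8443"))
print(with_default_scheme("http://host.example.org:8080"))
```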
- Added submit attribute transfer_output_remaps.
This specifies the name (and optionally path) to use when downloading output
files from the completed job. Normally output files are transferred back
to the initial working directory with the same name they had in the execution
directory. This gives you the option to save them with a different path
- Fixed a bug concerning backslash escaping in classad attribute values
when condor_ q was using quill.
- Fixed a bug where condor_ q could not accept multiple jobids
on the command line.
- Fixed parallel universe ssh script to now clean up all
temporary files it creates.
- Fixed a bug in the dedicated scheduler that caused it to request
resources it could not use, resulting in longer job startup times.
- Fixed a bug in the condor_ schedd that caused grid-type gt2 jobs
submitted by an older condor_ submit or in the queue during an upgrade
(version 6.7.10 or earlier) to go
on hold if the grid_type was ``globus''.
- Fixed a bug in condor_ submit that caused it to not set
JobGridType in the job ad for grid universe jobs when submitting
to a condor_ schedd older than version 6.7.11.
- When using file transfer, transferring the results back to
the submit machine could silently fail for Condor releases 6.7.0
through 6.7.12. This was relatively rare through 6.7.10. For
6.7.11 and 6.7.12, the bug would be easily triggered if a vanilla
job had an X509 user proxy associated with it. This is now
- Fixed a logic bug in the condor_ schedd.
Previously, if there was an error expanding any
references in a job classad when trying to spawn a condor_ shadow,
the condor_ schedd would die with the fatal exception ``Impossible:
GetJobAd() returned NULL for X.Y but that job is already known to
Now, the condor_ schedd correctly distinguishes between a non-fatal
error expanding a $$(attribute) reference and the fatal error of the job
already being gone (which is, in fact, impossible).
This bug was first introduced in Condor version 6.7.1.
- The reason strings generated when a user job policy expression fires
are now consistent for grid universe jobs.
- The condor_ gridmanager now evaluates the periodic job policy
expressions at the interval set by PERIODIC_EXPR_INTERVAL .
- Fixed a bug which prevented the standard universe from working on a Linux
kernel post 22.214.171.124.
- The condor_ schedd used to crash in certain cases if a given
job was vacated using condor_ vacate_job, then put on hold and
The bug only appeared if a specific job id was given to
condor_ vacate_job, as opposed to specifying a username or another
Now, the use of condor_ vacate_job for individual job identifiers
is safe and the condor_ schedd will not crash.
This bug has been in Condor since support for condor_ vacate_job
was first added in version 6.7.0.
- Fixed a bug that caused the condor_ gridmanager to crash if a
grid-type condor job ad contained the attribute remote_.
- Fixed a bug with the FS_REMOTE authentication mechanism that caused
it to fail occasionally when using NFS.
- Fixed a bug in which a double terminated event in a DAG node with
a POST script could cause condor_ dagman to abort the DAG and claim
that a cycle exists in the DAG.
- In the DAG status messages in dagman.out files,
condor_ dagman now shows nodes with queued PRE or POST scripts
in the Pre or Post columns. Previously, these nodes were shown
in the Un-Ready column.
- Fixed the GetFile SOAP call on the condor_ schedd so that it
behaves more like POSIX read() and does not report errors when
trying to read more data than is available.
- Fixed a hash function bug that could cause condor_ dagman
- JobCurrentStartDate and JobLastStartDate are no longer
changed in the job ad when the condor_ schedd and condor_ shadow reconnect
to a running job after a crash.
- condor_ dagman now allows POST scripts to be used with DATA
nodes in a DAG (previously this caused the DAG to hang).
- Using the new Remote_ simplified syntax no longer
generates unnecessary debug messages.
- Fixed a bug in estimating the size of attribute value buffers that
caused quill to crash. This arose when job ads had variables with very
large values (more than 3KB).
- Fixed a bug in the condor_ gridmanager that could cause it to crash
when the Rematch attribute evaluates to True.
- The default base scratch directory for WS-GRAM doesn't exist on most
server machines. Added a work-around to create the directory as part of the
- Starting in version 6.7.11, the execute host reported for grid jobs
in the user log execute event can contain spaces. The C++ user log reading
code now properly reads the entire string for these events.
- Fixed a bug that caused the condor_ gridmanager to die when it
tried to renew the job lease of a grid-type condor job.
- Fixed a bug that was causing the condor_ schedd to crash on Solaris
if the cron macros aren't defined.
- Fixed a bug where output may be lost when spooling (with the -s option to
condor_ submit or implicitly with Condor-C). This bug could only happen if
the job terminated within one second of starting.
- Fixed a bug affecting transferral of output and error files where the
file specified in the submit file contains path information. The file
was being staged back into the initial working directory and then it was
copied to the final path specified. The bug is that if there was an error
copying the file to the final location, the intermediate copy would not
be deleted and the job would still exit successfully, as though it had
succeeded. Now, no intermediate copy of the file is made, and errors
in transferring the file will be treated as a failure to run the job,
which will typically cause the job to return to idle state and run again.
- Added a couple missing parameters to the example configuration file
- Slightly cleaned up event checking error messages in condor_ dagman.
- Fixed a bug in the condor_ c-gahp that caused it to crash when
handling grid-type condor jobs with job leases.
- Starting in 6.7.11, the ``JM-Contact'' field of the ``Job submitted
to Globus'' user log event was mis-printed. This has been corrected.
- Fixed bug that prevented Stork detection of hung jobs.
- Fixed an obscure bug that incorrectly quoted the status of completed
jobs, visible via stork_ status.
- The condor_ ckpt_server is broken in version 6.7.13.
Please do not attempt to use it.
It is safe to use the 6.7.12 condor_ ckpt_server in a pool running
6.7.13 until the 6.7.14 release is out.
Of course, the 6.7.12 condor_ ckpt_server will not work with GCB, so
sites wishing to use both GCB and a condor_ ckpt_server will have to
wait for 6.7.14.
- Rescue DAGs generated from DAGs run with the
-UseDagDir command-line flag no longer work.
(The original run with -UseDagDir should work,
but if it fails and generates a rescue DAG, the
rescue DAG will always fail.)
- 6.7.12 addresses several critical bugs in 6.7.11. 6.7.11 should
not be used.
- Fixed a serious bug introduced in 6.7.11 which prevented condor_ dagman from successfully removing its own jobs from the Condor
queue after receiving a condor_ rm request from the condor_ schedd.
- Fixed a serious bug introduced in 6.7.11 where the condor_ master
on Windows would not properly shut down.
- Condor is now linked against GSI from Globus 4.0.1.
- GSI security and the grid universe should now work in the Alpha
- All Condor release packages are now compressed with GNU's
We no longer ship releases compressed with the vendor's
- Added a new feature called Quill to Condor which allows an
SQL server to mirror the job queue in order to speed up queries about
the job queue via condor_ q and condor_ history. Please see
page for the description of this feature.
- condor_ dagman has a new -maxidle command-line argument
that can be used to throttle DAG job submissions according to the number
of idle jobs in the DAG.
- stork_ submit is now able to search for X.509 credentials in the
- The condor_ negotiator can now limit how long it negotiates with a
single submitter before moving on to the next one.
- On platforms and filesystems that support files larger than 2
GB, the history file can now be larger than 2 GB.
- Added two options to condor_ q: -jobads and
-machineads. They will take ads from files instead of the
schedd and collector, respectively. These options are mostly useful
- Added a new, hopefully less confusing, Cron (Hawkeye)
configuration syntax. The old syntax is still supported, but should
be considered deprecated, and will eventually go away. The new syntax
splits the old colon separated ``name:prefix:executable:period''
string into separate macros.
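To show what the split involves, the sketch below breaks an old-style colon-separated job description into its four named fields. It assumes exactly four fields with no colons inside any of them, and it does not name the new macros, since those are not listed here; the function name and the sample string are hypothetical.

```python
def split_old_cron_spec(spec: str) -> dict:
    """Split an old-style 'name:prefix:executable:period' string into fields."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
    name, prefix, executable, period = parts
    return {
        "name": name,
        "prefix": prefix,
        "executable": executable,
        "period": period,   # kept as text; its format is not interpreted here
    }


fields = split_old_cron_spec("probe:probe_:/usr/local/bin/probe:60")
for key, value in fields.items():
    print(key, "=", value)
```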
- Improved support for job leases. ``job_lease_duration'' now works for
grid-type condor jobs. New job ad attribute ``TimerRemove'' specifies a
specific time at which a job should be removed. These attributes will be
passed through multiple layers of grid-type condor jobs.
- Grid universe jobs now use a unified pair of attributes
(``GridResource'' and ``GridJobId'') to identify the remote resource. This
will make it possible to match jobs to multiple types of resources. The
submit file syntax remains the same for now, except that ``remote_pool'' is
now required for grid-type condor jobs.
- Significantly improved response time for condor_ q when job classads
are larger than 4 kbytes (by disabling TCP Nagle algorithm as appropriate).
- Fixed bug in the dedicated scheduler where if the condor_ startd
rejected a match, the condor_ schedd would never retry new matches
for that machine. This would result in MPI and parallel jobs sticking
in the Idle state, and the message "DedicatedScheduler::negotiate sent
match for machine, but we've already got it".
- Fixed problem with the parallel universe to allow for LAM jobs
to get SIGTERM on exit so they can exit cleanly.
- Fixed a bug that was visible to the end user as file transfer
failures on a busy system.
The root problem was that if the condor_ negotiator gave out the
same match twice (due to having stale info in the condor_ collector
when trying to negotiate), the condor_ schedd would be confused,
attempt to re-use the match, fail to do so, and then kill the
previous (legitimate) use of the match.
This bug was introduced in version 6.7.4.
- Fixed bug in the parallel universe that caused the schedd to
crash when reconnecting to jobs that couldn't be reconnected to.
- Fixed bug in parallel shadow which caused Shadow Exceptions
in parallel jobs when the components exited in the wrong order.
- Fixed a bug in condor_ dagman that caused it to fail on Windows
for DAGs with nodes having absolute paths to their log files. (This bug
was introduced in version 6.7.10.)
- Fixed a bug whereby condor_ dagman could crash after executing
the POST script of a node whose Condor job had never been successfully
submitted due to repeated condor_ submit failures. (This bug was
introduced in 6.7.7 or earlier.)
- Fixed a bug in a debug message. If an error occurred during file
transfer, Condor would print the wrong expected filesize in the error message
on some platforms.
- Fixed bug where stork_ submit was corrupting log notes passed from the
command line. This bug also had the effect of disabling Stork jobs running
from DAGMan versions 6.7.10 and later.
- If you have DATA nodes in your DAG but no Stork log specified
(with the -Storklog argument),
condor_ dagman now fails with an explanatory message when parsing
the DAG file(s). (Previously, it would just wait forever for the Stork
jobs to finish, because it wouldn't see the relevant events.)
- In condor_ dagman, argument quoting for stork_ submit now matches
argument quoting for condor_ submit.
- Corrected how condor_ submit handles attributes forced into the
job ad with '+'. Now, the attribute names are case-insensitive, they
are not treated as normal submit attributes, and they always over-ride
normal submit attributes.
- Fixed bugs that would cause a segfault when reading a classad from
a file. Triggered by consecutive blank lines and lines containing only
- Fixed a bug that could cause duplicated output when a gt4 grid job
is executed more than once.
- Fixed a bug that could cause the condor_ gridmanager to assert if
it tried to delegate credentials for gt4 grid jobs before the gahp server
- Fixed a race condition that could cause condor grid-type jobs to be
held with hold reason ``Spooling input data files''.
- condor_ glidein now correctly handles extracting necessary
information from modern Condor configurations where
NEGOTIATOR_HOST is not defined.
- Refinements in how grid universe components track jobs. Grid universe
jobs are less likely to generate multiple terminate events in the job's user
log. There will also be slight performance improvements, as redundant work
is no longer done.
- Fixed a bug in condor_ dagman that caused it to core dump on
a 'job reconnected' event from a node job.
- condor_ submit will now exit zero as long as the submission succeeds. Debugging output will still be printed if the internal reschedule fails.
- On Windows, exited child processes of the Condor services
will be handled in order of termination. This fixes the problem where jobs
submitted from a Windows machine appear to run much longer than normal
because the condor_ schedd fails to notice that a condor_ shadow exits
when the system is very busy.
- Fixed a bug that caused scheduler universe jobs to often wait five
minutes (or whatever SCHEDD_INTERVAL is set to) before running.
- Fixed a bug that prevented the condor_ starter from running on a
Win32 machine with a FAT32 filesystem.
- A reschedule command will now be sent to the condor_ schedd whenever
a job is released from held state. This should make grid-type condor jobs
start much faster.
- Config parameters GAHP and GAHP_ARGS have been
deprecated. GT2_GAHP should be used instead.
- condor_configure no longer creates a ViewHist
directory, a behavior that began in version 6.7.10. This directory was of
limited value for most users.
- This release contains all of the bug fixes and improvements from
the 6.6 stable series up to and including version 6.6.10.
- The Mac OS X binaries shipped with this release were built on OS
10.3. Previous versions of Condor for OS X were built with version
10.2. Condor is officially dropping support for Mac OS 10.2 with
this release (the 10.3 binaries may still work on 10.2, but we
have not verified it either way). These binaries are known to work
with Mac OS 10.4 (``Tiger''), as well.
- There is a minor bug in version 6.7.10's condor_configure.
It creates a directory called ViewHist in the local
directory (next to log, spool, etc.).
This directory is not used by Condor at all, except in the case of a
condor_view collector (which is optional, and not enabled by default).
This behavior will be removed in version 6.7.11, and
condor_configure will go back to not creating the ViewHist directory.
- condor_ dagman can now run multiple DAGs in separate directories.
- Added DAGMAN_CONDOR_SUBMIT_EXE ,
DAGMAN_STORK_SUBMIT_EXE , DAGMAN_CONDOR_RM_EXE ,
and DAGMAN_STORK_RM_EXE configuration settings to specify
the condor_ submit, stork_ submit, condor_ rm, and stork_ rm
executables used by condor_ dagman. If unset (which they are by
default), condor_ dagman looks for each in the PATH.
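The lookup rule for these settings (use the configured path if set,
otherwise search the PATH) can be sketched in Python as follows. This is a
hedged illustration, not the condor_dagman code; the function name
resolve_tool and the dictionary standing in for the configuration are
assumptions.

```python
import shutil


def resolve_tool(config, setting, default_name):
    """Return the configured executable, or search PATH when unset."""
    configured = config.get(setting)
    if configured:
        return configured
    # shutil.which returns None when the name is not found in PATH.
    return shutil.which(default_name)


# A hypothetical configuration that overrides only condor_submit.
config = {"DAGMAN_CONDOR_SUBMIT_EXE": "/opt/condor/bin/condor_submit"}
print(resolve_tool(config, "DAGMAN_CONDOR_SUBMIT_EXE", "condor_submit"))
print(resolve_tool(config, "DAGMAN_CONDOR_RM_EXE", "condor_rm"))
```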
- For Condor-C jobs, the condor_ gridmanager will retry and delay
failed connections to a remote condor_ schedd like it does for
Condor-G jobs. The same configuration settings apply
- remote_initialdir is now supported in all universes except
for standard universe. Previously, it was only supported in the grid universe.
- +Remote_ syntax for Condor-C jobs has been
simplified for the specific commands of
universe, remote_schedd, remote_pool, globus_rsl, and globus_scheduler.
- Added default user priority factors for accounting groups. More on
accounting groups will be available in future versions of the manual.
- The condor_ startd can now be configured to write out the
ClaimId of the next available claim for each virtual machine
to separate files.
This functionality will enable enhanced fault tolerance in future
versions of Condor.
See section 3.3.10 for
details on STARTD_SHOULD_WRITE_CLAIM_ID_FILE and
STARTD_CLAIM_ID_FILE , the two configuration settings that
control this behavior.
- Fixed bugs on the Win32 platform in the condor_ schedd that could
cause jobs to never complete when the condor_ schedd is busy with many jobs
running at once.
- Fixed a bug on Windows where, if many jobs were submitted from
the same condor_schedd, some of the condor_shadow processes
would block for an extremely long time trying to get a lock for
writing to the ShadowLog file.
Now, log writing happens more fairly, and no condor_ shadow
processes can be delayed indefinitely.
- condor_ submit -name formerly had no effect on Windows and
did not work properly. This is now fixed.
- Significantly sped up the removal of large groups of jobs by
changing the default value of JOB_IS_FINISHED_INTERVAL from 1
to 0 (see section 3.3.11 for details on this
- Improved performance of the condor_schedd when it is not running as
root.
In version 6.7.7, the new code to support the scheduler universe
with Condor-C added some overhead.
However, this overhead is not needed unless the condor_schedd is
running as root.
In version 6.7.10, the condor_schedd notices if it is not root and
applies an optimization to avoid the overhead.
- Fixed a bug that caused the gridmanager to crash if a gt2, gt3, or
gt4 grid job had a proxy that couldn't be read properly. Now the job gets
put on hold.
- The Condor-C GAHP now performs file staging in a
separate process, allowing remote grid jobs to be started earlier.
- When contacting the embedded web server on Condor daemons,
authentication is no longer requested.
The previous authentication requirement didn't provide any
additional security, and could confuse users.
- Fixed rare bug that could cause condor_ submit to crash when
both getenv=true and environment=... were in a submit file and when
very large variable names were in the environment.
- Fixed a rare bug where the condor_ schedd would die with a
fatal exception under extremely heavy load on the machine.
The error message was:
ERROR ``Impossible: Create_Thread child_errno (xxx) is not
ERRNO_PID_COLLISION!'' at line 6181 in file daemon_core.C
- Fixed a rare bug where certain attributes in a job description
file could cause the condor_ schedd to crash when restarting and
parsing the job_queue.log file.
- Improved performance of standard universe jobs when
WantRemoteIO is set to false in the job ClassAd.
In this case, Condor's checkpointing libraries now avoid some
additional communication with the condor_shadow that is not
required if there's no remote IO.
- Fixed some messages in the Condor log files that were improperly
formatted, or contained incomplete information.
- Improved some user-log-reading error messages in condor_ dagman.
- Removed support for deprecated -NoPostFail option from
condor_ dagman. (The same functionality can be achieved through the
use of a simple POST script.)
- Fixed bug in dedicated scheduler, where under heavy load, the
schedd would occasionally try to start the same job twice, and
subsequently exit with the message:
ERROR ``Trying to run job x.x, but already marked RUNNING!''
- Fixed bug in dedicated scheduler, so that it now creates a
spool directory for each condor proc of a parallel or MPI job
with multiple requirements.
- On Windows only, condor_ dagman fails for DAGs with
nodes having absolute log file paths in their submit files.
- condor_ dagman does not correctly handle the case where all
submit attempts for a node job fail, and the node has a POST script.
If this happens for a single node in a DAG, it is usually okay,
but if it happens for a second node, condor_ dagman will crash.
- The Condor-C GAHP now performs file staging in a
separate process, allowing remote grid jobs to be started earlier.
- Using the new Remote_ syntax simplification causes
condor_ submit to display debug messages to standard output, possibly
confusing programs that parse condor_ submit's output. Fixed in 6.7.13.
- This release contains all of the bug fixes and improvements from
the 6.6 stable series up to and including version 6.6.10.
- The Parallel Universe has been added.
For more information, see section 2.10.
- The environment variable X509_USER_PROXY is set to the
full path of the proxy if a proxy is associated with the job.
This is usually done using x509userproxy in the submit file.
This currently works in the local, java, and vanilla universes.
- condor_ submit generates more precise error messages in
some failure cases.
- condor_ hold, condor_ release and condor_ rm now allow the user
to change the HoldReason, ReleaseReason or RemoveReason with the -reason
- condor_ dagman no longer does a one-second sleep before each
submit if all node jobs have the same log file. (The sleep is still
needed if there are multiple log files, for unambiguous ordering of
events during bootstrapping.) Note that if DAGMAN_SUBMIT_DELAY
is specified, the specified delay takes effect whether or not all
jobs have the same log file.
- Many crashes related to running the Dedicated Scheduler have
been fixed.
- Setting COLLECTOR_HOST or NEGOTIATOR_HOST with a port but without
a hostname no longer causes the condor_ master to crash.
- The Condor-G Grid Monitor now works with Globus 4.0 pre-Web Services
- Several deadlocks in the Condor-C GAHP server have been fixed.
- This release contains all of the bug fixes and improvements from
the 6.6 stable series up to and including version 6.6.9.
- Whether a standard universe job asks the condor_shadow how and
where to open every single file can now be controlled with the
want_remote_io attribute in the submit file.
This attribute can be set to true or false, and it is true by default.
If set to false, this attribute forces a standard universe job in
Condor to always look to the local file system when opening files and not
to contact the shadow.
This increases performance of user jobs that open a very large
number of files in a short period of time.
However, the user jobs must be matched to machines that have the same
UID_DOMAIN and FILESYSTEM_DOMAIN, as with vanilla universe jobs on a
homogeneous file system.
- condor_ dagman now has the capability to run more than one
independent DAG in a single condor_ dagman process.
- User policy expressions (on_exit_remove and on_exit_hold)
now work for scheduler universe jobs.
- TotalCpus and TotalMemory are now set in machine ads.
- condor_ dagman now tolerates the "two terminated events for
a single job" bug by default. There is a new bit in
DAGMAN_ALLOW_EVENTS to control whether this bug is considered
a fatal error in a condor_ dagman run.
- Added a new debug formatting flag, D_PID, that prints out
the process id (PID) of the process writing a given entry to a log
file.
This is useful in Condor daemons (such as the condor_schedd) where
the daemon can fork() multiple processes to perform various tasks
and it is helpful to see which log messages come from a forked
process and which come from the main thread of execution.
The default SCHEDD_DEBUG in the sample configuration files
shipped with Condor now includes this flag.
- When condor_ dagman writes rescue files, each node is now
specified with the same number of retries as was specified in the
original DAG, rather than with only the ``remaining'' number of
retries based on the failed run. The latter behavior can be restored
by setting DAGMAN_RESET_RETRIES_UPON_RESCUE to false.
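The two rescue-file behaviors described above differ only in which retry
count is written for each node. A minimal Python sketch, assuming a
hypothetical helper named rescue_retries and assuming that the setting is
true when left unset:

```python
def rescue_retries(original_retries, retries_used, reset_retries=True):
    """Retry count written to the rescue file for one node.

    reset_retries mirrors DAGMAN_RESET_RETRIES_UPON_RESCUE: when true,
    the node gets the retry count from the original DAG; when false, it
    gets only the retries left over from the failed run.
    """
    if reset_retries:
        return original_retries
    return max(original_retries - retries_used, 0)


# A node declared with 3 retries that used 2 of them before the rescue.
print(rescue_retries(3, 2))                       # new behavior
print(rescue_retries(3, 2, reset_retries=False))  # previous behavior
```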
- Added ``Hawkeye'' capabilities to condor_ schedd. It's
configured identically to that of condor_ startd, but using
``SCHEDD'' in place of ``STARTD'', in particular for the
- Fixed a bug in condor_ dagman that prevented POST scripts
from being used with jobs that write XML-format logs.
- The event-checking code used by condor_ dagman now defaults
to allowing an execute event before the submit event for the same
job; if this happens, there will be a warning, but the DAG will
continue. See section 3.3.22 for more info.
- condor_ userprio option -pool was failing with ``Can't
find address for negotiator'' since version 6.7.5.
- Fixed a bug that prevented SOAP clients from being able to access
a job's spooled data files if the condor_ schedd restarted.
- Fixed a bug that caused the condor_ gridmanager to panic when
trying to retire a job from the queue that was already gone. This
could cause multiple terminate events to be logged for some jobs.
- Fixed a bug that caused match-making to not work for Condor-C
- Added workaround for a Globus bug that can cause re-execution of
a completed GT2 job in the correct failure case (Globus bugzilla ticket
- Properly extend the lifetime of GT4 jobs and credentials on the
- This release contains all of the bug fixes and improvements from
the 6.6 stable series up to and including version 6.6.9.
- The STARTD_EXPRS list can now be on a per-VM basis, and
entries on the list can also be specific to a VM.
See 3.13.7 for more details.
- The LOCAL_CONFIG_FILE can now be overridden.
This now allows files to include other local config files.
See 3.3.3 for more info.
- Resources that are claimed but suspended can now optionally
be exempt from charges at the accountant.
When the resource is unsuspended, the accountant will resume charging.
This is controlled by the NEGOTIATOR_DISCOUNT_SUSPENDED_RESOURCES
config file entry, and it defaults to false.
- The DAGManJobID attribute which condor_ dagman inserts
into the classad of every job it submits now contains only its cluster
ID (instead of a cluster.proc ID pair), so that it may be referenced
as an integer in DAG job submit files. This allows, for example, a
user to automatically set the relative local queue priority of jobs
based on the condor_ dagman job that submitted them, so that jobs
submitted by ``older'' DAGs will start before jobs submitted by
``newer'' DAGs (assuming they are otherwise identical).
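As an illustration of the ordering this makes possible, the Python sketch
below derives a queue priority from the integer DAGManJobID. It rests on
two assumptions that are not stated above: that cluster IDs grow over time
(so an older DAG has a smaller ID) and that a larger priority value runs
first. The job names are hypothetical.

```python
def job_priority(dagman_job_id):
    """Smaller (older) DAGMan cluster IDs yield larger priorities."""
    return -dagman_job_id


jobs = [
    {"name": "B.node1", "DAGManJobID": 120},
    {"name": "A.node7", "DAGManJobID": 85},
    {"name": "B.node2", "DAGManJobID": 120},
]

# Highest priority first; sorted() is stable, so jobs from the same DAG
# keep their original relative order.
ordered = sorted(jobs, key=lambda j: job_priority(j["DAGManJobID"]),
                 reverse=True)
print([j["name"] for j in ordered])
```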
- GSI authentication can now be used when Condor-C jobs are submitted
from one condor_ schedd to another.
- File permissions are now preserved when a job's data files are
transferred between Unix machines. File transfers that involve a Windows
machine or an older version of Condor remain as before.
- Condor-C now supports the scheduler remote universe.
- condor_ advertise now publishes a ``MyAddress'' if none is provided
in the source ClassAd. This will prevent the collector from throwing out
ads with no address (see Bugs Fixed).
- Added a new condor_ dagman parameter DAGMAN_ALLOW_EVENTS
controlling which ``bad'' events are not considered fatal errors;
the -NoEventChecks command-line argument is deprecated and has no effect.
- condor_ fetchlog now takes an optional log file extension in order to
select logs such as ``StarterLog.vm2''.
- Fixed a throughput performance bottleneck when standard universe
jobs vacate when the user has specified WantCheckpoint equal to
False in the submit file.
- Added initial support for the getdents(),
getdents64(), glob(), and the family of functions
opendir(), readdir(), closedir() for the
It is recommended that you do not directly invoke getdents()
or getdents64(), but instead use the other POSIX functions
There are two caveats: these calls will not work in heterogeneous
contexts, and you may not call getdents() directly when using
condor_compile to build a 32-bit program while specifying the 64-bit
interfaces for the Unix API.
- In versions 6.7.4 through 6.7.6, Computing On Demand (COD)
support was broken due to a bug in how Condor daemons parsed their
command line arguments.
The bug was introduced with the changes to provide a web services
(SOAP) interface to Condor.
This bug has been fixed and COD support is now working again.
- In version 6.7.6, the DAGParentNodeNames attribute
which condor_ dagman adds to all DAG job classads could grow too long
and cause job submission to fail. Now, if the
DAGParentNodeNames value would be too long to add to the job
classad, the attribute is instead left undefined and a warning is
emitted in the DAGMan debugging log. This behavior means that such a
node can be reliably distinguished from a node with no parents, as the
latter will have a DAGParentNodeNames attribute defined but
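The distinction between "too long" and "no parents" can be sketched in
Python. Everything concrete here is an assumption made for illustration:
the helper name parent_names_attr, the comma-separated format, the length
limit, and the use of an empty string for a node without parents (the
sentence describing that case is cut off in the text above).

```python
def parent_names_attr(parents, max_len):
    """Return the attribute value, or None to leave it undefined.

    None models the "left undefined" case, for which the caller would
    also emit a warning in the DAGMan debugging log.
    """
    value = ",".join(parents)
    if len(value) > max_len:
        return None
    return value  # empty string when the node has no parents


print(repr(parent_names_attr([], 16)))
print(repr(parent_names_attr(["A", "B"], 16)))
print(repr(parent_names_attr(["parent_%d" % i for i in range(10)], 16)))
```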
- In version 6.7.3, the value of the X509UserProxySubject job attribute
was changed in such a way that Condor-G jobs submitted by a newer
condor_ submit to an older condor_ schedd could fail to run. Now,
condor_ submit reverts to the old behavior when talking to an old
- Bug-fixes and improvements to grid_type gt4:
- Condor will now delegate a single proxy to the GT4 server for
multiple. If the local proxy is refreshed, Condor will forward the
refreshed copy to the server.
- Exit codes are now recorded properly.
- JAVA_EXTRA_ARGUMENTS is now used when invoking the GT4 GAHP
server (which is written in Java).
- If LOWPORT and HIGHPORT are set in the config file,
the GT4 GAHP server will now obey the port restriction.
- Fixed a bug that caused Condor not to notice when some GT4 jobs
- Fixed a bug in handling the job's environment for GT4 jobs. Condor
<name>=<value> for each variable's name.
- Improved hold reason in certain cases when a GT4 job goes on hold.
- condor_ q -globus now works properly for GT4 jobs. Also, the resource
name in the user log execute event is printed properly for GT4 jobs.
- Fixed a bug that could cause Condor to not detect when a GT4 job
completes. This was triggered by Condor not properly recognizing the
StageOut Globus job state.
- Fixed a bug that can cause the condor_ gridmanager to abort if
PeriodicRelease evaluates to true while it's putting a job on hold.
- Fixed a bug in condor_ dagman that
caused the DAG to be aborted if a job generated an executable error
- Fixed a bug in condor_ dagman on Windows that would cause it to
hang or crash on exit.
- MPI universe jobs now honor the JOB_START_DELAY
- The condor_collector now throws out startd, schedd, and License
ClassAds that don't have a valid IP address (used in its hashing). The
collector now correctly falls back to ``MyAddress'' if it's provided.
- Fixed a bug in condor_ dagman that could cause condor_ dagman
to fail an assertion if PRE or POST scripts are throttled with the
-maxpre or -maxpost condor_ submit_dag command line flags.
- Version 6.7.6 contains all the bug fixes and improvements from
the 6.6 stable series up to and including version 6.6.9.
- Added support for libc's system() function for standard
universe executables. This call is not checkpoint-safe, in that
the standard universe job could call it two or more times
in the event of a resumption from an earlier checkpoint. The
invocation of this call by the shadow on behalf of the user
job is controlled by a configuration file parameter called
SHADOW_ALLOW_UNSAFE_REMOTE_EXEC and is off by default.
The full environment of the user job is preserved during the
invocation of system(), and this might cause problems in
heterogeneous submission contexts if the user is not careful.
- Added support for a web services (SOAP) interface to Condor.
For more information, see section 4.4.1.
NOTE: Due to a bug in gSOAP, the SOAP support in Condor 6.7.6 does
not work with all SOAP toolkits.
Some of the responses that gSOAP generates contain unqualified tags.
Therefore, SOAP toolkits that are strict (such as gSOAP or .Net)
will not accept these poorly formed responses.
SOAP toolkits that are more lax in the responses they accept (such
as Axis, SOAP::Lite, or ZSI) will work with version 6.7.6.
This problem has already been fixed and the solution will be
released in Condor version 6.7.7.
- Added support for the GT4 grid_type in Condor's grid universe.
This new grid type supports jobs submitted to grid resources
controlled by Globus Toolkit version 4 (GT4).
New configuration settings are required to support jobs
submitted for the GT4 grid type.
These settings have been added to the default configuration files
shipped with Condor, but sites that are upgrading an existing
installation and choosing to keep their old configuration files must
add these settings to allow GT4 jobs to work:
## The location of the wrapper for invoking GT4 GAHP server
GT4_GAHP = $(SBIN)/gt4_gahp
## The location of GT4 files. This should normally be lib/gt4
GT4_LOCATION = $(LIB)/gt4
## gt4-gahp requires gridftp server. This should be the address of gridftp
## server to use
GRIDFTP_URL_BASE = gsiftp://$(FULL_HOSTNAME)
- Condor version 6.7.6 includes the Stork data movement system,
the Condor Credential Daemon (condor_ credd), and support for using
MyProxy for credential management.
However, currently these are only supported in our release for Linux
using the 2.4 kernel with glibc version 2.3 (RedHat 9, etc).
All of these features require changes to the Condor configuration
files to function properly.
The default configuration files shipped with Condor already include
all the new settings, but sites upgrading an existing installation
must add these new settings to their Condor configuration.
For a list of settings and more information, see
section 3.3.28 for Stork,
section 3.3.19 for condor_credd,
and section 3.3.26 for MyProxy.
For more information about MyProxy, you can also see
- Added preliminary support for the High Availability Daemon (HAD).
- Added a new SCHED_UNIV_RENICE_INCREMENT
configuration variable used by the condor_ schedd for scheduler
universe jobs, analogous to the existing
JOB_RENICE_INCREMENT variable used by the condor_ startd
for other job universes. The SCHED_UNIV_RENICE_INCREMENT
variable is undefined by default, and when undefined, defaults to 0
- The relative priority of a user's own jobs in the local
condor_ schedd queue is no longer limited to the range -20 to +20,
but can be any integer value.
- DAGMan Improvements:
- condor_ dagman now inserts a DAGParentNodeNames
attribute into classad of all Condor jobs it submits, containing the
names of the job's parents in the DAG. The list is in the form of a
- Added the condor_ dagman arguments -noeventchecks and
-allowlogerror to condor_ submit_dag.
- condor_ glidein Improvements:
- Added condor_ glidein options for setting up GSI authentication.
- Added condor_ glidein option -run_here for direct
execution of Glidein, instead of submitting it for remote execution.
You may also save a script for doing this and then run the script
through whatever mechanism you want (like some batch system
interface not supported by Condor-G).
- Added support for the NEGOTIATOR_CYCLE_DELAY
configuration setting, which is only intended for expert
For more information, see section 3.3.18.
- Previous versions of the condor_ master had a bug where if the
administrator attempted to use <SUBSYS>_ARGS to pass -p
to any Condor daemon to have it listen on a specific, fixed port,
the underlying daemon would not honor the flag.
Now, the condor_ master correctly supports using
<SUBSYS>_ARGS to define a port using -p.
For more information about <SUBSYS>_ARGS, see
section 3.3.9.
- Removed case-sensitivity of command-line argument names in
- Fixed the -r (remote schedd) option in condor_ submit_dag.
- Condor versions 6.7.1 through 6.7.5 exhibited a bug in
which the commands condor_off, condor_restart, and
condor_vacate did not handle the -pool command-line option
correctly.
The bug caused these commands to correctly query the central manager
of the remote pool,
but then to incorrectly send the command to the central manager machine.
This bug has now been fixed, and these tools no longer send
the command to the central manager machine.
- Added DAG aborting feature - a DAG can be configured to
abort immediately if a node exits with a given exit value.
- The dedicated scheduler can now preempt running MPI jobs from
appropriately configured machines. See
3.13.8 for details.
- The MPI universe now supports submit files with multiple procs (queue
commands), each with distinct requirements. This is useful for placing
the head node of an MPI job on a specific machine, and the rest of the
nodes elsewhere. See 2.10.5 for details.
- The condor_ negotiator now publishes its own ClassAd to the
condor_ collector which includes the IP address and port where it
This negotiator ClassAd can be viewed using the new
-negotiator option with condor_ status.
In addition to removing an unnecessary fixed port for the
condor_ negotiator, this change corrects some problems with
commands that attempted to communicate directly with the
These bugs were first listed in the Known Bugs section of the 6.6.0
To enable this feature and have the condor_ negotiator listen on a
dynamic port, you must comment out the NEGOTIATOR_HOST
setting in your configuration file.
The new example configuration files shipped with version 6.7.4 and
later will already have this setting undefined.
However, if you upgrade your binaries and retain an older copy of
your configuration files, you should consider commenting out
To disable this feature and have the condor_ negotiator still
listen on a well-known port, you can uncomment the
NEGOTIATOR_HOST setting in the default configuration.
NEGOTIATOR_HOST = $(CONDOR_HOST)
Pools that are comprised of older versions of Condor and a 6.7.4 or
later central manager machine should either continue to use their
old condor_config file (which will still have
NEGOTIATOR_HOST defined) or they should re-define the
NEGOTIATOR_HOST setting in the new example configuration
files which are used during the installation process.
- Added optional DAGMAN_RETRY_SUBMIT_FIRST configuration
parameter that tells condor_ dagman whether to immediately retry
the submit if a node submit fails, or to put that job at the end of
the ready jobs queue. The default is TRUE, which retries the failed
submit before trying to submit any other jobs.
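The effect of DAGMAN_RETRY_SUBMIT_FIRST on the ready queue can be modeled
with a double-ended queue. This is a sketch of the described behavior, not
the condor_dagman implementation; the function name requeue_failed_submit
and the node names are hypothetical.

```python
from collections import deque


def requeue_failed_submit(ready, node, retry_submit_first=True):
    """Put a node whose submit failed back on the ready queue.

    With retry_submit_first (modeling DAGMAN_RETRY_SUBMIT_FIRST = TRUE)
    the node goes to the front and is retried before any other job;
    otherwise it goes to the end of the queue.
    """
    if retry_submit_first:
        ready.appendleft(node)
    else:
        ready.append(node)


front = deque(["B", "C"])
requeue_failed_submit(front, "A")
print(list(front))

back = deque(["B", "C"])
requeue_failed_submit(back, "A", retry_submit_first=False)
print(list(back))
```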
- The schedd now uses non-blocking connection attempts when contacting
startds. This prevents the long (typically 40 second) hang of all schedd
operations when the connection attempt does not complete, due to
- Fixed a performance problem with the standard universe when
gettimeofday() is called in a very tight loop by the application.
- Fixed the default value of OPSYS in the MacOSX version
of Condor. Once again, Condor reports
OSX for all versions of MacOSX.
This bug was introduced in version 6.7.3 of Condor.
- Fixed a bug in condor_ dagman that caused it to be killed if
the DAGMAN_MAX_SUBMIT_ATTEMPTS parameter was set to too
high a value.
- Fixed a bug in condor_ gridmanager that caused it to crash if
the grid_monitor was activated.
- Fixed support for the getdents64() system call inside the
standard universe on Linux and Solaris.
- Fixed a bug in condor_ dagman that dealt
incorrectly with the problem of Condor sometimes writing both a
terminated and an aborted event for the same job. The spurious
aborted event is now ignored.
- This release contains all the bug fixes from the 6.6 stable
series up to and including version 6.6.7, and some of the fixes that
will be included in version 6.6.8.
The bug fixes in version 6.6.8 that were not included in version
6.7.3 are listed in a separate section of the 6.6.8 version
- Added full ports of Condor to Red Hat Fedora Core 1, 2, and 3 on
the 32-bit x86 architecture.
Please read the Linux platform specific
section 6.1.6 in this manual for more information
on caveats with this port.
- Added a feature to condor_ dagman that will allow VARS names to include
numerics and underscores.
- Added optional COLLECTOR_HOST_FOR_NEGOTIATOR configuration parameter to indicate which condor_ collector the condor_ negotiator on this (local) host should query first. This is designed to improve negotiation performance.
- Added a new condor_ dagman capability to allow the DAG to continue
if it encounters a double run of the same node job (set the
DAGMAN_IGNORE_DUPLICATE_JOB_EXECUTION parameter to true to do this).
- Added Condor-C: the "condor" grid_type. Condor-C allows jobs to be handed from one condor_ schedd to another condor_ schedd.
- Added setup_here option to condor_ glidein for cases where
direct installation is desired instead of submitting a setup job to the
remote gatekeeper. (For example, this is useful when doing an installation
- If RemoteOwner is exported via STARTER_VM_EXPRS into the
ad of other virtual machines, the condor_ negotiator automatically inserts
RemoteUserPrio into the ad as well, so policy expressions can now take
into account the priority of jobs running on other virtual machines on the
- Linux 2.6 kernels do not update the access time for console devices,
so Condor was unable to detect if there has been activity at the keyboard
or mouse. As a work-around, Condor now polls /proc/interrupts to detect
if the keyboard has requested attention. This does not work for USB keyboards
or pseudo TTYs, so ConsoleIdle on 2.6 kernels will be wrong for some
devices. Future versions of Condor or Linux may correct this.
- condor_dagman no longer removes the X509_USER_PROXY environment
variable.
This should allow users to set the environment variable before invoking
condor_ submit_dag and have the jobs submitted by condor_ dagman correctly
find the proxy file.
- Fixed a condor_ dagman bug that could cause it to leave jobs running
when aborting a DAG.
- Fixed a condor_dagman bug which, if its debug level was set to
zero (silent), could cause it to improperly recognize persistent
condor_submit failures.
- Fixed a bug in Condor's file transfer mechanism that showed up
when users tried to use streaming output for either STDOUT or
STDERR.
There were situations where Condor would attempt to transfer back
the STDOUT or STDERR file from the execution host, even though these
files didn't exist and all the data was already streamed back to the
Now, if either stream_output or stream_error is set
to true in the job submit description file, Condor will transfer any
other output but will not attempt to transfer back STDOUT or STDERR.
- The Condor user log library (libcondorapi) now correctly handles
execute events that lack a hostname.
- Unfortunately, the default OPSYS value for the MacOSX
version of Condor was incorrectly changed in version 6.7.3.
Condor used to always report
OSX, but in version 6.7.3 it
will report either
This is wrong, since Condor jobs submitted to any version of OSX
should be able to run on any other version of OSX, and the above
change needlessly partitions resources and complicates things for
Therefore, anyone running version 6.7.3 on MacOSX is encouraged to
add the following line to their global condor_config file:
OPSYS = OSX
If your pool is already running the new release, you can cause the
above change to take effect by running the following command on your
pool's central manager machine (or any machine listed in the
HOSTALLOW_ADMINISTRATOR list) after you have changed the
OPSYS value in your configuration:
However, if you have already submitted jobs to your pool with the
old OPSYS value, the Requirements expression in
those jobs will still refer to the incorrect value.
In this case, you should either a) wait for the jobs to complete
before making the above change, b) remove the jobs and resubmit
them after you've made the change, or c) manually run condor_ qedit
on the jobs to change their Requirements expressions.
- When running in recovery mode on a DAG that has PRE scripts,
condor_ dagman may attempt more than the specified number of retries
of a node (counting retries attempted during the first run of the
DAG). This is because if a node fails because of the PRE script
failing, that fact is not recorded in the log, so that retry is missed
in recovery mode.
- Condor Version 6.7.2 includes some bug fixes from Version 6.6.7,
but none from Version 6.6.8.
- MPI users who are upgrading from previous versions of Condor
to version 6.7.2 will need to modify the
MPI_CONDOR_RSH_PATH configuration macro of their dedicated
resource to be $(LIBEXEC) instead of $(SBIN).
Users who are installing Condor version 6.7.2
for the first time will not need to make any changes.
- Added an INCLUDE configuration file variable
to define the location of header files shipped with Condor
that are currently needed to be included when compiling
When INCLUDE is defined,
condor_ config_val can be used to list header files.
- A Condor pool can now support multiple Collectors. This should
improve stability due to automatic failover. All daemons will now
send updates to ALL of the specified collectors. All daemons/tools
will query the Collectors in sequence, until an appropriate
response is received. Thus if one (or more) of the Collectors are
down, the pool will continue to function normally, as long as
there is at least one functioning Collector.
You can specify multiple (comma-separated) collector host (and port)
addresses in the COLLECTOR_HOST entry in the configuration
file. A given condor_ master can only run one Collector.
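The query behavior described above can be sketched roughly as follows. This is only an illustration in Python, not Condor's implementation: the helper names parse_collector_hosts and query_in_sequence, the example host names, and the use of ConnectionError to signal a Collector that is down are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def parse_collector_hosts(value: str) -> List[str]:
    """Split a comma-separated COLLECTOR_HOST value into addresses."""
    return [host.strip() for host in value.split(",") if host.strip()]


def query_in_sequence(hosts: List[str],
                      query: Callable[[str], str]) -> Tuple[str, str]:
    """Try each Collector in order; return (host, response) from the
    first one that answers."""
    for host in hosts:
        try:
            return host, query(host)
        except ConnectionError:
            # This Collector is unreachable; fall through to the next one.
            continue
    raise ConnectionError("no Collector responded")


def fake_query(host: str) -> str:
    """Hypothetical stand-in for a real query; the first host is 'down'."""
    if host.startswith("cm1."):
        raise ConnectionError(host + " is down")
    return "ads from " + host


hosts = parse_collector_hosts("cm1.example.org:9618, cm2.example.org:9618")
answered_by, response = query_in_sequence(hosts, fake_query)
```

In this sketch the query only fails outright when every listed Collector is unreachable, which mirrors the "at least one functioning Collector" condition above.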
- When the condor_ master is started with the -r option to
indicate that it should quit after a period of time, the
condor_ startd will now indicate how much time is remaining before it
exits. It does this by advertising TimeToLive in the machine ClassAd.
- Added new macro JOB_START_COUNT that works in
conjunction with existing macro JOB_START_DELAY to
throttle job starts.
Together, this macro pair provides greater flexibility in
tuning the job start rate to match the available condor_ schedd performance.
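As a rough illustration of how such a pair of settings can throttle starts, the Python sketch below assumes that the condor_ schedd starts up to JOB_START_COUNT jobs at once and then waits JOB_START_DELAY seconds before starting the next batch. That reading is an assumption, since the note above does not spell out the exact semantics, and the function name start_offsets is hypothetical.

```python
from typing import List


def start_offsets(num_jobs: int, job_start_count: int,
                  job_start_delay: int) -> List[int]:
    """Return the assumed start time (seconds from now) of each job when
    jobs are started in batches of job_start_count, with
    job_start_delay seconds between batches."""
    if job_start_count < 1:
        raise ValueError("job_start_count must be at least 1")
    return [(i // job_start_count) * job_start_delay
            for i in range(num_jobs)]


# Five jobs, two per batch, three seconds between batches.
offsets = start_offsets(num_jobs=5, job_start_count=2, job_start_delay=3)
```

Under this assumption, raising the count starts more jobs per batch, while raising the delay spaces the batches further apart.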
- Added a LIBEXEC directory to the install process.
Support commands that
the Condor system needs will be added to this directory in future releases.
This directory should not be added to a user or system-wide path.
- Added the ability to decide, for each file that Condor transfers, whether
it should be encrypted, using encrypt_input_files,
dont_encrypt_input_files, encrypt_output_files, and
dont_encrypt_output_files in the job's submit file.
- Added DISABLE_AUTHENTICATION_IP_CHECK which will work around problems
on dual-homed machines where the IP address is reported incorrectly to Condor.
This is particularly a problem when using Kerberos on multi-homed machines.
- Fixed a bug on Linux systems caused by both
Condor and the Linux distribution having a library file
The problem caused the link step to fail on Condor API
The evaluation order to determine the location of library
files caused use of the wrong file, given the duplicate naming.
The bug is fixed by renaming the Condor library files.
- When the condor_ startd is evaluating the state of each virtual
machine (VM), it now refreshes any ClassAd attributes which are
shared from other virtual machines (using STARTD_VM_EXPRS)
before it tries to evaluate.
This way, if a given VM changes its state, all other VMs will
immediately see this state change.
- Fixed a bug where you couldn't transfer input files larger than 2 gigabytes.
- Condor can now detect the size of memory on a Linux machine with the 2.6 kernel.
- JAR files specified in the submit file were not being transferred
along with the job unless they were also explicitly placed in the list
of input files to transfer. Now, the JAR files are implicitly added to the
list of input files to transfer.
- Version 6.7.1 contains all of the features, ports, and bug fixes
from the previous stable series, up to and including version 6.6.6.
There are a few additional bugs that have been fixed in the 6.6.x
stable series which have not yet been released, but which will
appear in version 6.6.7.
These bug fixes have been included in version 6.7.1, and appear in
the ``Bugs fixes included from version 6.6.7'' list below.
In addition, a number of new features and some bug fixes have been
made, which are described below in more detail.
- Added an option to DAGMan's retry ability. If a DAG specifies
something like ``RETRY job 10 unless-exit 9'', then the retries will
only happen if the node doesn't exit with a value of 9.
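The retry rule can be sketched in Python as below. This is an illustration only, not DAGMan's code: run_with_retries is a hypothetical helper, and it assumes that an exit value of 0 means the node succeeded and that a retry count of N allows up to N further attempts after the first.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(run_node: Callable[[], int], retries: int,
                     unless_exit: Optional[int] = None) -> Tuple[int, int]:
    """Run a node until it succeeds, exits with the unless_exit value,
    or runs out of retries. Returns (last exit value, attempts made)."""
    attempts = 0
    while True:
        status = run_node()
        attempts += 1
        if status == 0:
            return status, attempts      # success, nothing to retry
        if unless_exit is not None and status == unless_exit:
            return status, attempts      # this exit value suppresses retries
        if attempts > retries:
            return status, attempts      # first run plus all retries used up


# Mirrors "RETRY job 10 unless-exit 9": the third attempt exits with 9,
# so no further retries are made even though retries remain.
exit_codes = iter([1, 1, 9, 0])
result = run_with_retries(lambda: next(exit_codes), retries=10, unless_exit=9)
```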
- Condor-G can now submit jobs to Globus 3.2 (WS) (for jobs with
universe = grid, grid_type = gt3). Submitting to Globus
3.0 (as in Condor 6.7.0) is no longer supported. Submitting to pre-WS
Globus (2.x) is still supported (grid_type = gt2).
- Added new startd policy expression MaxJobRetirementTime. This
specifies the maximum amount of time (in seconds) that the startd
is willing to wait for a job to finish on its own when the startd
needs to preempt the job (for owner preemption, negotiator preemption,
or graceful startd shutdown).
- Added -peaceful shutdown/restart mode. This will shut down the
startd without killing any jobs, effectively treating both
MaxJobRetirementTime and GRACEFUL_SHUTDOWN_TIMEOUT as
infinite. The default shutdown/restart mode is still -graceful, which
behaves according to whatever MaxJobRetirementTime and
GRACEFUL_SHUTDOWN_TIMEOUT are. The behavior of -fast mode
is unchanged; it kills jobs immediately, regardless of the other
- Jobs can now be submitted as ``noop'' jobs. Jobs submitted with
noop_job = true will not be executed by Condor, and instead will
immediately have a terminate event written to the job log file and
removed from the queue. This is useful for DAGs where the pre-script
determines the job should not run.
- Added preliminary support for the Tool Daemon Protocol (TDP).
This protocol is still under development, but the goal is to provide
a generic way for scheduling systems (daemons) to interact with
Assuming this protocol is adopted by other scheduling systems and by
various monitoring tools, it would allow arbitrary combinations of
tools and schedulers to co-exist, function properly, and provide
monitoring services for jobs running under the schedulers.
This initial support allows users to specify a ``tool'' that should
be spawned along-side their regular Condor job.
On Linux, the ability to have the batch Condor job suspend
immediately upon start-up is also implemented, which allows a
monitoring tool to attach with ptrace() before the job's main()
function is called.
- Fixed a significant memory leak in the condor_ schedd that was
introduced in version 6.7.0.
In 6.7.0, the condor_ schedd would leak a copy of ClassAd for every
job it tried to spawn (on average, around 2000 bytes per job).
- Fixed the bugs in Condor's MPI support that were introduced in version 6.7.0.
Condor now supports MPI jobs linked with MPICH 1.2.4 and older.
Improved Condor's log messages and email notifications when MPI jobs
run on multiple virtual machines (the messages now include the
appropriate ``vmX'' identifier, not just the hostname).
Unfortunately, due to changes in MPICH between version 1.2.4 and
1.2.5, Condor's MPI support is not compatible with MPICH 1.2.5.
We will be addressing this problem in a future release.
Bugs fixes included from version 6.6.7:
- Fixed an important bug in the low-level code that Condor uses to
transfer files across a network.
There were certain temporary failure cases that were being treated
as permanent, fatal errors.
This resulted in file transfers that aborted prematurely, causing
jobs to needlessly re-run.
The code now gracefully recovers from these temporary errors.
This should significantly help throughput for some sites,
particularly ones that transfer very large files as output from
- Fixed a number of bugs in the -format option to condor_ q
and condor_ status.
Now, these tools will properly handle printing boolean expressions
in all cases.
Previously, depending on how the boolean evaluated, either the
expression was printed, or the tool could crash.
Furthermore, the tools do a better job of handling the different
types of format conversion strings and printing out the appropriate
For example, if a user tries to print out a boolean attribute with
condor_status -format "%d\n" HasFileTransfer, the
condor_ status tool will evaluate HasFileTransfer and print
either a 0 or a 1 (FALSE or TRUE).
If, on the other hand, a user tries to print out a boolean attribute with
condor_status -format "%s\n" HasFileTransfer, the
condor_ status tool will print out the string ``FALSE'' or ``TRUE''.
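The two cases above amount to choosing the printed form from the conversion in the format string. A minimal Python sketch of that rule follows; format_boolean is a hypothetical helper written for this illustration and handles only the %d and %s conversions mentioned here.

```python
def format_boolean(fmt: str, value: bool) -> str:
    """Render a boolean attribute value according to the conversion
    found in a printf-style format string."""
    if "%d" in fmt:
        # Integer conversion: print 0 or 1.
        return fmt.replace("%d", str(int(value)))
    if "%s" in fmt:
        # String conversion: print the word FALSE or TRUE.
        return fmt.replace("%s", "TRUE" if value else "FALSE")
    return fmt


as_integer = format_boolean("%d\n", True)
as_string = format_boolean("%s\n", False)
```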
- The ClassAd attribute scope resolution prefixes, MY.
and TARGET., are no longer case sensitive.
- condor_ dagman now does better checking for inconsistent events
(such as getting multiple terminate events for a single job). This
checking can be disabled with the -NoEventChecks command-line option.
- Version 6.7.0 contains all of the features, ports, and bug fixes
from the previous stable series, up to and including version 6.6.4.
In addition, a number of new features and some bug fixes have been
made, which are described below in more detail.
- Added support for vanilla and Java jobs to reconnect when the
connection between the submitting and execution nodes is lost for
Possible reasons for this disconnect include: network outages,
rebooting the submit machine, restarting the Condor daemons on the
submit machine, etc.
If the execution machine is rebooted or the Condor daemons are
restarted, reconnection is not possible.
To take advantage of this reconnect feature, jobs must be submitted
with a JobLeaseDuration.
There are new events in the UserLog related to disconnect and reconnect.
- Added a new Condor tool, condor_ vacate_job.
This command is similar to condor_ vacate, except the kinds of
arguments it takes define jobs in a job queue, not machines to
For example, a user can vacate a specific job id, all the jobs in a
given cluster, all the jobs matching a job queue constraint, or even
all jobs owned by that user.
The owner of a job can always vacate their own jobs, regardless of
the pool security policy controlling condor_ vacate (which is an
administrative command which acts directly on machines).
See the new command reference in section 9 for details.
- Added a new ``High Availability'' service to the condor_ master.
You can now specify a daemon which can have ``fail over'' capabilities
(i.e. the master on another machine can start a matching daemon if the
first one fails). Currently, this is only available over a shared
file system (e.g. NFS), and has only been tested for the condor_ schedd.
- Scheduler universe jobs on UNIX can now specify a
HoldKillSig, the signal that should be sent when the job is
put on hold.
If not specified, the default is to use the KillSig, and if
that is not defined, the job will be sent a SIGTERM.
The submit file keyword to use for defining this signal is
hold_kill_sig, for example,
hold_kill_sig = SIGUSR1.
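The fallback order for the hold signal can be sketched as follows. The Python function hold_signal and the use of a plain dictionary to stand in for the job ClassAd are assumptions made for this illustration.

```python
from typing import Dict


def hold_signal(job_ad: Dict[str, str]) -> str:
    """Pick the signal sent when a job is put on hold:
    HoldKillSig if set, else KillSig, else SIGTERM."""
    if "HoldKillSig" in job_ad:
        return job_ad["HoldKillSig"]
    if "KillSig" in job_ad:
        return job_ad["KillSig"]
    return "SIGTERM"


explicit = hold_signal({"HoldKillSig": "SIGUSR1", "KillSig": "SIGINT"})
fallback = hold_signal({"KillSig": "SIGINT"})
default = hold_signal({})
```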
- The condor_ startd can now support policies on SMP machines
where each virtual machine (VM) has knowledge of the other VMs on
the same host.
For example, if a job starts running on one of the VMs, a job
running on another VM could immediately be suspended.
This is accomplished by using the new configuration variable
STARTD_VM_EXPRS , which is a list of ClassAd attribute
names that should be shared across all VMs on the machine.
For each VM on the machine, every attribute in this list is looked
up in the VM-specific machine ClassAd, the attribute name is given a
prefix indicating what VM it came from, and then inserted into the
machine ClassAds of all the other VMs.
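The sharing step can be illustrated with the short Python sketch below. It is not the condor_ startd's code: dictionaries stand in for machine ClassAds, share_vm_attrs is a hypothetical helper, and the exact prefix form "vm<N>_" is an assumption, since the text above only says that the prefix indicates which VM the attribute came from.

```python
from typing import Dict, List

ClassAd = Dict[str, str]


def share_vm_attrs(vm_ads: Dict[int, ClassAd],
                   shared: List[str]) -> Dict[int, ClassAd]:
    """Copy each shared attribute from every VM's ad into the ads of all
    the other VMs, prefixing the name with the VM it came from."""
    result = {vm_id: dict(ad) for vm_id, ad in vm_ads.items()}
    for source_id, source_ad in vm_ads.items():
        for name in shared:
            if name not in source_ad:
                continue
            for target_id in vm_ads:
                if target_id != source_id:
                    prefixed = "vm%d_%s" % (source_id, name)
                    result[target_id][prefixed] = source_ad[name]
    return result


ads = {1: {"Activity": "Busy"}, 2: {"Activity": "Idle"}}
merged = share_vm_attrs(ads, ["Activity"])
```

With ads shaped like this, a policy expression evaluated for one VM can refer to the prefixed attribute of another VM on the same host.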
- The condor_ startd publishes four new attributes into the
machine ClassAds it generates when it is in the Claimed state:
These attributes keep track of the total time the resource was
either running a job (in the Busy activity) or had a job suspended,
regardless of how many suspend/resume cycles the job went through.
The first two attributes (with ``Job'' in the name) keep track for a
single job (i.e. since the last time the resource was
The last two attributes (with ``Claim'' in the name) keep track of
these totals across all jobs that ran under the same claim
(i.e. since the last state change into the Claimed state).
- Added a -num option to the condor_ wait tool to wait for
a specified number of jobs to finish.
- Added a configuration option STARTER_JOB_ENVIRONMENT
so the admin can configure the default environment inherited by
- Added a feature to the condor_ schedd (configurable, off by default)
to back up the spool file before doing anything else.
- The "Continuous" option of the condor_ startd ``cron'' jobs is
being deprecated. It's being replaced by two new options which
control separate aspects of its behavior:
- "WaitForExit" specifies the "exit timing" mode
- "ReConfig" specifies that the job can handle SIGHUPs, and it should
be sent a SIGHUP when the condor_ startd is reconfigured.
- Many of the items logged by the condor_ startd ``cron'' logic
have been changed to D_FULLDEBUG (from D_ALWAYS).
- Added NEGOTIATOR_PRE_JOB_RANK and
NEGOTIATOR_POST_JOB_RANK . These expressions are applied
respectively before and after the user-supplied job rank when deciding
which of the possible matches to choose. (The existing expression
PREEMPTION_RANK is applied after
NEGOTIATOR_POST_JOB_RANK .) The pool administrator may use
these expressions to steer jobs in ways that improve the overall
performance of the pool. For example, using the pre job rank,
preemption may be avoided as long as there are idle machines, even
when the user-supplied rank expression prefers a machine that happens
to be busy. Using the post job rank, one could steer jobs towards
machines that are known to be dedicated to batch jobs, or one could
enforce breadth-first instead of depth-first filling of a cluster of
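One way to picture the ordering of these rank expressions is as a lexicographic comparison, sketched in Python below. This is an illustration under stated assumptions, not the negotiator's algorithm: it assumes that a higher value is better for every rank and that each later rank only breaks ties left by the earlier ones. The Match type, the choose_match helper, and the machine names are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    machine: str
    pre_job_rank: float      # NEGOTIATOR_PRE_JOB_RANK
    job_rank: float          # user-supplied job rank
    post_job_rank: float     # NEGOTIATOR_POST_JOB_RANK
    preemption_rank: float   # PREEMPTION_RANK


def choose_match(candidates: List[Match]) -> Match:
    """Pick the candidate that is best on the pre job rank, using the
    job rank, post job rank, and preemption rank as successive
    tie-breakers."""
    return max(candidates, key=lambda m: (m.pre_job_rank, m.job_rank,
                                          m.post_job_rank, m.preemption_rank))


# The user's rank prefers the busy machine, but a pre job rank that
# favors idle machines is compared first, so preemption is avoided.
candidates = [
    Match("busy-node", pre_job_rank=0.0, job_rank=10.0,
          post_job_rank=0.0, preemption_rank=0.0),
    Match("idle-node", pre_job_rank=1.0, job_rank=5.0,
          post_job_rank=0.0, preemption_rank=0.0),
]
best = choose_match(candidates)
```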
- Added the ability for Condor to transfer files larger than 2G on
platforms that support large files. This works automatically for
transferred executables, input files and output files.
- Added the ability for jobs to stream back standard input, output, and
error files while running. This is activated by the stream_input,
stream_output, and stream_error options to condor_ submit.
Note that this feature is incompatible with the new feature described
above where the shadow and starter can reconnect in certain
- Added support for vanilla jobs to be mirrored on a second
condor_ schedd. The jobs are submitted to the second condor_ schedd
on hold and will be released if the second condor_ schedd hasn't
heard from the first condor_ schedd (actually, a condor_ gridmanager
running under the first condor_ schedd) for a configurable amount of
time. Once the second condor_ schedd releases the jobs, the first
condor_ schedd acts as a mirror, reflecting the state of the jobs on
the second condor_ schedd.
To use this mirroring feature, jobs must be submitted
with a mirror_schedd parameter in the submit file and require
no file transfer.
- Fixed a bug in the condor_ startd ``cron'' logic which caused the
condor_ startd to except when trying to delete a job that could never
be run (i.e. invalid executable, etc).
- Fixed a bug in condor_ startd ``cron'' logic which caused it to
not detect when the starting of a ``job'' failed.
- Fixed several bugs in the reconfiguration handling of the
condor_ startd ``cron'' logic. In particular, even if the job has
the "reconfig" option set (or "continuous"), the job(s) won't be sent
a SIGHUP when the startd first starts, or when the job itself is first
run (until it outputs its first output block, defined by the "-"
- Condor's MPI support (for MPICH 1.2.4) was broken by other
changes in version 6.7.0.
Support for MPI jobs will return in Condor version 6.7.1.
Condor 6.7.0 supported platforms
|Hewlett Packard PA-RISC (both PA7000 and PA8000 series)
|Sun SPARC Sun4m, Sun4c, Sun UltraSPARC
||Solaris 2.6, 2.7, 8, 9
|Silicon Graphics MIPS (R5000, R8000, R10000)
||IRIX 6.5 (clipped)
||Red Hat Linux 7.1, 7.2, 7.3, 8.0
||Red Hat Linux 9
||Windows 2000 Professional and Server, 2003 Server (clipped)
||Windows XP Professional (clipped)
||Digital Unix 4.0
||Red Hat Linux 7.1, 7.2, 7.3 (clipped)
||Tru64 5.1 (clipped)
||Macintosh OS X (clipped)
||AIX 5.2L (clipped)
||Red Hat Linux 7.1, 7.2, 7.3 (clipped)
||SuSE Linux Enterprise 8.1 (clipped)