8.3 Stable Release Series 6.8

This is a stable release series of Condor. It is based on the 6.7 development series. All new features added or bugs fixed in the 6.7 series are available in the 6.8 series. As usual, only bug fixes (and potentially, ports to new platforms) will be provided in future 6.8.x releases. New features will be added in the forthcoming 6.9.x development series.

The 6.8.x series supports a different set of platforms than 6.6.x. Please see the updated table of available platforms in section 1.5.

The details of each version are described below.

Version 6.8.3

Release Notes:

Performed a security audit of all places where Condor opens files, to make certain files are opened with a reasonable permission mode and with the O_EXCL flag whenever possible.
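
As a rough illustration of the kind of file open the audit favors, the following Python sketch creates a file with an explicit permission mode and the O_EXCL flag, so that creation fails instead of reusing a file that already exists. The helper names create_exclusive and demo, the file name, and the mode 0o600 are assumptions made for this example; this is not Condor's own code.

```python
import os
import tempfile

def create_exclusive(path, mode=0o600):
    """Create path for writing and return a file descriptor.

    Raises FileExistsError if path already exists.
    """
    # O_CREAT | O_EXCL makes "check and create" one atomic step, so an
    # existing file (or a planted symbolic link) is never silently reused.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    return os.open(path, flags, mode)

def demo():
    """Return (first_created, second_refused) for two creations of one path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.log")  # hypothetical file name
        os.close(create_exclusive(path))
        try:
            os.close(create_exclusive(path))
            second_refused = False
        except FileExistsError:
            second_refused = True
        return True, second_refused
```

Calling demo() creates the file once and shows that a second exclusive creation of the same path is refused.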

New Features:

Added the JOB_INHERITS_STARTER_ENVIRONMENT configuration macro. When set to True, jobs inherit all environment variables from the condor_starter. This is useful for glidein jobs that need to access environment variables from the batch system running the glidein daemons. The default for this configuration macro is False, so existing behavior is unchanged. This feature does not apply to standard and pvm universe jobs.
Changed the default UDP receive buffer for the condor_collector from 1M to 10M. This value can be configured with the (existing) COLLECTOR_SOCKET_BUFSIZE macro.
NOTE: For some Linux distributions, it may be necessary to raise the operating system's limit on socket receive buffer size above its default value; this limit is the parameter /proc/sys/net/core/rmem_max. You can see the values that the condor_collector actually used by enabling D_FULLDEBUG for the condor_collector and looking for a log line like this:
Reset OS socket buffer size to 2048k (UDP), 255k (TCP).
Added a new configuration macro, COLLECTOR_TCP_SOCKET_BUFSIZE, to control the size of the TCP send buffers for the condor_collector. Previously, this size was taken from COLLECTOR_SOCKET_BUFSIZE. The new macro defaults to 128K.
Added a clipped port for SuSE Linux Enterprise Server 9 running on the PowerPC architecture. Note the known bug below.
The condor_schedd now maintains a birth date for the job queue. Nothing in Condor currently uses this feature, but future versions of condor_quill may require it.
There is a new configuration file macro, RANDOM_INTEGER(min,max[,step]). At configuration time, it produces a pseudo-random integer in the range min to max, inclusive.
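
The behavior described for RANDOM_INTEGER can be pictured with a short Python sketch. It is only a model of the semantics stated above, not Condor's implementation. It assumes that the optional step is counted from min and defaults to 1, which the entry above does not spell out. The function name random_integer is chosen for this example.

```python
import random

def random_integer(lo, hi, step=1):
    """Return a pseudo-random integer in [lo, hi], inclusive.

    Assumption: candidate values are lo, lo + step, lo + 2 * step, ...
    up to the largest such value that does not exceed hi.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if lo > hi:
        raise ValueError("min must not exceed max")
    # randrange excludes its stop value, so add 1 to make hi inclusive.
    return random.randrange(lo, hi + 1, step)
```

Under these assumptions, random_integer(10, 20, 5) yields 10, 15, or 20.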

Bugs Fixed:

Fixed a deadlock situation between the condor_schedd and the condor_startd that could significantly impact the condor_schedd's performance. The likelihood of the deadlock increased with the number of VMs advertised by the condor_startd.
Fixed a bug reading the user job log on Windows that caused occasional DAGMan confusion. Thanks to Fairview Software, Inc. for both finding the bug and writing a patch.
Fixed a denial of service problem: Condor daemons no longer freeze for 20 seconds when a client connects to them and then sends no data. This behavior is common with port scanners.
Fixed a race condition with condor_quill caused by PostgreSQL's default transaction isolation level being "read committed". This bug would cause truncated condor_q reads when using Quill.
Fixed a bug where the condor_ckpt_server would segfault when turned off with condor_off -fast.
Fixed a bug in the condor_startd where it could die with SIGABRT when a condor_starter exited under certain rare circumstances. The bug seems to have been most likely to appear on x86_64 Linux machines, but could potentially affect all platforms.
Fixed a problem with condor_history when running with Quill enabled, which caused it to allocate an unbounded amount of memory.
Fixed a problem with condor_q when running with Quill, which caused it to silently truncate the printing of the job queue.
Fixed a bug in the condor_gridmanager that caused the following configuration file parameters to be ignored for jobs of grid types condor and nordugrid: GRIDMANAGER_RESOURCE_PROBE_INTERVAL, GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE, and GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE.
Fixed a bug in condor_run that caused it to abort on non-fatal warnings from condor_submit and print incorrect error messages.
Fixed a bug in the condor_gridmanager dealing with grid type gt4 grid universe jobs. If the job's standard output or error was not specified in the job ClassAd, the condor_gridmanager would create an improper GRAM RSL string, causing the job to fail.
Fixed a bug in the condor_gridmanager that could cause it to delegate the wrong credential when refreshing the credentials for a grid type gt4 grid universe job.
The condor_gridmanager could get into a state where it would no longer start up Globus jobmanagers for grid type gt2 grid universe jobs, if previous requests failed due to connection errors. This bug has been fixed.
The condor_c-gahp now properly exits when the pipe to its parent goes away. Previously, it would fill its log with large amounts of useless messages and then exit several minutes later.
Fixed a bug where, after a problem opening standard input, output, or error, the standard universe might generate an incorrect warning in the condor_shadow's log.
The condor_gridmanager now recovers properly when a proxy refresh fails for a gt2 grid universe job in the stage-out state. Previously, the job would become held with a hold reason of "Globus error 3: an I/O operation failed".
A number of fixes to minor typos and incorrect formatting in Condor's log files.
Fixed a bug in which, when REQUEST_CLAIM_TIMEOUT was reached and the condor_schedd failed to contact the condor_startd to release the claim, the condor_schedd would retry releasing the claim periodically and indefinitely, possibly incurring a lengthy communication delay each time.
Fixed a bug under Windows in which Condor daemons such as the condor_schedd sometimes limited their use of pending connect operations more than they should have. This resulted in the message "file descriptor safety level exceeded".
condor_fetchlog no longer allows or documents the -dagman option. The option's appearance was an error; it never worked.
The condor_schedd now ensures that the initial job queue log file contains a sequence number for use by Quill. This fixes a case in which no sequence number was inserted, because the initial rotation of this (empty) file failed. Quill also now reports exactly what the problem is if it reads a job queue log in this state, rather than simply crashing. This problem has so far only been observed under Windows.
Fixed a problem on Windows where, when submitting a job with a sandbox (for example, using the -s or -r option to condor_submit), an erroneous file permissions check in the condor_schedd would result in a failed submission.
The condor_startd would crash shortly after start up if the RANK expression contained any use of the unary minus operator. This patch should also fix any other cases where Condor daemons crashed due to the use of the unary minus operator in ClassAd expressions.
Stork now writes a terminated event to the user log when it removes a transfer job from its queue because of failures to invoke a transfer module. Without this event, DAGMan would not notice that these jobs had left the queue.
Fixed a problem where the condor_schedd on Windows would incorrectly reject a job if the client provided an Owner attribute that was correct but differed in case from the authenticated name. This bug was thought to have been fixed in Condor 6.8.0.
Fixed problems with condor_store_cred behaving strangely when storing or removing a user name that is some initial substring of "condor_pool". Specifying such a user name would be incorrectly interpreted as equivalent to specifying the -c option.
Fixed a problem with condor_glidein printing large amounts of text to the screen when checking the status of a job it submitted.
A new version of the GT4 GAHP is included, with the following changes:
- A new axis.jar from Globus fixes a thread safety bug that can cause lockups in subscriptions for WS notifications. See Globus Bugzilla 4858 (http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=4858).
- Fixed bugs that caused memory related to destroyed jobs to not be reclaimed in both the client and the server.
- Removed redundant usage of Secure Message, Secure Conversation, and Transport Security when talking to a WS GRAM service. Now, only Transport Security is used.
Fixed memory leaks in condor_quill.
Fixed a bug that might have caused the condor_startd problems launching the condor_starter for the standard universe on 64-bit systems.
Improved Condor's file transfer. If you request that Condor automatically transfer back your output, it now detects changes better. Previously, it would only transfer back files that had a more recent timestamp than the spool date. Now, it will transfer back any file that has changed in date (including being dated in the past) or changed in size.
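
The difference between the old and new output-transfer rules described above can be sketched in Python. This is a simplified model rather than Condor's implementation: each file is represented as a (modification time, size) pair, the function and file names are invented for the example, and treating a newly created file as changed is an assumption.

```python
def changed_old_rule(after, spool_time):
    """Old rule: only files with a timestamp newer than the spool date."""
    return sorted(name for name, (mtime, _size) in after.items()
                  if mtime > spool_time)

def changed_new_rule(before, after):
    """New rule: any file whose timestamp or size differs from before.

    A file that did not exist before counts as changed (an assumption).
    """
    return sorted(name for name, meta in after.items()
                  if before.get(name) != meta)

# Hypothetical sandbox contents: name -> (mtime, size)
SPOOL_TIME = 100
BEFORE = {"a.out": (100, 10), "b.dat": (100, 20), "c.log": (100, 5)}
AFTER = {
    "a.out": (100, 10),  # untouched
    "b.dat": (50, 20),   # re-dated into the past
    "c.log": (100, 9),   # same timestamp, different size
    "d.new": (150, 1),   # created by the job
}
```

With this sample data, the old rule selects only d.new, while the new rule also selects b.dat and c.log.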

Known Bugs:

SuSE Linux Enterprise Server 9 on PowerPC only: The default Java interpreter on SuSE Linux Enterprise Server 9 running on the PowerPC architecture has compatibility problems with this release of Condor. The problem exhibits itself as the condor_startd hanging, never reporting itself to the condor_collector. The workaround is to either disable the Java universe (set JAVA to an empty string), or disable just-in-time compilation when running in the Java universe with the following configuration setting:
```
  JAVA_EXTRA_ARGUMENTS = -Djava.compiler=""
```

Version 6.8.2

Release Notes:

Condor now uses Globus 4.0.3 for GSI, GRAM, and GridFTP support. This includes a patch for the OpenSSL vulnerability detailed in CVE-2006-4339 and http://www.openssl.org/news/secadv_20060905.txt. It also includes fixes for Globus Bugzilla 4689 (http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=4689) and a bug that can cause duplicate UUIDs to be generated for WS GRAM jobs.
The condor_schedd daemon no longer forks separate processes to change ownership of job directories in the spool. Previously on Unix-like systems, this would create a new process before a job started running and after it finished running. Some sites with very busy condor_schedd daemons were encountering scaling problems.

New Features:

Because, by default, the condor_startd daemon references the job ClassAd attribute NumCkpts, Condor's default configuration will now round up the value of NumCkpts, in order to improve matchmaking performance. See the entry on SCHEDD_ROUND_ATTR in section 3.3.11.
Enhanced the RHEL3 x86_64 port of Condor to include the standard universe.
condor_submit_dag -f no longer deletes the dagman.out file. condor_submit_dag without the -f option will now submit a DAGMan run even if the dagman.out file exists. In this case, the file will be appended to.
Added a property to the Windows installer program to determine whether the Condor service will be started after installation. The property name is STARTSERVICE, and the default value is "Y".

Bugs Fixed:

A bug caused the condor_master daemon to kill only immediate children within the process tree, upon an abnormal exit of the condor_master daemon. The condor_master daemon now kills all descendant processes.
Fixed a bug where if the file system was full, the debugging log files (for example SchedLog) would silently lose messages. Now, if the disk is full, the Condor daemons will exit.
Fixed a bug in the condor_schedd daemon that caused it to stop negotiating for grid universe jobs in the case that it decided it could not spawn any new condor_shadow processes.
Added the ProcessId class (which more uniquely identifies a process than a PID does) to the condor_dagman abort duplicate runs feature. This makes it less likely that a given instance of condor_dagman will mistakenly conclude that another instance of condor_dagman is already running on the same DAG. Also fixed an unrelated bug in the abort duplicate runs feature that could cause a condor_dagman to not abort itself when it should.
Fixed a bug in which Condor daemons leaked memory (consuming more and more memory over time) when parsing ClassAds that use functions with arguments.
Fixed a bug in the condor_starter daemon, which caused it to look in the wrong place for the job's executable, if TransferExecutable was set to True in the job ClassAd.
condor_history no longer crashes if HISTORY is not defined in the Condor configuration file.
Fixed an unintentional change to the value of -Condorlog in a condor_dagman submit description file: it is once again the log file of the first node job.
Fixed a bug in condor_q that would cause condor_q -hold or condor_q -run to exit with an error on some platforms.
Fixed a bug on Unix platforms, in which a misconfiguration of MAIL would cause the condor_master daemon to restart all of its child daemons whenever it tried (and failed) to send e-mail to the administrator.
Network related error messages have been improved to make debugging easier. For example, when timing out on a read or write operation, the peer's address is now included in the error message.
An invalid value for UPDATE_INTERVAL now causes the condor_startd daemon to abort. Previously, it would continue running, but some invalid values (for example, 0) could cause it to stop sending periodic ClassAd updates to the condor_collector, even after being reconfigured with a valid value. Only a complete restart of the condor_startd daemon was sufficient to get it out of this state.
Fixed a bug that caused X.509 limited proxies to be delegated as impersonation (i.e. non-limited) proxies. Any authentication attempted with the resulting proxies would fail.
Fixed a couple of bugs that would cause Condor to lose track of some Condor-related processes and subsequently fail to clean up (kill) these processes.
Fixed a bug that would cause condor_history to crash when dealing with rotated history files. Note that history file rotation is turned on by default. (See Section 3.3.3 for descriptions of ENABLE_HISTORY_ROTATION and MAX_HISTORY_ROTATIONS.)

Known Bugs:

None.

Version 6.8.1

Release Notes:

Version 6.8.1 fixes important bugs, some of which have security implications. All users are encouraged to upgrade, and full disclosure of the vulnerabilities will be given at the end of October 2006.
Condor is now linked against GSI from Globus 4.0.2. This includes a patch for Globus Security Advisories 2006-01 (http://www.globus.org/mail_archive/security-announce/2006/08/msg00000.html) and 2006-02 (http://www.globus.org/mail_archive/security-announce/2006/08/msg00001.html). It also includes a patch for the OpenSSL vulnerability detailed in CVE-2006-4339 and http://www.openssl.org/news/secadv_20060905.txt.
The PCRE (Perl Compatible Regular Expressions) library used by Condor is now dynamically linked and shipped as a DLL with Condor for Windows, rather than being statically linked.

New Features:

Added an optional argument to the condor_dagman ABORT-DAG-ON command that allows the DAGMan exit code to be specified separately from the node value that causes the abort; also, a DAG can now be aborted on a zero exit code from a node.
Added the ALLOW_FORCE_RM configuration variable. If this expression evaluates to True, then a condor_rm -f attempt is allowed. If it evaluates to False, the attempt is disallowed. The expression is evaluated in the context of the job ClassAd. If not defined, the value defaults to True, matching the behavior of previous Condor releases.
condor_dagman will now reject DAGs for which any of the nodes' user job log files are on NFS (because of the unreliability of NFS file locking, this can cause DAGs to fail). This feature can be turned off by setting the DAGMAN_LOG_ON_NFS_IS_ERROR configuration macro to False (the default is True).
condor_submit can now be configured to reject jobs for which the log file is on NFS. To do this, set the LOG_ON_NFS_IS_ERROR configuration macro to True. The default is that condor_submit will issue a warning for a log file on NFS.
Added the DAGMAN_ABORT_DUPLICATES configuration macro, which causes condor_dagman to attempt to detect at startup whether another condor_dagman is already running on the same DAG; if so, the second condor_dagman will abort itself.
The new configuration variable NETWORK_MAX_PENDING_CONNECTS may be used to limit the maximum number of simultaneous network connection attempts. This is primarily relevant to the condor_schedd daemon, which may try to connect to large numbers of condor_startd daemons when claiming them. The condor_negotiator may also connect to large numbers of condor_startd daemons when initiating security sessions used for sending MATCH messages. On Unix, the default is to allow up to eighty percent of the process file descriptor limit. On Windows, the default is 1600.
Added some more debug output to condor_dagman to clarify fatal errors.
The -format argument to condor_q and condor_status can now take an expression in addition to a simple attribute name.
DRMAA is now available on most Linux platforms, Windows and PPC MacOS.
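
To make the Unix default for NETWORK_MAX_PENDING_CONNECTS described above concrete, the following Python sketch computes eighty percent of a file descriptor limit. The function names are invented for this example, and rounding down to a whole number is an assumption; the entry above does not say how Condor rounds.

```python
try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None

def default_max_pending_connects(fd_limit):
    """Eighty percent of the descriptor limit, rounded down (assumed)."""
    return fd_limit * 80 // 100

def current_fd_limit():
    """Return this process's soft descriptor limit, or None if unknown."""
    if resource is None:
        return None
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

For a limit of 1024 descriptors this gives 819 pending connection attempts; for a limit of 256 it gives 204.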

Bugs Fixed:

Fixed a bug in which, when a large number of jobs (roughly 200 or more) were running from a single condor_schedd daemon, and those jobs were using job leases (the default in 6.8), the condor_schedd daemon could enter a state where it crashed on startup until all of the job leases expired.
Fixed a bug in which Condor jobs submitted with the NiceUser priority were not being matched if the NEGOTIATOR_MATCHLIST_CACHING setting was TRUE (which is the default).
Fixed a Quill bug that prevented it from running on Windows. The symptom was errors in the QuillLog such as
```
POLLING RESULT: ERROR
```
Fixed a bug in Quill where it would cause errors such as
```
duplicate key violates unique constraint "history_vertical_pkey"
```
in the QuillLog and the PostgreSQL log file. These errors triggered a significant slowdown in the performance of Quill and the database. This would only happen when a job attribute changed type from a string type to a numeric type, or vice versa.
In those unusual cases where Condor is unable to create a new process, it shuts down cleanly, eliminating a small possibility of data corruption.
Fixed a bug with the gt4 and nordugrid grid universe jobs that caused the stdout and stderr of a job to not be transferred correctly, if the given file names had absolute paths.
condor_dagman now echoes warnings from condor_submit and stork_submit to the dagman.out file.
Fixed a bug introduced in 6.7.20, causing the condor_ckpt_server to exit immediately after starting up, unless Condor's security negotiation was disabled.
MAX_<SUBSYS>_LOG now defaults to one megabyte, even if the setting is missing from the configuration. Previously, the default was 64 kilobytes.
Fixed a bug related to non-blocking connect that could occasionally cause Condor daemons to crash.
Fixed a rare bug where an exceptionally large query to the condor_collector could cause it to crash. The most common cause was a single condor_schedd daemon restarting, and trying to recover a large number of job leases at once. More than approximately 250 running jobs on a single condor_schedd daemon would be necessary to trigger this bug.
When using the JOB_PROXY_OVERRIDE_FILE configuration parameter, the X.509 proxy will now be properly forwarded for Condor-C jobs.
Greatly reduced the chance that a Condor-C job in the REMOVED state will be HELD due to an expired proxy or failure to talk to the remote condor_schedd.
Fixed error and debug messages added in Condor version 6.7.20 that incorrectly reported IP and port numbers. These messages were intended to report the peer's address, but they were instead reporting the local address of the network socket.
Fixed a bug introduced in Condor version 6.7.20 which could cause Condor daemons to die with the message
```
PANIC -- OUT OF FILE DESCRIPTORS
```
The conditions causing this related to failed attempts to send updated status to the condor_collector daemon, with both non-blocking updates and security negotiation enabled (the defaults).
Also fixed a bug in the negotiator with the same effect as above, except it only happened with the configuration setting NEGOTIATOR_USE_NONBLOCKING_STARTD_CONTACT=False.
Fixed a bug in condor_schedd under Solaris that could also cause file descriptors to become exhausted over time when many machines (e.g. over 100) were claimed in a short span of time and the condor_schedd process file descriptor limit was near 256.
Fixed a bug in condor_schedd under Windows that could cause network sockets to be allocated and never released back to the system. The circumstances that could cause this were very rare. The error message in the logs indicating that this problem was happening is
```
ERROR: DuplicateHandle() failed in Sock::set_inheritable
```
In cases where this error message is displayed, the network socket is closed.
Under some conditions, when making TCP connections, Condor was still trying to connect for the full duration of the operation timeout (often 10 or 20 seconds), even if the connection attempt was refused (for example, because the port being accessed is not accepting connections). Now, the connect operation finishes immediately after the first such failure, allowing the Condor process to continue with other tasks.
Fixed problems relating to the credential cache in the Kerberos authentication mechanism. The current version of Kerberos is 1.4.3.
Fixed bugs in the SSL authentication mechanism that caused the condor_schedd to crash when submitting a job (on Unix) and caused all tools and daemons to crash on Windows when using SSL.
Some of the binaries required to use Condor-C on Windows were mistakenly not included in previous releases of Condor. This has been fixed.
Fixed a problem on Windows where the condor_startd could fail to include some attributes in its ClassAd. This would result in some jobs incorrectly not being matched to that machine. This only happened if CREDD_HOST was defined and Condor daemons on the execute machine were unable to authenticate with the condor_credd.
Fixed a condor_dagman bug which had prevented the $(DAGManJobId) attribute from being expanded in job submit files (for example, when used as the value to define the Priority command).
Fixed a bug in condor_submit that caused parallel universe jobs submitted via Condor-C to become mpi universe jobs.
Fixed a bug which could cause Condor daemons to hang if they try to write to the standard error stream (stderr) on some platforms. In general, this should never happen, but can, due to third party libraries (beyond our control) trying to write error or other messages.
Fixed condor_status to report error messages.
Fixed a bug in which setting the configuration variable
```
NEGOTIATOR_CONSIDER_PREEMPTION = False
```
caused an incorrect calculation. The fraction of the pool already being claimed by a user was calculated using the wrong total number of condor_startd daemons. This could cause some condor_startd daemons to remain unclaimed, even when there were jobs available to run on them.
Fixed a security vulnerability in Condor's FS and FS_REMOTE authentication methods. The vulnerability allowed an attacker to impersonate another user on the system, potentially allowing submission of jobs as a different user. This may allow escalation to root privilege if the Condor binaries and configuration files have improper permissions. The fix is not backwards compatible, which means all daemons and tools using FS authentication must be running Condor 6.8.1 or greater. The same applies to FS_REMOTE: all daemons and tools using FS_REMOTE must be using Condor 6.8.1 or greater. In practice, this means that for FS, all Condor binaries on one host must be version 6.8.1 or greater, but versions can be different from host to host. For FS_REMOTE it means all binaries across all hosts must be 6.8.1 or greater.
Fixed a couple of race conditions in Stork and the condor_credd where credential files were possibly created with improper permissions before being set to owner permissions.
Fixed a bug in the condor_gridmanager that caused it to delegate 12-hour proxies for grid-type gt4 jobs and then not refresh them.
Fixed a bug in the condor_gridmanager that caused a directory needed for staging-in of grid-type gt4 job files to be removed when the condor_gridmanager exited, causing the stage-in to fail.
Fixed a bug that caused the checkpoint server to restart because of (ostensibly) getting an unexpected errno from select().
Fixed a bug on Windows where setting output or error to a relative or absolute path (as opposed to a simple file name without path information) would not work properly.
History file rotation did not previously work on Windows because the names of rotated files would contain an ISO 8601 extended format timestamp, which contains colon characters. The naming convention for rotated files has been modified to use ISO 8601 basic format, avoiding this problem.
The CLAIMTOBE authentication method (which is inherently insecure and should only be used for testing or other special circumstances) previously would authenticate without providing the "domain" portion of the user name. As an example, a user would be authenticated as simply "user" rather than "user@cs.wisc.edu". This problem has been fixed, but the new protocol is not backwards compatible so the fix is turned off by default. Correct behavior can be enabled by setting the SEC_CLAIMTOBE_INCLUDE_DOMAIN parameter to True.
Fixed a bug with NEGOTIATOR_MATCHLIST_CACHING that would cause very low-priority jobs (like jobs submitted with nice_user=True) to not match even if resources were available.
Fixed a buffer overflow that could crash the condor_negotiator.
SCHEDD_ROUND_ATTR_<xxxx> preserves the value being rounded up when it is a multiple of the power of 10 specified for rounding. Previously, the value would be incremented; now it remains the same. For example, if SCHEDD_ROUND_ATTR_<xxxx>=2 and the value being rounded up is 100, it now remains 100, rather than being incremented to 200.
Fixed condor_updates_stats to report its version number correctly.
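
The change to SCHEDD_ROUND_ATTR_<xxxx> rounding described above can be checked with a small arithmetic sketch in Python. The two functions model the old and new behavior as stated in that entry, for non-negative integers only; the function names are invented here, and this is not Condor's code.

```python
def round_up_new(value, digits):
    """Round value up to a multiple of 10**digits; exact multiples stay."""
    step = 10 ** digits
    # Ceiling division: -(-a // b) rounds a / b upward for integers.
    return -(-value // step) * step

def round_up_old(value, digits):
    """Earlier behavior: an exact multiple was bumped to the next one."""
    step = 10 ** digits
    return (value // step + 1) * step
```

With digits set to 2, round_up_new(100, 2) stays at 100 where round_up_old(100, 2) gave 200, while both round 101 up to 200.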

Known Bugs:

The -completedsince option to condor_history works when Quill is enabled. The behavior of condor_history -completedsince is undefined when Quill is not enabled.

Version 6.8.0

Release Notes:

The default configuration for Condor now requires that HOSTALLOW_WRITE be explicitly set. Condor will refuse to start if the default configuration is used unmodified. Existing installations should not need to change anything. For those who desire the earlier default, you can set it to "*", but note that this is potentially a security hole allowing anyone to submit jobs or machines to your pool.
Most Linux distributions are now supported using dynamically linked binaries built on a RedHat Enterprise Linux 3 machine. Recent security patches to a number of Linux distributions have rendered the binaries built on RedHat 9 machines ineffective. The download pages have been changed to reflect this, but Linux users should be aware of this change. The recommended download for most x86 Linux users is now: condor-6.8.0-linux-x86-rhel3-dynamic.tar.gz.
Some log messages have been clarified or moved to different debugging levels. For example, certain messages that looked like errors were printed to D_ALWAYS, even though nothing was wrong and the system was behaving as expected.
The new features and bugs fixed in the rest of this section only refer to changes made since the 6.7.20 release, not the last stable release (6.6.11). For a complete list of changes since 6.6.11, read the 6.7 version history in section 8.4.

New Features:

Version 1.4 of the Condor DRMAA libraries is now included with the Condor release. For more information about DRMAA, see section 4.4.2.
Version 1.0.15 of the Condor GAHP is now used for Condor-G and Condor-C.
Added the -outfile_dir command-line argument to condor_submit_dag. This allows you to change the directory in which condor_dagman writes the dagman.out file.
Added a new -summary (also -s) option to the condor_updates_stats tool. If enabled, this prevents it from displaying the entire history for each machine and only displays the summary information.

Bugs Fixed:

Fixed a number of potential static buffer overflows in various Condor daemons and libraries.
Fixed some small memory leaks in the condor_startd and condor_schedd, and a potential leak that affected all Condor daemons.
Fixed a bug in Quill which caused it to crash when certain long attributes appeared in a job ad.
Fixed a bug in which the condor_startd would crash after a reconfig if the address of a collector had not been resolved since the previous reconfig (e.g. because DNS was down during that time).
Once a Condor daemon failed to look up the IP address of the collector (e.g. because DNS was down), it would fail to contact the collector from that time until the next reconfig. Now, each time Condor tries to contact the collector, it generates a fresh DNS query if the previous attempt failed.
Fixed a bug in which, when using Condor-C or the -s or -r command-line options to condor_submit, the job's standard output and error would be placed in the job's initial working directory, even if the job ad said to place them in a different directory.
Greatly sped up the parsing of large DAGs (by a factor of 50 or so) by using a hash table instead of linear search to find DAG nodes.
Fixed a bug in condor_dagman that caused an EXECUTABLE_ERROR event from a node job to abort the DAG instead of just marking the relevant node as failed.
Fixed a bug in condor_collector that caused it to discard machine ads that don't have an IP address field (either StartdIpAddr or STARTD_IP_ADDR). The condor_startd will always produce a StartdIpAddr field, but machine ads published through condor_advertise may not.
Fixed a bug introduced in 6.7.18 that, when using BIND_ALL_INTERFACES on a dual-homed machine, caused Condor daemons to sometimes incorrectly report their IP addresses, which could cause jobs to fail to start running.
Made the event checking in condor_dagman less strict: added the new "allow duplicate events" value to the DAGMAN_ALLOW_EVENTS macro (this value is part of the default); the value 16 now also allows a terminate event before a submit event; and "allow all events" was changed to "allow almost all events" (all except "run after terminal event"), so it is more useful.
condor_dagman and condor_submit_dag now report -NoEventChecks as ignored rather than deprecated.
Fixed a bug in the condor_dagman -maxidle feature: a shadow exception event now puts the corresponding job into the idle state in condor_dagman's internal count.
Fixed a problem on Windows where daemons would sometimes crash when dealing with UNC path names.
Fixed a problem where the condor_schedd on Windows would incorrectly reject a job if the client provided an Owner attribute that was correct but differed in case from the authenticated name.
Fixed a condor_startd crash introduced in version 6.7.20. This crash would appear if an execute machine was matched for preemption but then not claimed in time by the appropriate condor_schedd.
Resolved an issue where the condor_startd was unable to clean up jobs' execute directories on Windows when the condor_master was started from the command line rather than as a service.
Added more patches to Condor's DRMAA interface to make it more compatible with Sun Grid Engine's DRMAA interface.
Removed the unused D_UPDOWN debug level and added the D_CONFIG debug level.
Fixed a bug that caused condor_q with the -l or -xml arguments to print out duplicate attributes when using Quill.
Fixed a bug that prevented Condor-C jobs (universe grid jobs of type condor) from submitting correctly if QUEUE_ALL_USERS_TRUSTED is set to True.
Fixed a bug that could cause the condor_negotiator to crash if the pool contains several different versions of the condor_schedd and in the config file NEGOTIATOR_MATCHLIST_CACHING is set to True.
Changed the default value for config file entry NEGOTIATOR_MATCHLIST_CACHING from False to True. When set to True, this will instruct the negotiator to safely cache data in order to improve matchmaking performance.
The condor_master now recognizes condor_quill as a valid Condor daemon without any manual configuration on the part of site administrators. This simplifies the configuration changes required to enable Quill.
Fixed a rare bug in the condor_starter where if there was a failure transferring job output files back to the submitting host, it could hang indefinitely, and the job appeared as if it was continuing to run.
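
The DAG parsing speedup listed among the fixes above replaced a linear search for DAG nodes with a hash table. The Python sketch below contrasts the two lookup strategies in general terms. It does not reproduce DAGMan's data structures; the node representation and all names are invented for the example.

```python
def find_linear(nodes, name):
    """Scan the whole list: cost grows with the number of nodes."""
    for node in nodes:
        if node["name"] == name:
            return node
    return None

def build_index(nodes):
    """Build a hash table (dict) from node name to node, once."""
    return {node["name"]: node for node in nodes}

# Hypothetical DAG with 1000 nodes named Node0 .. Node999.
NODES = [{"name": f"Node{i}", "parents": []} for i in range(1000)]
INDEX = build_index(NODES)
```

Both approaches return the same node, but INDEX["Node999"] takes roughly constant time, while find_linear(NODES, "Node999") has to walk all 1000 entries. When every PARENT/CHILD line in a large DAG file triggers such a lookup, the difference adds up.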

Known Bugs:

The -completedsince option to condor_history works when Quill is enabled. The behavior of condor_history -completedsince is undefined when Quill is not enabled.

