
6.2 Microsoft Windows

Windows is a strategic platform for Condor, and therefore we have been working toward a complete port to Windows. Our goal is to make Condor every bit as capable on Windows as it is on Unix, or even more capable.

Porting Condor from Unix to Windows is a formidable task, because many components of Condor must interact closely with the underlying operating system. Instead of waiting until all components of Condor are running and stabilized on Windows, we have decided to make a clipped version of Condor for Windows. A clipped version is one in which there is no checkpointing and there are no remote system calls.

This section contains additional information specific to running Condor on Windows. Eventually this information will be integrated into the Condor Manual as a whole, and this section will disappear. To use Condor effectively, first read the overview chapter (section 1.1) and the user's manual (section 2.1). If you will also be administering or customizing the policy and setup of Condor, also read the administrator's manual chapter (section 3.1). After reading these chapters, review this section for important differences when using and administering Condor on Windows. For information on installing Condor for Windows, see section 6.2.10.

6.2.1 What is missing from Condor Version 6.8.3 for Windows?

In general, this release for Windows works the same as the release of Condor for Unix. However, the following items are not supported in this version:

The Standard and PVM job universes are not present. This means transparent process checkpoint/migration and remote system calls are not supported.
For grid universe jobs, the only supported grid type is condor.
Accessing files via a network share that requires a Kerberos ticket (such as AFS) is not yet supported.

6.2.2 What is included in Condor Version 6.8.3 for Windows?

Except for the items listed above, almost everything works the same way in Condor for Windows as it does in the Unix release. This release is based on the Condor Version 6.8.3 source tree, and thus the feature set is the same as Condor Version 6.8.3 for Unix. For instance, all of the following work in Condor:

The ability to submit, run, and manage queues of jobs running on a cluster of Windows machines.
All tools such as condor_q, condor_status, and condor_userprio are included. Only condor_compile is not included.
The ability to customize job policy using ClassAds. The machine ClassAds contain all the information included in the Unix version, including current load average, RAM and virtual memory sizes, integer and floating-point performance, keyboard/mouse idle time, etc. Likewise, job ClassAds contain a full complement of information, including system dependent entries such as dynamic updates of the job's image size and CPU usage.
Everything necessary to run a Condor central manager on Windows.
Security mechanisms.
Support for SMP machines.
Condor for Windows can run jobs at a lower operating system priority level. Jobs can be suspended, soft-killed by using a WM_CLOSE message, or hard-killed automatically based upon policy expressions. For example, Condor can automatically suspend a job whenever keyboard/mouse or non-Condor created CPU activity is detected, and continue the job after the machine has been idle for a specified amount of time.
Condor correctly manages jobs which create multiple processes. For instance, if a Condor job spawns multiple processes and Condor needs to kill the job, all processes created by the job will be terminated.
In addition to interactive tools, users and administrators can receive information from Condor by e-mail (standard SMTP) and/or by log files.
Condor includes a friendly GUI installation and setup program, which can perform a full install or uninstall of Condor. Information specified by the user in the setup program is stored in the system registry. The setup program can update a current installation with a new release using a minimal amount of effort.
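The suspend/continue behavior mentioned in the list above is configured in Condor through ClassAd policy expressions. The Python sketch below models only the decision logic and is not ClassAd syntax. The threshold names and values (a 60-second activity window, a 0.3 load cutoff, five idle minutes before continuing) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    keyboard_idle_secs: int   # seconds since the last keyboard/mouse activity
    non_condor_load: float    # CPU load not attributable to the Condor job


# Hypothetical thresholds, for illustration only.
ACTIVITY_WINDOW_SECS = 60
LOAD_CUTOFF = 0.3
CONTINUE_IDLE_SECS = 5 * 60


def should_suspend(state: MachineState) -> bool:
    """Suspend when the owner is active or a non-Condor process is busy."""
    return (state.keyboard_idle_secs < ACTIVITY_WINDOW_SECS
            or state.non_condor_load > LOAD_CUTOFF)


def should_continue(state: MachineState) -> bool:
    """Resume a suspended job once the machine has been idle long enough."""
    return (state.keyboard_idle_secs >= CONTINUE_IDLE_SECS
            and state.non_condor_load <= LOAD_CUTOFF)
```

The continue test deliberately uses a longer idle period than the suspend test. This gap keeps a job from bouncing between suspended and running every time the owner pauses briefly.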

6.2.3 Secure Password Storage

In order for Condor to operate properly, it must at times be able to act on behalf of users who submit jobs. In particular, this is required on submit machines so that Condor can access a job's input files, create and access the job's output files, and write to the job's log file from within the appropriate security context. It may also be desirable for Condor to execute the job itself under the security context of its submitting user (see section 6.2.4 for details on running jobs as the submitting user on Windows).

On Unix systems, Condor can easily perform actions as an arbitrary user when it is started with root privileges. On Windows, however, performing an action as a particular user requires knowledge of that user's password, even when running at the maximum privilege level.

Condor on Windows supports the notion of user privilege switching through the use of a secure password store. Users can provide Condor with their passwords using the condor_store_cred tool. Passwords managed by Condor are encrypted and stored at a secure location within the Windows registry. When Condor needs to perform an action as a particular user, it can then use the securely stored password to do so.

The secure password store can be managed by the condor_schedd. This is Condor's default behavior, and it is usually a good approach in environments where the user's password is needed only on the submit machine, that is, when users are not allowed to submit jobs that run under the security context of the submitting user.

In environments where users can submit Condor jobs that run using their Windows accounts, it is necessary to configure a centralized condor_credd daemon to manage the secure password store. This makes a user's password available, via an encrypted connection to the condor_credd, to any execute machine that may need to execute a job under the user's Windows account.

The condor_config.local.credd example file, included in the etc subdirectory of the Condor distribution, demonstrates how to configure a Condor pool to use the condor_credd for password management.

The following configuration macros are needed for all hosts that share a condor_ credd daemon for password management. These will typically be placed in the global Condor configuration file.

CREDD_HOST - This is the name of the machine that runs the condor_credd.
CREDD_CACHE_LOCALLY - This affects Condor's behavior when a daemon does a password fetch operation to the condor_credd. If CREDD_CACHE_LOCALLY is True, the first successful fetch of a user's password will result in the password being stashed in a local secure password store. Subsequent uses of that user's password will not require communication with the condor_credd. If not defined, the default value is False.
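The caching rule for CREDD_CACHE_LOCALLY can be modeled in a few lines. In this Python sketch, `PasswordFetcher` and `fetch_remote` are hypothetical names, and a plain dictionary stands in for the local secure password store. The encrypted, authenticated exchange with condor_credd is represented by an ordinary callable, not by any real Condor API.

```python
from typing import Callable, Dict


class PasswordFetcher:
    """Model of a daemon fetching user passwords from a central condor_credd.

    fetch_remote is a hypothetical stand-in for the authenticated, encrypted
    request to condor_credd; it is not a real Condor interface.
    """

    def __init__(self, fetch_remote: Callable[[str], str],
                 cache_locally: bool = False):
        self.fetch_remote = fetch_remote
        self.cache_locally = cache_locally        # mirrors CREDD_CACHE_LOCALLY (default False)
        self.local_store: Dict[str, str] = {}     # stands in for the local secure store

    def get_password(self, user: str) -> str:
        if user in self.local_store:
            return self.local_store[user]         # cached: no round trip to the credd
        password = self.fetch_remote(user)
        if self.cache_locally:
            self.local_store[user] = password     # stash after the first successful fetch
        return password
```

When caching is enabled, only the first lookup for a user reaches the credd. With the default of False, every lookup does.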

Careful attention must be given to the condor_credd daemon's security configuration. All communication with the condor_credd daemon should be strongly authenticated and encrypted. The condor_config.local.credd file configures the condor_credd daemon to only accept password store requests from users authenticated using the NTSSPI authentication method. Password fetch requests must come from Condor daemons authenticated using a shared secret via the password authentication method. Both types of traffic are required to be encrypted. Please refer to section 3.6.1 for details on configuring security in Condor.

6.2.4 Executing Jobs as the Submitting User

By default, Condor executes jobs on Windows using a dedicated ``run account'' that has minimal access rights and privileges. As an alternative, Condor can be configured to run a user's jobs using their own account if the job owner wishes. This may be useful if the job needs to access files on a network share, or access other resources that aren't available to a low-privilege run account. To enable this feature, the following steps must be taken.

Execute machines must have access to users' passwords so they may log into a user's account before running jobs on their behalf. This can be accomplished through the use of a central condor_credd. Please refer to section 6.2.3 for more information on password storage and the condor_credd.
The boolean configuration parameter STARTER_ALLOW_RUNAS_OWNER must be set to True on all execute machines.

A user that then wants a job to run using their own account can simply use the run_as_owner command in the job's submit file as follows:

run_as_owner = true

6.2.5 Details on how Condor for Windows starts/stops a job

This section provides some details on how Condor starts and stops jobs. This discussion is geared for the Condor administrator or advanced user who is already familiar with the material in the Administrator's Manual and wishes to know detailed information on what Condor does when starting and stopping jobs.

When Condor is about to start a job, the condor_startd on the execute machine spawns a condor_starter process. The condor_starter then creates:

a run account on the machine with a login name of ``condor-reuse-vmX'', where X is the Virtual Machine number of the condor_starter. This account is added to group Users. This step is skipped if the job is to be run using the submitting user's account (see section 6.2.4).
a new temporary working directory for the job on the execute machine. This directory is named ``dir_XXX'', where XXX is the process ID of the condor_ starter. The directory is created in the $(EXECUTE) directory as specified in Condor's configuration file. Condor then grants write permission to this directory for the user account newly created for the job.
a new, non-visible Window Station and Desktop for the job. Permissions are set so that only the account that will run the job has access rights to this Desktop. Any windows created by this job are not seen by anyone; the job is run in the background. (Note: Setting USE_VISIBLE_DESKTOP to True will allow the job to access the default desktop instead of a newly created one.)
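The naming rules in the list above can be written as two small helpers. This Python sketch is illustrative only. The helper names are hypothetical, and the example $(EXECUTE) path used in the usage note is an assumption. `ntpath` builds Windows-style paths on any platform.

```python
import ntpath


def run_account_name(vm_number: int) -> str:
    """Login name of the per-VM run account, for example condor-reuse-vm1."""
    return f"condor-reuse-vm{vm_number}"


def job_working_dir(execute_dir: str, starter_pid: int) -> str:
    """Temporary working directory $(EXECUTE)/dir_<condor_starter pid>."""
    return ntpath.join(execute_dir, f"dir_{starter_pid}")
```

For example, a condor_starter with process ID 1234 on virtual machine 2, with a hypothetical $(EXECUTE) of `C:\condor\execute`, would use the account `condor-reuse-vm2` and the directory `C:\condor\execute\dir_1234`.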

Next, the condor_starter (called the starter) contacts the condor_shadow (called the shadow) process, which is running on the submitting machine, and pulls over the job's executable and input files. These files are placed into the temporary working directory for the job. After all files have been received, the starter spawns the user's executable. Its current working directory is set to the temporary working directory (that is, $(EXECUTE)/dir_XXX, where XXX is the process ID of the condor_starter daemon).

While the job is running, the starter closely monitors the CPU usage and image size of all processes started by the job. Every 20 minutes the starter sends this information, along with the total size of all files contained in the job's temporary working directory, to the shadow. The shadow then inserts this information into the job's ClassAd so that policy and scheduling expressions can make use of this dynamic information.
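One of the figures the starter reports is the total size of all files in the job's temporary working directory. The Python sketch below computes that quantity by walking the directory tree. It illustrates the measurement only. It is not the starter's actual code, and the 20-minute reporting loop is not modeled.

```python
import os


def working_dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):      # skip broken links and special files
                total += os.path.getsize(full)
    return total
```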

If the job exits of its own accord (that is, the job completes), the starter first terminates any processes started by the job which could still be around if the job did not clean up after itself. The starter examines the job's temporary working directory for any files which have been created or modified and sends these files back to the shadow running on the submit machine. The shadow places these files into the initialdir specified in the submit description file; if no initialdir was specified, the files go into the directory where the user invoked condor_submit. Once all the output files are safely transferred back, the job is removed from the queue. If, however, the condor_startd forcibly kills the job before all output files could be transferred, the job is not removed from the queue but instead switches back to the Idle state.
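One way to find "files which have been created or modified" is to snapshot the working directory before the job starts and compare afterward. The Python sketch below assumes that file modification times are a sufficient signal. That assumption is made for illustration only; the text does not say how Condor actually detects changes.

```python
import os
from typing import Dict, List


def snapshot(path: str) -> Dict[str, float]:
    """Map each file's path, relative to path, to its modification time."""
    result: Dict[str, float] = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            result[os.path.relpath(full, path)] = os.path.getmtime(full)
    return result


def changed_files(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Files that are new, or whose mtime changed, since the 'before' snapshot."""
    return sorted(rel for rel, mtime in after.items()
                  if rel not in before or before[rel] != mtime)
```

Under this model, the input files that were transferred in unchanged are not sent back. Only new output and rewritten files are returned.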

If the condor_startd decides to vacate a job prematurely, the starter sends a WM_CLOSE message to the job. If the job spawned multiple child processes, the WM_CLOSE message is sent only to the parent process (that is, the one started by the starter). The WM_CLOSE message is the preferred way to terminate a process on Windows, since this method allows the job to clean up and free any resources it may have allocated. When the job exits, the starter cleans up any processes left behind. At this point, if transfer_files is set to ONEXIT (the default) in the job's submit description file, the job switches from the Running state back to the Idle state, and no files are transferred back. If transfer_files is set to ALWAYS, then any files in the job's temporary working directory which were modified are first sent back to the submitting machine. This time, however, the shadow places these so-called intermediate files into a subdirectory created in the $(SPOOL) directory on the submitting machine ($(SPOOL) is specified in Condor's configuration file). The job is then switched back to the Idle state until Condor finds a different machine on which to run. When the job is started again, Condor places into the job's temporary working directory the executable and input files as before, plus any files stored in the submit machine's $(SPOOL) directory for that job.
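The effect of transfer_files at vacate time, as described above, reduces to a two-way choice. The Python sketch below is a hypothetical helper, not a Condor function. It covers only the ONEXIT and ALWAYS values discussed here and rejects any other value.

```python
from typing import List


def files_to_spool_on_vacate(transfer_files: str, modified: List[str]) -> List[str]:
    """Files sent to the submit machine's $(SPOOL) when a job is vacated."""
    mode = transfer_files.upper()
    if mode == "ONEXIT":      # the default: nothing is sent back on vacate
        return []
    if mode == "ALWAYS":      # intermediate files are saved in $(SPOOL)
        return list(modified)
    raise ValueError(f"value not covered by this sketch: {transfer_files!r}")
```

With ALWAYS, the spooled files are later placed back into the new working directory alongside the executable and input files when the job restarts elsewhere.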

NOTE: A Windows console process can intercept a WM_CLOSE message via the Win32 SetConsoleCtrlHandler() function if it needs to do special cleanup work at vacate time; a WM_CLOSE message generates a CTRL_CLOSE_EVENT. See SetConsoleCtrlHandler() in the Win32 documentation for more info.

NOTE: The default handler in Windows for a WM_CLOSE message is for the process to exit. Of course, the job could be coded to ignore it and not exit, but eventually the condor_startd will become impatient and hard-kill the job (if that is the policy desired by the administrator).

Finally, after the job has left and any files have been transferred back, the starter deletes the temporary working directory, the temporary account (if one was created), the WindowStation, and the Desktop before exiting. If the starter terminates abnormally, the condor_startd attempts the cleanup. If for some reason the condor_startd disappears as well (that is, if the entire machine was power-cycled hard), the condor_startd cleans up when Condor is restarted.

6.2.6 Security Considerations in Condor for Windows

On the execute machine (by default), the user job is run using the access token of an account dynamically created by Condor which has bare-bones access rights and privileges. For instance, if your machines are configured so that only Administrators have write access to C:\WINNT, then certainly no Condor job run on that machine would be able to write anything there. The only files the job should be able to access on the execute machine are files accessible by the Users and Everyone groups, and files in the job's temporary working directory. Of course, if the job is configured to run using the account of the submitting user (as described in section 6.2.4), it will be able to do anything that the user is able to do on the execute machine it runs on.

On the submit machine, Condor impersonates the submitting user, therefore the File Transfer mechanism has the same access rights as the submitting user. For example, say only Administrators can write to C:\WINNT on the submit machine, and a user gives the following to condor_submit:

         executable = mytrojan.exe
         initialdir = c:\winnt
         output = explorer.exe
         queue

Unless that user is in group Administrators, Condor will not permit explorer.exe to be overwritten.

If for some reason the submitting user's account disappears between the time condor_ submit was run and when the job runs, Condor is not able to check and see if the now-defunct submitting user has read/write access to a given file. In this case, Condor will ensure that group ``Everyone'' has read or write access to any file the job subsequently tries to read or write. This is in consideration for some network setups, where the user account only exists for as long as the user is logged in.

Condor also protects the job queue. If the integrity of the job queue were compromised, a malicious user could remove other users' jobs or even change the executable a user's job will run. To guard against this, in Condor's default configuration all connections to the condor_schedd (the process which manages the job queue on a given machine) are authenticated using Windows' SSPI security layer. The user is then authenticated using the same challenge-response protocol that Windows uses to authenticate users to Windows file servers. Once authenticated, the only users allowed to edit job entries in the queue are:

the user who originally submitted that job (i.e. Condor allows users to remove or edit their own jobs)
users listed in the condor_config file parameter QUEUE_SUPER_USERS. In the default configuration, only the ``SYSTEM'' (LocalSystem) account is listed here.

WARNING: Do not remove ``SYSTEM'' from QUEUE_SUPER_USERS, or Condor itself will not be able to access the job queue when needed. If the LocalSystem account on your machine is compromised, you have all sorts of problems!
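After authentication, the rule for who may edit a queue entry is a simple membership test. The Python sketch below states that rule. The helper name is hypothetical, and the sketch assumes exact string comparison of account names. It does not model how Condor actually compares domain-qualified or differently cased Windows names.

```python
from typing import Iterable


def may_edit_job(user: str, job_owner: str, queue_super_users: Iterable[str]) -> bool:
    """Only the job's owner or a listed queue super user may edit or remove it."""
    return user == job_owner or user in set(queue_super_users)
```

With the default configuration, `may_edit_job("SYSTEM", "alice", ["SYSTEM"])` is true. This is why removing SYSTEM would lock Condor itself out of the queue.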

To protect the actual job queue files themselves, the Condor installation program will automatically set permissions on the entire Condor release directory so that only Administrators have write access.

Finally, Condor has all the IP/Host-based security mechanisms present in the full-blown version of Condor. See section 3.6.8 for complete information on how to allow/deny access to Condor based upon machine host name or IP address.

6.2.7 Network files and Condor

Condor can work well with a network file server. The recommended approach to having jobs access files on network shares is to configure jobs to run using the security context of the submitting user (see section 6.2.4). If this is done, the job will be able to access resources on the network in the same way as the user can when logged in interactively.

In some environments, running jobs as their submitting users is not a feasible option. This section outlines some possible alternatives. The heart of the difficulty in this case is that on the execute machine, Condor creates a temporary user that will run the job. The file server has never heard of this user before.

Choose one of these methods to make it work:

METHOD A: access the file server as a different user via a net use command with a login and password
METHOD B: access the file server as guest
METHOD C: access the file server with a "NULL" descriptor
METHOD D: create and have Condor use a special account
METHOD E: use the contrib module from the folks at Bristol University

All of these methods have advantages and disadvantages.

Here are the methods in more detail:

METHOD A - access the file server as a different user via a net use command with a login and password

Example: you want to copy a file off of a server before running it.

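   @echo off
   REM Authenticate to the share as MYLOGIN instead of the temporary Condor account
   net use \\myserver\someshare MYPASSWORD /USER:MYLOGIN
   REM Copy the program into the job's working directory, then run it
   copy \\myserver\someshare\my-program.exe
   my-program.exe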

The idea here is to simply authenticate to the file server with a different login than the temporary Condor login. This is easy with the "net use" command as shown above. Of course, the obvious disadvantage is this user's password is stored and transferred as clear text.

METHOD B - access the file server as guest

Example: you want to copy a file off of a server before running it as GUEST

   @echo off
   net use \\myserver\someshare
   copy \\myserver\someshare\my-program.exe
   my-program.exe

In this example, you'd contact the server MYSERVER as the Condor temporary user. However, if you have the GUEST account enabled on MYSERVER, you will be authenticated to the server as user "GUEST". If your file permissions (ACLs) are set up so that either user GUEST (or group EVERYONE) has access to the share "someshare" and the directories/files that live there, you can use this method. The downside of this method is that you need to enable the GUEST account on your file server. WARNING: This should be done *with extreme caution* and only if your file server is well protected behind a firewall that blocks SMB traffic.

METHOD C - access the file server with a "NULL" descriptor

One more option is to use NULL Security Descriptors. In this way, you can specify which shares are accessible by NULL Descriptor by adding them to your registry. You can then use the batch file wrapper like:

net use z: \\myserver\someshare /USER:""
z:\my-program.exe

so long as 'someshare' is in the list of allowed NULL session shares. To edit this list, run regedit.exe and navigate to the NullSessionShares value under the key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

and edit it. Unfortunately, it is a binary value, so you will need to type in the hex ASCII codes to spell out your share. Each share is separated by a null (0x00), and the last in the list is terminated with two nulls.
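Typing the hex codes by hand is error-prone, so a small helper can produce them. The Python sketch below follows the layout described above: ASCII by default, one null between share names, and two nulls at the end. The function name is hypothetical. Whether your Windows version stores this value as ASCII or as UTF-16 text is an assumption you should check. If regedit presents the value as UTF-16 text, pass `encoding="utf-16-le"`; each null is then two bytes wide.

```python
from typing import List


def null_session_shares_hex(shares: List[str], encoding: str = "ascii") -> str:
    """Hex bytes for a share list: names separated by a null, list ended by two."""
    if not shares:
        raise ValueError("need at least one share name")
    sep = "\x00".encode(encoding)             # one null character in the chosen encoding
    data = sep.join(s.encode(encoding) for s in shares) + sep * 2
    return " ".join(f"{b:02x}" for b in data)
```

For the single share `someshare`, this yields `73 6f 6d 65 73 68 61 72 65 00 00`.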

Although a little more difficult to set up, this method of sharing is a relatively safe way to have one quasi-public share without opening the whole guest account. You can control specifically which shares can be accessed via the registry value mentioned above.

METHOD D - create and have Condor use a special account

Create a permanent account (called condor-guest in this description) under which Condor will run jobs. On all Windows machines, and on the file server, create the condor-guest account.

On the network file server, give the condor-guest user permissions to access files needed to run Condor jobs.

Securely store the password of the condor-guest user in the Windows registry using condor_store_cred on all Windows machines.

Tell Condor to use the condor-guest user as the owner of jobs, when required. Details for this are in section 3.6.10.

METHOD E - access with the contrib module from Bristol

Another option: some hardcore Condor users at Bristol University developed their own module for starting jobs under Condor NT to access file servers. It involves storing submitting users' passwords on a centralized server. The README from this contrib module is reproduced below. The module itself should appear on our website within a week or two; if you want it before then, let me know and I can e-mail it to you.

Here is the README from the Bristol Condor contrib module:

README
Compilation Instructions
Build the projects in the following order

CondorCredSvc
CondorAuthSvc
Crun
Carun
AfsEncrypt
RegisterService
DeleteService
Only the first 3 need to be built in order. This just makes sure that the 
RPC stubs are correctly rebuilt if required. The last 2 are only helper 
applications to install/remove the services. All projects are Visual Studio 
6 projects. The nmakefiles have been exported for each. Only the project 
for Carun should need to be modified to change the location of the AFS 
libraries if needed.

Details
CondorCredSvc
CondorCredSvc is a simple RPC service that serves the domain account 
credentials. It reads the account name and password from the registry of 
the machine it's running on. At the moment these details are stored in 
clear text under the key

HKEY_LOCAL_MACHINE\Software\Condor\CredService

The account name and password are held in REG_SZ values "Account" and 
"Password" respectively. In addition there is an optional REG_SZ value 
"Port" which holds the clear text port number (e.g. "1234"). If this value 
is not present the service defaults to using port 3654.

At the moment there is no attempt to encrypt the username/password when it 
is sent over the wire - but this should be reasonably straightforward to 
change. This service can sit on any machine so keeping the registry entries 
secure ought to be fine. Certainly the ACL on the key could be set to only 
allow administrators and SYSTEM access.

CondorAuthSvc and Crun
These two programs do the hard work of getting the job authenticated and 
running in the right place. CondorAuthSvc actually handles the process 
creation while Crun deals with getting the winstation/desktop/working 
directory and grabbing the console output from the job so that Condor's 
output handling mechanisms still work as advertised. Probably the easiest 
way to see how the two interact is to run through the job creation process:

The first thing to realize is that condor itself only runs Crun.exe. Crun 
treats its command line parameters as the program to really run. e.g. "Crun 
\\mymachine\myshare\myjob.exe" actually causes 
\\mymachine\myshare\myjob.exe to be executed in the context of the domain 
account served by CondorCredSvc. This is how it works:

When Crun starts up it gets its window station and desktop - these are the 
ones created by condor. It also gets its current directory - again already 
created by condor. It then makes sure that SYSTEM has permission to modify 
the DACL on the window station, desktop and directory. Next it creates a 
shared memory section and copies its environment variable block into it. 
Then, so that it can get hold of STDOUT and STDERR from the job, it makes 
two named pipes on the machine it's running on and attaches a thread to 
each which just prints out anything that comes in on the pipe to the 
appropriate stream. These pipes currently have a NULL DACL, but only one 
instance of each is allowed so there shouldn't be any issues involving 
malicious people putting garbage into them. The shared memory section and 
both named pipes are tagged with the ID of Crun's process in case we're on 
a multi-processor machine that might be running more than one job. Crun 
then makes an RPC call to CondorAuthSvc to actually start the job, passing 
the names of the window station, desktop, executable to run, current 
directory, pipes and shared memory section (it only attempts to call 
CondorAuthSvc on the same machine as it is running on). If the job starts 
successfully, it gets the process ID back from the RPC call and then just 
waits for the new process to finish before closing the pipes and exiting. 
Technically, it does this by synchronizing on a handle to the process and 
waiting for it to exit. CondorAuthSvc sets the ACL on the process to allow 
EVERYONE to synchronize on it.
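Crun uses one thread per output pipe, each copying whatever arrives to the matching stream, while the main thread waits for the job to exit. The Python sketch below illustrates only that relay-thread pattern. It uses anonymous pipes from `subprocess` instead of Crun's named pipes, starts the child directly instead of through an RPC to CondorAuthSvc, and its function names are hypothetical.

```python
import subprocess
import threading
from typing import BinaryIO, List


def relay(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy everything arriving on src to dst until the writer closes it."""
    for chunk in iter(lambda: src.read1(4096), b""):
        dst.write(chunk)
        dst.flush()
    src.close()


def run_with_relays(cmd: List[str], out: BinaryIO, err: BinaryIO) -> int:
    """Start cmd, relay its stdout/stderr on separate threads, wait for exit."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    threads = [threading.Thread(target=relay, args=(proc.stdout, out)),
               threading.Thread(target=relay, args=(proc.stderr, err))]
    for t in threads:
        t.start()
    code = proc.wait()        # analogous to Crun waiting on the process handle
    for t in threads:
        t.join()              # make sure all buffered output has been relayed
    return code
```

A typical call is `run_with_relays(cmd, sys.stdout.buffer, sys.stderr.buffer)`. Draining both pipes concurrently prevents the child from blocking when one pipe's buffer fills while the parent is reading the other.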

[ Technical note: Crun adds "C:\WINNT\SYSTEM32\CMD.EXE /C" to the start of 
the command line. This is because the process is created with the network 
context of the caller i.e. LOCALSYSTEM. Pre-pending cmd.exe gets round any 
unexpected "Access Denied" errors. ]
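The command-line rewriting in the technical note can be sketched as follows. The prefix string comes from the README. The quoting step is an assumption: the README does not say how Crun quotes arguments, so this Python sketch uses `subprocess.list2cmdline`, which applies the Microsoft C runtime quoting rules. `crun_command_line` is a hypothetical name.

```python
import subprocess
from typing import List

CMD_PREFIX = r"C:\WINNT\SYSTEM32\CMD.EXE /C"   # prefix documented in the README


def crun_command_line(job_argv: List[str]) -> str:
    """Command line with cmd.exe /C prepended to the job's own arguments."""
    if not job_argv:
        raise ValueError("Crun needs the program to run as its first argument")
    return f"{CMD_PREFIX} {subprocess.list2cmdline(job_argv)}"
```

For `["\\\\mymachine\\myshare\\myjob.exe"]`, this produces `C:\WINNT\SYSTEM32\CMD.EXE /C \\mymachine\myshare\myjob.exe`. Arguments containing spaces are wrapped in double quotes.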

If Crun gets a WM_CLOSE (CTRL_CLOSE_EVENT) while the job is running it 
attempts to stop the job, again with an RPC call to CondorAuthSvc passing 
the job's process ID.

CondorAuthSvc runs as a service under the LOCALSYSTEM account and does the 
work of starting the job. By default it listens on port 3655, but this can 
be changed by setting the optional REG_SZ value "Port" under the registry key

HKEY_LOCAL_MACHINE\Software\Condor\AuthService

(Crun also checks this registry key when attempting to contact 
CondorAuthSvc.) When it gets the RPC to start a job CondorAuthSvc first 
connects to the pipes for STDOUT and STDERR to prevent anyone else sending 
data to them. It also opens the shared memory section with the environment 
stored by Crun.  It then makes an RPC call to CondorCredSvc (to get the 
name and password of the domain account) which is most likely running on 
another system. The location information is stored in the registry under 
the key

HKEY_LOCAL_MACHINE\Software\Condor\CredService

The name of the machine running CondorCredSvc must be held in the REG_SZ 
value "Host". This should be the fully qualified domain name of the 
machine. You can also specify the optional "Port" REG_SZ value in case you 
are running CondorCredSvc on a different port.
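The registry lookups described above follow one pattern: a required "Host" value and an optional "Port" value with a per-service default (3654 for CondorCredSvc, 3655 for CondorAuthSvc). In the Python sketch below, the registry key's values are passed in as a plain dictionary so that it runs anywhere. Reading the actual registry (for example with Python's `winreg` module on Windows) is left out, and the function names are hypothetical.

```python
from typing import Mapping, Optional, Tuple

CRED_SERVICE_DEFAULT_PORT = 3654   # CondorCredSvc default, per the README
AUTH_SERVICE_DEFAULT_PORT = 3655   # CondorAuthSvc default, per the README


def service_port(values: Mapping[str, str], default: int) -> int:
    """Port from an optional REG_SZ "Port" value, else the service default."""
    raw: Optional[str] = values.get("Port")
    return int(raw) if raw else default


def cred_service_endpoint(values: Mapping[str, str]) -> Tuple[str, int]:
    """(Host, Port) for CondorCredSvc, from the CredService key's values."""
    host = values.get("Host")
    if not host:
        raise KeyError('the CredService key must contain a "Host" value')
    return host, service_port(values, CRED_SERVICE_DEFAULT_PORT)
```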

Once the domain account credentials have been received the account is 
logged on through a call to LogonUser. The DACLs on the window station, 
desktop and current directory are then modified to allow the domain account 
access to them and the job is started in that window station and desktop 
with a call to CreateProcessAsUser. The starting directory is set to the 
same as sent by Crun, STDOUT and STDERR handles are set to the named pipes 
and the environment sent by Crun is used. CondorAuthSvc also starts a 
thread which waits on the new process handle until it terminates to close 
the named pipes. If the process starts correctly the process ID is returned 
to Crun.

If Crun requests that the job be stopped (again via RPC), CondorAuthSvc 
loops over all windows on the window station and desktop specified until it 
finds the one associated with the required process ID. It then sends that 
window a WM_CLOSE message, so any termination handling built in to the job 
should work correctly.

[Security Note: CondorAuthSvc currently makes no attempt to verify the 
origin of the call starting the job. This is, in principle, a bad thing 
since if the format of the RPC call is known it could let anyone start a 
job on the machine in the context of the domain user. If sensible security 
practices have been followed and the ACLs on sensitive system directories 
(such as C:\WINNT) do not allow write access to anyone other than trusted 
users the problem should not be too serious.]

Carun and AFSEncrypt
Carun and AFSEncrypt are a couple of utilities to allow jobs to access AFS 
without any special recompilation. AFSEncrypt encrypts an AFS 
username/password into a file (called .afs.xxx) using a simple XOR 
algorithm. It's not a particularly secure way to do it, but it's simple and 
self-inverse. Carun reads this file and gets an AFS token before running 
whatever job is on its command line as a child process. It waits on the 
process handle and a 24 hour timer. If the timer expires first it briefly 
suspends the primary thread of the child process and attempts to get a new 
AFS token before restarting the job, the idea being that the job should 
have uninterrupted access to AFS if it runs for more than 25 hours (the 
default token lifetime). As a security measure, the AFS credentials are 
cached by Carun in memory and the .afs.xxx file deleted as soon as the 
username/password have been read for the first time.
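The "self-inverse" property of XOR that AFSEncrypt relies on is easy to demonstrate. The README does not give the key or the .afs.xxx file format, so the key in this Python sketch is hypothetical. As the README itself says, this is obfuscation rather than real encryption.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))
```

Because `(d ^ k) ^ k == d` for every byte, the same function serves for both encoding and decoding. For the same reason, anyone who obtains the key or a known plaintext can recover the credentials.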

Carun needs the machine to be running either the IBM AFS client or the 
OpenAFS client to work. It also needs the client libraries if you want to 
rebuild it.
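Carun's wait loop, described above, waits on the child process with a timer and renews the AFS token whenever the timer fires first. The Python sketch below models that loop. `refresh` is a hypothetical callback standing in for getting a new token. Carun would use a 24-hour interval (`24 * 60 * 60`), and the sketch does not suspend the child's primary thread as Carun does.

```python
import subprocess
from typing import Callable, List


def run_with_token_refresh(cmd: List[str], refresh: Callable[[], None],
                           interval_secs: float) -> int:
    """Wait on the child; each time interval_secs pass without exit, call refresh()."""
    proc = subprocess.Popen(cmd)
    while True:
        try:
            return proc.wait(timeout=interval_secs)
        except subprocess.TimeoutExpired:
            # Carun briefly suspends the child's primary thread around this step;
            # this sketch leaves the child running.
            refresh()
```

A refresh interval shorter than the token lifetime (24 hours against the 25-hour default) leaves a margin, so the job never runs with an expired token.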

For example, if you wanted to get a list of your AFS tokens under Condor 
you would run the following:

Crun \\mymachine\myshare\Carun tokens.exe

Running a job
To run a job using this mechanism specify the following in your job 
submission (assuming Crun is in C:\CondorAuth):

Executable = c:\CondorAuth\Crun.exe
Arguments = \\mymachine\myshare\carun.exe \\anothermachine\anothershare\myjob.exe
Transfer_Input_Files = .afs.xxx

along with your usual settings.

Installation
A basic installation script for use with the Inno Setup installation 
package compiler can be found in the Install folder.

6.2.8 Interoperability between Condor for Unix and Condor for Windows

Unix machines and Windows machines running Condor can happily co-exist in the same Condor pool. Jobs submitted on Windows can run on Windows or Unix, and jobs submitted on Unix can run on Unix or Windows. Unless the requirements expression in the submit description file specifies otherwise, the default behavior is to require the execute machine to be of the same architecture and operating system as the submit machine.

There is absolutely no need to run more than one Condor central manager, even if you have both Unix and Windows machines. The Condor central manager itself can run on either Unix or Windows; there is no advantage to choosing one over the other. Here at University of Wisconsin-Madison, for instance, we have hundreds of Unix (Solaris, Linux, etc) and Windows machines in our Computer Science Department Condor pool. Our central manager is running on Linux. All is happy.

6.2.9 Some differences between Condor for Unix -vs- Condor for Windows

On Unix, we recommend the creation of a ``condor'' account when installing Condor. On Windows, this is not necessary, as Condor is designed to run as a system service as user LocalSystem.
On Unix, Condor finds the condor_config main configuration file by looking in ~condor, in /etc, or via an environment variable. On Windows, the location of the condor_config file is determined by the registry key HKEY_LOCAL_MACHINE\Software\Condor. You can override this value by setting an environment variable named CONDOR_CONFIG.
On Unix, for jobs in the vanilla universe, Condor sends the job the soft kill signal at vacate time. This signal is defined in the submit description file by kill_sig and defaults to SIGTERM. On Windows, Condor sends a WM_CLOSE message to the job at vacate time.
On Unix, if one of the Condor daemons has a fault, a core file will be created in the $(Log) directory. On Windows, a ``core'' file will also be created, but instead of a memory dump of the process, it will be a very short ASCII text file that describes what fault occurred and where it happened. The Condor developers can use this information to fix the problem.
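The Windows lookup described in the second item above can be modeled in a few lines: an environment variable named CONDOR_CONFIG overrides the registry value. This is a simplified Python sketch, not Condor's own code.

It makes the following assumptions:

* The registry read uses Python's standard winreg module, which is only available on Windows.
* The value name CONDOR_CONFIG under HKEY_LOCAL_MACHINE\Software\Condor is taken from the registry description in the manual installation section.
* An empty environment variable is treated as unset.

```python
"""Simplified model of how the condor_config location is chosen on Windows."""
import os
import sys


def registry_condor_config():
    """Read CONDOR_CONFIG from HKEY_LOCAL_MACHINE\\Software\\Condor, or None."""
    if sys.platform != "win32":
        return None  # winreg exists only on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Software\Condor") as key:
            value, _value_type = winreg.QueryValueEx(key, "CONDOR_CONFIG")
            return value
    except OSError:  # key or value missing
        return None


def find_condor_config(environ=None, registry_lookup=registry_condor_config):
    """Return the config path; a non-empty CONDOR_CONFIG variable wins."""
    environ = os.environ if environ is None else environ
    override = environ.get("CONDOR_CONFIG")
    if override:
        return override
    return registry_lookup()


if __name__ == "__main__":
    print(find_condor_config())
```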

6.2.10 Installation on Windows

This section contains the instructions for installing the Microsoft Windows version of Condor at your site. The install program will set you up with a slightly customized configuration file that you can further customize after the installation has completed.

Please read the copyright and disclaimer information in the manual, or in the file LICENSE.TXT, before proceeding. Installation and use of Condor constitutes acknowledgement that you have read and agreed to these terms.

Be sure that the Condor tools you run are the same version as the installed daemons. If they are not (for example, 6.5.3 daemons with a 6.4 condor_submit), things will not work. The condor_schedd daemon may log errors. A job is likely to be placed in the queue correctly, but it will never run.

The Condor executable for distribution is packaged in a single file such as:

  condor-6.7.8-winnt40-x86.msi

This file is approximately 80 Mbytes in size, and may be removed once Condor is fully installed.

Before installing Condor, please consider joining the condor-world mailing list. Traffic on this list is kept to an absolute minimum. It is only used to announce new releases of Condor. To subscribe, follow the directions given at http://www.cs.wisc.edu/condor/mail-lists/.

6.2.10.1 Installation Requirements

Condor for Windows requires Windows 2000 or a later version, such as Windows XP.
300 megabytes of free disk space is recommended. Significantly more disk space may be needed to run jobs with large data files.
Condor for Windows will operate on either an NTFS or FAT filesystem. However, for security purposes, NTFS is preferred.

6.2.10.2 Preparing to Install Condor under Windows

Before you install the Windows version of Condor at your site, there are two major decisions to make about the basic layout of your pool.

What machine will be the central manager?
Do I have enough disk space for Condor?

If you feel that you already know the answers to these questions, skip to the Windows installation procedure in section 6.2.10.3 below. If you are unsure, read on.

What machine will be the central manager?
One machine in your pool must be the central manager. This is the centralized information repository for the Condor pool and is also the machine that matches available machines with waiting jobs. If the central manager machine crashes, any currently active matches in the system will keep running, but no new matches will be made. Moreover, most Condor tools will stop working. Because of the importance of this machine for the proper functioning of Condor, we recommend you install it on a machine that is likely to stay up all the time, or at the very least, one that will be rebooted quickly if it does crash. Also, because all the services will send updates (by default every 5 minutes) to this machine, it is advisable to consider network traffic and your network layout when choosing the central manager.
For Personal Condor, your machine will act as your central manager.
Install Condor on the central manager before installing on the other machines within the pool.
Do I have enough disk space for Condor?
The Condor release directory takes up a fair amount of space. The size requirement for the release directory is approximately 200 Mbytes.
Condor itself, however, needs space to store all of your jobs and their input files. If you will be submitting large numbers of jobs, consider installing Condor on a volume with a large amount of free space.

6.2.10.3 Installation Procedure using the included Set Up Program

Installation of Condor must be done by a user with administrator privileges. After installation, the Condor services will be run under the local system account. When Condor runs a user job, however, it runs that job with normal user permissions.

Download Condor, and start the installation process by running the downloaded .msi file (or by double-clicking on it). The Condor installation is completed by answering questions and choosing options within the following steps.

If Condor is already installed.

For upgrade purposes, you may be running the installation of Condor after it has been previously installed. In this case, a dialog box will appear before the installation of Condor proceeds. The question asks if you wish to preserve your current Condor configuration files. Answer yes or no, as appropriate.

If you answer yes, your configuration files will not be changed, and you will proceed to the point where the new binaries will be installed.

If you answer no, then there will be a second question that asks if you want to use answers given during the previous installation as default answers.

STEP 1: License Agreement.

The first step in installing Condor is a welcome screen and license agreement. You are reminded that it is best to run the installation when no other Windows programs are running. If you need to close other Windows programs, it is safe to cancel the installation and close them. You are asked to agree to the license. Answer yes or no. If you should disagree with the License, the installation will not continue.

After you agree to the license terms, the next window is where you fill in your name and company information, or use the defaults as given.

STEP 2: Condor Pool Configuration.

The Condor installation will require different information depending on whether the installer will be creating a new pool, or joining an existing one.

If you are creating a new pool, the installation program requires that this machine is the central manager. For the creation of a new Condor pool, you will be asked some basic information about your new pool:

Name of the pool.
Hostname of this machine.
Size of pool: Condor needs to know if this is a Personal Condor installation, or if there will be more than one machine in the pool. A Personal Condor pool implies that there is only one machine in the pool. For Personal Condor, several of the following steps are omitted as noted.

If you are joining an existing pool, all the installation program requires is the hostname of the central manager for your pool.

STEP 3: This Machine's Roles.

This step is omitted for the installation of Personal Condor.

Each machine within a Condor pool may either submit jobs or execute submitted jobs, or both submit and execute jobs. This step allows the installation on this machine to choose if the machine will only submit jobs, only execute submitted jobs, or both. The common case is both, so the default is both.

STEP 4: Where will Condor be installed?

The next step is where the destination of the Condor files will be decided. It is recommended that Condor be installed in the location shown as the default in the dialog box: C:\Condor.

Installation on the local disk is chosen for several reasons.

The Condor services run as local system, and within Microsoft Windows, local system has no network privileges. Therefore, for Condor to operate, Condor should be installed on a local hard drive as opposed to a network drive (file server).

The second reason for installation on the local disk is that the Windows use of drive letters has implications for where Condor is placed. The drive letter used must not change, even when different users are logged in. Local drive letters do not change under normal operation of Windows.

While it is strongly discouraged, it may be possible to place Condor on a hard drive that is not local, if a dependency is added to the service control manager such that Condor starts after the required file services are available.
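One way to add such a dependency is the sc config command with its depend= option. The Python sketch below builds that command and can optionally run it.

The dependency name is an assumption. LanmanWorkstation (the Windows Workstation service, used for SMB file shares) is only an example; substitute whatever service provides the network drive at your site. The helper names are hypothetical.

```python
"""Build (and optionally run) an sc.exe command so that the Condor service
starts only after the services that provide its network drive.
Sketch only: run as an administrator, on Windows."""
import subprocess


def dependency_command(services, service_name="Condor"):
    """Return the argument list for: sc config Condor depend= A/B."""
    if not services:
        raise ValueError("list at least one service")
    # sc.exe takes "depend=" and its value as separate tokens;
    # several services are joined with forward slashes.
    return ["sc", "config", service_name, "depend=", "/".join(services)]


def apply_dependency(services):
    """Run the command; only meaningful on Windows with admin rights."""
    subprocess.run(dependency_command(services), check=True)


if __name__ == "__main__":
    # LanmanWorkstation is an ASSUMED example dependency.
    print(" ".join(dependency_command(["LanmanWorkstation"])))
```

The depend= option sets the full dependency list, so include any dependencies the service already needs.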

STEP 5: Where is the Java Virtual Machine?

While not required, it is possible for Condor to run jobs in the Java universe. For Condor to support Java, you must supply the path to java.exe on your system. The installer will tell you if the path is invalid before proceeding to the next step. To disable the Java universe, simply leave this field blank.

STEP 6: Where should Condor send e-mail if things go wrong?

Various parts of Condor will send e-mail to a Condor administrator if something goes wrong and requires human attention. You specify the e-mail address and the SMTP relay host of this administrator. Please pay close attention to this email since it will indicate problems in your Condor pool.

STEP 7: The domain.

This step is omitted for the installation of Personal Condor.

Enter the machine's accounting (or UID) domain. In this version of Condor for Windows, this setting is used only for user priorities (see section 3.4) and to form a default e-mail address for the user.

STEP 8: Access permissions.

This step is omitted for the installation of Personal Condor.

Machines within the Condor pool will need various types of access permission. The three categories of permission are read, write, and administrator. Enter the machines to be given access permissions.

Read: Read access allows a machine to obtain information about Condor such as the status of machines in the pool and the job queues. All machines in the pool should be given read access. In addition, giving read access to *.cs.wisc.edu will allow the Condor team to obtain information about your Condor pool in the event that debugging is needed.
Write: All machines in the pool should be given write access. It allows the machines you specify to send information to your local Condor daemons, for example, to start a Condor Job. Note that for a machine to join the Condor pool, it must have both read and write access to all of the machines in the pool.
Administrator: A machine with administrator access is given extended permission to do things such as change other users' priorities, modify the job queue, turn Condor services on and off, and restart Condor. The central manager should be given administrator access and is the default listed. This setting is granted to the entire machine, so take care not to make it too open.

For more details on these access permissions, and on others that can be changed manually in your condor_config file, see the section titled Setting Up IP/Host-Based Security in Condor, section 3.6.8.
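The access lists are host-name patterns such as *.cs.wisc.edu. The Python sketch below can help you plan a list by checking a host name against it with shell-style wildcards from the standard fnmatch module.

This is an approximation, not Condor's own matching code, and Condor's rules may differ, for example for IP address forms. The helper names and the example hosts are hypothetical.

```python
"""Approximate check of a host name against a host-pattern access list."""
from fnmatch import fnmatchcase


def parse_host_list(value):
    """Split a comma- or space-separated list such as "*.cs.wisc.edu, cm.example.org"."""
    return [item for item in value.replace(",", " ").split() if item]


def host_allowed(host, patterns):
    """True if the host matches any pattern; comparison is case-insensitive."""
    host = host.lower()
    return any(fnmatchcase(host, p.lower()) for p in patterns)


if __name__ == "__main__":
    read_list = parse_host_list("*.cs.wisc.edu, cm.example.org")  # hypothetical list
    print(host_allowed("Node1.CS.wisc.edu", read_list))  # True
    print(host_allowed("evil.example.com", read_list))   # False
```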

STEP 9: Job Start Policy.

Condor will execute submitted jobs on machines based on a preference given at installation. Three options are given, and the first is most commonly used by Condor pools. This specification may be changed or refined in the machine ClassAd requirements attribute.

The three choices:

After 15 minutes of no console activity and low CPU activity.
Always run Condor jobs.
After 15 minutes of no console activity.

Console activity is the use of the mouse or keyboard. For instance, if you are reading this document online and are using either the mouse or the keyboard to change your position, you are generating console activity.

Low CPU activity is defined as a load of less than 30% (configurable in your condor_config file). On a multiprocessor machine, this is the average percentage of CPU activity across all processors.

For testing purposes, it is often helpful to use the Always run Condor jobs option. For production mode, however, most people choose After 15 minutes of no console activity and low CPU activity.
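The three start policies can be summarized as a small decision function. This Python model only restates the rules described above: 15 minutes without console activity, and a load below 30% averaged over all processors.

The function name and the policy letters (A, I, C) are illustrative. The letters are borrowed from the RUNJOBS values of the unattended installer. In a real pool, the decision is made by the START expression in condor_config.

```python
"""Illustrative model of the three job start policies offered in STEP 9."""

IDLE_MINUTES = 15        # console idle time required by two of the policies
LOAD_THRESHOLD = 0.30    # "low CPU activity": load below 30% (configurable)


def average_load(per_cpu_loads):
    """Average load over all processors, each given as a fraction 0.0-1.0."""
    return sum(per_cpu_loads) / len(per_cpu_loads)


def may_start(policy, console_idle_minutes, per_cpu_loads):
    """policy: 'A' always, 'I' idle console, 'C' idle console and low CPU."""
    idle = console_idle_minutes >= IDLE_MINUTES
    if policy == "A":
        return True
    if policy == "I":
        return idle
    if policy == "C":
        return idle and average_load(per_cpu_loads) < LOAD_THRESHOLD
    raise ValueError(f"unknown policy {policy!r}")


if __name__ == "__main__":
    # Two processors at 50% and 0%: the average, 25%, is below 30%.
    print(may_start("C", 20, [0.50, 0.0]))  # True
    print(may_start("C", 5, [0.0, 0.0]))    # False: console used recently
```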

STEP 10: Job Vacate Policy.

This step is omitted if Always run Condor jobs was chosen in STEP 9.

If Condor is executing a job and the user returns, Condor will immediately suspend the job, and after five minutes Condor will decide what to do with the partially completed job. There are currently two options for the job.

The job is killed 5 minutes after your return: The job is suspended immediately once there is console activity. If the console activity continues, the job is vacated (killed) after 5 minutes. Since this version does not include checkpointing, the job is placed back into the queue and will be restarted from the beginning at a later time.
Suspend job, leaving it in memory: The job is suspended immediately. Later, when the console activity has stopped for ten minutes, execution of the Condor job is resumed (the job is unsuspended). The drawback to this option is that the job remains in memory and so occupies swap space. In many instances, however, the amount of swap space that the job occupies is small.

So which one do you choose? Killing a job is less intrusive on the workstation owner than leaving it in memory for a later time. A suspended job left in memory will require swap space, which could possibly be a scarce resource. Leaving a job in memory, however, has the benefit that accumulated run time is not lost for a partially completed job.
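The two vacate choices can be compared with a minute-by-minute timeline model. This Python sketch restates the rules above:

* Suspend at once on console activity.
* With the kill option, vacate after 5 minutes of continued activity.
* With the suspend option, resume after 10 minutes without console activity.

The function and state names are hypothetical. The model assumes that a quiet minute resets the activity count, and it applies the same 10-minute resume rule under both options; the text does not specify either point. It does not reflect how Condor actually evaluates its policy expressions.

```python
"""Illustrative minute-by-minute model of the two vacate policies (STEP 10)."""

KILL_AFTER = 5      # minutes of console activity before a suspended job is vacated
RESUME_AFTER = 10   # quiet minutes before a suspended job resumes (assumed for both)


def simulate(policy, activity):
    """policy: 'kill' or 'suspend'; activity: one bool per minute (True = console in use).

    Returns the job state after each minute: 'running', 'suspended', or 'vacated'.
    """
    state, busy, quiet, history = "running", 0, 0, []
    for active in activity:
        if state == "vacated":
            history.append(state)  # job is back in the queue; it restarts elsewhere
            continue
        if active:
            busy, quiet = busy + 1, 0
            state = "suspended"    # suspended immediately on console activity
            if policy == "kill" and busy >= KILL_AFTER:
                state = "vacated"
        else:
            busy, quiet = 0, quiet + 1  # assumption: a quiet minute resets the count
            if state == "suspended" and quiet >= RESUME_AFTER:
                state = "running"
        history.append(state)
    return history


if __name__ == "__main__":
    print(simulate("kill", [True] * 6)[-1])                   # vacated
    print(simulate("suspend", [True] * 5 + [False] * 10)[-1])  # running
```

Under this model, accumulated run time survives only under the suspend option, which is the trade-off described above.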

STEP 11: Review entered information.

Check that the information has been entered correctly. You have the option to return to previous dialog boxes to fix entries.

6.2.10.4 Unattended Installation Procedure using the included Set Up Program

This section details how to run the Condor for Windows installer in unattended batch mode, that is, entirely from the command prompt without the GUI.

The Condor for Windows installer uses the Microsoft Installer (MSI) technology, and can be configured for unattended installs just like any other ordinary MSI installer.

The following is a sample batch file that is used to set all the properties necessary for an unattended install.

@echo on
set ARGS=
set ARGS=%ARGS% NEWPOOL=N
set ARGS=%ARGS% POOLNAME=""
set ARGS=%ARGS% RUNJOBS=C
set ARGS=%ARGS% VACATEJOBS=Y
set ARGS=%ARGS% SUBMITJOBS=Y
set ARGS=%ARGS% CONDOREMAIL="you@yours.com"
set ARGS=%ARGS% HOSTALLOWREAD="*"
set ARGS=%ARGS% HOSTALLOWWRITE="*"
set ARGS=%ARGS% HOSTALLOWADMINISTRATOR="$(FULL_HOSTNAME)"
set ARGS=%ARGS% INSTALLDIR="C:\Condor"
set ARGS=%ARGS% POOLHOSTNAME="$(FULL_HOSTNAME)"
set ARGS=%ARGS% ACCOUNTINGDOMAIN="none"
set ARGS=%ARGS% JVMLOCATION="C:\Windows\system32\java.exe"
set ARGS=%ARGS% SMTPSERVER="smtp.localhost"

msiexec /qb /l* condor-install-log.txt /i condor-6.7.18-winnt50-x86.msi %ARGS%

Each property corresponds to answers supplied in the interactive installer as described above. The following is a brief explanation of each property as it applies to unattended installations:

NEWPOOL = < Y | N >

determines whether the installer will create a new pool with the target machine as the central manager.

POOLNAME

sets the name of the pool if a new pool is to be created. Possible values are either the name or the empty string "".

RUNJOBS = < N | A | I | C >

determines when Condor will run jobs. This can be set to:

Never run jobs (N)
Always run jobs (A)
Only run jobs when the keyboard and mouse are Idle (I)
Only run jobs when the keyboard and mouse are idle and the CPU usage is low (C)

VACATEJOBS = < Y | N >

determines what Condor should do when it has to stop the execution of a user job. When set to Y, Condor will vacate the job and start it somewhere else if possible. When set to N, Condor will merely suspend the job in memory and wait for the machine to become available again.

SUBMITJOBS = < Y | N >

will cause the installer to configure the machine as a submit node when set to Y.

CONDOREMAIL

sets the e-mail address of the Condor administrator. Possible values are an e-mail address or the empty string "".

HOSTALLOWREAD

is a list of host names that are allowed to issue READ commands to Condor daemons. This value should be set in accordance with the HOSTALLOW_READ setting in the configuration file, as described in section 3.6.8.

HOSTALLOWWRITE

is a list of host names that are allowed to issue WRITE commands to Condor daemons. This value should be set in accordance with the HOSTALLOW_WRITE setting in the configuration file, as described in section 3.6.8.

HOSTALLOWADMINISTRATOR

is a list of host names that are allowed to issue ADMINISTRATOR commands to Condor daemons. This value should be set in accordance with the HOSTALLOW_ADMINISTRATOR setting in the configuration file, as described in section 3.6.8.

INSTALLDIR

defines the path to where Condor will be installed.

POOLHOSTNAME

defines the host name of the pool's central manager.

ACCOUNTINGDOMAIN

defines the accounting (or UID) domain the target machine will be in.

JVMLOCATION

defines the path to the Java virtual machine on the target machine.

SMTPSERVER

defines the host name of the SMTP server that the target machine is to use to send e-mail.

After defining each of these properties for the MSI installer, the installer can be started with the msiexec command. The following command starts the installer in unattended mode, and dumps a journal of the installer's progress to a log file:
msiexec /qb /l* condor-install-log.txt /i condor-6.7.18-winnt50-x86.msi [property=value] ...

More information on the features of msiexec can be found at Microsoft's website at http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/msiexec.mspx.
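Building the property list in a script makes it easier to catch bad values before msiexec sees them. The Python sketch below checks the enumerated properties (NEWPOOL, RUNJOBS, VACATEJOBS, SUBMITJOBS) against the choices listed above. It then returns the full command line as a single string.

The helper name, the default log file name, and the example property values are illustrative assumptions. The .msi file name must match the release you downloaded. Every value is wrapped in double quotes, so a value that itself contains a double quote is not handled.

```python
"""Validate installer properties and build the msiexec command line.

Sketch only: build_msiexec_command is a hypothetical helper, and the
property values in the example are placeholders.
"""

# Allowed values for the enumerated properties, as documented above.
CHOICES = {
    "NEWPOOL": {"Y", "N"},
    "RUNJOBS": {"N", "A", "I", "C"},
    "VACATEJOBS": {"Y", "N"},
    "SUBMITJOBS": {"Y", "N"},
}


def build_msiexec_command(msi_file, properties, log_file="condor-install-log.txt"):
    """Return one command-line string: msiexec /qb /l* LOG /i MSI PROP="value" ..."""
    for name, allowed in CHOICES.items():
        if name in properties and properties[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    # Quote every value so that paths with spaces and empty strings ("") survive.
    props = " ".join(f'{name}="{value}"' for name, value in properties.items())
    return f"msiexec /qb /l* {log_file} /i {msi_file} {props}"


if __name__ == "__main__":
    cmd = build_msiexec_command(
        "condor-6.7.18-winnt50-x86.msi",
        {"NEWPOOL": "N", "POOLNAME": "", "RUNJOBS": "C", "VACATEJOBS": "Y",
         "SUBMITJOBS": "Y", "INSTALLDIR": r"C:\Condor"},
    )
    print(cmd)
    # On Windows, as an administrator, the string could be passed to
    # subprocess.run(cmd, check=True).
```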

6.2.10.5 Manual Installation of Condor on Windows

If you are to install Condor on many different machines, you may wish to use some other mechanism to install Condor on additional machines rather than running the Setup program described above on each machine.

WARNING: This is for advanced users only! All others should use the Setup program described above.

Here is a brief overview of how to install Condor manually without using the provided GUI-based setup program:

The Service

The service that Condor will install is called "Condor". The Startup Type is Automatic. The service should log on as System Account, but do not enable "Allow Service to Interact with Desktop". The program that is run is condor_master.exe.

The Condor service can be installed and removed using the sc.exe tool, which is included in Windows XP and Windows Server 2003. The tool is also available as part of the Windows 2000 Resource Kit.

Installation can be done as follows:

sc create Condor binpath= c:\condor\bin\condor_master.exe start= auto

To remove the service, use:

sc delete Condor

The Registry

Condor uses a few registry entries in its operation. The key that Condor uses is HKEY_LOCAL_MACHINE\Software\Condor. The values that Condor puts in this registry key serve two purposes.

The values of CONDOR_CONFIG and RELEASE_DIR are used for Condor to start its service.
CONDOR_CONFIG should point to the condor_config file. In this version of Condor, it must reside on the local disk.
RELEASE_DIR should point to the directory where Condor is installed. This is typically C:\Condor, and again, this must reside on the local disk.
The other purpose is storing the entries from the last installation so that they can be used for the next one.

The Filesystem

The files that are needed for Condor to operate are identical to the Unix version of Condor, except that executable files end in .exe. For example, on Unix one of the files is condor_master, and on Windows the corresponding file is condor_master.exe.

These files currently must reside on the local disk for a variety of reasons. Advanced Windows users might be able to put the files on remote resources. The main concern is twofold. First, the files must be there when the service is started. Second, the files must always be in the same spot (including drive letter), no matter who is logged into the machine.

6.2.10.6 Condor is installed... now what?

After the installation of Condor is completed, the Condor service must be started. If you used the GUI-based setup program to install Condor, the Condor service should already be started. If you installed manually, Condor must be started by hand, or you can simply reboot. NOTE: The Condor service will start automatically whenever you reboot your machine.

To start Condor by hand:

From the Start menu, choose Settings.
From the Settings menu, choose Control Panel.
From the Control Panel, choose Administrative Tools, then Services.
From Services, choose Condor, and Start.

Or, alternatively you can enter the following command from a command prompt:

         net start condor

Run the Task Manager (Control-Shift-Escape) to check that Condor services are running. The following tasks should be running:

condor_master.exe
condor_negotiator.exe, if this machine is a central manager.
condor_collector.exe, if this machine is a central manager.
condor_startd.exe, if you indicated that this Condor node should start jobs.
condor_schedd.exe, if you indicated that this Condor node should submit jobs to the Condor pool.
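The list above depends on the roles chosen during installation. This Python sketch restates it as a function. It then reports which expected daemons are missing from a list of running image names, such as those shown in Task Manager.

The function names are hypothetical. Collecting the running image names (for example from the tasklist command) is left out to keep the example portable.

```python
"""Which Condor daemons should appear in Task Manager, given this machine's roles."""


def expected_daemons(central_manager, execute, submit):
    """Return the expected image names, following the list above."""
    names = ["condor_master.exe"]               # always running
    if central_manager:
        names += ["condor_negotiator.exe", "condor_collector.exe"]
    if execute:
        names.append("condor_startd.exe")       # this node starts jobs
    if submit:
        names.append("condor_schedd.exe")       # this node submits jobs
    return names


def missing_daemons(running_images, **roles):
    """Return expected daemons absent from the running image names (case-insensitive)."""
    running = {name.lower() for name in running_images}
    return [n for n in expected_daemons(**roles) if n not in running]


if __name__ == "__main__":
    seen = ["condor_master.exe", "condor_startd.exe", "explorer.exe"]
    print(missing_daemons(seen, central_manager=False, execute=True, submit=True))
    # ['condor_schedd.exe']
```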

Also, you should now be able to open a new cmd (DOS prompt) window, and the Condor bin directory should be in your path, so you can issue the normal Condor commands, such as condor_q and condor_status.

6.2.10.7 Condor is running... now what?

Once Condor services are running, try building and submitting some test jobs. See the README.TXT file in the examples directory for details.


condor-admin@cs.wisc.edu