Next: 3.5 Startd Policy Configuration Up: 3. Administrators' Manual Previous: 3.3 Configuration   Contents   Index


3.4 User Priorities and Negotiation

Condor uses priorities to determine machine allocation for jobs. This section details the priorities and the allocation of machines (negotiation).

For accounting purposes, each user is identified by username@uid_domain. Each user is assigned a single priority value, even when submitting jobs from different machines in the same domain or from machines in different domains.

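The numerical priority value assigned to a user is inversely related to the goodness of the priority: a lower number is a better priority. A user with a numerical priority of 5 gets more resources than a user with a numerical priority of 50. There are two priority values assigned to Condor users:

  • Real User Priority (RUP), which measures the resource usage of the user.
  • Effective User Priority (EUP), which determines how many resources the user may receive.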

This section describes these two priorities and how they affect resource allocations in Condor. Documentation on configuring and controlling priorities may be found in section 3.3.18.

3.4.1 Real User Priority (RUP)

A user's RUP measures the resource usage of the user through time. Every user begins with a RUP of one half (0.5), and at steady state, the RUP of a user equilibrates to the number of resources used by that user. Therefore, if a specific user continuously uses exactly ten resources for a long period of time, the RUP of that user stabilizes at ten.

However, if the user decreases the number of resources used, the RUP gets better (numerically smaller). The rate at which the priority value decays is set by the macro PRIORITY_HALFLIFE, a time period defined in seconds. Intuitively, if PRIORITY_HALFLIFE in a pool is set to 86400 (one day), and a user whose RUP was 10 removes all of their jobs, the user's RUP would be 5 one day later, 2.5 two days later, and so on.
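The halving described above can be checked with a few lines of Python. This is a minimal sketch, not Condor code: the function name decayed_rup is hypothetical, and it models only the pure decay case in which the user holds no resources for the whole interval.

```python
def decayed_rup(rup: float, elapsed_seconds: float, halflife_seconds: float) -> float:
    """RUP after elapsed_seconds during which the user uses no resources."""
    # Each half-life that passes cuts the priority value in half.
    return rup * 0.5 ** (elapsed_seconds / halflife_seconds)


PRIORITY_HALFLIFE = 86400  # one day, in seconds, as in the example above

for days in (0, 1, 2):
    print(days, decayed_rup(10.0, days * 86400, PRIORITY_HALFLIFE))
# 0 10.0
# 1 5.0
# 2 2.5
```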

3.4.2 Effective User Priority (EUP)

The effective user priority (EUP) of a user is used to determine how many resources that user may receive. The EUP is linearly related to the RUP by a priority factor which may be defined on a per-user basis. Unless otherwise configured, the priority factor for all users is 1.0, and so the EUP is the same as the RUP. However, if desired, the priority factors of specific users (such as remote submitters) can be increased so that others are served preferentially.

The number of resources that a user may receive is inversely related to the ratio between the EUPs of submitting users. Therefore user A with EUP=5 will receive twice as many resources as user B with EUP=10 and four times as many resources as user C with EUP=20. However, if A does not use the full number of allocated resources, the available resources are repartitioned and distributed among remaining users according to the inverse ratio rule.
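The inverse ratio rule can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the negotiator's implementation: the function name fair_shares and the pool size of 70 resources are hypothetical, and fractional shares are allowed for simplicity.

```python
from typing import Dict


def fair_shares(eups: Dict[str, float], total_resources: float) -> Dict[str, float]:
    """Split total_resources among users in proportion to 1/EUP."""
    weights = {user: 1.0 / eup for user, eup in eups.items()}
    total_weight = sum(weights.values())
    return {user: total_resources * w / total_weight for user, w in weights.items()}


# Users A, B and C from the text, sharing a hypothetical 70 resources.
shares = fair_shares({"A": 5.0, "B": 10.0, "C": 20.0}, 70)
for user, share in shares.items():
    print(user, round(share, 6))  # A gets twice B's share and four times C's
```

If A then uses fewer resources than its share, the same function can be applied again to the remaining users and the remaining resources, which mirrors the repartitioning described above.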

Condor supplies mechanisms to directly support two policies in which EUP may be useful:

Nice users
A job may be submitted with the parameter nice_user set to TRUE in the submit command file. A nice user job gets its RUP boosted by the NICE_USER_PRIO_FACTOR priority factor specified in the configuration file, leading to a (usually very large) EUP. This corresponds to a low priority for resources. These jobs are therefore equivalent to Unix background jobs, which use resources not used by other Condor users.

Remote Users
The flocking feature of Condor (see section 5.2) allows the condor_schedd to submit to more than one pool. In addition, the submit-only feature allows a user to run a condor_schedd that submits jobs into another pool. In such situations, submitters from other domains can submit to the local pool. It is often desirable to have Condor treat local users preferentially over these remote users. If configured, Condor will boost the RUPs of remote users by the REMOTE_PRIO_FACTOR specified in the configuration file, thereby lowering their priority for resources.

The priority boost factors for individual users can be set with the setfactor option of condor_userprio. Details may be found in the condor_userprio manual page.

3.4.3 Priorities and Preemption

Priorities are used to ensure that users get their fair share of resources. The priority values are used at allocation time. In addition, Condor preempts machine claims and reallocates them when conditions change.

To ensure that preemptions do not lead to thrashing, a PREEMPTION_REQUIREMENTS expression is defined to specify the conditions that must be met for a preemption to occur. It is usually defined to deny preemption if a current running job has been running for a relatively short period of time. This effectively limits the number of preemptions per resource per time interval.

Note that PREEMPTION_REQUIREMENTS only applies to preemptions due to user priority. It does not have any effect if the machine rank expression prefers a different job, or if the startd policy expression causes the job to vacate due to other activity on the machine. See section 3.5.10 for a general discussion of limiting preemption.

3.4.4 Priority Calculation

This section may be skipped, but for the curious, here is Condor's priority calculation algorithm.

The RUP of a user u at time t, $\pi_r(u,t)$, is calculated every time interval $\delta t$ using the formula

\begin{displaymath}\pi_r(u,t) = \beta\times\pi_r(u,t-\delta t) + (1-\beta)\times\rho(u,t)\end{displaymath}

where $\rho(u,t)$ is the number of resources used by user u at time t, and $\beta=0.5^{{\delta t}/h}$. h is the half life period set by PRIORITY_HALFLIFE.

The EUP of user u at time t, $\pi_e(u,t)$ is calculated by

\begin{displaymath}\pi_e(u,t) = \pi_r(u,t)\times f(u,t)\end{displaymath}

where f(u,t) is the priority boost factor for user u at time t.

As mentioned previously, the RUP calculation is designed so that at steady state, each user's RUP stabilizes at the number of resources used by that user. The definition of $\beta$ ensures that $\pi_r(u,t)$ can be calculated over non-uniform time intervals $\delta t$ without affecting the result. The time interval $\delta t$ varies due to events internal to the system, but Condor guarantees that unless the central manager machine is down, no matches will be unaccounted for due to this variance.
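Both properties can be checked with a short simulation of the formulas above. This is a sketch under stated assumptions, not Condor source code: the function names, the update intervals of 300 and 3600 seconds, and the boost factor of 10.0 are hypothetical values chosen for illustration.

```python
def update_rup(prev_rup: float, resources_in_use: float, dt: float, halflife: float) -> float:
    """One update step: pi_r(t) = beta * pi_r(t - dt) + (1 - beta) * rho(t)."""
    beta = 0.5 ** (dt / halflife)
    return beta * prev_rup + (1.0 - beta) * resources_in_use


def effective_priority(rup: float, factor: float = 1.0) -> float:
    """EUP: pi_e = pi_r * f."""
    return rup * factor


def simulate(start_rup: float, resources_in_use: float, steps: int, dt: float, halflife: float) -> float:
    """Apply update_rup repeatedly with a constant number of resources in use."""
    rup = start_rup
    for _ in range(steps):
        rup = update_rup(rup, resources_in_use, dt, halflife)
    return rup


HALFLIFE = 86400.0  # PRIORITY_HALFLIFE of one day

# A new user (RUP 0.5) holds 10 resources for 30 days.
# The same period is covered with 300-second and with 3600-second updates.
fine = simulate(0.5, 10, steps=30 * 86400 // 300, dt=300.0, halflife=HALFLIFE)
coarse = simulate(0.5, 10, steps=30 * 24, dt=3600.0, halflife=HALFLIFE)

print(round(fine, 6), round(coarse, 6))        # both settle at the 10 resources in use
print(round(effective_priority(fine, 10.0), 4))  # a boost factor of 10 gives an EUP near 100
```

The two runs agree because applying $\beta$ over two consecutive intervals multiplies the decay terms, and $0.5^{a/h}\times 0.5^{b/h} = 0.5^{(a+b)/h}$.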

3.4.5 Negotiation

Negotiation is the process Condor performs periodically to match queued jobs with resources capable of running them. The condor_negotiator daemon is responsible for negotiation.

During a negotiation cycle, the condor_negotiator daemon performs the following steps, in order.

  1. Build a list of all possible resources, regardless of the state of those resources.
  2. Obtain a list of all job submitters (for the entire pool).
  3. Sort the list of all job submitters based on EUP (see section 3.4.2 for an explanation of EUP). The submitter with the best priority is first within the sorted list.
  4. Iterate until there are either no more resources to match, or no more jobs to match.
    For each submitter (in EUP order):
    For each submitter, get each job. Jobs may be submitted from more than one machine (hence to more than one condor_schedd daemon), so the ordering of these jobs needs further definition. Jobs from a single condor_schedd daemon are typically returned in job priority order. When more than one condor_schedd daemon is involved, the daemons are contacted in an undefined order. All jobs from a single condor_schedd daemon are considered before moving on to the next. For each job:
    • For each machine in the pool that can execute jobs:
      1. If machine.requirements evaluates to False or job.requirements evaluates to False, skip this machine.
      2. If the machine is in the Claimed state, but not running a job, skip this machine.
      3. If this machine is not running a job, add it to the potential match list by reason of No Preemption.
      4. If the machine is running a job:
        • If the machine.RANK on this job is better than the running job, add this machine to the potential match list by reason of Rank.
        • If the EUP of this job is better than the EUP of the currently running job, and PREEMPTION_REQUIREMENTS is True, and the machine.RANK on this job is not worse than the currently running job, add this machine to the potential match list by reason of Priority.
    • Sort the machines in the potential match list by NEGOTIATOR_PRE_JOB_RANK, job.RANK, NEGOTIATOR_POST_JOB_RANK, reason for claim (No Preemption, then Rank, then Priority), and PREEMPTION_RANK.
    • The job is assigned to the top machine on the potential match list. The machine is removed from the list of resources to match (on this negotiation cycle).
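The per-job machine selection in step 4 can be sketched in Python. This is a simplified illustration, not the condor_negotiator implementation. The Job and Machine classes, the function names, and the use of plain Python callables in place of ClassAd expressions are all hypothetical. The sketch assumes a higher RANK value is better, and it sorts only by job.RANK and reason for claim, omitting NEGOTIATOR_PRE_JOB_RANK, NEGOTIATOR_POST_JOB_RANK and PREEMPTION_RANK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Job:
    owner: str
    # Stand-ins for job.requirements and job.RANK, evaluated against a machine.
    requirements: Callable[["Machine"], bool] = lambda machine: True
    rank: Callable[["Machine"], float] = lambda machine: 0.0


@dataclass
class Machine:
    name: str
    # Stand-ins for machine.requirements and machine.RANK, evaluated against a job.
    requirements: Callable[[Job], bool] = lambda job: True
    rank: Callable[[Job], float] = lambda job: 0.0
    claimed: bool = False
    running: Optional[Job] = None


REASON_ORDER = {"No Preemption": 0, "Rank": 1, "Priority": 2}


def match_reason(machine: Machine, job: Job, eup: Dict[str, float],
                 preemption_requirements: Callable[[Machine, Job], bool]) -> Optional[str]:
    """Return why this machine is a potential match for the job, or None to skip it."""
    if not machine.requirements(job) or not job.requirements(machine):
        return None
    if machine.claimed and machine.running is None:
        return None  # Claimed, but not running a job.
    if machine.running is None:
        return "No Preemption"
    current = machine.running
    if machine.rank(job) > machine.rank(current):
        return "Rank"
    if (eup[job.owner] < eup[current.owner]  # A lower EUP is a better priority.
            and preemption_requirements(machine, job)
            and machine.rank(job) >= machine.rank(current)):
        return "Priority"
    return None


def pick_machine(job: Job, machines: List[Machine], eup: Dict[str, float],
                 preemption_requirements: Callable[[Machine, Job], bool] = lambda m, j: True
                 ) -> Optional[Tuple[Machine, str]]:
    """Build the potential match list, sort it, and return the top entry."""
    candidates = []
    for machine in machines:
        reason = match_reason(machine, job, eup, preemption_requirements)
        if reason is not None:
            candidates.append((machine, reason))
    if not candidates:
        return None
    candidates.sort(key=lambda c: (-job.rank(c[0]), REASON_ORDER[c[1]]))
    return candidates[0]


eup = {"light": 5.0, "heavy": 50.0}
machines = [
    Machine("busy", claimed=True, running=Job("heavy")),
    Machine("claimed_idle", claimed=True),
    Machine("idle"),
]
new_job = Job("light")

first = pick_machine(new_job, machines, eup)
print(first[0].name, first[1])    # idle No Preemption
# The matched machine is removed from consideration for this cycle.
remaining = [m for m in machines if m is not first[0]]
second = pick_machine(new_job, remaining, eup)
print(second[0].name, second[1])  # busy Priority
```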

The condor_negotiator asks the condor_schedd for the "next job" from a given submitter/user. Typically, the condor_schedd returns jobs in the order of job priority. If priorities are the same, job submission time is used; older jobs go first. If a cluster has multiple procs in it and one of the jobs cannot be matched, the condor_schedd will not return any more jobs in that cluster on that negotiation pass. This is an optimization based on the theory that the jobs in a cluster are similar. The configuration variable NEGOTIATE_ALL_JOBS_IN_CLUSTER disables the cluster-skipping optimization. Use of the configuration variable SIGNIFICANT_ATTRIBUTES changes the definition of what the condor_schedd considers a cluster from the default definition of all jobs that share the same ClusterId.
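The ordering and the cluster-skipping optimization described above can be sketched as follows. This is an illustration only, not condor_schedd code: the dictionary keys, the function names and the can_match callback are hypothetical, and the sketch assumes that a numerically higher job priority is served first.

```python
from typing import Callable, Dict, List


def negotiation_order(jobs: List[Dict]) -> List[Dict]:
    """Order jobs by job priority, then by submission time (older first)."""
    return sorted(jobs, key=lambda job: (-job["prio"], job["submitted"]))


def matched_jobs(jobs: List[Dict], can_match: Callable[[Dict], bool],
                 skip_failed_clusters: bool = True) -> List[str]:
    """Offer jobs in order; once one job in a cluster fails, skip the rest of it."""
    failed_clusters = set()
    matched = []
    for job in negotiation_order(jobs):
        if skip_failed_clusters and job["cluster"] in failed_clusters:
            continue
        if can_match(job):
            matched.append(job["id"])
        else:
            failed_clusters.add(job["cluster"])
    return matched


jobs = [
    {"id": "1.0", "cluster": 1, "prio": 0, "submitted": 100},
    {"id": "1.1", "cluster": 1, "prio": 0, "submitted": 100},
    {"id": "1.2", "cluster": 1, "prio": 0, "submitted": 100},
    {"id": "2.0", "cluster": 2, "prio": 5, "submitted": 200},
]


def everything_but_first_proc(job: Dict) -> bool:
    # Hypothetical match test in which only job 1.0 cannot be matched.
    return job["id"] != "1.0"


print(matched_jobs(jobs, everything_but_first_proc))
# ['2.0']
print(matched_jobs(jobs, everything_but_first_proc, skip_failed_clusters=False))
# ['2.0', '1.1', '1.2']
```

The second call corresponds to disabling the optimization, which is the effect the text attributes to NEGOTIATE_ALL_JOBS_IN_CLUSTER.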

3.4.6 Group Accounting

By default, Condor does all accounting on a per-user basis, and this accounting is primarily used to compute priorities for Condor's fair-share scheduling algorithms. However, accounting can also be done on a per-group basis. Multiple users can all submit jobs into the same accounting group, and all of the jobs will be treated with the same priority.

To use an accounting group, each job inserts an attribute into the job ClassAd which defines the accounting group name for the job. A common name is decided upon and used for the group. The following line is an example that defines the attribute within the job's submit description file:

+AccountingGroup = "group_physics"

The AccountingGroup attribute is a string, and it therefore must be enclosed in double quote marks. The string may have a maximum length of 40 characters. The name should not be qualified with a domain. Certain parts of the Condor system do append the value $(UID_DOMAIN) (as specified in the configuration file on the submit machine) to this string for internal use. For example, with the accounting group name as specified above, condor_userprio shows statistics for this accounting group under the name with the value of UID_DOMAIN appended, for example

User Name                           Priority
------------------------------      ---------
                                         0.50
                                        23.11
                                       111.13

Additionally, the condor_userprio command allows administrators to remove an entity from the accounting system in Condor. The -delete option to condor_userprio accomplishes this; it is useful when all the jobs from a given accounting group have completed and the administrator wishes to remove that group from the system. The -delete option identifies the accounting group by its fully qualified name. For example

condor_userprio -delete

Condor removes entities itself once they are no longer relevant. Intervention by an administrator to delete entities can be beneficial when the use of thousands of short term accounting groups leads to scalability issues.

Note that the name of an accounting group may include a period (.), and the period character has no special interpretation for group accounting. For group users, as described in the next section, a period does have special meaning.

3.4.7 Group Quotas

The use of group quotas modifies the negotiation for available resources (machines) within a Condor pool. Group quotas address cases in which priorities assigned on a per-user basis are insufficient. This may be the case when different groups (of varying size) own computers, and the groups choose to combine their computers to form a Condor pool.

Consider an imaginary Condor pool with thirty computers. Twenty computers are owned by the physics group and ten computers are owned by the chemistry group. One notion of fair allocation could be implemented by configuring the twenty machines owned by the physics group to prefer (using the RANK configuration macro) jobs submitted by users associated with the physics group. Likewise, the ten machines owned by the chemistry group are configured to prefer jobs from users associated with the chemistry group. This routes jobs to execute on specific machines, perhaps causing more preemption than necessary. Once these thirty machines have been pooled, the desired fair allocation policy is likely somewhat different: it ties users not to specific sets of machines, but to numbers of machines (a quota). Given thirty similar machines, the desired policy gives users within the physics group preference on up to twenty of the machines within the pool, and those machines can be any that are available.

A quota for a set of users requires an identification for the set; members are called group users. Jobs to be negotiated for under the group quota specify the group user with the AccountingGroup job ClassAd attribute, as described above.

The syntax for specifying a group user is

<groupname>.<username>

The group is a name chosen for the group. Group names are not required to begin with the string "group_", as in the examples "group_physics.newton" and "group_chemistry.curie", but it is a useful convention because group names must not conflict with user names. The period character between the group and the user name is a required part of the syntax.
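The distinction between a group user and a plain accounting group name that merely contains a period can be sketched as follows. This is a hypothetical helper, not part of Condor: it assumes that a value counts as a group user only when the text before the first period is one of the configured group names, and the name split_group_user is invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def split_group_user(accounting_group: str,
                     group_names: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (group, user) for a group user, or None for any other value."""
    group, period, user = accounting_group.partition(".")
    if period and user and group in group_names:
        return group, user
    return None


GROUP_NAMES = ["group_physics", "group_chemistry"]  # hypothetical configured groups

print(split_group_user("group_physics.newton", GROUP_NAMES))  # ('group_physics', 'newton')
print(split_group_user("group_physics", GROUP_NAMES))         # None
print(split_group_user("some.project", GROUP_NAMES))          # None
```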

Configuration controls the order of negotiation for groups and individual users, and also sets quotas (preferentially allocated numbers of machines) for the groups. A declared number of virtual machines specifies the quota for each group (see GROUP_QUOTA_<groupname> in section 3.3.18). The sum of the quotas for all groups must be less than or equal to the number of virtual machines in the entire pool. If the sum is less than the number of virtual machines in the entire pool, the remaining machines are allocated to the none group, composed of the general users not submitting jobs in a group.
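The quota constraint and the size of the none group amount to simple arithmetic, sketched below. The function name and the pool of 36 virtual machines are hypothetical, and the check illustrates the stated rule rather than how Condor itself validates its configuration.

```python
from typing import Dict


def none_group_size(group_quotas: Dict[str, int], pool_size: int) -> int:
    """Virtual machines left for the none group once all quotas are set aside."""
    total = sum(group_quotas.values())
    if total > pool_size:
        raise ValueError(
            f"group quotas total {total}, but the pool has only {pool_size} virtual machines"
        )
    return pool_size - total


quotas = {"group_physics": 20, "group_chemistry": 10}

print(none_group_size(quotas, 30))  # 0: the quotas cover the whole thirty-machine pool
print(none_group_size(quotas, 36))  # 6: a hypothetical larger pool leaves 6 for the none group
```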

Where group users are specified for jobs, accounting is done per group user. It is no longer done by group, or by individual user.

Negotiation is changed when group quotas are used. Condor negotiates first for defined groups, and then for independent job submitters. Given jobs belonging to different groups, Condor negotiates first for the group currently utilizing the smallest percentage of machines in its quota. After this, Condor negotiates for the group currently utilizing the second smallest percentage of machines in its quota. The last group will be the one with the highest percentage of machines in its quota. As an example, again use the imaginary pool and groups given above. If various users within group_physics have jobs running on 15 computers, then the physics group has 75% of the machines within its quota. If various users within group_chemistry have jobs running on 5 computers, then the chemistry group has 50% of the machines within its quota. Negotiation will take place for the chemistry group first. For independent job submissions (those not part of any group), the classic Condor user fair share algorithm still applies.
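The group ordering in this example can be reproduced with a short sketch. The function name is hypothetical, every quota is assumed to be greater than zero, and only the ordering rule stated above is modeled.

```python
from typing import Dict, List


def group_negotiation_order(quotas: Dict[str, int], in_use: Dict[str, int]) -> List[str]:
    """Order groups by the fraction of their quota currently in use, smallest first."""
    return sorted(quotas, key=lambda group: in_use.get(group, 0) / quotas[group])


quotas = {"group_physics": 20, "group_chemistry": 10}
in_use = {"group_physics": 15, "group_chemistry": 5}

for group in quotas:
    print(group, f"{100 * in_use[group] / quotas[group]:.0f}%")
# group_physics 75%
# group_chemistry 50%

print(group_negotiation_order(quotas, in_use))
# ['group_chemistry', 'group_physics']
```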

Note that there is no verification that a user is a member of the group that they claim. We rely on societal pressure for enforcement.

Configuration variables affect group quotas. See section 3.3.18 for detailed descriptions of the variables mentioned. Group names that may be given quotas to be used in negotiation are listed in the GROUP_NAMES macro. The names chosen must not conflict with Condor user names. Quotas (by group) are defined in numbers of virtual machines. Each group may assign an initial value for each group user's user priority factor with the GROUP_PRIO_FACTOR_<groupname> macro. If a group is currently allocated its entire quota of machines, and a group user has a submitted job that is not running, the GROUP_AUTOREGROUP macro allows the job to be considered a second time within the negotiation cycle along with all other individual users' jobs.
