Next: 3.13 Setting Up for
Up: 3. Administrators' Manual
Previous: 3.11 The High Availability
Contents
Index
Subsections
3.12 Quill
Quill builds and maintains a mirror database of a Condor job queue.
The condor_ quill daemon implements it,
and the condor_ q and condor_ history tools use it.
3.12.1 Installation and Configuration
Quill uses the PostgreSQL database management system.
Quill uses the PostgreSQL server as its back end
and client library,
libpq to talk to the server.
While Quill works and has been tested with PostgreSQL
versions 7.4 and beyond, we strongly recommend the use of version
8.1 or
later due to its integrated facilities of certain key database maintenance
tasks.
Obtain PostgreSQL from
http://www.postgresql.org/ftp/source/
Installation instructions are detailed in:
http://www.postgresql.org/docs/8.1/static/installation.html
Configure PostgreSQL after installation:
- Configure to accept TCP/IP connections.
For PostgreSQL version 8,
use the listen_addresses variable in
postgresql.conf file as a guide.
For example,
listen_addresses = '*'
means listen on any IP interface.
In PostgreSQL version 7,
this was accomplished by setting
tcpip_socket=true in the postgresql.conf file.
- Configure automatic vacuuming.
For PostgreSQL version 8.1 or later,
ensure that these variables with these defaults are
commented in and/or set properly in the
postgresql.conf configuration file:
# Turn on/off automatic vacuuming
autovacuum = on
# time between autovacuum runs, in secs
autovacuum_naptime = 60
# min # of tuple updates before vacuum
autovacuum_vacuum_threshold = 1000
# min # of tuple updates before analyze
autovacuum_analyze_threshold = 500
# fraction of rel size before vacuum
autovacuum_vacuum_scale_factor = 0.4
# fraction of rel size before analyze
autovacuum_analyze_scale_factor = 0.2
# default vacuum cost delay for
# autovac, -1 means use
# vacuum_cost_delay
autovacuum_vacuum_cost_delay = -1
# default vacuum cost limit for
# autovac, -1 means use
# vacuum_cost_limit
autovacuum_vacuum_cost_limit = -1
- Configure PostgreSQL to accept TCP/IP connections from
specific hosts.
Modify the pg_hba.conf file
(which usually resides in the PostgreSQL server's data directory).
Access is required by the condor_ quill daemon,
as well as the database users
``quillreader'' and ``quillwriter''.
For example, to give
database users ``quillreader'' and ``quillwriter''
password-enabled access to all databases on current machine from any
other machine in the network, add the following:
host |
all |
quillreader |
128.105.0.0 |
255.255.0.0 |
password |
host |
all |
quillwriter |
128.105.0.0 |
255.255.0.0 |
password |
Note that in addition to the database specified by
the configuration variable QUILL_DB_NAME,
the condor_ quill daemon also needs access to the database
"template1".
In order to create the database in the first place,
the condor_ quill daemon needs to connect to the database.
- Start the PostgreSQL server service. See the URL for the
installation instructions for the appropriate method to start the service.
- The condor_ quill daemon and client tools connect
to the database as users ``quillreader'' and
``quillwriter''.
These are database users, not operating system users.
The two types of users are quite different from each other.
If these data base users do not exist,
add them using the
createuser command supplied with the installation.
Assign them with appropriate passwords;
these passwords will be used by the Quill tools to connect
to the database in a secure way.
User ``quillreader'' should not be allowed to create
more databases nor create more users.
User ``quillwriter'' should
not be allowed to create more users,
however it should be allowed to create more databases.
The following commands create the two users
with the appropriate permissions,
and be ready to enter the corresponding passwords when prompted.
/path/to/postgreSQL/bin/directory/createuser quillreader \
--no-createdb --no-adduser --pwprompt
/path/to/postgreSQL/bin/directory/createuser quillwriter \
--createdb --no-adduser --pwprompt
In the case of PostgreSQL 8.1 or later, answer ``no'' to the question
about the ability for role creation.
- The condor_ quill daemon needs read and write access
to the database.
It connects as user ``quillwriter'',
who has owner privileges to the database.
Since this gives all access to the ``quillwriter'' user,
this password cannot be stored in a public place
(such as the condor_ collector).
For this reason, the ``quillwriter'' password is stored
in a file named .quillwritepassword in the Condor spool directory.
Appropriate protections on this file guarantee secure access to the database.
This file must be created and protected by the site administrator;
if this file does not exist as and where expected, the condor_ quill
daemon logs an error and exits.
After PostgreSQL is configured and running, Condor must also be
configured to use Quill since by default Quill is configured to be off.
- Add the file .quillwritepassword to the
VALID_SPOOL_FILES variable, since condor_ preen must
be told not to delete this file.
- Set up configuration variables that are specific
to the installation.
QUILL_ENABLED = TRUE
QUILL_NAME = some-unique-quill-name.cs.wisc.edu
QUILL_DB_NAME = database-for-some-unique-quill-name
QUILL_DB_IP_ADDR = databaseipaddress:port
# the following parameter's units is in seconds
QUILL_POLLING_PERIOD = 10
# the following parameter's units is in hours
QUILL_HISTORY_CLEANING_INTERVAL = 24
# the following parameter's units is in days
QUILL_HISTORY_DURATION = 30
QUILL_MANAGE_VACUUM = FALSE
QUILL_IS_REMOTELY_QUERYABLE = TRUE
QUILL_DB_QUERY_PASSWORD = password-for-database-user-quillreader
QUILL_ADDRESS_FILE = $(LOG)/.quill_address
Descriptions of these and other configuration variables are in
section 3.3.25.
Here are further brief details:
- QUILL_DB_NAME and QUILL_DB_IP_ADDR
- These two variables are used to determine the location of the database
server that this Quill would talk to, and the name of the database that
it creates. More than one Quill server can talk to the same database
server. This can be done by simply letting all the
QUILL_DB_IP_ADDR point to the same database server.
If more than one Quill server are sharing the same database
server, then the QUILL_DB_NAME variable for all of them should
be unique. Otherwise, there would be record overwriting and corruption
of job queue information.
- QUILL_NAME
- Each quill daemon in the pool has to be uniquely named.
- QUILL_POLLING_PERIOD
- This controls the frequency with which Quill polls the
job_queue.log file. By default, it is 10 seconds. Since Quill
works by periodically sniffing the log file for updates and then sending
those updates to the database, this variable controls the trade off between
the currency of query results and Quill's load on the system, which
is usually negligible.
- QUILL_HISTORY_CLEANING_INTERVAL and
QUILL_HISTORY_DURATION
- These two variables control the deletion of historical jobs from the
database.
QUILL_HISTORY_DURATION is the number of days
after completion that a given job will stay in the database.
A more precise definition is the number of days since the history ad got
into the history database; those two might be different,
if a job is completed but stays in the queue for a while.
All jobs beyond QUILL_HISTORY_DURATION will be deleted.
As scanning the entire database for old jobs can be expensive,
the other variable QUILL_HISTORY_CLEANING_INTERVAL
is the number of hours between two successive scans. By default,
QUILL_HISTORY_DURATION is set to 180 days and
QUILL_HISTORY_CLEANING_INTERVAL is set to 24 hours.
- QUILL_MANAGE_VACUUM
- Set to False by default, this variable determines whether Quill is to
perform vacuuming tasks on its tables or not. Vacuuming is a maintenance
task that needs to be performed on tables in PostgreSQL. The
frequency with which a table is vacuumed typically depends on the number
of updates (inserts/deletes) performed on the table. Fortunately, with
PostgreSQL version 8.1, vacuuming tasks can be configured to be
performed automatically by the database server. We recommend that users
upgrade to 8.1 and use the integrated vacuuming facilities of the database
server, instead of having Quill do them. If the user does prefer having
Quill perform those vacuuming tasks, it can be achieved by setting this
variable to ExprTrue. However, it cannot be overstated that Quill's vacuuming
policy is quite rudimentary as compared to the integrated facilities
of the database server, and under high update workloads, can prove to
be a bottleneck on the Quill daemon. As such, setting this variable to
ExprTrue results in some warning messages in the log file regarding this
issue.
- QUILL_IS_REMOTELY_QUERYABLE
- Thanks to
PostgreSQL,
one can now remotely query both the job queue and the
history tables. This variable controls whether this remote querying
feature should be enabled. By default it is True. Note that even if
this is False, one can still query the job queue
at the remote condor_ schedd daemon.
This variable only controls whether the database tables are remotely queryable.
- QUILL_DB_QUERY_PASSWORD
- In order for the query tools to connect to a database, they need to provide
the password that is assigned to the database user ``quillreader''.
This variable is then advertised by the condor_ quill daemon
to the condor_ collector.
This facility enables remote querying: remote condor_ q query tools first
ask the condor_ collector for
the password associated with a particular Quill database,
and then query that database. Users who do not have access to the
condor_ collector
cannot view the password, and as such cannot query the database. Again, this
password only provides read access to the database.
- QUILL_ADDRESS_FILE
- When Quill starts up, it can place its address (IP and port)
into a file. This way, tools running on the local machine do not
need to query the central manager to find Quill. This
feature can be turned off by commenting out the variable.
3.12.2 Four Usage Examples
- Query a remote Quill daemon on regular.cs.wisc.edu
for all the jobs in the queue
condor_q -name quill@regular.cs.wisc.edu
condor_q -name schedd@regular.cs.wisc.edu
There are two ways to get to a Quill daemon: directly using its name as
specified in the QUILL_NAME configuration variable, or indirectly
by querying the condor_ schedd daemon using its name.
In the latter case, condor_ q will detect
if that condor_ schedd daemon is being serviced by a database, and if so, directly query it.
In both cases, the IP address and port of the database server hosting the data of
this particular remote Quill daemon can be figured out by the QUILL_DB_IP_ADDR
and QUILL_DB_NAME variables specified in the QUILL_AD
sent by the quill daemon to the collector and in the SCHEDD_AD sent by
the condor_ schedd daemon.
- Query a remote Quill daemon on regular.cs.wisc.edu for all historical
jobs belonging to owner einstein.
condor_history -name quill@regular.cs.wisc.edu einstein
- Query the local Quill daemon for the average time spent in the queue
for all non-completed jobs.
condor_q -avgqueuetime
The average queue time is defined as the average of
(currenttime - jobsubmissiontime) over all jobs which are neither
completed (JobStatus == 4) or removed (JobStatus == 3).
- Query the local Quill daemon for all historical jobs completed since
Apr 1, 2005 at 13h 00m.
condor_history -completedsince '04/01/2005 13:00'
It fetches all jobs
which got into the 'Completed' state on or after the
specified time stamp. It use the PostgreSQL date/time
syntax rules, as it encompasses most format options. See
http://www.postgresql.org/docs/8.0/static/datatype-datetime.html#AEN4516
for the various time stamp formats.
3.12.3 Quill and Its RDBMS Schema
With only 7 tables and 2 views, Quill uses a relatively simple database
schema. These can be broadly divided into tables used to store job
queue information and those used to store historical information.
The job queue part of the schema closely follows Condor's ClassAd data
model. For example, each row in these tables describe an <attribute,value>
pair of the classad. Additionally, just as how Condor distinguishes a
ClusterAd from a ProcAd where the former stores attributes common to all
jobs within a cluster whereas the latter stores attributes specific to
each job, the schema also makes this distinction. Finally, numerical
and string valued attributes are stored separately.
Thus, there are four tables:
- ClusterAds_Str (cid int,
attr text,
val text,
primary key (cid, attr))
- ClusterAds_Num (cid int,
attr text,
val double precision,
primary key (cid, attr))
- ProcAds_Str (cid int,
pid int,
attr text,
val text,
primary key (cid, pid, attr))
- ProcAds_Num (cid int,
pid int,
attr text,
val double precision,
primary key (cid, pid, attr))
In addition to the <attribute, value>, each row contains the cluster-id
(cid) and in the case of the ProcAd tables, also the proc-id (pid).
Since each ClassAd would be split into potentially two tables (string
and numeric), there are views that unify them into a single entity in
order to simplify queries.
Here are the view definitions:
- Definition of ClusterAds view
CREATE VIEW ClusterAds as select cid,
attr,
val from ClusterAds_Str UNION ALL
select cid,
attr,
cast(val as text) from ClusterAds_Num;
- Definition of ProcAds view
CREATE VIEW ProcAds as select cid,
pid,
attr,
val from ProcAds_Str UNION ALL
select cid,
pid,
attr,
cast(val as text) from ProcAds_Num;
Finally, the job queue part of the schema also contains a table that
stores metadata information related to the job_queue.log file.
- JobQueuePollingInfo (last_file_mtime BIGINT,
last_file_size BIGINT,
last_next_cmd_offset BIGINT,
last_cmd_offset BIGINT,
last_cmd_type SMALLINT,
last_cmd_key text,
last_cmd_mytype text,
last_cmd_targettype text,
last_cmd_name text,
last_cmd_value text,
log_seq_num BIGINT,
log_creation_time BIGINT)
At all times, there is only 1 row in this table and it describes
information related to the last time Quill polled the job_queue.log file.
- last_file_mtime and last_file_size
The last modified time and size of the file.
- last_cmd_offset and last_next_cmd_offset
The offsets of the record last read from the file and its successive record.
- last_cmd_type
The command type (101, 102, etc.) of the record.
- last_cmd_key,
last_cmd_mytype,
last_cmd_targettype,
last_cmd_name,
and
last_cmd_value
Together, these attributes define the record itself. The key
refers to the combined "cid.pid" pair, mytype and target usually
contains Job and Machine respectively, and finally the name and
value contains the <attribute,value> pair.
- log_seq_num and log_creation_time
Together, these form the first record of the job queue log file. They
describe the sequence number of the next historical job queue log file
and the creation time of the current job queue log file respectively.
Together, they help Quill detect whether the schedd has rotated the
job queue log file and, if so, take appropriate actions.
The historical information on the other hand is slightly differently
designed. Instead of a purely vertical data model (each row is a
<attribute,value> pair), we have two tables that together represent the
complete job classad. Their schema is as follows:
- History_Horizontal (cid int,
pid int,
EnteredHistoryTable timestamp with time zone,
Owner text,
QDate int,
RemoteWallClockTime int,
RemoteUserCpu float,
RemoteSysCpu float,
ImageSize int,
JobStatus int,
JobPrio int,
Cmd text,
CompletionDate int,
LastRemoteHost text,
primary key(cid,pid))
- History_Vertical (cid int, pid int, attr text, val text, primary key
(cid, pid, attr))
Each historical job ad is divided into its horizontal and vertical
counterparts. This division was made because of query performance
reasons. While its easier to store ClassAds in a vertical table,
queries on vertical tables generally perform worse than those on
horizontal tables since the latter has lot fewer records. However, in
Condor, since job ads do not have a fixed schema (users can define their
own attributes), a purely horizontal schema would end up having a lot
of null values. As such, we have a hybrid schema where attributes on
which queries are frequently performed (via condor_ history) are put
in the History_Horizontal table and the other attributes
are stored vertically (just as in the Cluster/Proc tables above) in the
History_Vertical table. Also History_Horizontal
contains all the attributes needed to service the short form of the
condor_ history command (that is, without the -l option).
The resulting hybrid schema has proven to be the most efficient in
servicing condor_ history queries. The job queue tables (Cluster and
Proc) were not designed in this hybrid manner because job queues aren't
as large as history; just a vertical schema worked great.
3.12.4 Quill and Security
There are several layers of security in Quill, some provided by Condor and
others provided by the database. First, all accesses to the database
are password-protected.
- The query tools, condor_ q and
condor_ history connect to the database as user ``quillreader''.
The password for this user can vary from one database to another and
as such, each Quill daemon advertises this password to the collector.
The query tools then obtain this password from the collector and
connect successfully to the database. Access to the database by the
``quillreader'' user is read-only, as this is sufficient for the
query tools. The condor_ quill daemon ensures this protected access using the sql
GRANT command when it first creates the tables in the database. Note that
access to the ``quillreader'' password itself can be blocked by
blocking access to the collector, a feature already supported in Condor.
- The condor_ quill daemon, on the other hand, needs read and write access
to the database. As such, it connects as user ``quillwriter'',
who has owner privileges to the database. Since this gives all
access to the ``quillwriter'' user, this password cannot
be stored in a public place (such as the collector). For this
reason, the ``quillwriter'' password is stored in a file called
.quillwritepassword in the Condor spool directory.
Appropriate protections on this file guarantee secure access to the database.
This file must be created and protected by the site administrator;
if this file does not exist as and where expected, the condor_ quill
daemon logs an error and exits.
- The IsRemotelyQueryable attribute in the Quill ClassAd advertised
by the Quill daemon to the collector can be used by site administrators
to disallow the database from being read by all remote Condor query tools.
Next: 3.13 Setting Up for
Up: 3. Administrators' Manual
Previous: 3.11 The High Availability
Contents
Index
condor-admin@cs.wisc.edu