Today's scientific applications have huge data requirements, which continue to increase drastically every year. These data are generally accessed by many users from all across the the globe. This requires moving huge amounts of data around wide area networks to complete the computation cycle, which brings with it the problem of efficient and reliable data placement.
Stork is a scheduler for data placement. With Stork, data placement jobs have been elevated to the same level as Condor's computational jobs; data placements are queued, managed, queried and autonomously restarted upon error. Stork understands the semantics and protocols of data placement.
The underlying data placement jobs are performed by Stork modules, typically installed in the Condor libexec directory. The module name is encoded from the data placement type and functions. For example, the stork.transfer.file-file module transfers data from the file:/ (local filesystem) to the file:/ protocol. The stork.transfer.file-file module is the only module bundled with Condor/Stork. Additionally, contributed modules may be downloaded for these data transfer protocols:
ftp:// | FTP File Transfer Protocol |
http:// | HTTP Hypertext Transfer Protocol |
gsiftp:// | Globus Grid FTP |
nest:// | Condor NeST network storage appliance (see http://www.cs.wisc.edu/condor/nest/) |
srb:// | SDSC Storage Resource Broker (SRB) (see http://www.sdsc.edu/srb/) |
srm:// | Storage Resource Manager (SRM) (see http://sdm.lbl.gov/srm-wg/) |
csrm:// | Castor Storage Resource Manager (Castor SRM) (see http://castor.web.cern.ch/castor/) |
unitree:// | NCSA UniTree (see http://www.ncsa.uiuc.edu/Divisions/CC/HPDM/unitree/) |
The Stork module API is simple and extensible, enabling users to create and use their own modules.
Stork includes high level features for managing data transfers. By configuration, the number of active jobs running from a Stork server may be limited. Stork includes built in fault tolerance, with capabilities for retrying failed jobs, together with the specification of alternate protocols. Stork users also have access to a higher level job manager, Condor DAGMan (section 2.11), which can manage both Stork data placement jobs and traditional Condor jobs at the same time.
As with Condor jobs, Stork jobs are specified with a submit description file. It is important to note the syntax of the submit description file for a Stork job is different than that used by Condor jobs. Specifically, Stork submit description files are written in the ClassAd language. See the ClassAd Language Reference Manual for complete details. Please note that while most of Condor uses ClassAds, Stork utilizes the most recent version of this language, which has evolved over time. Stork defines keywords. When present in the job submit file, keywords define the function of the job.
Here is sample Stork job submit description file, showing file syntax and keywords. A job specifies a 1-to-1 mapping of a data source URL to destination URL.
// This is a comment line. [ dap_type = transfer; src_url = "file:/etc/termcap"; dest_url = "file:/tmp/stork/file-termcap"; ]
This example shows the ClassAd pairs that form the heart of a Stork job specification. The minimum keywords required to specify a Stork job are:
Additionally, the following keywords may be used in a Stork submit description file:
Stork places no restriction on the submit file name or extension, and will accept any valid file name for a Stork submit description file.
Submit data placement jobs to Stork using the stork_ submit tool. For example, after creating the submit description file sample.stork with an editor, submit the data transfer job with the command:
stork_submit sample.stork
Stork then returns the associated job id, which is used by other Stork job control tools.
Only the first ClassAd (a record expression within brackets) within a Stork submit description file becomes a data placement job upon submission. Other ClassAds within the file are ignored.
Use stork_ status to check the status of any active or completed Stork job. stork_ status takes a single argument: the job id. For example, to check the status of the Stork job with job id 3:
stork_status 3
Use stork_ q to query all active Stork jobs. stork_ q does not report on completed Stork jobs.
For example, to check the status all active Stork jobs:
stork_q
Active jobs may be removed from the job queue with the stork_ rm tool. stork_ rm takes a single argument: the job id of the job to remove. All jobs may be removed, provided they have not completed.
For example, to remove the queued job with job id 4:
stork_rm 4
In an ideal world, all data transfers succeed on the first attempt. However, data transfers do fail for various reasons. Stork is designed with data transfer fault tolerance. Based on configuration, Stork retries failed data transfer jobs using specified protocols.
If a transfer fails, Stork attempts the transfer again, until the number of attempts reaches the limit, as defined by the configuration variable STORK_MAX_RETRY (section 3.3.28).
For each attempt at transfer, the transfer protocols to be used at both source and destination are defined. These transfer protocols may vary, when defined by an alt_protocols entry in the submit description file. The location of the data at the source and destination is unchanged by the alt_protocols entry. alt_protocols defines an ordered list of alternative translation protocols to be used. Each entry in the list is a pair. The first of the pair defines the protocol to be used at the source of the transfer. The second of the pair defines the protocol to be used at the destination of the transfer.
The syntax is a comma-separated list of pairs. A dash character separated the pairs. The protocol name is given in all lower case letters, without colons or slash characters. Stork uses these strings to identify the protocol translation and transfer module to be used.
The initial translation protocol (specified in the src_url and dest_url entries) together with the list defined by an alt_protocols entry form the ordered list of protocols to be utilized in a round robin fashion.
For example, if STORK_MAX_RETRY has the value 4, and the Stork job submit description file contains
[ dap_type = transfer; src_url = "gsiftp://serverA/dirA/fileA"; dest_url = "http://serverB/dirB/fileB"; ]
then Stork will attempt up to 4 transfers, with each using the same translation protocol. gsiftp:// is used at the source, and http:// is used at the destination. The Stork job fails if it has not been completed after 4 attempts.
A second example shows the transfer protocols used for each attempted transfer, when alt_protocols is used. For this example, assume that STORK_MAX_RETRY has the value 7.
[ dap_type = transfer; src_url = "gsiftp://no-such-server/dir/file"; dest_url = "file:/dir/file"; alt_protocols = "ftp-file, http-file"; ]
Stork attempts the following transfers, in the given order, stopping when the transfer succeeds.
Condor DAGMan (section 2.11) provides high level management of both traditional CPU jobs and Stork data placement jobs. Using DAGMan, users can specify data placement using the DATA keyword. DAGMan can mix Stork data transfer jobs and Condor jobs. This capability lends itself well to grid computing, as data is often staged in (transferred) before processing the data. After processing, output is often staged out (transferred).
Here is a sample DAGMan input file that stages in input files using Stork transfers, processes the data as a Condor job, and stages out the result using a Stork transfer.
# Transfer input files using Stork DATA INPUT1 transfer_input_data1.stork DATA INPUT1 transfer_input_data2.stork DATA INPUT2 transfer_data # # Process the data using Condor JOB PROCESS process.condor # # Transfer output file using Stork DATA RESULT transfer_result_data.stork # # Specify job dependencies PARENT INPUT1 INPUT2 CHILD PROCESS PARENT PROCESS CHILD RESULT