
Ciel now comes equipped with a primitive distributed filesystem to make the process of starting jobs that depend on multiple files easier.

scripts/sw-start-job is a helper script that populates the DFS namespace at job startup time.

Its input is the name of a file containing JSON data that describes the desired namespace as well as the job startup parameters. The format is as follows:

{"package":
	{
	 "name1": <filespec>,
	 "name2": <filespec>
	},
 "start":
	{
	 "handler": <handler_name>,
	 "args": <args_dict>
	}
}

Where <filespec> has the form: {"filename": <filename>, ["index": <boolean>]}. The square brackets around 'index' indicate it is optional, not that it is a list.

If "index" is not specified or is false, the file will be synchronously uploaded to the cluster master and a reference to it associated with the corresponding name.
If it is true, the file will be uploaded in chunks to the cluster's workers, an index will be created, and a reference to that index will be associated with the name instead. This is identical to the action of the sw-load script.
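
As an illustration of the two upload modes, the following Python sketch builds a "package" dictionary containing one plain file and one indexed file, then prints it as JSON. The helper names (make_filespec, wants_index), the package names and the file paths are hypothetical and are not part of Ciel; only the "filename" and "index" keys come from the format described above.

```python
import json

def make_filespec(filename, index=False):
    """Build a <filespec>. "index" is optional, so it is only emitted when true."""
    spec = {"filename": filename}
    if index:
        spec["index"] = True
    return spec

def wants_index(spec):
    """A missing "index" key means the same as "index": false."""
    return bool(spec.get("index", False))

# Hypothetical names and paths, for illustration only.
package = {
    # Uploaded whole to the cluster master.
    "script": make_filespec("~/test.sw"),
    # Uploaded in chunks to the workers, with an index created.
    "corpus": make_filespec("~/corpus.txt", index=True),
}

package_json = json.dumps(package, indent=1, sort_keys=True)
print(package_json)
```

The printed text can be pasted in as the value of "package" in a descriptor file.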

At present this functionality is quite limited, but I plan to add more possible sources of files and to permit more fine-grained control of sw-load.

Looking up references against the DFS namespace varies from language to language, but to use Skywriting as an example, 'x = package("name1");' would retrieve either the reference or index-reference corresponding to that name.

The <handler_name> and <args_dict> parameters specify how the first task in the new job should be started. They are passed verbatim to the "init" executor, which will then spawn a task with the given handler and arguments. The required keys in args_dict vary from handler to handler, but again to give a Skywriting example, "handler": "swi", "args": {"sw_file_ref": <ref>}, where <ref> is a reference to a Skywriting file, would start the job by running the given file with no arguments or environment.

PACKAGE REFERENCES

It is frequently necessary to refer to references in the DFS namespace when passing the initial task args_dict. This can be done using the notation {"__package__": "name"}. To give a complete example, a very simple Skywriting script could be uploaded and run using:

{"package": {"swmain": {"filename": "~/test.sw"}}, "start": {"handler": "swi", "args": {"sw_file_ref": {"__package__": "swmain"}}}}
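
To make the substitution concrete, here is a minimal Python sketch of how {"__package__": "name"} markers could be replaced by references once the namespace has been populated. It is not the code sw-start-job uses: the function resolve_package_refs and the string standing in for a reference are hypothetical, and the real reference objects are Ciel-specific.

```python
def resolve_package_refs(value, namespace):
    """Return a copy of value in which every {"__package__": name} marker
    is replaced by namespace[name]. Other values are copied unchanged."""
    if isinstance(value, dict):
        if "__package__" in value:
            return namespace[value["__package__"]]
        return {k: resolve_package_refs(v, namespace) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_package_refs(v, namespace) for v in value]
    return value

# Hypothetical stand-in for the reference produced by uploading ~/test.sw.
namespace = {"swmain": "<reference to test.sw>"}

args = {"sw_file_ref": {"__package__": "swmain"}}
resolved = resolve_package_refs(args, namespace)
print(resolved)
```

A name that is missing from the namespace raises KeyError in this sketch, which seems a reasonable way to catch typos in a descriptor early.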

ARGUMENTS AND ENVIRONMENT VARIABLES

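sw-start-job supports simple customisation of package descriptors. The constructs

{"__env__": "<Unix environment variable name>", ["default": <default value>]} and {"__args__": <integer>, ["default": <default value>]}

can be used in both the package and start dictionaries. The first is replaced by the value of the named environment variable, the second by the command-line argument at the given index; if that value is missing and a "default" is given, the default is used instead. As with "index" above, the square brackets mark "default" as optional.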
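
The following Python sketch shows one way such constructs could be expanded. It rests on assumptions: that "__env__" stands for the value of the named environment variable, that "__args__" stands for a command-line argument selected by a zero-based index, and that "default" is used when the value is missing. The function names, the SW_SCRIPT variable and the file paths are hypothetical; this is not the sw-start-job implementation.

```python
import os

def _value_or_default(getter, marker):
    """Call getter(); fall back to marker["default"] if the value is absent."""
    try:
        return getter()
    except (KeyError, IndexError):
        if "default" in marker:
            return marker["default"]
        raise

def substitute(value, argv, environ=None):
    """Return a copy of value with __env__ and __args__ markers expanded."""
    environ = os.environ if environ is None else environ
    if isinstance(value, dict):
        if "__env__" in value:
            return _value_or_default(lambda: environ[value["__env__"]], value)
        if "__args__" in value:
            # Assumption: the index is zero-based.
            return _value_or_default(lambda: argv[value["__args__"]], value)
        return {k: substitute(v, argv, environ) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, argv, environ) for v in value]
    return value

# Hypothetical package dictionary using both constructs.
package = {
    "swmain": {"filename": {"__env__": "SW_SCRIPT", "default": "~/test.sw"}},
    "data": {"filename": {"__args__": 0}},
}

# SW_SCRIPT is unset here, so its default applies; argument 0 is supplied.
expanded = substitute(package, argv=["~/data.txt"], environ={})
print(expanded)
```

In this sketch a marker with no "default" and no available value re-raises the lookup error rather than silently producing an empty value.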

SIMPLER JOBS

This form is rather onerous for describing very simple jobs that have no need for the distributed filesystem. Simple driver programs are supplied to run one-file Skywriting and SkyPy jobs: sw-job and skypy-job respectively. Both are simple wrapper programs that generate and run trivial package descriptors.

MORE COMPLEX JOBS

Naturally this form can't express arbitrarily complicated jobs; it's really just some simple macros to make it easy to write down short descriptions of moderately involved jobs. For more complicated tasks you should write a Python program that uses the innards of sw-start-job. For example, the Skywriting example above could be executed using:

from skywriting.runtime.util.start_job import submit_job_with_package

submit_job_with_package({"swmain": {"filename": "~/test.sw"}}, "swi", {"sw_file_ref": {"__package__": "swmain"}}, <path-root>, <master-url>)

where path-root is the root directory against which to resolve any relative names in the package dictionary and master-url is an absolute HTTP URL corresponding to a Ciel master node.

That package also contains sub-functions which may be useful if you need to pull the process apart further. TODO: factor it better and document.

