To run the examples shipped with Ciel:

DEPENDENCIES:

1. Python 2.6
2. Python packages: pycurl, cherrypy
3. Java runtime environment available and in PATH

BUILD EXAMPLES:

In the repository root:

./build-all.sh

This will:
a. Retrieve Hadoop/Mahout JARs to ext/mahout-0.3
b. Compile Java examples found under examples/, leaving .class files in build/ .jar files in dist/
c. Generate input data for some of the examples, stored under data/

RUNNING A LOCAL CLUSTER:

cd scripts
./run_master.sh

In another terminal:

./run_worker.sh

To launch further workers:

WORKER_PORT=N ./run_worker.sh
where WORKER_PORT is not the default port of 9001

RUNNING EXAMPLES:

cd scripts
PYTHONPATH=../src/python ./sw-start-job ../src/packages/[eg-name].pack
where eg-name is one of:

java_test: Basic functionality test
pi: Compute pi. Data-independent.
grep: Searches for the word 'for' in the local dictionary, or another file if DATAREF is specified in the environment
wordcount-java: Counts all words in its input, by default using the local dictionary but setting DATAREF in the environment
kmeans: Computes kmeans clusters over some random data. Set KMEANS_VECTORS and KMEANS_CLUSTERS in the environment to override the default example data files. See the script (repo root)/gen-data.sh for an example of data generation.
pagerank: Computes pagerank given a random web graph. Set PAGERANK_GRAPH_FILE in the environment to override its source.
smithwaterman: Performs SW gene alignment using random input data

At present most examples output the name of a reference rather than an intelligible result. To improve.
