Computational Cluster Programs

Running a Batch Job on an ATS-Hosted Cluster

How to Submit a Batch Job

There are three ways to submit a batch job on an ATS-Hosted Cluster. They are listed here from the easiest to the most difficult.

Submit a batch job from the UCLA Grid Portal.
The UCLA Grid Portal provides a web portal interface to the ATS-Hosted clusters. Every user of an ATS-Hosted cluster can access the UCLA Grid Portal. To submit a batch job from the UCLA Grid Portal, click on the "Job Services" tab. There are two kinds of jobs that you can submit:
Generic Jobs
Use this page to submit a job that runs a program or script that either you or a colleague has written.

In the form provided, supply the name of the executable, any job parameters, the time limit, the number of processors, etc., and click the Submit button.

Applications
Use this page to submit a commonly used application. Normally, you need to know less about an application than about a generic job, because the UCLA Grid Portal keeps track of the location of the executable and other information about the application. You normally must prepare an input file that the application will read and run. For some applications, the UCLA Grid Portal can present forms that you fill in to create the input file if you are not familiar with its format.

After you submit a job, you can monitor its progress from the UCLA Grid Portal and view and retrieve your output.

Use one of the Queue Scripts from the Head Node.
The queue scripts can be used both to prepare the command file that SGE requires to submit a job and to submit a command file to run. The queue scripts can also be used to monitor your batch jobs.

Each queue script is named by the application or type of application whose jobs it submits. A queue script can be run either as a single command to which you provide appropriate options or as an interactive application which presents you with a menu of choices and prompts you for options.

For example, if you simply enter a queue script command such as:

job.q
without the name of a program to run, the queue script will enter its interactive mode and present you with a menu of tasks you can perform. One of these tasks is to build the command file, another is to submit a command file that has already been built, another is to list the jobs you have already submitted, etc.

Alternatively, if you enter:

job.q name_of_executable
the queue script will build the command file, submit it to run, and then delete it.

The queue scripts discussed here are available on many of the ATS-hosted clusters. To find out which specific queue scripts are installed on any given cluster, log into the cluster head node and enter the command:

queue

For more information, enter the following when logged into the cluster head node:

man queue

The following queue scripts are available on most clusters:


Generate a SGE command file for your job and use the SGE commands directly.
There are two things you must do to submit a job:
  • Prepare an SGE job command file, containing the SGE keyword statements and shell commands to run the job.

    Each SGE keyword statement begins with #$ followed by the SGE keyword and its value, if any. For example:

    #$ -cwd
    #$ -o myjob.joblog
    #$ -j y

    Here, the first SGE statement specifies that the current working directory is to be used for the job; the second SGE statement names the output file to which the SGE command file will write its standard output (normally the job log); and the third specifies that the standard error for the script file is to be merged with the standard output file.

    For a parallel MPI job you need to have a line similar to the one below:

    #$ -pe orte number_of_processors_requested

    For an OpenMP job or a job which combines OpenMP and MPI, you have to add:

    #$ -pe orteshm number_of_processors_requested
    SGE will ensure that the processors are assigned to as few nodes as possible so that shared memory can be used.

    ATS recommends that users generate the command file for parallel jobs with the queue script 'mpi.q'.

    In addition to the SGE keyword statements, the SGE job command file contains shell commands that initialize the environment for the job, invoke the program that the job will run and perform any post-job processing needed.

    The SGE keyword statements in a command file are called active comments because they begin with #$ and comments in a script file normally begin with #. The format of the SGE job command file, with examples, is documented by Sun Microsystems in the Sun Grid Engine User's Guide section on submitting batch jobs. Any qsub command line option can be used in the command file as an active comment. The qsub command line options are listed on the qsub man page. ATS support for SGE access groups does not allow qsub to accept redirected input (stdin) in some cases.

  • Issue the appropriate SGE commands from the head node to submit and monitor the job.
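
The two steps above can be sketched as follows. This is a minimal illustration, not an ATS-supplied script: the file name myjob.cmd and the program name ./myprogram are hypothetical placeholders, and the submit step is guarded so that the sketch does nothing harmful on a machine without the SGE client commands.

```shell
#!/bin/bash
# Step 1: write a minimal SGE job command file.
# myjob.cmd and ./myprogram are hypothetical names used for illustration.
cat > myjob.cmd <<'EOF'
#!/bin/bash
#$ -cwd
#$ -o myjob.joblog
#$ -j y
# Shell commands that run the job follow the active comments.
echo "Job started on $(hostname) at $(date)"
./myprogram
echo "Job finished at $(date)"
EOF

# Step 2: submit and monitor, but only where the SGE commands exist.
if command -v qsub >/dev/null 2>&1; then
    qsub myjob.cmd
    qstat -u "${USER:-$(id -un)}"
else
    echo "qsub not found; run this on the cluster head node"
fi
```

For a parallel job, the appropriate "#$ -pe" line described above would be added alongside the other active comments.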

Submitting a Series of Jobs for Parametric Studies

Researchers often need to conduct parametric studies to optimize or explore the effects of various parameters in their experiments. SGE has a job array option, which lets you submit all the jobs with a single command file and lets the scheduler execute the jobs as resources become available. This feature is called Job Arrays and is fully documented separately.
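
As a rough sketch of how a job array might drive a parametric study, the example below uses the qsub option -t to request four tasks and the SGE_TASK_ID environment variable that SGE sets for each task. The parameter values, the file name myarray.cmd and the program ./myprogram are assumptions made for illustration; the separate Job Arrays documentation remains the authoritative description.

```shell
#!/bin/bash
# Map a 1-based task ID to one parameter value (hypothetical values).
param_for_task() {
    local params=(0.1 0.5 1.0 2.0)
    echo "${params[$(( $1 - 1 ))]}"
}

# Write an array-job command file; each task picks its own parameter.
# myarray.cmd and ./myprogram are hypothetical names.
cat > myarray.cmd <<'EOF'
#!/bin/bash
#$ -cwd
#$ -o myarray.joblog.$JOB_ID.$TASK_ID
#$ -j y
#$ -t 1-4
params=(0.1 0.5 1.0 2.0)
value=${params[$((SGE_TASK_ID - 1))]}
./myprogram "$value" > result.$SGE_TASK_ID
EOF

# Dry run: show which parameter each task would receive.
for t in 1 2 3 4; do
    echo "task $t -> parameter $(param_for_task "$t")"
done

# Submit only where the SGE commands exist.
if command -v qsub >/dev/null 2>&1; then
    qsub myarray.cmd
else
    echo "qsub not found; run this on the cluster head node"
fi
```

Indexing a fixed list by task ID keeps every task's input reproducible, and each task writes to its own result file so that concurrent tasks do not overwrite one another.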

Batch Job Output Files

When a job has completed, the SGE command file output will be available in the stdout and stderr files that were defined in the SGE command file (name.joblog). Program output will be available in any files that your program has written. If your job was submitted via the queue scripts, stdout from the program will be found in the file name.output.