As of October 2009, the Hoffman2 Cluster has more than 3400 cores and is still growing. Because each node has either 4 or 8 cores, the queues are set up to run either 4 or 8 processes or jobs per node.
The Hoffman2 Cluster has the following two types of queues:
Queues with a 14 day limit and higher priority are accessible only to members of research groups that have contributed nodes to the shared Hoffman2 Cluster. These queues also have allocation rules that restrict the number of processors each research group can use.
Each research group's allocation in the higher priority queues (14 day limit) consists of either:
High Priority Queues Properties:
The purpose of these queues is both to harvest unused cycles and to allow members of research groups that have contributed nodes to run jobs on the extended shared Hoffman2 Cluster.
The 24 hour queues have access to ATS-contributed cores from the Base Shared Cluster, and research group equivalent cores that are not currently running jobs.
Only those research groups that have contributed nodes to the shared Hoffman2 Cluster can take advantage of processors that are part of another research group's idle contributed processors.
24 Hour Shared Queues Properties:
For Users of the Campus General Purpose nodes of the shared Hoffman2 Cluster:
The 24 hour queues are intended for parallel jobs submitted by members of the UCLA community who have access to the Campus General Purpose Cluster. These queues are limited to the number of processors in that part of the Cluster.
Campus Queues Properties:
If your program requires more than 24 hours to run and cannot be stopped and restarted within the 24 hour time frame, you can make a special request to have it run for a maximum of either 3 or 5 days. Send your request by email to atshpc@ucla.edu. Include the following in your request:
ATS staff will respond to requests during normal business hours.
For all Users:
The Interactive queues are intended for interactive sessions, including licensed applications which ATS has purchased for general use.
Interactive Queues Properties:
The Sun Grid Engine (SGE) is the job management system and job scheduler on the Hoffman2 Cluster. It balances the use of the cluster by matching each job's needs to the available compute resources. SGE knows which users are in which groups and enforces the queuing policies, so it is important to specify your job's resource requirements correctly; SGE will then pick the appropriate resources for the job. Do not request more resources than your job requires: doing so may delay the start of your job and defeats SGE's backfilling capability.
When you submit a job by any method (from the UCLA Grid Portal, via the queue scripts, or with the qsub command), request the number of wall clock hours of execution required, the type of job (for example, high priority or interactive), and any needed applications. Your job will automatically be assigned to a queue as follows:
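As an illustration of a request made with the qsub command, the following minimal Python sketch assembles a qsub argument list that asks for a wall clock limit and a number of slots. It assumes the standard SGE options `-l h_rt=HH:MM:SS` (hard wall clock limit) and `-pe <environment> <slots>`. The function name `build_qsub_command`, the parallel environment name `example_pe`, and the script name `job.sh` are hypothetical; the resource names for job type (high priority, interactive) and licensed applications are site-specific and are passed through `extra_resources` rather than guessed here.

```python
import shlex


def build_qsub_command(script, hours, slots=1, parallel_env=None, extra_resources=()):
    """Return a qsub argument list requesting `hours` of wall clock time.

    `parallel_env` and `extra_resources` are site-specific values; nothing
    in this sketch knows the names used on the Hoffman2 Cluster.
    """
    if hours <= 0:
        raise ValueError("hours must be a positive integer")
    if slots < 1:
        raise ValueError("slots must be at least 1")

    # h_rt is SGE's hard wall clock limit, written as HH:MM:SS.
    resources = [f"h_rt={int(hours)}:00:00", *extra_resources]
    cmd = ["qsub", "-l", ",".join(resources)]

    # A multi-slot job needs a parallel environment name chosen by the site.
    if slots > 1:
        if parallel_env is None:
            raise ValueError("a parallel environment is required when slots > 1")
        cmd += ["-pe", parallel_env, str(slots)]

    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Hypothetical example: 8 slots for 24 hours.
    print(shlex.join(build_qsub_command("job.sh", 24, slots=8, parallel_env="example_pe")))
```

Requesting only the hours and slots the job actually needs keeps the request small enough for SGE to fit it into idle gaps, which is the backfilling behaviour described above.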
Queue in which a job will run for a member of a research group that has contributed nodes to the shared Hoffman2 Cluster:
| High Priority Request: is the number of cores requested by the job greater than the number contributed by the research group to the shared Hoffman2 Cluster? | 24 hours or fewer requested | More than 24 hours requested |
| No | The high priority queues (up to 14 day limit). | The high priority queues (up to 14 day limit). |
| Yes | This job can never run. | This job can never run. |

| No Priority Request: is the number of cores requested by the job greater than the number contributed by the research group to the shared Hoffman2 Cluster? | 24 hours or fewer requested | More than 24 hours requested |
| No | The shared queues (24 hour). | This job can never run. |
| Yes | The shared queues (24 hour). | This job can never run. |
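The two tables for members of contributing research groups can be read as a small decision function. The sketch below is only a restatement of those tables in Python, not SGE's actual implementation; the function name `assign_queue` and the returned labels are hypothetical. It distinguishes only the two time bands the tables use (24 hours or fewer, more than 24 hours) and does not model the 14 day upper limit.

```python
HIGH_PRIORITY = "high priority queues (up to 14 day limit)"
SHARED_24H = "shared queues (24 hour)"


def assign_queue(cores_requested, cores_contributed, hours_requested, high_priority):
    """Return the queue type for a contributing group's job, or None if it can never run."""
    exceeds_contribution = cores_requested > cores_contributed

    if high_priority:
        # High priority jobs are limited to the group's contributed cores,
        # regardless of the number of hours requested.
        return None if exceeds_contribution else HIGH_PRIORITY

    # Without a priority request the core count does not matter,
    # but the job must fit within 24 hours.
    return SHARED_24H if hours_requested <= 24 else None


if __name__ == "__main__":
    print(assign_queue(8, 16, 48, high_priority=True))
    print(assign_queue(32, 16, 12, high_priority=False))
    print(assign_queue(8, 16, 48, high_priority=False))
```

A result of `None` corresponds to the "This job can never run" cells, so a job in that state has to be resubmitted with a different core count, time request, or priority.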
| Is this job asking for a licensed application ATS is providing? | Queue |
| No | The shared 24 hour queues. |
| Yes | The shared 24 hour or interactive queues. |
Programs that need more than 24 hours to complete but must run in queues limited to 24 hours should checkpoint before the 24 hours are up so that they can be continued later.
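One possible shape for such a checkpoint and restart cycle is sketched below in Python. It is an illustration under stated assumptions, not a recommended implementation: the file format (JSON), the function names, and the placeholder "work" (summing step numbers) are all hypothetical, and the run is bounded by a step budget so the example stays deterministic. A real job would instead compare elapsed wall clock time against the limit it requested.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so a job killed mid-write leaves the previous
    checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, max_steps_this_run, checkpoint_every=100):
    """Resume from the checkpoint and work until done or the budget is spent."""
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps and done_this_run < max_steps_this_run:
        state["total"] += state["step"]  # placeholder for the real computation
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    # Always save before exiting so the next submission can continue.
    save_checkpoint(path, state)
    return state
```

Each submission calls `run` with the same checkpoint path; the first stops when its budget is used up, and later submissions pick up at the saved step until `n_steps` is reached.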
October 2009