As a user of an ATS-Hosted Cluster, you will have the following passwords:
Your cluster login IDs and passwords are independent of each other and of your grid portal username/password. For example, when you change your password on one of the ATS-Hosted Clusters, it changes on that cluster and that cluster only. Your passwords on the clusters can be, and probably are, different. There is only one grid portal password, which is used by both the UCLA Grid Portal and the UC Grid Portal. If you request that the password you use for one of the grid portals be changed, you will have to use your new password when you log in to either grid portal.
In addition to these passwords, everyone affiliated with UCLA has a UCLA Logon ID and Password. You are sometimes asked to authenticate with your UCLA Logon ID and Password when requesting services via the web, even from ATS web sites. The UCLA Logon ID and Password are independent of any login ID/password or username/password combinations that ATS has issued to you.
Please see How to Change your Cluster Password. If that doesn't fix it, please send email to accounts @ ats.ucla.edu
There are several things you can do:
If the size of an individual file does not exceed 100 MB, you can download it to your local machine, or transfer it to another cluster that you can access at UCLA from the UCLA Grid Portal.
For any size file, you can use the scp command to transfer a file or directory from one machine or system to another. For safety reasons, as outlined in the Security Policy for ATS-Hosted Clusters, always issue the scp command from your local machine, whichever direction the file is moving. NEVER issue scp from the ATS-Hosted cluster command line to connect back to your local machine.
Once you have been notified that your login ID has been added to the Hoffman2 Cluster, log in to your local machine and, from your local machine's home directory, enter the command:
tar -clpzf - * | ssh loginid@hoffman2.idre.ucla.edu tar -xpzf -
Replace loginid with your Hoffman2 Cluster loginid.
Note that this transfer will not copy any of the hidden (dot) files from your local home directory to your new home directory on the Hoffman2 Cluster. Since many of the dot files in your home directory are operating system version specific, it would not be appropriate or useful to transfer these files.
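The reason hidden files are left behind is that the shell expands * before tar ever runs, and by default that expansion skips names beginning with a dot. A minimal sketch of this behavior, using a throwaway directory and made-up file names (notes.txt and .profile_demo are illustrative only, not files the transfer expects):

```shell
# Illustration only: the directory and file names below are hypothetical.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.profile_demo"

# List what an archive built from "*" would contain.
# The shell expands * before tar runs and skips dot files by default.
archived=$(cd "$demo" && tar -czf - * | tar -tzf -)
echo "$archived"

rm -rf "$demo"
```

Only the non-hidden file appears in the listing, which matches the behavior described above.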
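You can get the process IDs using the ps command, filter them with the grep command to select only the jobs you want to delete, and feed the result to the kill command. The first command below only lists the matching process IDs so that you can check them; the second one kills them.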
ps -u loginid | grep myjob | awk '{print $1}' | xargs
ps -u loginid | grep myjob | awk '{print $1}' | xargs kill
Replace loginid with your loginid and myjob with the executable name.
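Because the second command kills every match, it can help to check which lines the filter selects first. The sketch below applies the same grep and awk stages to a canned sample of ps output, so nothing is actually killed. The PIDs and the names myjob and otherjob are invented for illustration.

```shell
# Hypothetical sample of `ps -u loginid` output; PIDs and names are made up.
sample_ps='  PID TTY          TIME CMD
 4101 pts/3    00:00:00 bash
 4250 pts/3    00:12:41 myjob
 4251 pts/3    00:12:40 myjob
 4300 pts/3    00:00:02 otherjob'

# Same filter as above: keep lines mentioning myjob, print the first column (PID).
# xargs with no command simply echoes the PIDs on one line.
pids=$(printf '%s\n' "$sample_ps" | grep myjob | awk '{print $1}' | xargs)
echo "$pids"
```

Note that grep matches the string anywhere on the line, so searching for myjob would also select a process named myjob2. Check the list before piping it to kill.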
When the myjobs script or qstat -u loginid shows you have jobs in an error state ("E", "Eqw", etc.) you can use the error_reason script to show you why. It will print the error reason line from qstat -j jobid output for all of your jobs that are in an error state.
error_reason -u loginid
Replace loginid with your loginid.
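For readers curious about what such a helper does, here is a rough sketch of the same idea. It is not the cluster's actual error_reason script. It assumes that the job state is the fifth column of qstat -u output, that the first two lines are headers, and that qstat -j prints lines containing "error reason". The function names and the sample job data are hypothetical.

```shell
# Sketch only: this is NOT the site's error_reason script, just the same idea.
# Reads `qstat -u loginid` style output on stdin and prints the IDs of jobs
# whose state column (assumed to be field 5) contains "E".
error_job_ids() {
    awk 'NR > 2 && $5 ~ /E/ {print $1}'
}

# For each such job, show the "error reason" lines from `qstat -j jobid`.
# Defined here but not called, since it needs a live Grid Engine installation.
show_error_reasons() {
    qstat -u "$1" | error_job_ids | while read -r jobid; do
        qstat -j "$jobid" | grep 'error reason'
    done
}

# Canned sample (invented job IDs and names) to exercise the parser offline.
sample_qstat='job-ID  prior   name   user     state submit/start at     queue  slots ja-task-ID
-----------------------------------------------------------------------------------------
 101234 0.50000 run_a  loginid  r     11/01/2011 10:00:00 q@n1   1
 101235 0.00000 run_b  loginid  Eqw   11/01/2011 10:05:00        1'
error_ids=$(printf '%s\n' "$sample_qstat" | error_job_ids)
echo "$error_ids"
```

On the cluster, the live variant would be invoked as show_error_reasons loginid.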
There is no printer directly associated with the Hoffman2 Cluster. If you have a printer attached to your local desktop machine, you can copy your file to your local machine and print your file locally. Recall that for security reasons you should issue the scp command from your local machine, and not from the Hoffman2 command line.
Here is a little script that you could save on a unix/linux machine that might make printing a text file easier. You might name this script h2print
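#!/bin/sh
# Copy one file from Hoffman2 into the current directory, then print the local copy.
scp "loginid@hoffman2.idre.ucla.edu:$1" .
lpr "$(basename "$1")"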
where loginid is your Hoffman2 Cluster login ID. You can omit loginid@ if your userid on your local machine is the same as your Hoffman2 Cluster login ID. Note the period (.) at the end of the scp command line. Mark the script as executable with the chmod command:
chmod +x h2print
To print a Hoffman2 text file in your home directory, from your local machine's command prompt, enter:
h2print hoffman2_filename
where hoffman2_filename is the name of your text file on the Hoffman2 Cluster that you want to print.
The scp command will prompt you for your Hoffman2 Cluster password, unless you have previously set up an RSA key pair on your local machine with the ssh-keygen -t rsa command, and appended a copy of the public key (id_rsa.pub) to
The qquota command reports which of the resources available to your userid are in use at the moment the command is run. Its purpose is not to provide a complete list of the resources available to your userid. If no resources are in use at that moment, qquota will not return any information.
For example:
resource quota rule  limit              filter
--------------------------------------------------------------------------------
rulset1/10           slots=123/256      users @campus hosts @idre-amd_01g
"slots=123/256" means 123 slots or cores are in use by your group out of 256 of your group's total allocation. Enter man qquota at the shell prompt for more information.
The show_slots script will list the number of available and used slots for each queue or type of job (interactive, parallel 24 hours, parallel 14 days, serial 24 hours, serial 14 days, etc.). The queues are grouped by data center. Example:
IDRE                Available   Used 759, Total 2024
interactive              1177      8
parallel 24hours         1081    469
parallel 14days           881     98
serial 24hours           1081     66
serial 14days             889    118

MSA                 Available   Used 1132, Total 1512
interactive               340      0
parallel 24hours          240    360
parallel 14days           148    536
serial 24hours            140      8
serial 14days             140    228
Not all of the slots listed as available are necessarily available to your jobs. Use the qquota command to see your group's used/total allocation. Note that the total number of slots in a data center also includes those which are disabled, in an alarm state, or otherwise not ready to accept jobs.
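To turn the used/total figure that qquota reports into a count of slots still free in your group's allocation, a small filter is enough. The sketch below is only an illustration: the function name slots_free is hypothetical, and it assumes the limit appears as a slots=used/total field, as in the qquota example earlier in this FAQ.

```shell
# Hypothetical helper: reads qquota output on stdin and prints, for each
# "slots=used/total" field, how many slots remain in the allocation.
slots_free() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^slots=[0-9]+\/[0-9]+$/) {
                split(substr($i, 7), n, "/")   # drop the leading "slots="
                print n[2] - n[1]              # total minus used
            }
    }'
}

# Exercise the helper on the sample line from the qquota example.
free=$(printf '%s\n' 'rulset1/10 slots=123/256 users @campus hosts @idre-amd_01g' | slots_free)
echo "$free"
```

On the cluster, the same function could be fed live output with qquota | slots_free. It prints nothing when qquota itself returns nothing.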
The qstat command will list all the jobs which are running (r) or waiting to run (qw), in order by priority ("prior" column). If all jobs requested the same resources, this would also be the order in which they start running. In reality, some jobs request more nodes or a longer run time than is presently available, so SGE will "back-fill": it tries to start jobs lower in the list that require fewer resources and will complete without delaying the start of a job higher in the list.
If you are in a research group which has purchased nodes for the Hoffman2 Cluster, you can use the highp complex to request that your job run on your group's highp resources. It is guaranteed that some job submitted by someone in your research group will start within 24 hours. To see where your highp job is with respect to the waiting jobs that everyone else in your group has submitted, you can use the groupjobs script. It will display a list of pending jobs, or pending and running jobs, similar to regular qstat output but only for everyone in your SGE group. The job at the top of the list will in most cases start running before those later in the list. For help and a list of options, enter groupjobs -h
From the UCLA Grid Portal, you can use its "Disk Usage on Hoffman2" application. Click:
Job Services
Applications
Disk Usage on Hoffman2
Submit Job button
You do not have to make any changes on the application form in order for it to report on your home directory usage. View your job results as usual. Click:
Job Services
Job Status
After your job has completed and its status is Done, click the Stdout link in the Output column for your job. Your request runs as a job on Hoffman2 and will send you standard Sun Grid Engine job status email.
From the Hoffman2 Cluster login nodes, at the shell prompt, enter:
myquota
The myquota command will report the usage and quota for filesystems where your userid has saved files, including /u/scratch as well as your home directory. Use the myquota command instead of the quota command. The myquota command supports the BlueArc storage system used by the Hoffman2 Cluster.
The new OS includes a new version of the GNU compiler (gcc v. 4.4.4) and python (v. 2.6.5). Accordingly, any executable built against, or depending in any way on, gcc or python may need to be recompiled. Our default compiler is Intel, but if you depend on gcc be aware that we now support only version 4.4.4 (and the openmpi libraries version 1.4.4 built with this compiler). We also now support only python version 2.6.5, and most of the third-party extension packages are being recompiled accordingly. If you need a specific python module which is not present, let us know. Likewise, we have attempted to keep the system as close as possible to what it was; however, you can expect some library dependencies to be broken, as most libraries have changed substantially in this new OS version.
November 2011