Frequently Asked Questions about the Hoffman2 Cluster

Questions in this FAQ:

  1. Which password do I use to log in?
  2. My program writes a lot of scratch files in my home directory. This results in exceeding my disk space quota. What is the solution?
  3. How do I transfer my files from the Hoffman2 Cluster to my machine?
  4. Is there a simpler way to copy all my files to my new Hoffman2 account?
  5. An ATS consultant sent me an email about a lot of leftover jobs running under my login ID. How do I delete them?
  6. I have a lot of jobs in error state E. How do I find out what the problem is?
  7. How do I print my output?
  8. What queues can I run my jobs in?

Questions and Answers

Which password do I use to log in?

As a user of an ATS-Hosted Cluster, you will have the following passwords:

  • For each cluster you can access, you will have a separate login ID and password.
  • You will have a single username and password that you can use to log in to both the UCLA Grid Portal and the UC Grid Portal.

Your cluster login IDs and passwords are independent of each other and of your grid portal username/password. For example, when you change your password on one of the ATS-Hosted Clusters, it changes on that cluster only, so your passwords on different clusters can be, and probably are, different. There is only one grid portal password, which is used by both the UCLA Grid Portal and the UC Grid Portal. If you have your grid portal password changed, you must use the new password to log in to either grid portal.

In addition to these passwords, everyone affiliated with UCLA has a UCLA Logon ID and Password. You are sometimes asked to authenticate with your UCLA Logon ID and Password when requesting services via the web, even from ATS web sites. Your UCLA Logon ID and Password are independent of any login ID/password or username/password combinations that ATS has issued to you.

My program writes a lot of scratch files in my home directory. This results in exceeding my disk space quota. What is the solution?

There are several things you can do:

  • If you are a member of a research group which has contributed nodes to the Hoffman2 Cluster, your PI can purchase additional disk space for use by the members of your group.
  • Each process in your parallel program can write to the local /work directory on the node it is running on. When the program finishes, copy the files to a location where you have more space. Because /work is local to each node, reading and writing it is very efficient.
  • You can write to /u/scratch; you then have 7 days after the job completes to copy the files somewhere else.
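The /work approach in the second bullet can be sketched as a helper called from a job script. This is a minimal single-node sketch, not an official Hoffman2 template: the function name run_in_local_work, the per-job directory layout, and the default copy destination under /u/scratch are hypothetical choices, and it assumes Grid Engine sets JOB_ID inside the job (the shell's process ID is used otherwise). WORK_ROOT and RESULTS_ROOT are overridable assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not an official Hoffman2 script): run a command in a
# private directory on the node-local /work disk, then copy its output elsewhere.
me=${USER:-$(id -un)}
WORK_ROOT=${WORK_ROOT:-/work}                      # assumed node-local disk
RESULTS_ROOT=${RESULTS_ROOT:-/u/scratch/$me}       # assumed destination with more space

run_in_local_work() {
    # Assumes Grid Engine sets JOB_ID inside the job; falls back to this shell's PID.
    jobdir="$WORK_ROOT/$me/${JOB_ID:-$$}"
    mkdir -p "$jobdir" "$RESULTS_ROOT" || return 1
    ( cd "$jobdir" && "$@" )                       # the program writes its scratch files here
    status=$?
    cp -R "$jobdir" "$RESULTS_ROOT/" || return 1   # copy the results off the node
    rm -rf "$jobdir"                               # free the local disk
    return "$status"
}
```

In a job script you would call, for example, run_in_local_work ./my_program input.dat, where my_program and input.dat are placeholders. Files copied to /u/scratch are still subject to the 7-day window described above.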

How do I transfer my files from the Hoffman2 Cluster to my machine?

If an individual file is no larger than 100 MB, you can use the UCLA Grid Portal to download it to your local machine or to transfer it to another UCLA cluster that you can access.

For files of any size, you can use the scp command to copy a file or directory from one machine to another. For safety reasons, as outlined in the Security Policy for ATS-Hosted Clusters, always run scp on your local machine, whether you are copying files to the cluster or from it. NEVER run scp on the ATS-Hosted cluster to copy files back to your local machine.
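As an illustration of running scp only on your local machine, the sketch below defines two hypothetical helpers, h2put (upload) and h2get (download). H2_USER stands in for your Hoffman2 login ID, and the SCP variable is an assumption added only so you can preview a command with SCP=echo before running it.

```shell
#!/bin/sh
# Hypothetical helpers: run these on your LOCAL machine, never on the cluster.
H2_USER=${H2_USER:-loginid}     # replace loginid with your Hoffman2 login ID
H2_HOST=hoffman2.idre.ucla.edu
SCP=${SCP:-scp}                 # set SCP=echo to print a command instead of running it

# Upload a local file or directory to your Hoffman2 home directory
# (or to the remote path given as the second argument).
h2put() { $SCP -r "$1" "$H2_USER@$H2_HOST:${2:-}"; }

# Download a Hoffman2 file or directory into the current local directory
# (or to the local path given as the second argument).
h2get() { $SCP -r "$H2_USER@$H2_HOST:$1" "${2:-.}"; }
```

For example, h2get results copies ~/results from your Hoffman2 home directory into the current local directory, and h2put mydata uploads mydata to your Hoffman2 home directory. Both are typed on your local machine, in line with the security policy.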

Is there a simpler way to copy all my files to my new Hoffman2 account?

Once you have been notified that your login ID has been added to the Hoffman2 Cluster, log in to your local machine and, from your local machine's home directory, enter the command:

tar -clpzf - * | ssh loginid@hoffman2.idre.ucla.edu tar -xpzf -

Replace loginid with your Hoffman2 Cluster login ID.

Note that this transfer will not copy the hidden (dot) files at the top level of your local home directory to your new home directory on the Hoffman2 Cluster, because the * pattern does not match names that begin with a period. Since many of the dot files in your home directory are specific to your operating system version, it would not be appropriate or useful to transfer them.
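To see what the tar pipeline above does without touching a remote account, the same create/extract pipe can be run between two local directories. In this sketch the ssh hop is replaced by a subshell that unpacks into a destination directory, and the function name copy_tree is hypothetical. It also shows a detail worth knowing: only top-level dot files are skipped, while dot files inside subdirectories are still copied because tar recurses into them.

```shell
#!/bin/sh
# Local stand-in for: tar -clpzf - * | ssh loginid@hoffman2.idre.ucla.edu tar -xpzf -
# copy_tree is a hypothetical name; the ssh hop is replaced by a subshell that
# unpacks the archive in DEST instead of in your Hoffman2 home directory.
copy_tree() {
    src=$1 dest=$2
    mkdir -p "$dest" || return 1
    # The shell expands * before tar runs, so top-level names starting with "."
    # are never passed to tar; dot files inside subdirectories are still archived.
    ( cd "$src" && tar -cpzf - * ) | ( cd "$dest" && tar -xpzf - )
}
```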

An ATS consultant sent me an email about a lot of leftover jobs running under my login ID. How do I delete them?

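You can list your processes with the ps command, use grep to select only the ones you want to delete, extract their process IDs (PIDs) with awk, and pass them to the kill command with xargs. The first command below only prints the selected PIDs so you can check them; the second kills those processes.

# preview: print the PIDs of the processes that would be killed
ps -u loginid | grep myjob | awk '{print $1}' | xargs
# kill the selected processes
ps -u loginid | grep myjob | awk '{print $1}' | xargs kill

Replace loginid with your login ID and myjob with the name of the executable.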
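Because grep myjob matches the string anywhere in a line, it also selects processes whose names merely contain myjob (for example myjob2). A minimal sketch of a stricter filter, assuming ps -u prints the usual PID TTY TIME CMD columns with the command name last; the function name pids_named is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read `ps -u loginid` output on stdin and print the PID
# (first column) of each process whose command name (last column) is exactly $1.
# The first line (column titles) is skipped.
pids_named() {
    awk -v name="$1" 'NR > 1 && $NF == name { print $1 }'
}

# Preview, then kill:
#   ps -u loginid | pids_named myjob
#   ps -u loginid | pids_named myjob | xargs kill
```

On systems that provide the procps tools, pgrep -x -u loginid myjob and pkill -x -u loginid myjob perform the same exact-name preview and kill in one step.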

I have a lot of jobs in error state E. How do I find out what the problem is?

When the myjobs script or qstat -u loginid shows that you have jobs in an error state ("E", "Eqw", etc.), you can use the error_reason script to find out why. It prints the error reason line from the qstat -j jobid output for each of your jobs that is in an error state.

error_reason -u loginid

Replace loginid with your login ID.
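If you want to see the same information without the error_reason script, the sketch below picks the error-state job IDs out of qstat -u output and then looks up each job with qstat -j, whose error reason line is what error_reason reports. It assumes the standard Grid Engine qstat layout (two header lines, the job ID in the first column, the state in the fifth), and the function name error_job_ids is hypothetical; error_reason remains the simpler option.

```shell
#!/bin/sh
# Hypothetical filter: read `qstat -u loginid` output on stdin and print the ID of
# every job whose state (assumed to be the 5th column) contains "E", e.g. E or Eqw.
# The first two lines (column titles and dashes) are skipped.
error_job_ids() {
    awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Usage on a login node:
#   qstat -u loginid | error_job_ids | while read -r id; do
#       echo "== job $id"
#       qstat -j "$id" | grep -i 'error reason'
#   done
```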

How do I print my output?

There is no printer directly associated with the Hoffman2 Cluster. If you have a printer attached to your local desktop machine, you can copy your file to your local machine and print your file locally. Recall that for security reasons you should issue the scp command from your local machine, and not from the Hoffman2 command line.

Here is a short script that you can save on a Unix or Linux machine to make printing a text file easier. You might name this script h2print:

#!/bin/sh
set -e    # stop if scp fails, so lpr is not run on a missing file
scp loginid@hoffman2.idre.ucla.edu:"$1" .
lpr "$(basename "$1")"

where loginid is your Hoffman2 Cluster login ID. You can omit loginid@ if your username on your local machine is the same as your Hoffman2 Cluster login ID. Note the period (.) after the remote file name in the scp command; it tells scp to copy the file into the current directory. Mark the script as executable with the chmod command:

chmod +x h2print

To print a Hoffman2 text file in your home directory, from your local machine's command prompt, enter:

h2print hoffman2_filename

where hoffman2_filename is the name of your text file on the Hoffman2 Cluster that you want to print.

The scp command will prompt you for your Hoffman2 Cluster password unless you have previously set up an RSA key pair on your local machine with the ssh-keygen -t rsa command and appended a copy of the public key (id_rsa.pub) to ~/.ssh/authorized_keys in your Hoffman2 Cluster account.
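The key setup described above can be done in two commands from your local machine. This is a sketch under the assumption that your key is in the default location ~/.ssh/id_rsa.pub; the variable name INSTALL_KEY_CMD is hypothetical. The quoted command is sent to the cluster and run there, so ~ refers to your Hoffman2 home directory.

```shell
#!/bin/sh
# Run on the remote side: append the public key read from stdin to
# ~/.ssh/authorized_keys, creating ~/.ssh with owner-only permissions if needed.
INSTALL_KEY_CMD='mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# One-time setup, typed on your LOCAL machine (assumes the default key file name):
#   ssh-keygen -t rsa
#   ssh loginid@hoffman2.idre.ucla.edu "$INSTALL_KEY_CMD" < ~/.ssh/id_rsa.pub
```

Where your OpenSSH installation provides it, ssh-copy-id loginid@hoffman2.idre.ucla.edu does the same job.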

What queues can I run my jobs in?

Run the qhelp command on one of the login nodes to get a list of the queues you may run in and their resource limits (memory, cores, and time).

With the "-v" argument, qhelp also tells you explicitly what the core/slot limits are for your cluster user account and for any resource groups you belong to.

The qquota command shows which of the resources available to your login ID are in use at the moment you run it. It is not meant to provide a complete list of the resources available to you.

For example:

resource quota rule limit                filter
--------------------------------------------------------------------------------
rulset1/10         slots=123/256        users @campus hosts @idre-amd_01g
        

slots=123/256 means that 123 of the 256 available slots (cores) are in use. Enter man qquota at the shell prompt for more information.
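To turn the used/limit pairs into a count of free slots (for the example above, 256 - 123 = 133), you could filter the qquota output with awk. This sketch assumes each rule line contains a slots=USED/LIMIT field as in the example; the function name free_slots is hypothetical, and rule lines that limit other resources are skipped.

```shell
#!/bin/sh
# Hypothetical helper: read `qquota` output on stdin and, for each rule line with a
# slots=USED/LIMIT field, print the rule name, slots in use, limit, and free slots.
free_slots() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^slots=[0-9]+\/[0-9]+$/) {
                split(substr($i, 7), n, "/")   # text after "slots=": n[1] = used, n[2] = limit
                printf "%s used=%d limit=%d free=%d\n", $1, n[1], n[2], n[2] - n[1]
            }
    }'
}

# Usage: qquota | free_slots
```

For the example line above, this prints rulset1/10 used=123 limit=256 free=133.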