Computational Cluster Programs

Hoffman2 Cluster News and Announcements

News Items

Winter 2012 Parallel Computing Classes

We are pleased to announce the parallel computing workshop series for this quarter. The topics include how to use the Hoffman2 Cluster, how to port a serial code to multi-core platforms using OpenMP, how to write distributed memory programs using MPI, and how to write GPU computational code.

The schedules and locations of these free classes are:

  • January 31, Using the Hoffman2 Cluster, 11348 Young Research Library
  • February 2, Introduction to Parallel Computing and OpenMP, 5907 Math Sciences
  • February 3, Introduction to MPI, 5907 Math Sciences
  • February 8, Introduction to CUDA, 5907 Math Sciences

The class descriptions are within the RSVP links at

http://www.idre.ucla.edu/

under "Upcoming Events". If you have questions, please email atshpc@ucla.edu.

UCLA Winter Closure December 22, 2011 - January 2, 2012

Hoffman2 will be up and fully operational during the UCLA winter closure; however, there will not be any IDRE personnel onsite during this time. We will do our best to ensure the cluster remains in operation but, depending on the issues that arise, we will not be addressing them until January 3rd.

Hoffman2 upgrades

Winter quarter downtime has completed successfully ahead of schedule. The Hoffman2 Cluster is available for login and is now accepting jobs.

IMPORTANT: There are new versions of compiler libraries. If you have programs built with gcc, python, etc., they will need to be re-compiled. Please see the FAQ "Re-compiling for CentOS 6?" for more information.

  • The operating system has been upgraded to CentOS 6.

  • A new job scheduler, Univa Grid Engine 8.0.1, which is an upgrade to the Sun Grid Engine, has been installed.

  • Network supervisor modules in the network core were upgraded. This effectively doubles the throughput and capacity of UCLA's research computing networks. In addition, extra port capacity on newly installed high-speed line cards will not only increase the performance and efficiency of existing high-performance storage and compute systems, but will also make it possible to integrate new equipment into the research computing environment.

  • Scratch space firmware has been upgraded. We expect increased performance and throughput.

If you encounter any problems that might be due to the upgrades, please let us know as soon as possible. Send email to atshpc@ucla.edu.

How to change the time request on pending long-running jobs

If your pending job requests more time than is available before the scheduled shutdown (see Hoffman2 Cluster Downtime Starts November 27th), it will be deleted on November 27th.

If your job can complete in less time, you may want to use the qalter command to change its h_rt value to something less than the remaining time to shutdown. Here's how to do it.

  1. Find out all the "-l" arguments that your job uses. At the shell prompt, enter:
    qstat -j jobnumber | grep 'hard resource_list'

    which returns something like:

    hard resource_list: h_data=4000M,h_rt=1209600,highp=true
  2. Use the qalter command to fix the job. At the shell prompt, enter:
    qalter jobnumber -l h_data=4000M,highp=true,h_rt=288:00:00

    where jobnumber is the job number, and
    where 288:00:00 is today's maximum highp h_rt value.
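
In the example above, qstat shows h_rt as a number of seconds, while the qalter command uses hours:minutes:seconds. A minimal shell sketch of the conversion is given here for convenience; the function name hrt_to_hms is made up for this illustration and is not a cluster command.

```shell
# Convert an h_rt value in seconds to H:MM:SS form.
# hrt_to_hms is a hypothetical helper, not part of the job scheduler.
hrt_to_hms() {
    secs=$1
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

hrt_to_hms 1209600   # value from the qstat example; prints 336:00:00
```

Here 1209600 seconds is 336 hours (14 days), which is longer than the 288:00:00 used in the qalter example.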

You can find today's maximum highp h_rt value with:

qconf -sq highp-queue-name

For example:

qconf -sq ats_msa.q | grep h_rt

You can find all of your pending (qw) jobs with:

myjobs -s p

or,

qstat -s p -u $USER

If you have a lot of jobs that need qalter, you can save a list of the job numbers, one per line, in a file named $USER.joblist and use the create-qalter-commands script to make those qalter commands. The script does not run the qalter commands; it only creates them. You may want to use it like this:

create-qalter-commands > my.qalter.cmds

Check the script output:

more my.qalter.cmds

Run the qalter commands:

sh -x my.qalter.cmds

Note that the create-qalter-commands script uses today's maximum h_rt value. If your job doesn't start today, you will need to do the same thing tomorrow -- if your job can complete in the remaining time to shutdown.
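
For illustration only, the sketch below shows one way a single qalter command could be assembled from the values gathered in steps 1 and 2. It is not the create-qalter-commands script, whose contents are not shown here; the function name make_qalter_cmd and the job number are hypothetical. Like that script, it only prints the command and does not run it.

```shell
# Hypothetical helper: print (do not run) a qalter command that keeps a job's
# other hard resources and replaces only the h_rt value.
#   $1 = job number
#   $2 = resource list as reported by qstat -j (step 1)
#   $3 = new h_rt value
# If the list contains no h_rt entry, it is printed unchanged.
make_qalter_cmd() {
    job=$1
    resources=$2
    new_hrt=$3
    updated=$(printf '%s\n' "$resources" | sed -E "s/h_rt=[^,]*/h_rt=${new_hrt}/")
    printf 'qalter %s -l %s\n' "$job" "$updated"
}

make_qalter_cmd 1234567 'h_data=4000M,h_rt=1209600,highp=true' 288:00:00
# prints: qalter 1234567 -l h_data=4000M,h_rt=288:00:00,highp=true
```

Printing first and running later mirrors the workflow above: review the generated commands before passing them to sh.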

If you have any questions or problems, please send email to High Performance Computing, atshpc@ucla.edu

Hoffman2 Cluster Downtime Starts November 27th

There will be a scheduled outage of the Hoffman2 Cluster from Sunday, November 27, 2011, 1PM through Wednesday, November 30th, 5PM.

During this time we will install a new version of the operating system, CentOS 6, and a new job scheduler, Univa 8, which is an upgrade to the current Sun Grid Engine. We will migrate and expand /u/scratch to our newest, fastest Panasas storage. We will also perform extensive network maintenance and other maintenance.

Long-running job time will be dynamically reduced beginning 5PM November 13th. After that date, the job scheduler will not accept jobs that cannot complete before the outage begins. After a job is accepted, SGE will let it start only if it can complete before 1PM November 27th. Only express jobs will be accepted after 1PM Saturday November 26th.
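
The rule described above can be pictured with a small shell sketch. This is only an illustration of the arithmetic, not the scheduler's actual logic; the function name fits_before_outage and the sample epoch values are assumptions made for the example.

```shell
# Illustration of the rule only; not how the job scheduler implements it.
#   $1 = requested run time in seconds
#   $2 = current time, in seconds since the epoch
#   $3 = outage start, in seconds since the epoch
# Succeeds when a job starting now would finish before the outage begins.
fits_before_outage() {
    [ $(( $2 + $1 )) -le "$3" ]
}

now=1000000                      # sample value, not a real date
outage=$((now + 14 * 86400))     # outage begins 14 days later
fits_before_outage $((24 * 3600)) "$now" "$outage" && echo "24-hour job can finish in time"
fits_before_outage $((15 * 86400)) "$now" "$outage" || echo "15-day job cannot finish in time"
```

As the outage approaches, the largest run time that passes this check shrinks, which is why the long-running job limit is reduced day by day.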

The express queue will accept jobs until 11AM November 27th:

qsub -l express [other qsub options] your-executable-or-script

The interactive queues will be available until 11AM November 27th:

qrsh -l i [other qrsh options]

Jobs which have not completed before the outage will not be carried over to the new Univa job scheduler. Any job that has not started before the outage begins will be deleted.

We will make every effort to keep the downtime as short as possible. In all probability we will not need the entire time scheduled, but we want to ensure we get all this work completed.

The next quarterly downtime, if necessary, is tentatively scheduled for the week of March 19-23, 2012.

If you have any questions or problems, please send email to High Performance Computing atshpc@ucla.edu

Fall 2011 Parallel Computing Classes

We are pleased to announce the parallel computing workshop series for this quarter. The topics include how to use the Hoffman2 Cluster, how to port a serial code to multi-core platforms using OpenMP, how to write distributed memory programs using MPI, and how to write GPU computational code.

The schedules and locations of these free classes are:

  • October 26, 2-4pm, Using the Hoffman2 Cluster, 5907 Math Sciences
  • October 27, 2-4pm, Introduction to Parallel Computing and OpenMP, 5907 Math Sciences
  • November 1, 2-4pm, Introduction to MPI, 5907 Math Sciences
  • November 9, 2-4pm, Introduction to CUDA, 5907 Math Sciences

The class descriptions are within the RSVP links at

http://www.idre.ucla.edu/

under "Upcoming Events". If you have questions, please email atshpc@ucla.edu.

Live web tutorial: Moving Data with Globus Online

Globus Online is hosting a webcast for UCLA cluster users on Wednesday, September 28, at 2 p.m. Pacific.

Register at https://www.globusonline.org/ucla-webcast/

The topic is high-performance data movement -- if your work requires transferring data to or from a UC Grid site, then you will want to know how to use Globus Online, the fast, reliable service for file transfer. Globus Online makes it easy for researchers to move data wherever they need it, not only among UC Grid sites but to any location. In this live webcast, attendees will learn how to move files between any two locations, including a UC Grid resource, campus server, or their own laptop. The session also will include a demo of the most frequently used features, tips and tricks, and Q&A with the Globus Online project lead. For more information or to register, visit https://www.globusonline.org/ucla-webcast/

SAMtools 0.1.17 and BWA 0.5.9

SAMtools: SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignments. SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. bcftools is included with the samtools installation. [SAMtools home page: http://samtools.sourceforge.net]

SAMtools version 0.1.17 is located in the /u/local/apps/samtools/current/bin/ directory. Perl scripts are in /u/local/apps/samtools/current/misc/

To use samtools, at a shell prompt, enter:

module load samtools
samtools

bcftools: The view command of bcftools calls variants, tests Hardy-Weinberg equilibrium (HWE), tests allele balances and estimates allele frequency. bcftools is located in /u/local/apps/samtools/current/bcftools/. See the README file there for a fuller description.

For more information, see SAMtools manual page: http://samtools.sourceforge.net/samtools.shtml

BWA: Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, bwa-short and BWA-SW. The former works for query sequences shorter than 200bp and the latter for longer sequences up to around 100kbp. Both algorithms do gapped alignment. They are usually more accurate and faster on queries with low error rates. [BWA home page: http://bio-bwa.sourceforge.net]

bwa version 0.5.9 and perl scripts qualfa2fq.pl and solid2fastq.pl are located in:

/u/local/apps/bwa/current/

For more information, see BWA manual page: http://bio-bwa.sourceforge.net/bwa.shtml

If you have any questions or problems using samtools or bwa, please send email to ATS High Performance Consulting atshpc@ucla.edu

LONI Pipeline

The Pipeline Environment is a free workflow application for computational sciences. Pipeline users can navigate and discover existing end-to-end workflow solutions, develop new module descriptions and processing graphical protocols, execute complex heterogeneous analyses, take advantage of distributed hardware infrastructure and databases, and validate and openly disseminate the provenance of their data and processing protocols. See:

http://pipeline.loni.ucla.edu/

Available Imaging and NGS Analysis Pipeline Workflows

A wide array of informatics and genomics modules and workflows has already been developed (e.g., BLAST, EMBOSS, mrFAST, GWASS, PLINK, R, MAQ, SAMtools, Bowtie, CNVer, QC, GATK). Additional workflows are constantly being added with help from colleagues across the globe. See:

http://www.loni.ucla.edu/twiki/bin/view/LONI/Pipeline_GenomicsInformatics

Try the Pipeline Environment online

You can try the Pipeline workflows and computational environment via your java-enabled browser without any special software installation, hardware requirements or account privileges:

URLs
References

10Gb File Transfer with Globus Online and dtn1

The Hoffman2 Cluster production Endpoint ucla#dtn1 with its 10 gigabit ethernet connection is now available for Globus Online file transfer. Please see:

High-speed file transfer with Globus Online

The Globus Online organization will be hosting a webinar for us in the near future. We will announce it when the date has been set.

Here are a few sites that are part of Globus Online. If you have an account at one of these sites, you can use Globus Online to transfer data there.

  • Biomedical Informatics Research Network (BIRN) at UCI, UCLA, UCSD, UCSF, and many others across the country.
  • Earth System Grid (ESG)
  • Energy Sciences Network (ESNET) at Argonne, Lawrence Berkeley, and Brookhaven National Laboratories
  • National Energy Research Scientific Computing Center (NERSC)
  • National Grid Service (UK Academic Computing Grid) at many locations
  • Open Science Grid (OSG) at many locations
  • Structural Biology Grid (SBGRID) at many locations
  • TeraGrid (TG) resources
  • UCLA Hoffman2 Cluster (UCLA)

High-speed file transfer with Globus Online

The Hoffman2 Cluster is pleased to offer high-speed file transfer service through Globus Online. Globus Online is a software tool for transferring files across the web in a reliable, high-performance and secure way. It provides fault-tolerant data transfer for large files through a simple, user-friendly web interface.

You can use Globus Online to transfer files either between your desktop machine and a remote machine like the Hoffman2 Cluster, or between two remote machines on which you have accounts. For example, if you also have an account at Argonne National Lab, you can use Globus Online to transfer files between your Hoffman2 and Argonne accounts. The remote site must also be a Globus Online participant and must have grid-enabled your account there. File transfer between private desktop machines is not available at this time.

You need to use your UCLA/UC Grid account in order to transfer files to or from the Hoffman2 Cluster.

The current demonstration service ucla#grid4 uses a GigE connection. We have 10Gb hardware on order for the production ucla#dtn1 service and will announce it in the near future.

See High-speed file transfer with Globus Online for information on how to set up and use Globus Online with the Hoffman2 Cluster.

Connecting to Hoffman2 with NX Client

You can use an NX client on your local machine to connect to the Hoffman2 Cluster and run its graphical applications. NX provides near-local application responsiveness over high-latency, low-bandwidth links.

NX is a secure, compressed protocol for remote X Window System connections. It is an alternative to running an X Server on your local machine. An NX server runs on Hoffman2 Cluster login nodes; you run an NX client on your local machine.

Free NX clients are available from NoMachine (www.nomachine.com) for Windows, Linux, Mac OS X, and Solaris.

See NX Client for more information about using NX to access the Hoffman2 Cluster.

Documentation available at http://www.nomachine.com/documents.php

CERN ROOT 5.30.00

The CERN ROOT program has been updated to version 5.30.00. Version 5.26.00 is also available.

ROOT is an object-oriented program and library developed by CERN. It is a framework for data processing. "It was originally designed for particle physics data analysis and contains several features specific to this field, but it is also used in other applications such as astronomy and data mining." [ http://en.wikipedia.org/wiki/ROOT (27 June 2011)]

To use the CERN ROOT C++ interpreter:

  1. Use qrsh to obtain a session on a compute node
  2. At the compute node shell prompt, enter: module load cern_root
  3. At the compute node shell prompt, enter: root

See How to Run CERN ROOT.

For more information, please refer to http://root.cern.ch, where an extensive Users Guide is available.

Hoffman2 Cluster is back online!

The Hoffman2 Cluster is now open to login and jobs, and the queues for long-running jobs have been restored.

Some of the major tasks accomplished:

  • Temporary local storage on compute nodes (SGE $TMPDIR space) has been increased to at least 10GB per processor.
  • The /u/scratch filesystem has been moved to our newest and fastest storage and increased in size to 50TB.
  • Final replication of the BlueArc storage has completed. The BlueArc's firmware has been upgraded and it has been reconfigured.
  • The operating system on the compute nodes has been upgraded to CentOS 5.6, and firmware and BIOS on the compute nodes have been updated.
  • Internal network extensions and hardware upgrades have been completed.

These changes should ensure the continued high performance and reliability of the Hoffman2 Cluster into the future. If you notice anything that doesn't seem correct, please let us know. Thank you for your patience during this upgrade.

Hoffman2 Cluster Downtime Starts June 13 9PM

There will be a scheduled outage of the Hoffman2 Cluster from Monday, June 13th, 2011, 9PM through Wednesday, June 15th, 9PM. We have delayed this maintenance until after the end of the current quarter to minimize inconvenience to our users.

During these 48 hours we will complete BlueArc incremental copying to permanent storage, upgrade the cluster operating system to CentOS 5.6, enlarge the local storage space on compute nodes, and perform other necessary maintenance.

The time for long-running jobs will be dynamically reduced beginning May 30th. After that date, SGE will not accept jobs that cannot complete before the outage begins. Once a job is accepted, SGE will let it start only if it can complete before 9PM June 13th.

The express queue will be available until 7PM June 13th for jobs:

qsub -l express [other qsub options] your-executable-or-script

The interactive queues will be available until 7PM June 13th for sessions:

qrsh -l i [other qrsh options]

If you have any questions or problems, please send email to ATS High Performance Computing atshpc@ucla.edu

Spring 2011 Parallel Computing Classes

We are pleased to announce the parallel computing workshop series for this quarter. The topics include: how to efficiently use the Hoffman2 Cluster, how to port a serial code to multicore platforms using OpenMP, how to write distributed-memory programs using MPI, and how to use CUDA to run programs on graphics processors. These are free classes.

  • Using the Hoffman2 Cluster, April 27, 2 p.m.
  • Introduction to Parallel Computing and OpenMP, May 3, 2 p.m.
  • Introduction to MPI, May 5, 2 p.m.
  • Parallel Computing Lab, May 6, 2 p.m.
  • Introduction to CUDA, May 12, 2 p.m.

The class descriptions are within the Event Details links at http://www.idre.ucla.edu/ (under Upcoming Events). If you have questions, please email atshpc@ucla.edu

Hoffman2 File System Performance Issue, April 27th outage

Over the last few weeks the Hoffman2 Cluster has experienced intermittent storage performance dropouts caused by our BlueArc file server. We believe these issues were caused by an error in the BlueArc storage server firmware.

BlueArc Corp. has provided us with new firmware to fix this problem, and it has been installed. However, we find that in order to restore system performance, we must reformat all of our storage. This is similar to "defragging" a hard drive on a laptop, but orders of magnitude larger. (See Hoffman2 Cluster Hardware - BlueArc for more information about the BlueArc storage server.)

BlueArc Corp. will be lending us additional hardware so that we can accomplish the reformat, and we expect to receive that some time next week. As soon as it is installed, we will begin duplicating files to it. This will not affect your use of the Cluster.

On Wednesday, April 27th, we will close the Cluster to logins and jobs, so that no files will change while the final incremental copying is done. The long-running (highp) queues will be shortened gradually so that by April 27th all but long-running jobs started today (April 15th) will have completed.

After completion of the April 27th outage, we expect to see an immediate improvement in file system performance. At that time we will also migrate the /u/scratch filesystem to our newest and fastest storage. We expect scratch performance to be much improved as a result. In addition, the size of the scratch filesystem will increase by 2.5 times.

Files will then be copied back to permanent storage on our BlueArc. We expect this to take several days, but again this will not affect your use of the Cluster. We will then close the Cluster for a second time to complete the second round of incremental copying.

We would like to reassure everyone that no files or data have been lost. The issue is file system performance, which has affected our high performance system as a whole.

Mathematica 8 available

Mathematica version 8 is now available on the Hoffman2 Cluster and the UCLA Grid Portal (http://grid.ucla.edu) for both job scheduler batch (math.q) and interactive use. It is available as both serial and parallel applications.

Mathematica is a scientific program which features mathematical computation, symbolic manipulation, a large collection of numeric functions, graphics and a high-level programming language.

Mathematica documentation is available online from the vendor Wolfram at http://reference.wolfram.com/mathematica/guide/Mathematica.html

Please see How to Run Mathematica on ATS-Hosted Clusters for detailed information.

New MATLAB v7.11

MATLAB has been upgraded to version 7.11 (R2010b). You should use matlab.q to rebuild any job scheduler command files (jobname.cmd) that you may have from previous versions.

MATLAB is a language for technical computing that combines numeric computation, advanced graphics and visualization, and a high-level programming language.

Please see How to Run Matlab on ATS-Hosted Clusters for more information. MATLAB documentation is available from the vendor at http://www.mathworks.com/access/helpdesk/help/helpdesk.html

If you have any problems or questions, please send email to ATS High Performance Computing atshpc@ucla.edu

Amber11 available

Amber11 and Amber Tools v.1.4 are now available on the Hoffman2 Cluster for both job scheduler batch (amber.q) and interactive use. Amber11 is available as both serial and parallel applications through the UCLA Grid Portal.

Amber Molecular Dynamics Package is a set of molecular mechanical force fields for the simulation of biomolecules and a package of molecular simulation programs.

The Amber 11 User's Guide is available from http://ambermd.org/doc11/Amber11.pdf and the Amber Tools User's Guide is available from http://ambermd.org/doc11/AmberTools.pdf

Please see How to Run Amber on ATS-Hosted Clusters for detailed information.

Hoffman2 app from Market.Android.com

The UCLA Hoffman2 Cluster free app for your Android phone is now available from http://market.android.com

Released last August, it features:

  • Submit jobs and applications
  • Check the status of all jobs, including those submitted from the command line
  • Touch-based Data Manager
  • Upload/Download/Delete/Email files
  • and more...

See Hoffman2 on Android Phone Instructions for more information.

New Default Job Time Limit

If you do not specify a time limit for your job or interactive session, the job scheduler will end it after two hours. The reason for this new limit is to protect the job scheduler and allow it to back-fill and schedule jobs more efficiently.

You should always specify a time limit for your jobs and interactive sessions. If you are using a queue script to create your job scheduler command file (job.q, matlab.q, mpi.q, etc.), it will prompt you for a time limit value. If you are using a native job scheduler qsub or qrsh command, add h_rt (or time) to your other parameters.

Example of submitting a job with an 8-hour time limit:

qsub -l h_rt=8:00:00,other-parameters

Example of starting an interactive session on the interactive nodes with a 4-hour time limit:

qrsh -l i,time=4:00:00,other-parameters

If you have any problems or questions about this new requirement, please send email to ATS High Performance Computing at atshpc@ucla.edu

Winter 2011 Free Parallel Computing Classes

We are pleased to announce the parallel computing workshop series for this quarter. The topics include how to efficiently use the Hoffman2 Cluster, how to port a serial code to multicore platforms using OpenMP, writing distributed-memory programs using MPI, and GPU computing using CUDA. The schedules and locations of these free classes are:

  • Feb 2, 2 to 4 p.m., Using the Hoffman2 Cluster, 5628 Math Sciences
  • Feb 8, 2 to 4 p.m., Introduction to Parallel Computing and OpenMP, 5628 Math Sciences
  • Feb 10, 2 to 4 p.m., Introduction to MPI, 5628 Math Sciences
  • Feb 11, 2 to 4 p.m., Parallel Computing Hands-on Lab, CLICC Classroom B (320B Powell)
  • Feb 16, 2 to 4 p.m., Introduction to CUDA, 5628 Math Sciences

To sign up for these free classes, please go to http://www.idre.ucla.edu and see the Upcoming Events column.

Express queue is production

The express queue, which was announced November 9th, has shed its experimental status. It has proved tremendously popular, accounting for over 90% of jobs run in December 2010 on the Hoffman2 Cluster. When you use express.q your jobs may start within a few minutes of submission.

It is no longer limited to evenings and weekends, so you no longer need to pass the -w n option to qsub when you submit a job to this queue.

The express queue is limited to serial jobs, or array jobs, or shared memory parallel jobs that run on a single node (shared parallel environment). You cannot request more than 2 hours of wall-clock time. Parallel multi-node jobs will not be able to use this queue. You cannot use this queue to run on your resource group's own nodes (highp).
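
As a quick way to check a time request against the 2-hour limit before submitting, here is a minimal shell sketch. The function names hms_to_secs and fits_express are hypothetical helpers written for this example; they are not commands provided on the cluster.

```shell
# Convert a time request in H:MM:SS form to seconds (hypothetical helper).
hms_to_secs() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed when the request fits the express queue's 2-hour (7200 second) limit.
fits_express() {
    [ "$(hms_to_secs "$1")" -le 7200 ]
}

fits_express 1:00:00 && echo "1:00:00 fits the express queue"
fits_express 2:30:00 || echo "2:30:00 exceeds the 2-hour limit"
```

The check only compares the requested time with the limit; whether the job actually finishes within that time still depends on the job itself.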

How to request the express queue

  1. Make sure that each of your jobs will finish in 2 hours. Jobs submitted to this queue will be unconditionally terminated after 2 hours.
  2. To submit jobs to this queue, add "-l express" in addition to other parameters you typically use. Request time of no more than 2 hours.
  3. To direct an existing pending job to this queue, use
    qalter -l express JOB_ID
    where JOB_ID is its job id as shown by qstat -u $USER or the myjobs script.

Sample job submission scenarios

For a simple serial job requesting one hour

qsub -l express,h_rt=1:00:00 myjob.cmd

For a simple array job requesting one hour per jobtask

qsub -l express,h_rt=1:00:00 -t 1:1000 myjobarray.cmd
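
Inside an array job, Grid Engine sets the SGE_TASK_ID environment variable to the index of the current task, so each task can select its own input. The sketch below shows what the body of a script such as myjobarray.cmd might look like; the function task_input and the input.N.dat naming scheme are assumptions made for illustration.

```shell
# Hypothetical body for an array job script.
# SGE_TASK_ID is set by the job scheduler for each task of an array job;
# the input.N.dat file naming is an assumption made for this example.
task_input() {
    printf 'input.%s.dat\n' "$1"
}

# Fall back to task 1 so the script can be tried outside the scheduler.
task_id=${SGE_TASK_ID:-1}
echo "Task $task_id would process $(task_input "$task_id")"
```

Keeping the mapping from task index to input in one small function makes it easy to test the script by hand before submitting the whole array.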

If you do not request a time, your job will run for the maximum of 2 hours set by the queue and will then be terminated.

If you have questions, please send email to ATS High Performance Computing atshpc@ucla.edu

Queue scripts enhanced

The queue scripts that help you build a job scheduler command file (job.q, mpi.q, matlab.q, etc.) have been enhanced to support the recent job scheduler queue reconfiguration.

  • If you belong to more than one resource group which has contributed nodes to the Hoffman2 Cluster, you can direct your job to use a particular group's nodes. (-u and -rg resource-group command line options).
  • If your resource group owns nodes in both the IDRE and MSA data centers, you can choose where your job will run. (-u and -dc data-center command line options).

Additional support has been added to the job scheduler for parallel jobs that request a large number of cores. (-u and -n parallel-tasks command line options).

For more information, select the new Info function from an interactive queue script's menu, or enter man queue at the shell prompt, or point your browser at Queue Scripts.

If you have any problems or questions, please send email to ATS High Performance Computing atshpc@ucla.edu

Hoffman2 Scheduled Downtime

There will be a scheduled downtime for Hoffman2 from Wednesday, December 15th, 8:00am through Thursday, December 16th, 12 noon. During this period we will be doing the following upgrades and fixes:

  1. Replace two failing electrical breakers that power portions of Hoffman2 in the MSA Data Center.
  2. Reconfigure the BlueArc storage system to maximize performance and allow for future storage service offerings.
  3. Add a new management capability to our Infiniband fabric.
  4. Apply patches to the Hoffman2 OS.
  5. Reconfigure some equipment to allow for future growth of the cluster.

We will make every effort to keep the downtime as short as possible. In all probability we will not need the entire time scheduled, but we want to ensure we get all this work completed.

In the future we will be posting a schedule for quarterly maintenance downtime so it will be easier for our users to plan for these outages.

Holiday Shutdown Information

Hoffman2 will be up and fully operational during the holiday shutdown; however, there will not be any ATS personnel onsite during this time. We will do our best to ensure the cluster remains in operation but, depending on the issues that arise, we will not be addressing them until after January 3rd.

TeXLive version 2010 available

The 2010 version of TeXLive has been installed. TeXLive is a comprehensive TeX/LaTeX system. TeX is a document-typesetting system commonly used to create scientific and mathematical literature. LaTeX is a document markup language and document preparation system for the TeX typesetting program.

To load texlive into your environment, at the shell prompt, enter:

module load texlive

This will load TeXLive version 2010 for your current login session. To load a specific version, use:

module load texlive/2009

or,

module load texlive/2010

To see help information, at the shell prompt, enter:

module whatis texlive

or,

module whatis texlive/2010

Please see http://www.tug.org/texlive/ for more information about TeXLive. www.tug.org is the authoritative site for all TeX documentation.

This FAQ may be a useful place to start learning about TeX:

http://www.tex.ac.uk/cgi-bin/texfaq2html?introduction=yes#questions

R version 2.12.0 available

R and its extension packages have been upgraded to version 2.12.0. To see all installed packages, issue the following command at the R command prompt:

> library()

R is GNU S, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage (http://www.r-project.org/) for further information and documentation.

See How to Run R on ATS-Hosted Clusters.

If you have any questions or problems, please send email to ATS High Performance Computing atshpc@ucla.edu

Experimental express queue

Over the past few months we have observed cluster utilization going down during weekends and nights. To offset that and boost the cluster's utilization rate, we have set up an experimental queue that will allow many short jobs to go through more quickly with minimal or no effect on the existing shared cluster queues. The experiment may be modified or terminated in the future depending on the cluster utilization statistics. Currently, this queue is scheduled to receive jobs between 6:00 PM and 6:00 AM and on weekends.

At this time, the experiment is limited to serial jobs or array jobs with a wall-clock time limit of 2 hours. Parallel (multi-node) jobs will not be able to use this queue. If your jobs belong to this category, you are encouraged to try it out. You are likely to observe substantially higher throughput.

  1. Make sure that each of your jobs will finish in 2 hours. [Jobs submitted to this queue will be unconditionally terminated after 2 hours.]

  2. To submit jobs to this queue, add -l express to the other parameters you typically use, with the exception of the highp complex. Also request a time of less than 2 hours.

    Example Job submission scenarios:

    For a simple serial job requesting one hour

    qsub -l express,h_rt=1:00:00 -w n job.cmd

    For a simple array job requesting one hour per jobtask

    qsub -l express,h_rt=1:00:00 -t 1:1000 -w n jobarray.cmd

    If you do not request a time, the job will by default run for the maximum of 2 hours set by the queue and will then be terminated. The -w n option asks SGE not to verify whether the calendar queue is on or off when you submit the job. The queue is on only during certain hours, but with this option you should be able to submit jobs to it at any time.

  3. To direct your existing pending jobs to this queue, use

    qalter -l express JOB_ID

    where JOB_ID is the job id shown by qstat -u $USER or by the myjobs command.
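
The express-queue rules above can be checked before submitting. The following is a minimal sketch, not an ATS-provided tool: the function names are hypothetical, it assumes "weekends" means all of Saturday and Sunday in local time, and it writes the array range in the n-m form documented in the qsub manual page. It only builds the qsub argument list; it does not run qsub.

```python
from datetime import datetime, timedelta

EXPRESS_LIMIT = timedelta(hours=2)  # hard limit stated in the announcement


def express_window_open(when):
    """Return True if `when` falls inside the assumed express-queue window."""
    # Assumption: "weekends" means all of Saturday and Sunday (local time).
    if when.weekday() >= 5:
        return True
    # Weekdays: the queue is on from 6:00 PM until 6:00 AM.
    return when.hour >= 18 or when.hour < 6


def parse_h_rt(h_rt):
    """Convert an h:mm:ss string such as '1:00:00' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in h_rt.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def express_qsub_args(script, h_rt="1:00:00", tasks=0):
    """Build (but do not run) a qsub argument list for the express queue."""
    if parse_h_rt(h_rt) >= EXPRESS_LIMIT:
        raise ValueError("express jobs must request less than 2 hours")
    args = ["qsub", "-l", f"express,h_rt={h_rt}"]
    if tasks > 0:
        args += ["-t", f"1-{tasks}"]  # array job with tasks 1..tasks
    return args + ["-w", "n", script]


if __name__ == "__main__":
    print(" ".join(express_qsub_args("job.cmd")))
    print("window open now:", express_window_open(datetime.now()))
```

Because jobs can be submitted at any time with -w n, the window check is informational only: it tells you whether a submitted job could start right away.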

If you have questions, please email atshpc@ucla.edu

TOP

Upcoming Free Parallel Computing Classes

ATS is offering the following free 2-hour long classes during the Fall Semester. Individual registration is required for each class. To sign up for any of these classes, point your browser at http://www.idre.ucla.edu

  • October 27, 2-4 p.m. Using the Hoffman2 Cluster, 5628 Math Science
  • November 1, 2-4 p.m. Introduction to Parallel Computing and OpenMP, 5628 Math Science
  • November 3, 2-4 p.m. Introduction to MPI, 5628 Math Science
  • November 8, 1-3 p.m. Parallel Computing Lab, CLICC Classroom C (320 Powell)
  • November 9, 2-4 p.m. Introduction to CUDA, 5628 Math Science

 

TOP

Hoffman2 on your Android Mobile Device

We are very pleased and proud to announce the Hoffman2 app which will run on your Android mobile device. It offers similar features to the Hoffman2 iPhone, iPod, iPad app which was announced in July.

You can use this app to submit and monitor the status of your Hoffman2 jobs, and view job output. It also has file manager functions which let you upload and download files.

For information on how to use the Hoffman2 Android app, please see:

http://www.ats.ucla.edu/clusters/hoffman2/android_app

You can download the Hoffman2 Android app free from the Google Marketplace app on your Android. Search for UCLA Hoffman2. Download the app called UCLA Hoffman2 Cluster.

You must have a Hoffman2 login id and a Grid account in order to use the Hoffman2 Android app. A Grid account is automatically created when you apply for a Hoffman2 login id. This is the same grid username and passphrase you use when you access the Hoffman2 over the web at http://grid.ucla.edu or http://portal.ucgrid.org

If you have any problems using the Hoffman2 app on your mobile device or find any bugs, we want to know about them. If you have suggestions for improvements or additional features, we want to know that too. Please send email to atshpc@ucla.edu

TOP

Quantum Espresso version 4.2.1 available

Quantum ESPRESSO v 4.2.1 is now available on the Hoffman2 Cluster for SGE batch execution. Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft). More information can be found at http://www.quantum-espresso.org/

Please see How to Run Quantum ESPRESSO on ATS-Hosted Clusters for detailed information.

The Quantum ESPRESSO documentation is available on-line at http://www.quantum-espresso.org/user_guide/user_guide.html

TOP

Home Directory 20GB Quota and Job Scratch Space

The quota on campus users' home directories has been increased to 20GB (gigabytes). It was formerly 10GB. To see your quota usage, at the shell prompt, enter:

myquota

Job Scratch Space

Please consider running your high I/O jobs from the /u/scratch directory. There are over 7 terabytes of global scratch space, NFS-mounted on all compute and login nodes. Because /u/scratch resides on the faster fiber-channel-attached disks, writing to /u/scratch performs much better than writing to your home directory.

To use /u/scratch make a directory there named with your userid and place your files in it. Change into your /u/scratch/[userid] directory before running job.q, mpi.q or other queue scripts so that all file paths are set correctly in your SGE command file.

Here are some guidelines:

  • All files that need to be saved at the end of the calculation, including checkpoint files, should be written to /u/scratch.
  • Temporary files that are generated during the run and deleted at the end of the calculation should be written to SGE's local $TMPDIR directory. Please see How to use SGE scratch directory for file I/O.

There is a 2TB per user limit on the /u/scratch filesystem. Under normal circumstances, files stored in /u/scratch are allowed to remain there for 7 days. Any files older than 7 days may be deleted automatically by system cleanup routines.
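
To see which of your scratch files have passed the 7-day mark, something like the following sketch could be used. It is illustrative only: the function name is hypothetical, and it assumes that file age is judged by modification time, which the announcement does not specify.

```python
import getpass
import os
import time


def stale_files(root, max_age_days=7, now=None):
    """Return paths under `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 3600
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: age is measured from the modification time.
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Your directory is /u/scratch/<login id>, as described above.
    scratch = os.path.join("/u/scratch", getpass.getuser())
    for path in stale_files(scratch):
        print(path)
```

Anything the script lists should be copied to your home directory or other permanent storage if you still need it.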

If you have any problems or questions, please send email to atshpc@ucla.edu

TOP

Hoffman2 on your iPhone, iPod, iPad

We are pleased and proud to announce the Hoffman2 app which will run on your iPhone, iPod touch or iPad mobile device. You can use this native app to submit and monitor the status of your Hoffman2 jobs, and view job output. It also has file manager functions which let you upload and download files.

For information on how to use the Hoffman2 app, please see:

Hoffman2 on iPhone, iPod, iPad Instructions

You can download the Hoffman2 app free from the iTunes App Store (icon) on your iPhone, iPod touch or iPad, or click this link in email that has been sent to your mobile device: itms://itunes.apple.com/us/app/hoffman2/id380521367?mt=8

Using the Hoffman2 app requires iOS 3.1.2 or later on your mobile device. You also must have a UCLA/UC Grid account. This is the same grid username and passphrase you use when you access Hoffman2 over the web at http://grid.ucla.edu or http://portal.ucgrid.org

If you are just curious, read about the Hoffman2 app at http://itunes.apple.com/us/app/hoffman2/id380521367?mt=8 Here is an excerpt from that description:

  • Submit Serial or Parallel Job to Hoffman2 cluster

  • Submit application to Hoffman2 cluster such as Matlab, Gaussian, Q-Chem

  • Save job for later submission

  • Check Job Status
    • list of jobs submitted from this device
    • list of Running/Pending jobs
    • get job stdout and stderr
    • get job details

  • File Manager
    • Download a file from Hoffman2 to this device
    • Upload file from this device to Hoffman2
    • Create a new directory on Hoffman2
    • Delete a file on Hoffman2
    • Delete an empty directory on Hoffman2
    • Email a file
    • View a local file

If you have any problems using the Hoffman2 app on your mobile device or find any bugs, we want to know about them. If you have suggestions for improvements or additional features, we want to know that too. Please send email to atshpc@ucla.edu

TOP

Hoffman2 is online

We are pleased to announce that Hoffman2 is now back online and available for use.

As we announced several weeks ago, we have upgraded, replaced and/or moved a large number of nodes as well as upgraded the cluster's operating system and installed a new version of the Sun Grid Engine. You should not experience any problems with the system, but if you do, please let us know immediately at atshpc@ucla.edu

TOP

OpenMPI updated

OpenMPI has been upgraded to 1.4.2 and is available in the /u/local/intel/11.1/openmpi/1.4.2 directory.

You will need to recompile your code if you build your SGE command file with mpi.q

If you have any questions or problems, please send email to ATS High Performance Computing at atshpc@ucla.edu

TOP

ABAQUS upgraded

ABAQUS has been upgraded to version 6.10

ABAQUS is a suite of general-purpose, nonlinear finite element analysis (FEA) programs for stress, heat transfer, and other types of analysis. ABAQUS runs a wide range of linear and nonlinear engineering simulations.

Please see How to Run ABAQUS on ATS-Hosted Clusters

Abaqus manuals are not available online. Contact atshpc@ucla.edu if you require the manuals. The consultants will reply during normal business hours.

TOP

Maple upgraded

Maple has been upgraded to version 14.

Maple is a symbolic and numeric mathematical program. Major areas include algebra, calculus, differential equations, linear algebra, and statistics. It also provides integrated visualization.

Please see How to Run Maple on ATS-Hosted Clusters

Maple documentation is available from the vendor at http://www.maplesoft.com/documentation_center/

TOP

MATLAB upgraded

MATLAB has been upgraded to version 7.9b

MATLAB is a language for technical computing that combines numeric computation, advanced graphics and visualization, and a high-level programming language.

Please see How to Run Matlab on ATS-Hosted Clusters

MATLAB documentation is available from the vendor at http://www.mathworks.com/access/helpdesk/help/helpdesk.html

TOP

Tecplot 360 upgraded

Tecplot 360 has been upgraded to version 2009.2.

Tecplot 360 is a program for the visualization and animation of scientific and engineering data. It includes support for CFD-type data.

Please see How to Run Tecplot on ATS-Hosted Clusters

Tecplot documentation is available from the vendor at http://www.tecplot.com/Support/Documentation.aspx

TOP

Hoffman2 Upgrade June 22-24, 2010

Hoffman2 will be taken offline for maintenance and hardware upgrades from 12:01am June 22 through 11:59pm June 24, 2010. We understand that this outage affects your productivity and take that very seriously. However, we have come to a point where we have done everything we can and the remaining work that needs to be performed cannot be accomplished without taking the system offline. There are four aspects to the work we will be performing during the outage.

  1. We will be replacing and upgrading a large number of nodes in our IDRE data center. This will bring all of the nodes in the IDRE DC to at least 16GB and almost a third to 32GB. We will also be increasing the number of cores in the IDRE DC by almost 400.
  2. We will be bringing a newly installed row of racks for Hoffman2 expansion online in our Math Science data center. This new row will provide up to an additional 96 permanent nodes in the MSA DC and up to another 96 temporary nodes while we await the installation of a containerized data center in the fall. Additionally, we will be installing new shared cluster contributor nodes in the MSA DC at this time.
  3. We will complete the move of our BlueArc storage system from the IDRE DC to the MSA DC. This work has been ongoing behind the scenes for some time and included moving our tape robot as well. The final steps require us to have everyone off of the system to perform the final reconfiguration.
  4. We will be upgrading the operating system on the cluster to CentOS 5.5, adding an additional login node and making some other minor hardware adjustments.

Thank you for your patience and understanding,
UCLA Academic Technology Services

TOP

IDRE Offers Summer Computational Science Courses

IDRE will host three onsite computational science courses offered by the Virtual School of Computational Science and Engineering (VSCSE). Graduate students, post-docs and professionals from academia, government and industry can gain the skills they need to leverage the power of cutting-edge computational resources at these courses, which are being offered for a $100 per-course fee. Each class is five full days and will take place on the UCLA campus. Snacks and an evening reception will be provided; participants are responsible for travel and lodging costs. Scholarships are available. Register at http://www.idre.ucla.edu/vscse2010/

TOP

Queue scripts enhanced

The master queue script has been enhanced to better support people who are in more than one Sun Grid Engine access list. When you use a queue script like job.q to build an SGE cmd file, now it will ask you something like:

Enter the name of the resource group you want to use in this job
(campus cnsi, default cnsi)
<or quit>:

You can press Enter to accept the default.

The resource group that you choose lets the queue script provide appropriate values for default and maximum memory. It does not guarantee that your job will use those resources. For example if your job requests 1024M memory and 1 hour, it will run where those resources are available soonest.

There is a new command-line option -rg for use in non-interactive mode. For example:

job.q -rg resource_group [other queue options] myprogram-or-script

where resource_group is the name of an SGE access list.

If you have any questions or problems, please send email to ATS High Performance Computing at atshpc@ucla.edu

TOP

job.q and jobarray.q scripts enhanced

The two queue scripts, job.q and jobarray.q, have been enhanced to allow you to use more memory or more cores for your serial jobs.

job.queue

When the job.q script asks you for a memory request for your job, it will show a larger maximum memory value. For campus users, the new maximum memory size is 8,192 megabytes. If you belong to a group which has purchased high memory nodes, it will be larger.

The job.q script can reserve more memory for your serial job by telling the Sun Grid Engine to reserve additional slots on a single node. If you request more than the default amount of memory (1024 megabytes for campus users), your job may wait longer before starting while SGE waits for additional slots to become available.

jobarray.queue

Array jobs are serial jobs or multi-threaded jobs that use the same executable but different input variables or input files, as in parametric studies. The jobarray.q script now supports multi-threaded shared memory. It now will ask you:

Enter the number of tasks for your job (1 ≤ n ≤ 8, default 1)
<or quit>:

If your program is not multi-threaded, this new option will not change anything in the Sun Grid Engine command file that jobarray.q builds for you. You can accept the default value of 1 task per job.

There is a new command-line option -mt for use in non-interactive mode. For example:

jobarray.q -mt n [other queue options] myprogram-or-script

where n is an integer less than or equal to 8.

Please see Running a Batch Job on an ATS-Hosted Cluster

TOP

Package PLINK updated to version 1.08

The package PLINK has been updated to version 1.08. The new version is available in the /u/local/apps/plink/current directory. The older version is still available in the /u/local/apps/plink/1.06 directory.

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

See How to Run PLINK on ATS-Hosted Clusters

PLINK PDF documentation is available from http://pngu.mgh.harvard.edu/~purcell/plink/pdf.shtml

TOP

Interactive sessions and nodes

Interactive nodes that you can access from the login nodes with ssh are no longer available on the Hoffman2 Cluster. If you need to run your program interactively, you can request an interactive session from the Sun Grid Engine. Please see How to Get an Interactive Session through SGE.

You do not need to request an interactive session in order to run commercial programs like Matlab or Mathematica interactively, because they already have qrsh built into their startup scripts.

You can use a maximum of 8 slots (cores) at the same time in SGE interactive queue sessions. This could be 8 different sessions, or one or more parallel sessions.

If you have any questions or problems using qrsh, please send email to ATS High Performance Computing at atshpc@ucla.edu

TOP

Power Outage February 11, 2010

To the Hoffman2 user community,

At approximately 7am on Thursday February 11, 2010 the Hoffman2 cluster suffered a power incident which affected all systems that reside in the IDRE Data Center, which is located in the CNSI building. As a result, all jobs running on nodes n1-n269 were killed. Hoffman2's IDRE component was brought back online by 8 am.

This message is to share what we have learned about the cause of the incident. After speaking to Facilities Management, it appears the power incident affected much of the south end of campus and Westwood. One of the four power feeds that supply the campus from DWP suffered a transient fault which resulted in a voltage drop. A typical explanation for such an event is a palm frond briefly shorting a power line. The reason UCLA would suffer from an apparent DWP problem is, apparently, that our co-generation facility is unable to provide sufficient power to cover the campus' needs during peak daytime hours, leaving the campus vulnerable to these types of events.

Systems that were protected by UPS were unaffected. However, the compute nodes of Hoffman2 are not powered from UPS protected circuits, unfortunately, as the cost would be prohibitive to cover them all.

For these reasons we continue to urge our users to check-point their code, especially for long running jobs. Check-pointing will allow you to re-start your job after an unexpected event like a power fault or even a node hardware failure without losing all of your work. If you need help adding check-pointing to your code contact atshpc@ucla.edu.
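
As an illustration of the idea, here is a minimal check-pointing sketch. It is a toy example under stated assumptions, not a recommended implementation for any particular code: the function names, the JSON state format and the checkpoint interval are all hypothetical, and the loop body stands in for real work.

```python
import json
import os


def save_checkpoint(path, state):
    """Write `state` to a temporary file, then rename it over `path`."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # Replacing in one step avoids leaving a half-written checkpoint behind.
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as fh:
        return json.load(fh)


def run(path, total_steps, every=100):
    """Toy long-running loop that resumes from the last checkpoint."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the job is killed partway through, resubmitting it calls run() again and the loop continues from the last saved step instead of from zero.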

If you have any other questions or concerns regarding this incident or any other aspect of the Hoffman2 system please feel free to contact me directly at friedman@ats.ucla.edu

Thank you,
Scott Friedman
Manager, Research Computing Technologies

TOP

New myquota command reports on space, file use

There is a new command myquota which reports on the amount of space used, number of files saved and quota for your userid. At the shell prompt, enter:

myquota -pm

The myquota command has several options. To see them, enter:

myquota --help

We are using the BlueArc Storage Server to provide home directory and scratch directory space. Currently, there is a 10 gigabyte quota limit on your home directory, unless your research group has purchased additional storage. [quota limit raised to 20GB August 25, 2010.]

You are encouraged to run jobs from the /u/scratch directory where there is 6 terabytes of scratch space mounted over all compute nodes. To place files in /u/scratch, make a directory there named with your login id and place your files in it.

Because /u/scratch resides on the faster fiber-channel-attached disks, it is recommended that, for performance reasons, you have your parallel jobs, especially those with high I/O requirements, write to /u/scratch instead of to your home directory.

Under normal circumstances, files stored in /u/scratch are allowed to remain there for 7 days. Any files older than 7 days may be deleted automatically by system cleanup routines.

Point your browser at http://www.ats.ucla.edu/clusters/hoffman2/data_storage for more information about data storage.

TOP

Hoffman2 Cluster Partial Outage Feb 9-10 2010

There will be a partial outage February 9th and 10th to upgrade the Math Science Data Center facility. The outage will affect only the Math Science Data Center. It will not affect the IDRE Data Center.

All users will be able to login and submit jobs during the outage. Jobs submitted to the IDRE Data Center nodes will run as usual. This includes all regular campus group accounts.

Jobs which need to run on the Math Science Data Center nodes -- those belonging to the bern, chemeng, cnsi, hongzhou, margulis, mori, rosenzwe and sfurlane groups -- may not run until after the upgrade has completed. Those jobs may run before the outage starts if there is sufficient time for them to complete before the outage starts. Any jobs running on the Math Science Data Center nodes at the time the outage begins will be terminated.

Interactive qrsh sessions on Math Science Data Center nodes will not be available during the outage. However all users will still be able to start interactive qrsh sessions of maximum 1GB/core on the IDRE Data Center interactive nodes.

The purpose of this outage is to provide for future growth in the portion of the Hoffman2 cluster located in the Math Sciences Data Center. The n2xxx nodes will be taken offline in order to perform relocation and electrical work. The outage of these nodes is scheduled for Tuesday, February 9, 2010 beginning at 12AM and continues through Wednesday, February 10, 2010. We will, however, bring the affected nodes back online as soon as the work completes.

TOP

SGE Upgrade

We are in the process of installing a new version of Sun Grid Engine on the Hoffman2 Cluster. You need to log out of any session you started before noon on Monday January 11th. When you log in again you will automatically get the new SGE.

Jobs which are currently running will continue to run. However you will not be able to see their status with the SGE qstat command. The myjobs script will show both your old SGE jobs, if any, and your new SGE jobs.

You will need to resubmit any jobs which are waiting to run. Newly submitted jobs will be dispatched by the new SGE as soon as the new queues are ready; you may experience some delays today while we are in transition to the new system.

Interactive scripts which use qrsh, and the qrsh command itself, also will not work until the new queues are ready.

TOP

Gaussian 09 available

Gaussian 09, Revision A.02 is now available on the Hoffman2 Cluster. It supports parallel distributed memory execution (Linda-parallelism) and shared memory execution (SMP-parallelism). GaussView 5 is also available.

Gaussian provides state-of-the-art capabilities for electronic structure modeling. GaussView is a graphical interface available for Gaussian. With GaussView, you can import or build the molecular structures that interest you; set up, launch, monitor and control Gaussian calculations; and retrieve, view and visualize the results. GaussView 5 provides comprehensive support for importing and working with structures from PDB (Protein Data Bank) files.

See How to Run Gaussian on ATS-Hosted Clusters. There are new queue scripts to invoke Gaussian 09.

See also How to Run GaussView on ATS-Hosted Clusters. There is new information about GaussView and how to run Gaussian from within GaussView.

Gaussian 09 and GaussView 5 documentation is available at http://www.gaussian.com/ Gaussian 09 Release Notes are available at http://www.gaussian.com/g_tech/rel_notes.pdf

TOP

Quantum ESPRESSO v 4.1.1 available

Quantum ESPRESSO v 4.1.1 is now available on the Hoffman2 Cluster for SGE batch execution. Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (both norm-conserving and ultrasoft).

Please see How to Run Quantum ESPRESSO on ATS-Hosted Clusters for detailed information.

The Quantum ESPRESSO documentation is available on-line at http://www.quantum-espresso.org/user_guide/user_guide.html

TOP

Stata 11 available

Stata 11 is now available on the Hoffman2 Cluster for both SGE batch and interactive applications. Stata is a complete, integrated statistical package that provides a broad suite of statistical capabilities, complete data-management facilities, and publication-quality graphics.

Please see How to Run Stata on ATS-Hosted Clusters for detailed information.

Stata manuals are not available online. They may be borrowed from ATS Statistical Consulting http://www.ats.ucla.edu/stat/books/#Stata

TOP

LAMMPS version 7Sep09 is now available

LAMMPS version 7Sep09 is now installed on the Hoffman2 cluster in the /u/local/apps/lammps/current directory. The software can be run as a parallel application.

LAMMPS, Large-scale Atomic/Molecular Massively Parallel Simulator, is a classical molecular dynamics simulation code designed to run efficiently on parallel computers. It was developed at Sandia National Laboratories, a US Department of Energy facility, with funding from the DOE. The developers of LAMMPS are Steve Plimpton, Paul Crozier, and Aidan Thompson who can be contacted at sjplimp, pscrozi, athomps at sandia.gov.

LAMMPS is an open-source code, distributed freely under the terms of the GNU Public License (GPL).

See How to Run LAMMPS on ATS-Hosted Clusters

LAMMPS documentation is available at http://lammps.sandia.gov/doc/Manual.html

TOP

Intel compiler upgrade

The default Intel C and Fortran compilers on Hoffman2 are now version 11.1.

For more information, see Languages and Compilers and the documentation links on that page.

TOP

Priority for your jobs

Note: This new job scheduling feature is available to people who are in a sponsored group which has contributed cores to the Hoffman2 Cluster. If you are not in a sponsored group, the new highp complex will not work for you; using it will cause qsub to reject your job.

Now you can tell the Sun Grid Engine to run your job on your own group's allocated cores. This modification, which adds "urgency" to the queue definitions, is designed to ensure that your group's allocated cores are given back to your group within 24 hours.

To use your group's own allocated cores, specify the new SGE complex highp either in your SGE command file or as an argument to qsub. Examples:

#$ -l highp,h_data=1024M,h_rt=24:00:00

or,

qsub -l highp [other arguments]

Currently both the new highp queues and the former queues are available to your jobs. After the old queues are disabled, if you don't specify the highp complex your job will be sent to the shared cluster nodes which have a maximum time limit of 24 hours.

How to decide whether to run your job on your group's own nodes or the shared cluster?
  1. If your job requires more than 24 hours, you must specify the highp complex.

  2. The qquota command will tell you how many cores are currently in use by your group. Example of qquota output:
    resource quota  rule limit   filter
    -----------------------------------------------------------------
    queue_limits/3  slots=67/72   users @groupname queues *d_4g_b.q
    rulset1/3       slots=85/144  users @groupname hosts @msa-amd_04g
    

    In the above example, there are 5 slots available in the group's own allocation and 59 available in the group's shared cluster allocation. This does not mean that any cores are actually free. If your job specifies highp and requests 72 or fewer cores, it will be released to run on your group's allocated nodes once your priority becomes first within your group. If you do not specify highp, your job will go to the shared queue (max. 144 cores in this example) and may run immediately or wait for other nodes to become free before it starts.

  3. If you rebuild your SGE command files with a queue script (e.g., job.q, mpi.q) then if highp is required, the queue script will supply it. The queue scripts will ask you if you want to use your own group's cores for jobs requesting 24 hours or less. For people who use the queue scripts in non-interactive mode, there is a new queue script command line option -u which you can use to request urgency (highp).
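
The slot arithmetic in the qquota example above (limit minus slots in use) can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the sample text is simply the output shown above, not the result of running qquota.

```python
import re

# Matches the "slots=used/limit" field in each qquota rule line.
SLOTS = re.compile(r"slots=(\d+)/(\d+)")


def free_slots(qquota_output):
    """Map each resource quota rule to (used, limit, limit - used)."""
    result = {}
    for line in qquota_output.splitlines():
        match = SLOTS.search(line)
        if match:
            used, limit = int(match.group(1)), int(match.group(2))
            rule = line.split()[0]  # first column is the rule name
            result[rule] = (used, limit, limit - used)
    return result


SAMPLE = """\
resource quota  rule limit   filter
-----------------------------------------------------------------
queue_limits/3  slots=67/72   users @groupname queues *d_4g_b.q
rulset1/3       slots=85/144  users @groupname hosts @msa-amd_04g
"""

if __name__ == "__main__":
    for rule, (used, limit, free) in free_slots(SAMPLE).items():
        print(f"{rule}: {used} of {limit} in use, {free} under the limit")
```

For the sample, the two rules come out 5 and 59 slots under their limits, matching the figures quoted in item 2. As noted there, being under the limit does not mean that cores are actually idle.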

Back-filling

There is a back-filling feature in SGE which is enabled. In order for back-filling to work, you should specify your best guess of the time required for your job and not take the default of 24 hours or 14 days.

If you have any questions or problems, please send email to ATS High Performance Computing at atshpc@ucla.edu

TOP

GROMACS v4.0.5 installed

GROMACS version 4.0.5 is now installed on the Hoffman2 cluster in the /u/local/apps/gromacs/4.0.5 directory. The software can be run as a serial and/or parallel application.

GROMACS is an engine to perform molecular dynamics simulations and energy minimization. GROMACS is free software. The entire GROMACS package is available under the GNU General Public License.

See How to Run GROMACS on ATS-Hosted Clusters

GROMACS documentation is available at http://www.gromacs.org/Documentation/Manual

TOP

R version 2.9.1 is now available

R has been upgraded to version 2.9.1 and the following packages have been installed: MCMCpack, Rserve, abind, ape, coda, degreenet, ergm, gee, igraph, latentnet, network, networksis, rgl, scatterplot3d, shapes, sna, and statnet.

To see all packages which are installed, at the R command prompt, issue the command

> library()

R is GNU S, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage http://www.r-project.org/ for further information and documentation.

See How to Run R on ATS-Hosted Clusters

TOP

MATLAB 7.7 available

MATLAB has been upgraded to version 7.7 (Release R2008b).

MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.

See How to Run MATLAB on ATS-Hosted Clusters

The Hoffman2 Cluster has licenses for some MATLAB toolboxes. Please see Matlab Toolboxes Available on the Hoffman2 Cluster for a list of available toolboxes.

More MATLAB licenses and more toolboxes are available from the San Diego Super Computer OnDemand service. This is available to you through the UCLA Grid. It is also accessible with the new gapp command at the Hoffman2 Cluster shell prompt.

MATLAB documentation is available on the web at: http://www.mathworks.com/access/helpdesk/help/helpdesk.html

TOP

Power Outage Saturday Sept. 12th

All power was lost to the IDRE and engineering buildings Saturday, September 12th, at about 10:30 AM. All nodes in the IDRE Data Center that were on utility power immediately went down. Power was restored 3 hours later, and by 2:30 PM all nodes and filesystems were back online.

Please send any problem reports to atshpc@ucla.edu

TOP

Hoffman2 Cluster major queue reconfiguration

All users must rebuild their current SGE command files using the appropriate queue scripts (for example: mpi.q, job.q, gaussian.q). This is a major reconfiguration of the queue structure due to increased compute resources.

The most important reasons are:

  1. Campus group users can only request 1GB per core. If your serial job needs more than 1GB, you must add the following SGE active comment to your SGE command file:
    #$ -pe shared 2
  2. There are new parallel environments. We are deprecating some of the old parallel environments and most of the current queues.

  3. We have a new interactive queue with a forced complex to make more interactive sessions available to you. In order to get this queue, campus group users must say:
    qrsh -l interactive
          or
    qrsh -l i

    Sponsored group users do not need to use "-l i" on their qrsh commands because it will use their sponsored group queues which do not have the forced complex.

  4. New nodes are available for CNSI group and some sponsored group users. If you are eligible to use the new nodes, you and your sponsor will receive more information.
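
For item 1 above, the number to put after "-pe shared" can be estimated from the memory your serial job needs. The sketch below assumes that each slot in the shared parallel environment contributes 1GB (1024 MB), generalizing from the single "-pe shared 2" example; the function names are hypothetical, and the queue scripts remain the supported way to build a command file.

```python
import math

MB_PER_CORE = 1024  # campus limit of 1GB per core, from item 1 above


def shared_slots(memory_mb, mb_per_core=MB_PER_CORE):
    """Smallest number of slots whose combined memory covers `memory_mb`."""
    if memory_mb <= 0:
        raise ValueError("memory request must be positive")
    # Assumption: memory available to the job scales linearly with slots.
    return math.ceil(memory_mb / mb_per_core)


def pe_directive(memory_mb):
    """Return the '#$ -pe shared N' line, or '' when one core is enough."""
    slots = shared_slots(memory_mb)
    return f"#$ -pe shared {slots}" if slots > 1 else ""


if __name__ == "__main__":
    for request in (1024, 1500, 2048):
        print(request, repr(pe_directive(request)))
```

A job that fits within 1GB needs no extra directive, which is why the function returns an empty string in that case.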

We also have enabled qsub checking which will tell you immediately when you submit your job if it will never run, and not accept your job. This may happen because the resources it requests no longer exist. If you do not rebuild your SGE command file, you can expect to see this message:

Unable to run job: error: no suitable queues.

If you find any problems, please send email to atshpc@ucla.edu

TOP

OpenMPI update

OpenMPI has been upgraded to 1.3.3 and is available in the /u/local/mpi/openmpi/current directory.

The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

For further details please refer to: http://www.open-mpi.org/

As per release documentation, users should not be required to recompile their codes.

TOP

PLINK installation

PLINK version 1.06 is now installed on the Hoffman2 cluster in the /u/local/apps/plink/1.06 directory.

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

See How to Run PLINK on ATS-Hosted Clusters

PLINK PDF documentation is available from http://pngu.mgh.harvard.edu/~purcell/plink/pdf.shtml

TOP

Free Abaqus Student Edition download

Abaqus is finite element analysis modeling, visualization, and process automation software. Abaqus SE (Student Edition) is available from the vendor to academic students, professors, and researchers as a free download for personal computers running Windows XP Professional SP2, Windows XP Home Edition SP2, or Windows Vista.

You must register with the vendor, Dassault Systèmes, for a "DS passport" in order to be eligible for the free download.

The free download is at http://campus.3ds.com/simulia/freese

Abaqus SE includes the core Abaqus products: Abaqus/Standard, Abaqus/Explicit, and Abaqus/CAE. The maximum model size is set to 1000 nodes for both analysis and postprocessing. Access to features requiring compilers (user subroutines, Abaqus Make, C++ ODB API), parallel execution, or add-on products has not been included. Replay and journal files are not available for Abaqus/CAE.

More information at http://www.simulia.com/academics/student.html

TOP

Hoffman2 System Upgrade

Over the last couple of days, the cluster went through a major software upgrade as well as a key hardware upgrade. The login systems and SGE server have been replaced with new, more powerful servers. The modifications were also made to allow the cluster to expand beyond the physical limits of the current IDRE data center. The cluster is now fully operational and back in service.

Although the upgrades above are transparent to users, software stacks such as MPI, the compilers, and the OS/run-time libraries are new. You are therefore advised to recompile your codes to avoid unforeseen problems at run time. The codes provided by the system have already been recompiled with the new software stack.

TOP

Hoffman2 Cluster Outage May 27 2009

The Hoffman2 cluster will be down for system and storage upgrades from 8:00AM to 8:00PM on Wednesday, May 27, 2009. You will be unable to log in to or access the cluster during this period. The cluster will be upgraded to CentOS 5.3 with many additional application updates, a faster Grid Engine scheduler, and more.

The maximum time allotted to cluster queues will decrease daily as the outage date nears to prevent unexpected job termination. All jobs still running under the Sun Grid Engine, for example jobs you submitted through UCLA Grid or with a command-line queue script, will be terminated when the cluster is taken down. You will need to resubmit your jobs when the cluster comes back.

TOP

Spring 2009 Parallel Computing Classes

We are pleased to announce the IDRE high performance computing workshops for this quarter. There are three classes:

  • Introduction to Parallel Computing

    Monday, Apr 27, 10am-noon
    5628 Math Sciences (Visualization Portal)

  • Parallel Programming using MPI

    Monday, May 4, 10am-noon
    5628 Math Sciences (Visualization Portal)

  • Parallel Computing Lab

    Monday, May 11, 1-3pm
    320B Powell (CLICC Classroom B)

To sign up, use: http://idre.ucla.edu or http://www.ats.ucla.edu/cfapps/events/classes/schedule.cfm

Description of class content is available at http://www.ats.ucla.edu/classes/classdesc.htm#hpc

TOP

GAMESS application Update

The current GAMESS version on Hoffman2 Cluster is "12 JAN 2009 (R1)" and is available in the /u/local/apps/gamess/current directory.

GAMESS is a program for ab initio molecular quantum chemistry. A wide range of quantum chemical computations are possible using GAMESS.

See How to Run GAMESS on ATS-Hosted Clusters.

GAMESS Documentation is available from http://www.msg.chem.iastate.edu/gamess/documentation.html

TOP

CPMD Application Update

The current CPMD version on Hoffman2 Cluster is 3.13.2 and is available in the /u/local/apps/cpmd/current/parallel/ib directory.

CPMD (Car-Parrinello Molecular Dynamics) is an ab initio Electronic Structure and Molecular Dynamics Program. The CPMD code is a parallelized plane wave/pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio Molecular Dynamics simulation.

See How to Run CPMD on ATS-Hosted Clusters.

Point your browser at http://www.cpmd.org/ for further information about CPMD and the CPMD consortium.

TOP

FFTW upgrade

The FFTW library has been upgraded to version 3.2. It is installed on the Hoffman2 cluster under the following directory:

/u/local/apps/fftw3/current

FFTW is a C subroutine library for computing the Discrete Fourier Transform (DFT) in one or more dimensions, of both real and complex data of arbitrary size. There are three versions of the fftw3 library depending on precision.

See How to Use the FFTW Library on ATS-Hosted Clusters. For FFTW documentation, see http://www.fftw.org/fftw3_doc/
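
As a rough sketch of how the three precisions are usually selected at link time: FFTW 3 conventionally names its libraries libfftw3f (single), libfftw3 (double), and libfftw3l (long double). The include and lib subdirectories, the icc compile line, and the source file my_fft.c below are assumptions for illustration, not confirmed Hoffman2 settings.

```shell
#!/bin/sh
# Install prefix from the announcement; the include/ and lib/
# subdirectories under it are assumed, not confirmed.
FFTW_HOME=/u/local/apps/fftw3/current

# Map a precision name to the conventional FFTW 3 link flag.
fftw_lib_flag() {
    case "$1" in
        single) echo "-lfftw3f" ;;   # float
        double) echo "-lfftw3" ;;    # double
        long)   echo "-lfftw3l" ;;   # long double
        *)      echo "unknown precision: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) a compile line for a hypothetical my_fft.c.
fftw_compile_line() {
    flag=$(fftw_lib_flag "$1") || return 1
    echo "icc my_fft.c -I$FFTW_HOME/include -L$FFTW_HOME/lib $flag -lm -o my_fft"
}

fftw_compile_line double
```

The function only prints the command, so it can be inspected and adjusted before anything is compiled.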

TOP

HDF5 upgrade

HDF5 has been upgraded to version 1.8.2. HDF5 (Hierarchical Data Format 5) Software Library and Utilities are a suite of data model, file format, API, library, and tools that makes possible the management of extremely large and complex data collections. HDF5 is installed at:

/u/local/apps/hdf5/current

See How to Use the HDF and HDF5 Libraries on ATS-Hosted Clusters. For more information about HDF5, see http://www.hdfgroup.org/HDF5

TOP

Trilinos upgrade

Trilinos has been upgraded to version 9.0.1. It is installed on the Hoffman2 cluster under the following directory:

/u/local/apps/trilinos/current

Trilinos is a set of sophisticated software tools, containing more than 20 library packages. The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems.

See How to Use Trilinos on ATS-Hosted Clusters. For more information about Trilinos, see http://trilinos.sandia.gov/

TOP

ParaView upgrade

ParaView has been upgraded to version 3.4.0. See How to Run ParaView on ATS-Hosted Clusters.

ParaView is an open-source, multi-platform data analysis and visualization application. ParaView was developed to analyze extremely large datasets using distributed memory computing resources. See http://www.paraview.org for more information.

TOP

ATLAS upgrade

ATLAS has been upgraded to version 3.8.2. The ATLAS library is installed in: /u/local/apps/atlas/current

See How to Use the ATLAS Library on ATS-Hosted Clusters.

ATLAS (Automatically Tuned Linear Algebra Software) provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK. When the ATLAS library is built and installed, it automatically optimizes its performance for the system it is built on. In this case, the routines are optimized for the cluster compute nodes.

TOP

Intel compiler upgrade

The default Intel compiler on Hoffman2 is now version 11.0. For more information, see Compilers and the Documentation links on that page.

TOP

October Downtime for Hoffman2 Cluster

The Hoffman2 cluster will be down for an upgrade of the BlueArc file server and maintenance on Thursday, Oct 9, 2008 between 10:00 AM and 3:00 PM. All queues will be turned off 24 hours in advance. Any jobs running at that time will be terminated. We are sorry for any inconvenience this may cause you.

TOP

Fall 2008 Parallel Computing Classes

The Institute for Digital Research and Education (IDRE) is offering the following (free) high performance computing workshops this quarter:

  • Sep. 30, 2-4pm: Introduction to parallel computing
  • Oct. 7, 2-4pm: Parallel computing using MPI
  • Oct. 14, 2-4pm: Parallel computing hands-on session
  • Oct. 21, 2-4pm: Advanced topics of MPI (*)
  • Oct. 28, 2-4pm: HPC tools and libraries (*)

(*) New this quarter.

For more information or to sign up, please see
http://www.idre.ucla.edu/ (under "Upcoming Events")
http://www.ats.ucla.edu/cfapps/events/classes/schedule.cfm

TOP

ABAQUS upgrade

ABAQUS has been upgraded to version 6.8. ABAQUS is finite element modeling, visualization, and process automation software. For a description of ABAQUS, download the Abaqus 6.8 brochure pdf file from http://www.simulia.com/products/abaqus_cae.html

The Hoffman2 Cluster has licenses for the following ABAQUS features: abaqus aqua cae design euler_lagrange explicit foundation parallel standard

See How to Run ABAQUS on ATS-Hosted Clusters. ABAQUS documentation is not available on the web. To view ABAQUS documentation, at the shell prompt, enter:

abaqus doc

TOP

Tecplot upgrade

Tecplot 360 has been upgraded to version 2008 release 2. Tecplot 360 is numerical simulation and computational fluid dynamics (CFD) visualization software that combines engineering plotting with advanced data visualization. It has a graphical user interface and is not available in batch mode. It is available from the UCLA Grid interactive tab.

To use Tecplot 360, at the shell prompt, enter:

tec360

See How to Run Tecplot on ATS-Hosted Clusters. For more information, see www.tecplot.com/support/360/docs.aspx

TOP

PETSc available

ATS is pleased to announce that PETSc version 2.3.3-p13 has been installed on the Hoffman2 cluster under the following directory:

/u/local/apps/petsc/current

PETSc is a suite of data structures and routines for the scalable, parallel solution of scientific applications modeled by partial differential equations. It employs the MPI standard for all message-passing communication.

See How to Run PETSc on ATS-Hosted Clusters. For more information, see www-unix.mcs.anl.gov/petsc/petsc-as

TOP

Trilinos available

ATS is pleased to announce that Trilinos version 8.0.8 has been installed on the Hoffman2 cluster under the following directory:

/u/local/apps/trilinos/current

Trilinos is a set of sophisticated software tools, containing more than 20 library packages. The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems.

See How to Run Trilinos on ATS-Hosted Clusters. For more information, see trilinos.sandia.gov

TOP

HDF5 upgrade

HDF5 has been upgraded to version 1.8.1. HDF5 (Hierarchical Data Format 5) Software Library and Utilities are a suite of data model, file format, API, library, and tools that makes possible the management of extremely large and complex data collections.

HDF5 is installed at:

/u/local/apps/hdf5/current

See "How to Use the HDF and HDF5 Libraries on ATS-Hosted Clusters". For more information about HDF5, see http://hdf.ncsa.uiuc.edu/HDF5/

TOP

NetCDF upgrade

NetCDF has been upgraded to version 4.0. NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

NetCDF is installed at:

/u/local/apps/netcdf/current

See "How to Use the netCDF Library on ATS-Hosted Clusters". For more information about NetCDF, see http://www.unidata.ucar.edu/software/netcdf/

TOP

Application Cluster

Over the past few weeks, you may have noticed an increase in the number of slots available in the serial queue. This increase was temporary, while the Infiniband cards were on backorder for the 114 nodes being added to the cluster.

Adding these nodes to the serial queue allowed Hoffman2 users to get some use out of them and allowed us to test the nodes and ensure they were running properly before the Infiniband cards were added.

These nodes will be shut down on Tuesday, September 2, 2008 to have their Infiniband cards installed and will be added to the parallel queue that afternoon.

TOP

Hoffman2 Cluster Shutdown and Upgrade August 20, 2008

The Hoffman2 Cluster will be shut down for a short time between 7:00AM and 7:00PM on Wednesday, August 20th for a major hardware upgrade to the BlueArc file server.

All campus queues will be disabled on August 18th. Shared-cluster queues will only accept requests that will be finished before August 20th. All jobs running on the cluster will be terminated August 20th.

TOP

Hoffman2 Cluster Shutdown and Upgrade July 9-11, 2008

The Hoffman2 Cluster will be shut down between Wednesday, July 9 and Friday, July 11 to have additional electrical power installed. The BlueArc file server will also be reconfigured during this period.

This outage is extended to 11AM Friday, July 11th due to unforeseen problems encountered by UCLA Facilities during the power upgrade.

The power upgrade will allow us to more than double the size of the Hoffman2 cluster. We will be adding 115 nodes, or 920 cores, to the cluster. These nodes were purchased by 5 separate research groups for inclusion in the Shared Cluster portion of Hoffman2.

TOP

Stata10 available

The multi-processor version of Stata10 is now available on the Hoffman2 Cluster for both SGE batch and interactive applications. Stata10 and Xstata are available as serial applications through the UCLA Grid Portal.

Stata is a complete, integrated statistical package that provides a broad suite of statistical capabilities, complete data-management facilities, and publication-quality graphics.

Please see How to Run Stata on ATS-Hosted Clusters for detailed information.

Stata manuals are not available online. They may be borrowed from ATS Statistical Consulting.

TOP

GNU Scientific Library upgraded to version 1.11

The GNU Scientific Library has been upgraded to version 1.11, which is now the default version of GSL on the Hoffman2 Cluster. See GNU Scientific Library for information on how to run GSL on the Hoffman2 Cluster.

The old version 1.9 will remain available for a brief period.

TOP

OpenMPI 1.2.5 becomes the default MPI library

OpenMPI version 1.2.5 is now the default MPI library on the Hoffman2 Cluster, installed at: /u/local/mpi/openmpi/current

As before, the default MPI compilers (mpiCC, mpic++, mpicc, mpicxx, mpif77, mpif90) are based on the corresponding Intel compilers (icc, icpc, ifort). The mpi.q script will use OpenMPI. The mvapich2.q script will access the former default.
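
The correspondence between the wrappers and the Intel compilers can be sketched as follows. The mapping assumes the usual convention (mpicc for C, mpiCC/mpic++/mpicxx for C++, mpif77/mpif90 for Fortran), and the helper function is hypothetical. The --showme option is an Open MPI wrapper feature that prints the underlying command line; its output on Hoffman2 is not shown here because it depends on the installation.

```shell
#!/bin/sh
# Map an MPI wrapper name to the Intel compiler it is assumed to invoke.
underlying_compiler() {
    case "$1" in
        mpicc)               echo "icc" ;;    # C
        mpiCC|mpic++|mpicxx) echo "icpc" ;;   # C++
        mpif77|mpif90)       echo "ifort" ;;  # Fortran
        *) echo "unknown wrapper: $1" >&2; return 1 ;;
    esac
}

for w in mpicc mpicxx mpif90; do
    echo "$w -> $(underlying_compiler "$w")"
done

# Where an Open MPI wrapper is on the PATH, ask it what it would really run.
if command -v mpicc >/dev/null 2>&1; then
    mpicc --showme || echo "mpicc does not accept --showme (not Open MPI?)"
fi
```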

TOP

Hoffman2 Cluster Outage - Monday Apr 14, 2008

The Hoffman2 Cluster will be down for several hours on Monday, April 14th. You will be unable to log in to or access the cluster during this period. This outage is to upgrade the firmware on the BlueArc file server.

All jobs running under the Sun Grid Engine, for example jobs you submitted through UCLA Grid, or with a command-line queue script, will be terminated at that time. You will need to resubmit your jobs when the cluster comes back.

TOP

Gnuplot 4.2.3 available

A new version of gnuplot, 4.2.3, is now available. For information on how to run gnuplot, see How to Run gnuplot on ATS-Hosted Clusters.

Version 4.2 contains a ton of new features, support for several new output devices, and improved performance when plotting large data sets.

Major additions include:

  • Text strings can be read and manipulated as normal data
  • New interactive terminal based on wxWidgets, pango and cairo
  • New 2D plot styles 'histogram' 'labels' 'image' 'rgbimage'
  • New 3D plot styles 'labels' 'vectors' 'image' 'rgbimage'
  • User control over color definitions and color use in plots
  • Improved font handling and text formatting
  • New syntax to handle string variables and string functions
  • Creation of animated gif sequences
  • Support for UTF-8 and other multi-byte font encodings
  • Japanese language documentation and internal help

Demo plots illustrating these and other features are online at:

http://gnuplot.sourceforge.net/demo_4.2/

(description from: http://www.gnuplot.info)

TOP

Hoffman2 Cluster Now Available

The new Hoffman2 Cluster is now installed and available for use.

TOP