Message Passing Interface (MPI): The Concept
The MPI interface provides essential virtual-topology, synchronization, and communication functionality between a set of processes (which have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few features that are language-specific. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or each core in a multicore machine) is assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.
http://www.open-mpi.org/software/ompi/v1.3/
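As a concrete illustration, here is a minimal MPI program in C (a sketch; the file name hello_mpi.c and the message text are made up for this example). Each process reports its rank within the MPI_COMM_WORLD communicator:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down the MPI runtime */
    return 0;
}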
If you are simply looking for how to run an MPI application, you probably want to use a command line of the following form:
shell$ mpirun [ -np X ] [ --hostfile <filename> ] <program>

This will run X copies of <program> in your current run-time environment, scheduling (by default) in a round-robin fashion by CPU slot. When running under a supported resource manager, Open MPI's mpirun will usually use the corresponding resource-manager process starter automatically; otherwise it falls back to rsh or ssh (which require the use of a hostfile) or defaults to running all X copies on the localhost. See the rest of this page for more details.
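Putting the two together: assuming the hello_mpi.c sketch above and a hostfile named myhosts (a made-up name) listing your machines, a typical session looks like this:

shell$ mpicc hello_mpi.c -o hello_mpi
shell$ mpirun -np 4 --hostfile myhosts ./hello_mpi

Each of the four processes prints its own rank; the order of the output lines is not deterministic.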
Installation
$ wget http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3.3.tar.gz
$ tar xfz openmpi-1.3.3.tar.gz
$ cd openmpi-1.3.3
$ ./configure
$ make && make install
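Note that installing into the default /usr/local prefix usually requires root privileges for the make install step. Once installed, the ompi_info utility that ships with Open MPI can confirm the version and build configuration:

$ ompi_info | head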
The capacity of HPC environments is often measured in terms of FLoating point OPerations per Second (FLOPS).
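As a rough, purely illustrative calculation: a node with 2 CPUs of 4 cores each, clocked at 2.5 GHz and retiring 4 floating-point operations per core per cycle, has a theoretical peak of 2 × 4 × 2.5×10^9 × 4 = 80 GFLOPS, so a 100-node cluster of such machines peaks at 8 TFLOPS. Sustained performance on real workloads is usually well below this theoretical peak.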
Condor
Machines sit idle for long periods of time, often while their users are busy doing other things. Condor takes this wasted computation time and puts it to good use. The same idea applies today, with clusters added to the list of resources: even though such machines are often dedicated to particular tasks, Condor manages a cluster's effort efficiently, as well as harnessing other, non-dedicated resources.
http://www.cs.wisc.edu/condor/downloads-v2/download.pl
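To give a feel for the workflow, here is a minimal submit description file (a sketch; the file name job.sub and the program name my_program are hypothetical) that queues one job in Condor's vanilla universe:

universe   = vanilla
executable = my_program
output     = my_program.out
error      = my_program.err
log        = my_program.log
queue

$ condor_submit job.sub
$ condor_q

condor_submit places the job in the queue, and condor_q shows its progress.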
Sun Grid Engine (SGE)
SGE is typically used on a computer farm or high-performance computing (HPC) cluster and is responsible for accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel, or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses.
http://wiki.gridengine.info/wiki/index.php/Main_Page
Basic usage of Grid Engine (commands)
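In day-to-day use, the basic commands look like this (a sketch; the script name myjob.sh and the job id 42 are made up):

$ qsub myjob.sh    # submit a batch script; SGE prints the assigned job id
$ qstat            # list your pending and running jobs
$ qdel 42          # remove job 42 from the queue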
SLURM: A Highly Scalable Resource Manager
SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
https://computing.llnl.gov/linux/slurm/
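The three functions map onto a handful of commands. A minimal batch script and session might look like this (a sketch; job.sh and my_app are hypothetical names):

#!/bin/bash
#SBATCH --nodes=2          # number of allocated nodes
#SBATCH --time=00:10:00    # wall-clock time limit
srun ./my_app              # launch the job step on the allocation

$ sbatch job.sh    # submit the script to the queue
$ squeue           # show pending and running jobs
$ scancel <jobid>  # cancel a job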
TORQUE Resource Manager
TORQUE is an open-source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Department of Energy, and other leading HPC organizations.
http://www.clusterresources.com/pages/products/torque-resource-manager.php
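Because TORQUE is based on PBS, job scripts use #PBS directives and are submitted with the familiar qsub command (a sketch; job.pbs and my_app are hypothetical names):

#!/bin/bash
#PBS -l nodes=2:ppn=4       # request 2 nodes with 4 processors per node
#PBS -l walltime=00:10:00   # wall-clock time limit
cd $PBS_O_WORKDIR           # jobs start in $HOME, so return to the submit dir
./my_app

$ qsub job.pbs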
Platform LSF
LSF (Load Sharing Facility) is a commercial workload and resource manager that fills the same role in HPC environments as SGE and the other batch systems described above.
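Its command set parallels the other batch systems (a sketch; myjob.sh is a hypothetical script name):

$ bsub < myjob.sh    # submit a job script (bsub reads it from stdin)
$ bjobs              # list your jobs
$ bkill <jobid>      # kill a job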