==== Message Passing Interface (MPI): The Concept ====
----
The MPI interface provides essential virtual topology, synchronization, and communication functionality between a set of processes (mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, although programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or core in a multicore machine) is assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called **//mpirun//** or **//mpiexec//**.
Open MPI is one widely used implementation: http://www.open-mpi.org/software/ompi/v1.3/
If you are simply looking for how to run an MPI application, you probably want to use a command line of the following form:
shell$ mpirun [ -np X ] [ --hostfile <filename> ] <program>
This will run X copies of <program> in your current run-time environment (if running under a supported resource manager, Open MPI's mpirun will usually automatically use the corresponding resource manager process starter, as opposed to, for example, rsh or ssh, which require the use of a hostfile, or will default to running all X copies on the localhost), scheduling (by default) in a round-robin fashion by CPU slot. See the rest of this page for more details.
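As a concrete illustration (the hostnames and program name here are hypothetical), a run across two nodes with four slots each might look like:

$ cat myhostfile
node01 slots=4
node02 slots=4
$ mpirun -np 8 --hostfile myhostfile ./my_mpi_program

Under a supported resource manager (such as SGE, SLURM, or TORQUE, described below), the hostfile can usually be omitted, since mpirun discovers the allocated nodes automatically.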
=== Installation ===
----
$ wget http://www.open-mpi.org/software/ompi/v1.3/downloads/openmpi-1.3.3.tar.gz
$ tar xfz openmpi-1.3.3.tar.gz
$ cd openmpi-1.3.3
$ ./configure
$ make && make install
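To sanity-check the build (assuming the default install prefix, /usr/local, is on your PATH), you can try the wrapper compiler and the launcher directly:

$ mpicc --showme         # Open MPI wrapper: prints the underlying compiler command line
$ mpirun -np 2 hostname  # runs two copies of hostname on the local machine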
The performance of HPC environments is often measured in terms of FLoating point OPerations per Second (FLOPS).
==== Condor ====
----
Machines sit idle for long periods of time, often while their users are busy doing other things. **Condor takes this wasted computation time and puts it to good use.** The situation today matches that of yesterday, with clusters added to the list of resources; these machines are often dedicated to specific tasks. Condor manages a cluster's effort efficiently, as well as handling other resources.
http://www.cs.wisc.edu/condor/downloads-v2/download.pl
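As a minimal sketch (the executable and file names are hypothetical), a vanilla-universe job is described in a submit file and handed to Condor with condor_submit; condor_q then shows its progress through the queue:

$ cat my_job.submit
universe   = vanilla
executable = my_program
output     = my_program.out
error      = my_program.err
log        = my_program.log
queue
$ condor_submit my_job.submit
$ condor_q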
==== Sun Grid Engine (SGE) ====
----
SGE is typically used on a computer farm or high-performance computing (HPC) cluster and is responsible for **accepting, scheduling, dispatching, and managing the remote and distributed execution of large numbers of standalone, parallel or interactive user jobs**. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses.
http://wiki.gridengine.info/wiki/index.php/Main_Page
[[Basic usage of Grid Engine (commands)]]
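A rough sketch of day-to-day SGE usage (the script name and job id are illustrative; see the linked command reference for details):

$ qsub -cwd -N myjob job.sh  # submit job.sh, run it in the current directory, name it "myjob"
$ qstat                      # show pending and running jobs
$ qdel 42                    # remove job 42 from the queue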
==== SLURM: A Highly Scalable Resource Manager ====
----
SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
https://computing.llnl.gov/linux/slurm/
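A minimal sketch of the SLURM submit/monitor/cancel cycle (job.sh and the job id are illustrative):

$ sbatch job.sh    # queue the script as a batch job
$ squeue -u $USER  # list your pending and running jobs
$ scancel 42       # cancel job 42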
==== TORQUE Resource Manager ====
----
TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Department of Energy, and many other leading HPC organizations.
http://www.clusterresources.com/pages/products/torque-resource-manager.php
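TORQUE keeps the familiar PBS command set; a rough sketch (the resource request and script name are illustrative; note that the -l resource syntax differs from SGE's):

$ qsub -l nodes=2:ppn=4 job.sh  # request 2 nodes, 4 processors per node
$ qstat -a                      # show all jobs in the queue
$ pbsnodes -a                   # list the state of every compute node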
==== Platform LSF ====
----
[[platform_lsf|LSF]] is deployed as a resource manager for the HPC cluster alongside SGE.
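For comparison with the schedulers above, a minimal sketch of LSF job handling (the program name and job id are illustrative):

$ bsub -n 4 -o out.%J ./my_program  # submit on 4 slots; %J expands to the job ID
$ bjobs                             # list your jobs
$ bkill 42                          # kill job 42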