HPC Advisory Council Best Practices

The HPC-AI Advisory Council provides best practices that, through experience and research, have been shown to improve cluster and application productivity. The application best practices on this page cover a wide range of CPU architectures from over a decade of benchmarking, for possible application comparison.
Visit our HPC-AI Community for additional hands-on procedures and discussions.

Latest updates:

Cases: Abaqus

The Abaqus software suite consists of three core products: Abaqus/Standard, Abaqus/Explicit and Abaqus/CAE. Abaqus/Standard is a general-purpose solver using a traditional implicit integration scheme to solve finite element analyses. Abaqus/Explicit uses an explicit integration scheme to solve highly nonlinear transient dynamic and quasi-static analyses. Abaqus/CAE provides an integrated modeling (preprocessing) and visualization (post-processing) environment for the analysis products. Abaqus is used in the automotive, aerospace, and industrial product industries. The product is popular with academic and research institutions due to the wide material modeling capability, and the program's ability to be customized. Abaqus also provides a good collection of multiphysics capabilities, such as coupled acoustic-structural, piezoelectric, and structural-pore capabilities, making it attractive for production-level simulations where multiple fields need to be coupled. The presentation provides information on Abaqus performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases:  ABySS

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, ABySS (Assembly By Short Sequences) was developed. ABySS is a parallel, paired-end sequence assembler designed for short reads, capable of assembling larger genomes and implemented using MPI. ABySS was developed at Canada's Michael Smith Genome Sciences Centre. The presentation provides information on ABySS performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases:  AcuSolve

AcuSolve is a leading general-purpose finite element-based Computational Fluid Dynamics (CFD) flow solver with superior robustness, speed, and accuracy. AcuSolve can be used by designers and research engineers with all levels of expertise, either as a standalone product or seamlessly integrated into a powerful design and analysis application. With AcuSolve, users can quickly obtain quality solutions without iterating on solution procedures or worrying about mesh quality or topology. The presentation provides information on AcuSolve performance and scalability, optimization options and profiling.

Cases:  Amber

Amber refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs) and a package of molecular simulation programs which includes source code and demos. The current version of the code is Amber version 11, which is distributed by UCSF. Amber is one of the most widely used programs for biomolecular studies, with an extensive user base. It is used for classical molecular dynamics simulations (NVT, NPT, etc.), force fields for biomolecular simulations, combined Quantum Mechanics/Molecular Mechanics (QM/MM) implementations and more. The presentation reviews performance, profiling and optimization techniques for Amber.

Cases:  AMG

AMG is a parallel algebraic multi-grid solver for linear systems on unstructured grids. The AMG2013 driver provides linear systems for various 3D problems. AMG2013 is written in ISO-C. It is an SPMD code which uses MPI and OpenMP threading within MPI tasks. Parallelism is achieved by data decomposition, subdividing the grid into logical P x Q x R (in 3D) chunks of equal size. In 2017, the Lawrence Livermore National Laboratory (LLNL) updated the AMG version on its GitHub.
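
The P x Q x R decomposition described above can be illustrated with a minimal sketch. The function names, the rank ordering (P varies fastest) and the per-task chunk size nx x ny x nz are assumptions made for illustration; they are not taken from the AMG2013 source.

```python
def rank_to_coords(rank, P, Q, R):
    """Map a linear task rank to (p, q, r) coordinates in a P x Q x R task grid.

    Assumed ordering: p varies fastest, then q, then r.
    """
    if not 0 <= rank < P * Q * R:
        raise ValueError("rank out of range")
    p = rank % P
    q = (rank // P) % Q
    r = rank // (P * Q)
    return p, q, r


def local_block(rank, P, Q, R, nx, ny, nz):
    """Return the half-open global index ranges of the equal-size chunk owned by a rank."""
    p, q, r = rank_to_coords(rank, P, Q, R)
    return (
        (p * nx, (p + 1) * nx),
        (q * ny, (q + 1) * ny),
        (r * nz, (r + 1) * nz),
    )


# Example: 8 tasks arranged as 2 x 2 x 2, each owning a 10 x 10 x 10 chunk.
block_of_rank_5 = local_block(5, 2, 2, 2, 10, 10, 10)
```

Because every chunk has the same size, each task's share of the grid follows from its rank alone, without any communication.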

Cases: AMR - Adaptive Mesh Refinement and MiniAMR - 3D stencil calculation with Adaptive Mesh Refinement (AMR)

AMR is a collection of three applications for solving a wide variety of problems that benefit from grids with adaptive, inhomogeneous spatial resolution. AMR is the product of the Center for Computational Sciences and Engineering at Lawrence Berkeley National Laboratory.
This particular benchmark makes use of the HyperClaw application for solving a gas dynamics problem; it is written primarily in C++. The presentation provides information on AMR performance, potential optimizations, compilers and profiling. We would like to acknowledge the DoD (Department of Defense) High Performance Computing Modernization Program for providing access to the FY 2009 benchmark suite and John Bell from Lawrence Berkeley Laboratory for developing the application.

MiniAMR is a mini application for 3D stencil calculation with Adaptive Mesh Refinement (AMR).

Cases: b_eff

The effective bandwidth b_eff measures the accumulated bandwidth of the communication network of parallel and/or distributed computing systems. Several message sizes, communication patterns and methods are used. The algorithm uses an average to take into account that short and long messages are transferred with different bandwidth values in real applications.

For more details of the test, refer to here.

Cases: BiFrost

The BiFrost application is used for simulating stellar atmospheres. To understand the details of the atmosphere it is necessary to simulate the whole atmosphere, since the different layers interact strongly. These physical regimes are very diverse, and it takes a highly efficient, massively parallel numerical code to solve the associated equations. The code is not in the public domain; more high-level details on the application can be found here.

Cases: BQCD

BQCD (Berlin Quantum ChromoDynamics program) is a hybrid Monte-Carlo code that simulates Quantum Chromodynamics with dynamical standard Wilson fermions. The computations take place on a four-dimensional regular grid with periodic boundary conditions. The kernel of the program is a standard conjugate gradient solver with even/odd pre-conditioning. As a consequence all arrays are stored in an even/odd ordered fashion and the four indices are collapsed into a single one. The access to neighbours is handled by lists. The parallelization is done by a regular grid decomposition in the highest 3 dimensions. The values from the boundaries of the neighbouring processors are stored in the same array as the local values. The local values have indices 1, ..., volume/2. The boundary values have indices > volume/2. The memory for the arrays is dynamically allocated during initialization. Apart from rounding errors the program gives identical results for any grid decomposition (Author of the code: Dr. Hinnerk Stueben). The presentation reviews performance, profiling and optimization techniques for BQCD.
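
The even/odd ordering and neighbour lists described above can be sketched for a single process as follows. This is an illustrative layout only: the function names and the lexicographic site order are assumptions, boundary values from neighbouring processors are left out, and the real BQCD storage scheme may differ in detail.

```python
from itertools import product


def build_even_odd_layout(dims):
    """Collapse 4D site coordinates into a (parity, index) pair.

    Sites whose coordinate sum is even go into the even array, the rest into
    the odd array. Indices are 1-based and run from 1 to volume/2 per parity.
    """
    layout = {}
    counters = [0, 0]
    for site in product(*(range(n) for n in dims)):
        parity = sum(site) % 2
        counters[parity] += 1
        layout[site] = (parity, counters[parity])
    return layout


def build_neighbour_list(dims, layout):
    """List the (parity, index) of the nearest neighbours of every site.

    Periodic boundary conditions are applied in every dimension.
    """
    neighbours = {}
    for site, key in layout.items():
        entries = []
        for mu in range(len(dims)):
            for step in (1, -1):
                shifted = list(site)
                shifted[mu] = (shifted[mu] + step) % dims[mu]
                entries.append(layout[tuple(shifted)])
        neighbours[key] = entries
    return neighbours


# Example: a small 4 x 4 x 4 x 4 lattice (volume 256, 128 sites per parity).
dims = (4, 4, 4, 4)
layout = build_even_odd_layout(dims)
neighbours = build_neighbour_list(dims, layout)
```

With even lattice extents, every neighbour of an even site is odd and vice versa, which is what allows the solver to work on one parity at a time.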

Cases: BSMBench

BSMBench is an open source supercomputer benchmarking tool derived from simulation code used for studying novel strong interactions in particle physics. Over traditional parallel benchmarking tools (e.g. Linpack), BSMBench has the advantage of being able to tune the ratio of communication over computation. Three examples are provided that show the performance of the system for a problem that is computationally dominated, a problem that is dominated by communication and a problem in which communication and computational requirements are balanced. The presentation provides information on BSMBench performance capabilities and the effect that different HPC cluster components (hardware and software) have on it.

Cases: CAM-SE

CAM-SE stands for Community Atmosphere Model – Spectral Element. It is widely used by climate scientists as the default atmospheric model in the Community Earth System Model (CESM). It is also used for climate projections by the Intergovernmental Panel on Climate Change (IPCC). CAM-SE is comprised of a dynamic core and a physics package. The dynamic core, called HOMME (High-Order Methods Modeling Environment), solves for wind, energy and mass, and models the stratified, compressible, hydrostatic Euler equations on the sphere with the added multi-scale physics representing climate-related processes. The application is parallelized with MPI and hybrid MPI/OpenMP.


Cases: CASTEP

CASTEP is a full-featured materials modelling code based on a first-principles quantum mechanical description of electrons. It uses the robust methods of a plane-wave basis set and pseudo-potentials. Using density functional theory, it can simulate a wide range of material properties including energetics, structure at the atomic level and vibrational properties. In particular it has a wide range of spectroscopic features that link directly to experiment, such as infra-red and Raman spectroscopies, NMR, and core level spectra. The code is developed by the Castep Developers Group (CDG), who are all UK-based academics.

Cases: CCSM 4.0

CCSM is a coupled climate model for simulating the earth's climate. CCSM is composed of four separate models: atmosphere (CAM4), ocean (POP2), land surface (CLM4) and sea-ice (CICE4). CCSM was developed in cooperation with NSF, DOE, NASA, and NCAR. The presentation provides information on CCSM productivity, how to optimize and compile the code for highest performance and efficiency, and the effect HPC cluster components (HW and SW) have on CCSM performance.

Cases: CESM

CESM (Community Earth System Model) is a coupled climate model for simulating the earth's climate system. It is composed of several models (atmosphere, ocean, land surface, sea-ice). CESM allows researchers to conduct fundamental research into the earth's past, present and future climate states. CESM1.0.3 supersedes CCSM4.0. The presentation provides a performance, scalability and profiling overview of CESM.

Cases: ChaNGa

The cosmological simulation framework "ChaNGa" is a collaborative project with Prof. Thomas Quinn (University of Washington: N-Body Shop) supported by the NSF. ChaNGa (Charm N-body GrAvity solver) is a code to perform collisionless N-body simulations and can perform cosmological simulations with periodic boundary conditions in comoving coordinates or simulations of isolated stellar systems. ChaNGa can include hydrodynamics using the Smooth Particle Hydrodynamics (SPH) technique. It uses a Barnes-Hut tree to calculate gravity, with hexadecapole expansion of nodes and Ewald summation for periodic forces. Timestepping is done with a leapfrog integrator with individual timesteps for each particle. ChaNGa uses the dynamic load balancing scheme of the Charm++ runtime system in order to obtain good performance on massively parallel systems.
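
As an illustration of the leapfrog timestepping mentioned above, here is a minimal kick-drift-kick sketch for a single particle in one dimension. It assumes one global timestep, whereas ChaNGa uses individual timesteps per particle, and the harmonic test force and all names are placeholders rather than anything from the ChaNGa code.

```python
import math


def leapfrog(x, v, accel, dt, n_steps):
    """Advance position x and velocity v with kick-drift-kick leapfrog.

    accel is a function returning the acceleration at a given position.
    """
    a = accel(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half kick
        x = x + dt * v_half            # drift
        a = accel(x)                   # force at the new position
        v = v_half + 0.5 * dt * a      # second half kick
    return x, v


# Example: unit harmonic oscillator (a = -x), integrated to t = 10.
x_end, v_end = leapfrog(1.0, 0.0, lambda x: -x, 0.01, 1000)
energy_end = 0.5 * (x_end ** 2 + v_end ** 2)  # should stay close to 0.5
```

The scheme is popular for gravitational N-body work because it is time-reversible and keeps the energy error bounded over long integrations.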

Cases:  CFX

ANSYS CFX is a high-performance, general purpose CFD program that has been applied to solve wide-ranging fluid flow problems. At the heart of ANSYS CFX is its advanced solver technology, the key to achieving reliable and accurate solutions quickly and robustly. The highly parallelized solver is the foundation for an abundant choice of physical models to capture virtually any type of phenomena related to fluid flow.

Cases: COSMO

COSMO stands for Consortium for Small-scale Modeling, which was formed in 1998 to develop, improve and maintain a non-hydrostatic limited-area atmospheric model. The COSMO model is a non-hydrostatic limited-area atmospheric prediction model that has been designed for both operational numerical weather prediction (NWP) and various scientific applications on the meso-β and meso-γ scale. COSMO is based on the primitive thermo-hydrodynamical equations describing compressible flow in a moist atmosphere. The model equations are formulated in rotated geographical coordinates and a generalized terrain-following height coordinate. The presentations describe COSMO profiling and performance, and suggest ways to improve performance.

Cases: CP2K

CP2K is a freely available (GPL) program, written in Fortran, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials. CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations; its sources are freely available and actively improved. It is therefore easy to give the code a try, and to make modifications as needed. The presentation includes performance benchmarks and profiling of CP2K.

Cases: (CPMD) Car-Parrinello Molecular Dynamics

Car-Parrinello Molecular Dynamics (CPMD) is an ab initio electronic structure and molecular dynamics (MD) simulation software that provides a powerful way to perform molecular dynamics simulations from first principles, using a plane wave/pseudopotential implementation of density functional theory. The CPMD code has been used to examine systems including protein active sites, liquid-surface interactions, and surface catalysts. The ability to examine interactions on the nanoscale makes this approach ideal for studying systems where chemical and biological interactions are critical. The presentation provides recommendations for improving CPMD performance, scalability, and productivity as measured in jobs per day.

Cases:  Dacapo

Dacapo is a total energy program based on density functional theory that uses a plane wave basis for the valence electronic states. It describes core-electron interactions with Vanderbilt ultrasoft pseudo-potentials and performs molecular dynamics / structural relaxation simultaneously. Dacapo is an open-source code, maintained by the Technical University of Denmark. The presentation provides information on Dacapo performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases:  Desmond

Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters. The code uses novel parallel algorithms and numerical techniques to achieve high-performance and accuracy on platforms containing a large number of processors, but may also be executed on a single computer. The presentation provides information on Desmond performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases:  DL-POLY

DL-POLY is a general purpose classical molecular dynamics simulation software, developed at Daresbury Laboratory by I.T. Todorov and W. Smith. The general design of DL_POLY provides scalable performance from a single processor workstation to a high performance parallel computer. It can be compiled as a parallel application code, provided an MPI2 instrumentation is available on the parallel machine. DL_POLY offers fully parallel I/O as well as a netCDF alternative (HDF5 library dependence) to the default ASCII trajectory file. It is supplied in source form under license. The presentation provides information on DL-POLY performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases: Eclipse Oil and Gas Reservoir Simulation

Reservoir simulation is a core technology used for most of the decisions undertaken in the upstream oil industry to predict plateau levels of fields, calculate the number of wells to be drilled, select well locations, estimate facility requirements, calculate reserves depletions, and design reservoir management strategies for recovering more oil and gas.

Schlumberger ECLIPSE reservoir simulation is one of the most used software solutions, allowing engineers to predict and manage fluid flow more efficiently, with greater insight and more accurate modeling. The presentation provides recommendations for improving Eclipse performance, scalability, and productivity as measured in jobs per day, by exploring and profiling the software on HPC clusters.

Cases: FLOW-3D

FLOW-3D is a powerful and highly accurate CFD software package that gives engineers valuable insight into many physical flow processes. FLOW-3D is a standalone, all-inclusive CFD package that includes an integrated GUI tying together components from problem setup to post-processing.


Cases: Fluent

Fluent is a leading CFD application from ANSYS that is used for solving fluid flow problems. The broad physical modeling capabilities of Fluent have been applied to industrial applications ranging from air flow over an aircraft wing to combustion in a furnace, from bubble columns to glass production, from blood flow to semiconductor manufacturing, from clean room design to wastewater treatment plants.

Cases: GADGET-2

GADGET-2 is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory. GADGET uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use. GADGET computes gravitational forces with a hierarchical tree algorithm (optionally in combination with a particle-mesh scheme for long-range gravitational forces) and represents fluids by means of smoothed particle hydrodynamics (SPH). The code can be used for studies of isolated systems, or for simulations that include the cosmological expansion of space, either with or without periodic boundary conditions. In all these types of simulations, GADGET follows the evolution of a self-gravitating collisionless N-body system, and allows gas dynamics to be optionally included. Both the force computation and the time stepping of GADGET are fully adaptive, with a dynamic range which is, in principle, unlimited. The presentation provides information on GADGET performance, profiling and means for optimization.

Cases: Graph500

The Graph500 is a ranking of supercomputer systems focused on data-intensive workloads. The Graph500 benchmark is based on a breadth-first search (BFS) in a large undirected graph, generated from a Kronecker graph model with an average degree of 16. The benchmark contains two computation kernels: the first kernel generates the graph and compresses it into a sparse structure (CSR or CSC); the second kernel performs a parallel BFS from some random vertices.
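
The two kernels described above can be sketched in a few lines. This is a serial illustration on a tiny hand-written edge list, not the Graph500 reference code: the Kronecker generator and the parallel search are omitted, and the function names are chosen for this example only.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Kernel 1 sketch: compress an undirected edge list into CSR arrays."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # row_ptr[i] .. row_ptr[i + 1] delimits the neighbours of vertex i.
    row_ptr = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        row_ptr[i + 1] = row_ptr[i] + degree[i]
    col_idx = [0] * row_ptr[-1]
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for u, v in edges:
        col_idx[next_slot[u]] = v
        next_slot[u] += 1
        col_idx[next_slot[v]] = u
        next_slot[v] += 1
    return row_ptr, col_idx


def bfs_parents(row_ptr, col_idx, root):
    """Kernel 2 sketch: serial BFS returning a parent array (-1 = unreached)."""
    parent = [-1] * (len(row_ptr) - 1)
    parent[root] = root
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


# Example: 5 vertices, vertex 4 is isolated.
row_ptr, col_idx = build_csr(5, [(0, 1), (1, 2), (0, 3)])
parents = bfs_parents(row_ptr, col_idx, 0)
```

CSR keeps each vertex's neighbours contiguous in memory, so the inner BFS loop is a simple scan over a slice of `col_idx`.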

Cases: GRID

GRID is a new physics code base. It is a data parallel C++ mathematical object library, developed by Peter Boyle, Guido Cossu, Antonin Portelli and Azusa Yamaguchi at the University of Edinburgh. It is written in C++ and makes extensive use of templates to allow for high-level abstractions.


Cases: GROMACS

GROMACS (GROningen MAchine for Chemical Simulations) is a molecular dynamics simulation package, primarily designed for biochemical molecules like proteins, lipids and nucleic acids. There is ongoing development to extend GROMACS with interfaces both to Quantum Chemistry and Bioinformatics/databases. GROMACS is open source software released under the GPL. The presentation provides information on GROMACS performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases: Himeno

The Himeno benchmark was created by Dr. Ryutaro Himeno, Director of the Advanced Center for Computing and Communication, RIKEN, Japan. The goal of the benchmark is to evaluate the performance of incompressible fluid analysis code. The benchmark program measures the time taken by the major loops in solving Poisson's equation using the Jacobi iteration method. Because the code is very simple and easy to compile and execute, users can measure actual speed (in MFLOPS) immediately. The presentation provides information on the benchmark performance, profiling and scalability.
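
To make the Jacobi iteration concrete, here is a minimal two-dimensional sketch with a five-point stencil and zero boundary values. It is not the Himeno kernel itself; the grid size, the test problem and the function name are assumptions chosen so that the result can be checked against a known solution.

```python
import math


def jacobi_poisson_2d(f, h, n_iters):
    """Run Jacobi sweeps for -laplace(u) = f with u = 0 on the boundary.

    f is an n x n grid of right-hand-side values and h is the grid spacing.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(n_iters):
        new = [row[:] for row in u]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Each point becomes the average of its four neighbours
                # plus the scaled source term, using only old values.
                new[i][j] = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    + h * h * f[i][j]
                )
        u = new
    return u


# Test problem: sin(pi x) sin(pi y) is an eigenfunction of the discrete
# operator, so the Jacobi iterates converge to it exactly.
n = 9
h = 1.0 / (n - 1)
lam = (8.0 / (h * h)) * math.sin(math.pi * h / 2.0) ** 2
exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
          for j in range(n)] for i in range(n)]
f = [[lam * exact[i][j] for j in range(n)] for i in range(n)]
u = jacobi_poisson_2d(f, h, 300)
max_error = max(abs(u[i][j] - exact[i][j]) for i in range(n) for j in range(n))
```

Every update in a sweep depends only on values from the previous sweep, which is why the method is easy to parallelize and why its speed is dominated by memory access.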

Cases: HIT3D

HIT3D is a pseudo-spectral DNS code for simulation of homogeneous isotropic incompressible turbulence in 3-dimensional space. It aspires to be a standard code for DNS of isotropic homogeneous turbulence in a triple-periodic box. HIT3D is built on an MPI framework and has been tested with Open MPI and MVAPICH on a variety of NSF and DOE clusters. It uses FFTW3 for Fourier transforms and supports Lagrangian particles (tracers) to gather Lagrangian statistics. HIT3D can also perform Large-Eddy Simulation (LES), with several models implemented. The code is released under the GNU Public License.

Cases: HOOMD-blue

HOOMD-blue stands for Highly Optimized Object-oriented Many-particle Dynamics -- Blue Edition. It performs general purpose particle dynamics simulations on a single workstation, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many processor cores on a fast cluster. Object-oriented design patterns are used in HOOMD-blue so that it is versatile and expandable. Various types of potentials, integration methods and file formats are currently supported, and more are added with each release. The code is available free and open source, so anyone can write a plugin or change the source to add functionality. The HOOMD-blue development effort is led by the Glotzer group at the University of Michigan. Many groups from different universities have contributed code that is now part of the HOOMD-blue main package.

Cases: HPCC

HPC Challenge (HPCC) is a benchmark suite that measures a range of memory access patterns. It consists of seven benchmarks. HPL (High Performance LINPACK) measures the floating point rate of execution for solving a linear system of equations. DGEMM measures the floating point rate of execution of double precision (DP) matrix-matrix multiplication. STREAM is a simple synthetic benchmark program that measures sustainable memory bandwidth. PTRANS (parallel matrix transpose) exercises the communications where pairs of processors communicate with each other simultaneously; it is a useful test of the total communications capacity of the network. RandomAccess measures the rate of integer random updates of memory (GUPS). FFT (Fast Fourier Transform) measures the floating point rate of execution of a DP complex one-dimensional Discrete Fourier Transform (DFT). Communication bandwidth and latency is a set of tests that measures the latency and bandwidth of a number of simultaneous communication patterns; it is based on b_eff (the effective bandwidth benchmark).

Cases: HPCG

HPCG is a High Performance Preconditioned CG solver benchmark that performs a finite number of symmetric Gauss-Seidel preconditioned conjugate gradient iterations using double precision floating point values. The benchmark is composed of computations and data access patterns more commonly found in scientific applications, and the expectation is to drive computer system design and implementation in directions that will better impact performance improvement.

Cases: HYCOM

HYCOM is a primitive equation ocean general circulation model, evolved from the Miami Isopycnic-Coordinate Ocean Model. HYCOM provides the capability of selecting several different vertical mixing schemes for the surface mixed layer and the comparatively weak interior diapycnal mixing.

HYCOM is open source and jointly developed by the University of Miami, Los Alamos National Laboratory, and the Naval Research Laboratory of Physics. The presentation provides information on performance optimization and application profiling. The Council would like to thank the DoD High Performance Computing Modernization Program for providing the application and benchmark cases.

Cases: ICON

ICON (ICOsahedral Non-hydrostatic General Circulation Model) is a new development initiated by the Max Planck Institute for Meteorology (MPI-M) and the Deutscher Wetterdienst (DWD). The goal of ICON is to develop a new generation of general circulation models for the atmosphere and the ocean in a unified framework. The ICON dynamical core solves the fully compressible non-hydrostatic equations of motion for simulations at very high horizontal resolution. The discretization of the continuity and tracer transport equations will be consistent so that the mass of air and its constituents is conserved, which is a requirement for atmospheric chemistry. Furthermore, the vector invariant form of the momentum equation will be used, and thus vorticity dynamics will be emphasized. The presentations provide performance profiling for ICON, as well as considerations for performance optimization.

Cases: Lattice QCD

Lattice Quantum Chromodynamics (QCD) calculations address fundamental problems in particle and nuclear physics with large-scale computer calculations. The presentation includes performance evaluation and MPI profiling information.

Cases:  LAMMPS

LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS is distributed as an open source code under the terms of the GPL. LAMMPS is distributed by Sandia National Laboratories, a US Department of Energy laboratory. Funding for LAMMPS development has come primarily from DOE (OASCR, OBER, ASCI, LDRD, Genomes-to-Life). The presentation provides information on LAMMPS performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases: (LS-DYNA) Automotive Crash Simulation

One of the most demanding applications of automotive design is crash simulation (full-frontal, offset-frontal, angle-frontal, side-impact, rear-impact and more). Crash simulations, while performed very early in the development process, are validated very late in the development process once the vehicle is completely built. The more sophisticated and complex the simulation, the more parts and details can be analyzed. Automotive makers increasingly depend on car crash simulations throughout the design process while reducing the need for real prototypes, thus achieving faster time to market with less cost associated with the design phase.

LS-DYNA is a general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems. It is widely used in the automotive industry for crashworthiness, occupant safety and metal forming, and also for aerospace, military and defense, and consumer products. The presentation provides a deep analysis of LS-DYNA performance and scalability on HPC clusters, and provides recommendations for improving its productivity.

Cases: MetaComp ICFD++

MetaComp ICFD++ is a part of MetaComp's CFD software suite. ICFD++ can be used to simulate compressible and incompressible fluids and flows, unsteady and steady flows, a large range of speed regimes including low speeds through subsonic, transonic, supersonic and hypersonic speeds, laminar and turbulent flows, various equations of state and more. For more info, see

Cases: miniFE

miniFE is a Finite Element mini-application which implements kernels that are representative of implicit finite-element applications. The application assembles a sparse linear system from the steady-state conduction equation on a brick-shaped problem domain of linear 8-node hex elements. It then solves the linear system using a simple un-preconditioned conjugate-gradient (CG) algorithm.
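
The un-preconditioned CG solve mentioned above can be sketched as follows. This is a textbook CG iteration in plain Python, applied to a small 1D Laplacian chosen as a stand-in matrix; the assembly of the finite-element system is not shown, and the function names are assumptions for this example.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iters=1000):
    """Solve A x = b for a symmetric positive definite A given as matvec."""
    x = [0.0] * len(b)
    r = b[:]            # residual b - A x for the initial guess x = 0
    p = r[:]            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iters):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) matrix without storing it."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


# Example: recover a known solution from its right-hand side.
x_true = [float(i + 1) for i in range(10)]
b = laplacian_1d(x_true)
x = conjugate_gradient(laplacian_1d, b)
```

Only matrix-vector products, dot products and vector updates are needed, which is why these kernels are the ones a mini-application of this kind exercises.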

Cases: MILC (MIMD Lattice Computation)

MILC is a QCD code developed by the MIMD Lattice Computation (MILC) collaboration. MILC performs large scale numerical simulations to study quantum chromodynamics (QCD), the theory of the strong interactions of subatomic physics. MILC is publicly available for research purposes and distributed under the GNU General Public License.

Cases: MSC Nastran

MSC Nastran is a widely used Finite Element Analysis (FEA) solver that is being used for simulating stress, dynamics, or vibration of real-world complex systems. Nearly every spacecraft, aircraft, and vehicle designed in the last 40 years has been analyzed using MSC Nastran. The presentation provides information on MSC Nastran performance and efficiency, ways for productivity optimizations and the effect HPC cluster components (HW and SW) have on MSC Nastran performance.

Cases: MrBayes

MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees. The presentation gives a summary of MrBayes performance, scalability, profiling and ways for optimizations.
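
The idea of approximating posterior probabilities by Markov chain Monte Carlo can be shown with a toy Metropolis sampler over three candidate trees. The tree labels and their unnormalized posterior weights are invented for illustration, and the uniform proposal is a simplification; this is not how MrBayes proposes or scores trees.

```python
import random


def metropolis_tree_sampler(unnormalized_posterior, n_steps, seed=0):
    """Estimate posterior probabilities by visiting states with Metropolis MCMC.

    unnormalized_posterior maps each candidate tree to a positive weight
    proportional to its posterior probability.
    """
    rng = random.Random(seed)
    trees = list(unnormalized_posterior)
    current = trees[0]
    counts = {t: 0 for t in trees}
    for _ in range(n_steps):
        proposal = rng.choice(trees)  # symmetric (uniform) proposal
        ratio = unnormalized_posterior[proposal] / unnormalized_posterior[current]
        if rng.random() < ratio:      # accept with probability min(1, ratio)
            current = proposal
        counts[current] += 1
    return {t: c / n_steps for t, c in counts.items()}


# Hypothetical weights; the true posterior is 0.6, 0.3 and 0.1.
weights = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
estimate = metropolis_tree_sampler(weights, 50000)
```

The sampler only ever uses ratios of weights, so the normalizing constant that makes the posterior impossible to compute analytically is never needed.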

Cases: (MM5) The Fifth-Generation Mesoscale Model

The Fifth-Generation NCAR / Penn State Mesoscale Model (MM5) is a limited-area, nonhydrostatic or hydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale and regional-scale atmospheric circulation. It has been developed at Penn State and NCAR as a community mesoscale model. Mesoscale Meteorology is the study of weather systems smaller than synoptic scale systems but larger than microscale and storm-scale cumulus systems (horizontal dimensions generally range from around 5 kilometers to several hundred kilometers).

The presentation provides information on MM5 performance capabilities and the effect of different HPC cluster components (HW and SW) on it, as well as a power-aware usage model.

Cases: (MPQC) The Massively Parallel Quantum Chemistry Program

MPQC computes properties of atoms and molecules from first principles using the time-independent Schrödinger equation. The presentation includes performance comparisons between different interconnect technologies, productivity results, power-aware simulations and performance results with CPU frequency scaling.

Cases: (NAMD) Molecular Dynamics

NAMD is a parallel molecular dynamics code designed for high-performance simulations of large biomolecular systems and scales to hundreds of processors on high-end parallel platforms. NAMD was developed by the joint collaboration of the Theoretical and Computational Biophysics Group (TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at Urbana-Champaign, and is distributed free of charge with source code. The presentation provides information on NAMD performance capabilities and the effect of different HPC cluster components (HW and SW) on NAMD.

Cases: Nekbone

Nekbone captures the basic structure and user interface of the extensive Nek5000 software, which is a high order, incompressible Navier-Stokes solver based on the spectral element method. Nekbone, on the other hand, solves a Helmholtz equation in a box, using the spectral element method. It solves a standard Poisson equation using a conjugate gradient (CG) iteration with a simple preconditioner on a block or linear geometry. It also exposes the principal computational kernel to reveal the essential elements of the algorithmic-architectural coupling that is pertinent to Nek5000.

Cases: NEMO

NEMO (Nucleus for European Modeling of the Ocean) is a state-of-the-art modeling framework for oceanographic research, operational oceanography, seasonal forecast and climate studies. NEMO includes 4 major components: the blue ocean (ocean dynamics, NEMO-OPA), the white ocean (sea-ice, NEMO-LIM), the green ocean (biogeochemistry, NEMO-TOP) and the adaptive mesh refinement software (AGRIF). NEMO is used by a large community: 240 projects in 27 countries (14 in Europe, 13 elsewhere), 350 registered users (numbers for year 2008). The presentation provides performance and optimization options for NEMO.

Cases: NEMO5

NEMO5 is the 5th edition of NanoElectronics MOdeling (NEMO) Tools of the Klimeck group. NEMO5 incorporates concepts and insights from the development of NEMO-1D, NEMO-3D, NEMO-3D-Peta and OMEN. The core capabilities lie in the atomic-resolution calculation of nanostructure properties. NEMO5 supports strain relaxation, phonon modes, electronic structure using the tight-binding model, self-consistent Schrödinger-Poisson calculations, and quantum transport.

Cases: NWChem

NWChem is a computational chemistry package developed by the Molecular Sciences Software group of the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). NWChem provides many methods to compute the properties of molecular and periodic systems using standard quantum mechanical descriptions of the electronic wavefunction or density. NWChem has the capability to perform classical molecular dynamics and free energy simulations and these approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations. The presentation includes performance data, comparisons between mathematical libraries, and power/performance results.

Cases: Octopus

Octopus is a pseudopotential real-space package aimed at the simulation of the electron-ion dynamics of one-, two-, and three-dimensional finite systems subject to time-dependent electromagnetic fields. The program is based on time-dependent density-functional theory (TDDFT) in the Kohn-Sham scheme. All quantities are expanded in a regular mesh in real space, and the simulations are performed in real time. The program has been successfully used to calculate linear and non-linear absorption spectra, harmonic spectra, laser induced fragmentation, etc. of a variety of systems. The presentation will review Octopus performance and ways to optimize its scalability.

Cases: OpenAtom

OpenAtom is a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method. OpenAtom is written using the Charm++ parallel programming framework. It runs on a variety of architectures such as PowerPC, Opteron and Intel-based systems. The presentation provides information on OpenAtom productivity, how to optimize and compile the code for highest performance and efficiency, and the effect HPC cluster components (HW and SW) have on OpenAtom performance.

Cases: OpenFOAM

OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox is a free, open source CFD software package produced by a commercial company, OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organizations. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics. The presentation provides information on OpenFOAM performance, how to optimize and compile the code for highest performance and efficiency, and the effect HPC cluster components (HW and SW) have on OpenFOAM performance.

Cases: OpenMX

OpenMX (Open source package for Material eXplorer) is a software package for nano-scale material simulations based on density functional theories (DFT), norm-conserving pseudopotentials, and pseudo-atomic localized basis functions. Since the code is designed for the realization of large-scale ab initio calculations on parallel computers, it is anticipated that OpenMX can be a useful and powerful tool for nano-scale material sciences in a wide variety of systems such as bio-materials, carbon nanotubes, magnetic materials, and nanoscale conductors. The presentation provides information on OpenMX performance, profiling and ways to optimize it.

Cases: OptiStruct

Altair® OptiStruct® is an industry-proven, modern structural analysis solver for linear and non-linear structural problems under static and dynamic loadings. It is the market-leading solution for structural design and optimization. It helps designers and engineers analyze and optimize structures for strength, durability and NVH (Noise, Vibration, Harshness) characteristics, and rapidly develop innovative, lightweight and structurally efficient designs. It is based on finite-element and multi-body dynamics technology.


Cases: PARATEC

PARATEC (PARAllel Total Energy Code) performs ab-initio quantum-mechanical total energy calculations using pseudopotentials and a plane wave basis set. It is designed to run on massively parallel computing platforms and clusters and was developed through a joint collaboration between LBNL, the Université Pierre et Marie Curie, the University of Montreal and the University of Cambridge. The presentation provides information on PARATEC productivity, how to optimize and compile the code for highest performance and efficiency, and the effect HPC cluster components (HW and SW) have on PARATEC performance.

Cases: Pretty Fast Analysis (PFA)

PFA is a software suite for analyzing large-scale molecular dynamics (MD) simulation trajectory data. PFA reads either CHARMM or AMBER style topology/trajectory files as input, and its analysis routines can scale up to thousands of compute cores or hundreds of GPU nodes with either parallel or UNIX file I/O. PFA has dynamic memory management, and each code execution can perform a variety of different structural, energetic, and file manipulation operations on a single MD trajectory at once. The code is written in a combination of Fortran 90 and C, and its GPU kernels are written with NVIDIA's CUDA API to achieve maximum GPU performance. PFA is produced by research staff at the Temple University Institute for Computational Molecular Science. The presentation provides information on PFA performance and scalability.


Cases: PFLOTRAN

PFLOTRAN is an application for modeling multiscale-multiphase-multicomponent subsurface reactive flows using advanced computing. The presentation provides information on PFLOTRAN performance, scalability, optimization scenarios and the performance acceleration that MPI collective offload technologies can bring to PFLOTRAN.

Cases: Quantum ESPRESSO

Quantum ESPRESSO stands for Open Source Package for Research in Electronic Structure, Simulation, and Optimization. It is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves and pseudopotentials (both norm-conserving and ultrasoft). The presentation provides information on Quantum ESPRESSO productivity, how to optimize and compile the code for highest performance and efficiency, and the effect HPC cluster components (HW and SW) have on Quantum ESPRESSO performance.


Cases: RADIOSS

Altair® RADIOSS® is a leading structural analysis solver for highly non-linear problems under dynamic loadings. It is highly differentiated for scalability, quality and robustness, and consists of features for multiphysics simulation and advanced materials such as composites. RADIOSS is used across all industries worldwide to improve the crashworthiness, safety, and manufacturability of structural designs. The presentation includes profiling and performance characteristics of the application.

Cases: Relion

RELION (REgularized LIkelihood OptimizatioN) is an open-source program for the refinement of macromolecular structures by single-particle analysis of electron cryo-microscopy (cryo-EM) data. It implements an empirical Bayesian approach to the analysis of cryo-EM data and provides refinement methods for single or multiple 3D reconstructions as well as 2D class averages. RELION is an important tool in the study of living cells.

Cases: RFD tNavigator

tNavigator is developed by the research and product development teams of Rock Flow Dynamics. It is designed for running dynamic reservoir simulations on engineers’ laptops, servers, and HPC clusters. It is written in C++ and designed from the ground up to run parallel acceleration algorithms on multicore and many-core, shared- and distributed-memory computing systems. tNavigator employs Qt graphical libraries, which make the system truly multiplatform. By taking advantage of the latest computing technologies such as NUMA, Hyper-Threading, and MPI/SMP hybrids, tNavigator by far exceeds the performance of any industry-standard dynamic simulation tool.

Cases: SNAP

SNAP stands for SN (Discrete Ordinates) Application Proxy. It serves as a proxy application to model the performance of a modern discrete ordinates neutral particle transport application. SNAP is modeled on the LANL code PARTISN. PARTISN solves the Linear Boltzmann Transport Equation (TE), the governing equation for determining the number of neutral particles in a multi-dimensional phase space. SNAP mimics the computational workload, memory requirements, and communication patterns of PARTISN.


Cases: SPECFEM3D

SPECFEM3D simulates seismic wave propagation in sedimentary basins. It can be used to simulate seismic wave propagation in complex three-dimensional geological models, accounting for effects such as anisotropy, attenuation, fluid-solid interfaces, rotation and self-gravitation, as well as crustal and mantle models. The package is written in Fortran 90 and based on MPI. SPECFEM3D is open-source software developed by Dimitri Komatitsch at the University of Pau, France, the California Institute of Technology and Princeton University. The presentation provides information on SPECFEM3D performance capabilities and the effect that different HPC cluster components (HW and SW) have on it.

Cases: STAR-CCM+ and STAR-CD

CD-adapco is a leading global provider of full-spectrum engineering simulation (CAE) solutions for fluid flow, heat transfer and stress. CD-adapco's core products are the technology-leading simulation packages STAR-CCM+ and STAR-CD. STAR-CCM+ is an engineering-process-oriented Computational Fluid Dynamics tool that delivers the latest CFD technology in a single integrated environment. STAR-CD is an integrated platform for performing powerful multi-physics simulations, unrivalled in its ability to tackle problems involving multi-physics and complex geometries. The presentation provides information on both STAR-CD and STAR-CCM+ performance capabilities and the effect different HPC cluster components (HW and SW) have on them, as well as a power-aware and productivity-aware usage model.

Cases: VASP

VASP (Vienna Ab-initio Simulation Package) performs ab-initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set. The code is written in FORTRAN 90 with MPI support. Access to the code may be given by a request via the VASP website. The presentation provides information on VASP performance capabilities and the effect that different HPC cluster components (hardware and software) have on it.

The open64.diff file below contains the code changes for compiling VASP with the Open64 compiler. It works with Open64 4.5.2 compilers, ACML 5.2.0, and ScaLAPACK 2.0.2. There are some changes to the interface blocks of certain modules. The resulting build runs at performance comparable to that of the Intel compiler. To apply the changes, run a command like “patch -p1 < open64.diff”.

The open64.makefile.diff file is an example of the Makefile used to build VASP version 5.2.2. Modify the directory paths in the Makefile, then run “make” to compile.

VASP Performance Benchmarks and Profiling:

Cases: Virtual Performance Solution (VPS)

VPS is a software package developed by the ESI Group. The product originated from the well-known CAE modeling application PAM-CRASH. VPS is used primarily in the automotive industry, for crash simulation and the design of occupant safety systems. It simulates the performance of a proposed vehicle design and evaluates the potential for injury to occupants in multiple crash scenarios.

Cases: (WRF) The Weather Research and Forecast Model

The Weather Research and Forecast (WRF) Model is a fully functioning modeling system for atmospheric research and operational weather prediction communities. With an emphasis on efficiency, portability, maintainability, scalability and productivity, WRF has been successfully deployed over the years on a wide variety of HPC clustered compute nodes connected with high-speed interconnects, the most widely used system architecture for high-performance computing. As such, understanding WRF's dependency on the various clustering elements, such as the CPU, interconnects and the software libraries, is crucial for enabling efficient predictions and high productivity. Our results identify WRF’s communication-sensitive points and demonstrate WRF’s dependency on high-speed networks and fast CPU-to-CPU communication. Both factors are critical to maintaining scalability and increasing productivity when adding cluster nodes. We conclude with specific recommendations for improving WRF performance, scalability, and productivity as measured in jobs per day. Because proprietary hardware and software can quickly erode cluster architecture’s favorable economics, we have restricted our investigation to standards-based hardware and open source software readily available to typical research institutions.
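As a rough illustration of the jobs-per-day productivity metric mentioned above, the sketch below converts a job's wall-clock time into the number of such jobs that fit into 24 hours. It assumes jobs run back to back, one at a time, on the whole cluster; the function name and the runtimes are hypothetical and are not measured WRF results.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_day(wall_clock_seconds):
    """Number of identical jobs that complete in 24 hours when run back to back."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return SECONDS_PER_DAY / wall_clock_seconds


# Hypothetical wall-clock times (seconds) for the same job at two node counts.
hypothetical_runtimes = {8: 3600.0, 16: 2000.0}

for nodes, seconds in hypothetical_runtimes.items():
    print(f"{nodes} nodes: {jobs_per_day(seconds):.1f} jobs/day")
```

Comparing the metric across node counts shows whether adding nodes still pays off: doubling the nodes doubles jobs per day only if the wall-clock time halves.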

For questions or comments, please contact