Category Archives: HPC

Announcing the HPC Advisory Council China Workshop 2011

The HPC Advisory Council will hold the 2011 China Workshop on October 25th, 2011, in conjunction with the HPC China conference in Jinan, China. The workshop will focus on HPC productivity and on advanced HPC topics and future directions, and will bring together system managers, researchers, developers, computational scientists and industry affiliates to discuss recent developments and future advancements in High-Performance Computing.

Last year, more than 300 attendees participated in the HPC Advisory Council China Workshop 2010; this year we expect to reach 400. The preliminary agenda is now posted on the workshop website, along with the calls for speakers and sponsors. AMD, Dell, Mellanox and Microsoft have already confirmed their sponsorship, and we are grateful for their support.

The workshop keynote presenters are Richard Graham (Distinguished Member of the Research Staff, Computer Science and Mathematics Division, Oak Ridge National Laboratory, USA), Professor Dhabaleswar K. Panda (Ohio State University, USA) and Professor Rafael Mayo-Gual (University Jaume I, Spain). The workshop will feature many interesting topics and distinguished speakers. More information can be found on the workshop website –



HPC Advisory Council ISCnet FDR InfiniBand 56Gb/s World First Demonstration

The HPC Advisory Council, together with ISC, showcased the world’s first FDR 56Gb/s InfiniBand demonstration during the ISC’11 conference in Hamburg, Germany on June 20-22. The demonstration was part of the HPC Advisory Council’s activities of hosting and organizing new technology demonstrations at leading HPC conferences, showcasing solutions that will influence future HPC systems in terms of performance, scalability and utilization. The 56Gb/s InfiniBand demonstration connected participating exhibitors on the ISC’11 showroom floor as part of the HPC Advisory Council ISCnet network. The ISCnet network provided organizations with fast interconnect connectivity between their booths on the show floor to demonstrate various HPC applications, new developments and products.

The FDR InfiniBand network included dedicated and distributed clusters as well as a Lustre-based storage system. Multiple applications were demonstrated, including high-speed visualizations. The following HPC Advisory Council member organizations contributed to and participated in the world’s first FDR 56Gb/s InfiniBand ISCnet demonstration: AMD, Corning Cable Systems, Dell, Fujitsu, HP, HPC Advisory Council, MEGWARE, Mellanox Technologies, Microsoft, OFS, Scalable Graphics, Supermicro and Xyratex.

I would like to thank all of the demo participants. The network map is shown below.




HPC Advisory Council Forms Worldwide Centers of Excellence

This week we announced the formation of the HPC Advisory Council Centers of Excellence. The HPC Advisory Council Centers of Excellence will provide local support for the HPC Advisory Council’s programs, local workshops and conferences, as well as host local computing centers that can be used to extend such activities.

“We are pleased to be named as one of the inaugural HPC Advisory Council Centers of Excellence, covering HPC research, outreach and educational activities within Europe,” said Hussein Nasser El-Harake at the Swiss National Supercomputing Centre, who serves as the Director of the HPC Advisory Council Center of Excellence in Switzerland. “As part of the HPC Advisory Council’s Center of Excellence, we look forward to advancing awareness of the beneficial capabilities of HPC to new users.”


HPC|GPU special interest subgroup releasing first results for NVIDIA GPUDirect Technology

The new HPC|GPU subgroup has recently been working to create the first best practices around the new GPUDirect technology from NVIDIA. Here is some background on GPUDirect: the system architecture of a GPU-CPU server requires the CPU to initiate and manage memory transfers between the GPU and the network. The new GPUDirect technology enables Tesla and Fermi GPUs to transfer data to pinned system memory that an RDMA-capable network is able to read and send without involving the CPU in the data path. The result is an increase in overall system performance and efficiency by reducing GPU-to-GPU communication latency (by 30%, as published by some vendors). The HPC|GPU subgroup is the first to release benchmark results of an application using GPUDirect. The application chosen for the testing was Amber, a molecular dynamics software package. Testing on an 8-node cluster demonstrated up to a 33% performance increase using GPUDirect. If you want to read more, check out the HPC|GPU page –
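To make the saving concrete, here is a toy Python model (not actual GPUDirect code) of the send path: without GPUDirect the CPU performs an extra copy between system memory and the network adapter's pinned buffer; with GPUDirect the GPU and the NIC share the same pinned buffer, so that hop disappears from the data path. The per-hop costs below are hypothetical numbers chosen only to illustrate the shape of the calculation.

```python
# Illustrative model of the GPU-to-network send path. All per-hop costs
# are hypothetical (in arbitrary microsecond-like units), not measurements.

def send_latency(hops):
    """Total one-way latency as the sum of the per-hop costs."""
    return sum(hops.values())

# Without GPUDirect: GPU DMA to system memory, then a CPU-driven copy
# into the NIC's pinned buffer, then the NIC sends the data.
without_gpudirect = {
    "gpu_to_sysmem": 10.0,
    "cpu_copy_to_pinned": 5.0,   # this is the hop GPUDirect removes
    "nic_send": 2.0,
}

# With GPUDirect: the GPU writes into pinned memory that the
# RDMA-capable NIC can read directly, so no CPU copy is needed.
with_gpudirect = {
    "gpu_to_sysmem": 10.0,
    "nic_send": 2.0,
}

base = send_latency(without_gpudirect)
direct = send_latency(with_gpudirect)
print(f"improvement: {100 * (base - direct) / base:.0f}%")  # prints: improvement: 29%
```

With these assumed costs the saving happens to come out near the ~30% latency reduction the vendors published, but the real figure depends entirely on the actual copy costs of a given platform.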




HPC Applications Best Practices

I wanted to let you know that we have extended the high-performance applications best practices:


1. We have extended the applications performance, optimization and profiling guidelines to cover nearly 30 different applications, both commercial and open source –


2. We have added the first case using RoCE (RDMA over Converged Ethernet) to the performance, optimization and profiling guidelines page. It is under the same link as in item 1.


3. New – installation guides. For those who asked for a detailed description of where to get each application, what needs to be installed, how to install it on a cluster, and how to actually run it – these are now posted under the HPC|Works subgroup – We will be focusing on open source applications, for which this information is sometimes challenging to find. At the moment we have installation guides for BQCD, Espresso and NAMD, and more will come in the near future.


If you would like to propose new applications to be covered under the performance, optimization and profiling guidelines, or to be added to the installations guides, please let us know via

Best regards,


HPC Advisory Council Announces 2nd Annual China High-Performance Computing Workshop Program

For those who missed the announcement, our 2nd Annual China High-Performance Computing Workshop will be held on October 27th, 2010 in Beijing, China, in conjunction with the HPC China National Annual Conference on High-Performance Computing. The calls for presentations and workshop sponsorships are now open. The workshop will focus on efficient high-performance computing through best practices, on future system capabilities through new hardware, software and computing environments, and on the high-performance computing user experience.

The workshop will open with keynote presentations by Prof. Dhabaleswar K. (DK) Panda, who leads the Network-Based Computing Research Group at The Ohio State University (USA), and Dr. HUO Zhigang from the National Center for Intelligent Computing (China). The keynotes will be followed by distinguished speakers from academia and industry. The workshop will bring together system managers, researchers, developers, computational scientists and industry affiliates to discuss recent developments and future advancements in High-Performance Computing.

And again – the Call for Presentations and Sponsorships are now open, so if you are interested, let us know. For the preliminary agenda and schedule, please refer to the workshop website. The workshop is free for HPC China attendees and for HPC Advisory Council members. Registration is required and can be completed at the HPC Advisory Council China Workshop website.


Gilad Shainer

A new system has arrived at our HPC center!

Recently we added new systems to our HPC center, and you can see the full list at

The newest system is the “Vesta” system (you can see Pak Lui, the HPC Advisory Council HPC Center Manager, standing next to it in the picture below). Vesta consists of six Dell™ PowerEdge™ R815 nodes, each with four AMD Opteron 6172 (Magny-Cours) processors, which means 48 cores per node and 288 cores for the entire system. The networking was provided by Mellanox, and we installed two adapters per node (Mellanox ConnectX®-2 40Gb/s InfiniBand adapters). All nodes are connected via a Mellanox 36-port 40Gb/s InfiniBand switch. Furthermore, each node has 128 GB of 1333 MHz memory to make sure we can really get the highest performance from this system.
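As a quick back-of-the-envelope check of those headline numbers (the only figure not spelled out above is the 12 cores per Opteron 6172 socket, implied by 48 cores across four sockets):

```python
# Sanity check of the Vesta system totals quoted in the post.
nodes = 6
sockets_per_node = 4
cores_per_socket = 12          # AMD Opteron 6172 (Magny-Cours): 48 cores / 4 sockets
mem_per_node_gb = 128

cores_per_node = sockets_per_node * cores_per_socket
total_cores = nodes * cores_per_node
total_mem_gb = nodes * mem_per_node_gb

print(cores_per_node, total_cores, total_mem_gb)  # 48 288 768
```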


Microsoft has provided us with a Windows HPC 2008 v3 preview, so we can check the performance gain versus v2, for example. The system is capable of dual boot – Windows and Linux – and is now available for testing. If you would like to get access, just fill out the form at the URL above.



In the picture – Pak Lui standing next to Vesta


I want to thank Dell, AMD and Mellanox for providing this system to the council!



Gilad, HPC Advisory Council Chairman

ROI through efficiency and utilization

High-performance computing plays an invaluable role in research, product development and education. It helps accelerate time to market, and provides significant cost reductions in product development and tremendous flexibility. One strength of high-performance computing is the ability to achieve the best sustained performance by driving CPU performance toward its limits. Over the past decade, high-performance computing has migrated from supercomputers to commodity clusters. More than eighty percent of the world’s Top500 compute system installations in June 2009 were clusters. The driver for this move appears to be a combination of Moore’s Law (enabling higher-performance computers at lower costs) and the ultimate drive for the best cost/performance and power/performance. Cluster productivity and flexibility are the most important factors for a cluster’s hardware and software configuration.

A deeper examination of the world’s Top500 systems based on commodity clusters shows two main interconnect solutions being used to connect the servers that create these powerful compute systems – InfiniBand and Ethernet. If we divide the systems according to interconnect family, we see that the same CPUs, memory speeds and other settings are common to both groups. The only difference between the two groups, besides the interconnect, is system efficiency – how many CPU cycles can be dedicated to application work, and how many are wasted. The graph below lists the systems according to their interconnect and their measured efficiency.


As seen, systems connected with Ethernet achieve an average of 50% efficiency, which means that 50% of the CPU cycles are wasted on non-application work or are idle, waiting for data to arrive. Systems connected with InfiniBand achieve above 80% efficiency on average, which means that less than 20% of the CPU cycles are wasted. Moreover, the latest InfiniBand-based systems have demonstrated up to 94% efficiency (the best Ethernet-connected systems demonstrated 63% efficiency).
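For reference, the efficiency discussed here is the Top500's Linpack ratio of sustained performance (Rmax) to theoretical peak (Rpeak). A minimal sketch of the calculation, using illustrative values rather than figures taken from the actual June 2009 list:

```python
def efficiency(rmax_tflops, rpeak_tflops):
    """Top500-style efficiency: sustained Linpack Rmax over theoretical Rpeak."""
    return rmax_tflops / rpeak_tflops

# Illustrative numbers only: an InfiniBand-class system at ~94% efficiency
# versus an Ethernet-class system at ~50%, at the same theoretical peak.
rpeak = 100.0
ib_eff = efficiency(94.0, rpeak)
eth_eff = efficiency(50.0, rpeak)

# The wasted fraction is simply (1 - efficiency).
print(f"wasted cycles: IB {1 - ib_eff:.0%}, Ethernet {1 - eth_eff:.0%}")
```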

People might argue that the Linpack benchmark is not the best benchmark for measuring parallel application efficiency, and that it does not fully utilize the network. The graph results are a clear indication that even for the Linpack application the network makes a difference, and for more communication-intensive parallel applications the gap will be much larger.

When choosing the system configuration, with the notion of maximizing return on investment, one needs to make sure no artificial bottlenecks are created. Multi-core platforms, parallel applications, large databases and so on require fast data exchange, and lots of it. Ethernet can become the system bottleneck due to its latency and bandwidth limitations and the CPU overhead of TCP/UDP processing (TOE solutions introduce other issues, sometimes more complicated, but this is a topic for another blog), reducing system efficiency to 50%. This means that half of the compute system is wasted, and just consumes power and cooling. The same performance capability could have been achieved with half of the servers if they were connected with InfiniBand. More data on application performance, productivity and ROI can be found on the HPC Advisory Council web site, under content/best practices.
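The "half of the servers" claim follows directly from the efficiency gap. A small sketch, using the 50% and 94% efficiencies quoted earlier and an arbitrary per-server peak:

```python
import math

def servers_needed(target_sustained, peak_per_server, eff):
    """Servers required to reach a sustained performance target
    when each server delivers peak_per_server * eff in practice."""
    return math.ceil(target_sustained / (peak_per_server * eff))

peak = 1.0       # hypothetical per-server peak (arbitrary units)
target = 100.0   # sustained performance we want to deliver

eth = servers_needed(target, peak, 0.50)   # Ethernet-class efficiency
ib = servers_needed(target, peak, 0.94)    # best InfiniBand efficiency cited
print(eth, ib)   # 200 107
```

With these numbers, the InfiniBand-connected system needs roughly half the servers (107 vs 200) for the same sustained performance, which is where the power and cooling saving comes from.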

While InfiniBand demonstrates higher efficiency and productivity, there are several ways to increase Ethernet efficiency. One of them is optimizing the transport layer to provide zero-copy and lower CPU overhead (not by using TOE solutions, as those introduce single points of failure in the system). This capability is known as LLE (low-latency Ethernet). More on LLE will be discussed in future blogs.

Gilad Shainer HPC Advisory Council Chairman

Cloud computing for HPC?

One of the interesting projects we are dealing with is the feasibility of using cloud computing for high-performance computing. I remember a paper on using Amazon EC2 for HPC, whose conclusion was that a few GB of bandwidth were missing between the compute nodes… In the past, high-performance computing has not been a good candidate for cloud computing due to its requirement for tight integration between server nodes via low-latency interconnects. Moreover, the performance overhead associated with host virtualization, a prerequisite technology for migrating local applications to the cloud, quickly erodes application scalability and efficiency in an HPC context. Furthermore, HPC has been slow to adopt virtualization, not only due to the performance overhead, but also because HPC servers generally run fully utilized, and therefore do not benefit from consolidation.

Not all clouds are the same, nor will they be. While virtualization is needed for enterprise applications, it is not a must for HPC clouds, and application provisioning can be done at physical-server granularity. Moreover, there are emerging virtualization solutions that reduce the overhead and enable native application performance.

The council presented some of the first findings from the HPC cloud project at ISC’09 (posted in the advanced topics section of the web site). We have submitted a full paper for publication, and hope to post it on the web site soon.

The next phase of the project will add the virtualization aspect, in particular Xen and KVM, and explore its effects on application performance, as well as on system utilization and efficiency.

Gilad Shainer,
HPC Advisory Council Chairman

Inauguration of 1st European Petaflop Computer in Jülich, Germany

On Tuesday, May 26, Research Center Jülich reached a significant milestone for German and European supercomputing with the inauguration of two new supercomputers: the supercomputer JUROPA and the fusion machine HPC-FF. The symbolic start of the systems was triggered by the German Federal Minister for Education and Research, Prof. Dr. Annette Schavan, the Prime Minister of North Rhine-Westphalia, Dr. Jürgen Rüttgers, and Prof. Dr. Achim Bachem, Chairman of the Board of Directors at Research Center Jülich, in the presence of high-ranking international guests from academia, industry and politics.

JUROPA (which stands for Juelich Research on Petaflop Architectures) will be used by more than 200 research groups across Europe to run their data-intensive applications. JUROPA is based on a cluster configuration of Sun Blade servers, Intel Nehalem processors, Mellanox 40Gb/s InfiniBand and the ParaStation cluster operation software from ParTec Cluster Competence Center GmbH. The system was jointly developed by experts of the Jülich Supercomputing Centre and implemented with partner companies Bull, Sun, Intel, Mellanox and ParTec. It consists of 2,208 compute nodes with a total computing power of 207 teraflops and was sponsored by the Helmholtz Association. Prof. Dr. Dr. Thomas Lippert, Head of the Jülich Supercomputing Centre, explains the HPC installation in Jülich in the video below.

HPC-FF (High Performance Computing – for Fusion), drawn up by the team headed by Dr. Thomas Lippert, director of the Jülich Supercomputing Centre, was optimized and implemented together with the partner companies Bull, Sun, Intel, Mellanox and ParTec. This new best-of-breed system, one of Europe’s most powerful, will support advanced research in many areas such as health, information, environment and energy. It consists of 1,080 compute nodes, each equipped with two quad-core Nehalem EP processors from Intel. Their total computing power of 101 teraflop/s corresponds, at the present moment, to 30th place on the list of the world’s fastest supercomputers. The combined cluster will achieve 300 teraflop/s of computing power and will be included in the Top500 list published this month at ISC’09 in Hamburg, Germany.

40Gb/s InfiniBand from Mellanox is used as the system interconnect. The administrative infrastructure is based on NovaScale R422-E2 servers from French supercomputer manufacturer Bull, which supplied the compute hardware and the Sun ZFS/Lustre file system. The “ParaStation V5” cluster operating system is supplied by Munich software company ParTec. HPC-FF is being funded by the European Commission (EURATOM), the member institutes of EFDA, and Forschungszentrum Jülich.

Complete system facts: 3,288 compute nodes; 79 TB main memory; 26,304 cores; 308 teraflops peak performance.
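The combined figures can be cross-checked against the two clusters' individual numbers (the only assumption below is eight cores per node in both clusters, i.e. two quad-core Nehalem processors per node, as described for HPC-FF):

```python
# Cross-check of the combined Jülich system facts from the two clusters.
juropa_nodes, juropa_tflops = 2208, 207
hpcff_nodes, hpcff_tflops = 1080, 101
cores_per_node = 8   # assumed: two quad-core Nehalem processors per node

total_nodes = juropa_nodes + hpcff_nodes
total_cores = total_nodes * cores_per_node
total_tflops = juropa_tflops + hpcff_tflops
print(total_nodes, total_cores, total_tflops)  # 3288 26304 308
```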

Gilad Shainer,
HPC Advisory Council Chairman