What Is HPC?
High performance computing (HPC) operations use aggregated resources (like clusters of servers, workstations, or devices) to quickly perform computations. HPC systems also leverage parallel processing: individual machines, called nodes, are connected so that multiple components of a single task can run at the same time.
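As a rough illustration of parallel processing, the sketch below splits one task (a sum of squares) into chunks, runs each chunk in its own process, and combines the partial results. The function names and the choice of workload are assumptions made for this example; a real HPC cluster would distribute the chunks across nodes rather than across local processes.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """Work done by one worker: sum of squares over its chunk."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Run the chunks in separate processes and combine the partial results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

The same split, compute, and combine pattern is what a cluster applies at a larger scale, with nodes taking the place of local worker processes.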
An HPC node typically has two or more Central Processing Units (CPUs), and each CPU contains multiple cores (independent processing units). Individual nodes have their own memory and storage, which can be shared with the HPC system. Usually an HPC system contains between sixteen and sixty-four nodes.
HPC systems often leverage graphics processing units (GPUs) in addition to CPUs. Unlike a general-purpose CPU, a GPU is built for a specialized job: running many similar operations in parallel. This specialization enables quicker processing of some components of applications or workloads than can be achieved with CPUs alone.
How Does HPC Differ From Traditional Public Cloud Resources?
Cloud HPC is built to handle extremely large, highly complex, and specific problems in as little time as possible. GPU flavors, for example, provide graphics cards with powerful graphics processing units (GPUs) that are ideal for solving problems with many uniform processes, each of which requires quick parallel processing. This is why GPU flavors are often used for machine learning (ML) and artificial intelligence (AI) workloads.
Another great option is using on-demand, cloud-based field-programmable gate arrays (FPGAs). FPGAs offer greater flexibility. You can individually modify these programmable hardware cards to suit each unique process.
Options for HPC in the Cloud
Microsoft Azure
Azure offers on-demand HPC solutions for enterprises. These services perform especially well for organizations already using Microsoft Windows platforms. Azure provides a user-friendly platform that integrates Windows systems with cloud-based HPC workloads. Azure HPC is offered for Software as a Service (SaaS) customers as well as those using Platform as a Service (PaaS) offerings.
Amazon Web Services (AWS)
AWS, which is considered a cloud pioneer, has provided cloud HPC since 2006. AWS offers HPC mainly as an Infrastructure as a Service (IaaS) offering, which provides many storage capabilities. A notable capability is the ability to rent capacity on demand for the computationally intensive workloads you run. This enables organizations to save significantly on billing and minimize waste.
Google Cloud Platform (GCP)
GCP also offers HPC as an IaaS solution, but through a pay-per-minute pricing strategy. GCP lets enterprise customers choose between systems like Hadoop and Google Cloud Dataflow; each can process and store data through this service. The cost-effective pricing model offered for HPC services enables both enterprises and SMEs to leverage cloud HPC.
IBM Spectrum Computing
IBM Spectrum Computing lets you choose what infrastructure to use, including options for public, private, and hybrid cloud. This enables customers to use a flexible, remotely managed system. IBM provides a wide range of out-of-the-box solutions built for enterprises, such as IBM High Performance Services for HPC, IBM Spectrum HPC, IBM High Performance Services for Analytics, and more.
HPC Tips and Tricks
Plan your Cluster Around the HPC Workload
HPC workloads vary in their applications. This means you should configure clusters to optimize your operations. Many HPC applications, for example, break workloads down into multiple tasks, each running simultaneously. In some of these workloads, however, the compute nodes need to communicate more between themselves while processing. This typically requires specialized software and hardware.
The computational requirements of each workload determine the number of compute nodes the cluster will need, the hardware requirements for each node, and whether any special firmware or software should be installed. Additionally, you need a set of management nodes to keep your systems running, as well as the resources required for maintaining security and implementing disaster recovery. Try to plan workflow optimizations in advance.
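The sizing reasoning above can be sketched as a small calculation. This is a simplified estimate under assumed inputs (total cores and memory demanded by the workload, and the capacity of one node); the function name and the example figures are hypothetical, and management nodes are not counted.

```python
import math


def nodes_needed(total_cores, total_memory_gb, cores_per_node, memory_gb_per_node):
    """Return the number of compute nodes that covers both core and memory demand."""
    by_cores = math.ceil(total_cores / cores_per_node)
    by_memory = math.ceil(total_memory_gb / memory_gb_per_node)
    # The cluster must satisfy the tighter of the two constraints.
    return max(by_cores, by_memory)


# Hypothetical workload: 512 cores and 3,072 GB of RAM on 64-core, 256 GB nodes.
# Cores alone would need 8 nodes, but memory needs 12, so memory decides.
print(nodes_needed(512, 3072, 64, 256))
```

Checking both constraints matters because a workload that looks small by core count can still need more nodes once its memory footprint is considered.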
Prioritize GPU Configurations
The type of GPU you choose and the method used to implement it greatly impact the system. The majority of GPU manufacturers provide software that can integrate with the hypervisor, virtualizing GPUs and making them available to virtual machines (VMs). In turn, the hypervisor determines how virtual GPUs are implemented and how supported workloads run on them.
GPU virtualization should be addressed individually for each unique purpose. However, configurations generally follow one of a few VM-to-GPU mappings.
The majority of HPC workloads should be configured so that either one VM maps to one GPU or one VM maps to multiple GPUs, because these two options rely on pass-through technologies. These technologies enable VMs to communicate directly with physical GPUs, cutting out hypervisor overhead.
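One way to sanity-check a planned assignment is to confirm that no physical GPU is shared between VMs, which is the condition both pass-through options have in common. The sketch below assumes a hypothetical plan expressed as a dictionary of VM names to GPU IDs; it does not query any real hypervisor.

```python
from collections import Counter


def classify_gpu_mapping(vm_to_gpus):
    """Classify a {vm_name: [gpu_id, ...]} plan.

    Returns "pass-through" when every GPU belongs to exactly one VM
    (1 VM to 1 GPU, or 1 VM to multiple GPUs), otherwise "shared".
    """
    # Count how many distinct VMs reference each GPU.
    usage = Counter(gpu for gpus in vm_to_gpus.values() for gpu in set(gpus))
    if all(count == 1 for count in usage.values()):
        return "pass-through"
    return "shared"


# Hypothetical plans for illustration.
dedicated = {"vm1": ["gpu0"], "vm2": ["gpu1", "gpu2"]}
contended = {"vm1": ["gpu0"], "vm2": ["gpu0"]}
print(classify_gpu_mapping(dedicated), classify_gpu_mapping(contended))
```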
Take Server Configuration Seriously
Before deploying your HPC applications, you need to configure compute nodes to support a high-demand virtual environment. A critical area you should address is the system BIOS. You should configure each host server with the latest BIOS version, and optimize BIOS settings to run the hypervisor and its virtualized operations.
While you definitely need to configure your other server components, you should understand that HPC settings are never universal. In some cases, for example, you will need to enable memory interleaving, but only after considering the system's hardware, any installed hypervisor, and all supported workloads. Take these factors into consideration when configuring all system settings.
Avoid Massive Short Jobs
The scheduler of an HPC cluster favors a small number of longer jobs over a massive number of short jobs. This is because each job carries extra overhead for resource provisioning and for staging job output. Stacking many short jobs into a single longer job can help avoid these overhead costs.
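The effect of per-job overhead can be sketched with simple arithmetic. The numbers below (5 seconds of work per task, 30 seconds of overhead per job) are assumptions chosen for illustration, not measurements from any scheduler, and the helper names are hypothetical.

```python
def batch_tasks(tasks, batch_size):
    """Group many short tasks into fewer, longer jobs of up to `batch_size` tasks."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


def total_runtime(num_jobs, num_tasks, task_seconds, overhead_seconds):
    """Total time = per-job overhead for every job + the actual work."""
    return num_jobs * overhead_seconds + num_tasks * task_seconds


tasks = list(range(1000))        # 1,000 short tasks (placeholders)
jobs = batch_tasks(tasks, 100)   # 10 jobs of 100 tasks each

# Assumed costs: 5 s of work per task, 30 s of overhead per submitted job.
unbatched = total_runtime(len(tasks), len(tasks), 5, 30)
batched = total_runtime(len(jobs), len(tasks), 5, 30)
print(unbatched, batched)
```

The work itself is identical in both cases; only the number of times the overhead is paid changes.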
Properly Size and Configure VMs
Configuring the right VM size is critical when planning an HPC environment. Some workloads require more CPU power, others need more reserved memory, and some need both. When planning, you should decide how many virtual machines you can host on each cluster node without compromising the performance of the workload.
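A first estimate of how many VMs fit on a node can be computed from core and memory capacity. The sketch below assumes no oversubscription and sets aside a small, hypothetical reserve for the hypervisor; the function name, default reserves, and example sizes are all assumptions for illustration.

```python
def vms_per_node(node_cores, node_memory_gb, vm_cores, vm_memory_gb,
                 reserved_cores=2, reserved_memory_gb=8):
    """Return how many VMs fit on one node without oversubscribing CPU or memory."""
    usable_cores = node_cores - reserved_cores
    usable_memory = node_memory_gb - reserved_memory_gb
    if usable_cores <= 0 or usable_memory <= 0:
        return 0
    # The scarcer resource limits the VM count.
    return min(usable_cores // vm_cores, usable_memory // vm_memory_gb)


# Hypothetical 64-core, 512 GB node.
print(vms_per_node(64, 512, vm_cores=8, vm_memory_gb=32))    # CPU-bound sizing
print(vms_per_node(64, 512, vm_cores=4, vm_memory_gb=128))   # memory-bound sizing
```

Running the estimate for both a CPU-heavy and a memory-heavy VM shape shows which resource becomes the limit for each workload.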
You should also figure out which guest operating system (OS) you want installed for each HPC workload, as well as how to configure the OS for maximum performance. Remember to also factor in security policies and firewall rules.
Cloud HPC can provide organizations with greater computing power, at scale and at cost-effective pricing. However, to properly leverage HPC, you need to choose the right resources for your workloads and continuously optimize for performance and billing. Keep a constant eye on your operations and plan in advance.
Author Bio: Farhan Munir
With over 12 years of experience in the technical domain, I have witnessed the evolution of many web technologies, as well as the rise of the digital economy. I consider myself a life-long learner, and I love experimenting with new technologies. I embrace challenges with enthusiasm and an outside-the-box mindset. I feel it is important to share your experiences with the rest of the world, in order to pass on the knowledge or let other folks learn from your mistakes or successes. In my spare time, I like to travel and photograph the world.