infoTECH Feature

January 14, 2015

Meet the New InfiniBand - the Data Center's Best Friend

By TMCnet Special Guest
Dr. David Southwell, CVO, Obsidian Strategics

NASA scientists created visco-elastic polyurethane foam to cushion astronauts’ re-entry into Earth’s atmosphere; when the patent was released to the public, it became Tempur-Pedic mattress foam. Teflon was first used in artillery shell fuses and in the production of nuclear material for the Manhattan Project before it became the standard for nonstick pans. The same principle of repurposing is emerging in the computing world as technologies from supercomputers find their way into data center architecture.

What Supercomputers Require Today

Yesterday’s room-sized supercomputers have been replaced by today’s version: huge clusters of servers bound by high-performance networks. Built for massively parallel, large-scale simulations, these clusters distribute the application workload across server nodes that coordinate via messages passed over a shared communications fabric. The server nodes usually feature floating-point-heavy CPUs and GPU-based math accelerators and enjoy large main memories, but they are essentially just Linux servers.

Most supercomputers attach their storage to the same communications fabric used for inter-processor communication. Storage must also be fast and parallel, both to load large data sets and to periodically checkpoint simulation state in case of a failure. The interconnect is thus a unified fabric carrying management, compute and storage traffic over a single fiber connection to each node.

In light of budgetary concerns, reducing cost per node is the way to get the most computing power for the money, so commodity, standards-based hardware components are preferred. An open standard called InfiniBand (IB) has been the dominant cluster interconnect since its introduction, with specifications first published in 1999 by an industry consortium that included Intel, IBM, HP and Microsoft. The InfiniBand Trade Association provides an informative InfiniBand primer on its website.

IB has many qualities in its favor, including extreme scalability, high bandwidth (100 Gbit/s per port), low latency (sub-microsecond end-to-end) and hardware offload that includes a very powerful feature called RDMA (Remote Direct Memory Access). RDMA allows data to flow “zero copy” from one application's memory space to that residing on another server at wire speed, without the intervention of the OS, or even the CPU, allowing data movement to scale with memory speeds, not just CPU core speeds (which have stalled).
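To make the zero-copy idea concrete, here is a minimal sketch, not drawn from the article, of the first step every RDMA application performs with the standard libibverbs API: registering a buffer with the host channel adapter so the adapter can move data in and out of it directly, without OS or CPU copies. The device choice, buffer size and the out-of-band exchange of the region's address and rkey with a peer are assumptions made only for illustration.

    /* Minimal libibverbs sketch: register a buffer for RDMA access.
       Assumes a machine with an InfiniBand HCA and libibverbs installed;
       compile with: gcc rdma_reg.c -libverbs */
    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) { fprintf(stderr, "no IB devices found\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
        if (!pd) { fprintf(stderr, "device setup failed\n"); return 1; }

        /* Application buffer that the HCA will access directly. */
        size_t len = 4096;
        void *buf = malloc(len);

        /* Pin and register the buffer; the returned keys let the local and
           remote adapters address it without any kernel or CPU involvement. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr) { fprintf(stderr, "ibv_reg_mr failed\n"); return 1; }

        /* In a real application these values are sent to the peer out of band;
           the peer then posts RDMA READ/WRITE work requests (for example with
           opcode IBV_WR_RDMA_WRITE) that target this region at wire speed. */
        printf("registered %zu bytes at %p, rkey=0x%x\n", len, mr->addr, mr->rkey);

        ibv_dereg_mr(mr);
        free(buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }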

Changes to the Data Center

InfiniBand and supercomputing relate to data center design in this way: a well-balanced server farm must balance compute, storage and network performance. Many elements are now converging to expose legacy, 37-year-old TCP/IP Ethernet as the weakest link:

  • New fabric topologies are needed for current data center workflow requirements, which strongly emphasize East-West traffic. Ethernet’s spanning-tree limitations preclude efficient implementations such as “fat tree” topologies featuring aggregated trunks between switches.

  • Virtualization consolidates multiple virtual machines onto a single physical machine, further multiplying the network performance requirement per socket and pushing loading towards supercomputer-class levels. For instance, a TCP/IP stack running over 1Gb Ethernet can require up to 1GHz worth of CPU; overlay 20 such virtual machines on a single node and even many-core CPUs are saturated by the OS before the application sees a single cycle.
  • Data center managers are replacing rotating storage with Solid State Disks (SSDs), and not just in early critical applications such as database indexing and metadata storage. Legacy NAS interconnects that could hide behind tens of milliseconds of rotating disk latency are suddenly found to be hampering SSDs and their microsecond-range response times. SSDs also deliver order-of-magnitude throughput increases, again stressing older interconnects.
  • Unified fabrics are highly sought after because they minimize network adapters, cables and switches. They improve many system-level metrics such as capital costs, airflow, heat generation, management complexity and the number of channel interfaces per host. Micro- and Blade-form-factor servers cannot afford three separate interfaces per node. Due to its lossy flow control and high latency, TCP/IP Ethernet is not a good match for high performance storage networks.
  • Many-core processors use billions of transistors to tile tens to hundreds of CPU cores per chip - and server chips are trending strongly in this direction.  It is easy to see that networking capability must be proportionately and radically scaled up to maintain architectural balance, or else the cores will forever be waiting on network I/O.

InfiniBand can tackle all of these challenges, while also offering smooth migration paths. For example, via IPoIB, InfiniBand can carry legacy IP traffic at great speed and, while this does not immediately expose all of the protocol’s benefits, it provides a bridge to more efficient implementations that can be rolled out over time.  Furthermore—and contrary to popular misconception—InfiniBand is actually the most cost-effective protocol in terms of $/Gbits/s of any comparable standards-based interconnect technology, and dramatically so if deployed as a unified fabric.
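As a simple illustration of that migration path (not taken from the article), consider what IPoIB looks like to existing software: once the fabric is exposed as an ordinary IP interface with an address assigned, unmodified sockets code just works over it. In the sketch below, the 10.0.0.2 address and port 5001 are placeholders assumed to belong to a host reachable on an IPoIB subnet.

    /* Ordinary TCP client: nothing InfiniBand-specific appears here.
       If the placeholder address 10.0.0.2 is assigned to an IPoIB
       interface on the server, this unmodified sockets code carries
       its traffic over the InfiniBand fabric. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in peer = {0};
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(5001);                  /* placeholder port   */
        inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr); /* IPoIB-side address */

        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) != 0) {
            perror("connect");
            return 1;
        }

        const char msg[] = "hello over IPoIB\n";
        write(fd, msg, strlen(msg));   /* frames traverse the IB fabric */
        close(fd);
        return 0;
    }

Swapping such code for native RDMA verbs can then happen incrementally, application by application, on the same fabric.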

Not Your Father’s InfiniBand

IB’s attributes, including performance and scalability, are clear, but does it have the depth needed for production deployments? It is true that the first InfiniBand implementations were limited and lacked proper security, and that the standard’s precise lossless flow-control scheme restricted them to short links between racks. But as early adopters and innovators put IB to work, the resulting solutions overcame these initial barriers: InfiniBand traffic can now travel around the globe, with multi-subnet segmentation and strong link encryption. This is not your father’s InfiniBand; organizations adopting this new technology are sure to gain from the ongoing, rapid innovation that the supercomputing community offers.

About the author: Dr. Southwell co-founded Obsidian Research Corporation. He was also a founding member of YottaYotta, Inc. in 2000 and served as its director of Hardware Development until 2004. He has worked at British Telecom's Research Laboratory at Martlesham Heath in the UK, participated in several other high-technology start-ups, operated a design consultancy business, and taught Computer Science and Engineering at the University of Alberta. Dr. Southwell graduated with honors from the University of York, United Kingdom, with an M.Eng. in Electronic Systems Engineering in 1990 and a Ph.D. in Electronics in 1993, and holds a Professional Engineer (P.Eng.) designation.

Edited by Maurice Nagle