NASA scientists created visco-elastic polyurethane foam to cushion astronauts’ re-entry into Earth’s atmosphere; when the patent was released to the public, it became Tempur-Pedic mattress foam. Teflon was first used in artillery shell fuses and in the production of nuclear material for the Manhattan Project before it became the standard for nonstick pans. The same principle of repurposing is emerging in the computing world as technologies from supercomputers find their way into data center architecture.
What Supercomputers Require Today
Yesterday’s room-sized supercomputers have been replaced by today’s version: huge clusters of servers bound by high-performance networks. Built for massively parallel, large-scale simulations, these clusters distribute the application workload across server nodes, which coordinate via messages passed over a shared communications fabric. The nodes typically feature floating-point-heavy CPUs and GPU-based math accelerators and enjoy large main memories, but they are essentially just Linux servers.
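The coordination pattern can be sketched in miniature. This is an illustration only: a real supercomputer runs MPI ranks on separate machines over the interconnect fabric, whereas here Python threads and a queue stand in for nodes exchanging messages.

```python
# Illustrative sketch only: real clusters use MPI over a low-latency fabric.
# Threads and a Queue stand in for nodes passing messages.
import threading
from queue import Queue

def node(rank, chunk, results):
    # Each "node" computes its share of the problem independently...
    partial = sum(x * x for x in chunk)
    # ...then coordinates by sending a message back across the "fabric".
    results.put((rank, partial))

data = list(range(1000))
n_nodes = 4
chunks = [data[i::n_nodes] for i in range(n_nodes)]  # scatter the workload

results = Queue()
workers = [threading.Thread(target=node, args=(r, c, results))
           for r, c in enumerate(chunks)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Gather: reduce the partial results into the final answer.
total = sum(results.get()[1] for _ in workers)
print(total)
```

The scatter/compute/gather shape is the same one the message-passing fabric supports at scale.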
Most supercomputers attach their storage to the same communications fabric used for inter-processor communication. Storage must also be fast and parallel, both to load large data sets quickly and to support periodic checkpointing, which saves simulation state in case of a failure. The interconnect is thus a unified fabric, carrying management, compute and storage traffic over a single fiber connection to each node.
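The checkpointing pattern itself is simple. The sketch below is illustrative only (the file name, interval and state layout are invented for the example; a real run would write to a parallel file system via MPI-IO or similar): the job periodically persists its state so that a restart resumes from the last checkpoint rather than step zero.

```python
# Illustrative sketch only: pickle to a local file stands in for a
# parallel file system; names and intervals are hypothetical.
import os
import pickle

CHECKPOINT = "sim_state.ckpt"      # would live on parallel storage
CHECKPOINT_INTERVAL = 100          # steps between saves

def save_checkpoint(step, state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:     # write-then-rename keeps the last
        pickle.dump((step, state), f)
    os.replace(tmp, CHECKPOINT)    # good checkpoint intact on failure

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return 0, {"energy": 0.0}      # fresh start

step, state = load_checkpoint()
while step < 500:
    state["energy"] += 1.0         # stand-in for one simulation step
    step += 1
    if step % CHECKPOINT_INTERVAL == 0:
        save_checkpoint(step, state)
```

If the job dies mid-run, rerunning it picks up from the most recent saved step, which is why checkpoint bandwidth to storage matters at scale.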
Budgets being finite, reducing cost per node is the way to get the most computing power for the money. Commodity, standards-based hardware components are preferred for this reason. An open standard called InfiniBand (IB) has been the dominant cluster interconnect since its introduction; its specifications were first published in 1999 by an industry consortium that included Intel, IBM, HP and Microsoft. The InfiniBand Trade Association provides an informative InfiniBand primer on its website.
IB has many qualities in its favor, including extreme scalability, high bandwidth (100 Gbit/s per port), low latency (sub-microsecond end-to-end) and hardware offload, including a very powerful feature called RDMA (Remote Direct Memory Access). RDMA allows data to flow “zero copy” from one application's memory space to that of an application on another server at wire speed, without the intervention of the OS or even the CPU, allowing data movement to scale with memory speeds rather than CPU core speeds (which have stalled).
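The zero-copy idea has a loose analogy in ordinary Python. RDMA itself requires IB verbs and hardware support, but a `memoryview` similarly exposes an existing buffer without copying it, much as RDMA lets a remote NIC read and write application memory directly, bypassing the OS and CPU on the data path.

```python
# Loose analogy only: memoryview = direct, zero-copy access to a buffer;
# bytes() = the copy-per-hop behavior of a traditional network stack.
data = bytearray(b"simulation results " * 1000)

copied = bytes(data)       # kernel-style behavior: a full copy is made
view = memoryview(data)    # RDMA-style behavior: direct access, no copy

# Changes to the underlying buffer are visible through the view...
data[0:10] = b"checkpoint"
assert bytes(view[0:10]) == b"checkpoint"
# ...but not through the copy taken earlier.
assert copied[0:10] == b"simulation"
```

The copy in the first case is exactly the per-hop overhead that RDMA eliminates on the wire.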
Changes to the Data Center
InfiniBand and supercomputing relate to data center design in this way: a well-balanced server farm must balance compute, storage and network performance, and many elements are now converging to expose legacy, 37-year-old TCP/IP over Ethernet as the weakest link.
InfiniBand can tackle these challenges while also offering smooth migration paths. For example, via IP-over-InfiniBand (IPoIB), InfiniBand can carry legacy IP traffic at great speed; while this does not immediately deliver all of the protocol’s benefits, it provides a bridge to more efficient implementations that can be rolled out over time. Furthermore, and contrary to popular misconception, InfiniBand is actually the most cost-effective of any comparable standards-based interconnect technology in terms of $/Gbit/s, and dramatically so when deployed as a unified fabric.
Not Your Father’s InfiniBand
IB’s strengths in performance and scalability are clear, but does it have the depth needed for production deployments? It is true that the first InfiniBand implementations were limited: they lacked proper security, and the standard’s lossless flow-control scheme confined them to short links between racks. But as early adopters and innovators put IB to work, the resulting solutions overcame these initial barriers; IB links can now span the globe, with multi-subnet segmentation and strong link encryption. This is not your father’s InfiniBand; organizations adopting it stand to gain from the rapid, ongoing innovation the supercomputing community offers.
About the author: Dr. Southwell co-founded Obsidian Research Corporation. He was also a founding member of YottaYotta, Inc. in 2000 and served as its director of Hardware Development until 2004. He has worked at British Telecom's Research Laboratory at Martlesham Heath in the UK, participated in several other high-technology start-ups, operated a design consultancy business, and taught Computer Science and Engineering at the University of Alberta. He graduated with honors from the University of York, United Kingdom, with an M.Eng. in Electronic Systems Engineering in 1990 and a Ph.D. in Electronics in 1993, and holds a Professional Engineer (P.Eng.) designation.
For more on the latest technology trends, be sure to register to attend ITEXPO, the business technology event that brings together service providers, enterprises, government agencies, resellers, vendors and developers to demo, discuss and network all the latest innovations changing the marketplace. ITEXPO is being held January 27-30, 2015, at the Miami Beach Convention Center in Miami, Florida. Stay in touch with everything happening at the event by following us on Twitter.