infoTECH Feature

June 23, 2016

Big Data at Your Doorstep

By Special Guest
Jeff Plank, CEO Agile Data Sites

Give me a computer strong enough and I will solve the universe

CERN has just released 300 terabytes of particle collision data to the public. This is the same data that eventually led to the confirmation of the Higgs boson, the last missing particle in the Standard Model of physics. With this data, you could conceivably find the signals of as-yet-undiscovered particles.

This is what big data looks like. It’s an almost impenetrable wall of ones and zeros that hides unimaginable meaning, if only we knew where to look.

And it has never been easier to look. You could spin up an AWS cloud instance right now and begin blasting away at this CERN data with more computing power than was available to physicists in the entire 20th century. There is a Nobel Prize lurking in that pile of numbers, and all you need to do is load Hadoop onto a powerful enough cloud and wait for lightning to strike.

There is more useful data in the world right now than in the rest of human history combined. The biggest difficulty is figuring out what to do with the data once you’ve collected it. In the case of CERN, it means getting rid of a lot of data.

When the LHC smashes particles together, an algorithm immediately discards 99.999 percent of the resulting data. Then another piece of software reduces the data from about 40 million events per second to about 200. Altogether, CERN retains only about 0.000000005 percent of the data that is produced.
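
That overall figure follows from multiplying the fractions kept at each stage. Below is a quick Python check that uses only the numbers quoted above and, like the paragraph, treats the two filtering steps as successive multiplicative stages. It checks the arithmetic only and does not model how CERN's trigger system actually works.

```python
import math

# Fraction of data kept at each stage, using the figures quoted in the text.
stage_kept = [
    1 - 0.99999,        # first algorithm discards 99.999 percent
    200 / 40_000_000,   # ~40 million events per second reduced to ~200
]

kept = math.prod(stage_kept)  # fractions kept compound multiplicatively
print(f"{kept * 100:.9f} percent retained")  # prints: 0.000000005 percent retained
```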

This may seem like an isolated case, but it’s not. Big data has an inherent weakness: with so many possible correlations to look at, some false-positive signals are bound to appear. The storage required for big data may be astronomical, but the real work is in the computation it takes to draw meaningful conclusions from that data.
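
The false-positive problem is the statistics of multiple comparisons. Suppose each candidate correlation is tested at significance level alpha and none of them is real. About a fraction alpha of the tests will still look significant. The minimal Python sketch below works through that arithmetic. The 1,000-variable data set and alpha = 0.05 are hypothetical. The Bonferroni correction shown is one standard and fairly conservative remedy; the article does not prescribe it.

```python
def pairwise_tests(n_variables: int) -> int:
    """Number of distinct variable pairs whose correlation could be tested."""
    return n_variables * (n_variables - 1) // 2


def expected_false_positives(m_tests: int, alpha: float) -> float:
    """Expected 'significant' results when every null hypothesis is true.

    Holds by linearity of expectation, even if the tests are dependent.
    """
    return m_tests * alpha


def prob_at_least_one(m_tests: int, alpha: float) -> float:
    """Chance of one or more false positives, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** m_tests


def bonferroni_threshold(m_tests: int, alpha: float) -> float:
    """Per-test cutoff that keeps the chance of any false positive <= alpha."""
    return alpha / m_tests


m = pairwise_tests(1_000)  # hypothetical data set with 1,000 variables
print(f"{m:,} pairwise tests")
print(f"~{expected_false_positives(m, 0.05):,.0f} expected false positives")
print(f"P(at least one) = {prob_at_least_one(m, 0.05):.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(m, 0.05):.2e}")
```

Pairwise correlations that share variables are not truly independent, so `prob_at_least_one` is only an approximation here. The expected count and the Bonferroni bound do not rely on independence.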

You might think that the cloud is the best place to work through large data sets, but you may want to run the numbers first. Computation at big data scale, on terabytes and petabytes, requires a huge number of processor cores to work efficiently. Because cloud computing is billed by the instance-hour, renting that many cores for that long can quickly become cost-prohibitive. Sure, if you are planning a one-off run on a single data set, the cloud may be cheaper than buying hardware, but in most other situations it just doesn’t make economic sense.
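
"Running the numbers" can be sketched as a simple break-even model. It compares renting cores around the clock against buying hardware once and paying a monthly colocation fee. All prices below are hypothetical placeholders, not quotes from any provider. The model also ignores hardware resale value, staff time, power billed outside the colocation fee, and discounted cloud pricing tiers.

```python
import math

HOURS_PER_MONTH = 730  # average: 8,760 hours per year / 12


def cloud_cost(months: float, cores: int, price_per_core_hour: float) -> float:
    """Renting `cores` around the clock, billed per core-hour."""
    return cores * price_per_core_hour * HOURS_PER_MONTH * months


def colo_cost(months: float, hardware_cost: float, colo_monthly_fee: float) -> float:
    """Buying hardware up front, then paying a monthly colocation fee."""
    return hardware_cost + colo_monthly_fee * months


def breakeven_months(cores: int, price_per_core_hour: float,
                     hardware_cost: float, colo_monthly_fee: float) -> float:
    """Months of continuous use after which colocation becomes cheaper."""
    monthly_saving = cloud_cost(1, cores, price_per_core_hour) - colo_monthly_fee
    if monthly_saving <= 0:
        return math.inf  # colocation never catches up under these inputs
    return hardware_cost / monthly_saving


# Hypothetical inputs: 1,000 cores at $0.05 per core-hour, $5,000/month colocation fee.
print(f"{breakeven_months(1_000, 0.05, 250_000, 5_000):.1f} months")  # buying $250k of servers
print(breakeven_months(1_000, 0.05, 0, 5_000))  # servers already owned
```

The answer swings heavily with the inputs. A company that already owns its servers has a hardware cost near zero, so under this model colocation is cheaper from the first month.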

That’s where agile colocation providers like Agile Data Sites can help. Let’s say that a company is planning to crunch a huge data set, but only for a few months, say three. They would need reliable power and cooling. They could build all of that infrastructure on-site, or they could move into an agile colocation facility on a short-term contract, quickly spin up their servers and run the computation for those three months.

Then, at the end of that term, they could pack up and leave, having spent a fraction of what the cloud would cost and without signing any long-term contract. Or maybe after three months they need five more racks, and three months after that they need ten fewer. Agile Data Sites and other flexible data center spaces can facilitate that kind of rapid-deployment, rapid-decommissioning cycle, which can greatly improve your bottom line.
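
The scenario of "five more racks, then ten fewer" can be sketched as a rack-month tally. It compares paying only for the racks in use during each period against committing to the peak rack count for the whole term. The starting count of 20 racks and the $1,500 per rack-month price are hypothetical and chosen only for illustration.

```python
def rack_months(schedule):
    """Total rack-months when paying only for racks in use.

    `schedule` is a list of (months, racks) periods.
    """
    return sum(months * racks for months, racks in schedule)


def peak_provisioned_rack_months(schedule):
    """Rack-months when committing to the peak rack count for the full term."""
    total_months = sum(months for months, _ in schedule)
    peak = max(racks for _, racks in schedule)
    return peak * total_months


# Hypothetical: start at 20 racks, add 5 after three months, drop 10 three months later.
schedule = [(3, 20), (3, 25), (3, 15)]
PRICE_PER_RACK_MONTH = 1_500  # hypothetical dollars

flexible = rack_months(schedule) * PRICE_PER_RACK_MONTH
fixed = peak_provisioned_rack_months(schedule) * PRICE_PER_RACK_MONTH
print(flexible, fixed)  # prints: 270000 337500
```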

It’s one thing to collect data, and that is happening at a rate unseen in the history of the world. It’s another to use that data. Just like the researchers at CERN, you are going to need to sift through a world full of data to find the one conclusion you’re looking for. Be sure that you have enough computing power to find it, and don’t get sticker shock when you see what it costs you.
 

Jeff Plank is President/CEO of Agile Data Sites, LLC. He has more than 20 years of experience in the hosting, colocation and managed services industry with both IBM and AT&T. Jeff served as EVP and CTO of Directlink, where his responsibilities included operations, sales, marketing and overall organizational direction, and he helped take the company from startup to an extremely successful data center services provider.




Edited by Stefania Viscusi