Researchers at MIT have developed a software tool called MapD for real-time visualization and analysis of massive data sets. It processes big data significantly faster than conventional approaches on standard computers, and the researchers say it opens up new ways to visually explore everything from Twitter posts to political donations.
In essence, MapD uses a hybrid multi-CPU/multi-GPU architecture running across multiple nodes. This architecture allows the data handling, analysis, and visualization of big data sets to be massively parallelized, improving processing speed by several orders of magnitude. To achieve such gains, MapD, short for massively parallel database, stores data in the onboard memory of graphics processing units (GPUs) rather than relying on the central processing units (CPUs) used in conventional methods.
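The partition-and-merge idea behind such an architecture can be sketched in a few lines of Python. This is an illustration only, not MapD's code: worker threads stand in for GPUs, each one scanning its own in-memory partition, and the names `partition`, `scan_partition`, and `parallel_count` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_devices):
    """Split rows round-robin into n_devices in-memory partitions."""
    return [rows[i::n_devices] for i in range(n_devices)]


def scan_partition(rows, term):
    """Scan one partition in full and count the rows containing the term."""
    term = term.lower()
    return sum(1 for text in rows if term in text.lower())


def parallel_count(partitions, term):
    """Scan all partitions concurrently, then merge the partial counts."""
    # Assumption: one worker thread per partition plays the role of one GPU.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: scan_partition(p, term), partitions))
    return sum(partials)


# Made-up sample data, for illustration only.
tweets = [
    "GPU databases are fast",
    "lunch was great",
    "my gpu is on fire",
    "election donations map",
    "nothing to see",
]

partitions = partition(tweets, 3)
gpu_mentions = parallel_count(partitions, "gpu")
```

Because every partition is scanned independently, adding devices shrinks the share of data each one has to read; only the small partial counts need to be combined at the end.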
According to a report in MIT Technology Review, the prototype is being demonstrated on tweets, where it can show a meme propagating in real time on regional or world maps. The report notes that many large-scale Twitter visualizations, including animated maps and charts, take several seconds or longer to process data before it can be displayed. With MapD, a user can adjust search terms and other parameters, such as time frame or geographic region, and see a new visualization instantly, without waiting for each new map and animation to compute and load.
According to MIT computer science professor Samuel Madden, “the existing Twitter visualizations are ‘canned’ based on some previous computation of a map or picture, rather than being truly interactive.” He added, “We have built a new kind of database system. It will answer and also map every request by scanning through every tweet in the database, which can be done in just a few milliseconds.” He continued, “The system can keep up the pace even if the database has hundreds of millions of tweets.”
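To make the "scan every tweet" approach concrete, here is a minimal single-threaded sketch in Python. It is not MapD's implementation: the `Tweet` record, the `map_counts` function, the grid-cell binning, and the sample data are all assumptions made for illustration. Each request rescans the whole collection, applies the search term, time frame, and region filters, and counts matches per map cell, so nothing is precomputed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    text: str
    timestamp: int  # seconds, arbitrary epoch for this sketch
    lat: float
    lon: float


def map_counts(tweets, term, t_start, t_end, bbox, cell_deg=10.0):
    """Scan every tweet, keep the matches, and count them per grid cell."""
    south, west, north, east = bbox
    term = term.lower()
    counts = {}
    for tw in tweets:
        if not (t_start <= tw.timestamp < t_end):
            continue  # outside the requested time frame
        if not (south <= tw.lat <= north and west <= tw.lon <= east):
            continue  # outside the requested region
        if term not in tw.text.lower():
            continue  # does not mention the search term
        # Bin the location into a cell_deg x cell_deg cell of the map grid.
        cell = (int(tw.lat // cell_deg), int(tw.lon // cell_deg))
        counts[cell] = counts.get(cell, 0) + 1
    return counts


# Made-up sample data, for illustration only.
tweets = [
    Tweet("Loving the new meme", 100, 42.36, -71.09),
    Tweet("meme of the day", 150, 40.71, -74.01),
    Tweet("Meme overload", 900, 51.51, -0.13),
    Tweet("just coffee", 120, 42.37, -71.11),
]

# Changing any parameter simply triggers another full scan.
us_early = map_counts(tweets, "meme", 0, 200, (25.0, -125.0, 50.0, -65.0))
world_all = map_counts(tweets, "meme", 0, 1000, (-90.0, -180.0, 90.0, 180.0))
```

The design trades precomputation for raw scan speed: any combination of filters works because no map is cached, which is only practical when the full scan itself is fast.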
The technology was envisioned last year by Harvard graduate student Todd Mostak, who was frustrated by how slowly social-media data sets could be processed. “By building a tool to explore data sets like this in a truly interactive fashion, with latencies measured in milliseconds rather than seconds or minutes, we hope to remove a computational bottleneck from the process of hypothesis formulation, testing, and refinement,” Mostak told MIT Technology Review’s chief correspondent David Talbot.
The report indicates that leading GPU maker Nvidia plans to demonstrate MapD on more than one billion tweets using eight GPUs at an upcoming conference. The researchers also plan a joint demo with Gnip, a reseller of social-media data from sources such as Twitter, Foursquare, and Facebook.