Real-Time Big Data Analytics (RTBDA) has emerged as a new topic in big data discussions. RTBDA refers to one of the key aspects and value propositions of big data analytics, namely the ability to act either proactively or reactively in real time based on analysis of available information. This strategy is one of the cornerstones of many Internet/Over-The-Top (OTT) companies such as Amazon and Google.
These OTT players are a source of inspiration and frustration to telecom carriers that must come to grips with the increasing amount of traffic generated across the network with little or no revenue contribution.
In this piece, we will take a closer look at RTBDA, specifically in the context of telecom networks. Fortunately, the technologies required to implement such a strategy are available and in use; however, they are not as effective as they could be.
In a very simplified way, one could say that big data analytics is composed of two parts that distinguish it from business intelligence or data warehousing and mining:
One of the challenges that big data analytics addresses is the need to process large disparate data sets that normally cannot be accommodated by a single database or server.
One solution is distributed, parallel processing: a large data set is split across multiple servers, and each server processes its part of the data set in parallel. Big data analytics does not require a specific structure for the data but can work with both structured and unstructured data. Using Hadoop with MapReduce is an example of such an approach and can be credited as a driving force behind the current interest in big data.
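As a minimal, single-machine sketch of the map/reduce pattern described above (this is not Hadoop itself), the following Python splits a set of records into chunks, counts records per service in each chunk in parallel, and merges the partial counts. The record layout and the names `map_chunk`, `reduce_counts` and `count_by_service` are illustrative assumptions, and worker threads stand in for separate servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(records):
    """Map step: count records per service within one chunk."""
    return Counter(service for service, _size in records)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_by_service(records, workers=4):
    """Split the records into chunks, map them in parallel, then reduce."""
    # Ceiling division so that every record lands in some chunk
    size = max(1, -(-len(records) // workers))
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    # Threads stand in for servers here (assumption for illustration)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


# Hypothetical records: (service, bytes transferred)
records = [("video", 1200), ("voice", 80), ("video", 900),
           ("web", 300), ("voice", 64)]
totals = count_by_service(records)
print(dict(totals))
```

In a real deployment the chunks would live on different machines and the framework would handle distribution and failure recovery; the sketch only shows the shape of the computation.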
Solutions can be found today for processing large amounts of data, but what is important in a big data perspective is that processing should be completed within a defined time frame. That time frame is now increasingly being associated with “real-time”.
RTBDA is relatively new, but it addresses the need to act proactively or reactively in real time. It is inspired by the capabilities of Internet content and services providers to understand what is happening, analyze the situation and take action in real time.
Defining “Real time” for Telecom
How real is “real time?” The answer depends on the context of what you are trying to achieve and the environment in which you are working. For some applications, a response within seconds or even microseconds is fast enough; for others, real time needs to be faster still.
From a telecom point of view, this is an interesting question. It exposes a potential issue with current telecom practices, one that needs to be addressed if carriers are to succeed in tackling the challenges that OTT traffic poses. The fact is that the currently accepted definition of “real time” in telecom may no longer be sufficient.
In the past, telecom networks were based on connection-oriented technology. Protocols and changes could only be applied centrally in a highly structured process, and the network did not change very much from one minute, or even one hour, to the next.
In this environment, it was sufficient to gather information from the network at regular intervals to know what was happening. The protocols that were used were also rich in management information, so a great deal of insight could be gathered from just one protocol header. Here “real time” can be defined in seconds or even minutes, which is why it is sufficient to collect Call Detail Records (CDRs) every 5 to 15 minutes to gain full insight.
However, today the situation has changed. With the migration to LTE, telecom carriers have completed the transition to packet networks based on Ethernet and IP, which function in a completely different way compared to connection-oriented technologies and protocols.
First, the fundamental principle of IP networks is that the network takes care of itself. The network defines the path that traffic takes and reroutes that path depending on congestion and other conditions. This characteristic allows the network to react quickly to changes. The downside is that you cannot predict with certainty where traffic will be flowing. This challenge is not made any easier by the fact that Ethernet and IP protocols, by design, do not contain the same level of management information overhead that connection-oriented protocols provide.
Packet networks are also 'bursty' and dynamic in nature. They are designed to support multiple services consumed by multiple users sharing the same infrastructure. Over a long time period, it can look like the utilization of the network is quite low, but the reality is that traffic is transmitted in bursts, which can consume the entire bandwidth available. In such situations, the IP network is expected to react and ensure that this traffic is routed in a balanced way through the network. The bottom line is that changes can occur in the network from one IP packet or Ethernet frame to the next.
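The gap between long-term and short-term utilization can be illustrated with a small calculation. The sketch below uses hypothetical traffic samples (98 idle one-millisecond windows and two windows filled at line rate on a 10 Gbps link); the window length and the traffic pattern are assumptions chosen only to make the effect visible.

```python
def utilization(bytes_per_interval, interval_s, link_bps):
    """Return link utilization (0.0 to 1.0) for each measurement interval."""
    capacity_bytes = link_bps * interval_s / 8
    return [b / capacity_bytes for b in bytes_per_interval]


LINK_BPS = 10_000_000_000   # 10 Gbps link
INTERVAL_S = 0.001          # 1 ms measurement windows (assumed)

# Hypothetical traffic: mostly idle, with two 1 ms bursts at line rate
full_window = int(LINK_BPS * INTERVAL_S / 8)   # bytes that fill one window
samples = [0] * 98 + [full_window] * 2

per_window = utilization(samples, INTERVAL_S, LINK_BPS)
average = sum(per_window) / len(per_window)
peak = max(per_window)

print(f"average utilization over 100 ms: {average:.0%}")
print(f"peak utilization in any 1 ms:    {peak:.0%}")
```

Averaged over the whole period the link looks almost idle, while during the bursts it is saturated, which is why coarse polling intervals can hide congestion.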
The fundamental issue with how telecom network management and data analytics are performed today is that both rely on CDRs, Event Detail Records (EDRs) and IP Detail Records (IPDRs) to understand what is happening in real time.
However, this definition of “real time” is anchored in the paradigm of the past when polling every few minutes was enough. When we consider that Ethernet frames in a 10 Gbps network can be transmitted with as little as 67 nanoseconds between each frame, we begin to understand what “real time” means in a packet network. It is not minutes; it is not even seconds. It is nanoseconds.
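The 67-nanosecond figure can be reproduced with a short calculation. The sketch below assumes back-to-back minimum-size Ethernet frames (64 bytes) plus the standard 8-byte preamble and 12-byte inter-frame gap, which gives 84 bytes, or 672 bits, per frame on the wire. These framing details are not stated in the text above and are supplied here as assumptions; the function names are illustrative.

```python
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames


def bits_on_wire(frame_bytes=MIN_FRAME_BYTES):
    """Bits occupied on the link by one frame, including overhead."""
    return (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8


def min_frame_interval_ns(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Time from the start of one frame to the start of the next, in ns."""
    return bits_on_wire(frame_bytes) / link_bps * 1e9


def max_frames_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on frames per second at the given line rate."""
    return link_bps / bits_on_wire(frame_bytes)


for gbps in (1, 10, 100):
    rate = gbps * 1e9
    print(f"{gbps:>3} Gbps: {min_frame_interval_ns(rate):7.2f} ns per frame, "
          f"{max_frames_per_second(rate):,.0f} frames/s")

# Frames that could pass unobserved during one 5-minute polling interval
frames_in_poll = max_frames_per_second(10e9) * 5 * 60
print(f"frames in 5 minutes at 10 Gbps: {frames_in_poll:,.0f}")
```

Under these assumptions, a 10 Gbps link can carry several billion minimum-size frames between two consecutive five-minute polls.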
Real-Time Decision Making
To be clear, using CDRs, EDRs and IPDRs for big data analytics is a good idea, but it depends on what you are trying to achieve. Big data analytics can be used for two broad categories of decision-making:
Using detail records for better planning and optimization along with other structured and unstructured data sources is appropriate and valuable. These records are rich in information, and useful trends and predictions can be generated based on this data. However, this information will never provide a complete picture until it is complemented by real-time information from packet networks that can provide exact details on what happened and when.
Needless to say, detail records cannot be used for real-time decision-making. They are only collected every 5 to 15 minutes, which is not compatible with our understanding of what real-time should be in packet networks. For true real-time decision-making, it is necessary to continuously collect, store and analyze network information. To understand what is happening, all the relevant Ethernet frames and IP packets need to be examined in real time.
By capturing and storing network information in this manner, we not only make it possible to analyze and act on this information in real time but also provide a source of detailed, reliable information on what happened in the network, and when, to complement other big data analytic activities.
Implementing RTBDA in Telecom
The real-time data collection layer can provide a constant stream of actionable information for decision-making. Both the TM Forum and the IP Network Monitoring for Quality of Service Intelligent Support (IPNQSIS) project, part of the European CELTIC-Plus program, have researched this need as part of their respective work on customer experience management. The conclusion from both projects was that probes and appliances are critical to providing reliable, real-time insight into what is happening in the network.
Probes are traditionally data collectors that provide information to other management systems. Appliances, on the other hand, use the same technology but also analyze the information and can store it locally. Appliances are typically focused on a specific task, such as performance monitoring, test and measurement, or security, and are often seen as fulfilling that very specific role. But probes and appliances can also be used more strategically as sources of real-time data for big data analytics and as implementations of RTBDA strategies. The following provides a three-step view of how such an infrastructure could be implemented.
The first step involves deployment of appliances for data collection. The key requirement here is that all the Ethernet frames and IP packets need to be captured, in real time, at line speed with zero packet loss, no matter the conditions. This visibility ensures that a reliable stream of information is being collected.
It is also extremely important that each and every frame is given a unique time stamp, so that an accurate timeline can be established not only local to the appliance but also across multiple appliances. The accuracy and precision of these time stamps must be in the range of nanoseconds. For example, with only 67 nanoseconds between Ethernet frames in a 10 Gbps network, the time stamp resolution must be better than 67 nanoseconds. Otherwise two Ethernet frames would receive the same time stamp, making it difficult to distinguish which came first. In a 100 Gbps network, this time span reduces to 6.7 nanoseconds.
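The effect of time stamp resolution can be sketched with a small simulation. The code below assumes 1,000 back-to-back frames arriving every 67.2 nanoseconds (the 10 Gbps figure for minimum-size frames, expressed in picoseconds so that integer arithmetic stays exact) and a clock that truncates each arrival time to its resolution. The names `quantize` and `count_collisions` and the chosen resolutions are illustrative assumptions, not a description of any particular appliance.

```python
def quantize(t_ps, resolution_ps):
    """Truncate an arrival time (in picoseconds) to the clock resolution."""
    return (t_ps // resolution_ps) * resolution_ps


def count_collisions(arrivals_ps, resolution_ps):
    """Count frames whose time stamp equals that of the previous frame."""
    stamps = [quantize(t, resolution_ps) for t in arrivals_ps]
    return sum(1 for a, b in zip(stamps, stamps[1:]) if a == b)


FRAME_INTERVAL_PS = 67_200   # assumed spacing of back-to-back frames at 10 Gbps
arrivals = [i * FRAME_INTERVAL_PS for i in range(1000)]

resolutions = [("1 us", 1_000_000), ("100 ns", 100_000),
               ("10 ns", 10_000), ("1 ns", 1_000)]
for label, res_ps in resolutions:
    collisions = count_collisions(arrivals, res_ps)
    print(f"resolution {label:>6}: {collisions} frames share a time stamp")
```

Once the resolution is finer than the frame spacing, no two consecutive frames receive the same time stamp and their order can be recovered from the stamps alone.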
The combination of zero-packet-loss capture with nanosecond-precision time stamping ensures that we have a reliable, accurate stream of information for data analysis.
The second step is storing this information in real time. Many appliances provide capture to disk, which allows real-time data to be stored directly to a local hard disk on the appliance. Alternatively, this data can be forwarded to a Storage Area Network (SAN) or other location. The stored data can be used to build a historical timeline of what has happened in the network with precise details. It is possible to recreate exactly what happened, as it happened, using this information.
For data analytics, this history is a source of rich information. Data such as this can provide insight into usage and behavior trends. If the appliance has Deep Packet Inspection (DPI) capabilities, then usage of services, including OTT services, can be tracked and analyzed to provide usage patterns with respect to time, location and type of device.
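As a rough sketch of how such usage patterns might be derived, the following Python sums traffic volume over any combination of time, location, device and service. The `FlowRecord` layout, its field names and the sample values are hypothetical; a real DPI-capable appliance would export its own record format.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    hour: int         # hour of day, 0-23
    cell: str         # location identifier (hypothetical)
    device: str       # device type
    service: str      # service label, as a DPI engine might classify it
    byte_count: int   # traffic volume for this flow


def usage_by(records, *keys):
    """Sum byte counts grouped by the given FlowRecord fields."""
    totals = defaultdict(int)
    for record in records:
        group = tuple(getattr(record, key) for key in keys)
        totals[group] += record.byte_count
    return dict(totals)


# Hypothetical stored records
records = [
    FlowRecord(8, "cell-A", "phone", "video", 500),
    FlowRecord(8, "cell-A", "tablet", "video", 300),
    FlowRecord(9, "cell-B", "phone", "web", 120),
    FlowRecord(9, "cell-A", "phone", "video", 700),
]

print(usage_by(records, "service"))
print(usage_by(records, "hour", "cell"))
print(usage_by(records, "device", "service"))
```

The same grouping function answers questions by time, by location or by device type, simply by changing which fields are passed in.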
This information alone provides a valuable resource for network and service optimization. New, attractive services can be defined that match users’ preferences. But perhaps even more importantly, this information can be used to provide insight to OTT content service providers, so that carriers can offer compelling services to these potential customers.
The third and final step is the potential to use real-time and stored data to enable real-time decision-making. Historical information captured to disk can be used to develop a profile of expected behavior. When this profile is compared with the real-time information on network activity, it is possible to detect unexpected events or anomalies. Such an event can be a security threat, a performance degradation or an opportunity to offer a customer a package extension or a complementary service.
From an RTBDA perspective, this capacity is very close to the types of abilities that OTT content and service providers have implemented: the ability to react in real time, based on an understanding of what is happening right now and a comparison with what has happened in the past.
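One simple way to compare live measurements against a historical profile is a standard-deviation test, sketched below. This is only one possible technique, not necessarily what any given appliance does; the threshold of three standard deviations, the throughput values and the function names are all assumptions for illustration.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize stored measurements as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_anomaly(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical history: throughput in Mbps for the same hour on previous days
history = [410, 395, 402, 420, 388, 405, 415]
profile = build_profile(history)

# Hypothetical live measurements
for live in (407, 398, 910):
    status = "anomaly" if is_anomaly(live, profile) else "expected"
    print(f"{live} Mbps: {status}")
```

Whether a flagged value is a threat, a degradation or a sales opportunity still has to be decided by further analysis; the comparison only says that the behavior is unexpected.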
Rethinking RTBDA in Telecom
It is time to reconsider what “real-time” means in modern telecom networks. It is also time to reconsider what sources are used for big data analytics. Telecom carriers must begin to consider the use of probe and appliance technology already in the network in a more strategic way to support RTBDA. By doing so they will not only provide a better source of information for planning decisions, but they will also create new opportunities to offer better services, not only to end users, but also to OTT service providers. This ability could finally address the issue of monetizing OTT traffic in telecom networks.
Daniel Joseph Barry is vice president of Marketing at Napatech and has over 20 years of experience in the IT and Telecom industry. Prior to joining Napatech in 2009, Dan Joe was Marketing Director at TPACK, a leading supplier of transport chip solutions to the Telecom sector. From 2001 to 2005, he was Director of Sales and Business Development at optical component vendor NKT Integration (now Ignis Photonyx) following various positions in product development, business development and product management at Ericsson. Dan Joe joined Ericsson in 1995 from a position in the R&D department of Jutland Telecom (now TDC). He has an MBA and a BSc degree in Electronic Engineering from Trinity College Dublin.