infoTECH Feature

February 10, 2021

Creating a Successful Data Privacy Initiative: How to Avoid Scalability Issues and Unnecessary Risks

By Special Guest
Balaji Ganesan, CEO and co-founder, Privacera

Enterprises worldwide are under increased scrutiny to properly manage customer data. A growing number of governments, through mandates like GDPR in the EU, CCPA in California and LGPD in Brazil, legally require the responsible use of customer information and the protection of petabytes' worth of account data: credit card, healthcare and social security numbers, membership points, credit scores, banking information and other PII and confidential data. And non-compliance does not come cheap. Above and beyond the loss of consumer confidence and brand reputation, penalties under the CCPA run up to $7,500 per violation, plus statutory damages of $100 to $750 per California resident per incident. In Brazil, companies can be fined up to two percent of their revenue for the preceding fiscal year, while in Europe fines reach €20 million or up to four percent of annual worldwide turnover for the preceding financial year, whichever is higher.

In an environment where consumer privacy regulation is becoming more stringent, companies are increasingly investing in data governance initiatives to reduce the risk of non-compliance. The issue is that these initiatives are complicated and take years to implement. Ensuring compliance typically requires participation from a number of groups within the enterprise, including the offices of data privacy, security, and governance. The effort must also reach across multiple data stores, both on-prem and in the cloud. A third dimension is the longevity of the solution itself: the data governance initiative must be able to withstand the test of time.

Data governance is a single, enterprise-wide initiative that must be thoughtfully selected, tested, and deployed at scale over a multi-year timeline. Organizations must therefore ensure the solution can adapt to regulatory changes without forcing an overhaul of existing systems and policies; IT and data teams do not have the resources to implement, and re-implement, different governance solutions year over year. Before deciding which data governance approach is best for you, start by giving careful consideration to the following risk areas:

#1: Establishing a Solid Foundation

Choosing a data governance solution that performs at scale is a key criterion for success: as data volumes grow, especially in the cloud, the scalability of a long-term solution is of paramount importance. One way to avoid scalability issues is to assess the architecture of the solution, just as one evaluates the foundation when purchasing a house. Is it designed to support a house on the hillside it was built on? Can it sustain years of soaking rain, piles of snow, or baking sun? Is it strong enough to support a second story as your family grows? Can you add a garage for extra storage? In short, was it built for what it was designed to do and, more importantly, can it continue doing so for 20, 30, or 40 years? The same applies to the architecture of a data privacy solution.

If you want to build a Hollywood-sized mansion, you are not going to start with a foundation designed to support a small house, no matter how modern the tiny version may be. When selecting a data privacy solution, it is important to evaluate options that are designed for the problem at hand, just as the foundation must match the requirements of the home. 

For example, some data access control solutions were originally built for another purpose. Data virtualization platforms fall into this category: they were originally developed to give data analysts and data scientists access to data from a number of sources. Data virtualization-based solutions connect to different data sources and expose them through a common logical access point. At first glance this seems extremely desirable; in practice, however, it does not work well, practically or technically, because data virtualization approaches struggle to deliver data quickly to data scientists. Every query must pass through the virtualized data layer, creating bottlenecks, and as new data sources are added the layer bloats, eventually costing a small fortune in infrastructure and maintenance.
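The contrast above can be sketched in miniature. The following is a minimal illustration, not any vendor's implementation; all names (Query, POLICY, the engine labels) are hypothetical. It shows the structural difference between funneling every query through one virtualization gateway and evaluating the same policy natively inside each engine:

```python
from dataclasses import dataclass

@dataclass
class Query:
    user: str
    source: str   # e.g. "s3", "snowflake", "hive" (illustrative engine names)
    table: str

# One shared policy: which users may read which tables.
POLICY = {("alice", "customers"), ("bob", "orders")}

def is_allowed(q: Query) -> bool:
    return (q.user, q.table) in POLICY

def via_virtualization_layer(queries):
    """Every query, regardless of source, passes through one gateway.

    The single loop is the choke point: all traffic from all engines
    is serialized through the virtualized data layer.
    """
    return [(q.source, is_allowed(q)) for q in queries]

def via_native_enforcement(queries):
    """Each engine checks the same policy locally; no central hop.

    The policy travels to the data rather than the data (and every
    query) traveling to the policy.
    """
    by_engine = {}
    for q in queries:
        by_engine.setdefault(q.source, []).append(q)
    return {src: [is_allowed(q) for q in qs] for src, qs in by_engine.items()}
```

In the native approach, adding a new engine means distributing the policy to one more enforcement point; in the virtualized approach, it means routing yet another traffic stream through the same gateway.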

#2: Understanding the true TCO

When migrating to the cloud, enterprises should evaluate the impact on existing teams and find ways to leverage existing infrastructure, processes and systems as much as possible. How much expertise, training and effort does the selected solution involve? Does it require a complete overhaul of the team's skill sets? For example, if a data access control solution built on a data virtualization architecture is used, IT must recreate metadata in the new solution, creating substantial overhead for teams who must then rewrite client applications against the new virtualization platform. How much time does the IT team have to create metadata for the petabytes of data being stored in and/or migrated to the cloud?

So, while the objective of course is to minimize the workload on the IT teams tasked with implementing and maintaining a data access control solution, there are other considerations: 

  • Impact on Policies: If your company has invested significant time, effort, and resources in building its unique access control policies, it doesn't make sense to abandon these and recreate them. Enterprises need to be cognizant of how they can maintain consistency across old and new, on-prem and cloud environments without unnecessarily reinventing the wheel.
  • Impact on Data Scientists: Beyond the IT team’s efforts needed to manage policies, consider the impact on data scientists. A data access control platform based on virtualization technology forces your data analysts and scientists to rewrite all their queries from scratch to point to the virtualized layer. This is an unnecessary burden that can, and should, be avoided. 
  • Impact on Systems: Overhead is a significant challenge for cloud environments too, as the volume of data rises rapidly and user queries against data cannot be anticipated. To effectively manage access to data in the cloud, companies need a platform with native cloud integration that does not disrupt how data is accessed.
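The first bullet's point about not recreating policies can be made concrete. The sketch below is a hedged illustration under assumed names (POLICIES, the GRANT syntax, the allow-statement fields are all hypothetical): define each access rule once, centrally, and render it into each environment's native form rather than rebuilding it per system:

```python
# One central rule, defined once.
POLICIES = [
    {"role": "analyst", "table": "sales.orders", "actions": ["SELECT"]},
]

def to_sql_grant(policy: dict) -> str:
    """Render the central rule as a SQL-style GRANT for a warehouse."""
    actions = ", ".join(policy["actions"])
    return f"GRANT {actions} ON {policy['table']} TO ROLE {policy['role']};"

def to_object_store_statement(policy: dict) -> dict:
    """Render the same rule as an allow statement for an object store."""
    return {
        "Effect": "Allow",
        "Action": [f"data:{a.lower()}" for a in policy["actions"]],
        "Resource": policy["table"],
        "Principal": policy["role"],
    }
```

Because both renderings derive from the same source of truth, old and new, on-prem and cloud environments stay consistent without anyone re-authoring the rules.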

#3: Ensuring Extensibility

Another risk of any data privacy and governance initiative is extensibility. Since the solution must be able to be utilized over many years, it should be able to operate under increasing volumes, and adapt to new cloud services, data types and use cases without impeding scalability, performance or reliability. 

Mitigating Data Privacy Risk

As discussed in consideration #1, a solid foundation is one of the prerequisites for a stable long term solution capable of meeting governance needs today and for what lies ahead. When choosing a data access control solution, it is critical to plan for the future, and realize that what is being put in place today, will need to continue to be an integral part of systems and processes for years to come.  

This involves evaluating the architecture used for data access control. When migrating to the cloud, the majority of data lands in object stores like S3 and ADLS, and applications read it using languages such as SQL, Python, and Java. Some solutions are limited by volume, while others are limited by data type; for example, solutions that can only secure tabular data constrain future growth and scalability. In solutions based on data virtualization, every new analytics service, data store, or additional feature requires a new virtualization capability to be created. So before jumping full-force into a data privacy solution that may outlive its usefulness, organizations must look to the future to ensure the solution, and ultimately the effort involved, can stand the data privacy test of time.
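The tabular-only limitation above can be illustrated with a small sketch. All names here (RULES, the resource prefixes) are hypothetical assumptions, not any product's policy language: a policy model keyed on glob-style resource patterns can cover object-store paths and tables alike, so new data types do not demand a new enforcement mechanism:

```python
import fnmatch

# Rules cover both object-store paths and table identifiers.
RULES = [
    {"principal": "data_eng", "resource": "s3://lake/raw/*", "allow": True},
    {"principal": "analyst",  "resource": "table:sales.*",   "allow": True},
]

def check(principal: str, resource: str) -> bool:
    """Return True if any rule permits this principal on this resource."""
    for rule in RULES:
        if rule["principal"] == principal and fnmatch.fnmatch(resource, rule["resource"]):
            return rule["allow"]
    return False  # default deny
```

A tabular-only model could express the second rule but not the first; a resource-pattern model handles both, leaving room for new cloud services and data types later.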

About the Author: Balaji Ganesan is CEO and co-founder of Privacera, the cloud data governance and security leader, and previously co-founded XA Secure, acquired by Hortonworks. He is an Apache Ranger committer and a member of its project management committee (PMC). To learn more, visit the company's website or follow @privacera.

Edited by Maurice Nagle
