Resources / Blogs / Right to Forget

Right to Forget

AN IMPLEMENTATION OVERVIEW

Any organization that collects and stores EU resident data is subject to General Data Protection Regulation (GDPR). Examples of such organizations include Google, Facebook, and Amazon. The regulation places the obligation for responsible data handling with such organizations, and gives individuals a number of rights. All major geographies now have GDPR-like regulation or are in the process of passing one. GDPR-like regulation for India is called PDP (Personal Data Protection) Law

Right to Forget

One of the key rights that individuals have under these regulations is a “Right to Forget”. It empowers individuals to request erasure of all non-essential data, as defined by the regulation, related to them across all databases and data sources throughout the organization. Organizations are required to implement the request, and give an explanation if they don’t. If they are proven to be in violation of the right, for example by sending an innocuous marketing email, the organization is in violation of the regulation, and may attract severe penalty.

The implementation of ‘Right to Forget’ is complicated due to the fact that it might require modification of data in place (especially for large datasets), cover all datasets (has to be exhaustive), and the fact that data has been deleted has to be demonstrable.

This requires a system and process for obtaining and tracking requests from the data subject or individual, and implementing them using a fit-for-purpose mechanism with appropriate audit trails.

Hard Delete and Soft Delete

There are two major approaches that an organization can use to implement the ‘Right to Forget:’ Hard Delete and Soft Delete, based on whether the underlying data is modified or not. 

In all cases, the access to data is mediated through an API. We see three kinds of access paths:

  1. Default – Standard interface to the backend as defined by the implementation such as ODBC.

  2. Erasure API – API to receive the erasure requests from the consent management, implement them, and provide evidence for the same. 

  3. Virtualization API – API to proxy data requests, and erase records or parts of records as required without modification. The API should mimic the backend API such as ODBC. 

Hard deletes: As the name suggests, this approach would require the organization to delete the User’s data such that it can’t be reproduced once deleted. The copy of data at rest, whether in databases or files, is modified to eliminate the relevant records. Hard deletes can be further classified as metadata deletes or raw data deletes. In the case of the former, only the key lookup tables or metadata are deleted. This method is appropriate when the PII is completely factored out into a seperate table, and the rest of the data is useless without the right metadata records. 

Soft deletes: This approach does not modify the underlying data, but achieves the goal by introducing a data virtualization layer that sanitizes the data as it passes through. 

Hard deletes are harder to implement, and may have potential side effects such as invalidating derived artifacts such as models or constraint violations. Soft deletes are easier to implement as a ‘patch’ over an existing system. We expect to see both mechanisms in organizations based on cost and complexity of the implementation. It is unclear what is legally acceptable, but demonstration of intent to comply is critical. 

Common Sources of Data

Other Implementation Considerations

There are a few other considerations during implementation:

  1. Auditability: The system must maintain audit logs of changes to the data made, when, and ideally with a link back to the consent manager. These audit trails will likely have to be provided to the regulatory body. 

  2. Catalog: A comprehensive catalog of the data is required to implement the various erasure mechanisms. 

  3. Reporting: It is not enough to provide audit logs of the changes, it is necessary to format the output in a way that is accessible and acceptable to the legal and administrative stakeholders internally and externally. It is worth standardizing and automating the reporting structure given the repeated nature of the activity.

Download PDF

References

Related Blogs

November 24, 2022

What is the Metadata Economy?

We live in a hyper-digital world, and due to the nearly  infinite number of data sources that surround us, the volume of data generated collectively by individuals, applications and corporations is larger than ever. With such a monumental amount of data to sift through, two core principles have  become increasingly important: Metadata – Make it […]

Read More
November 10, 2022

Data Science Teams are Doing it Wrong: Putting Technology Ahead of People

Despite $200+ billion spent on ML tools, data science teams still struggle to productionize their data and ML models. We decided to do a deep dive and find out why.  Back in 1991, former US Air Force pilot and noted strategist John Boyd called for U.S. Military reforms after Operation Desert Storm. He noted that […]

Read More
November 3, 2022

MLOps – The CEO’s Guide to Productionization of Data [Part 2]

With data being touted as the oil for digital transformation in the 21st century, organizations are increasingly looking to extract insights from their data by building and deploying their custom-built ML models. In our previous article (MLOps – The CEO’s Guide to Productionization of Data, Part 1), we learned why and how embedding ML models […]

Read More