The General Data Protection Regulation (GDPR) passed by the European Union has sent ripples around the world. Depending on who you ask, it spells an apocalypse, the workings of a nanny state, or a very positive step towards consumer privacy. The direct objective of the regulation is to give individuals control over their personal data and to simplify the regulatory environment for international business by unifying the rules within the EU. The larger context here is essentially a philosophical one: people have a right to privacy in the digital realm, just as they do in the real world. Closer to home in India, while our IT laws lag a bit behind, the Supreme Court delivered a landmark judgment in 2017 which unambiguously stated that privacy is a fundamental right of the citizens of India, and the Srikrishna Committee has tabled its draft of a data protection bill, which still needs refinement. Crucially, both will have a bearing on matters related to personal data and data privacy when the time comes. Now that we have the legalese out of the way, let’s explore what this means for data scientists.
The Fundamental Conflict
Of late, the paths of data scientists and privacy advocates have diverged, and the schism is one of methodology. Data science aims to acquire new data and to find new uses for existing data. Privacy advocates strive to minimize data collection, while data scientists strive to maximize it. Privacy advocates work to reduce unexpected uses of data, while data scientists are incentivized to increase them. What has come to light in the past few years is how much insight data scientists can glean from data that was thought to be innocuous. Projects have been carried out which link sensitive but anonymized data to specific individuals, reveal the gender and/or ethnicity of individuals based on social media likes, retrieve personal records of individuals based on camera footage from the street, fingerprint cell phones based on cell tower check-ins, and so on. To comply with GDPR, data scientists and privacy advocates need to work together and walk the tightrope between protecting each person’s right to privacy and avoiding economic setbacks once data is more tightly regulated.
What This Means For Data Scientists
The new laws will put data scientists on a fresh learning curve in the following areas:
- Ability To Collect Data: Individuals will need to give express consent to what data is collected and will need to be informed why it is being collected.
- Ability To Use Data: It will become necessary to get express consent for each application of personal data.
- Ability To Transfer Data To And From Third Parties: Stiff regulatory fines are likely to produce an environment where corporations are very reluctant to buy, sell or share data that may be personal. In addition, the right to erasure may have strong implications for data sharing. In short, we can expect certain data sources to dry up.
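To make the idea of consent per application concrete, here is a minimal Python sketch of how a team might gate a data set by purpose before using it. Everything in it is an assumption made for illustration: the `ConsentRecord` class, the `rows_usable_for` and `erase_user` helpers, the purpose names and the sample rows are all hypothetical, and none of it is a prescribed GDPR mechanism or legal advice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ConsentRecord:
    """Hypothetical record of the purposes one person has agreed to."""
    user_id: str
    purposes: Set[str] = field(default_factory=set)


def rows_usable_for(rows: List[dict],
                    consents: Dict[str, ConsentRecord],
                    purpose: str) -> List[dict]:
    """Keep only rows whose owner consented to this specific purpose."""
    usable = []
    for row in rows:
        record = consents.get(row["user_id"])
        # No consent record, or no consent for this purpose: exclude the row.
        if record is not None and purpose in record.purposes:
            usable.append(row)
    return usable


def erase_user(rows: List[dict],
               consents: Dict[str, ConsentRecord],
               user_id: str) -> List[dict]:
    """Sketch of an erasure request: drop the person's consent record and rows."""
    consents.pop(user_id, None)
    return [row for row in rows if row["user_id"] != user_id]


# Illustrative data only (made-up users and purposes).
consents = {
    "u1": ConsentRecord("u1", {"recommendations", "analytics"}),
    "u2": ConsentRecord("u2", {"analytics"}),
}
rows = [
    {"user_id": "u1", "basket_total": 42.0},
    {"user_id": "u2", "basket_total": 17.5},
    {"user_id": "u3", "basket_total": 99.0},  # no consent on file
]

recommendation_rows = rows_usable_for(rows, consents, "recommendations")
```

In this sketch only the row for `u1` survives the filter for the hypothetical "recommendations" purpose, which illustrates the narrower but better-consented data sets that teams can expect to work with.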
Why We Need To Look At This Positively
It’s hard not to get carried away by the enormous potential of big data, but, as so many data leaks have taught us, there are people who are the victims of these leaks: real human beings, with rights to their own data and privacy. Often, newly-minted data scientists have played fast and loose with privacy and consent; think of harvesting data from social media (relevant here: Facebook monetizes your 2FA phone numbers). With the new regulations percolating through, data science teams will have to work with narrower data sets, though ones that come with a higher degree of consent.
Data science will have fixed boundaries around it again. While this might read to some as a more restricted playing field, it will channel the focus back to process, sound technique, tools, and relevance to business context. This move bodes well for the art of data science and for authentic practitioners of the art. Their work will come to be appreciated more, and consumers can expect the recommendations, targeted offers and the like that reach them to have better focus and pertinence. Most importantly, questions around the ethics of data scientists, which could otherwise distract from the results they produce, will start to recede.
As it stands, a more promising, richer and more ethical digital landscape is on the horizon.