Data Privacy: Don't Forget the Data Scientist
Anthony Tockar (CIPM, CIPP/E) is an IAPP-certified information privacy manager and information privacy professional (Europe). He is also an expert in differential privacy, his work earning him a citation in The Economist and other notable publications.
Data privacy is finally having its day, with Europe’s GDPR framework sparking substantial efforts to improve user privacy all over the globe. Meanwhile, AI research continues unabated, with breakthrough upon breakthrough redefining the status quo and yielding ever-better approaches to squeezing maximum value from data. The best data scientists are riding this wave, and are well-positioned to contribute to the privacy debate. With the GDPR and related legislation stipulating that organisations address data privacy at the highest level, it is essential that the data scientist has a seat at the table.
Data Scientist as Decision Maker
The prevailing stereotype about data scientists is something akin to the cartoon on the right - good with numbers and predictive modelling but a poor communicator and socially awkward. As with all stereotypes, this is harmful and in many cases far off the mark. In order to be effective problem solvers, data scientists should have a good all-round understanding of their business, and should be competent at communicating their viewpoint.
The knowledge and perspective that a data scientist accumulates through study and practice are an invaluable asset at the decision-making table. Being “data-driven” is more than just PR - to be a data-driven organisation you need fact-based leaders. Furthermore, with the latest disruptions brought to bear by big data, automation, AI, and of course, privacy attitudes, the data scientist is well-placed to understand the business impacts and help craft solutions.
Data Privacy in Practice - The New Paradigm
GDPR - everyone’s new favourite four-letter word. This graphic, from Tim Clements and the IAPP, provides a nice high-level overview of the regulation:
Of course, there are many interpretations of the above, and a lot of unanswered questions about how this will actually affect organisations. However, it is quite clear that the GDPR (and associated regulations expected to follow from other privacy regulators) represents a sea change for most businesses that handle personal data.
Now, more than ever before, businesses need to treat privacy as a first-class citizen. The potential financial and reputational penalties for negligence are so severe that privacy program management necessitates executive buy-in. This is even enshrined in the GDPR - a Data Protection Officer (DPO) must be appointed if, for example, the organisation’s core activities involve regular and systematic large-scale monitoring of data subjects [1]. Even if this is not the case, businesses may still appoint a DPO, and in many cases are recommended to do so. The DPO acts as an independent expert in data protection, and reports to the highest management level [2].
The DPO and the Data Scientist
This article from Lee Schlenker puts it succinctly: “Hiring a DPO with little knowledge of Data Science is likely to be as ineffective as it is counter-productive. The DPO must understand why and not just how the organisation is collecting personal and sensitive data.” The data scientist provides the why. The role of the data scientist is to use the data collected by the organisation to solve business problems, so the data scientist can provide perspective on the utility, value and purpose of that data. Without this perspective and understanding, the DPO is forced to rely on information from staff who may not have a strong knowledge of the data, and may not understand the full picture.
It’s also worth mentioning that the perspective of many privacy professionals is that data is an organisational risk that needs to be monitored and controlled. This is fair, given the very real risk that processing sensitive data does entail. However, data scientists tend to hold a different view, seeing data as a valuable resource that can transform a business and contribute heavily to future profits. A DPO will be more effective by understanding both positions.
There is therefore a need for cross-domain knowledge - on both sides. While it is a lot to expect one professional to be an expert in both areas, a working knowledge of the other’s professional domain is not only desirable, but necessary. Aside from the aforementioned end-to-end expertise of the data scientist, there are several areas within data privacy that are driven by data science - for example, differential privacy, homomorphic encryption, and certain anonymisation methods have mathematical and statistical foundations that are best interpreted and implemented by data scientists. Also, there are elements within privacy legislation that pertain to data science - the GDPR, for instance, made headlines with its “right to explanation” [3], and its focus on profiling, as well as precepts such as data minimisation, directly affects the work of the data scientist. This should be understood by the DPO and communicated to business leaders where relevant.
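To give a flavour of the statistical foundations mentioned above, here is a minimal sketch of one well-known differential privacy technique, the Laplace mechanism, applied to a simple counting query. It is an illustration only, not a production implementation: the function names (laplace_noise, private_count), the sample ages and the choice of epsilon are all hypothetical and do not come from any particular library or from the regulation itself.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Hypothetical helper: a smaller epsilon means more noise and
    therefore stronger privacy, at the cost of accuracy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # A counting query has sensitivity 1: adding or removing one person
    # changes the true count by at most 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Illustrative data only: how many people in this made-up sample are 40 or over?
ages = [23, 35, 41, 52, 67, 29, 48]
noisy_answer = private_count(ages, lambda age: age >= 40, epsilon=0.5)
```

The trade-off between epsilon and accuracy is exactly the kind of decision where the DPO and the data scientist need a shared vocabulary: one side understands the privacy guarantee being sought, the other what the added noise does to the analysis.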
Correspondingly, data scientists can benefit immensely by learning about privacy regulations, trends and practices. At Verge Labs we have written before about the changing environment in AI, with a greater emphasis on ethics, bias and fairness, interpretability and of course privacy. Understanding the privacy environment prompts data scientists to be more careful when performing their analyses, and espousing best practices (such as privacy by design) better protects their organisations from privacy failures. Finally, it promotes a more effective working relationship with the privacy team.
This final point is a key one - the DPO and the data scientist need to work together. Organisations should invest in cross-training, ensuring each professional understands the other’s domain. Only through collaboration can an organisation’s best interests be realised against the ever-shifting backdrop of the privacy and AI landscapes.