Unpacking Non-Personal Data – Defining NPD

Summary by Anubhav Khanna, Policy Associate, TQH

In partnership with The Print and Network Capital, TQH organised a series of online discussions on ‘Unpacking Non-Personal Data’, bringing together lawyers, tech policy experts, investors and political voices to deliberate on the proposed framework for non-personal data (NPD) governance. The first discussion focused on defining non-personal data. The second looked at the impact of the proposed provisions on the innovation ecosystem in India, and the third deliberated on the question – is data a national resource?

For the first discussion on 6th August 2020 on ‘Defining Non-Personal Data’, we were joined by Nehaa Chaudhuri, Partner at Ikigai Law; Smitha Krishna Prasad, Director at Centre for Communication Governance at NLU Delhi; Ridhi Varma, Research Fellow, National Institute of Public Finance and Policy; and Anubhutie Singh, Policy Analyst at Dvara Research. The session was moderated by Deepro Guha, Senior Policy Analyst at The Quantum Hub.

This piece captures the essential aspects of the first discussion. For details on other discussions in the series, please follow the links below:

Impact on the Indian Innovation Ecosystem
Is Non-Personal Data a National Resource?

Discussion 1: Defining Non-Personal Data

A. What are the legal implications of a wide definition of Non-Personal Data?

Smitha Prasad from NLU Delhi was of the opinion that the definitions put forth by the committee are very wide. She also mentioned that without foolproof standards for anonymization, concerns around privacy are difficult to mitigate. This can have wide ramifications across the spectrum, from individual privacy to national security. She explained how the committee has also tried to break down non-personal data into two categories – NPD with a human contact element (e.g. consumer shopping trends data, payments data etc.) and NPD without human contact (e.g. weather reports, rainfall measures etc.). She explained that the committee has looked at various sources of data and has attempted to introduce the concept of community data. Although the report has tried to break NPD into various categories, there is a potential overlap in the way concepts have been defined. The committee has also borrowed from the Personal Data Protection (PDP) Bill in assigning various levels of sensitivity to data. In the PDP Bill, these concepts had taken the shape of general personal data, sensitive personal data and critical personal data. She pointed out that sensitivity can be attributed in two ways. One is how much personal data, and how critical, can be derived from the shared NPD. The other is the lens of national security and societal welfare. Although the committee has tried to deal with sensitivity, it has left a lot of scope for questions.

B. How much anonymization is required to make Non-Personal Data truly anonymised? How big an issue is de-anonymization?

Anubhutie Singh from Dvara Research talked about how data that is automatically generated from machines, computers, supply chains etc. does not really pose privacy problems. For data involving human interaction, computer science literature shows that dynamic data cannot be completely anonymised and some degree of reversibility always persists. This is especially true for datasets with a high level of dimensionality. She said that ideally data should be completely anonymised; however, this is not possible given the current state of technology. In the case of static datasets, groups or communities can be identified even if individuals cannot. This can potentially cause harm to a community through the targeted use of data. For dynamic datasets, research shows that it is possible to recreate large portions of datasets, which again poses privacy concerns. Complete anonymization, given the current level of technology, is difficult to achieve.
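The point about dimensionality can be illustrated with a small sketch. It is not drawn from the discussion or from the NPD report: the records, the column names and the `unique_fraction` helper below are all hypothetical, and the measure used (the share of records that are unique on a chosen set of attributes) is only one simple proxy for re-identification risk.

```python
from collections import Counter

# Hypothetical toy records; direct identifiers (name, phone) are assumed
# to have been removed already, leaving only "harmless" attributes.
RECORDS = [
    {"pin": "110001", "age_band": "30-39", "gender": "F", "device": "android"},
    {"pin": "110001", "age_band": "30-39", "gender": "F", "device": "ios"},
    {"pin": "110001", "age_band": "30-39", "gender": "M", "device": "android"},
    {"pin": "110001", "age_band": "40-49", "gender": "F", "device": "android"},
    {"pin": "560001", "age_band": "30-39", "gender": "F", "device": "android"},
    {"pin": "560001", "age_band": "30-39", "gender": "M", "device": "ios"},
]


def unique_fraction(records, columns):
    """Share of records whose combination of values on `columns`
    is not shared with any other record."""
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


if __name__ == "__main__":
    all_columns = ["pin", "age_band", "gender", "device"]
    # Add one attribute at a time and watch the unique share grow.
    for n in range(1, len(all_columns) + 1):
        subset = all_columns[:n]
        print(subset, round(unique_fraction(RECORDS, subset), 2))
```

In this toy data no record is unique on pin code alone, but every record is unique once all four attributes are combined. That is the intuition behind the concern that richer datasets are harder to anonymise.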

C. Given that there are so many actors involved in the data governance space, who is liable in case data is de-anonymised?

Ridhi from NIPFP said that the Data Protection Authority would be responsible for framing the standards for de-anonymized data. However, she explained that it would be very difficult to lay down the standards of anonymization and these will need constant revision to keep pace with fast-moving technology. She highlighted that the NPD report is largely silent on the issue of liability arising in the case of de-anonymization.

D. Given that big tech companies maintain both personal and non-personal datasets of consumers, what are the chances of the provisions of the PDP Bill and the NPD governance framework overlapping each other? How difficult is the segregation of non-personal and personal data?

Nehaa Chaudhary, Partner at Ikigai Law, highlighted that companies are incentivised to maintain personal data of consumers since each person has a unique consumption pattern, which leaves little incentive for businesses to invest adequately in anonymization (even though in some cases anonymized data may be useful). There can always be a tussle between individuals, the company and the government regarding the level of anonymization needed.

Therefore, processing data for sharing will also involve some value addition from the companies before it can be shared freely with other companies/entities. This forces a ‘value addition function’ on the companies, who may demand to be compensated for the extra investment and effort involved.

E. Wouldn’t difficulty in segregating personal and non-personal data bring in regulatory difficulties?

Smitha talked about precedents that show ecosystems with multiple regulatory authorities getting encumbered with regulatory conflicts and tension amongst different authorities.

She also spoke about the jurisdiction of the newly proposed NPD Authority overlapping with the domains of intellectual property and competition laws, which already have multiple regulatory bodies overseeing them. Adding another regulatory body may add to confusion and regulatory conflicts, especially when the ambit of issues touched upon by NPD is very wide.

She also spoke about the challenges of building implementation capacity for the new NPD Authority in terms of training of officers, building anonymization infrastructure and the amount of time investment required in getting it up to speed.

F. What are the possible Intellectual Property implications of data sharing?

Nehaa elaborated upon the potential infringement of intellectual property rights that accompanies the policy proposals of the NPD report. Datasets come under the ambit of Indian copyright law, and therefore making data sharing mandatory will infringe upon the protection offered to datasets which meet the standard for originality under that law. On trade secrets too, there can be a potential conflict with international agreements such as TRIPS that protect data rights. Nehaa also suggested that companies should not be deprived of their intellectual property since they have invested time, money and effort in building datasets.

G. Given that the committee’s report categorizes data into private non-personal data, community non-personal data and public non-personal data, the definitions may involve some overlaps. How will these definitions play out?

Anubhutie described the proposed categories of NPD – public, private and community. She spoke about the difficulties of practical implementation of categorization and the possibility of overlap in the definitions of these types. As these types of data are proposed to be regulated differently, such overlaps would cause great regulatory confusion. Taking the example of toll collection in PPP mode, she spoke about confusion around the ownership of data collected at the tolls – should it be with the public entity or the private? She also highlighted the ambiguity in the definition of the word “community” and the problems created by the fact that sub-groups within a community may not have their incentives aligned with the rest of the community.

H. The committee’s report talks about how a community may be harmed in case of a breach of community privacy. What is the concept of community privacy and how is the report trying to address it?

Ridhi spoke about how contravention of collective privacy may lead to community harm. She said that the NPD Report falls short of providing any concrete details on how collective privacy may be protected or how community data rights may be enforced.

I. Different platforms maintain data in different formats. Is it possible to have a common standard way of sharing data?

Smitha explained that mandatory sharing of data is aimed at fulfilling many purposes ranging from supporting businesses to national security concerns. Different purposes will need data in different formats. The report does not go into details about the contours of how data will be shared.

Audience Questions

Has any other country made a regulation around non-personal data sharing? Is there any precedent to what this committee has proposed?

Nehaa said that the model suggested by the NPD report seems novel and has not been followed in this form by any country so far. She emphasized the need to have a voluntary data-sharing arrangement wherein terms of sharing are left up to the parties involved. In this context, she spoke of the importance of data marketplaces in serving the data sharing purpose.

Anubhutie added to the above by citing the EU NPD Regulations and an OECD Report that speaks of data sharing and the economics of the same. Nehaa added that the OECD Report being spoken about also warns against mandatory data sharing provisions.

What is the distinction between raw and processed data? Since the usage of data depends on the way in which it was collected, can it be argued that all data is processed in some way?

Smitha explained the difference between raw data and processed data based on the level of effort put into the data. She also said that definitions in the NPD Report do not make these distinctions very clear. She likened raw datasets to sets that have been put together with no particular purpose, but she added that practically no dataset would be collected without any particular purpose.

Will the provisions of non-personal data governance apply to data processors? Are they allowed to read the data that they store? How will they differentiate between personal data and non-personal data?

Anubhutie explained that data processors are entities which enter into an agreement with data fiduciaries. Hence, the data processor can be held to the same standard as a data fiduciary [the report is silent on the issue]. She added that a data processor might be characterised as a “data custodian” in the NPD framework.

Nehaa said that data processors do not exercise any control. Data processors offer infrastructure and tools to perform analytical work. Consequently, data processors should not have the same obligations as data controllers/fiduciaries. From a legal standpoint, the difference between processors and controllers is underpinned by law. Obligations can be transferred by contract, but the transfer has to be reasonable, such that the data processors are in a position to meet the obligations transferred. She said that the report has not made this distinction clear.

In the case of mixed datasets, will the companies be required to segregate their data into personal and non-personal data? How execution heavy is this? Can there be cases where datasets cannot be separated?

Ridhi answered this question by bringing to our attention the various subsets of data suggested by the report – sensitive, critical, personal, community NPD etc., and how one dataset may fall under multiple heads. She said that this is also likely to lead to regulatory confusion with regard to who the data trustee would be in case of a mixed dataset.

Smitha added that personal data includes data which is personally identifiable. The PDP Bill also talks about the idea of inferred data. She gave the example of movies or TV shows which may be popular among a certain group of people and how the line between personal and non-personal data becomes blurred for such datasets.

Should the Personal Data Protection (PDP) Bill talk about non-personal data at all?

Nehaa opined that the PDP Bill should not be the place where NPD is regulated. There should be a negative definition for defining personal data such that it restricts the scope of the Bill. There is a need for a marked distinction since NPD and PDP are different paradigms. Anubhutie agreed with Nehaa while adding that there may be a better way to regulate NPD than the way the committee has tried to do.

How do we balance the interests of an individual’s privacy and that of a community? If I share my blood sample data publicly it will expose the genome data of my family. Can we balance it?

Ridhi acknowledged the concern around the overlapping rights and interests of multiple entities who can share the same dataset. She went on to explain how data which may be non-personal from one standpoint may not be non-personal from another standpoint. She supported this by giving the example of IP addresses, which are akin to non-personal data for individual consumers (who do not have access to the computing power to de-anonymize users), but can be personal data for service providers who have access to the technology to de-anonymize them and identify the users.

There are so many entities that have been proposed by the NPD committee. Do you see any issues with the way they are defined? What are the potential points of confusion?

Anubhutie responded with the example of a data trustee. She highlighted the challenge of identifying the most representative/appropriate data trustee for the community. She also spoke about the challenge of demarcating what constitutes a data business. She highlighted the part of the NPD Report which says that a business that does not deal with data can still volunteer to become a data custodian. She also highlighted potential issues around data trusts, which are proposed as new infrastructure for holding shared NPD without a clear account of how these trusts will be structured.

Nehaa also pointed out the potential conflict of interest with the state acting as a data trustee. She said that the state in India is an amorphous entity ranging from a public sector unit to the central government. Such government entities have an interest in processing data for both commercial and delivery purposes. She pointed out that governments have increasingly become creators of competing platforms for various delivery-based services such as e-commerce. A data trustee that has a direct interest in processing data faces a conflict of interest.