Unpacking Non-Personal Data: Impact on the Indian Innovation Ecosystem

Summary by: Abhinav Saikia, Senior Research Analyst, TQH

In partnership with The Print and Network Capital, TQH organised a series of online discussions on ‘Unpacking Non-Personal Data’, bringing together lawyers, tech policy experts, investors and political voices to deliberate on the proposed framework for non-personal data (NPD) governance. The first discussion focused on defining non-personal data. The second looked at the impact of the proposed provisions on the innovation ecosystem in India and the third deliberated on the question – is data a national resource?

The second discussion took place on 7th August, 2020 and focused on the question “Will access to data level the playing field for Indian start-ups? How do we drive the Indian Innovation Ecosystem?” The panel included prominent stakeholders in the tech policy landscape: Nikhil Narendran (Partner, Trilegal), Aditi Agrawal (Senior Research Associate, Medianama), Ganesh Kollegal (AVP – Government Affairs, Swiggy) and Navjot Kaur (Vice President, Fireside Ventures). The session was moderated by Aparajita Bharti, Founding Partner at TQH.

In this session, we discussed how the proposed data sharing mechanism would affect data-led businesses in India and the Indian innovation ecosystem. One of the key reasons highlighted by the committee for instituting a non-personal data regulatory system is to create a level playing field for Indian startups. We questioned the key assumption underlying this reasoning, i.e. that the lack of access to data is the major differentiator between Indian startups and foreign tech companies. We also discussed whether there are other solutions to correct the problems created by the imbalance in access to data among tech companies.

This short note highlights some of the key points made during the discussion. For details on other discussions in the series, please follow the links below:

Defining Non-Personal Data
Is Non-Personal Data a National Resource?

Discussion 2: Impact of the proposed NPD framework on the Indian Innovation Ecosystem

1. Is access to data the key differentiator between Indian start-ups and big tech?

Navjot Kaur from Fireside Ventures acknowledged that data is a differentiator between startups and big tech but was of the opinion that it is not a key differentiator. She said data is only one of the tools used by an enterprise. The ability of an enterprise to respond to market demand and to solve customers’ problems is what sets it apart from other players in the pool.

Aditi from Medianama concurred with Navjot. She indicated that the key differentiating points that give a budding startup a competitive edge over other enterprises include product quality, distribution, marketing infrastructure and, lastly, data. She stressed that data sharing is not a one-size-fits-all approach, i.e. merely sharing data would not ensure a level playing field. Data produced by one company might not be useful to another company when shared; the recipient entity has to substantially re-work and process the received data before it becomes useful. Therefore, data cannot be the sole differentiator between two business entities.

Nikhil from Trilegal said that the startup ecosystem in India is marred by a number of issues that hinder the growth of small enterprises. Indian startups have to spend a considerable amount of time and money on obtaining clearances, complying with multiple tax laws, setting up telecom connectivity etc. He believes that the Gopalakrishnan Committee Report, instead of acknowledging these deep-rooted problems, looks to shift the blame onto one section of the market, in this case Big Tech. He acknowledged that there are long-standing problems with Big Tech, but said the report unfortunately does not back its findings and conclusions with appropriate justifications.

Ganesh from Swiggy responded with an analogy. He agreed that data is not the most important differentiator. He said, “Data is like an engine and when you want to buy a car, you buy the whole car and not just the engine.” He took the example of the state of California in the USA, whose GDP is almost equal to that of the whole of India. California has a bustling start-up culture mainly because the ecosystem is conducive to the growth of these enterprises. Therefore, for a thriving market, the presence of a healthy ecosystem is just as important as, say, the sharing of data.

2. What are the legal precedents surrounding the sharing of non-personal data to create a level playing field? Have we seen a proposal like this in any other industry where market power is concentrated?

Nikhil drew a parallel with the telecom industry. He said that in the telecom industry there is some level of voluntary sharing of networks, as well as mandatory interconnection. Mandatory interconnection refers to the commercial and technical arrangements under which telecom service providers (TSPs) interconnect their equipment, networks and services so that customers have access to the services of other TSPs. He was, however, of the opinion that the comparison between network sharing and data sharing is inaccurate, as sharing data is much more complex than sharing networks or spectrum. Also, data is not a natural or limited resource.

3. If non-personal data is mandated to be shared, what will be the legal constraints on the use of such data? For example: should there be a limitation on the kind of products that can be built with this data to maintain fairplay? What if the entity receiving the data duplicates the product of the entity sharing the data?

Nikhil provided two perspectives here: the IP perspective and the privacy perspective. From an IP perspective, an entity has an IP right over the databases or datasets that it creates but not over the raw data, as it can be argued that raw data does not fall within the ambit of IPR. In this respect, the definition of non-personal data is critical in determining what kind of data can be considered IP. The definition provided in the Committee report, i.e. any data that is not personal is non-personal data, is at best ambiguous. While the Committee has considered this aspect and specified that IP and proprietary information need not be shared, there is already a lack of clarity in the current IPR regime in India on what constitutes proprietary information. In India, proprietary information is, in any case, not afforded the kind of protection that is provided in more developed economies. Such information is protected by contractual obligations and confidentiality agreements. In the absence of a robust IP regime in India, the envisioned NPD Authority would have to conduct extensive consultations on a case-by-case basis to determine which non-personal data can be termed proprietary information. This will lead to inadequate protection of the IP of businesses and create a sense of uncertainty in the regulatory environment.

From the privacy perspective, he said that while non-personal data by itself cannot be used to re-identify an individual, in many cases there are layers of non-personal data in a dataset. Such datasets can be processed by AI/ML technologies and used to re-identify individuals. Further, if data sharing is mandated in the absence of a liability framework, the shared data stands a chance of being misused by nefarious actors. This can impact businesses as well as citizens. Therefore, if legislation is to come in the future, the government needs to deliberate on what kind of protection and safe harbour it can provide to businesses so that there is minimal or no chance of misuse of data.

Navjot was of the opinion that, in addition to having a legal framework around data sharing provisions, the question of incentives for the entity sharing data should also be looked into. She believes that the absence of an incentive-based data sharing mechanism distorts the free market and defeats the purpose of the report. She also presented a hypothetical scenario from an investor’s perspective. Say an investor is funding a business that is collecting car data. If, under the proposed legislation, the business is mandated to share its data and a competitor uses this data, what is the benefit of putting resources behind collecting this data if the company does not get to use it to build a competitive edge?

4. If this proposal comes through, how will we make sure that data is not shared with shell companies who are later taken over by big companies for securing access to valuable data?

Nikhil stressed the importance of having recommendations in the report on regulating future acquisitions and a framework to prevent misuse of this law. The report envisages an environment where a data business has to give data to a startup. However, the report does not delve into what it means by a “startup”. There are many startups in the ecosystem with million-dollar valuations. Moreover, if the law requires Big Tech companies to give data to such startups and they later get acquired by competitor Big Tech companies, how will the law ensure that the data is not misused? How will the law ensure that the competitor does not come out with a product that is based on this data? It seems that the report is not really creating a level playing field; rather, it is distorting competitive mechanisms in the market. Nikhil felt that there should be wider consultations on this aspect of the report, and possibly a whitepaper that takes into consideration all the intricacies associated with the sharing of data and how it may impact competition.

Aditi deliberated on the definition of a data business. The definition of a data business is dependent on the volume of data processed; it does not have a monetary basis. The question here is how that will play out if the intended legislation is implemented. She took the example of two large companies: Naukri, an Indian job portal, and Netflix, an international OTT platform. Naukri processes a lot of personal data as it goes through people’s resumes, which may have phone numbers, addresses and other information related to the job that is being applied for. Netflix, on the other hand, hosts a lot of content-related data, but when it comes to personal data, its portfolio is fairly limited. Its repository of personal data may include the financial data and other basic information entered when an account is created, as well as the preferences of consumers. She posed a question on what the data sharing criteria should be for these two types of data enterprises. This is something that needs to be considered in the revised report or the legislation that comes out in the future. Secondly, as per the report, data will be assigned value on the basis of the value-add associated with it and the price of data will be determined by the market economy. This could inflate the value of certain datasets because of greater demand, set a dangerous precedent and harm the privacy of people in the longer run. This is something that the Committee should consider in its future deliberations.

Ganesh had a slightly different take on this. He said that every company collects or processes data based on its requirements, the services it offers, or how it wants to enter the market. Just making datasets available is unlikely to yield a competitive edge to a player. How datasets are used to roll out offerings in the market and the ability to innovate are much more critical factors than mere access to raw or processed data.

5. What will be the impact of mandatory data sharing on foreign investment in the Indian market? Will it be healthy for the growth of domestic startups if foreign VCs reduce investments in the Indian market?

Ganesh said that an incentivised data sharing mechanism will fuel innovation and that can lead to a greater number of startups growing to become unicorns.

Navjot said that regardless of whether data sharing is mandatory or incentivised, every entity will have access to the same type of customer/intelligence. When it’s about raw data, how an entity performs depends on its capability and know-how, but if entities have access to a know-how or a process, it might lead to more competition. So, before answering the question on the impact on foreign investment, these critical aspects need to be addressed.

6. Do you think the competition law should be updated instead and that it is the right legislation to look at some of the market power issues identified in the report rather than having a separate legislation for NPD?

Aditi said that this is the first time an attempt has been made to lay down a framework for regulating non-personal data, although some efforts resembling it have been made in the past. For instance, the EU came out with a notification on data strategy in 2020 that talked about data sharing from government to private entities, which is similar to the concept of public non-personal data envisaged under the report. However, for private-to-private sharing, the notification acknowledges that this falls under the realm of competition law and does not suggest a separate NPD Authority. The Gopalakrishnan report, in comparison, takes a huge leap of faith in assuming that access to data gives a competitive edge to a business, an argument that is not well-founded. She took the example of Alphabet Inc, which has acquired around 200 companies since its inception. In most of these cases, the underlying technology and know-how of the companies, rather than their data, were the main USPs that led to them being acquired by Alphabet. Hence, in practice, access to technology is the main factor that companies look for in acquisitions.

According to Nikhil, the report treats the state of dominance, rather than the abuse of dominance, as the issue. This sentiment is very anti-market. He concurred with Aditi and said that these issues are best left to the prudent decision-making of the competition regulator (CCI) rather than addressed through separate legislation. The proposed framework does not create an incentive mechanism for all the players in the market and the model is mostly socialist in nature, where resources are taken from the haves and transferred to the have-nots.

He said that the government should set an example first and develop a use case by publishing government data. The government is the largest data processor in the country and all its institutions sit on large chunks of data. If this data is made public, it will be greatly beneficial for society. This exercise should be used as a learning experience to understand the legal, privacy and competition roadblocks before mandating data sharing by private entities.

7. Can the format in which non-personal data is shared also reveal information about the algorithm which is at the core of tech businesses?

Aditi said it would not reveal the algorithm but it might reveal the thinking process of the company. In terms of the legality of sharing of data, there is also an interesting contradiction between the NPD report – it says that source codes and algorithms are proprietary data and should not be shared – and the provisions of the latest version of the National Ecommerce Policy, which says that the government can ask for source codes and algorithms from business entities.

Ganesh said that when data is shared under the proposed law, it would be in an anonymized state and the shared datasets would have several attributes to them. Not all attributes are likely to be beneficial or relevant to all the receiving enterprises. While there might be a possibility of de-anonymization and reverse-engineering of data, this may not be worth the time and effort the company would be required to put in.

8. In the US, where most big tech companies are based, there are still a lot of startups which are founded and those that disrupt industries. How are they able to do it without access to non-personal data from big tech?

Ganesh pointed out that far more investment, funding and time is given to research and innovation in the USA than in a country like India. Going ahead, if more incentives are provided to research and educational institutes, either by way of education policies or otherwise, it would have a significant positive effect on India’s innovation ecosystem.

Nikhil had a completely different view on this. He reiterated his previous point that startups are not flourishing in India because of various ingrained issues such as compliance hurdles, a weak IPR regime, limited exit options and, finally, the absence of robust listing mechanisms and criteria. These are some of the issues that need to be fixed first for the overall growth of the startup ecosystem.

Navjot, in addition to agreeing with both the views above, added another perspective to the conversation. She said that the big question to address is whether enterprises have the technological capabilities to process and optimise the data that they already possess. According to her, a lot of companies have huge amounts of data but have not been able to adapt the technology to process it fully.

9. Some startups are saying that data is not the differentiator between big companies and them, but the ability to vertically integrate is. What is your view on this?

Aditi said that whenever we talk about digital industries, we have to acknowledge the fact that they can shape-shift their USPs. For instance, an e-commerce platform can venture into fintech or OTT services etc. In that case, vertical integration can pose a challenge to the competitive market. However, she thinks that this is more of a competition question and should be dealt with under competition law.

10. If data is considered a public good, should the principle of eminent domain extend over data, as it does over property?

Nikhil was of the view that data as property is a very questionable proposition. He referred to the stance taken by Shashi Tharoor, Hon’ble Member of Parliament, in the context of the Personal Data Protection Bill, that data should be treated as an asset and a property and that every individual should have a right over it. However, the problem with data being treated as property is that in India people don’t have a fundamental right to property. Property ceased to be a fundamental right with the 44th Constitutional Amendment in 1978 and now survives only as a constitutional right under Article 300A. As a result, property can be taken away by the government after compensating the holder. From a civil rights perspective and from a business perspective, it does not make sense to categorise an inexhaustible resource, such as data, as property.

Nikhil acknowledged that the Committee has taken this into account while drafting the report and has delved into the subject of ownership of data.

11. Wouldn’t this create perverse incentives for orgs to remain small, below the threshold of data sharing? Even Indian startups might feel threatened that they may have to share data if they grow too big.

Navjot said that this concern is well-placed. Companies are putting significant resources, time and funding into collecting datasets and processing them with their sophisticated AI/ML technology, but if they eventually have to share these with other players in the market, they lose their competitive edge. Such mechanisms can disincentivize them from growing.

Aditi tackled the more foundational aspect of this question: how data thresholds will be set. She doubted whether such a proposal can be operationalized. The report as it stands does not delve into how to categorize businesses on the basis of the volume of metadata stored, the amount of data processed etc.

12. It seems the report is creating a scheme for statutory licensing of privately held data at graded pricing levels. Can such substantive changes be made by executive action, without a legislative amendment?

Nikhil opined that the statutory licensing of data will need a new law, either by way of the PDP Bill or new legislation. As the legal landscape currently stands, there are no enabling provisions for statutory licensing to be notified through executive action.

The Quantum Hub

Public Policy Research & Advocacy