What is Proxycurl?
Proxycurl is an API to scrape LinkedIn profiles at scale.
Proxycurl’s LinkedIn API can scrape 1 million LinkedIn profiles in a day. It scrapes not only the profiles of people but also those of companies and jobs. By extracting data from LinkedIn profiles quickly and efficiently, then processing and sorting it into a usable form, Proxycurl can supply your organization with data to apply wherever it is most relevant.
Essentially, the crawling service suits those who care about cost and time efficiency: it can circumvent most (if not all) rate-limiting techniques employed by complex websites such as LinkedIn, bypass reCAPTCHA and bot detection, and make real-time crawls of up to 1 million pages a day. However, in keeping with the privacy settings of LinkedIn profiles, Proxycurl scrapes only public profiles. This means the data that Proxycurl can scrape is the same data that Google scrapes and shows in its search results, including to users in the EU.
Companies worldwide, including those in the European Union (EU), have recognized the value of web scraping. Nonetheless, despite being a common practice among many companies, web scraping continues to raise questions with regard to GDPR compliance. In this article, we seek to address those questions for EU companies.
What is the General Data Protection Regulation (GDPR)?
The GDPR is the core of Europe’s digital privacy legislation. It revolves around data, privacy, and consent issues, and was set up to make Europe “fit for the digital age”. It aims to give EU citizens more control over their data while simplifying the regulatory environment for data protection so both citizens and businesses can fully benefit from the digital economy.
Under the GDPR, organizations have to ensure that personal data is gathered legally and under strict conditions, and those who collect and manage it are obliged to protect it from misuse and exploitation, as well as to respect the rights of data owners.
Failure to comply with the GDPR can result in hefty fines. At the time of writing, the largest GDPR fine issued is €50m. The French data protection watchdog, Commission Nationale de l'Informatique et des Libertés (CNIL), issued the fine to Google after concluding that the search engine giant was breaking GDPR rules on transparency and on having a valid legal basis when processing people's data for advertising purposes. Google is appealing the fine.
Who does the GDPR apply to?
The GDPR establishes one law across the continent, a single set of rules that applies to companies doing business within EU member states. The legislation therefore applies not only to organisations in all EU member states but also to international organisations based outside the region with activity on 'European soil'. This essentially means that almost every major corporation in the world needs a GDPR compliance strategy.
What is personal data under the GDPR?
Personal data under the GDPR includes any information related to an identified or identifiable natural person. This includes data such as names, addresses, and photos.
GDPR extends the definition of personal data such that IP addresses can fall under this umbrella. Sensitive personal data such as genetic and biometric data, which can be processed to uniquely identify an individual, could also constitute personal data.
Is it legal to scrape people’s profiles under the GDPR?
Under the GDPR, it is difficult to justify scraping personal data from a website unless you can argue that you have one of the following lawful bases, the first of which is the consent of the data subject (the person whose data is being scraped):
- Consent – The data subject consents to you having their data
- Contract – the personal data is required for the performance of a contract with the data subject
- Compliance – necessary for compliance with a legal obligation
- Vital Interest, Public Interest, or Official Authority – Note: This is typically only applicable for state-run bodies where access to personal data is in the public’s interest
- Legitimate Interests – necessary for your legitimate interests. The key elements of the legitimate interests provision can be broken down into a three-part test (Purpose test, Necessity test, and Balancing test) under Article 6(1)(f) of the GDPR.
Hence, it seems that in most cases, scraping personal data without prior consent could be deemed unlawful.
As such, we shall refer to legal precedent and examine the case of Bisnode, a European digital marketing company headquartered in Sweden, to draw further insights regarding the legal position on the matter.
Legal precedent in GDPR cases involving publicly scraped data
In examining the legal position on web scraping and GDPR compliance, we look to the position the regulator took in the case of Bisnode.
Poland’s national Personal Data Protection Office (UODO) issued its first fine under Europe’s GDPR to Bisnode, which has an office in Poland, after the company failed to comply with the data subject rights obligations set out in Article 14 of the GDPR.
Bisnode had obtained a variety of personal data on millions of entrepreneurs and business owners from public registers and other public databases, including their names, national ID numbers, and legal events related to their business activity. However, some of the contact data Bisnode had scraped did not appear to be “public data”. UODO maintained that Bisnode knew about its obligations under Article 14. Yet Bisnode made a conscious decision, on “cost grounds” alone, not to directly inform the majority of people whose personal data it had obtained for business purposes, when it should have accounted for its legal obligations related to data acquisition as a core component of business costs.
The UODO decision required Bisnode to contact the six million people it had not already reached out to in order to fulfill its Article 14 notification obligation, and gave the company three months to comply. The Bisnode case shows that GDPR enforcement reaches beyond headline fines: the orders that accompany a fine can force a company to rearrange its business practices.
Because web scraping is a relatively recent practice, there is not yet a clear legal consensus on the matter. Despite the increasingly favorable legal position for web scrapers, the legal protection available to them remains unpredictable. Depending on how Bisnode’s appeal pans out, there could be big and potentially costly implications for data scrapers across various industries. However, each case has its own specifics, and there is certainly no guarantee that UODO’s decision will lead to a de facto ban on covert commercial data scraping. Escalating the Bisnode case to Europe’s top court (the Court of Justice of the European Union) would help clarify the legal position regarding publicly scraped data.
What can you legally do with scraped profile data?
In our opinion, there are a few things you can do with scraped LinkedIn data while staying GDPR compliant.
You can
- Analyze and deliver results based on meta (profile) data
- Seek permission and use a LinkedIn profile directly
- Work with company and jobs data
Analyze and deliver results based on meta (profile) data
Google scrapes profile data and stores these profiles as-is. However, it does not act on this data and only uses profile data to deliver search results.
As a recruitment company, you can provide a search result of links to these LinkedIn profiles. What you cannot do is show the profile data within your application. In other words, your application should be a referrer and not a repository of data.
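The referrer-versus-repository idea can be illustrated with a minimal Python sketch. It assumes matched records arrive as dictionaries with a `profile_url` key plus other profile fields; those field names, the `ProfileLink` type, and the `to_search_results` helper are hypothetical and are not part of any Proxycurl or LinkedIn API. The sketch only shows one way an application might hand back links while discarding the profile data itself; whether that is sufficient for compliance is a legal question, not a coding one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileLink:
    """A search result that refers to a profile but carries no profile data."""
    url: str


def to_search_results(matches: list[dict]) -> list[ProfileLink]:
    """Keep only the public profile URL from each matched record.

    Every other field (name, headline, and so on) is dropped, and
    duplicate URLs are collapsed while preserving first-seen order.
    """
    results: list[ProfileLink] = []
    seen: set[str] = set()
    for record in matches:
        url = record.get("profile_url")  # hypothetical field name
        if url and url not in seen:
            seen.add(url)
            results.append(ProfileLink(url=url))
    return results


# Made-up example records; the URLs and field values are placeholders.
matches = [
    {"profile_url": "https://www.linkedin.com/in/example-one",
     "full_name": "Example One", "headline": "Engineer"},
    {"profile_url": "https://www.linkedin.com/in/example-two",
     "full_name": "Example Two", "headline": "Designer"},
    {"profile_url": "https://www.linkedin.com/in/example-one",
     "full_name": "Example One", "headline": "Engineer"},
]

results = to_search_results(matches)
for link in results:
    print(link.url)
```

The design choice here is that the result type has no field that could hold a name or headline, so profile data cannot leak into the application's output by accident.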
Seek permission and use a LinkedIn profile directly
Alternatively, you can seek permission from the user. For example, you can ask users to provide their LinkedIn profile URL and explicitly request their permission to use data from that profile.
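As a rough illustration of the permission-first approach, the Python sketch below refuses to touch profile data unless consent has been recorded for that profile URL. The `ConsentRegistry` class, the `use_profile_data` function, and the `fake_fetch` stand-in are all hypothetical names invented for this example; a real system would also need durable storage and a record of what exactly the user agreed to. The `withdraw` method reflects the fact that consent under the GDPR can be withdrawn.

```python
from datetime import datetime, timezone
from typing import Callable, Dict


class ConsentRegistry:
    """In-memory record of which profile URLs have explicit consent."""

    def __init__(self) -> None:
        self._granted_at: Dict[str, datetime] = {}

    def grant(self, profile_url: str) -> None:
        # Record when the user gave permission for this profile.
        self._granted_at[profile_url] = datetime.now(timezone.utc)

    def withdraw(self, profile_url: str) -> None:
        # Remove the record if the user withdraws permission.
        self._granted_at.pop(profile_url, None)

    def has_consent(self, profile_url: str) -> bool:
        return profile_url in self._granted_at


def use_profile_data(
    registry: ConsentRegistry,
    profile_url: str,
    fetch: Callable[[str], dict],
) -> dict:
    """Fetch profile data only if consent is on record; otherwise refuse."""
    if not registry.has_consent(profile_url):
        raise PermissionError(f"No recorded consent for {profile_url}")
    return fetch(profile_url)


def fake_fetch(profile_url: str) -> dict:
    # Stand-in for whatever retrieval mechanism the application uses.
    return {"profile_url": profile_url}


registry = ConsentRegistry()
url = "https://www.linkedin.com/in/example-user"  # placeholder URL
registry.grant(url)
data = use_profile_data(registry, url, fake_fetch)
print(data)
```

Passing the fetch function in as a parameter keeps the consent check independent of how the data is actually retrieved.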
Work with company and jobs data
There are company and jobs data on LinkedIn as well. These data are not personal data and you have free rein to do whatever you want with them.
In summary, you can stay GDPR compliant as long as you stay creative and respect personal privacy.
This article is not legal advice. Your company should seek its own legal counsel.