Please note that this article discusses the legal case of hiQ v. LinkedIn regarding web scraping. However, it does not constitute legal advice and the opinions expressed are solely those of the author. For advice tailored to your specific project, country, or application, please consult with your local legal counsel.
You are probably trying to scrape LinkedIn. You may have found ways to do it, and you may be concerned about their legality. In this article we aim to debunk some myths and affirm some facts, all while looking at a case closely watched by web scraping enthusiasts and tech companies alike.
TLDR - is scraping LinkedIn legal or not?
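The short answer is: it is legal to scrape LinkedIn if you do it properly. However, scraping LinkedIn in certain ways could be illegal. Which side you land on depends largely on two questions, both of which the case below addresses. Do you:
- Deal with public LinkedIn profiles only?
- Scrape LinkedIn profiles with fake logged-in accounts?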
Let's start with some definitions.
Definition of web scraping & its legality
Web scraping is the act of extracting desired information and data from websites in order to make data mining more efficient and systematic. A web scraper is a bot that automatically and systematically goes through a target website and extracts specific information from it. The resulting dataset can then be used for a variety of applications, such as automating hiring, lead generation, investment research, or sales and marketing automation.
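To make the definition concrete, here is a minimal sketch of the extraction step, using only Python's standard library. The page content and the class names (`name`, `headline`, `location`) are made up for illustration and do not reflect any real website's markup. The HTML is hard-coded so that nothing is fetched over the network.

```python
from html.parser import HTMLParser

# Hypothetical page content, hard-coded so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1 class="name">Jane Doe</h1>
  <p class="headline">Data Engineer</p>
  <p class="location">Singapore</p>
</body></html>
"""

# These class names are assumptions invented for this example.
WANTED_CLASSES = {"name", "headline", "location"}


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class is in WANTED_CLASSES."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the wanted element we are inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in WANTED_CLASSES:
            self._current = css_class

    def handle_data(self, data):
        # Only keep non-empty text that sits inside a wanted element.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def extract_fields(html):
    """Return a dict mapping each wanted class name to its text."""
    parser = FieldExtractor()
    parser.feed(html)
    return parser.fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_HTML))
```

A real scraper would add fetching, pagination, and error handling on top of this, and that is where the legal questions discussed in the rest of this article come in.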
Why is web scraping a heated topic between companies and scrapers?
As standalone entities, web scrapers and crawlers are not illegal. You could scrape your own website without any repercussions whatsoever. However, the lines blur when you scrape someone else's website without their explicit permission or in disregard of their Terms of Service (ToS). This is where things become a little tricky. Freely extracting data from another site could be argued to be trespassing or theft.
For web scraping companies, web scraping is integral to their survival; their business model revolves around taking data and processing it for their clients' needs. But some website owners (the companies being scraped) find web scraping ultimately harmful to their business. It could be because scrapers are infringing on copyrights and trademarks, or because they slow down the servers, negatively impacting the owners' revenue streams.
The case that cemented the arguments around web scraping - hiQ v. LinkedIn
A timeline of events
It all started in 2017 (officially) when LinkedIn sought legal intervention to stop hiQ from scraping data from their platform, and sent hiQ a cease-and-desist letter.
Why did we say "officially"? Because, as we'll cover below when we look deeper into the case, facts came to light during this long-drawn court case suggesting that LinkedIn had known about hiQ's scraping as far back as 2014 but took no action. Take note of this, as it is a key fact that steered the court's decisions later.
Continuing the story, upon receipt of the cease-and-desist letter, hiQ sought a preliminary injunction against LinkedIn (ELI5: a court order preventing LinkedIn from blocking hiQ's scraping activities while the case was ongoing), and hiQ won. This was a big win for web scrapers. So far the case had only been heard at the district court (a lower court), and it would come to involve more courts in the years that followed.
Fast forward two years to 2019: LinkedIn appealed the decision to the Ninth Circuit Court of Appeals, a court higher than the district court, and lost. Another huge win for web scrapers. This essentially meant that, in the eyes of the court, scraping and using publicly available data was unlikely to count as unauthorized access under the anti-hacking law LinkedIn relied on (the CFAA, covered below).
The timeline is not as simple as it sounds so far, though. In between the major events, LinkedIn continued to dig for evidence. And then they made a move again in 2021.
In 2021, the Supreme Court (the highest court in the US) granted LinkedIn's petition, vacated the Ninth Circuit's ruling, and sent the case back to the Ninth Circuit for reconsideration in light of the Supreme Court's then-recent CFAA decision in Van Buren v. United States.
In April 2022, interestingly, the Ninth Circuit reaffirmed the original injunction that hiQ had obtained in 2017, which prevented LinkedIn from stopping hiQ's scraping activities, even though the Supreme Court had vacated its earlier ruling. The case was back to square one.
However, in a flurry of events over the next few months, the California district court dissolved the injunction in August 2022.
And in November 2022, the district court issued its ruling: hiQ had breached LinkedIn's User Agreement, both by scraping data and by creating fake accounts to obtain users' data - see LinkedIn's official announcement here.
In December 2022, both parties agreed to a private settlement after having their motions for summary judgement denied (we’ll come to this in a bit), without going to a full trial.
What can we glean from the Court's decision?
Back to what we mentioned at the start about the "official" start of things: the Court found evidence, in emails from multiple LinkedIn employees, that LinkedIn was aware of hiQ's scraping activities back in 2014. But LinkedIn took no action until 2017, three years later. This is an important fact that favoured hiQ greatly throughout the case. The emails even showed employees musing about hiQ's scraping of LinkedIn data, and LinkedIn even sent some employees to hiQ's business conferences.
In the eyes of the law, this suggested that LinkedIn had given up its right to enforce its rules against hiQ during that period, and that LinkedIn's inaction had given hiQ the impression that it could continue to scrape data from the website.
The Computer Fraud and Abuse Act (CFAA) is a key statute in the case. It addresses hacking and accessing a computer "without authorization," and LinkedIn cited it against hiQ often. However, by the conclusion of the case, the Court had not explicitly stated that scraping publicly available data is prohibited under the CFAA. Rather, it was hiQ's continued scraping of LinkedIn data even after LinkedIn's cease-and-desist letter that constituted access of data "without authorization".
Essentially, web scraping of publicly available data is legal in itself; how you go about it is what matters.
Another key insight into whether web scraping is legal comes from Nov/Dec 2022, when the California district court refused to grant both parties' motions for summary judgement. A motion for summary judgement is a request by a party asking the court to decide the case without going to a full trial, since a trial means a much longer and more expensive legal dispute. The fact that the Court refused to grant the motions in this case meant that neither hiQ nor LinkedIn had a very strong case. For the purposes of this article, the takeaway is that LinkedIn's case against web scraping wasn't strong.
However, one very clear piece of evidence against hiQ was its creation of fake accounts to access otherwise-inaccessible users' data. This, together with hiQ's continued scraping after LinkedIn had revoked its access, was the key contributor to the Court's ruling, not the nature of web scraping itself. Clearly, fraudulent activities are a big no-no in web scraping. You will likely lose out just like Mantheos, who not only created fake accounts but also used fake debit cards to subscribe to LinkedIn Sales Navigator.
Another legal battle on web scraping - Facebook v. Power Ventures
Before hiQ v. LinkedIn, there was another high-profile case between Facebook and Power Ventures. Kicked off in 2008, the Facebook v. Power Ventures lawsuit nearly mirrors hiQ v. LinkedIn. Both involved tech titans suing smaller players over web scraping and citing a violation of the CFAA, and both eventually ended up at the Ninth Circuit Court of Appeals. The two cases share several other similarities, albeit with polar opposite outcomes at the Ninth Circuit.
Power Ventures served as an “all your friends in one place” platform, allowing users to access their accounts on AOL, Facebook, LinkedIn, Twitter, Myspace, etc. from a single location. Part of the lawsuit focused on Power Ventures's scraping of Facebook users' content into the Power Ventures interface. While Facebook did not own the rights to its users' profile data, it did claim copyright in the arrangement and creative design of the website. According to Facebook, Power Ventures's scrapers operated by copying the entire site wholesale in order to extract user data.
In February 2012, the district court for the Northern District of California found Power Ventures liable for making unauthorized copies of Facebook's site, amongst other allegations.
In 2016, the Ninth Circuit held that Power Ventures violated the CFAA, on the grounds of failing to respect Facebook’s cease-and-desist letter and explicit request to revoke Power Ventures’s access to its system. This was similar to what happened in hiQ v. LinkedIn.
Here, web scraping wasn't held to be illegal in itself either. What was held unlawful was the copying of Facebook's copyrighted interface and the continued access to data after Facebook had revoked permission.
Since the case and the Ninth Circuit’s arguably poorly reasoned ruling, there has been a large influx of attempts to use the CFAA to threaten competitors. The most prominent example is the hiQ v. LinkedIn case that we've looked at.
Common arguments by web scrapers
“It’s public data anyway.”
Despite the data being public, the "creative arrangement" of data can be copyrighted.
“Facts cannot be copyrighted. However, the creative selection, coordination, and arrangement of information and materials forming a database or compilation may be protected by copyright. Note, however, that the copyright protection only extends to the creative aspect, not to the facts contained in the database or compilation.” - cendi.gov
Hence, it is important to be mindful of how data is scraped. Copying a website wholesale could be considered a copyright violation. In the US, copyrighted material is protected under copyright law, including the Digital Millennium Copyright Act (DMCA).
“I have no plans to publish, distribute or sell this data. It’s strictly for my personal use”
You’d still have to be compliant with websites’ ToS. For example, Facebook’s policy.
“This isn't any different from manually collecting data anyway.”
Well... You’re not wrong but other players might not be pleased if you are, for example, causing a strain on their bandwidth or taking a slice of their revenue pie. Regardless, anything that leads to a loss for them could result in a lawsuit for you, so tread with caution.
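One practical way to reduce the bandwidth concern mentioned above is to space out your requests. Below is a minimal sketch of a throttle that enforces a minimum delay between consecutive requests. The class name `Throttle` and the 2-second interval in the usage note are illustrative assumptions, not a recommended or legally safe value, and throttling on its own does not make an otherwise non-compliant scraper compliant.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive requests.

    The clock and sleep functions are injectable so the behaviour can be
    checked without actually waiting.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval_s has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval_s - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

In use, you would create one instance, for example `throttle = Throttle(2.0)`, and call `throttle.wait()` immediately before each request to the same site.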
“Isn’t Google a crawler too?”
Over the decades, and with a first-mover advantage, Google has built its reputation as one of the titans of the industry. It has amassed deep enough pockets to face the financial repercussions of web crawling. Essentially, Google is large enough to deal with the legal system.
What now? Can I scrape LinkedIn?
Having read all that, it is clear that scraping LinkedIn, despite its treasure trove of data, is extremely difficult and poses serious legal risks if not done cautiously.
Here’s some general advice for you on how to proceed with caution:
- When in doubt, consult your lawyer before proceeding.
- Use an API instead if one is available (Proxycurl APIs are fully legal and fully compliant with major data privacy regulations and standards such as GDPR, CCPA, and SOC 2).
- Asking for permission is probably a good idea.
- Respect the website’s Terms of Service.
- Respect their
Web scraping is legal, if you do it right
In conclusion, web scraping is legal and does not in itself violate the CFAA. Issues only arise when one "trespasses" onto another's domain to extract data without permission and the site being scraped takes action against you. While the legal position is increasingly favorable for web scrapers, it's important to note that these types of cases are relatively new, so there is still a significant amount of unpredictability.
If you are looking to source data from LinkedIn quickly and at scale, it's better to use a tool like the Proxycurl API. This way you'll be well-protected against legal action from LinkedIn, like what happened to hiQ and Mantheos. You can leave the technical difficulties and legality of scraping to us and focus on building your applications.