Scraping LinkedIn at scale is a hard problem
You are probably reading this article because you are looking for a way to fetch fresh LinkedIn profile data at scale. In this article, I will introduce and briefly review the top 5 LinkedIn scraping API services so that you can compare them easily. For each service, I will also link to a detailed review.
Scraping LinkedIn at scale is hard, and most people will not want to do it themselves.
For one, you might want to outsource the risks of scraping LinkedIn profiles. For example, we know from the HiQ Labs v. LinkedIn Corp case that scraping publicly accessible LinkedIn profiles has been held likely not to violate the US Computer Fraud and Abuse Act. But from the Mantheos Pte Ltd v. LinkedIn Corp case, we learnt that scraping private LinkedIn profiles exposes you to legal action.
Another reason you might be seeking a LinkedIn scraping service is that scraping LinkedIn is hard, and you want the data fast. In fact, most people should not be building a LinkedIn scraper unless their time is cheap. There are two primary ways to do it yourself. The first is the illegal way: scraping LinkedIn profiles while logged in to a LinkedIn account. Even ignoring that it is not legal, you must bypass reCAPTCHAs, stay under a daily rate limit, parse raw API data, and so on. The second is to scrape publicly accessible LinkedIn profiles, which are shown to search engines for indexing. Here, you must first figure out a method to fetch a publicly accessible LinkedIn profile consistently without being stopped by the LinkedIn authwall. Then you must run tests to verify that the method scales beyond ten profiles within a reasonable time. After that, you still have to parse the data yourself.
So, it's not easy. And here's the best part: LinkedIn is a moving target. Whatever scraper you build today might not work tomorrow. So, do you want to maintain a LinkedIn scraper in perpetuity? Probably not.
So, here are the top 5 LinkedIn Scraping API services that I will be reviewing today. Hopefully, one of them will be useful to you:
- Bright Data
- Proxycurl
- CoreSignal
- PeopleDataLabs
- Phantombuster
How I will be reviewing the Top 5 LinkedIn Scraping API Services
I will be judging these services by the following attributes:
- Freshness of data refers to how recently the returned profile data was fetched from LinkedIn.
- The richness of data refers to the breadth of data points that will be returned to you.
- Scalability refers to how many profiles you can scrape, and how quickly.
- Pricing refers to the cost per profile.
- Developer Friendliness refers to how easily a software engineer can use the service.
- Stability refers to how stable the service has been through the years.
- Compliance refers to the legal work done to ensure that the data you fetch from the service is legally compliant.
These are the findings:
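| Attribute | Bright Data | Proxycurl | CoreSignal | PeopleDataLabs | Phantombuster |
|---|---|---|---|---|---|
| Freshness of data | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Richness of data | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |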
Bright Data
Bright Data's core business is renting out residential IP addresses. Bright Data figured it could charge more by building a web crawler service on top of its network of residential IP addresses, and that is precisely what it did with its Data Collector product, which features a LinkedIn Scraper. Bright Data's LinkedIn Scraper is priced at $0.05 per LinkedIn profile and requires a subscription plan.
- Bright Data has a community-maintained scraper, so you do not have to worry about parsing HTML and can get structured data from LinkedIn profiles directly.
- The price of $0.05 per profile is reasonable.
- The company has been around for years and is legally compliant.
- The flip side of a community-maintained scraper: because the parsing code relies on the goodwill of community contributions, whenever LinkedIn changes its site, customers must wait for the community to bring the scraper up to date.
- The success rate is only 80% per API call.
For a deeper dive into Bright Data, see our in-depth review of Bright Data's LinkedIn Profile Scraper.
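To see what an 80% success rate does to the $0.05 price, here is a rough back-of-the-envelope sketch. It rests on two assumptions of mine that Bright Data does not state: failed calls are billed like successful ones, and each retry succeeds independently with the same probability. The function name is purely illustrative.

```python
def effective_cost_per_profile(price_per_call: float, success_rate: float) -> float:
    """Expected spend to obtain one successful profile.

    Assumes failed calls are billed and retried until one succeeds,
    so the expected number of calls is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_calls = 1 / success_rate
    return price_per_call * expected_calls


# $0.05 per call at an 80% success rate (figures from the review above)
cost = effective_cost_per_profile(0.05, 0.80)
print(f"${cost:.4f} per successful profile")  # $0.0625 per successful profile
```

If failed calls turn out not to be billed, the effective price stays at $0.05 and only the elapsed time grows.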
Proxycurl
Proxycurl provides an enrichment API for people and companies. Its API is well maintained and comes with comprehensive documentation.
- Manages multiple methods to scrape LinkedIn profiles.
- Has been around since 2014.
- Enriches LinkedIn profiles with data from external sources.
- Fair cost of $0.01 / profile.
- Rate limit of 300 requests/minute.
- Caches profiles for up to 29 days.
Proxycurl is very much focused on powering products centred around fresh data on people and businesses. Given that it works with large enterprise customers, it understands the need to stay compliant and stable. Most of the work under the hood goes into keeping the service stable, as it seeks to be the plumbing that lets companies fetch fresh data without building a dedicated scraping team.
The Proxycurl API is fast, reliable, and well documented. You will come to Proxycurl for its LinkedIn scraping service but end up staying for its other API endpoints, such as the Work Email Lookup Endpoint, which fetches a verified work email address for a given LinkedIn profile.
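If you call an API with a limit such as 300 requests per minute, it helps to throttle on the client side and avoid hitting the limit in the first place. Below is a minimal sliding-window sketch. It is not Proxycurl's client and makes no assumptions about its endpoints; the class name is made up, and the fetch call you would place after `acquire()` is left to you.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Blocks so that at most `max_calls` happen in any `period`-second window."""

    def __init__(self, max_calls: int = 300, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Wait, if needed, until one more call fits in the window."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Window is full: sleep until the oldest call falls out of it.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


if __name__ == "__main__":
    # Tiny numbers so the demo finishes quickly; use the defaults for 300/minute.
    limiter = MinuteRateLimiter(max_calls=2, period=0.5)
    start = time.monotonic()
    for i in range(4):
        limiter.acquire()
        # A real client would make its API request here.
        print(f"call {i} at {time.monotonic() - start:.2f}s")
```

A sliding window is only one option. A token bucket would smooth bursts more evenly, at the cost of a little more bookkeeping.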
CoreSignal
CoreSignal's core product is its bulk datasets.
Despite the name, CoreSignal's real-time API does not scrape LinkedIn profiles in real time. Instead, the API reads from CoreSignal's database and returns cached profiles.
- Has a dataset that is bigger and richer (for example, CoreSignal has 91M company profiles vs LinkedIn's 17M company profiles)
- Does not scrape LinkedIn profiles live. Real-time API reads their database.
- Stale data (3-4 months old)
For a deeper dive, see our in-depth review of CoreSignal.
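Whichever service you pick, you can guard against stale records on your side if the response tells you when the profile was last refreshed. The sketch below assumes such a timestamp exists; the `last_updated` parameter is hypothetical and does not come from any vendor's documented schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_updated: datetime, max_age_days: int,
             now: Optional[datetime] = None) -> bool:
    """Return True if the record was refreshed within `max_age_days`."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= timedelta(days=max_age_days)


# Fixed "today" so the example is reproducible.
today = datetime(2023, 1, 1, tzinfo=timezone.utc)

# A record about three months old, checked against a 29-day freshness budget.
print(is_fresh(today - timedelta(days=90), max_age_days=29, now=today))  # False

# A record refreshed ten days ago passes the same check.
print(is_fresh(today - timedelta(days=10), max_age_days=29, now=today))  # True
```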
PeopleDataLabs
Let's get it out of the way: PeopleDataLabs (PDL) does not scrape LinkedIn profiles, nor does it offer a service that does. PeopleDataLabs is not a LinkedIn scraping service. Its core business lies in buying and combining a myriad of datasets from data vendors, packaging them up in an API, and upselling the data.
- Enriches profiles with extra data points such as personal emails and phone numbers
- Has search functionality via API
- Serves stale profile data
For a deeper dive, see our in-depth review of PeopleDataLabs.
Phantombuster
Phantombuster is a scraper tool that sits on top of your LinkedIn account to scrape LinkedIn profiles. In other words, to use Phantombuster, you must provide your LinkedIn credentials (in the form of a cookie) so that it can automate scraping from your account.
Phantombuster works for small-scale LinkedIn scraping for personal use cases. For example, it works if you are an account executive looking to automate your LinkedIn workflow by scraping your connections for a cold email campaign. Still, your account might get blocked, because there is a maximum limit to the number of profiles that can be viewed daily.
- Scrapes private LinkedIn profiles (if you do not care about the legal liabilities)
- Has a UI to read from and write to spreadsheets
- 100% API success rate
- Requires you to bring your own LinkedIn account
- Puts you at serious legal risk (see the Mantheos lawsuit)
- Takes 7 hours to scrape 1,000 profiles
For a deeper dive, see our in-depth review of Phantombuster.
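To put the scalability figures in this article side by side, here is a small arithmetic sketch. It takes the "7 hours for 1000 profiles" figure and the 300 requests/minute rate limit at face value. It also assumes, for illustration only, that one request yields one profile and that throughput scales linearly. That is optimistic for an account-based tool, since daily profile-view limits would cap it well before this.

```python
def hours_to_fetch(n_profiles: int, profiles_per_hour: float) -> float:
    """Hours needed to fetch n_profiles at a constant throughput."""
    return n_profiles / profiles_per_hour


# Throughput implied by "7 hours for 1000 profiles".
ACCOUNT_BASED_PER_HOUR = 1000 / 7

# Throughput at 300 requests/minute, assuming one request per profile.
RATE_LIMITED_PER_HOUR = 300 * 60

for label, rate in [("account-based scraper", ACCOUNT_BASED_PER_HOUR),
                    ("API at 300 requests/minute", RATE_LIMITED_PER_HOUR)]:
    print(f"{label}: {hours_to_fetch(100_000, rate):.1f} hours for 100,000 profiles")
```

Under these assumptions, the gap is roughly 700 hours versus under 6 hours for 100,000 profiles.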