Tutorial: How to crawl millions of LinkedIn private profiles
LinkedIn sued Mantheos, a real-time LinkedIn Scraping API service, on 1st February 2022 for fraudulently scraping millions of profiles. In a way, Mantheos is a competitor of ours, and I have always been curious about how they could scrape private LinkedIn profiles at scale. I finally understood how they do it thanks to the legal docket of LinkedIn's lawsuit against Mantheos and its founders.
I am sure I am not the only one who is curious, so I decided to write this article to tell you how to crawl millions of private LinkedIn profiles the way Mantheos did (or is still doing).
To scrape millions of private LinkedIn profiles like Mantheos, you will need:
- Residential IP proxies. There are plenty of services from which you can procure such proxies.
- Empty debit cards with privacy features or fake names. (According to LinkedIn, Mantheos's founders used many debit cards with fake names.)
Once you have the above:
- Shuffle through the residential IP addresses and create fake LinkedIn accounts. One account per residential IP. Remember to stick to the same corresponding IP for each new account created for the following steps.
- For each new LinkedIn account, sign up for a LinkedIn Sales Navigator trial with an empty debit card.
- Use LinkedIn's internal API to fetch private LinkedIn profile data. You can use a library like this which has reverse-engineered LinkedIn's internal API.
- Rotate between the new accounts and scrape as many profiles as you can. When accounts get banned, rinse and repeat the steps above.
- Profit (?)
Why use a private debit card (with a fake name)?
The card is what lets each fake account sign up for the Sales Navigator trial. LinkedIn accounts upgraded to premium Sales Navigator can make more API requests before they hit the rate limits that get an account blocked or banned.
Why do you need residential IP proxies?
LinkedIn uses the IP address as a fingerprint to decide whether an account belongs to a unique person.
What is next?
Wait for LinkedIn to sue you. But in all seriousness, don't do this. Scraping private LinkedIn profiles with such fraudulent methods violates all kinds of laws and agreements. The data would not be GDPR or CCPA compliant.
Use Proxycurl instead. We scrape public LinkedIn profiles only, complying strictly with PDPA and CCPA regulations. Besides, you get so much more than just LinkedIn profile data. With Proxycurl, you get funding data of companies, contact info of individuals, etc.
Switch to Proxycurl today, and don't get yourself sued by LinkedIn. Send us an email at hello@nubela.co, and we will be happy to match the package you have with Mantheos.