Since we launched LinkDB, we have received a barrage of requests for a company profile dataset. We understand that pairing people with companies will help our customers answer questions like:
- How many employees does a company have?
- What is the makeup of roles in a company?
It was only natural that we made crawling company profiles exhaustively a priority. But why do the work if you can get the data for free? So let's talk about the elephant in the room.
PeopleDataLabs (PDL) offers a "free company dataset."
It is true. PDL offers a free company dataset, and I was curious:
- Why is PDL offering this dataset for free?
- How many companies do they have?
- What fields does this dataset have?
- Is this dataset any good?
I put my spy hat on, went over to their website, gave away personal information, including my phone number, and shortly afterwards received this email.
How many companies does the free Company dataset have?
I picked the CSV dataset dump in the email, and a file named free_company_dataset.csv.zip began to download. I unpacked it and ran the following wc command to find out how many lines of companies there are:
There we have it. PDL's company profile dataset has 12.25M company profiles.
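For readers without wc at hand, a line count like the one described above can be sketched in Python. This is a minimal sketch, not the command I ran: the function name count_lines is hypothetical, and the unpacked file name free_company_dataset.csv is an assumption based on the archive name. It counts newline bytes, so the header row is included, and any quoted field that contains a line break would inflate the count.

```python
from pathlib import Path


def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, similar to `wc -l`, reading in chunks."""
    total = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a multi-gigabyte CSV never sits in memory.
        while chunk := fh.read(chunk_size):
            total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    # Assumed name of the unpacked file; adjust to whatever the archive yields.
    csv_path = Path("free_company_dataset.csv")
    if csv_path.exists():
        lines = count_lines(csv_path)
        print(f"{lines:,} lines, about {lines - 1:,} companies after the header row")
```

Reading in binary chunks keeps memory use flat regardless of file size, which matters for a dump with millions of rows.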
Next, I wanted to find out what fields this dataset has:
The column labels of the CSV file are:
Not bad. It does have the most important fields, except a timestamp for when each profile was last updated.
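Inspecting the column labels of a large CSV does not require loading the whole file. The following is a minimal sketch that reads only the first row; the function name read_header is hypothetical, and UTF-8 encoding is an assumption about the dump.

```python
import csv


def read_header(path, encoding="utf-8"):
    """Return the column labels from the first row of a CSV file."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as fh:
        return next(csv.reader(fh))
```

Calling read_header("free_company_dataset.csv") would return the labels as a list of strings, assuming that is the name of the unpacked file.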
How old is PDL's company profile dataset?
profiles, and make statistical inferences.
I extracted the first 999 companies from the dataset and fed them into a bulk Linkedin company scraping script that I open-sourced here. This script uses Proxycurl's Linkedin Company Profile API endpoint to scrape and enrich a Linkedin Company Profile URL if it is valid.
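Pulling the first 999 rows out of the dump can be sketched as below. This covers only the sampling step, not the scraping script or any Proxycurl API call; the function name first_rows is hypothetical, and the column names in each row depend on whatever header the CSV actually has.

```python
import csv
from itertools import islice


def first_rows(path, n=999, encoding="utf-8"):
    """Return the first n data rows of a CSV as dicts keyed by column label."""
    with open(path, newline="", encoding=encoding) as fh:
        # islice stops after n rows, so the rest of the file is never parsed.
        return list(islice(csv.DictReader(fh), n))
```

Each returned dict can then be handed to whichever enrichment step you use to check whether the profile still resolves.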
Out of 999 companies, there were only results for 835 companies.
16.4%, or 164 out of 999 companies provided in the dataset, are not valid on Linkedin.
Extrapolating that, 2,010,382 companies are dead in PDL's free company dataset.
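The arithmetic behind these figures can be sketched as follows. The function name dead_profile_estimate is hypothetical, and the total of 12,250,000 is the rounded figure quoted earlier rather than the exact row count, so the estimate it prints lands close to, but not exactly on, the number above.

```python
def dead_profile_estimate(sampled, resolved, total_profiles):
    """Scale the share of unresolved profiles in a sample up to the full dataset."""
    dead = sampled - resolved          # profiles that returned no result
    rate = dead / sampled              # share of the sample that is dead
    return dead, rate, round(rate * total_profiles)


# 999 sampled, 835 resolved; 12_250_000 is the rounded dataset size (assumption).
dead, rate, estimate = dead_profile_estimate(999, 835, 12_250_000)
print(f"{dead} dead out of 999 ({rate:.1%}), roughly {estimate:,} dead profiles overall")
```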
I conclude that this dataset is super old.
Why is PDL offering you an outdated Company Profile Dataset for free?
Because anyone interested in big datasets is an ideal customer, and the free download lets them collect your personal and contact information so they can upsell you later.
Our turn - 17M companies in Proxycurl's LinkDB, our profile database
What about our dataset?
In January, we commissioned a crawl of all public Linkedin company profiles. I am happy to share that we have 17+M company profiles available now in LinkDB. Proxycurl's Linkedin Company Profile API endpoint was employed to accomplish this feat.
These company profiles were updated just a few days ago and are up-to-date at the point of writing. And they will stay up to date because we will not stop refreshing them.
Fields in Proxycurl's Company Profile Dataset
The following fields represent companies in our dataset:
Yes, our dataset has a lot more fields.
In summary: Proxycurl VS PeopleDataLabs - Company Profile Dataset
|Proxycurl Company Profile Dataset||PDL Company Profile Dataset|
|17M profiles||12.25M profiles|
|Last updated on 25th January 2021||Last updated many years ago|
|Standard fields +
|0% DEAD profiles||16.4% DEAD profiles|
|Monthly data updates||No updates|
Proxycurl's Global Company Profile Dataset is available now.
- Please don't take my word for it. Try it yourself. If you register and log into Proxycurl, you will get access to LinkDB, our PostgreSQL server, which contains Proxycurl's Global Company dataset. Make a few queries and sample the data for yourself :)
- Yes, we do sell a snapshot of our global company dataset. Keen? Please send me an email at [email protected].