LinkDB: A Comprehensive Professional Social Network Dataset of Over 485 Million People and Company Profiles

Update, 2026: Proxycurl and LinkDB have since been sunset. I am the founder behind Proxycurl, and I now work on NinjaPear, where we are building a new generation of ethically sourced B2B data products that do not rely on LinkedIn scraping. I am preserving this article because the underlying technical ideas, use cases, and query patterns are still useful if you are studying how large public-profile datasets were structured and used.

Don't let the naysayers tell you this is only the era of GPT-3 and cryptocurrency. I will argue that we are in a new age of big-data-driven companies. For example, why wait for the right candidate to apply for a role when you can search the entire LinkedIn profile dataset to source the "perfect candidates"? Big data is just as central to seeking investment alpha, training an AI model, or automating a sales prospecting process. We had datasets suited to some of these big-data needs.

LinkedIn is a wildly popular business platform used by many companies for various use cases. However, conducting menial searches is a tedious and repetitive task that is not cost-effective. While large companies with thousands of employees may be able to cope with this tedious task, this may not be the case for your business if you are a mid-sized company or a startup.

A PostgreSQL Database With 401M+ People & Company Profiles

LinkDB was an exhaustive dataset of publicly accessible LinkedIn member and company profiles. The database contained more than 401M people and company profiles, separated by region. The regions included countries such as the United States, Canada, the United Kingdom, Israel, Singapore, Australia, New Zealand, and Ireland. We had snapshots of many other countries too.

Google and Bing crawl publicly accessible LinkedIn profiles and index them as part of their search results. In the same way, you could have the LinkedIn dataset powering your product. Read on for the details of our LinkDB database product.

The LinkDB Dataset

To provide flexibility in usage and size, the LinkDB dataset was separated into regions. We offered the following data snapshots:

  1. People profiles, 170M+ profiles of people based in the US
  2. People profiles, 8.7M profiles of people based in Canada
  3. People profiles, 876,000+ profiles of people based in Israel
  4. People profiles, 15M+ profiles of people based in the UK
  5. People profiles, 1.7M+ profiles of people based in Singapore
  6. People profiles, ~6M profiles of people based in Australia
  7. People profiles, ~1.5M profiles of people based in Ireland
  8. People profiles, ~1.5M profiles of people based in New Zealand
  9. People profiles, ~9.2M profiles of people based in Germany
  10. Company profiles, 17M+ profiles of global companies
  11. Other countries: beyond those listed above, we also had a growing list of datasets for other countries. Email us at [email protected] for a breakdown of our profiles by country.

This is what a Person Profile looked like within LinkDB:

{
  // elided for brevity
  ...,
  "first_name": "Jeff",
  "last_name": "B.",
  ...,
  "experiences": [
    {
      "starts_at": {"day": 1, "month": 8, "year": 2018},
      "ends_at": null,
      "company": "Illinois Mutual",
      ...
    },
    ...
  ],
  "accomplishment_courses": [
    ...
  ],
  "accomplishment_honors_awards": [
    ...
  ],
  "accomplishment_organisations": [
    ...
  ],
  "accomplishment_patents": [
    ...
  ],
  ...
  // elided for brevity
}

You could find the entire schema of people and company profiles in our API docs under Person Profile Endpoint and Company Profile Endpoint.
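As a rough illustration of working with a record shaped like the Person Profile shown above, here is a minimal Python sketch that picks out a person's current roles. It assumes, as in the sample, that an experience with "ends_at" set to null is ongoing. The second experience ("Example Corp") is made-up data added only for contrast, and the helper name current_experiences is hypothetical.

```python
import json

# Trimmed-down profile using only fields visible in the sample above.
# "Example Corp" is invented for illustration and is not real LinkDB data.
profile_json = """
{
  "first_name": "Jeff",
  "last_name": "B.",
  "experiences": [
    {"starts_at": {"day": 1, "month": 8, "year": 2018},
     "ends_at": null,
     "company": "Illinois Mutual"},
    {"starts_at": {"day": 1, "month": 1, "year": 2015},
     "ends_at": {"day": 31, "month": 7, "year": 2018},
     "company": "Example Corp"}
  ]
}
"""


def current_experiences(profile):
    """Return the experiences with no end date (assumed to be ongoing roles)."""
    experiences = profile.get("experiences") or []
    return [e for e in experiences if e.get("ends_at") is None]


profile = json.loads(profile_json)
current_companies = [e["company"] for e in current_experiences(profile)]
print(current_companies)
```

The same "ends_at is null" convention is what the SQL queries later in this article rely on to select current positions.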

Who Is LinkDB For?

The Proxycurl LinkDB database was a high-quality database, so we sought customers that were serious about getting a quality database for their business needs. To prevent unnecessary and damaging reselling of our database, we worked only with customers that met criteria such as:

  • Companies that had received Venture Capital of at least USD 1M
  • Companies with more than 50 US/EU employees
  • Companies located in a country with a strong rule of law
  • Other companies, on a case-by-case basis

With millions of profiles immediately at your fingertips, many use cases and industries could benefit from LinkDB's comprehensive LinkedIn database. Whether you had just started a company and needed to kickstart growth, or ran a mature company that needed to rejuvenate its operations, here are some use cases:

Sales & Marketing: Scale Prospecting & Lead Generation

With millions of profiles available through LinkDB, you could start your outreach immediately instead of scrambling to find leads, and gain a head start straight away.

Apart from direct outreach, these profiles could also be fed into advertising platforms such as Facebook and LinkedIn to scale your advertising beyond the profiles themselves.

Recruitment: Find The Best Candidates For Hiring

Find the perfect candidates from a complete dataset of LinkedIn profiles of people within a region of interest, so that your HR department or business could spend less time and money sourcing potential employees. With our database of millions of profiles, you could filter for candidates down to the finest detail.

Investing: Alternative Data For Investments & Venture Capital Firms

Use data metrics, such as how fast a company's employee count is growing, to make investment decisions and gain significant alpha over your peers. LinkDB gave you data on past and present employees of companies, and plenty of alternative data could be gleaned from it: from tracking employee counts to gauge a company's growth rate and prospects, to reading the future condition of the economy.
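To make the employee-count idea concrete, here is a small Python sketch that estimates a company's headcount per year from the "experiences" arrays of a set of profiles. It is only an illustration under stated assumptions: it relies on the "starts_at", "ends_at", and "company_linkedin_profile_url" fields used elsewhere in this article, the function name headcount_by_year is hypothetical, and the company URL and profiles below are invented sample data.

```python
def headcount_by_year(profiles, company_url, years):
    """Count people with at least one role at company_url active in each year.

    A role is treated as active in a year if it started in or before that year
    and either has no end date or ended in or after that year.
    """
    counts = {}
    for year in years:
        people = 0
        for profile in profiles:
            for exp in profile.get("experiences") or []:
                start = exp.get("starts_at")
                end = exp.get("ends_at")
                if exp.get("company_linkedin_profile_url") != company_url:
                    continue
                if not start:
                    continue  # skip roles with an unknown start date
                if start["year"] <= year and (end is None or end["year"] >= year):
                    people += 1
                    break  # count each person at most once per year
        counts[year] = people
    return counts


# Invented sample data: three people at a hypothetical company, one elsewhere.
URL = "https://www.linkedin.com/company/example/"
OTHER = "https://www.linkedin.com/company/other-example/"
sample_profiles = [
    {"experiences": [{"company_linkedin_profile_url": URL,
                      "starts_at": {"year": 2019}, "ends_at": None}]},
    {"experiences": [{"company_linkedin_profile_url": URL,
                      "starts_at": {"year": 2020}, "ends_at": {"year": 2021}}]},
    {"experiences": [{"company_linkedin_profile_url": URL,
                      "starts_at": {"year": 2021}, "ends_at": None}]},
    {"experiences": [{"company_linkedin_profile_url": OTHER,
                      "starts_at": {"year": 2018}, "ends_at": None}]},
]

headcount = headcount_by_year(sample_profiles, URL, range(2019, 2023))
print(headcount)
```

Year-on-year growth then follows from comparing consecutive counts. On a dataset of LinkDB's size you would more likely compute this inside the database, but the counting rule would be the same.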

For VCs, you could track stealth startups to spot the next big thing.

And more. All kinds of businesses and industries used our LinkedIn profiles database, and you could try it out to see whether it suited you.

Sneak Peek: Sample Search Queries on LinkDB Database

To use LinkDB, you needed a basic knowledge of SQL, JSON query functions and operators, and SQL/JSON path expressions. Let us start with a basic example that returns the first and last names of 10 profiles.

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name
  FROM profile
  LIMIT 10;

Find Me All Software Engineers In San Francisco

To find all software engineers currently working in San Francisco, you would write:

SELECT
    id,
    parsed_data->>'first_name' AS first_name,
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title == "Software Engineer" && @.location == "San Francisco").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'Software Engineer')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "title": "Software Engineer", "location": "San Francisco"}]}';

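The query above combines full-text search, JSONB containment (the @> operator), and SQL/JSON path syntax. You can also write it in a more SQL-like pattern. Note that the two forms are not strictly equivalent: the containment check matches titles and locations exactly, whereas ILIKE matches case-insensitive substrings, so the form below also picks up roles such as "Senior Software Engineer":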

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name,
  -- Case-insensitive regex match so the projected title agrees with the ILIKE filter below
  jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title like_regex "software engineer" flag "i" && @.location like_regex "san francisco" flag "i").title') AS title
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') AS exp
     WHERE
       (exp->>'title' ILIKE '%Software Engineer%') AND
       (exp->>'location' ILIKE '%San Francisco%') AND
       -- a null ends_at marks a current role
       exp->>'ends_at' IS NULL
    );
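The two query styles above differ in how strictly they match, which is easier to see outside SQL. The following Python sketch is a simplified stand-in, not the PostgreSQL implementation: contains_exact mimics the exact key/value matching of the JSONB containment check for flat values, and matches_ilike mimics the case-insensitive substring matching of the EXISTS + ILIKE form. Both function names and the sample rows are hypothetical.

```python
def contains_exact(experiences, wanted):
    """Simplified containment: some experience has every wanted key with an equal value."""
    return any(
        all(key in exp and exp[key] == value for key, value in wanted.items())
        for exp in experiences
    )


def matches_ilike(experiences, title_part, location_part):
    """Simplified ILIKE '%...%': case-insensitive substring match on a current role."""
    for exp in experiences:
        title = (exp.get("title") or "").lower()
        location = (exp.get("location") or "").lower()
        if (exp.get("ends_at") is None
                and title_part.lower() in title
                and location_part.lower() in location):
            return True
    return False


# Invented sample rows for illustration.
exact_role = [{"title": "Software Engineer", "location": "San Francisco", "ends_at": None}]
fuzzy_role = [{"title": "Senior Software Engineer",
               "location": "San Francisco Bay Area", "ends_at": None}]
wanted = {"title": "Software Engineer", "location": "San Francisco", "ends_at": None}

print(contains_exact(exact_role, wanted), contains_exact(fuzzy_role, wanted))
print(matches_ilike(exact_role, "software engineer", "san francisco"),
      matches_ilike(fuzzy_role, "software engineer", "san francisco"))
```

In practice this means the exact form returns fewer, more precise rows, while the substring form casts a wider net. Which one to use depends on how consistently titles and locations are written in the data.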

Find Me All Apple's Employees

To retrieve all current employees of the company Apple, you would write the following SQL query:

SELECT
    id,
    parsed_data->>'first_name' AS first_name,
    -- ->> returns text; -> would return the JSON value with its quotes
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.company_linkedin_profile_url == "https://www.linkedin.com/company/apple/").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'https://www.linkedin.com/company/apple/')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "company_linkedin_profile_url": "https://www.linkedin.com/company/apple/"}]}';

Or, using the more SQL-like pattern:

SELECT
    id,
    ... --same as above
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') AS exp
     WHERE (exp->>'company_linkedin_profile_url' ILIKE '%linkedin.com/company/apple%')
       AND exp->>'ends_at' IS NULL
    );

A few other code samples were available on our LinkDB product page, including one for finding Stanford Computer Science graduates.

In-Depth Features Of Proxycurl LinkDB LinkedIn Database

1. Data Freshness

Apart from LinkDB, our other key products were APIs that powered our customers’ businesses and applications. The two were closely linked when it came to data freshness: each time a real-time API call was made to scrape a LinkedIn profile, we updated the LinkDB dataset with the result of that call. This kept LinkDB continuously refreshed, at a rate of up to millions of profiles a day, so you did not have to worry about buying stale data.

Upon purchase, you could keep your LinkDB database current with optional quarterly updates for a nominal fee. If you chose not to, the database was still yours to keep.

2. Data Schema

The LinkDB database held millions of people and company profiles, with many data points for each person and company. These were some of the data points available:

• People Profiles

first and last name, profile picture, personal phone numbers, personal emails, work emails, occupation, work industry, Github, Facebook & Twitter profile IDs, LinkedIn profile headline & summary, country, city & state, work experience, education, languages, organizations, publications, honors & awards, patents, courses completed, projects, test scores, volunteer work, certifications, connections, activities, articles, groups, skills, inferred salary, gender, birth date, interests, related LinkedIn profiles, LinkedIn recommendations, similarly named LinkedIn profiles, etc.

• Company Profiles

company name, description, website, tagline, profile picture, industry, categories, size, HQ & other locations, including country, city, state, street and postal code, company email, company phone number, Facebook & Twitter ID, company type, founded year & date, specialties, stock symbol, IPO status, Crunchbase rank, acquisitions & exits, funding rounds, funding amount, investors count, company LinkedIn updates, LinkedIn profile follower count, similar companies, etc.

3. File Format

We shipped the data in Parquet file format. Apache Parquet, a column-oriented data file format, is designed for efficient data storage and retrieval and provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. This meant you got to save space on cloud storage while gaining increased data throughput and performance. You could get 20,000 random profiles of people in the US in Parquet format to try out and see if it suited your workflow.

4. Legal Compliance

Compliance in handling personal data is a crucial aspect of any data sourcing effort. Mistakes can have severe consequences and damage your company. Case in point: LinkedIn is not shy about pursuing lawsuits when it considers a company a threat to its members' data, but we know it is more about their business and revenue being threatened.

Proxycurl ensured compliance with major legal standards, such as CCPA and GDPR, and was in the process of becoming SOC 2 certified. Our strict policies secured users' data and allowed you to concentrate on expanding your business while we took care of your data needs.

The good news is that, despite the various lawsuits LinkedIn had brought against companies, there were strong legal precedents, including from cases that went as far as the Supreme Court, backing companies that obtained data from LinkedIn’s public site.

5. LinkedIn Public, Not Private, Profiles Data

The Proxycurl LinkDB database contained only data from public LinkedIn profiles; we did not scrape private profiles. Scraping data from public LinkedIn profiles can be challenging, but attempting to scrape private profiles is not only a huge undertaking, it also poses significant legal risks. As the name suggests, private profiles are meant to be kept confidential and are only visible to other LinkedIn members who have a connection with the profile owner. Scraping private profiles illegally or inappropriately can impact LinkedIn's revenue, and LinkedIn is likely to take action to protect its interests.

6. Pricing

Our prices were among the most competitive out there, or at least that is what our customers told us. Person profile datasets could be purchased per country, while company profiles came only as a single global database, and pricing was transparent. For pricing of datasets at the country level, you could reach out to us at [email protected] and we would provide the pricing for the datasets you were looking for. We also offered multiple financing options, enabling you to pay in monthly or quarterly installments.

Beyond the published pricing, you could test the quality of our database before purchasing: sample data with people and company profiles from various regions was available.

Get Real Profiles With Proxycurl LinkDB Database Today

That’s the full walkthrough of our LinkDB LinkedIn Database for you. On its own you could already achieve a lot, or you could pair it up with our other API products to return even more structured data or to automate your applications.

We provided sample data of 10,000 profiles each for various regions if you wanted to experiment with the dataset in your local development environment. You could load the sample dataset into a relational database and use the sample queries above for guidance.

Your business or applications can benefit immensely from the right dataset. LinkedIn is an excellent source of data on potential customers, companies, investments, and trends.

If you are here because you used Proxycurl or LinkDB in the past, the next chapter is NinjaPear. It is a different product direction, and intentionally so. We are building ethically sourced B2B intelligence products that do not depend on LinkedIn scraping at all. If that is more aligned with where you want the market to go, go take a look.

There is much more to share about what LinkDB was, and I’m sure you have many questions too. Feel free to reach out to us via email or chat.

Steven Goh | CEO
World's laziest CEO. CEO of NinjaPear. Ex-Founder of Proxycurl (10+M), Steven founded 5 other startups: Gom VPN, Kloudsec, SilvrBullet, NuMoney, and SharedHere.
