LinkDB - An exhaustive dataset of LinkedIn members and companies
LinkDB is a dataset of LinkedIn Profiles

Don't let the naysayers tell you that this is only the era of GPT-3 and cryptocurrency. I will argue that we are in a new age of big-data-driven companies. For example, why wait for the right candidate to apply for a role when you can search the entire LinkedIn profile dataset to source the "perfect candidates"? Big data is just as central to seeking investment alpha, training an AI model, or automating a sales prospecting process. We have the dataset for some of these big-data needs.

LinkedIn is a wildly popular business platform that many companies use for a variety of purposes. However, searching it manually is tedious, repetitive, and not cost-effective. Large companies with thousands of employees may be able to absorb that cost, but a mid-sized company or a startup often cannot.

LinkDB is a complete LinkedIn dataset of people and companies that provides you with an opportunity to integrate the vast sea of LinkedIn public data into your business applications in an easy, efficient and legal manner. If you are a mid-market company based in the US or Europe with at least 50 employees, or if a reputable VC funds you for at least 1M USD, then there is a good chance that LinkDB might suit your needs.

What is LinkDB?

LinkDB is an exhaustive dataset of publicly accessible LinkedIn members and companies. The database contains the profiles of 17+M companies worldwide and people's profiles separated by regions. The regions include countries like the United States, Canada, the United Kingdom, Israel, Singapore, Australia, New Zealand, and Ireland. We have snapshots of other countries too. Reach out to us to get a breakdown of our people profiles by country.

Google and Bing scrape and index publicly accessible LinkedIn profiles as part of their search results. In the same way, you, a tech startup, can have the LinkedIn dataset powering your product.

Besides, given the Supreme Court ruling and its reaffirmation, you have legal precedent backing this up.

What is LinkDB For?

Sales/marketing Prospecting automation

Automate your sales process by powering your LinkedIn prospect searches with LinkDB.

Find the perfect candidate from a complete dataset of LinkedIn profiles of people within a region of interest so that your HR department or business can spend less time and money sourcing potential employees.

Investment analysis and Training AI models for investment analysis

Use data metrics, such as how fast an employee count is growing, to make investment decisions to gain significant alpha against your peers.
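
As a rough illustration of such a metric, the sketch below estimates how a company's headcount changed between two dates by counting profiles with an experience at that company on each date. It is a minimal Python sketch, not a LinkDB feature: it assumes profiles are plain dictionaries using the field names starts_at, ends_at, and company_linkedin_profile_url that appear in the sample profile and queries in this post. The company URL and the sample records are hypothetical.

```python
from datetime import date


def to_date(d):
    """Convert a {"day", "month", "year"} dict to a date; None stays None."""
    if d is None:
        return None
    # Assumption: a missing day or month defaults to 1.
    return date(d["year"], d.get("month") or 1, d.get("day") or 1)


def headcount(profiles, company_url, on):
    """Count profiles with an experience at company_url that is active on `on`."""
    count = 0
    for profile in profiles:
        for exp in profile.get("experiences", []):
            if exp.get("company_linkedin_profile_url") != company_url:
                continue
            start = to_date(exp.get("starts_at"))
            end = to_date(exp.get("ends_at"))
            if start is not None and start <= on and (end is None or end >= on):
                count += 1
                break  # count each person at most once
    return count


def headcount_growth(profiles, company_url, start, end):
    """Relative headcount change between two dates, or None if the base is zero."""
    before = headcount(profiles, company_url, start)
    after = headcount(profiles, company_url, end)
    return None if before == 0 else (after - before) / before


ACME = "https://www.linkedin.com/company/acme/"  # hypothetical company URL
profiles = [  # hypothetical sample records
    {"experiences": [{"company_linkedin_profile_url": ACME,
                      "starts_at": {"day": 1, "month": 8, "year": 2018},
                      "ends_at": None}]},
    {"experiences": [{"company_linkedin_profile_url": ACME,
                      "starts_at": {"day": 1, "month": 3, "year": 2021},
                      "ends_at": None}]},
    {"experiences": [{"company_linkedin_profile_url": ACME,
                      "starts_at": {"day": 1, "month": 1, "year": 2019},
                      "ends_at": {"day": 1, "month": 6, "year": 2019}}]},
]

growth = headcount_growth(profiles, ACME, date(2020, 1, 1), date(2022, 1, 1))
print(growth)  # 1.0: one employee on the first date, two on the second
```

Because only people with a public profile are counted, a number like this is best read as a trend signal rather than an exact headcount.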

Sample Search Queries on LinkDB

To use LinkDB, you require a basic knowledge of SQL, JSON query functions and operators, and SQL/JSON path expressions. Let us start with a basic example.

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name
FROM profile
LIMIT 10;

Find Me All Software Engineers In San Francisco

To find all software engineers in San Francisco in the database, you would write:

SELECT
    id,
    parsed_data->>'first_name' AS first_name, 
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title == "Software Engineer" && @.location == "San Francisco").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'Software Engineer')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "title": "Software Engineer", "location": "San Francisco"}]}';
                        

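The above query combines full-text search, JSONB containment (the @> operator), and a JSON path expression. Using a more SQL-like pattern, you can rewrite the query as shown below. Note that ILIKE performs a case-insensitive substring match, so this version also returns titles such as "Senior Software Engineer" that the exact-match containment filter above would skip.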

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name,
  -- This path matches title and location exactly, so it returns NULL for rows
  -- that only match the looser ILIKE filters in the WHERE clause.
  jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title == "Software Engineer" && @.location == "San Francisco").title') AS title
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') exp
     WHERE
       (exp->>'title' ILIKE '%Software Engineer%') AND
       (exp->>'location' ILIKE '%San Francisco%') AND
       exp->>'ends_at' IS NULL
    );
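
The two WHERE clauses above do not select exactly the same rows: the containment filter requires the title and location to equal the given strings, while the ILIKE filter accepts any case-insensitive substring match. The following minimal Python sketch mirrors both rules on a single profile dictionary to make the difference concrete. The function names and the sample profile are hypothetical and are not part of LinkDB.

```python
def matches_exact(profile, title, location):
    """Mirror the containment (@>) filter: a current experience whose title
    and location equal the given strings exactly."""
    return any(
        # ends_at must be present and null, as in the containment pattern
        exp.get("ends_at", "missing") is None
        and exp.get("title") == title
        and exp.get("location") == location
        for exp in profile.get("experiences", [])
    )


def matches_substring(profile, title, location):
    """Mirror the ILIKE filter: a current experience whose title and location
    contain the given strings, ignoring case."""
    return any(
        exp.get("ends_at") is None
        and title.lower() in (exp.get("title") or "").lower()
        and location.lower() in (exp.get("location") or "").lower()
        for exp in profile.get("experiences", [])
    )


# Hypothetical profile: the title and location carry extra words.
profile = {
    "experiences": [
        {
            "title": "Senior Software Engineer",
            "location": "San Francisco, California",
            "ends_at": None,
        }
    ]
}

print(matches_exact(profile, "Software Engineer", "San Francisco"))      # False
print(matches_substring(profile, "Software Engineer", "San Francisco"))  # True
```

Which rule is preferable depends on the use case: exact matching is stricter and can use an index on the JSON column, while substring matching catches more title variants at the cost of scanning more rows.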

Find Me All Apple's Employees

To retrieve all employees of the company Apple, you write the following SQL query:

SELECT
    id,
    parsed_data->>'first_name' AS first_name,
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.company_linkedin_profile_url == "https://www.linkedin.com/company/apple/").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'https://www.linkedin.com/company/apple/')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "company_linkedin_profile_url": "https://www.linkedin.com/company/apple/"}]}';

Or, using the more SQL-like pattern:

SELECT
    id,
    ... --same as above
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') exp
     -- Note: the trailing % also matches any company slug that merely starts with "apple".
     WHERE (exp->>'company_linkedin_profile_url' ILIKE '%linkedin.com/company/apple%') AND exp->>'ends_at' IS NULL
    );

Who is LinkDB For?

We sell LinkDB to mid-market companies based in the US or Europe with at least 50 employees, or to startups funded by a reputable venture capital firm for at least 1M USD.

Please contact us if you are interested in LinkDB and your company belongs to any of the categories above.

The LinkDB dataset

To provide flexibility in terms of usage and size, the LinkDB dataset has been separated into regions. We have various classes of data snapshots. These are:

  1. People profiles - 170+M profiles of people based in the US
  2. People profiles - 8.7M profiles of people based in Canada
  3. People profiles - 876,000+ profiles of people based in Israel
  4. People profiles - 15+M profiles of people based in the UK
  5. People profiles - 1.7+M profiles of people based in Singapore
  6. People profiles - ~6M profiles of people based in Australia
  7. People profiles - ~1.5M profiles of people based in Ireland
  8. People profiles - ~1.5M profiles of people based in New Zealand
  9. People profiles - ~9.2M profiles of people based in Germany
  10. Company profiles - 17+M profiles of global companies

Beyond the countries listed above, we also have a growing list of datasets for other countries. Contact us for a breakdown of our people profiles by country.

This is what a Person Profile looks like within LinkDB:

{
  // elided for brevity
  ...,
  "first_name": "Jeff", 
  "last_name": "B.",
   ...,
  "experiences": [
    {
      "starts_at": {"day": 1, "month": 8, "year": 2018}, 
      "ends_at": null, 
      "company": "Illinois Mutual",
      ...
    },
    ...
  ],
  "accomplishment_courses": [
    ...
  ],
  "accomplishment_honors_awards": [
    ...
  ],
  "accomplishment_organisations": [
    ...
  ],
  "accomplishment_patents": [
    ...
  ],  
  ...
  // elided for brevity
}

You can find the entire schema of people's and companies' profiles in our API docs under Person Profile Endpoint and Company Profile Endpoint.

We have provided some sample data here if you want to experiment with the dataset in your local development environment. You can load the sample dataset into a relational database and check out the sample queries above for guidance.
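
As one way to experiment locally, the sketch below loads a few profile records into an in-memory SQLite database using only the Python standard library. This is an illustration under stated assumptions, not the setup the sample queries above were written for: those queries rely on PostgreSQL jsonb functions, whereas here SQLite is a stand-in and the experiences are flattened into a separate table. The table layout, the is_current column, and the sample records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical records shaped like the Person Profile shown above.
sample_profiles = [
    {"first_name": "Ada", "last_name": "E.",
     "experiences": [
         {"title": "Software Engineer", "company": "Example Corp", "ends_at": None},
     ]},
    {"first_name": "Sam", "last_name": "K.",
     "experiences": [
         {"title": "Software Engineer", "company": "Example Corp",
          "ends_at": {"day": 1, "month": 5, "year": 2020}},
         {"title": "Product Manager", "company": "Sample Inc", "ends_at": None},
     ]},
]


def load_profiles(conn, profiles):
    """Store each raw profile as JSON text and flatten its experiences."""
    conn.execute(
        "CREATE TABLE profile (id INTEGER PRIMARY KEY, parsed_data TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE experience (
               profile_id INTEGER NOT NULL REFERENCES profile(id),
               title TEXT,
               company TEXT,
               is_current INTEGER NOT NULL)"""
    )
    for profile in profiles:
        cur = conn.execute(
            "INSERT INTO profile (parsed_data) VALUES (?)", (json.dumps(profile),)
        )
        for exp in profile.get("experiences", []):
            conn.execute(
                "INSERT INTO experience VALUES (?, ?, ?, ?)",
                (cur.lastrowid, exp.get("title"), exp.get("company"),
                 int(exp.get("ends_at") is None)),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_profiles(conn, sample_profiles)

# Current software engineers only; the second profile's stint has ended.
rows = conn.execute(
    "SELECT profile_id, title FROM experience "
    "WHERE is_current = 1 AND title LIKE ? ORDER BY profile_id",
    ("%Software Engineer%",),
).fetchall()
print(rows)  # [(1, 'Software Engineer')]
```

To run the sample queries from this post unchanged, a PostgreSQL table with a jsonb parsed_data column would be the closer match.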

We ship the data in the Parquet file format. Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval. It provides efficient compression and encoding schemes that perform well on complex data in bulk. This means you save on cloud storage while gaining data throughput and performance. You can also use Parquet files directly for MapReduce-style operations in Apache Spark.

How We Keep LinkDB up to Date

Each time a real-time API call is made to scrape a LinkedIn profile, we update the dataset with the results of that call. This means you don't have to worry about stale data: this method keeps LinkDB continuously updated, at a rate of up to millions of profiles a day.
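
The general idea of refreshing a dataset from fresh scrape results can be sketched as a "keep the newest record per profile" upsert. The Python snippet below is only an illustration of that idea and does not describe Proxycurl's actual pipeline. The function name, the in-memory dictionary, the scraped_at timestamp, and the profile URL are hypothetical.

```python
from datetime import datetime, timezone


def upsert_profile(dataset, profile_url, parsed_data, scraped_at=None):
    """Store the scrape result for profile_url unless a newer one is present.

    Returns True if the dataset was updated, False if the result was stale.
    """
    scraped_at = scraped_at or datetime.now(timezone.utc)
    current = dataset.get(profile_url)
    if current is None or current["scraped_at"] <= scraped_at:
        dataset[profile_url] = {"parsed_data": parsed_data, "scraped_at": scraped_at}
        return True
    return False


dataset = {}
url = "https://www.linkedin.com/in/example-person/"  # hypothetical profile URL
older = datetime(2023, 1, 1, tzinfo=timezone.utc)
newer = datetime(2023, 6, 1, tzinfo=timezone.utc)

upsert_profile(dataset, url, {"first_name": "Ada"}, newer)
# A result that arrives late but was scraped earlier does not overwrite newer data.
upsert_profile(dataset, url, {"first_name": "Outdated"}, older)
print(dataset[url]["parsed_data"]["first_name"])  # Ada
```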

Pricing

Our prices are usually in the 5-digit range per data segment. We do not share price quotes publicly, to prevent a pricing war. See this post for context.

Summary

Your business or applications can benefit immensely from the right dataset. LinkedIn is an excellent source of data on potential customers, companies, investments, and trends. Are you a mid-market company or a startup looking for a complete LinkedIn dataset? Please send us an email, and let's get on a call!
