LinkDB - a global database with datasets of LinkedIn People & Company Profiles

LinkDB: A Comprehensive LinkedIn Dataset of Over 485 Million People and Company Profiles

Steven Goh
Don't let the naysayers tell you that this is only the era of GPT-3 and cryptocurrency. I will argue that we are in a new age of companies driven by big data. For example, why wait for the right candidate to apply for a role when you can search the entire LinkedIn profile dataset to source the "perfect candidates"? Big data plays the same part in seeking investment alpha, training an AI model, or automating a sales prospecting process. We have the datasets for some of these big-data needs.

LinkedIn is a wildly popular business platform that companies use for many purposes. However, searching it manually is tedious, repetitive, and not cost-effective. Large companies with thousands of employees may be able to absorb that cost, but a mid-sized company or a startup often cannot.

A PostgreSQL Database With 401M+ People & Company Profiles

LinkDB is an exhaustive dataset of publicly accessible LinkedIn member and company profiles. The database contains more than 401M people and company profiles, separated by region. The regions include countries such as the United States, Canada, the United Kingdom, Israel, Singapore, Australia, New Zealand, and Ireland. We also have snapshots of many other countries.

Google and Bing crawl and index publicly accessible LinkedIn profiles as part of their search results. In the same way, you can have the LinkedIn dataset powering your product. Read on for the details of our LinkDB database product.

The LinkDB Dataset

To provide flexibility in usage and size, the LinkDB dataset is split by region. The available data snapshots are:

  1. People profiles - 170M+ profiles of people based in the US
  2. People profiles - 8.7M profiles of people based in Canada
  3. People profiles - 876,000+ profiles of people based in Israel
  4. People profiles - 15M+ profiles of people based in the UK
  5. People profiles - 1.7M+ profiles of people based in Singapore
  6. People profiles - ~6M profiles of people based in Australia
  7. People profiles - ~1.5M profiles of people based in Ireland
  8. People profiles - ~1.5M profiles of people based in New Zealand
  9. People profiles - ~9.2M profiles of people based in Germany
  10. Company profiles - 17M+ profiles of global companies
  11. Beyond the countries listed above, we also have a growing list of datasets for other countries. Email us for a breakdown of our profiles by country.

This is what a Person Profile looks like within LinkDB:

{
  // elided for brevity
  ...,
  "first_name": "Jeff", 
  "last_name": "B.",
   ...,
  "experiences": [
    {
      "starts_at": {"day": 1, "month": 8, "year": 2018}, 
      "ends_at": null, 
      "company": "Illinois Mutual",
      ...
    },
    ...
  ],
  "accomplishment_courses": [
    ...
  ],
  "accomplishment_honors_awards": [
    ...
  ],
  "accomplishment_organisations": [
    ...
  ],
  "accomplishment_patents": [
    ...
  ],  
  ...
  // elided for brevity
}

You can find the full schema of people and company profiles in our API docs under Person Profile Endpoint and Company Profile Endpoint.
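As a minimal sketch of working with this structure outside the database, the Python below picks out the roles a person currently holds, on the assumption (consistent with the sample above) that a current role has "ends_at" set to null. The second experience entry and both "title" values are hypothetical, added only for illustration.

```python
from typing import Any, Dict, List

# Sample record mirroring the fields shown above. The second experience
# and both "title" values are hypothetical, for illustration only.
profile: Dict[str, Any] = {
    "first_name": "Jeff",
    "last_name": "B.",
    "experiences": [
        {
            "starts_at": {"day": 1, "month": 8, "year": 2018},
            "ends_at": None,
            "company": "Illinois Mutual",
            "title": "Analyst",
        },
        {
            "starts_at": {"day": 1, "month": 6, "year": 2015},
            "ends_at": {"day": 31, "month": 7, "year": 2018},
            "company": "Example Corp",
            "title": "Intern",
        },
    ],
}


def current_experiences(person: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the experiences with no end date, i.e. roles still held."""
    experiences = person.get("experiences") or []
    return [exp for exp in experiences if exp.get("ends_at") is None]


current_companies = [exp["company"] for exp in current_experiences(profile)]
print(current_companies)
```

The same null check on "ends_at" is what the SQL samples later in this post use to restrict results to current roles.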

Who Is LinkDB For?

Proxycurl LinkDB is a high-quality database, so we seek customers who are serious about getting a quality database for their business needs. To prevent unnecessary and damaging reselling of our database, we proactively work only with customers such as:

  • Companies that have received venture capital funding of at least USD 1M
  • Companies with more than 50 US/EU employees
  • Companies located in a country with a strong rule of law
  • Other companies, on a case-by-case basis

With millions of profiles immediately at your fingertips, many use cases and industries can benefit from LinkDB's comprehensive LinkedIn database. Whether you have just started a company and need to kickstart growth, or run a mature company that needs to rejuvenate its operations, here are some use cases:

3 types of use cases for Proxycurl LinkDB Database

Sales & Marketing: Scale Prospecting & Lead Generation

With millions of profiles available through LinkDB, you can kickstart your outreach immediately and gain a head start instead of scrambling to find leads.

Apart from direct outreach, these profiles can be fed into advertising platforms such as Facebook and LinkedIn to scale your advertising beyond the profiles themselves.

Recruitment: Find The Best Candidates For Hiring

Find the perfect candidates from a complete dataset of LinkedIn profiles of people within a region of interest, so that your HR department or business spends less time and money sourcing potential employees. With our database of millions of profiles, you can filter candidates down to the finest detail.

Investing: Alternative Data For Investments & Venture Capital Firms

Use data metrics, such as how fast a company's employee count is growing, to make investment decisions and gain significant alpha over your peers. LinkDB gives you data on past and present employees of companies, and a wealth of alternative data can be gleaned from it, from tracking employee count to gauge a company's growth rate and prospects, to reading the future condition of the economy.

For VCs, easily track stealth startups to spot the next big thing.

And more! All kinds of businesses and industries use our LinkedIn profiles database. Try it out for yourself to see if it suits you.
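To make the employee-count metric concrete, here is a rough sketch of how headcount growth could be estimated from the "experiences" array of people profiles. It assumes each experience carries "company_linkedin_profile_url", "starts_at", and "ends_at" as in the samples in this post. The company URL, the sample profiles, and the function names are hypothetical, and counting by calendar year is a simplification.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical company URL, used only for this illustration.
ACME = "https://www.linkedin.com/company/acme-example/"


def worked_there_in_year(exp: Dict[str, Any], company_url: str, year: int) -> bool:
    """True if the experience is at company_url and overlaps the given year."""
    if exp.get("company_linkedin_profile_url") != company_url:
        return False
    start_year = (exp.get("starts_at") or {}).get("year")
    if start_year is None:
        return False  # unknown start date: leave it out of the count
    end_year = (exp.get("ends_at") or {}).get("year")  # None means still employed
    return start_year <= year and (end_year is None or end_year >= year)


def headcount(profiles: Iterable[Dict[str, Any]], company_url: str, year: int) -> int:
    """Count profiles with at least one matching experience in that year."""
    return sum(
        1
        for person in profiles
        if any(
            worked_there_in_year(exp, company_url, year)
            for exp in person.get("experiences") or []
        )
    )


def growth_rate(profiles: List[Dict[str, Any]], company_url: str,
                from_year: int, to_year: int) -> float:
    """Relative change in headcount between two years."""
    before = headcount(profiles, company_url, from_year)
    after = headcount(profiles, company_url, to_year)
    if before == 0:
        raise ValueError("no employees found in the base year")
    return (after - before) / before


def _exp(url: str, start: int, end: Any) -> Dict[str, Any]:
    """Build a hypothetical experience entry for the sample data."""
    ends_at = None if end is None else {"day": 1, "month": 1, "year": end}
    return {
        "company_linkedin_profile_url": url,
        "starts_at": {"day": 1, "month": 1, "year": start},
        "ends_at": ends_at,
    }


sample_profiles = [
    {"experiences": [_exp(ACME, 2019, None)]},
    {"experiences": [_exp(ACME, 2020, 2021)]},
    {"experiences": [_exp(ACME, 2021, None)]},
    {"experiences": [_exp("https://www.linkedin.com/company/other-example/", 2018, None)]},
]

print(headcount(sample_profiles, ACME, 2020))
print(growth_rate(sample_profiles, ACME, 2019, 2021))
```

In practice you would run an aggregation like this inside the database rather than in application code, but the logic of the overlap test stays the same.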

Sneak Peek: Sample Search Queries on LinkDB Database

To use LinkDB, you require a basic knowledge of SQL, JSON query functions and operators, and SQL/JSON path expressions. Let us start with a basic example.

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name
FROM profile
LIMIT 10;

Find Me All Software Engineers In San Francisco

To find all software engineers currently working in San Francisco, you would write:

SELECT
    id,
    parsed_data->>'first_name' AS first_name, 
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title == "Software Engineer" && @.location == "San Francisco").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'Software Engineer')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "title": "Software Engineer", "location": "San Francisco"}]}';
                        

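The above query uses text search, JSON containment, and JSON path syntax. Using a more SQL-like pattern, you can rewrite it as shown below. Note that the rewrite matches with ILIKE, which is a case-insensitive substring match, so it can return more rows than the exact-match query above. For rows that match only by substring, the title column will be NULL because the JSON path expression still requires an exact match.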

SELECT
  id,
  parsed_data->>'first_name' AS first_name,
  parsed_data->>'last_name' AS last_name,
  jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.title == "Software Engineer" && @.location == "San Francisco").title') AS title
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') exp
     WHERE
       (exp->>'title' ILIKE '%Software Engineer%') AND
       (exp->>'location' ILIKE '%San Francisco%') AND
       exp->>'ends_at' IS NULL
    );
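The two San Francisco queries apply different matching rules to each experience. As an illustration only, the Python sketch below mirrors both rules on a single experience entry: an exact match in the style of the containment filter, and a case-insensitive substring match in the style of the ILIKE filter. The function names and the sample entry are hypothetical.

```python
from typing import Any, Dict


def matches_exact(exp: Dict[str, Any], title: str, location: str) -> bool:
    """Containment-style rule: current role with exactly this title and location."""
    return (
        exp.get("ends_at") is None
        and exp.get("title") == title
        and exp.get("location") == location
    )


def matches_substring(exp: Dict[str, Any], title: str, location: str) -> bool:
    """ILIKE '%...%'-style rule: current role, case-insensitive substring match."""
    exp_title = (exp.get("title") or "").lower()
    exp_location = (exp.get("location") or "").lower()
    return (
        exp.get("ends_at") is None
        and title.lower() in exp_title
        and location.lower() in exp_location
    )


# Hypothetical experience entry, for illustration only.
senior_role = {
    "title": "Senior Software Engineer",
    "location": "San Francisco, California",
    "ends_at": None,
}

print(matches_exact(senior_role, "Software Engineer", "San Francisco"))
print(matches_substring(senior_role, "Software Engineer", "San Francisco"))
```

The exact rule rejects this entry while the substring rule accepts it, which is why the choice between the two query styles affects how many profiles you get back.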

Find Me All Apple's Employees

To retrieve all current employees of Apple, you would write the following SQL query:

SELECT
    id,
    parsed_data->>'first_name' AS first_name,
    -- ->> returns text; -> would return the value as quoted JSON
    parsed_data->>'last_name' AS last_name,
    jsonb_path_query_first(parsed_data, '$.experiences[*] ? (@.ends_at == null && @.company_linkedin_profile_url == "https://www.linkedin.com/company/apple/").title') AS title
FROM profile
WHERE to_tsvector('simple', parsed_data) @@ plainto_tsquery('simple', 'https://www.linkedin.com/company/apple/')
  AND parsed_data @> '{"experiences": [{"ends_at": null, "company_linkedin_profile_url": "https://www.linkedin.com/company/apple/"}]}';

Or, using the more SQL-like pattern:

SELECT
    id,
    ... --same as above
FROM profile
WHERE EXISTS
    (SELECT 1
     FROM jsonb_array_elements(parsed_data->'experiences') exp
     WHERE (exp->>'company_linkedin_profile_url' ILIKE '%linkedin.com/company/apple%')
       AND exp->>'ends_at' IS NULL
    );

See more code samples on our LinkDB product page, including one that finds Stanford Computer Science graduates.

In-Depth Features Of Proxycurl LinkDB LinkedIn Database

6 features of Proxycurl LinkDB database

1. Data Freshness

Apart from LinkDB, our other key products are APIs that power our customers' businesses and applications. The two are closely linked in terms of data freshness: each time a real-time API call scrapes a LinkedIn profile, we update the LinkDB dataset with the result. This keeps LinkDB constantly updated, at a rate of up to millions of profiles a day, so you need not worry about purchasing stale data.

After purchase, you can keep your LinkDB database current with optional quarterly updates for a nominal fee. You can also choose not to; either way, the database is yours.

2. Data Schema

The LinkDB LinkedIn database has many data points for each person and company. These are some of the data points available; you can find more in the respective docs:

• People Profiles - data schema

first and last name, profile picture, personal phone numbers, personal emails, work emails, occupation, work industry, GitHub, Facebook & Twitter profile IDs, LinkedIn profile headline & summary, country, city & state, work experience, education, languages, organizations, publications, honors & awards, patents, courses completed, projects, test scores, volunteer work, certifications, connections, activities, articles, groups, skills, inferred salary, gender, birth date, interests, related LinkedIn profiles, LinkedIn recommendations, similarly named LinkedIn profiles, etc.

• Company Profiles - data schema

company name, description, website, tagline, profile picture, industry, categories, size, HQ & other locations (including country, city, state, street and postal code), company email, company phone number, Facebook & Twitter ID, company type, founded year & date, specialities, stock symbol, IPO status, Crunchbase rank, acquisitions & exits, funding rounds, funding amount, investors count, company LinkedIn updates, LinkedIn profile follower count, similar companies, etc.

3. File Format

We ship the data in Parquet file format. Apache Parquet is a column-oriented data file format designed for efficient data storage and retrieval, with efficient compression and encoding schemes for handling complex data in bulk. This means you save on cloud storage space while gaining data throughput and performance. Get 20,000 random profiles of people in the US in Parquet format here to try out and see whether it fits your workflow.

4. Legal Compliance

Compliance in handling personal data is a crucial consideration when sourcing data. Mistakes can have severe consequences and damage your company. Case in point: LinkedIn is not shy about pursuing lawsuits when it believes a company is threatening the data of LinkedIn members (but we know it's more about their business and revenue being threatened). Proxycurl ensures full compliance with major legal standards, such as CCPA and GDPR, and is in the process of becoming SOC 2 certified. Our strict policies secure users' data and let you concentrate on expanding your business while we take care of your data needs.

The good news is that, despite the various lawsuits LinkedIn has brought against companies, there have been strong legal precedents (even from the Supreme Court) backing companies that obtain data from LinkedIn's site.

5. LinkedIn Public (Not Private) Profiles Data

The Proxycurl LinkDB database contains only data from public LinkedIn profiles; we do not scrape private profiles. Scraping data from public LinkedIn profiles can be challenging, but attempting to scrape private profiles is not only a huge undertaking, it also poses significant legal risks. As the name suggests, private profiles are meant to be kept confidential and are visible only to LinkedIn members who have a connection with the profile owner. Scraping private profiles illegally or inappropriately can impact LinkedIn's revenue, and LinkedIn is likely to take action to protect its interests.

6. Pricing

Our prices are among the most competitive out there, or at least that is what our customers tell us. Our LinkDB LinkedIn datasets can be segmented by country for person profiles (for company profiles there is only a global database), with transparent (and did I mention competitive already?) pricing. For country-level dataset pricing, reach out to us by email and we'll provide the pricing for the datasets you're looking for. We also offer multiple financing options that let you pay in monthly or quarterly installments. Check out the pricing page for more details.

Besides the pricing being laid out, you can test the quality of our database before you decide to purchase. You can get sample data here, with people and company profiles from various regions.

Get Real Profiles With Proxycurl LinkDB Database Today

That’s the full walkthrough of our LinkDB LinkedIn Database. On its own it already lets you achieve a lot, and you can pair it with our other API products to return even more structured data or to automate your applications.

We have provided sample data of 10,000 profiles each for various regions if you want to experiment with the dataset in your local development environment. You can load the sample dataset into a relational database and use the sample queries above for guidance.

Your business or applications can benefit immensely from the right dataset. LinkedIn is an excellent source of data on potential customers, companies, investments, and trends. There is much more to share about our LinkDB product, and I’m sure you have many questions too. Feel free to reach out to us via email or chat.
