Using Proxycurl's Historic LinkedIn Employee Count Tool for Investment Research

What is Proxycurl's Historic LinkedIn Employee Count Tool, and why is it useful?

Proxycurl's Historic LinkedIn Employee Count Tool counts employees during historical snapshots and provides a trend line dating as far back as the user wants, with the caveat that it's only accurate as long as there are no major statistical confounding factors, such as the company being in stealth. This might already sound awesome to you. But let's talk about why you might want to use this tool.

First of all, we have to understand that LinkedIn employee data is a good estimate for total employee data. We're mostly interested in the tech sector in the English-speaking world, where LinkedIn is the ubiquitous social media platform for sharing employment status. And we won't be looking back more than a few years in time, so LinkedIn's status as such won't be in question. We'll also apply common sense to the results. If our trend line goes down to 0 and news stories said that the company in question was in stealth but existed, we'll believe the news stories, not our trend line.

Example: HireEZ

Let's now look at an example. Here's a graph of HireEZ's extrapolated employee counts. What can we learn from it?

A graph of HireEZ's employee counts between March 2020 and February 2023

A few things stand out:

HireEZ had a huge period of growth in headcount from about December 2020 to February 2021, continuing even through July 2021.
HireEZ had another burst of hiring around February 2022.
The recent tech layoffs did not leave them alone, and they've had a downturn in headcount since October 2022.

Without this data, what methods might we use to look for the health of a company? If we were investors in HireEZ, we would have access to investor reports, but we're outsiders. Well then, we can ask, what are investors doing? Let's look at HireEZ's funding rounds. In August 2020, HireEZ, then known as Hiretual, raised a $13 million Series B round, and in February 2022, they raised another $26 million.

These dates don't correlate exactly to our data. It looks like they started hiring a bit after the first funding round that we saw. But it's a good estimate, and it looks like they used these injections of capital to fund hiring.

A quick search doesn't pull up results for "HireEZ layoffs," but we can make inferences. Is this company doing well now? Of course, this question must be put into context with the entire market going through a downturn right now, but these graphs give us some intuition.

Example: Backblaze

In contrast to the privately held HireEZ, Backblaze is public: it IPO'd in November 2021. Public companies have an indicator of their health that's unavailable for private companies: their stock prices. But how does this correlate to employee count? And which is the better signal of company health?

Here's a screenshot of Backblaze's stock since their IPO, courtesy of Google Finance:

Backblaze's stock since their IPO

And here's their employee count, which has risen pretty steadily, with a very small reduction in the past couple months.

Backblaze's employee count between March 2020 and February 2023

What do you think about their current health? What does this graph tell you that the stock price didn't? Have you learned something new?

I'm in! How can we make graphs like this?

Obviously, we have to scrape LinkedIn in some way, as LinkedIn doesn't provide an API. If you're a regular reader of this blog or have already used the Proxycurl API, it's great to have you back. If not, you may want to read this post about scraping LinkedIn for structured data.

Update: Proxycurl has since been sunset. The founder behind Proxycurl is now building NinjaPear. In Steven G's shutdown note, he explains that LinkedIn sued Proxycurl in 2025 and the team chose to shut it down rather than fight a war of attrition with Microsoft. So the Proxycurl parts below are preserved for historical context. If you're building something new today, I would look at NinjaPear's B2B data APIs instead, especially if you want similar company-shaped data with none of the legal liability.

In this particular case, though, there's actually another option that we could theoretically use, which you might be familiar with: LinkedIn's in-house Premium Business Insights. However, if you want to analyze this at scale, it would be a TOS violation to scrape data using this tool, and you'd risk your LinkedIn account being banned. So we are back to the Proxycurl API, which was built for developers looking to scrape LinkedIn.

There is no API endpoint where we can call requests.get(some_endpoint, params={"url": company_url}) and have it return what we want. But that's okay; it's why the rest of this post will be a lot of fun. We built a tool that uses three endpoints to perform a calculation that approximates the result. In the rest of this post, I will introduce you to the Proxycurl Historic Employee Count Tool.

Want to run it yourself? It's available as a docker container. To download, run:

docker pull ghcr.io/nubelaco/historic-employee-count-tool:master

And to execute, run:

docker run -it ghcr.io/nubelaco/historic-employee-count-tool:master PROXYCURL_API_KEY TARGET_COMPANY_LI_URL > employee_count_history.csv

You can also clone the repository yourself if you want to edit the Python code or you're not comfortable with Docker, and there's some additional documentation available in that repo's README that's not covered here.
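Once you have a CSV, you'll probably want to do something with it. Here's a minimal sketch for loading it and computing month-over-month change. It assumes each row looks like YYYY-MM-DD,count, and it skips any row that doesn't parse (a header, for instance). The function names are ours for illustration and are not part of the tool.

```python
import csv
import io
from datetime import date, datetime
from typing import List, Optional, Tuple


def parse_history(csv_text: str) -> List[Tuple[date, int]]:
    """Parse 'YYYY-MM-DD,count' rows, skipping rows that don't fit (e.g. a header)."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 2:
            continue
        try:
            day = datetime.strptime(record[0].strip(), "%Y-%m-%d").date()
            count = int(record[1])
        except ValueError:
            continue  # not a data row
        rows.append((day, count))
    return sorted(rows)  # oldest month first


def month_over_month(
    rows: List[Tuple[date, int]],
) -> List[Tuple[date, int, Optional[float]]]:
    """Attach the percent change versus the previous row (None when undefined)."""
    out = []
    prev = None
    for day, count in rows:
        # No change is reported for the first row or after a zero count.
        change = None if not prev else round(100 * (count - prev) / prev, 1)
        out.append((day, count, change))
        prev = count
    return out
```

To use it, read employee_count_history.csv into a string, pass it to parse_history, and feed the result to month_over_month.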

How to build a historic LinkedIn employee count tool

High-level overview of the tool

1. Grab the current month's total employee count from LinkedIn. This is very easy.
2. For each of the past N months, grab a snapshot of the number of employees with public profiles on LinkedIn, since this is all that's available to us. This is a little trickier, but still doable.
3. Use steps 1 and 2 to calculate X in the following ratio for each month: previous snapshot : X = current month's snapshot count : current month's total as reported by LinkedIn.
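To make the proportion concrete, here's a small sketch that solves it for X. The function name and the numbers in the example are made up for illustration.

```python
def estimate_total(past_snapshot: int, current_snapshot: int, current_total: int) -> int:
    """Solve past_snapshot : X = current_snapshot : current_total for X."""
    if current_snapshot == 0:
        return 0  # no public profiles to scale from
    return int(past_snapshot * current_total / current_snapshot)


# Made-up example: 400 public profiles a year ago, 500 public profiles now,
# and LinkedIn reports 1000 employees in total today.
print(estimate_total(400, 500, 1000))  # prints 800
```

In other words, we assume the share of employees with public profiles stays roughly constant over time and scale every past snapshot by the same factor.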

Proxycurl endpoints we will use

Employee Listing Endpoint: one of the company endpoints. It lists every employee in a company and gives links to their profile URLs. In the Proxycurl API, a LinkedIn URL is always the unique identifier of an entity, be it company, person, job, or anything else.
Employee Count Endpoint: another company endpoint. It does exactly what it says and gives us a count of the employees employed by the company. It can return two figures: cached information from Proxycurl in the form of a linkdb_employee_count, which can cover past, current, or all employees, and a linkedin_employee_count, which was scraped directly from LinkedIn and includes private profiles. For this tool, we're more interested in the latter.
Person Profile Endpoint: this endpoint is optional and serves as a performance enhancement and cache invalidator. We could, if we wanted, use the Employee Listing Endpoint with the enrich_profiles=enrich option instead, which would enrich our first query with the Person Profile Endpoint results. But for performance, we can batch our queries here and use the async Proxycurl Python client library to run the script a bit faster, with the use_cache=if-recent flag, which ensures our data is never more than 29 days out of date.

NinjaPear alternative today

If you're reading this in 2026 or later, the practical alternative is to stop depending on LinkedIn scraping entirely.

NinjaPear's Company API gives you a cleaner way to do the investment-research part of this workflow:

Use the GET /api/v1/company/employee-count endpoint to fetch a fresh employee count from public web sources.
Poll it monthly and store your own history, which gives you the same trend-line habit this article is advocating.
Layer on GET /api/v1/company/funding to compare headcount expansion against financing events.
Layer on GET /api/v1/company/updates to track hiring pushes, new-market moves, product launches, and executive messaging alongside the headcount curve.
Use GET /api/v1/company/details to normalize company metadata like founded year, executives, location, and industry.

It is not the same thing as the old Proxycurl approach. It does not scrape LinkedIn, and that is the point. You get B2B data in a similar shape for company research, but with richer non-LinkedIn context and none of the legal liability.
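The "poll it monthly and store your own history" habit can be sketched without committing to any particular client. In the sketch below, fetch_count is a hypothetical callable you would write around the employee-count endpoint; we don't show the request itself because its authentication and response shape aren't covered in this post. The CSV layout (month, company, count) is also just an assumption.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Callable


def record_monthly_count(
    history_path: Path,
    company: str,
    fetch_count: Callable[[str], int],  # hypothetical wrapper around the API call
    today: date,
) -> bool:
    """Append one (month, company, count) row per calendar month.

    Returns True if a row was written, False if this month was already recorded.
    """
    month_key = today.strftime("%Y-%m")
    if history_path.exists():
        with history_path.open(newline="") as f:
            for row in csv.reader(f):
                if row[:2] == [month_key, company]:
                    return False  # already have this month's data point
    with history_path.open("a", newline="") as f:
        csv.writer(f).writerow([month_key, company, fetch_count(company)])
    return True
```

Because the function refuses to write a second row for the same month, you can run it from a daily cron job and still end up with exactly one data point per month.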

How to implement the high-level overview

1. Query data from these endpoints.
2. Create an array of datetime.date intervals using timedelta and calendar.monthrange. Honestly, this would probably have been the hardest part of the entire project had I not done something nearly identical several years ago.
3. Check when the experiences.start and experiences.end ranges intersect the month intervals we created in step 2 for every employee whose profile we pulled.
4. Progressively accumulate these intersections into an array of total_employee_ranges.
5. Calculate, for each month, X in our ratio from the high-level overview:

```python
@staticmethod
def get_adjusted_employee_counts(past_employee_counts: List[int], current_employee_count: int) -> List[int]:
    current_employee_estimate = past_employee_counts[0]
    adjusted_employee_counts = []
    for item in past_employee_counts:
        if current_employee_estimate == 0:
            adjusted_employee_counts.append(0)
            continue
        adjusted_employee_counts.append(int(item * current_employee_count / current_employee_estimate))
    return adjusted_employee_counts
```

6. Construct a CSV to print to the user.
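The month intervals and the intersection check described above can be sketched roughly like this. This is not the tool's actual code: the function names, the dict layout with "start" and "end" keys, and the representation of each employee as a list of (start, end) stints are assumptions made for illustration, with end set to None for a job that is still ongoing.

```python
import calendar
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Stint = Tuple[date, Optional[date]]  # (start, end); end=None means still employed


def month_ranges(today: date, months_back: int) -> List[Dict[str, date]]:
    """Slot 0 is the current (partial) month; later slots are full months, newest first."""
    ranges = [{"start": today.replace(day=1), "end": today}]
    for _ in range(months_back):
        # Step back one day from the first of the month to land in the previous month.
        prev = ranges[-1]["start"] - timedelta(days=1)
        days_in_month = calendar.monthrange(prev.year, prev.month)[1]
        ranges.append({"start": prev.replace(day=1), "end": prev.replace(day=days_in_month)})
    return ranges


def overlaps(start: date, end: Optional[date], month: Dict[str, date]) -> bool:
    """True if the stint [start, end] intersects the month interval."""
    return start <= month["end"] and (end is None or end >= month["start"])


def count_per_month(employees: List[List[Stint]], ranges: List[Dict[str, date]]) -> List[int]:
    """For each month, count employees with at least one stint overlapping it."""
    return [
        sum(any(overlaps(s, e, m) for s, e in stints) for stints in employees)
        for m in ranges
    ]
```

Counting with any() means an employee who left and rejoined within the same month is still counted once. The resulting list, with the current month in slot 0, is the kind of input the ratio calculation expects.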

Gotchas

There are a few gotchas we have to address:

* The proxycurl-py library raises and logs an exception on profiles that 404, but in this particular case we are expecting some profiles not to exist, so we need to catch and silence this particular exception.
* In some cases, users will link to an internationalized LinkedIn URL, for example https://pl.linkedin.com. We will match against only the last portion of the URL, keeping in mind there could be a trailing slash or not:

```python
@staticmethod
def identifier(url: str) -> str:
    arr = url.split('/')
    return arr[-2] if arr[-1] == '' else arr[-1]
```

* Off-by-one: what to do about the current month? We don't want to print it to the user because it's going to be incomplete. But in our proportion of previous snapshot : X = current month's snapshot count : current month's total as reported by LinkedIn, the entire RHS must refer to the current month. So we do have to include the current month in our data set. The choice we made, therefore, was to include the current month in our months object, but then not print it to the user at the end:

```python
for i, month in enumerate(self.month_ranges):
    if i == 0:
        # Recall we used the first slot for current information, and not for a full month of data.
        continue
    o.append(f"{month['end'].strftime('%Y-%m-%d')},{adjusted_employee_accounts[i]}")
```
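Going back to the URL-matching gotcha for a moment, here's a quick illustration of why comparing only the last path segment works. The profile URLs are made up. We've also added a hypothetical variant, identifier_from_path, that uses urllib.parse to ignore query strings; that variant is our own suggestion and isn't part of the tool.

```python
from urllib.parse import urlsplit


def identifier(url: str) -> str:
    """Last path segment of the URL, tolerating one trailing slash (as in the post)."""
    arr = url.split('/')
    return arr[-2] if arr[-1] == '' else arr[-1]


def identifier_from_path(url: str) -> str:
    """Hypothetical variant: also ignores any ?query or #fragment on the URL."""
    return urlsplit(url).path.rstrip('/').split('/')[-1]


a = "https://pl.linkedin.com/in/example-person/"
b = "https://www.linkedin.com/in/example-person"
print(identifier(a) == identifier(b))  # prints True

c = "https://pl.linkedin.com/in/example-person/?trk=abc"
print(identifier_from_path(c) == identifier(b))  # prints True
```

The simple split-based version is enough as long as the URLs come back without query strings; the variant is only worth it if yours don't.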

Do you have to query the entire company to get meaningful results?

Probably not. We tested on Stripe, which LinkedIn lists as having 8003 employees, and the linkdb_employee_count from the Employee Count Endpoint gives 8766 when employment_status is set to all. If we query only 3000 of these, or a bit under half, we can get a pretty accurate picture of the trend line, and better yet, our script takes only about 6 minutes to run. Here's a graph with various limits:

Querying Stripe with different methods and limits

In the graph above, the solid blue line shows the trend if we didn't do the extra work involved to use the Person Profile Endpoint at all, and simply used the Employee Listing Endpoint with enrich_profiles=enrich. As you can see, the trend is still mostly visible. However, this method is significantly slower than the async Person Profile Endpoint method, and we don't recommend it.

The orange heavy dashed line represents the most accurate data. This line was generated by using all company data. The next two lines show the query limited to 3000 and 1000 employees, respectively, and finally we showed a query limited to 500. We chose 3000 as a default, but you can use this chart to guide your decision based on the size of the company you're querying and how precise a result you want.

You might wonder: is there any bias in the ordering of the sample data? Proxycurl orders by LinkedIn ID, so hopefully employees aren't leaving and joining based on alphabetical order. But just in case they are, this sample is being determined randomly:

    @staticmethod
    def get_limited_sample_of_urls(past_employee_urls: List[str], limit: int) -> List[str]:
        if limit == -1 or limit >= len(past_employee_urls):
            return past_employee_urls
        ret = []
        ordering = list(range(len(past_employee_urls)))
        shuffle(ordering)
        for i in range(limit):
            ret.append(past_employee_urls[ordering[i]])
        return ret
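If you're writing something similar from scratch, the standard library's random.sample does the same job (a uniform sample without replacement) in one call. This is a sketch of an equivalent and not the tool's code; the function name is ours, and it assumes limit is either -1 or non-negative.

```python
import random
from typing import List


def sample_urls(urls: List[str], limit: int) -> List[str]:
    """Return all URLs if limit is -1 or too large, else a random subset of size limit."""
    if limit == -1 or limit >= len(urls):
        return urls
    # random.sample picks `limit` distinct elements uniformly at random.
    return random.sample(urls, limit)
```

Either way, the point is the same: the subset we enrich shouldn't depend on the order in which the listing endpoint returned the profiles.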

More data is coming

If the old Proxycurl workflow was useful to you, the modern equivalent is not to rebuild the same LinkedIn dependency with a different wrapper. It is to build your own longitudinal dataset using NinjaPear's company endpoints and public-web signals.

A simple version looks like this:

Call Company Details once to normalize the company.
Poll the Employee Count endpoint monthly and append the result to your own time series.
Pull Funding and Updates on the same schedule so your employee-count chart has context, not just a line.
Store everything in your own warehouse so your research edge compounds over time.

That gets you a safer workflow, richer context, and none of the nonsense that comes with building on top of LinkedIn scraping infrastructure.

Or if you still want the original project for reference, you can build this historical tool yourself. And if you're building the 2026 version of this workflow, sign up for NinjaPear and start with the company endpoints instead.