Introducing Proxycurl's Search API

Proxycurl Announces Search!

📢
As of 1 March 2024, we've deprecated this initial version (Search API V1) and launched Search API V2, with some amazing updates:
• No more regexes, now on Boolean search
• No more 35-credit base costs, it's now just 3 credits/result
• Much faster search speed
Check out the announcement post here.

Proxycurl Search is here! We've been working hard on it all year, and we're excited to finally bring it to you.

Proxycurl Search consists of two endpoints: Person Search Endpoint and Company Search Endpoint. Both endpoints let you specify a set of search fields in the form of REST API parameters, and then return a list of people or companies, respectively. The cost of a successful query is 35 credits base + 3 credits per row returned, where successful means "does not error."
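
As a quick illustration of the pricing above, here is a minimal sketch of the arithmetic. The function name search_cost is ours for illustration and is not part of the API.

```python
BASE_CREDITS = 35      # charged once per successful query
CREDITS_PER_ROW = 3    # charged for each row returned


def search_cost(rows_returned: int) -> int:
    """Credits consumed by one successful Search query, per the pricing above."""
    if rows_returned < 0:
        raise ValueError("rows_returned must be non-negative")
    return BASE_CREDITS + CREDITS_PER_ROW * rows_returned


print(search_cost(10))  # 35 + 3 * 10 = 65
```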

We had a few goals for our initial release of Search:

  • Goal: Search should be accurate (you are getting correct results).
  • Goal: Search should be stable (the code you write today won't have to change in the future, and the endpoint isn't crashing).
  • Goal: Search should be available (all of our paying customers have access to it).

We also had a couple non-goals, which come with a couple implications of their own:

  • Non-goal: Search should be complete (we are still adding some features to it, see below).
  • Non-goal: Search should be fast (expect long response times for some queries).

(Exactly how long is "long?" We profiled using the example "Targeting your ideal customer profiles for sales pitches" query from below. The average response to the Company Search Endpoint over 10 attempts took 54 seconds; however, there were two outlier responses that were both over a minute. Discounting those outliers, the average response took 38 seconds. So expect about 40 seconds per round-trip if you are querying at non-peak times.)
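
The figures above also pin down roughly how slow the two outliers were. Here is a back-of-the-envelope check that treats the rounded averages as exact:

```python
attempts = 10
mean_all = 54        # seconds, average over all 10 attempts
outliers = 2
mean_trimmed = 38    # seconds, average over the 8 remaining attempts

total_all = attempts * mean_all                        # 540 seconds
total_trimmed = (attempts - outliers) * mean_trimmed   # 304 seconds
mean_outlier = (total_all - total_trimmed) / outliers

print(mean_outlier)  # 118.0
```

So the two slow responses averaged roughly two minutes each, which is worth keeping in mind if you set a client-side timeout.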

At this point, it's also helpful to define: What do we mean when we say "Search?" You might be picturing something like Google, an impersonal rectangle on a page that you can type words into and then magic happens, and you get a result.

Here's our definition: Search is a Search+Sample product designed to bridge the gap between the Lookup API and LinkDB. You give us parameters, we sample up to 20,000 results and return them to you (without any ordering). If you want a "best match" then you should use one of the Lookup endpoints, and if you want all of the data then you should buy LinkDB. Search is a specialty product for an in-between use case.

Speaking of use cases, let's look at a couple.

Use cases

In the Python demos below, initialize the following at the top of your file, and set the environment variable PROXYCURL_API_KEY appropriately.

import os, requests

company_endpoint = 'https://nubela.co/proxycurl/api/search/company'
person_endpoint = 'https://nubela.co/proxycurl/api/search/person'
api_key = os.environ['PROXYCURL_API_KEY']
headers = {'Authorization': 'Bearer ' + api_key}

Identifying talent with specific skillsets

Say you work in HR at a company experiencing hypergrowth. You need 2-3 specialists in your company's almost-unique tech stack, and you want someone who can be guaranteed to hit the ground running. You've identified one or more major non-competitor companies that use the same stack as you do, and for now you specifically want people who work or have worked at these companies with the title "software engineer." You'll run your additional analyses once you have all the prospective candidates' profiles; we'll give you the data.

Sample Python code:

params = {
    'current_company_linkedin_profile_url': 'https://www.linkedin.com/company/discord/',
    'current_role_title': '[Ss]oftware [Ee]ngineer|[Ss][Ww][Ee]'
}
response1 = requests.get(person_endpoint, params=params, headers=headers)

params = {
    'past_company_linkedin_profile_url': 'https://www.linkedin.com/company/discord/',
    'past_role_title': '[Ss]oftware [Ee]ngineer|[Ss][Ww][Ee]'
}
response2 = requests.get(person_endpoint, params=params, headers=headers)
print(response1.json())
print(response2.json())

To avoid encoding anyone's private information in a permanent blog post, we will list the profiles of celebrities instead in the following result:

{
  "results": [
    { "linkedin_profile_url": "https://www.linkedin.com/in/satyanadella", "profile": "None" },
    { "linkedin_profile_url": "https://www.linkedin.com/in/williamhgates", "profile": "None" }
  ]
}
{
  "results": [
    { "linkedin_profile_url": "https://www.linkedin.com/in/tim-cook-a19738202", "profile": "None" },
    { "linkedin_profile_url": "https://www.linkedin.com/in/barackobama", "profile": "None" }
  ]
}

In this result and all others, we are making a couple minor formatting changes so that the JSON syntax highlighter behaves properly (for example, changing single quotes to double quotes) and also leaving out the next_page parameter; your real response will look a little different.
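
Since real responses carry a next_page value, you may want to follow it to collect more than one page. The sketch below assumes that next_page holds the URL of the next page and is empty or missing on the last page; check the docs for the exact format. The names collect_results and fetch_json are ours, not part of the API. Passing the fetch function in as an argument keeps the loop easy to test without spending credits.

```python
from typing import Callable, Dict, List, Optional


def collect_results(
    first_url: str,
    fetch_json: Callable[[str], Dict],
    max_pages: int = 5,
) -> List[Dict]:
    """Follow next_page links, gathering every row found under 'results'.

    Assumes each response looks like {"results": [...], "next_page": url_or_None}.
    max_pages is a safety cap on how many requests are made.
    """
    rows: List[Dict] = []
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        payload = fetch_json(url)
        rows.extend(payload.get("results", []))
        url = payload.get("next_page")  # None or missing ends the loop
        pages += 1
    return rows
```

With the setup from the top of this section, fetch_json could be something like `lambda url: requests.get(url, headers=headers).json()`.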

Targeting your ideal customer profiles for sales pitches

Say you work at a B2B SaaS startup in a period of growth. You want to launch a new marketing campaign. The problem is identifying customers. Your most precious resource is your CEO's time, and she is swamped. You have metrics on your ICP (ideal customer profile): You want to target small, US-based medical device companies that haven't yet IPO'd and that were founded in the past 5 years.

Sample Python code:

params = {
    'type': 'PRIVATELY_HELD',
    'founded_after_year': '2018',
    'employee_count_max': '1250',
    'industry': 'Medical Equipment Manufacturing',
    'country': 'US',
}

response = requests.get(company_endpoint, params=params, headers=headers)
print(response.json())

Here's what your first two results might look like:

{
  "results": [
    { "linkedin_profile_url": "https://www.linkedin.com/company/corzamedical", "profile": "None" },
    { "linkedin_profile_url": "https://www.linkedin.com/company/omniscientneurotechnology", "profile": "None" },
  ]
}
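
The hard-coded '2018' in the query above encodes "founded in the past 5 years" as of the time of writing. Here is a small sketch of deriving it from the current date instead. The helper name icp_params is hypothetical; the parameter names and values are the ones from the example above.

```python
from datetime import date
from typing import Dict, Optional


def icp_params(max_age_years: int = 5, today: Optional[date] = None) -> Dict[str, str]:
    """Build the Company Search parameters for the ICP described above."""
    today = today or date.today()
    return {
        'type': 'PRIVATELY_HELD',
        'founded_after_year': str(today.year - max_age_years),
        'employee_count_max': '1250',
        'industry': 'Medical Equipment Manufacturing',
        'country': 'US',
    }


print(icp_params(today=date(2023, 4, 1))['founded_after_year'])  # 2018
```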

Identifying early-stage startups: Build a list of prospective founders two different ways

Say you work at a VC/investment firm. You want to identify early-stage privately-held startups that are open to an investment round - one that's likely to pay off.

Let's make a list of prospective founders in two ways.

First, we'll make the list based on their background. In this example, we'll use a single query to look at people who graduated from Caltech, Stanford, or MIT with either a CS or Applied Math degree.

params = {
    'education_degree_name': '[Aa]pplied.*[Mm]ath|[Aa][Cc][Mm]|[Cc]omputer [Ss]cience',
    'education_school_name':
        'Caltech|Massachusetts Institute of Technology|Stanford University',
}
response = requests.get(person_endpoint, params=params, headers=headers)
print(response.json())
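
Writing patterns such as [Cc]omputer [Ss]cience by hand gets tedious. Here is a small sketch of a helper that generates them; either_case_initials is our own name, not something the API provides. It relaxes only the first letter of each word, in the style of the patterns above, and its escaping follows Python's re module, which is an approximation of the Java syntax the service uses.

```python
import re


def either_case_initials(phrase: str) -> str:
    """Turn 'computer science' into '[Cc]omputer [Ss]cience'.

    Only the first letter of each space-separated word is relaxed; the rest of
    each word is escaped so that it is matched literally.
    """
    words = []
    for word in phrase.split(' '):
        if word and word[0].isalpha():
            first = '[' + word[0].upper() + word[0].lower() + ']'
            words.append(first + re.escape(word[1:]))
        else:
            words.append(re.escape(word))
    return ' '.join(words)


degree = '|'.join(either_case_initials(p) for p in ['applied math', 'computer science'])
print(degree)  # [Aa]pplied [Mm]ath|[Cc]omputer [Ss]cience
```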

Now let's do it based on who's working at a stealth startup. LinkedIn lets you enter "Stealth Startup" as your company name, so we can run the following:

params = {
    'current_company': 'Stealth Startup',
}
response = requests.get(person_endpoint, params=params, headers=headers)
print(response.json())

In both cases, your results will look similar to the "Identifying talent with specific skillsets" example above.
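
If you run both queries, you will probably want to combine the two lists. Here is a minimal sketch, assuming each response body has the {"results": [...]} shape shown earlier; merge_results is a hypothetical helper name. It keeps the first occurrence of each linkedin_profile_url and also reports which profiles appeared in both lists.

```python
from typing import Dict, List, Set, Tuple


def merge_results(first: Dict, second: Dict) -> Tuple[List[str], Set[str]]:
    """Return (unique profile URLs in first-seen order, URLs present in both)."""
    urls_first = [row['linkedin_profile_url'] for row in first.get('results', [])]
    urls_second = [row['linkedin_profile_url'] for row in second.get('results', [])]
    in_both = set(urls_first) & set(urls_second)
    unique = list(dict.fromkeys(urls_first + urls_second))  # dedupe, keep order
    return unique, in_both


by_background = {'results': [{'linkedin_profile_url': 'https://www.linkedin.com/in/a'},
                             {'linkedin_profile_url': 'https://www.linkedin.com/in/b'}]}
by_stealth = {'results': [{'linkedin_profile_url': 'https://www.linkedin.com/in/b'},
                          {'linkedin_profile_url': 'https://www.linkedin.com/in/c'}]}
everyone, overlap = merge_results(by_background, by_stealth)
print(len(everyone), overlap)  # 3 {'https://www.linkedin.com/in/b'}
```

Profiles that show up in both lists (the right background and currently at a stealth startup) may be the most interesting ones to look at first.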

The Search API has two endpoints. They are:

  1. The Person Search Endpoint, which returns a list of people.
  2. The Company Search Endpoint, which returns a list of companies.

There is one "meta" parameter for each of the Proxycurl Search endpoints: page_size. It works the same way it does in any of the Listing endpoints, for example the Employee Listing Endpoint. Include this so that you can manage your credit spend, especially during testing.
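
One way to use page_size for budgeting is to work backwards from the credits you are willing to spend on a single query. The sketch below uses the pricing stated earlier (35 credits base plus 3 credits per row); max_page_size is our own helper name.

```python
def max_page_size(credit_budget: int, base: int = 35, per_row: int = 3) -> int:
    """Largest page_size whose worst-case cost fits within credit_budget."""
    if credit_budget < base:
        return 0  # cannot afford even the base cost of one query
    return (credit_budget - base) // per_row


params = {'current_company': 'Stealth Startup'}
params['page_size'] = str(max_page_size(50))  # (50 - 35) // 3 = 5 rows
print(params)
```

A result of 0 means the budget does not cover any rows, so you would skip the query rather than send page_size=0.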

Aside from this, every parameter is for querying. Let's look at a couple from each endpoint:

  • In the Person Search Endpoint, in addition to the parameters you've already seen, you can also query based on education with parameters like education_school_name and education_field_of_study.
  • In the Company Search Endpoint, in addition to the parameters you've already seen, you can also query based on a regex matching company name, the company's description, and its LinkedIn follower_count_max or follower_count_min.
  • Both endpoints have geo params - city, region, and country.
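
To make the list above concrete, here is a sketch of a Company Search query that combines a follower threshold with the geo parameters. The parameter names come from the list above; the values, including the expected format of city and region, are illustrative assumptions, so check the docs before relying on them. urlencode is used here only to show what the resulting query string looks like.

```python
from urllib.parse import urlencode

params = {
    'follower_count_min': '1000',
    'country': 'US',
    'region': 'California',      # assumed value format
    'city': 'San Francisco',     # assumed value format
    'page_size': '5',            # keep test queries cheap
}
query_string = urlencode(params)
print(query_string)
```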

There's still a bit more than this, and you may have missed some along the way since we didn't restate anything mentioned in a use case above, so head on over to the docs to check them out: Person Search Endpoint & Company Search Endpoint.

Everything you need to know about our regex support

Many of these fields have type regex, so here's what you need to know about our regex support.

  • All regular expressions contain an implicit .* on each side of them. For example, notice that to search "Software Engineer," we didn't start the field with .* to include "Senior Software Engineer" in our search. If you want to pin the match to the start or end of the field's value, include ^ and/or $, such as ^Apple$ to search for the technology company specifically (although in this case you might want to use the Company Lookup Endpoint).
  • While the regular expression parameters in our other endpoints are now case-insensitive, search regexes are all case sensitive. This is a result of our specific database implementation. We've made it clear in the docs which fields have which level of sensitivity in case you ever need to look it up.
  • We do support lookaround! In particular, you can use negative lookahead to ensure certain strings are not present in the result.
  • If you want more details, we're using Trino's regex support, which in turn uses Java's Pattern syntax, so you can refer to the Pattern docs for what you can and can't do if you really want to get down in the weeds.
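
To make these rules concrete, the sketch below emulates them with Python's re module. This is only an approximation: the service uses Java's Pattern syntax via Trino, and while the constructs used here (anchors, alternation, negative lookahead) behave the same way in both, you should test real patterns against the API. Here re.search stands in for the implicit .* on each side.

```python
import re

titles = ['Senior Software Engineer', 'Software Engineer Intern', 'software engineer']


def matches(pattern: str, value: str) -> bool:
    """Unanchored, case-sensitive match, mimicking the implicit .* on both sides."""
    return re.search(pattern, value) is not None


# Implicit .*: no leading .* is needed to match "Senior Software Engineer".
# The all-lowercase title is skipped because matching is case sensitive.
print([t for t in titles if matches('Software Engineer', t)])
# ['Senior Software Engineer', 'Software Engineer Intern']

# Anchors pin the pattern to the start and end of the whole value.
print(matches('^Apple$', 'Apple'), matches('^Apple$', 'Apple Leisure Group'))
# True False

# Negative lookahead, anchored at the start, excludes values containing "Intern".
print([t for t in titles if matches('^(?!.*Intern).*Software Engineer', t)])
# ['Senior Software Engineer']
```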

What's next?

We're going to be adding more parameters to both endpoints and improving performance. While future development plans are subject to change, here are some of the parameters that we intend to add:

  • To the Company Search Endpoint, funding_raised_after, funding_raised_before, and funding_amount.
  • To the Person Search Endpoint, industries, interests, and skills.
  • To the Person Search Endpoint, the ability to search based on any company parameter, applied to that person's company. For example, company_country, company_industry, company_founded_after, etc.
  • To both endpoints, enrich_profiles=enrich. If you use our Listing Endpoints, you might be familiar with this parameter; it allows you to enrich any query with profile data automatically, without having to make extra queries on your own once you've gotten a list of LinkedIn profile URLs. We know this is useful to our customers, and it's on our roadmap.

Let's look at a new use case that will be possible with Proxycurl Search in the future. To be clear, this code will not run currently, but it's something that we look forward to supporting in a future iteration of Search.

Search for companies based on past funding history

Let's use the forthcoming Company Search Endpoint parameters funding_raised_after and funding_amount to construct another example that may be of interest to VCs and investment firms. We will make a list of privately-held companies founded since 2019 that have raised money in the past two years. Furthermore, we'll require that they've raised at least $5 million USD to date. You could restrict this further by industry, country, etc, but here's the baseline:

params = {
    'type': 'PRIVATELY_HELD',
    'founded_after_year': '2019',
    'funding_raised_after': '2021-04-01',
    'funding_amount': '5000000'
}
response = requests.get(company_endpoint, params=params, headers=headers)
print(response.json())

Stay in touch!

Excited about Proxycurl Search? Have a product question? Or an opinion about our documentation? Let us know at [email protected]! We read every single one of your emails, and we LOVE getting email from you - please get in touch and then stay in touch!
