It's hot right now to build "sales intelligence" tools – but there's an easy way and a hard way... I pick the easy way, what about you?


The Blueprint to Building a Successful Sales Prospecting Application

Colton Randolph


Take a look around the B2B sales data and general prospecting software as a service space...

What kind of companies do you see?

  • Apollo
  • Lusha
  • Clearbit
  • RocketReach
  • ZoomInfo
  • UpLead
  • Coresignal
  • UserGems
  • Reply.io

All of them are successful in their own right, and each does more or less the same thing a little differently: providing B2B data while charging a profitable recurring premium for it.

Is selling B2B data profitable?

Yep. Many companies require rich B2B data to function, particularly internal sales teams that use it for sales intelligence.

Using the popular SaaS database tool Latka, if we take a look at the B2B data market, we can see:

  • Apollo raised just over $140M over five rounds (including seed and pre-seed), and the revenue run rate has hit roughly $23M in revenue as of 2023.
  • Clearbit raised $37M over three rounds (including pre-seed), and the revenue run rate has hit roughly $41M in revenue as of 2023.
  • Lusha raised $245M over two rounds, and the revenue run rate has hit roughly $29M in revenue as of 2023.

Clearly, there are plenty of investors and clients to go around the B2B data space. In my opinion, it's just as great of an industry to enter as something like AI is, especially once you incorporate the power of AI with rich B2B data.

Once you find your product/market fit, there's a lot of growth potential within the B2B data space.

Only some B2B data companies scrape and acquire their own data

Quite a few of the SaaS products in the general B2B data and sales prospecting tool space acquire their datasets from other data providers -- and there's a reason for that: it's hard and time-consuming to scrape massive amounts of data.

This is exactly why we'll be avoiding web scraping in this article.

We'll be showing you a way that you can build a sales prospecting application fed with rich B2B data -- all without having to scrape a single page.

But first, let me give you a background on how web scraping works, so you understand the full picture.

Everything starts somewhere

At Proxycurl, we're extremely experienced with scraping and collecting data on a massive scale.

When it comes to collecting B2B data, one of our biggest sources is LinkedIn: we've crawled an extensive amount of its public information.

Everything first starts with a seed. From the seed, you can branch out, collecting more and more data. The crawler's job is to branch out from the original seed.

Before you know it, you've clicked thousands of new links and have collected thousands of datasets.
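To make the seed-and-branch idea concrete, here's a minimal sketch of a breadth-first crawler in Python. It walks a toy in-memory link graph instead of fetching live pages (the PAGES dict and its URLs are invented for illustration); a real crawler would fetch each URL and extract links from the HTML at the marked line.

```python
from collections import deque

# Toy link graph standing in for the web: each "page" lists the links on it.
PAGES = {
    "seed.example/a": ["seed.example/b", "seed.example/c"],
    "seed.example/b": ["seed.example/c", "seed.example/d"],
    "seed.example/c": [],
    "seed.example/d": ["seed.example/a"],
}

def crawl(seed):
    """Breadth-first crawl: start from a seed URL and branch outwards."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # in a real crawler: fetch the page and scrape it here
        for link in PAGES.get(url, []):
            if link not in seen:  # don't re-queue pages we've already discovered
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("seed.example/a"))
```

The `seen` set is what keeps the crawl from looping forever when pages link back to the seed.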

Google's web crawlers work exactly the same way

They start from a seed site and work outwards, following every link and ultimately browsing the entire, continuously expanding web.

The problem is that scraping and running crawlers is extremely difficult at scale.

In Google's case, everyone wants its crawlers on their site because that brings in more business. For everyone else, businesses like LinkedIn actively try to prevent crawling to protect their source of monetization: data.

LinkedIn has strict limits on the amount of data you can pull from it, even on paid plans like Sales Navigator (2,500 records per search to be exact).

(See: How To Bypass LinkedIn Search Limits in 2023)

It's understandable because LinkedIn is in fact one of the best sources of B2B data. They just don't like to give it out without a very hefty premium, hardly even providing API access.

Should you scrape your own B2B data?

You're not LinkedIn. You don't have a giant B2B social media site to feed you data.

You're also not Google. People don't want you crawling their sites and collecting their content, because there's nothing mutually beneficial in it for them.

So, you'll need to think about the logistics of hardware, accounts, IP addresses, and beyond to scrape data at scale. That's the scary part.

That's why many B2B data and sales prospecting tools simply opt to skip doing their own scraping and instead rely on background data providers.

There are also a few other added benefits of doing this, the first being the time to market.

Not having to worry about building an infrastructure for data collection is major, but what's even bigger than that is you won't have to waste any time actually scraping and building a dataset. You can skip that entire process.

But, you will still need to acquire a rather large dataset to use as your data foundation for your application.

Lucky for you, I know just the right company for the job... us.

We power a lot of the B2B data and sales prospecting tools you see on the market

For example, Reply.io, mentioned above, is actually one of our customers.

Reply.io's website and unique selling point

They add their own twist on things by putting an emphasis on automating and integrating AI with sales outreach:

Reply.io's dashboard

They approached us when they wanted to roll out their "Data" section:

Reply.io's "Data" section

That data section is really just one giant dataset with the ability to filter prospects and build prospecting lists.

Other sales prospecting applications work similarly

For example, Lusha puts its own twist on things, but the central premise remains the same: filtering a large dataset to be used for prospecting.

Lusha's dashboard

You can also enrich your data:

Data enrichment options on Lusha

Next, let's look at Apollo since it's a fairly large prospecting tool:

Apollo's dashboard

Same thing again, it's basically one giant dataset with its own twist on things.

Like Lusha, they also offer enrichment:

How Apollo does enrichment

Across all of these tools, the value lies in the data, not necessarily the product.

Building the actual tool isn't that hard. Acquiring the data that's then distributed through the application is the hard part.

Good thing we've done it for you.

Introducing LinkDB: your new data foundation

Reply.io is powered by LinkDB, which is our dataset of over 472 million public LinkedIn profiles.

(Is that enough profiles for you?)

We've scraped a massive amount of data for you to build your application on, and we provide it for a very, very competitive price.

How do we compare to our competitors?

It varies, but here's a comparison between us and Coresignal, which focuses largely on selling datasets rather than enrichment:

Proxycurl vs Coresignal
You can learn more on our product comparison page.

But still, don't get it confused. While its price tag tends to be lower than our competitors', LinkDB isn't inherently cheap. It's designed for real businesses with real capital, and obtaining it is a worthwhile investment.

You can click here to see all of LinkDB's pricing. We're very transparent, and there's no hidden pricing or gotchas.

We sell people profiles segmented by the following countries:

  • United States (264+M)
  • India (24+M)
  • United Kingdom (20+M)
  • Brazil (17+M)
  • Canada (13+M)
  • France (11+M)
  • Germany (9.3+M)
  • Australia (8.3+M)
  • Mexico (6+M)
  • Italy (5.3+M)
  • Spain (5.2+M)
  • Indonesia (4.4+M)
  • Netherlands (4.2+M)
  • China (3.8+M)
  • South Africa (3.3+M)

Our US People Profile segment is our most frequently purchased dataset, and it's also our largest. We also have our global company dataset available specifically for company data.

LinkDB is delivered as Parquet files. You can view sample data.

If you're interested in making a purchase or learning more, please send us an email at "[email protected]".

Can LinkDB be used as a source for my B2B data application?

Yes, you can use LinkDB data to power your application, but you should not disclose more than 33% of the LinkDB data you have purchased from us to any given customer.

You just can't directly resell LinkDB.

Now that we've covered:

  • Whether the B2B data industry is profitable
  • The hard way to obtain B2B data
  • The easy way to obtain B2B data

Let's get into the technicals...

Feeding your application B2B data

With LinkDB, you can segment by nearly anything you want. For example, here's a query to segment by school:

SELECT profile.id
FROM   profile
       LEFT JOIN profile_education
              ON profile.id = profile_education.profile_id
WHERE  profile_education.school = 'Caltech';

We also provide an API that returns data on people and companies; it's partially powered by LinkDB, but the two work differently.

Our API equivalent is our Person Search Endpoint query:

import json, requests

api_key = 'Your_API_Key_Here'
headers = {'Authorization': 'Bearer ' + api_key}

person_endpoint = 'https://nubela.co/proxycurl/api/search/person'
params = {
    'country': 'US',
    'education_school_name': '(?i)^caltech$',
}
response = requests.get(person_endpoint, params=params, headers=headers)
result = response.json()
print(json.dumps(result, indent=2))

But the LinkDB version is more powerful. Why is it more powerful?

  1. We don't have to limit ourselves to a country
  2. We can select whichever fields we want
  3. With a database in our backend, there's no need to wait for expensive network calls to complete
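To illustrate the first two points, here's a runnable sketch of the same join against an in-memory SQLite database. The miniature tables borrow the profile and profile_education names used above, but the real LinkDB schema is richer; the point is simply that with the dataset in your own backend you can join freely, select any fields you want, and skip the country filter.

```python
import sqlite3

# Minimal in-memory stand-in for two LinkDB tables (invented rows for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, full_name TEXT, country TEXT);
    CREATE TABLE profile_education (profile_id INTEGER, school TEXT);
    INSERT INTO profile VALUES (1, 'Ada Example', 'US'), (2, 'Bob Example', 'DE');
    INSERT INTO profile_education VALUES (1, 'Caltech'), (2, 'MIT');
""")

# Select whichever fields we want, with no country restriction and no network call.
rows = con.execute("""
    SELECT profile.full_name, profile.country
    FROM profile
    LEFT JOIN profile_education ON profile.id = profile_education.profile_id
    WHERE profile_education.school = 'Caltech'
""").fetchall()
print(rows)
```

In production you'd run the same style of query against whatever database you load the Parquet files into, such as Postgres.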

You can check out sample data for LinkDB including 10,000 random US profiles in the Parquet format to see how LinkDB can be embedded into your application.

Building on your data foundation

With LinkDB you now have your data foundation to power your application.

Using our API mentioned above, you can get fancy and do other things like you saw with Lusha and Apollo, where they enriched data (and more).

Enrichment, by the way, is basically just taking the data points you already have, searching our dataset for a match, and returning a richer, fresher profile.

Pulling fresher datasets with our API

Our API returns profiles with varying levels of freshness, depending on how recently we scraped them.

If you use the use_cache=if-recent parameter with our API, you're requesting a profile no older than 29 days. About 88% of these profiles are fetched live at request time; the other 12% are popular profiles whose cached copies are already recent enough. This parameter guarantees fresh data, but requests take longer to complete.

If you use the use_cache=if-present parameter with our API, it returns a cached profile if one exists. Failing that, it pulls the profile from LinkDB, and only if neither is possible is the profile fetched live. These requests return a response almost immediately.

That's usually the better choice for B2B data applications with a user interface that needs quick response times.
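As a sketch of how the cache policy fits into a request, here's one way to parameterize a profile fetch in Python. The endpoint URL and parameter names follow our Person Profile Endpoint, but treat the exact values as an assumption and check the API docs; the live call is commented out since it needs a real API key.

```python
import requests

API_KEY = "Your_API_Key_Here"  # placeholder, not a real key
PROFILE_ENDPOINT = "https://nubela.co/proxycurl/api/v2/linkedin"

def build_profile_request(linkedin_url, freshness="if-present"):
    """Return the (url, params, headers) for a profile fetch with a cache policy.

    freshness: 'if-recent' requests a profile no older than ~29 days (slower);
               'if-present' prefers cached/LinkDB data (near-instant responses).
    """
    params = {"url": linkedin_url, "use_cache": freshness}
    headers = {"Authorization": "Bearer " + API_KEY}
    return PROFILE_ENDPOINT, params, headers

url, params, headers = build_profile_request(
    "https://www.linkedin.com/in/some-profile/", freshness="if-recent")
# response = requests.get(url, params=params, headers=headers)  # live call; needs a real key
print(params["use_cache"])
```

Making the freshness policy an explicit argument like this lets a UI default to if-present while background enrichment jobs opt into if-recent.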

What can you do with our API?

With our API, you can pull fresh, accurate, and rich data in several different ways.

Let me show you an example.

How to use our API to enrich data

Using our Reverse Email Lookup Endpoint, we can take only an email address and enrich it.

Here's a quick Python example:

import json, requests

api_key = 'Your_API_Key_Here'
headers = {'Authorization': 'Bearer ' + api_key}
api_endpoint = 'https://nubela.co/proxycurl/api/linkedin/profile/resolve/email'
params = {
    'lookup_depth': 'deep',
    'email': '[email protected]',
    'enrich_profile': 'enrich',
}
response = requests.get(api_endpoint, params=params, headers=headers)
result = response.json()
print(json.dumps(result, indent=2))

The lookup_depth=deep parameter extends the search past our database and is particularly useful for work emails. The enrich_profile parameter returns information such as:

  • Full name
  • LinkedIn headline
  • LinkedIn summary
  • Country
  • City
  • State
  • Education
  • Experiences
  • Industries
  • Personal phone number
  • All of the other responses listed here

Moreover, there are several different endpoints and several different ways you can use our API to pull fresh, accurate, and rich data to complement LinkDB.

Our API is entirely self-serve, and you can start using it today.

You can create your Proxycurl account for free here.

Or, you can view our pricing here.

Do I need API access or just LinkDB?

LinkDB is a massive database of hundreds of millions of different people and companies scraped from LinkedIn. As I mentioned earlier, it's a huge data foundation.

That being said, LinkDB and Proxycurl API are intended to work in tandem. LinkDB to surface profiles of interest, and Proxycurl API to enrich and refresh profile data.

Closing thoughts

If you can add your own twist on things and have some technical competency (or know someone who does, like a software engineer), there's money to be made carving out your own slice of the B2B data and sales prospecting application market.

Ideally, you don't want to have to hire a data science or web scraping team, either...

That's where we come in -- and that's who we replace, for way less than they would cost.

We've already scraped all of the B2B data you could possibly need, and now we'd like to give it to you.

LinkDB can act as a perfect data foundation for your application just like it does for Reply.io.

And if you need profiles enriched, or your application demands the absolute freshest data, you can use our API to accomplish that and complement LinkDB.

If you have any questions at all, please feel free to reach out to us at "[email protected]".

P.S. If you need it, our software engineers can help you implement LinkDB with your application. Reach out here to learn more details.
