
Ultimate Guide to Lookalike Prospecting (Code Snippets + GitHub Project)

A customer recently asked for advice on a familiar problem: “I have been experimenting with NinjaPear using my AI agents to find investors and leads for my company.” In this article, I’m going to show you how to build a full agentic lead generation system with PydanticAI, NinjaPear, and a small set of supporting tools, including the exact 4 loops, the code structure, and the GitHub project you can ship.

r/Sales_Professionals u/executivegtm-47 · ▲ 1
The stale data problem you're describing isn't really an Apollo problem, it's a database problem... by the time you're reaching out the information is already months old.
💻 Full code on GitHub: lookalike-prospecting-guide
A runnable starter repo with the 4 loops, Pydantic models, synthetic CSV seeds, sample NinjaPear-shaped payloads, and tests.
git clone https://github.com/NinjaPear-Shares/lookalike-prospecting-guide.git
View on GitHub →

What this guide does

This is a developer guide for lookalike prospecting inside an agentic SDR system.

It covers four loops:

  • Competitor → Customers: turn one known company into additive prospect accounts.
  • CRM Account → Competitors: widen your account universe from closed-won seeds.
  • CRM Contact → Similar People: turn one good contact into many role-adjacent people at relevant companies.
  • Company → Updates: rank the best prospects by visible timing signals.

That is the whole job here. I’m going to show the loops, the code, the sample responses, the models, and the guardrails that keep this from becoming a fancy way to buy bad leads faster.

Lookalike prospecting, without the bullshit

Lookalike prospecting is not “find me companies with similar headcount.” It is generating new accounts or people that resemble proven wins across fit, context, and timing. Most tools stop at fit. That is why most outputs feel generic.

Most lookalike prospecting products are just firmographic cloning with an AI label attached. They fail for ordinary reasons: dirty seeds, weak source signals, no suppression layer, and scoring logic nobody can explain once a rep asks why a company is on the list.

Clean input is boring. That is why it works. Start with closed-won first. Split by use case. Exclude existing customers, churned accounts, open opps, partners, agencies, and test junk before you enrich anything. A 20-account clean seed will beat a 2,000-account blended mess almost every time.
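The exclusion pass above fits in one small filter. This is a sketch, and the field names (relationship, is_test) are hypothetical stand-ins for whatever your CRM export actually calls these columns:

```python
# Sketch of the pre-enrichment cleaning pass. Field names are
# hypothetical; map them to your own CRM export columns.
EXCLUDED_RELATIONSHIPS = {"customer", "churned", "open_opp", "partner", "agency"}

def clean_seed_candidates(records: list[dict]) -> list[dict]:
    """Drop every record the exclusion rules name before enrichment."""
    kept = []
    for rec in records:
        if rec.get("relationship") in EXCLUDED_RELATIONSHIPS:
            continue  # existing customers, churn, open opps, partners, agencies
        if rec.get("is_test"):
            continue  # test junk
        kept.append(rec)
    return kept
```

Run it before anything touches an enrichment API, not after.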

The signal stack is not equal. Firmographics are table stakes. Technographics add context. Relationship data is stronger. Trigger data handles timing. If you already have customer work emails in CRM, Similar People is usually your best 1→N move.

r/coldemail u/cursedboy328 · ▲ 61
ran 464k cold emails last year across clients. Tested every list source out there... Ended up building our own scraping stack for almost everything because the bought data is stale, expensive, and everyone else is emailing the same contacts.

The 4 agent loops

The system has four loops because there are four separate problems.

  1. Competitor to customers gives you fast account expansion.
  2. CRM accounts to competitors gives you clean market widening.
  3. CRM contacts to similar people gives you additive people discovery.
  4. Triggers to outreach gives you timing.

Do not collapse all four into one big pipeline on day one. Keep them separate. It makes debugging easier, measurement easier, and failure less ambiguous.

Loop 1: Competitor to customers

Problem: You know a competitor or adjacent company and want a prospect list fast.

Solution: Use the Customer API customer listing endpoint to find companies already buying from that vendor or sitting in its ecosystem.

This endpoint is a good starting point because it returns three relationship buckets: customers, investors, and partner_platforms. The cost details matter here too: 1 credit per request plus 2 credits per company returned. quality_filter defaults to true, which filters out junk TLDs and unreachable sites. Good default.

from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectAccount

client = NinjaPearClient()
response = client.get_customer_listing("https://stripe.com")
accounts = [
    ProspectAccount.from_customer_listing(item, source="customer_listing")
    for item in response["customers"]
]

Sample response, using the same shape as NinjaPear docs:

{
  "customers": [
    {
      "name": "Apple",
      "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
      "tagline": "Think different.",
      "website": "https://www.apple.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.apple.com",
      "id": "abc123",
      "industry": 45202030,
      "specialties": ["Technology", "Consumer Electronics"],
      "x_profile": "https://x.com/Apple"
    }
  ],
  "investors": [
    {
      "name": "Sequoia Capital",
      "description": "Sequoia Capital is a venture capital firm focused on technology companies.",
      "tagline": null,
      "website": "https://www.sequoiacap.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.sequoiacap.com",
      "id": "def456",
      "industry": 40203010,
      "specialties": ["Venture Capital", "Growth Equity"],
      "x_profile": "https://x.com/sequoia"
    }
  ],
  "partner_platforms": [
    {
      "name": "Amazon Web Services",
      "description": "Amazon Web Services provides cloud computing platforms and APIs.",
      "tagline": null,
      "website": "https://aws.amazon.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Faws.amazon.com",
      "id": "ghi789",
      "industry": 45101010,
      "specialties": ["Cloud Computing", "Infrastructure"],
      "x_profile": "https://x.com/awscloud"
    }
  ],
  "next_page": "https://nubela.co/api/v1/customer/listing?website=https://www.stripe.com&cursor=abc123"
}

The flow is simple:

  • input website
  • customer list
  • normalize
  • suppress existing CRM accounts
  • score
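Those five steps chain into one small function. This is a sketch, not the repo’s real pipeline module: scoring is omitted for brevity, and the dict shape follows the normalized output shown next.

```python
def run_customer_listing_flow(client, website: str, suppressed_websites: set[str]):
    """Fetch, normalize, and suppress in one pass. `client` is assumed to
    expose get_customer_listing as shown above; scoring is left out here."""
    response = client.get_customer_listing(website)
    prospects = []
    for item in response.get("customers", []):
        account = {
            "name": item["name"],
            "website": item["website"],
            "source": "customer_listing",
            "source_evidence": [f"Returned by customer_listing for {website}"],
        }
        # suppress existing CRM accounts before any scoring or enrichment
        if account["website"] in suppressed_websites:
            continue
        prospects.append(account)
    return prospects
```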

Expected normalized output:

{
  "name": "Apple",
  "website": "https://www.apple.com",
  "industry": "45202030",
  "source": "customer_listing",
  "source_evidence": [
    "Returned by customer_listing",
    "Company id: abc123"
  ],
  "fit_score": 0.65,
  "relationship_score": 0.85,
  "timing_score": 0.25
}

A workable outreach angle is short and specific: you already sell into the same ecosystem as Stripe, so this is not a random account pulled from firmographic filters.

Loop 2: CRM accounts to competitors

Problem: You have closed-won accounts in CRM and want to widen the account universe around them.

Solution: Use the Competitor Listing Endpoint on each CRM account website, merge results, dedupe, suppress, score.

This is the cleanest 0→1 account expansion loop in the stack. It starts with proven wins. That matters more than people think.

from src.clients.ninjapear import NinjaPearClient
from src.scoring import score_account

client = NinjaPearClient()

for website in seed_account_websites:
    competitors = client.get_competitor_listing(website)
    for comp in competitors["competitors"]:
        scored = score_account(comp, source="competitor_listing")
        if scored.total_score >= 0.72:
            save_candidate(scored)

Sample response:

{
  "competitors": [
    {
      "name": "Adyen",
      "website": "https://www.adyen.com",
      "description": "Financial technology platform for enterprise businesses.",
      "competition_type": "product_category_overlap",
      "reason": "Both companies offer payment infrastructure and enterprise checkout products.",
      "industry": 40204010
    },
    {
      "name": "PayPal",
      "website": "https://www.paypal.com",
      "description": "Digital payments platform for consumers and merchants.",
      "competition_type": "organic_seo_overlap",
      "reason": "Both companies rank for overlapping payments-related organic search terms.",
      "industry": 40204010
    }
  ],
  "next_page": null
}

Expected scored output with evidence retained:

Account        Evidence                   Fit    Relationship   Timing   Total
Adyen          product_category_overlap   0.70   0.72           0.20     0.5820
Checkout.com   product_category_overlap   0.70   0.72           0.20     0.5820
PayPal         organic_seo_overlap        0.70   0.58           0.20     0.5330

This is the part I like about competitor data when it is explicit. You keep the reason. You do not throw it away. Product overlap usually deserves more weight than shared SEO adjacency because it points to budget competition, not just similar search surfaces.
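If you keep the reason, you can weight it directly. A minimal sketch, covering only the two competition_type values from the sample response above; the numbers are illustrative defaults that match the table, not calibrated weights:

```python
# Relationship weight per competition_type. Unknown types get a
# conservative floor instead of being dropped on the floor.
COMPETITION_TYPE_WEIGHTS = {
    "product_category_overlap": 0.72,  # budget competition: strongest signal
    "organic_seo_overlap": 0.58,       # shared search surface: weaker signal
}

def relationship_score(competitor: dict) -> float:
    return COMPETITION_TYPE_WEIGHTS.get(competitor.get("competition_type"), 0.40)
```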

r/coldemail u/No-Rock-1875 · ▲ 1
A quick spreadsheet filter on the username part plus a lookup of the company’s current employee list ... weeds out a lot of dead leads.

That quote is not about competitors directly, but it points at the same thing: most list quality problems are filtering problems, not enrichment problems.
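Dedupe is part of that same filtering problem. The same competitor will come back from multiple seed accounts, so collapse candidates by website before scoring or saving anything. A minimal sketch:

```python
def dedupe_by_website(candidates: list[dict]) -> list[dict]:
    """Keep the first occurrence of each website across all seed runs."""
    seen: set[str] = set()
    unique = []
    for cand in candidates:
        key = cand["website"].lower().rstrip("/")  # cheap normalization
        if key in seen:
            continue
        seen.add(key)
        unique.append(cand)
    return unique
```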

Loop 3: CRM contacts to similar people

Problem: You already have customer contacts in CRM and want additive 1→N growth.

Solution: Use the Similar People Endpoint from work emails to find similar roles at other relevant companies.

This is the real 1→N motion if your CRM has actual work emails. Not guessed emails. Real ones.

from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectPerson

client = NinjaPearClient()

for work_email in customer_contact_emails:
    similar_people = client.get_similar_people(work_email=work_email)
    for person in similar_people["results"]:
        prospect = ProspectPerson.from_similar_person(person)
        if not is_suppressed_person(prospect):
            save_person(prospect)

Sample response:

{
  "results": [
    {
      "full_name": "Will Cannon",
      "first_name": "Will",
      "last_name": "Cannon",
      "bio": "Founder building B2B lead generation software.",
      "work_email": "[email protected]",
      "role": "Founder & CEO",
      "company_name": "UpLead",
      "company_website": "https://uplead.com",
      "city": "Walnut",
      "country": "US",
      "x_handle": "willcannon",
      "input_role": "Founder & CEO"
    },
    {
      "full_name": "Henry Schuck",
      "work_email": "[email protected]",
      "role": "CEO & Chairman",
      "company_name": "ZoomInfo",
      "company_website": "https://zoominfo.com",
      "city": "Vancouver",
      "country": "US",
      "input_role": "Founder & CEO"
    }
  ]
}

Expected normalized output:

{
  "full_name": "Will Cannon",
  "work_email": "[email protected]",
  "company_website": "https://uplead.com",
  "role": "Founder & CEO",
  "source": "similar_people",
  "source_evidence": [
    "Matched as similar person to Founder & CEO"
  ],
  "account_score": 0.76,
  "person_score": 0.80
}

This is the loop where the system starts to feel good, because you are transferring buyer patterns from known-good contacts instead of guessing titles from a huge database.

The published Similar People benchmarks are useful because they show where the endpoint is strongest:

  • Tim Cook / Apple: 18 attempted, 18 found, 100% yield
  • Elon Musk / Tesla: 11 attempted, 11 found, 100% yield
  • Patrick Collison / Stripe: 19 attempted, 16 found, 84% yield
  • Bryan Irace / Stripe engineering manager: 19 attempted, 12 found, 63% yield
  • Robert Heaton / Stripe MTS: 65 attempted, 36 found, 55% yield

That decay lower in the org chart is normal. Executives are more public. Mid-level people are noisier.

And yes, here is the context-specific example outreach angle from the plan:

“Hey! Your competitor from Company X just joined NinjaPear. It happens that NinjaPear has a feature to extract customers of your company. Would you like to join NinjaPear to also gain an edge against your competitors?”

Use that as an example of context, not default copy. If the agent cannot show the evidence, the rep should not send the email.

Loop 4: Triggers to outreach

Problem: Your lookalikes are plausible, but you still do not know who to contact now.

Solution: Use Company Updates or Monitor signals to prioritize accounts showing real change.

Trigger data is a ranking layer. It is not a prospect source.

from src.clients.ninjapear import NinjaPearClient
from src.outreach import draft_outreach

client = NinjaPearClient()

for account in prospect_accounts:  # scored accounts from the earlier loops
    updates = client.get_company_updates(str(account.website))
    for event in updates["results"]:
        if event["category"] in {"website update", "blog", "x"}:
            draft = draft_outreach(account, event)
            save_draft(draft)

Sample response:

{
  "results": [
    {
      "title": "Pricing page updated, new Enterprise tier added",
      "link": "https://example.com/pricing",
      "category": "website update",
      "pub_date": "Thu, 27 Feb 2026 07:00:00 GMT",
      "summary": "Enterprise packaging was added to the pricing page."
    },
    {
      "title": "Announcing global payments expansion",
      "link": "https://example.com/blog/global-payments",
      "category": "blog",
      "pub_date": "Thu, 27 Feb 2026 10:00:00 GMT",
      "summary": "The company announced broader market coverage for payments."
    }
  ]
}

Expected outreach draft:

{
  "subject": "Saw this at ExampleCo",
  "body": "Saw the update: Pricing page updated, new Enterprise tier added. Usually that means the team is changing packaging, priorities, or buyer motion.",
  "evidence": [
    "Returned in customer listing for Stripe",
    "Trigger: Pricing page updated, new Enterprise tier added"
  ],
  "confidence": 0.78,
  "requires_review": true
}

The Company Monitor docs are refreshingly specific on credit usage:

  • 20 weekly targets: ~346 credits/month
  • 10 daily competitor targets: ~1,203 credits/month
  • 5 daily prospect accounts, blog + X only: ~453 credits/month

That is enough to budget the loop without guessing.

Not all signals are equal, though. A pricing page change is usually more actionable than another vague “high intent” badge.

Repo structure

Keep this concrete.

Push from the real project root, not from a parent wrapper folder. This sounds obvious because it is obvious. People still get it wrong.

README.md
.env.example
pyproject.toml
data/
  closed_won_accounts.csv
  crm_contacts.csv
  suppression_accounts.csv
  suppression_people.csv
examples/
  sample_customer_listing.json
  sample_competitor_listing.json
  sample_similar_people.json
  sample_updates.json
src/
  config.py
  models.py
  scoring.py
  suppressions.py
  outreach.py
  clients/
    ninjapear.py
  agents/
    coordinator.py
    research_agent.py
    scoring_agent.py
    copy_agent.py
  pipelines/
    loop_competitor_to_customers.py
    loop_crm_to_competitors.py
    loop_contacts_to_similar_people.py
    loop_triggers_to_outreach.py
tests/
  test_scoring.py
  test_suppressions.py

There is no reason to bury the project inside a wrapper folder. If someone clones the repo, they should immediately see README.md, pyproject.toml, src/, data/, and tests/ at the top level.

r/EntrepreneurRideAlong u/TeslaLegacy · ▲ 1
the temptation is to try and replicate the full clay workflow with cheaper pieces but honestly that usually ends up messier than just paying for one tool that does the job. the stacking problem is real though, you end up spending more time maintaining integrations than actually doing outbound.

That is exactly why repo structure matters. If the code path is messy, the workflow will be messy too.

Core Pydantic models

Keep this practical. The point of Pydantic here is not ceremony. It is to force evidence to stay attached to the record.

SeedAccount

Fields: name, website, segment, source, is_closed_won, arr_band

SeedContact

Fields: full_name, work_email, company_website, role, seniority

ProspectAccount

Fields: name, website, industry, source, source_evidence, fit_score, relationship_score, timing_score

ProspectPerson

Fields: full_name, work_email, company_website, role, source, source_evidence, account_score, person_score

OutreachDraft

Fields: subject, body, evidence, confidence, requires_review

from pydantic import BaseModel, HttpUrl
from typing import List, Optional

class ProspectAccount(BaseModel):
    name: str
    website: HttpUrl
    industry: Optional[str] = None
    source: str
    source_evidence: List[str] = []
    fit_score: float = 0.0
    relationship_score: float = 0.0
    timing_score: float = 0.0

If a prospect loses its evidence trail between raw JSON and outbound draft, the model failed.
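If you want the model to enforce that rule rather than just document it, a validator can reject evidence-free records at construction time. A sketch, assuming Pydantic v2:

```python
from pydantic import BaseModel, field_validator

class EvidencedAccount(BaseModel):
    name: str
    source: str
    source_evidence: list[str]

    @field_validator("source_evidence")
    @classmethod
    def evidence_required(cls, v: list[str]) -> list[str]:
        # fail loudly instead of letting an evidence-free record flow downstream
        if not v:
            raise ValueError("prospect record has no evidence trail")
        return v
```

Now a record that drops its trail raises a validation error instead of quietly reaching the copy agent.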

NinjaPear client wrapper

Use a thin wrapper. Do not build a fake framework.

Also, if you are building this with a coding agent, point it to https://nubela.co/llms-full.txt. The docs are already structured for LLMs and include endpoint coverage, rate limits, pagination, timeout guidance, and examples.

import os
import httpx

class NinjaPearClient:
    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["NINJAPEAR_API_KEY"]
        self.base_url = "https://nubela.co"
        self.headers = {"Authorization": f"Bearer {self.api_key}"}

    def get_customer_listing(self, website: str):
        r = httpx.get(
            f"{self.base_url}/api/v1/customer/listing",
            params={"website": website},
            headers=self.headers,
            timeout=100.0,
        )
        r.raise_for_status()
        return r.json()

Then add the obvious follow-up methods for competitor listing, similar people, and company updates.

The API behavior that matters:

  • normal rate limit is 300 requests/minute
  • the effective window is 1,500 requests per 5 minutes
  • trial accounts are limited to 2 requests/minute
  • long-running endpoints can take 30 to 60 seconds
  • recommended timeout is 100 seconds
  • 429 needs exponential backoff
  • 404 is charged, failed requests are otherwise not charged

Those details change how you write the client. They are not side notes.
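The 429 behavior in particular deserves real code. Here is a transport-agnostic sketch of exponential backoff to wrap around the httpx calls in the client; the sleep parameter exists so tests do not actually wait:

```python
import random
import time

def with_backoff(request_fn, *, max_retries: int = 5, sleep=time.sleep):
    """Retry request_fn while it returns a 429, backing off exponentially.
    request_fn is any zero-argument callable returning a response object
    with a status_code attribute (e.g. a lambda around httpx.get)."""
    for attempt in range(max_retries):
        response = request_fn()
        if response.status_code != 429:
            return response
        # 1s, 2s, 4s, ... plus jitter so retries do not march in lockstep
        sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```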

Scoring and suppressions

One short intro and then code.

Make scoring simple enough that a human can understand it. Underneath score_account, the weighted total reduces to:

def weighted_total(account) -> float:
    return round(
        (account.fit_score * 0.40)
        + (account.relationship_score * 0.35)
        + (account.timing_score * 0.25),
        4,
    )

That weighting is fine as a default because it gives fit first position, gives relationship data real weight, and stops timing from hijacking the list.

def is_suppressed_account(account, suppression_websites: set[str]) -> bool:
    return str(account.website) in suppression_websites

Suppress before enrichment:

  • existing customers
  • open opps
  • churned accounts you should not re-enter yet
  • partners
  • agencies
  • internal domains
  • obvious bad domains

If you enrich first and suppress later, you are paying to make junk more detailed.
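One caveat on the matching itself: exact string comparison treats https://stripe.com and https://www.stripe.com/ as different accounts. Normalizing both sides to a bare domain before comparing is a cheap fix (sketch):

```python
from urllib.parse import urlparse

def normalize_domain(website: str) -> str:
    """Reduce a website URL to a comparable bare domain."""
    host = urlparse(website).netloc or website  # handle bare "stripe.com" inputs
    host = host.lower().split(":")[0]           # drop any port
    return host.removeprefix("www.")

def is_suppressed_domain(website: str, suppressed_domains: set[str]) -> bool:
    return normalize_domain(website) in suppressed_domains
```

Build suppressed_domains by running normalize_domain over the suppression CSVs first, so both sides of the comparison use the same form.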

I am also skeptical of black-box intent. If the system cannot tell you why an account is hot, the score is decorative.

Outreach generation

The copy agent should only use evidence the system can show.

SYSTEM_PROMPT = """
Write short outbound emails using only the supplied evidence.
Do not invent facts.
If evidence is weak, say so and mark requires_review=true.
"""
def build_evidence_block(account, event=None):
    evidence = list(account.source_evidence)
    if event:
        evidence.append(f"Trigger: {event['title']}")
    return evidence

That is the whole rule. Do not let the model improvise facts because it has a good tone. Tone does not save a false claim.
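A sketch of how the evidence block can feed the OutreachDraft fields from the model section. The confidence heuristic here is illustrative, not the repo’s real logic; the point is that thin evidence forces review:

```python
def build_draft(subject: str, body: str, evidence: list[str]) -> dict:
    """Evidence depth drives confidence; thin evidence forces human review."""
    # illustrative heuristic: base 0.3, +0.2 per evidence item, capped at 0.9
    confidence = min(0.9, (3 + 2 * len(evidence)) / 10)
    return {
        "subject": subject,
        "body": body,
        "evidence": evidence,
        "confidence": confidence,
        "requires_review": len(evidence) < 2 or confidence < 0.7,
    }
```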

How to run the project

Practical checklist:

uv venv
source .venv/bin/activate
uv pip install -e .
cp .env.example .env
export NINJAPEAR_API_KEY=your_key_here
python -m src.pipelines.loop_competitor_to_customers
python -m src.pipelines.loop_crm_to_competitors
python -m src.pipelines.loop_contacts_to_similar_people
python -m src.pipelines.loop_triggers_to_outreach

And yes, inspect the full endpoint docs in https://nubela.co/llms-full.txt when wiring parameters, pagination, retries, and endpoint-specific schemas. The article gives you the architecture. The docs give you the sharp edges.

What to measure

Track this by loop source, not just in aggregate:

  • suppression rate
  • enrichment rate
  • reply rate
  • meeting rate
  • opportunity rate
  • average evidence depth per record

If Similar People produces fewer rows but twice the meetings, that is the better loop.
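A minimal per-source rollup, assuming each outbound record carries its loop source plus hypothetical outcome flags (replied, meeting):

```python
from collections import defaultdict

def rollup_by_source(records: list[dict]) -> dict:
    """Count outcomes per loop source so loops compare head-to-head
    instead of disappearing into one blended funnel."""
    stats = defaultdict(lambda: {"sent": 0, "replies": 0, "meetings": 0})
    for rec in records:
        bucket = stats[rec["source"]]
        bucket["sent"] += 1
        bucket["replies"] += int(bool(rec.get("replied")))
        bucket["meetings"] += int(bool(rec.get("meeting")))
    return dict(stats)
```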

Mistakes to avoid

Keep this short.

  • giant blended seed lists
  • enriching too early
  • no suppression layer
  • blind trust in black-box intent
  • full auto before review
  • outreach that cites evidence the system cannot prove
  • pushing a GitHub repo with the real project buried in a nested folder like some kind of maniac

Most failures here are not model failures. They are operating mistakes.

If you want the right next step, clone the repo, point your coding agent at https://nubela.co/llms-full.txt, and run the four loops against your own closed-won seeds first. That will tell you more than another week spent tuning a giant dirty list.

Alex Meyer
Alex Meyer is a patterns-obsessed growth architect. As Head of GTM at NinjaPear, he leads the charge in building the actual intelligence layer that modern B2B teams use to win.

I tried sharing some news on Facebook today, and I got blocked from posting in other groups. I had figured that I needed a better growth engine instead of over-sharing on Facebook, so I spent the morning planning the new growth engine. Growth Hacking I term what I do in