
Ultimate Guide to Lookalike Prospecting (Code Snippets + GitHub Project)

A customer wrote in looking for recommendations: “I have been experimenting with NinjaPear using my AI agents to find investors and leads for my company.” In this article, I’ll show you how to build a full agentic lead generation system with PydanticAI, NinjaPear, and a small set of supporting tools, including the exact 4 loops, the code structure, and the GitHub project you can ship.

“still shows people at companies they left months ago, titles are outdated, emails bouncing even after running verification. spending more time cleaning lists than actually doing outreach.”

Can't use Apollo anymore, whats a better alternative for prospecting?
by u/executivegtm-47 in Sales_Professionals
💻 Full code on GitHub: look-alike-prospecting
A runnable starter repo with the 4 loops, Pydantic models, sample NinjaPear-shaped payloads, CSV seeds, suppressions, and tests.
git clone https://github.com/NinjaPear-Shares/look-alike-prospecting.git
View on GitHub →
📥 Free download: Agentic Lookalike Prospecting Starter Kit
A real XLSX workbook with seed templates, suppression sheets, scoring defaults, experiment tracking, and the outreach prompt rules from this guide.
Download now →

What this guide does

This is a developer guide for building lookalike prospecting into an agentic SDR system.

It covers four loops:

  • Competitor → Customers: turn one known company into additive prospect accounts.
  • CRM Account → Competitors: widen your account universe from closed-won seeds.
  • CRM Contact → Similar People: turn one good contact into many role-adjacent people at relevant companies.
  • Company → Updates: rank the best prospects by visible timing signals.

That’s what I’m going to walk through, code first, with sample responses, normalization logic, and the operating rules that keep the whole thing from turning into expensive bullshit.

Lookalike prospecting, without the bullshit

Lookalike prospecting is just generating new accounts or people that resemble proven wins across fit, context, and timing. Not “same employee count” and not “same tech stack, probably.” Most so-called lookalike systems are just firmographic cloning with an AI sticker slapped on top.

They fail for boring reasons. Dirty seeds. Weak signals. No suppression layer. Opaque scoring nobody can defend when a rep asks why some random account got pushed to the top. When I was running FluxoMetric, I burned ~4K/mo on tools that gave me worse targeting logic than a spreadsheet with three weighted columns.

Clean input looks like this: start with closed-won first, split by use case, then exclude customers, churn, open opps, partners, agencies, and test junk before you enrich anything. A 20-account clean seed beats a 2,000-account dirty seed. Every time.

And the signal hierarchy is not equal. Firmographics are table stakes. Technographics add context. Relationship data is stronger. Trigger data handles timing. In practice, relationship data beats generic similarity most of the time, and trigger data should rank prospects, not create them from thin air.

“It still shows people listed at companies they left months ago, titles that are outdated and emails bouncing even after running verification”

Best Apollo alternative for prospecting in 2026?
by u/BessieFlamboyant in coldemail

The 4 agent loops

My preferred lookalike prospecting system has 4 loops, because each one solves a different problem.

  1. Competitor to customers gives you fast account expansion.
  2. CRM accounts to competitors gives you the cleanest 0→1 market widening.
  3. CRM contacts to similar people gives you the real 1→N motion.
  4. Triggers to outreach gives you timing.

Do not jam all four into one giant workflow on day one. That’s how you end up with a fragile automation blob nobody trusts.

Loop 1: Competitor to customers

Problem: You know a competitor or adjacent company and want a prospect list fast.

Solution: Use the Customer Listing API to find companies already buying from that vendor or sitting in its ecosystem. This is the fastest path from one website to a prospect universe that doesn’t feel made up.

from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectAccount

client = NinjaPearClient()
response = client.get_customer_listing("https://stripe.com")
accounts = [
    ProspectAccount.from_customer_listing(item, source="customer_listing")
    for item in response["customers"]
]

Sample response, using the same shape as NinjaPear docs:

{
  "customers": [
    {
      "name": "Apple",
      "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
      "tagline": "Think different.",
      "website": "https://www.apple.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.apple.com",
      "id": "abc123",
      "industry": 45202030,
      "specialties": ["Technology", "Consumer Electronics"],
      "x_profile": "https://x.com/Apple"
    }
  ],
  "investors": [
    {
      "name": "Sequoia Capital",
      "website": "https://www.sequoiacap.com",
      "id": "def456",
      "industry": 40203010
    }
  ],
  "partner_platforms": [
    {
      "name": "Amazon Web Services",
      "website": "https://aws.amazon.com",
      "id": "ghi789",
      "industry": 45101010
    }
  ],
  "next_page": "https://nubela.co/api/v1/customer/listing?website=https://www.stripe.com&cursor=abc123"
}

The flow is simple:

  • input website
  • customer list
  • normalize into ProspectAccount
  • suppress existing CRM accounts
  • score

Expected normalized output:

{
  "name": "Apple",
  "website": "https://www.apple.com",
  "industry": "45202030",
  "source": "customer_listing",
  "source_evidence": [
    "Returned by customer_listing",
    "Company id: abc123"
  ],
  "fit_score": 0.65,
  "relationship_score": 0.85,
  "timing_score": 0.25,
  "total_score": 0.62
}
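The normalization step can be sketched as a plain function. This is a minimal stand-in for `ProspectAccount.from_customer_listing`, whose real implementation lives in the repo and is not shown in this post. The name `normalize_customer` and the plain-dict output are my assumptions so the example runs without dependencies. Scores are left out on purpose because they belong to the scoring step.

```python
from typing import Any, Dict, Optional


def normalize_customer(item: Dict[str, Any], source: str = "customer_listing") -> Dict[str, Any]:
    """Map one raw customer-listing item onto the normalized account shape.

    Hypothetical helper: it mirrors the fields in the sample payload,
    not the repo's actual ProspectAccount.from_customer_listing.
    """
    industry: Optional[str] = None
    if item.get("industry") is not None:
        # The sample payload carries a numeric code; the normalized shape stores a string.
        industry = str(item["industry"])

    # Every account carries the reason it exists in the list.
    evidence = [f"Returned by {source}"]
    if item.get("id"):
        evidence.append(f"Company id: {item['id']}")

    return {
        "name": item["name"],
        "website": item["website"],
        "industry": industry,
        "source": source,
        "source_evidence": evidence,
    }


raw = {"name": "Apple", "website": "https://www.apple.com", "id": "abc123", "industry": 45202030}
account = normalize_customer(raw)
```

The point of doing this in one place is that the evidence list is built at the same moment the record is created, so nothing downstream can produce an account without a reason attached.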

Outreach angle: You already sell in the same ecosystem as Stripe. We’re not guessing fit from generic firmographic filters.

A few doc details matter here. The endpoint costs 1 credit per request + 2 credits per company returned. quality_filter=true is on by default, which filters out junk TLDs and unreachable websites. That one flag alone saves a stupid amount of cleanup.
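The credit arithmetic is worth sketching before you page through a large vendor. A minimal example, assuming the pricing quoted above (1 credit per request plus 2 per company returned) and assuming `next_page` is a full URL you can request as-is, which is how it appears in the sample response. `fetch_page` is a hypothetical callable you supply, for example a thin wrapper around your client. It is not a NinjaPear function.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def estimate_credits(companies_returned: int, requests: int = 1) -> int:
    """1 credit per request + 2 credits per company returned (pricing quoted in this guide)."""
    return requests * 1 + companies_returned * 2


def iter_customers(
    fetch_page: Callable[[str], Dict[str, Any]],
    first_url: str,
    max_pages: int = 5,
) -> Iterator[Dict[str, Any]]:
    """Follow next_page links, stopping at max_pages to cap spend."""
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        payload = fetch_page(url)
        yield from payload.get("customers", [])
        url = payload.get("next_page")  # None on the last page
        pages += 1


# Fake two-page API, for illustration only.
FAKE_PAGES = {
    "page-1": {"customers": [{"name": "A"}, {"name": "B"}], "next_page": "page-2"},
    "page-2": {"customers": [{"name": "C"}], "next_page": None},
}
customers = list(iter_customers(FAKE_PAGES.__getitem__, "page-1"))
credits = estimate_credits(len(customers), requests=2)
```

The `max_pages` cap is the design choice that matters. Without it, one popular vendor can eat a month of credits in a single run.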

Loop 2: CRM accounts to competitors

Problem: You have closed-won accounts in CRM and want to widen the account universe around them.

Solution: Use the Competitor Listing Endpoint on each CRM account website, merge results, dedupe, suppress, score.

This is the cleanest 0→1 account expansion loop. Why? Because it starts from accounts you already know you can win.

from src.clients.ninjapear import NinjaPearClient
from src.scoring import score_account

client = NinjaPearClient()

for website in seed_account_websites:
    competitors = client.get_competitor_listing(website)
    for comp in competitors["competitors"]:
        scored = score_account(comp, source="competitor_listing")
        if scored.total_score >= 0.72:
            save_candidate(scored)

Sample response:

{
  "competitors": [
    {
      "name": "Adyen",
      "website": "https://www.adyen.com",
      "description": "Financial technology platform for enterprise businesses.",
      "competition_type": "product_category_overlap",
      "reason": "Both companies offer payment infrastructure and enterprise checkout products.",
      "industry": 40204010
    },
    {
      "name": "PayPal",
      "website": "https://www.paypal.com",
      "description": "Digital payments platform for consumers and merchants.",
      "competition_type": "organic_seo_overlap",
      "reason": "Both companies rank for overlapping payments-related organic search terms.",
      "industry": 40204010
    }
  ],
  "next_page": null
}

Expected scored output with evidence retained:

Account        Evidence                    Fit    Relationship   Timing   Total
Adyen          product_category_overlap    0.70   0.72           0.20     0.5820
Checkout.com   product_category_overlap    0.70   0.72           0.20     0.5820
PayPal         organic_seo_overlap         0.70   0.58           0.20     0.5330

That split matters. Product category overlap usually beats organic SEO overlap because it points to actual budget competition, not just shared keywords.
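The merge and dedupe step can be sketched like this. The relationship values per `competition_type` are copied from the table above. Treating unknown types as 0.0 and keying the dedupe on a lowercased website without its trailing slash are my assumptions, not something the NinjaPear docs or the repo prescribe.

```python
from typing import Any, Dict, Iterable, List

# Relationship score per competition_type, taken from the table in this section.
RELATIONSHIP_BY_TYPE = {
    "product_category_overlap": 0.72,
    "organic_seo_overlap": 0.58,
}


def merge_competitors(responses: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Dedupe competitors across several responses, keeping the strongest evidence."""
    best: Dict[str, Dict[str, Any]] = {}
    for resp in responses:
        for comp in resp.get("competitors", []):
            key = comp["website"].rstrip("/").lower()
            score = RELATIONSHIP_BY_TYPE.get(comp.get("competition_type"), 0.0)
            if key not in best or score > best[key]["relationship_score"]:
                best[key] = {**comp, "relationship_score": score}
    # Strongest relationship evidence first.
    return sorted(best.values(), key=lambda c: c["relationship_score"], reverse=True)


seed_a = {"competitors": [
    {"name": "PayPal", "website": "https://www.paypal.com", "competition_type": "organic_seo_overlap"},
]}
seed_b = {"competitors": [
    {"name": "PayPal", "website": "https://www.paypal.com/", "competition_type": "product_category_overlap"},
    {"name": "Adyen", "website": "https://www.adyen.com", "competition_type": "product_category_overlap"},
]}
merged = merge_competitors([seed_a, seed_b])
```

When the same company shows up under two seeds, the record with the stronger `competition_type` wins, so a rep sees the best available reason instead of whichever response happened to arrive first.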

“The stale data problem you're describing isn't really an Apollo problem, it's a database problem... by the time you're reaching out the information is already months old.”

Can't use Apollo anymore, whats a better alternative for prospecting?
by u/executivegtm-47 in Sales_Professionals

That comment is exactly why this loop works. It does not pretend an old giant contact database is strategy. It starts from companies you’ve already proven are close to your real market.

Loop 3: CRM contacts to similar people

Problem: You already have customer contacts in CRM and want additive 1→N growth.

Solution: Use the Similar People Endpoint from work emails to find similar roles at other relevant companies.

This is the strongest loop in the whole stack if your CRM has good work emails. Similar People is the real 1→N motion.

from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectPerson

client = NinjaPearClient()

for work_email in customer_contact_emails:
    similar_people = client.get_similar_people(work_email=work_email)
    for person in similar_people["results"]:
        prospect = ProspectPerson.from_similar_person(person)
        if not is_suppressed_person(prospect):
            save_person(prospect)

Sample response:

{
  "results": [
    {
      "full_name": "Will Cannon",
      "first_name": "Will",
      "last_name": "Cannon",
      "bio": "Founder building B2B lead generation software.",
      "work_email": "[email protected]",
      "role": "Founder & CEO",
      "company_name": "UpLead",
      "company_website": "https://uplead.com",
      "city": "Walnut",
      "country": "US",
      "x_handle": "willcannon",
      "input_role": "Founder & CEO"
    },
    {
      "full_name": "Henry Schuck",
      "work_email": "[email protected]",
      "role": "CEO & Chairman",
      "company_name": "ZoomInfo",
      "company_website": "https://zoominfo.com",
      "city": "Vancouver",
      "country": "US",
      "input_role": "Founder & CEO"
    }
  ]
}

Expected normalized output:

{
  "full_name": "Will Cannon",
  "work_email": "[email protected]",
  "company_website": "https://uplead.com",
  "role": "Founder & CEO",
  "source": "similar_people",
  "source_evidence": [
    "Matched as similar person to Founder & CEO"
  ],
  "account_score": 0.76,
  "person_score": 0.80
}

Here’s a context-specific outreach example:

“Hey! Your competitor from Company X just joined NinjaPear. It happens that NinjaPear has a feature to extract customers of your company. Would you like to join NinjaPear to also gain an edge against your competitors?”

I would never ship that as default copy. I’d use it only when the account is already strong and the underlying evidence is real. If the agent cannot show evidence, the rep should not send the email.

NinjaPear’s published Similar People benchmarks are actually useful here. The launch post showed:

  • Tim Cook / Apple: 18 attempted, 18 found, 100% yield
  • Elon Musk / Tesla: 11 attempted, 11 found, 100% yield
  • Patrick Collison / Stripe: 19 attempted, 16 found, 84% yield
  • Bryan Irace / Stripe engineering manager: 19 attempted, 12 found, 63% yield
  • Robert Heaton / Stripe MTS: 65 attempted, 36 found, 55% yield

That drop lower in the org chart is normal. Public executives are easier. Mid-level humans are messier.

Loop 4: Triggers to outreach

Problem: Your lookalikes are plausible, but you still do not know who to contact now.

Solution: Use Company Updates or Monitor signals to prioritize accounts showing real change.

Trigger data should rank prospects, not create them.

from src.clients.ninjapear import NinjaPearClient
from src.outreach import draft_outreach

client = NinjaPearClient()

updates = client.get_company_updates("https://example.com")
for event in updates["results"]:
    if event["category"] in {"website update", "blog", "x"}:
        draft = draft_outreach(account, event)
        save_draft(draft)

Sample response:

{
  "results": [
    {
      "title": "Pricing page updated, new Enterprise tier added",
      "link": "https://example.com/pricing",
      "category": "website update",
      "pub_date": "Fri, 27 Feb 2026 07:00:00 GMT",
      "summary": "Enterprise packaging was added to the pricing page."
    },
    {
      "title": "Announcing global payments expansion",
      "link": "https://example.com/blog/global-payments",
      "category": "blog",
      "pub_date": "Fri, 27 Feb 2026 10:00:00 GMT",
      "summary": "The company announced broader market coverage for payments."
    }
  ]
}

Expected outreach draft:

{
  "subject": "Saw this at ExampleCo",
  "body": "Saw the update: Pricing page updated, new Enterprise tier added. Usually that means the team is changing packaging, priorities, or buyer motion.",
  "evidence": [
    "Returned in customer listing for Stripe",
    "Trigger: Pricing page updated, new Enterprise tier added"
  ],
  "confidence": 0.78,
  "requires_review": true
}

This is the prioritization layer. Not the discovery layer.
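One way to turn updates into a ranking input is a recency-based timing score. This is a sketch under my own assumptions: the half-life decay and the 14-day default are illustrative choices, not something NinjaPear or the repo defines. It also assumes `pub_date` is an RFC 2822 style string with a GMT zone, as in the sample response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Any, Dict, List


def timing_score(events: List[Dict[str, Any]], now: datetime, half_life_days: float = 14.0) -> float:
    """Score 0..1 from the newest event; halves every half_life_days, 0.0 with no events."""
    if not events:
        return 0.0
    newest = max(parsedate_to_datetime(e["pub_date"]) for e in events)
    # Clamp at zero so a slightly future-dated item cannot score above 1.0.
    age_days = max((now - newest).total_seconds() / 86400.0, 0.0)
    return round(0.5 ** (age_days / half_life_days), 4)


events = [
    {"title": "Pricing page updated", "pub_date": "Fri, 27 Feb 2026 07:00:00 GMT"},
    {"title": "Announcing global payments expansion", "pub_date": "Fri, 27 Feb 2026 10:00:00 GMT"},
]
two_weeks_later = datetime(2026, 3, 13, 10, 0, tzinfo=timezone.utc)
score = timing_score(events, now=two_weeks_later)
```

An account with no events scores 0.0 on timing but stays in the list, which is the whole point: timing reorders prospects that other loops already justified.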

The pricing examples in the Company Monitor launch post are refreshingly concrete:

  • 20 weekly targets: ~346 credits/month
  • 10 daily competitor targets: ~1,203 credits/month
  • 5 daily prospect accounts, blog + X only: ~453 credits/month

That’s enough to actually budget the loop, which is more than I can say for most “intent” products.

Repo structure

If you’re publishing a code tutorial, the repo packaging matters more than people think.

Push from the real project root, not from a parent wrapper folder. This sounds stupidly obvious, but people screw it up constantly.

README.md
.env.example
pyproject.toml
data/
  closed_won_accounts.csv
  crm_contacts.csv
  suppression_accounts.csv
  suppression_people.csv
examples/
  sample_customer_listing.json
  sample_competitor_listing.json
  sample_similar_people.json
  sample_updates.json
src/
  config.py
  models.py
  scoring.py
  suppressions.py
  outreach.py
  clients/
    ninjapear.py
  agents/
    coordinator.py
    research_agent.py
    scoring_agent.py
    copy_agent.py
  pipelines/
    loop_competitor_to_customers.py
    loop_crm_to_competitors.py
    loop_contacts_to_similar_people.py
    loop_triggers_to_outreach.py
tests/
  test_scoring.py
  test_suppressions.py

I created the public repo for this article here:

💻 Full code on GitHub: look-alike-prospecting
Project root includes `README.md`, `pyproject.toml`, `src/`, `data/`, and `tests/`, exactly how it should.
git clone https://github.com/NinjaPear-Shares/look-alike-prospecting.git
View on GitHub →

“At your volume with stable workflows you're just paying a premium for a pretty UI at this point... we started moving orchestration to n8n two months ago and haven't looked back.”

Is Clay still worth it after the new pricing changes?
by u/noobCoder00101 in gtmengineering

That’s the workflow overhead problem in one sentence. Clay can be useful. But once your logic stabilizes, paying orchestration tax forever gets old fast.

Core Pydantic models

Pydantic is useful here because it forces your system to carry evidence, not just vibes.

SeedAccount

Fields: name, website, segment, source, is_closed_won, arr_band

SeedContact

Fields: full_name, work_email, company_website, role, seniority

ProspectAccount

Fields: name, website, industry, source, source_evidence, fit_score, relationship_score, timing_score

ProspectPerson

Fields: full_name, work_email, company_website, role, source, source_evidence, account_score, person_score

OutreachDraft

Fields: subject, body, evidence, confidence, requires_review

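from typing import List, Optional

from pydantic import BaseModel, Field, HttpUrl


class ProspectAccount(BaseModel):
    name: str
    website: HttpUrl
    industry: Optional[str] = None  # industry code, stored as a string
    source: str  # which loop or endpoint produced this account
    # Evidence trail shown to the rep; default_factory gives each instance its own list.
    source_evidence: List[str] = Field(default_factory=list)
    fit_score: float = 0.0
    relationship_score: float = 0.0
    timing_score: float = 0.0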

The schema is not the interesting part. The important part is that every prospect drags its evidence trail with it all the way to the rep-facing output.

NinjaPear client wrapper

Use a thin wrapper. Don’t build a fake framework when a small client class will do.

Also, if you’re using an AI coding agent, point it to https://nubela.co/llms-full.txt. It already contains the endpoint references, rate limits, errors, examples, and enough procedural detail to stop your agent from doing dumb things.

import os
import httpx

class NinjaPearClient:
    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["NINJAPEAR_API_KEY"]
        self.base_url = "https://nubela.co"
        self.headers = {"Authorization": f"Bearer {self.api_key}"}

    def get_customer_listing(self, website: str):
        r = httpx.get(
            f"{self.base_url}/api/v1/customer/listing",
            params={"website": website},
            headers=self.headers,
            timeout=100.0,
        )
        r.raise_for_status()
        return r.json()

Then add the obvious follow-ups for competitor listing, similar people, and company updates. Keep it boring.

A few implementation details from the docs are non-negotiable:

  • normal rate limit is 300 requests/minute
  • the rate-limit window is 5 minutes, so burst is 1,500 requests per 5 minutes
  • long-running endpoints can take 30 to 60 seconds
  • recommended timeout is 100 seconds
  • handle 429 with exponential backoff

That’s real API behavior, not theory.
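Handling 429 with exponential backoff can be sketched without tying it to any HTTP library. Everything named here (`RateLimited`, `with_backoff`, the delay defaults) is hypothetical and mine, not part of NinjaPear or httpx. The idea is that your request function raises `RateLimited` when it sees a 429, and the wrapper decides how long to wait.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raise this from your request function when the API answers 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait each time, with jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Small random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")


# Simulated endpoint that is rate limited twice, then succeeds.
attempts = {"n": 0}
waits = []


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return {"ok": True}


result = with_backoff(flaky, sleep=waits.append)
```

With httpx, the request function would check `r.status_code == 429` and raise `RateLimited` before calling `r.raise_for_status()`. Passing `sleep` as a parameter is only there so the retry logic can be tested without actually waiting.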

Scoring and suppressions

This section is where teams overcomplicate things and disappear up their own asses.

Keep the score simple enough that a rep can understand it.

def score_account(account) -> float:
    return round(
        (account.fit_score * 0.40)
        + (account.relationship_score * 0.35)
        + (account.timing_score * 0.25),
        4,
    )
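A worked example makes the weights easier to defend in front of a rep. This sketch applies the same 0.40 / 0.35 / 0.25 weights to plain numbers and returns each component's contribution alongside the total. The `score_breakdown` helper is hypothetical and is not the repo's function. The inputs are the Adyen and PayPal rows from the Loop 2 table.

```python
from typing import Dict

# Same weights as the scoring function in this section.
WEIGHTS = {"fit_score": 0.40, "relationship_score": 0.35, "timing_score": 0.25}


def score_breakdown(scores: Dict[str, float]) -> Dict[str, float]:
    """Return each weighted contribution plus the total, rounded to 4 places."""
    parts = {name: round(scores[name] * weight, 4) for name, weight in WEIGHTS.items()}
    parts["total_score"] = round(
        sum(scores[name] * weight for name, weight in WEIGHTS.items()), 4
    )
    return parts


adyen = score_breakdown({"fit_score": 0.70, "relationship_score": 0.72, "timing_score": 0.20})
paypal = score_breakdown({"fit_score": 0.70, "relationship_score": 0.58, "timing_score": 0.20})
```

For Adyen that is 0.28 from fit, 0.252 from relationship, and 0.05 from timing, for a total of 0.582. A rep can read that in five seconds, which is the bar.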

And keep suppressions brutally simple too.

def is_suppressed_account(account, suppression_websites: set[str]) -> bool:
    return str(account.website) in suppression_websites

Suppress these before enrichment:

  • existing customers
  • churned accounts
  • open opps
  • partners
  • agencies
  • internal test domains
  • bad domains

If you enrich first and suppress later, you just paid to polish garbage.
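Exact string matching on websites is fragile, because `https://www.apple.com`, `https://www.apple.com/`, and `apple.com` are the same account. A minimal sketch of suppression on a normalized domain key, run before any enrichment call. `domain_key` and `split_suppressed` are hypothetical helpers of mine, and stripping only a leading `www.` is a simplifying assumption. It does not resolve subdomains to a registrable domain.

```python
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def domain_key(website: str) -> str:
    """Lowercased host without scheme, path, port, or a leading 'www.'."""
    raw = website.strip().lower()
    if "://" not in raw:
        raw = "//" + raw  # lets urlsplit treat a bare domain as a host
    host = urlsplit(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def split_suppressed(
    candidates: Iterable[Dict[str, Any]],
    suppression_websites: Iterable[str],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (keep, dropped) so the suppression rate can be reported per loop."""
    blocked = {domain_key(w) for w in suppression_websites}
    keep: List[Dict[str, Any]] = []
    dropped: List[Dict[str, Any]] = []
    for candidate in candidates:
        bucket = dropped if domain_key(candidate["website"]) in blocked else keep
        bucket.append(candidate)
    return keep, dropped


candidates = [
    {"name": "Apple", "website": "https://www.apple.com/"},
    {"name": "Adyen", "website": "https://www.adyen.com"},
]
keep, dropped = split_suppressed(candidates, ["apple.com", "Internal-Test.example"])
```

Returning the dropped rows instead of silently discarding them gives you the suppression rate for free, and lets someone audit why an account never reached enrichment.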

I also do not trust black-box intent. Most intent products cannot explain why an account is “hot” in a way a rep can actually use. That’s not intelligence. That’s horoscope software for RevOps.

Outreach generation

The copy agent should only use evidence the system can show.

SYSTEM_PROMPT = """
Write short outbound emails using only the supplied evidence.
Do not invent facts.
If evidence is weak, say so and mark requires_review=true.
"""

def build_evidence_block(account, event=None):
    evidence = list(account.source_evidence)
    if event:
        evidence.append(f"Trigger: {event['title']}")
    return evidence

That rule sounds strict because it should be. I’ve seen too many teams let a model freestyle outreach from thin air, then wonder why reps stop trusting the system.
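The same rule can be enforced in code after the model returns, so a draft with nothing to cite never reaches a rep. This is a sketch with hypothetical names and thresholds of my choosing (`gate_draft`, at least 2 evidence items, confidence of 0.80 or more to skip review). The repo may gate differently.

```python
from typing import Any, Dict, List, Optional


def gate_draft(
    subject: str,
    body: str,
    evidence: List[str],
    confidence: float,
    min_evidence: int = 2,
    min_confidence: float = 0.80,
) -> Optional[Dict[str, Any]]:
    """Drop drafts with no evidence; flag thin or low-confidence drafts for review."""
    cited = [item for item in evidence if item and item.strip()]
    if not cited:
        return None  # nothing the system can show, so nothing to send
    return {
        "subject": subject,
        "body": body,
        "evidence": cited,
        "confidence": confidence,
        "requires_review": len(cited) < min_evidence or confidence < min_confidence,
    }


draft = gate_draft(
    subject="Saw this at ExampleCo",
    body="Saw the update: Pricing page updated, new Enterprise tier added.",
    evidence=[
        "Returned in customer listing for Stripe",
        "Trigger: Pricing page updated, new Enterprise tier added",
    ],
    confidence=0.78,
)
```

With these thresholds, a draft at 0.78 confidence still gets flagged for review even with two pieces of evidence. Start strict and loosen the numbers only after reps tell you the flagged drafts were fine.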

How to run the project

Here’s the practical checklist.

uv venv
source .venv/bin/activate
uv pip install -e .
cp .env.example .env
export NINJAPEAR_API_KEY=your_key_here
python -m src.pipelines.loop_competitor_to_customers
python -m src.pipelines.loop_crm_to_competitors
python -m src.pipelines.loop_contacts_to_similar_people
python -m src.pipelines.loop_triggers_to_outreach

And yes, inspect the full endpoint docs in https://nubela.co/llms-full.txt when wiring parameters, pagination, retries, and endpoint-specific schemas. A blog post should teach you the architecture. It should not pretend to replace the whole API reference.

What to measure

Track performance by loop, not just in aggregate.

  • loop source
  • suppression rate
  • enrichment rate
  • reply rate
  • meeting rate
  • opportunity rate

If your Similar People loop produces fewer rows but 2x the meeting rate of your competitor loop, then congratulations, you found your real motion. Stop worshipping list size.
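Per-loop tracking needs nothing more than a tag on every outreach row. A minimal sketch, assuming each row records the loop that produced the prospect plus boolean outcomes. The field names (`loop`, `replied`, `meeting`) and the sample numbers are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def rates_by_loop(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate sent count, reply rate, and meeting rate for each source loop."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        tally = counts[row["loop"]]
        tally["sent"] += 1
        tally["replied"] += bool(row.get("replied"))
        tally["meeting"] += bool(row.get("meeting"))
    return {
        loop: {
            "sent": tally["sent"],
            "reply_rate": round(tally["replied"] / tally["sent"], 4),
            "meeting_rate": round(tally["meeting"] / tally["sent"], 4),
        }
        for loop, tally in counts.items()
    }


# Illustrative data: 4 rows from Similar People, 8 from the competitor loop.
rows = (
    [{"loop": "similar_people", "replied": True, "meeting": True}] * 2
    + [{"loop": "similar_people", "replied": False, "meeting": False}] * 2
    + [{"loop": "competitor_listing", "replied": True, "meeting": True}] * 2
    + [{"loop": "competitor_listing", "replied": False, "meeting": False}] * 6
)
report = rates_by_loop(rows)
```

In this made-up data the Similar People loop sends half as many emails and books meetings at twice the rate, which is exactly the pattern an aggregate number would hide.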

Mistakes to avoid

The mistakes are obvious, so this list is short.

  • giant blended seed lists
  • enriching too early
  • no suppression layer
  • blind trust in black-box intent
  • full auto before review
  • outreach that cites evidence the system cannot prove
  • pushing a GitHub repo with the real project buried in a nested folder like some kind of maniac

A lot of lookalike prospecting systems fail because the operator wants automation before clarity. That order is backwards.


If you want the right next step, it’s not “go buy more data.” Clone the repo, run the four loops against your own closed-won seeds, and inspect the evidence trail for every prospect the system produces. That’s how you build lookalike prospecting that a real sales team will actually trust.

Alex Meyer
Alex Meyer is a patterns-obsessed growth architect. As Head of GTM at NinjaPear, he leads the charge in building the actual intelligence layer that modern B2B teams use to win.
