A customer came to us asking for recommendations: “I have been experimenting with NinjaPear using my AI agents to find investors and leads for my company.” In this article, I’m going to show you how to build a full agentic lead generation system with PydanticAI, NinjaPear, and a small set of supporting tools, including the 4 loops, the code structure, and the GitHub project you can ship.
“still shows people at companies they left months ago, titles are outdated, emails bouncing even after running verification. spending more time cleaning lists than actually doing outreach.”
Can't use Apollo anymore, whats a better alternative for prospecting?
by u/executivegtm-47 in Sales_Professionals
A runnable starter repo with the 4 loops, Pydantic models, sample NinjaPear-shaped payloads, CSV seeds, suppressions, and tests.
git clone https://github.com/NinjaPear-Shares/look-alike-prospecting.git
A real XLSX workbook with seed templates, suppression sheets, scoring defaults, experiment tracking, and the outreach prompt rules from this guide.
What this guide does
This is a developer guide for building lookalike prospecting into an agentic SDR system.
It covers four loops:
- Competitor → Customers: turn one known company into additive prospect accounts.
- CRM Account → Competitors: widen your account universe from closed-won seeds.
- CRM Contact → Similar People: turn one good contact into many role-adjacent people at relevant companies.
- Company → Updates: rank the best prospects by visible timing signals.
That’s what I’m going to walk through, code first, with sample responses, normalization logic, and the operating rules that keep the whole thing from turning into expensive bullshit.
Lookalike prospecting, without the bullshit
Lookalike prospecting is just generating new accounts or people that resemble proven wins across fit, context, and timing. Not “same employee count” and not “same tech stack, probably.” Most so-called lookalike systems are just firmographic cloning with an AI sticker slapped on top.
They fail for boring reasons. Dirty seeds. Weak signals. No suppression layer. Opaque scoring nobody can defend when a rep asks why some random account got pushed to the top. When I was running FluxoMetric, I burned ~4K/mo on tools that gave me worse targeting logic than a spreadsheet with three weighted columns.
Clean input looks like this: start with closed-won first, split by use case, then exclude customers, churn, open opps, partners, agencies, and test junk before you enrich anything. A 20-account clean seed beats a 2,000-account dirty seed. Every time.
And the signal hierarchy is not equal. Firmographics are table stakes. Technographics add context. Relationship data is stronger. Trigger data handles timing. In practice, relationship data beats generic similarity most of the time, and trigger data should rank prospects, not create them from thin air.
“It still shows people listed at companies they left months ago, titles that are outdated and emails bouncing even after running verification”
Best Apollo alternative for prospecting in 2026?
by u/BessieFlamboyant in coldemail
The 4 agent loops
My preferred lookalike prospecting system has 4 loops, because each one solves a different problem.
- Competitor to customers gives you fast account expansion.
- CRM accounts to competitors gives you the cleanest 0→1 market widening.
- CRM contacts to similar people gives you the real 1→N motion.
- Triggers to outreach gives you timing.
Do not jam all four into one giant workflow on day one. That’s how you end up with a fragile automation blob nobody trusts.
Loop 1: Competitor to customers
Problem: You know a competitor or adjacent company and want a prospect list fast.
Solution: Use the Customer Listing API to find companies already buying from that vendor or sitting in its ecosystem. This is the fastest path from one website to a prospect universe that doesn’t feel made up.
from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectAccount

client = NinjaPearClient()
response = client.get_customer_listing("https://stripe.com")

# Normalize each returned customer into a ProspectAccount, keeping the source label.
accounts = [
    ProspectAccount.from_customer_listing(item, source="customer_listing")
    for item in response["customers"]
]
Sample response, using the same shape as NinjaPear docs:
{
  "customers": [
    {
      "name": "Apple",
      "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
      "tagline": "Think different.",
      "website": "https://www.apple.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.apple.com",
      "id": "abc123",
      "industry": 45202030,
      "specialties": ["Technology", "Consumer Electronics"],
      "x_profile": "https://x.com/Apple"
    }
  ],
  "investors": [
    {
      "name": "Sequoia Capital",
      "website": "https://www.sequoiacap.com",
      "id": "def456",
      "industry": 40203010
    }
  ],
  "partner_platforms": [
    {
      "name": "Amazon Web Services",
      "website": "https://aws.amazon.com",
      "id": "ghi789",
      "industry": 45101010
    }
  ],
  "next_page": "https://nubela.co/api/v1/customer/listing?website=https://www.stripe.com&cursor=abc123"
}
The flow is simple:
- input website
- customer list
- normalize into ProspectAccount
- suppress existing CRM accounts
- score
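The flow above can be sketched end to end with plain dataclasses. This is a minimal sketch, not the repo's code: `Prospect`, `normalize_customer`, and `run_flow` are hypothetical names, the dataclass stands in for the Pydantic model, and the scoring step is left out on purpose.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Prospect:
    # Hypothetical stand-in for the article's ProspectAccount model.
    name: str
    website: str
    industry: Optional[str]
    source: str
    source_evidence: List[str] = field(default_factory=list)


def normalize_customer(item: dict, source: str = "customer_listing") -> Prospect:
    """Map one item from the "customers" array onto a Prospect."""
    industry = item.get("industry")
    return Prospect(
        name=item["name"],
        website=item["website"],
        # The sample response carries industry as an integer code; store it as text.
        industry=str(industry) if industry is not None else None,
        source=source,
        source_evidence=[f"Returned by {source}", f"Company id: {item.get('id')}"],
    )


def run_flow(response: dict, suppressed_websites: Set[str]) -> List[Prospect]:
    """Normalize every customer, then drop anything already in the CRM."""
    prospects = [normalize_customer(item) for item in response.get("customers", [])]
    return [p for p in prospects if p.website not in suppressed_websites]


sample = {
    "customers": [
        {"name": "Apple", "website": "https://www.apple.com", "id": "abc123", "industry": 45202030},
        {"name": "Existing Customer", "website": "https://customer.example", "id": "zzz999"},
    ]
}
kept = run_flow(sample, suppressed_websites={"https://customer.example"})
print([p.name for p in kept])
```

The point of the shape is that the evidence list is built at normalization time, so nothing downstream has to reconstruct where a prospect came from.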
Expected normalized output:
{
  "name": "Apple",
  "website": "https://www.apple.com",
  "industry": "45202030",
  "source": "customer_listing",
  "source_evidence": [
    "Returned by customer_listing",
    "Company id: abc123"
  ],
  "fit_score": 0.65,
  "relationship_score": 0.85,
  "timing_score": 0.25,
  "total_score": 0.62
}
Outreach angle: You already sell in the same ecosystem as Stripe. We’re not guessing fit from generic firmographic filters.
A few doc details matter here. The endpoint costs 1 credit per request + 2 credits per company returned. quality_filter=true is on by default, which filters out junk TLDs and unreachable websites. That one flag alone saves a stupid amount of cleanup.
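That pricing rule is easy to budget before you run anything. A minimal sketch, assuming the cost is exactly 1 credit per request plus 2 per company returned as stated above; `customer_listing_credits` is a hypothetical helper, and which entries count as a "company returned" is something to confirm in the docs.

```python
def customer_listing_credits(requests_made: int, companies_returned: int) -> int:
    """Estimate credits: 1 per request plus 2 per company returned."""
    return requests_made * 1 + companies_returned * 2


# One request that returns 50 companies.
print(customer_listing_credits(1, 50))  # 101

# Three paginated requests returning 150 companies in total.
print(customer_listing_credits(3, 150))  # 303
```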
Loop 2: CRM accounts to competitors
Problem: You have closed-won accounts in CRM and want to widen the account universe around them.
Solution: Use the Competitor Listing Endpoint on each CRM account website, merge results, dedupe, suppress, score.
This is the cleanest 0→1 account expansion loop. Why? Because it starts from accounts you already know you can win.
from src.clients.ninjapear import NinjaPearClient
from src.scoring import score_account

client = NinjaPearClient()

# seed_account_websites: closed-won account websites, e.g. from data/closed_won_accounts.csv.
for website in seed_account_websites:
    competitors = client.get_competitor_listing(website)
    for comp in competitors["competitors"]:
        # score_account is expected to return a scored account exposing total_score.
        scored = score_account(comp, source="competitor_listing")
        if scored.total_score >= 0.72:
            save_candidate(scored)
Sample response:
{
  "competitors": [
    {
      "name": "Adyen",
      "website": "https://www.adyen.com",
      "description": "Financial technology platform for enterprise businesses.",
      "competition_type": "product_category_overlap",
      "reason": "Both companies offer payment infrastructure and enterprise checkout products.",
      "industry": 40204010
    },
    {
      "name": "PayPal",
      "website": "https://www.paypal.com",
      "description": "Digital payments platform for consumers and merchants.",
      "competition_type": "organic_seo_overlap",
      "reason": "Both companies rank for overlapping payments-related organic search terms.",
      "industry": 40204010
    }
  ],
  "next_page": null
}
Expected scored output with evidence retained:
| Account | Evidence | Fit | Relationship | Timing | Total |
|---|---|---|---|---|---|
| Adyen | product_category_overlap | 0.70 | 0.72 | 0.20 | 0.5820 |
| Checkout.com | product_category_overlap | 0.70 | 0.72 | 0.20 | 0.5820 |
| PayPal | organic_seo_overlap | 0.70 | 0.58 | 0.20 | 0.5330 |
That split matters. Product category overlap usually beats organic SEO overlap because it points to actual budget competition, not just shared keywords.
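The merge, dedupe, and evidence-retention step can be sketched like this. It is a sketch under assumptions: `merge_competitors` is a hypothetical helper, and the relationship scores per `competition_type` are just the illustrative values from the table, not anything the API returns.

```python
from typing import Dict, List

# Illustrative relationship scores per competition type (values from the table).
RELATIONSHIP_BY_TYPE = {
    "product_category_overlap": 0.72,
    "organic_seo_overlap": 0.58,
}


def merge_competitors(responses: List[dict]) -> Dict[str, dict]:
    """Dedupe competitors across seed responses, keeping the strongest evidence per website."""
    merged: Dict[str, dict] = {}
    for response in responses:
        for comp in response.get("competitors", []):
            competition_type = comp.get("competition_type", "unknown")
            score = RELATIONSHIP_BY_TYPE.get(competition_type, 0.0)
            key = comp["website"].rstrip("/").lower()
            current = merged.get(key)
            # Keep the entry with the stronger relationship signal.
            if current is None or score > current["relationship_score"]:
                merged[key] = {
                    "name": comp["name"],
                    "relationship_score": score,
                    "source_evidence": [competition_type, comp.get("reason", "")],
                }
    return merged


seed_a = {"competitors": [
    {"name": "PayPal", "website": "https://www.paypal.com", "competition_type": "organic_seo_overlap"},
    {"name": "Adyen", "website": "https://www.adyen.com", "competition_type": "product_category_overlap"},
]}
seed_b = {"competitors": [
    {"name": "PayPal", "website": "https://www.paypal.com/", "competition_type": "product_category_overlap"},
]}
merged = merge_competitors([seed_a, seed_b])
print(sorted(merged))
```

When two seeds surface the same competitor, the stronger competition type wins and its reason stays attached, so the rep sees why the account ranked where it did.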
“The stale data problem you're describing isn't really an Apollo problem, it's a database problem... by the time you're reaching out the information is already months old.”
Can't use Apollo anymore, whats a better alternative for prospecting?
by u/executivegtm-47 in Sales_Professionals
That comment is exactly why this loop works. It does not pretend an old giant contact database is strategy. It starts from companies you’ve already proven are close to your real market.
Loop 3: CRM contacts to similar people
Problem: You already have customer contacts in CRM and want additive 1→N growth.
Solution: Use the Similar People Endpoint from work emails to find similar roles at other relevant companies.
This is the strongest loop in the whole stack if your CRM has good work emails. Similar People is the real 1→N motion.
from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectPerson

client = NinjaPearClient()

for work_email in customer_contact_emails:
    similar_people = client.get_similar_people(work_email=work_email)
    for person in similar_people["results"]:
        prospect = ProspectPerson.from_similar_person(person)
        # Skip anyone already on the people suppression list.
        if not is_suppressed_person(prospect):
            save_person(prospect)
Sample response:
{
  "results": [
    {
      "full_name": "Will Cannon",
      "first_name": "Will",
      "last_name": "Cannon",
      "bio": "Founder building B2B lead generation software.",
      "work_email": "[email protected]",
      "role": "Founder & CEO",
      "company_name": "UpLead",
      "company_website": "https://uplead.com",
      "city": "Walnut",
      "country": "US",
      "x_handle": "willcannon",
      "input_role": "Founder & CEO"
    },
    {
      "full_name": "Henry Schuck",
      "work_email": "[email protected]",
      "role": "CEO & Chairman",
      "company_name": "ZoomInfo",
      "company_website": "https://zoominfo.com",
      "city": "Vancouver",
      "country": "US",
      "input_role": "Founder & CEO"
    }
  ]
}
Expected normalized output:
{
  "full_name": "Will Cannon",
  "work_email": "[email protected]",
  "company_website": "https://uplead.com",
  "role": "Founder & CEO",
  "source": "similar_people",
  "source_evidence": [
    "Matched as similar person to Founder & CEO"
  ],
  "account_score": 0.76,
  "person_score": 0.80
}
And yes, here’s a context-specific outreach example:
“Hey! Your competitor from Company X just joined NinjaPear. It happens that NinjaPear has a feature to extract customers of your company. Would you like to join NinjaPear to also gain an edge against your competitors?”
I would never ship that as default copy. I’d use it only when the account is already strong and the underlying evidence is real. If the agent cannot show evidence, the rep should not send the email.
NinjaPear’s published Similar People benchmarks are actually useful here. The launch post showed:
- Tim Cook / Apple: 18 attempted, 18 found, 100% yield
- Elon Musk / Tesla: 11 attempted, 11 found, 100% yield
- Patrick Collison / Stripe: 19 attempted, 16 found, 84% yield
- Bryan Irace / Stripe engineering manager: 19 attempted, 12 found, 63% yield
- Robert Heaton / Stripe MTS: 65 attempted, 36 found, 55% yield
That drop lower in the org chart is normal. Public executives are easier. Mid-level humans are messier.
Loop 4: Triggers to outreach
Problem: Your lookalikes are plausible, but you still do not know who to contact now.
Solution: Use Company Updates or Monitor signals to prioritize accounts showing real change.
Trigger data should rank prospects, not create them.
from src.clients.ninjapear import NinjaPearClient
from src.outreach import draft_outreach

client = NinjaPearClient()
updates = client.get_company_updates("https://example.com")

# account: the already-scored prospect account these updates belong to.
for event in updates["results"]:
    if event["category"] in {"website update", "blog", "x"}:
        draft = draft_outreach(account, event)
        save_draft(draft)
Sample response:
{
  "results": [
    {
      "title": "Pricing page updated, new Enterprise tier added",
      "link": "https://example.com/pricing",
      "category": "website update",
      "pub_date": "Fri, 27 Feb 2026 07:00:00 GMT",
      "summary": "Enterprise packaging was added to the pricing page."
    },
    {
      "title": "Announcing global payments expansion",
      "link": "https://example.com/blog/global-payments",
      "category": "blog",
      "pub_date": "Fri, 27 Feb 2026 10:00:00 GMT",
      "summary": "The company announced broader market coverage for payments."
    }
  ]
}
Expected outreach draft:
{
  "subject": "Saw this at ExampleCo",
  "body": "Saw the update: Pricing page updated, new Enterprise tier added. Usually that means the team is changing packaging, priorities, or buyer motion.",
  "evidence": [
    "Returned in customer listing for Stripe",
    "Trigger: Pricing page updated, new Enterprise tier added"
  ],
  "confidence": 0.78,
  "requires_review": true
}
This is the prioritization layer. Not the discovery layer.
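One way to turn update events into a ranking signal is a recency-based timing score. This is a sketch, not the repo's logic: `timing_score`, the 30-day window, and the linear decay are all assumptions chosen for illustration. It relies only on the `pub_date` field shown in the sample response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List


def timing_score(events: List[dict], now: datetime, window_days: int = 30) -> float:
    """Score 1.0 for an event published right now, decaying linearly to 0.0 at window_days."""
    best = 0.0
    for event in events:
        published = parsedate_to_datetime(event["pub_date"])
        age_days = (now - published).total_seconds() / 86400
        if 0 <= age_days <= window_days:
            best = max(best, 1.0 - age_days / window_days)
    return round(best, 4)


events_by_account = {
    "https://stale.example": [{"pub_date": "Mon, 05 Jan 2026 07:00:00 GMT"}],
    "https://fresh.example": [{"pub_date": "Fri, 27 Feb 2026 07:00:00 GMT"}],
}
now = datetime(2026, 3, 14, 7, 0, tzinfo=timezone.utc)

# Rank existing prospects by timing; nothing here creates a new prospect.
ranked = sorted(
    events_by_account,
    key=lambda site: timing_score(events_by_account[site], now),
    reverse=True,
)
print(ranked)
```

Accounts with no recent events score 0.0 and sink to the bottom, which is the behavior you want from a prioritization layer.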
The pricing examples in the Company Monitor launch post are refreshingly concrete:
- 20 weekly targets: ~346 credits/month
- 10 daily competitor targets: ~1,203 credits/month
- 5 daily prospect accounts, blog + X only: ~453 credits/month
That’s enough to actually budget the loop, which is more than I can say for most “intent” products.
Repo structure
If you’re publishing a code tutorial, the repo packaging matters more than people think.
Push from the real project root, not from a parent wrapper folder. This sounds stupidly obvious, but people screw it up constantly.
README.md
.env.example
pyproject.toml
data/
    closed_won_accounts.csv
    crm_contacts.csv
    suppression_accounts.csv
    suppression_people.csv
examples/
    sample_customer_listing.json
    sample_competitor_listing.json
    sample_similar_people.json
    sample_updates.json
src/
    config.py
    models.py
    scoring.py
    suppressions.py
    outreach.py
    clients/
        ninjapear.py
    agents/
        coordinator.py
        research_agent.py
        scoring_agent.py
        copy_agent.py
    pipelines/
        loop_competitor_to_customers.py
        loop_crm_to_competitors.py
        loop_contacts_to_similar_people.py
        loop_triggers_to_outreach.py
tests/
    test_scoring.py
    test_suppressions.py
I created the public repo for this article here:
Project root includes `README.md`, `pyproject.toml`, `src/`, `data/`, and `tests/`, exactly how it should.
git clone https://github.com/NinjaPear-Shares/look-alike-prospecting.git
“At your volume with stable workflows you're just paying a premium for a pretty UI at this point... we started moving orchestration to n8n two months ago and haven't looked back.”
Is Clay still worth it after the new pricing changes?
by u/noobCoder00101 in gtmengineering
That’s the workflow overhead problem in one sentence. Clay can be useful. But once your logic stabilizes, paying orchestration tax forever gets old fast.
Core Pydantic models
Keep the models practical. Pydantic is useful here because it forces your system to carry evidence, not just vibes.
SeedAccount
Fields: name, website, segment, source, is_closed_won, arr_band
SeedContact
Fields: full_name, work_email, company_website, role, seniority
ProspectAccount
Fields: name, website, industry, source, source_evidence, fit_score, relationship_score, timing_score
ProspectPerson
Fields: full_name, work_email, company_website, role, source, source_evidence, account_score, person_score
OutreachDraft
Fields: subject, body, evidence, confidence, requires_review
from typing import List, Optional

from pydantic import BaseModel, Field, HttpUrl


class ProspectAccount(BaseModel):
    name: str
    website: HttpUrl
    industry: Optional[str] = None
    source: str
    # Evidence trail that travels with the prospect to the rep-facing output.
    source_evidence: List[str] = Field(default_factory=list)
    fit_score: float = 0.0
    relationship_score: float = 0.0
    timing_score: float = 0.0
The schema is not the interesting part. The important part is that every prospect drags its evidence trail with it all the way to the rep-facing output.
NinjaPear client wrapper
Use a thin wrapper. Don’t build a fake framework when a small client class will do.
Also, if you’re using an AI coding agent, point it to https://nubela.co/llms-full.txt. It already contains the endpoint references, rate limits, errors, examples, and enough procedural detail to stop your agent from doing dumb things.
import os

import httpx


class NinjaPearClient:
    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["NINJAPEAR_API_KEY"]
        self.base_url = "https://nubela.co"
        self.headers = {"Authorization": f"Bearer {self.api_key}"}

    def get_customer_listing(self, website: str):
        r = httpx.get(
            f"{self.base_url}/api/v1/customer/listing",
            params={"website": website},
            headers=self.headers,
            timeout=100.0,  # long-running endpoints can take 30 to 60 seconds
        )
        r.raise_for_status()
        return r.json()
Then add the obvious follow-ups for competitor listing, similar people, and company updates. Keep it boring.
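The sample customer listing response carries a `next_page` field, so one boring follow-up is pagination. A minimal sketch, assuming `next_page` is a full URL you can request as-is and that it comes back null on the last page; `iter_customers` and the injected `fetch_page` callable are hypothetical, and the page cap is there so a bug can't quietly burn credits.

```python
from typing import Callable, Iterator, Optional


def iter_customers(
    fetch_page: Callable[[str], dict],
    first_url: str,
    max_pages: int = 10,
) -> Iterator[dict]:
    """Yield customers page by page, following next_page until it is empty."""
    url: Optional[str] = first_url
    for _ in range(max_pages):
        if not url:
            break
        page = fetch_page(url)
        yield from page.get("customers", [])
        url = page.get("next_page")


# Fake pages standing in for real API responses.
fake_pages = {
    "page-1": {"customers": [{"name": "A"}], "next_page": "page-2"},
    "page-2": {"customers": [{"name": "B"}], "next_page": None},
}
names = [c["name"] for c in iter_customers(fake_pages.__getitem__, "page-1")]
print(names)
```

In the real client, `fetch_page` would be a small function that does the authenticated GET and returns `r.json()`.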
A few implementation details from the docs are non-negotiable:
- normal rate limit is 300 requests/minute
- the rate-limit window is 5 minutes, so burst is 1,500 requests per 5 minutes
- long-running endpoints can take 30 to 60 seconds
- recommended timeout is 100 seconds
- handle 429 with exponential backoff
That’s real API behavior, not theory.
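The backoff rule is worth writing once and reusing. This is a sketch under assumptions: `RateLimited` is a hypothetical exception your client would raise on a 429, and the base delay, attempt count, and jitter are illustrative, not values from the docs.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception raised by the client when the API answers 429."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimited, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus a little jitter so parallel workers do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed. The sleep is recorded instead of actually waiting.
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise RateLimited()
    return "ok"


delays: List[float] = []
result = with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

With `httpx`, the wrapper around the request would catch `httpx.HTTPStatusError`, check `exc.response.status_code == 429`, and raise `RateLimited` from there.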
Scoring and suppressions
This section is where teams overcomplicate things and disappear up their own asses.
Keep the score simple enough that a rep can understand it.
def score_account(account) -> float:
    # Weighted sum: fit 40%, relationship 35%, timing 25%.
    return round(
        (account.fit_score * 0.40)
        + (account.relationship_score * 0.35)
        + (account.timing_score * 0.25),
        4,
    )
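If a rep asks why an account ranked where it did, the same weights can be shown as a breakdown. A small sketch, assuming the 0.40 / 0.35 / 0.25 weights from the function above; `explain_score` is a hypothetical helper, not part of the repo.

```python
from typing import Dict

WEIGHTS = {"fit": 0.40, "relationship": 0.35, "timing": 0.25}


def explain_score(fit: float, relationship: float, timing: float) -> Dict[str, float]:
    """Return each signal's weighted contribution plus the total."""
    parts = {
        "fit": round(fit * WEIGHTS["fit"], 4),
        "relationship": round(relationship * WEIGHTS["relationship"], 4),
        "timing": round(timing * WEIGHTS["timing"], 4),
    }
    parts["total"] = round(sum(parts.values()), 4)
    return parts


# Adyen row from the Loop 2 table: fit 0.70, relationship 0.72, timing 0.20.
print(explain_score(0.70, 0.72, 0.20))
```

For the Adyen row this gives contributions of 0.28, 0.252, and 0.05, which add up to the 0.5820 total in the Loop 2 table.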
And keep suppressions brutally simple too.
def is_suppressed_account(account, suppression_websites: set[str]) -> bool:
    return str(account.website) in suppression_websites
Suppress these before enrichment:
- existing customers
- churned accounts
- open opps
- partners
- agencies
- internal test domains
- bad domains
If you enrich first and suppress later, you just paid to polish garbage.
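One caveat on the exact string match in `is_suppressed_account`: the same company can show up as `https://www.apple.com`, `https://apple.com/`, or `apple.com` depending on where the row came from, and an exact match misses all but one. A minimal sketch of comparing by normalized domain instead; `normalize_domain` and `is_suppressed_domain` are hypothetical helpers.

```python
from typing import Iterable
from urllib.parse import urlparse


def normalize_domain(website: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without a leading www."""
    candidate = website.strip().lower()
    if "://" not in candidate:
        candidate = "//" + candidate  # lets urlparse treat a bare domain as the host
    host = urlparse(candidate).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_suppressed_domain(website: str, suppression_websites: Iterable[str]) -> bool:
    """Compare on normalized domains so scheme, www, and trailing slashes do not matter."""
    suppressed = {normalize_domain(s) for s in suppression_websites}
    return normalize_domain(website) in suppressed


suppression_list = {"apple.com", "https://www.stripe.com/"}
print(is_suppressed_domain("https://www.apple.com/", suppression_list))
print(is_suppressed_domain("https://adyen.com", suppression_list))
```

This is still simple enough to explain to a rep. It compares hosts only, so subdomains like `shop.example.com` stay distinct from `example.com`.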
I also do not trust black-box intent. Most intent products cannot explain why an account is “hot” in a way a rep can actually use. That’s not intelligence. That’s horoscope software for RevOps.
Outreach generation
The copy agent should only use evidence the system can show.
SYSTEM_PROMPT = """
Write short outbound emails using only the supplied evidence.
Do not invent facts.
If evidence is weak, say so and mark requires_review=true.
"""


def build_evidence_block(account, event=None):
    # Start from the account's own evidence trail, then append the trigger if present.
    evidence = list(account.source_evidence)
    if event:
        evidence.append(f"Trigger: {event['title']}")
    return evidence
That rule sounds strict because it should be. I’ve seen too many teams let a model freestyle outreach from thin air, then wonder why reps stop trusting the system.
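The prompt alone won't enforce that rule, so it helps to check the draft after generation too. A sketch under assumptions: `gate_draft` is a hypothetical helper, the draft is a plain dict shaped like the expected outreach draft above, and the 0.7 confidence floor is an illustrative default.

```python
from typing import List


def gate_draft(draft: dict, allowed_evidence: List[str], min_confidence: float = 0.7) -> dict:
    """Force human review when a draft cites unknown evidence, no evidence, or low confidence."""
    cited = draft.get("evidence", [])
    unverified = [item for item in cited if item not in allowed_evidence]
    needs_review = (
        bool(unverified)
        or not cited
        or draft.get("confidence", 0.0) < min_confidence
    )
    return {
        **draft,
        # Never downgrade a draft that was already flagged for review.
        "requires_review": bool(draft.get("requires_review")) or needs_review,
        "unverified_evidence": unverified,
    }


allowed = [
    "Returned in customer listing for Stripe",
    "Trigger: Pricing page updated, new Enterprise tier added",
]
invented = {
    "subject": "Congrats on the round",
    "evidence": ["Raised a Series B"],
    "confidence": 0.9,
    "requires_review": False,
}
checked = gate_draft(invented, allowed)
print(checked["requires_review"], checked["unverified_evidence"])
```

Anything in `unverified_evidence` is a claim the model made that the system cannot show, which is exactly what a rep should see before sending.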
How to run the project
Here’s the practical checklist.
uv venv
source .venv/bin/activate
uv pip install -e .
cp .env.example .env
export NINJAPEAR_API_KEY=your_key_here
python -m src.pipelines.loop_competitor_to_customers
python -m src.pipelines.loop_crm_to_competitors
python -m src.pipelines.loop_contacts_to_similar_people
python -m src.pipelines.loop_triggers_to_outreach
And yes, inspect the full endpoint docs in https://nubela.co/llms-full.txt when wiring parameters, pagination, retries, and endpoint-specific schemas. A blog post should teach you the architecture. It should not pretend to replace the whole API reference.
What to measure
Track performance by loop, not just in aggregate.
- loop source
- suppression rate
- enrichment rate
- reply rate
- meeting rate
- opportunity rate
If your Similar People loop produces fewer rows but 2x the meeting rate of your competitor loop, then congratulations, you found your real motion. Stop worshipping list size.
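Per-loop tracking does not need a BI tool to start. A minimal sketch, assuming each outreach row records its `loop_source` plus boolean `replied` and `meeting` outcomes; the row shape and `rates_by_loop` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rates_by_loop(rows: List[dict]) -> Dict[str, Dict[str, float]]:
    """Compute reply and meeting rates separately for each loop source."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"sent": 0, "replied": 0, "meeting": 0}
    )
    for row in rows:
        bucket = totals[row["loop_source"]]
        bucket["sent"] += 1
        bucket["replied"] += int(row["replied"])
        bucket["meeting"] += int(row["meeting"])
    return {
        loop: {
            "reply_rate": t["replied"] / t["sent"],
            "meeting_rate": t["meeting"] / t["sent"],
        }
        for loop, t in totals.items()
    }


rows = [
    {"loop_source": "similar_people", "replied": True, "meeting": True},
    {"loop_source": "similar_people", "replied": False, "meeting": False},
    {"loop_source": "competitor_listing", "replied": True, "meeting": True},
    {"loop_source": "competitor_listing", "replied": True, "meeting": False},
    {"loop_source": "competitor_listing", "replied": False, "meeting": False},
    {"loop_source": "competitor_listing", "replied": False, "meeting": False},
]
rates = rates_by_loop(rows)
print(rates["similar_people"]["meeting_rate"], rates["competitor_listing"]["meeting_rate"])
```

In this toy data the Similar People loop has half the rows and twice the meeting rate, which is the pattern an aggregate number would hide.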
Mistakes to avoid
The mistakes are obvious, so this will be quick.
- giant blended seed lists
- enriching too early
- no suppression layer
- blind trust in black-box intent
- full auto before review
- outreach that cites evidence the system cannot prove
- pushing a GitHub repo with the real project buried in a nested folder like some kind of maniac
A lot of lookalike prospecting systems fail because the operator wants automation before clarity. That order is backwards.
Grab the workbook: seeds, suppressions, scoring defaults, experiment tracking, and outreach prompt rules, all in one place.
If you want the right next step, it’s not “go buy more data.” Clone the repo, run the four loops against your own closed-won seeds, and inspect the evidence trail for every prospect the system produces. That’s how you build lookalike prospecting that a real sales team will actually trust.