A customer asked for recommendations on a problem he was hitting: “I have been experimenting with NinjaPear using my AI agents to find investors and leads for my company.” In this article, I’m going to show you how to build a full agentic lead generation system with PydanticAI, NinjaPear, and a small set of supporting tools: the exact 4 loops, the code structure, and the GitHub project you can ship.
The stale data problem you're describing isn't really an Apollo problem, it's a database problem... by the time you're reaching out the information is already months old.
A runnable starter repo with the 4 loops, Pydantic models, synthetic CSV seeds, sample NinjaPear-shaped payloads, and tests:

git clone https://github.com/NinjaPear-Shares/lookalike-prospecting-guide.git
What this guide does
This is a developer guide for lookalike prospecting inside an agentic SDR system.
It covers four loops:
- Competitor → Customers: turn one known company into additive prospect accounts.
- CRM Account → Competitors: widen your account universe from closed-won seeds.
- CRM Contact → Similar People: turn one good contact into many role-adjacent people at relevant companies.
- Company → Updates: rank the best prospects by visible timing signals.
That is the whole job here. I’m going to show the loops, the code, the sample responses, the models, and the guardrails that keep this from becoming a fancy way to buy bad leads faster.
Lookalike prospecting, without the bullshit
Lookalike prospecting is not “find me companies with similar headcount.” It is generating new accounts or people that resemble proven wins across fit, context, and timing. Most tools stop at fit. That is why most outputs feel generic.
Most lookalike prospecting products are just firmographic cloning with an AI label attached. They fail for ordinary reasons: dirty seeds, weak source signals, no suppression layer, and scoring logic nobody can explain once a rep asks why a company is on the list.
Clean input is boring. That is why it works. Start with closed-won first. Split by use case. Exclude existing customers, churned accounts, open opps, partners, agencies, and test junk before you enrich anything. A 20-account clean seed will beat a 2,000-account blended mess almost every time.
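Here is that boring cleaning step in code. A minimal sketch, assuming the SeedAccount-style columns from the models section (website, is_closed_won) and a pre-built suppression set:

import csv

def load_clean_seeds(path: str = "data/closed_won_accounts.csv",
                     suppressed: set[str] | None = None) -> list[dict]:
    # Closed-won rows only, minus anything already suppressed
    suppressed = suppressed or set()
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [
        row for row in rows
        if row.get("is_closed_won", "").strip().lower() == "true"
        and row["website"].strip().lower().rstrip("/") not in suppressed
    ]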
The signal stack is not equal. Firmographics are table stakes. Technographics add context. Relationship data is stronger. Trigger data handles timing. If you already have customer work emails in CRM, Similar People is usually your best 1→N move.
ran 464k cold emails last year across clients. Tested every list source out there... Ended up building our own scraping stack for almost everything because the bought data is stale, expensive, and everyone else is emailing the same contacts.
The 4 agent loops
The system has four loops because there are four separate problems.
- Competitor to customers gives you fast account expansion.
- CRM accounts to competitors gives you clean market widening.
- CRM contacts to similar people gives you additive people discovery.
- Triggers to outreach gives you timing.
Do not collapse all four into one big pipeline on day one. Keep them separate. It makes debugging easier, measurement easier, and failure less ambiguous.
Loop 1: Competitor to customers
Problem: You know a competitor or adjacent company and want a prospect list fast.
Solution: Use the Customer Listing endpoint to find companies already buying from that vendor or sitting in its ecosystem.
This endpoint is a good starting point because it returns three relationship buckets: customers, investors, and partner_platforms. The cost model matters too: 1 credit per request plus 2 credits per company returned. quality_filter defaults to true, which filters junk TLDs and unreachable sites. Good default.
from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectAccount

client = NinjaPearClient()

# One request returns all three buckets; this loop only consumes customers
response = client.get_customer_listing("https://stripe.com")

accounts = [
    ProspectAccount.from_customer_listing(item, source="customer_listing")
    for item in response["customers"]
]
Sample response, using the same shape as NinjaPear docs:
{
"customers": [
{
"name": "Apple",
"description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
"tagline": "Think different.",
"website": "https://www.apple.com",
"company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.apple.com",
"id": "abc123",
"industry": 45202030,
"specialties": ["Technology", "Consumer Electronics"],
"x_profile": "https://x.com/Apple"
}
],
"investors": [
{
"name": "Sequoia Capital",
"description": "Sequoia Capital is a venture capital firm focused on technology companies.",
"tagline": null,
"website": "https://www.sequoiacap.com",
"company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.sequoiacap.com",
"id": "def456",
"industry": 40203010,
"specialties": ["Venture Capital", "Growth Equity"],
"x_profile": "https://x.com/sequoia"
}
],
"partner_platforms": [
{
"name": "Amazon Web Services",
"description": "Amazon Web Services provides cloud computing platforms and APIs.",
"tagline": null,
"website": "https://aws.amazon.com",
"company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Faws.amazon.com",
"id": "ghi789",
"industry": 45101010,
"specialties": ["Cloud Computing", "Infrastructure"],
"x_profile": "https://x.com/awscloud"
}
],
"next_page": "https://nubela.co/api/v1/customer/listing?website=https://www.stripe.com&cursor=abc123"
}
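Note the next_page cursor at the bottom. Vendors with more than one page of customers need cursor-following, and every page bills again (1 credit per request plus 2 per company), so cap it. A sketch, assuming a hypothetical get_url raw-GET helper on the client wrapper shown later:

def iter_all_customers(client, website: str, max_pages: int = 5):
    # Follow next_page cursors; cap pages because every page costs credits
    response = client.get_customer_listing(website)
    for _ in range(max_pages):
        yield from response["customers"]
        next_url = response.get("next_page")
        if not next_url:
            break
        response = client.get_url(next_url)  # hypothetical raw-GET helper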
The flow is simple:
- input website
- customer list
- normalize
- suppress existing CRM accounts
- score
Expected normalized output:
{
"name": "Apple",
"website": "https://www.apple.com",
"industry": "45202030",
"source": "customer_listing",
"source_evidence": [
"Returned by customer_listing",
"Company id: abc123"
],
"fit_score": 0.65,
"relationship_score": 0.85,
"timing_score": 0.25
}
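For reference, here is roughly what from_customer_listing does inside ProspectAccount to produce that record. The field mapping is a sketch, and the starting scores are illustrative priors (relationship is high because the account shares a vendor ecosystem; timing stays low until Loop 4 runs), not values the API returns:

@classmethod
def from_customer_listing(cls, item: dict, source: str = "customer_listing") -> "ProspectAccount":
    return cls(
        name=item["name"],
        website=item["website"],
        industry=str(item["industry"]) if item.get("industry") else None,
        source=source,
        # The evidence trail starts here and must survive to the draft
        source_evidence=[
            "Returned by customer_listing",
            f"Company id: {item['id']}",
        ],
        fit_score=0.65,           # unverified until enrichment
        relationship_score=0.85,  # shared vendor ecosystem
        timing_score=0.25,        # unknown until Loop 4
    )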
A workable outreach angle is short and specific: you already sell into the same ecosystem as Stripe, so this is not a random account pulled from firmographic filters.
Loop 2: CRM accounts to competitors
Problem: You have closed-won accounts in CRM and want to widen the account universe around them.
Solution: Use the Competitor Listing endpoint on each CRM account website, merge results, dedupe, suppress, score.
This is the cleanest 0→1 account expansion loop in the stack. It starts with proven wins. That matters more than people think.
from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectAccount
from src.scoring import score_account

client = NinjaPearClient()

for website in seed_account_websites:
    competitors = client.get_competitor_listing(website)
    for comp in competitors["competitors"]:
        # Normalize first so the overlap reason is kept as evidence
        # (from_competitor_listing mirrors from_customer_listing in src.models)
        account = ProspectAccount.from_competitor_listing(comp, source="competitor_listing")
        if score_account(account) >= 0.72:
            save_candidate(account)
Sample response:
{
"competitors": [
{
"name": "Adyen",
"website": "https://www.adyen.com",
"description": "Financial technology platform for enterprise businesses.",
"competition_type": "product_category_overlap",
"reason": "Both companies offer payment infrastructure and enterprise checkout products.",
"industry": 40204010
},
{
"name": "PayPal",
"website": "https://www.paypal.com",
"description": "Digital payments platform for consumers and merchants.",
"competition_type": "organic_seo_overlap",
"reason": "Both companies rank for overlapping payments-related organic search terms.",
"industry": 40204010
}
],
"next_page": null
}
Expected scored output with evidence retained:
| Account | Evidence | Fit | Relationship | Timing | Total |
|---|---|---|---|---|---|
| Adyen | product_category_overlap | 0.70 | 0.72 | 0.20 | 0.5820 |
| Checkout.com | product_category_overlap | 0.70 | 0.72 | 0.20 | 0.5820 |
| PayPal | organic_seo_overlap | 0.70 | 0.58 | 0.20 | 0.5330 |
This is the part I like about competitor data when it is explicit. You keep the reason. You do not throw it away. Product overlap usually deserves more weight than shared SEO adjacency because it points to budget competition, not just similar search surfaces.
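One way to encode that preference is a plain dict whose values match the table above. These numbers are illustrative defaults, not NinjaPear outputs:

RELATIONSHIP_SCORE_BY_OVERLAP = {
    "product_category_overlap": 0.72,  # budget competition
    "organic_seo_overlap": 0.58,       # similar search surface only
}

def relationship_score(comp: dict) -> float:
    # Unknown overlap types get a neutral-ish default
    return RELATIONSHIP_SCORE_BY_OVERLAP.get(comp["competition_type"], 0.50)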
A quick spreadsheet filter on the username part plus a lookup of the company’s current employee list ... weeds out a lot of dead leads.
That quote is not about competitors directly, but it points at the same thing: most list quality problems are filtering problems, not enrichment problems.
Loop 3: CRM contacts to similar people
Problem: You already have customer contacts in CRM and want additive 1→N growth.
Solution: Use the Similar People endpoint on customer work emails to find similar roles at other relevant companies.
This is the real 1→N motion if your CRM has actual work emails. Not guessed emails. Real ones.
from src.clients.ninjapear import NinjaPearClient
from src.models import ProspectPerson

client = NinjaPearClient()

# Real CRM work emails in, role-adjacent people at other companies out
for work_email in customer_contact_emails:
    similar_people = client.get_similar_people(work_email=work_email)
    for person in similar_people["results"]:
        prospect = ProspectPerson.from_similar_person(person)
        if not is_suppressed_person(prospect):
            save_person(prospect)
Sample response:
{
"results": [
{
"full_name": "Will Cannon",
"first_name": "Will",
"last_name": "Cannon",
"bio": "Founder building B2B lead generation software.",
"work_email": "[email protected]",
"role": "Founder & CEO",
"company_name": "UpLead",
"company_website": "https://uplead.com",
"city": "Walnut",
"country": "US",
"x_handle": "willcannon",
"input_role": "Founder & CEO"
},
{
"full_name": "Henry Schuck",
"work_email": "[email protected]",
"role": "CEO & Chairman",
"company_name": "ZoomInfo",
"company_website": "https://zoominfo.com",
"city": "Vancouver",
"country": "US",
"input_role": "Founder & CEO"
}
]
}
Expected normalized output:
{
"full_name": "Will Cannon",
"work_email": "[email protected]",
"company_website": "https://uplead.com",
"role": "Founder & CEO",
"source": "similar_people",
"source_evidence": [
"Matched as similar person to Founder & CEO"
],
"account_score": 0.76,
"person_score": 0.80
}
This is the loop where the system starts to feel good, because you are transferring buyer patterns from known-good contacts instead of guessing titles from a huge database.
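The is_suppressed_person check from the Loop 3 code stays small. A sketch, assuming the sets are loaded from data/suppression_people.csv and crm_contacts.csv; in the pipeline the one-argument version would close over them:

def is_suppressed_person(prospect, suppressed_emails: set[str],
                         suppressed_domains: set[str]) -> bool:
    # Skip people we already know: existing contacts and customer domains
    email = (prospect.work_email or "").lower()
    domain = email.split("@")[-1] if "@" in email else ""
    return email in suppressed_emails or domain in suppressed_domains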
The published Similar People benchmarks are useful because they show where the endpoint is strongest:
- Tim Cook / Apple: 18 attempted, 18 found, 100% yield
- Elon Musk / Tesla: 11 attempted, 11 found, 100% yield
- Patrick Collison / Stripe: 19 attempted, 16 found, 84% yield
- Bryan Irace / Stripe engineering manager: 19 attempted, 12 found, 63% yield
- Robert Heaton / Stripe MTS: 65 attempted, 36 found, 55% yield
That decay lower in the org chart is normal. Executives are more public. Mid-level people are noisier.
And yes, here is the context-specific example outreach angle from the plan:
“Hey! Your competitor from Company X just joined NinjaPear. It happens that NinjaPear has a feature to extract customers of your company. Would you like to join NinjaPear to also gain an edge against your competitors?”
Use that as an example of context, not default copy. If the agent cannot show the evidence, the rep should not send the email.
Loop 4: Triggers to outreach
Problem: Your lookalikes are plausible, but you still do not know who to contact now.
Solution: Use Company Updates or Monitor signals to prioritize accounts showing real change.
Trigger data is a ranking layer. It is not a prospect source.
from src.clients.ninjapear import NinjaPearClient
from src.outreach import draft_outreach

client = NinjaPearClient()

# `account` is a saved ProspectAccount from Loops 1-3
updates = client.get_company_updates(str(account.website))
for event in updates["results"]:
    if event["category"] in {"website update", "blog", "x"}:
        draft = draft_outreach(account, event)
        save_draft(draft)
Sample response:
{
"results": [
{
"title": "Pricing page updated, new Enterprise tier added",
"link": "https://example.com/pricing",
"category": "website update",
"pub_date": "Thu, 27 Feb 2026 07:00:00 GMT",
"summary": "Enterprise packaging was added to the pricing page."
},
{
"title": "Announcing global payments expansion",
"link": "https://example.com/blog/global-payments",
"category": "blog",
"pub_date": "Thu, 27 Feb 2026 10:00:00 GMT",
"summary": "The company announced broader market coverage for payments."
}
]
}
Expected outreach draft:
{
"subject": "Saw this at ExampleCo",
"body": "Saw the update: Pricing page updated, new Enterprise tier added. Usually that means the team is changing packaging, priorities, or buyer motion.",
"evidence": [
"Returned in customer listing for Stripe",
"Trigger: Pricing page updated, new Enterprise tier added"
],
"confidence": 0.78,
"requires_review": true
}
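To fill the timing_score the earlier loops left at a low prior, one option is plain recency decay over qualifying updates. A sketch; the 30-day half-life is my assumption, not a NinjaPear recommendation:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def timing_score(events: list[dict], half_life_days: float = 30.0) -> float:
    # Newest qualifying update wins; value halves every half_life_days
    best = 0.0
    now = datetime.now(timezone.utc)
    for event in events:
        # pub_date is RFC 2822, e.g. "Thu, 27 Feb 2026 07:00:00 GMT"
        age_days = (now - parsedate_to_datetime(event["pub_date"])).days
        best = max(best, 0.5 ** (max(age_days, 0) / half_life_days))
    return round(best, 2)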
The Company Monitor docs are refreshingly specific on credit usage:
- 20 weekly targets: ~346 credits/month
- 10 daily competitor targets: ~1,203 credits/month
- 5 daily prospect accounts, blog + X only: ~453 credits/month
That is enough to budget the loop without guessing.
A lot of people are building #GTM engines now, but the real effectiveness comes down to which market signals you track and how strong your market intelligence is. In B2B, we’ve found that not all signals are equal.
— lev8 (@lev8ai), April 13, 2026
That is the right framing. Not all signals are equal. A pricing page change is usually more actionable than another vague “high intent” badge.
Repo structure
Keep this concrete.
Push from the real project root, not from a parent wrapper folder. This sounds obvious because it is obvious. People still get it wrong.
README.md
.env.example
pyproject.toml
data/
  closed_won_accounts.csv
  crm_contacts.csv
  suppression_accounts.csv
  suppression_people.csv
examples/
  sample_customer_listing.json
  sample_competitor_listing.json
  sample_similar_people.json
  sample_updates.json
src/
  config.py
  models.py
  scoring.py
  suppressions.py
  outreach.py
  clients/
    ninjapear.py
  agents/
    coordinator.py
    research_agent.py
    scoring_agent.py
    copy_agent.py
  pipelines/
    loop_competitor_to_customers.py
    loop_crm_to_competitors.py
    loop_contacts_to_similar_people.py
    loop_triggers_to_outreach.py
tests/
  test_scoring.py
  test_suppressions.py
There is no reason to bury the project inside a wrapper folder. If someone clones the repo, they should immediately see README.md, pyproject.toml, src/, data/, and tests/ at the top level.
the temptation is to try and replicate the full clay workflow with cheaper pieces but honestly that usually ends up messier than just paying for one tool that does the job. the stacking problem is real though, you end up spending more time maintaining integrations than actually doing outbound.
That is exactly why repo structure matters. If the code path is messy, the workflow will be messy too.
Core Pydantic models
Keep this practical. The point of Pydantic here is not ceremony. It is to force evidence to stay attached to the record.
SeedAccount
Fields: name, website, segment, source, is_closed_won, arr_band
SeedContact
Fields: full_name, work_email, company_website, role, seniority
ProspectAccount
Fields: name, website, industry, source, source_evidence, fit_score, relationship_score, timing_score
ProspectPerson
Fields: full_name, work_email, company_website, role, source, source_evidence, account_score, person_score
OutreachDraft
Fields: subject, body, evidence, confidence, requires_review
from pydantic import BaseModel, Field, HttpUrl
from typing import List, Optional

class ProspectAccount(BaseModel):
    name: str
    website: HttpUrl
    industry: Optional[str] = None
    source: str
    # Evidence travels with the record from raw JSON to outreach draft
    source_evidence: List[str] = Field(default_factory=list)
    fit_score: float = 0.0
    relationship_score: float = 0.0
    timing_score: float = 0.0
If a prospect loses its evidence trail between raw JSON and outbound draft, the model failed.
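You can make Pydantic enforce that instead of just documenting it. A sketch of a Pydantic v2 validator on OutreachDraft that refuses evidence-free drafts:

from pydantic import BaseModel, field_validator

class OutreachDraft(BaseModel):
    subject: str
    body: str
    evidence: list[str]
    confidence: float = 0.0
    requires_review: bool = True

    @field_validator("evidence")
    @classmethod
    def must_carry_evidence(cls, value: list[str]) -> list[str]:
        # A draft with no provable evidence should never reach a rep
        if not value:
            raise ValueError("OutreachDraft requires at least one evidence item")
        return value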
NinjaPear client wrapper
Use a thin wrapper. Do not build a fake framework.
Also, if you are building this with a coding agent, point it to https://nubela.co/llms-full.txt. The docs are already structured for LLMs and include endpoint coverage, rate limits, pagination, timeout guidance, and examples.
import os
import httpx

class NinjaPearClient:
    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["NINJAPEAR_API_KEY"]
        self.base_url = "https://nubela.co"
        self.headers = {"Authorization": f"Bearer {self.api_key}"}

    def get_customer_listing(self, website: str):
        # Long-running endpoint: the docs recommend a 100-second timeout
        r = httpx.get(
            f"{self.base_url}/api/v1/customer/listing",
            params={"website": website},
            headers=self.headers,
            timeout=100.0,
        )
        r.raise_for_status()
        return r.json()
Then add the obvious follow-up methods for competitor listing, similar people, and company updates.
The API behavior that matters:
- normal rate limit is 300 requests/minute
- the effective window is 1,500 requests per 5 minutes
- trial accounts are limited to 2 requests/minute
- long-running endpoints can take 30 to 60 seconds
- recommended timeout is 100 seconds
- a 429 needs exponential backoff
- a 404 is charged; other failed requests are not charged
Those details change how you write the client. They are not side notes.
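For example, a retry helper that respects the 429 rule and the 100-second timeout. The backoff schedule is my default, not something the docs prescribe:

import time
import httpx

def get_with_backoff(url: str, *, params: dict, headers: dict,
                     max_retries: int = 5) -> httpx.Response:
    # Exponential backoff on 429 only; other errors surface immediately
    for attempt in range(max_retries):
        r = httpx.get(url, params=params, headers=headers, timeout=100.0)
        if r.status_code != 429:
            r.raise_for_status()
            return r
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError(f"still rate limited after {max_retries} attempts: {url}")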
Scoring and suppressions
One short intro and then code.
Make scoring simple enough that a human can understand it.
def score_account(account) -> float:
    # Weights sum to 1.0: fit leads, relationship carries real weight,
    # timing cannot hijack the list
    return round(
        (account.fit_score * 0.40)
        + (account.relationship_score * 0.35)
        + (account.timing_score * 0.25),
        4,
    )
That weighting is fine as a default because it gives fit first position, gives relationship data real weight, and stops timing from hijacking the list.
def is_suppressed_account(account, suppression_websites: set[str]) -> bool:
    # Normalize so https://x.com and https://x.com/ compare equal
    return str(account.website).lower().rstrip("/") in suppression_websites
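The suppression set itself comes from the CSVs in data/. A loader sketch, assuming a website column and the same normalization as the check above:

import csv

def load_suppression_websites(path: str = "data/suppression_accounts.csv") -> set[str]:
    # Normalize exactly like is_suppressed_account: lowercase, no trailing slash
    with open(path, newline="") as f:
        return {
            row["website"].strip().lower().rstrip("/")
            for row in csv.DictReader(f)
        }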
Suppress before enrichment:
- existing customers
- open opps
- churned accounts you should not re-enter yet
- partners
- agencies
- internal domains
- obvious bad domains
If you enrich first and suppress later, you are paying to make junk more detailed.
I am also skeptical of black-box intent. If the system cannot tell you why an account is hot, the score is decorative.
Outreach generation
The copy agent should only use evidence the system can show.
SYSTEM_PROMPT = """
Write short outbound emails using only the supplied evidence.
Do not invent facts.
If evidence is weak, say so and mark requires_review=true.
"""
def build_evidence_block(account, event=None):
evidence = list(account.source_evidence)
if event:
evidence.append(f"Trigger: {event['title']}")
return evidence
That is the whole rule. Do not let the model improvise facts because it has a good tone. Tone does not save a false claim.
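Wiring that rule into PydanticAI looks roughly like this. Treat it as a sketch: the model string is an example, and recent pydantic-ai releases use output_type and result.output (older ones used result_type and result.data):

from pydantic_ai import Agent
from src.models import OutreachDraft

copy_agent = Agent(
    "openai:gpt-4o",  # example model; use whatever your stack runs
    output_type=OutreachDraft,
    system_prompt=SYSTEM_PROMPT,
)

def draft_outreach(account, event=None) -> OutreachDraft:
    evidence = build_evidence_block(account, event)
    result = copy_agent.run_sync(
        "Write one short outbound email.\nEvidence:\n" + "\n".join(evidence)
    )
    # Structured output: the agent returns a valid OutreachDraft or raises
    return result.output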
How to run the project
Practical checklist:
uv venv
source .venv/bin/activate
uv pip install -e .
cp .env.example .env
export NINJAPEAR_API_KEY=your_key_here
python -m src.pipelines.loop_competitor_to_customers
python -m src.pipelines.loop_crm_to_competitors
python -m src.pipelines.loop_contacts_to_similar_people
python -m src.pipelines.loop_triggers_to_outreach
And yes, inspect the full endpoint docs at https://nubela.co/llms-full.txt when wiring parameters, pagination, retries, and endpoint-specific schemas. The article gives you the architecture. The docs give you the sharp edges.
What to measure
Track this by loop source, not just in aggregate:
- suppression rate
- enrichment rate
- reply rate
- meeting rate
- opportunity rate
- average evidence depth per record
If Similar People produces fewer rows but twice the meetings, that is the better loop.
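Evidence depth is the metric almost nobody tracks. A sketch, assuming records carry the source and source_evidence fields from the models section:

from collections import defaultdict

def average_evidence_depth(records) -> dict[str, float]:
    # Mean evidence items per record, grouped by loop source
    depths: dict[str, list[int]] = defaultdict(list)
    for record in records:
        depths[record.source].append(len(record.source_evidence))
    return {source: sum(v) / len(v) for source, v in depths.items()}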
Mistakes to avoid
Keep this short.
- giant blended seed lists
- enriching too early
- no suppression layer
- blind trust in black-box intent
- full auto before review
- outreach that cites evidence the system cannot prove
- pushing a GitHub repo with the real project buried in a nested folder like some kind of maniac
Most failures here are not model failures. They are operating mistakes.
If you want the right next step, clone the repo, point your coding agent at https://nubela.co/llms-full.txt, and run the four loops against your own closed-won seeds first. That will tell you more than another week spent tuning a giant dirty list.