Most lead generation content is useless if you actually have to build the damn system yourself. A customer recently asked us for recommendations: “I have been experimenting with NinjaPear using my AI agents to find investors and leads for my company.” In this article, I’m going to show you how to build a full agentic lead generation system with PydanticAI, NinjaPear, and a small set of supporting tools, including the exact 4 loops, the code structure, and the GitHub project you can ship.
AI SDRs are the most overhyped thing in SaaS right now.

Not because AI is bad.

Because the input is bad.

An AI writing personalized emails to 10,000 random contacts is still spray and pray.

You just automated the waste.

— Junaid Choudhary (@JustJunaidHere), April 14, 2026
A runnable Python starter project with NinjaPear wrappers, Pydantic models, a 4-loop pipeline, mocked outreach helpers, and an agent entrypoint.
```shell
git clone https://github.com/NinjaPear-Shares/agentic-lead-generation.git
```
Build the system first
When I was running FluxoMetric, this is the mistake I made for way too long: I thought better modeling would fix bad market inputs. It does not. It just gives you prettier wrong answers.
If I were building lead generation for a startup today, and my two obvious competitors were Stripe and Shopify, I would start with this system shape:
- Loop 1, customer loop: pull likely customers of Stripe and Shopify, because a competitor's customer is better evidence than a random firmographic export
- Loop 2, competitor loop: expand into adjacent companies, because most teams do not have a lead shortage, they have a market map shortage
- Loop 3, people loop: find role-alike buyers only after account qualification, because people-first prospecting is how you build a faster bullshit cannon
- Loop 4, timing loop: enrich and monitor those accounts for changes, because without timing you do not have agentic lead generation, you have a nicer spreadsheet
That is what this guide is going to do: take those four loops, wire them into one practical workflow, and show you the exact code and response shapes I would use if I were a founder trying to steal market share from better-funded competitors.
This is the whole workflow in one glance:
seed domains -> customer loop -> competitor loop -> people loop -> timing loop -> queue
And here is the boring main() that matters more than 90% of AI SDR demos on X:
```python
def main():
    seeds = ["https://stripe.com", "https://shopify.com"]
    accounts = pull_competitor_customers(seeds)
    accounts = expand_competitors(accounts)
    people = find_similar_people(accounts)
    enriched = enrich_and_monitor(accounts, people)
    queue = build_outreach_queue(enriched)
    return queue
```
Short. Good. Most systems should be this obvious.
The stack
Keep the stack brutally practical:
- NinjaPear for company, customer, competitor, people, funding, updates, and monitor data
- PydanticAI for orchestration, typed tools, and validated outputs
- OpenRouter as the model gateway
- Mock helpers for email verification and email sending, because those are not the point of this guide
Install what you need:
```shell
pip install ninjapear pydantic-ai httpx pydantic python-dotenv
```
Minimal settings model:
```python
import os

from pydantic import BaseModel


class Settings(BaseModel):
    ninjapear_api_key: str
    openrouter_api_key: str


settings = Settings(
    ninjapear_api_key=os.environ["NINJAPEAR_API_KEY"],
    openrouter_api_key=os.environ["OPENROUTER_API_KEY"],
)
```
And yes, you can run PydanticAI through OpenRouter cleanly:
```python
from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.providers.openrouter import OpenRouterProvider

model = OpenRouterModel(
    "anthropic/claude-sonnet-4-5",
    provider=OpenRouterProvider(api_key=settings.openrouter_api_key),
)
agent = Agent(model)
```
My opinion is simple: the model matters less than the data and the wiring. Most agentic lead generation demos fail because they are dressed-up toys sitting on bad data.
NinjaPear’s docs make the operational constraints pretty clear too:
- up to 300 requests/minute
- a 1,500 request burst window over 5 minutes
- trial accounts limited to 2 requests/minute
- many endpoints take 30 to 60 seconds
- recommended timeout: 100 seconds
That is real infrastructure guidance. Not marketing garnish.
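Those numbers are easy to respect in code. Here is a minimal client-side throttle, a sketch of my own rather than anything in the NinjaPear SDK, that keeps a trailing 60-second window under the documented 300 requests/minute:

```python
import time
from collections import deque


class RateLimiter:
    """Blocks until one more request keeps the trailing minute under the cap."""

    def __init__(self, max_per_minute: int = 300):
        self.max_per_minute = max_per_minute
        self._timestamps = deque()  # monotonic times of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # drop entries that have aged out of the 60-second window
        while self._timestamps and now - self._timestamps[0] >= 60:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_per_minute:
            # sleep just long enough for the oldest request to age out
            time.sleep(60 - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())
```

Call `wait()` before every wrapper call. On a trial account, construct it with `max_per_minute=2`.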
Loop 1: Pull competitor customers
If I were a founder selling into companies that already buy from Stripe or Shopify, this is where I would start.
A competitor’s customer is usually a better lead than a giant stale firmographic export. Not because it is magical. Because it is evidence of actual category demand.
Seed your domains
Start with companies your buyer would already compare you against.
For this guide, I am using:
```python
seeds = [
    "https://stripe.com",
    "https://shopify.com",
]
```
Do not start with 500 domains. Start with 2 to 10 and make sure the plumbing works.
Call Customer Listing
Use the Customer Listing API.
Doc-backed details that matter:
- Endpoint: `GET /api/v1/customer/listing`
- Cost: 1 credit/request + 2 credits/company returned
- Empty results can still cost you
- `quality_filter=true` should stay on unless you enjoy junk
- Pagination uses `next_page` and `cursor`
- `page_size` supports 1 to 200, default 200
Official SDK setup:
```python
import ninjapear

configuration = ninjapear.Configuration(
    host="https://nubela.co",
    access_token="YOUR_API_KEY",
)

with ninjapear.ApiClient(configuration) as api_client:
    api = ninjapear.CustomerAPIApi(api_client)
    response = api.get_customer_listing(
        website="https://www.stripe.com",
        quality_filter=True,
    )
```
Wrapper I would actually keep in a project:
```python
import ninjapear
from urllib.parse import urlparse

from pydantic import BaseModel


class ProspectAccount(BaseModel):
    source_company: str
    relationship_type: str
    name: str
    website: str | None = None
    description: str | None = None
    specialties: list[str] = []
    x_profile: str | None = None
    score: float = 0.0


def normalize_domain(url: str) -> str:
    return urlparse(url).netloc.replace("www.", "")


def get_customer_listing(api_client, website: str, **params):
    # accepts cursor, page_size, and quality_filter; drops unset values
    api = ninjapear.CustomerAPIApi(api_client)
    params = {k: v for k, v in params.items() if v is not None}
    return api.get_customer_listing(website=website, **params)
```
Pagination helper:
```python
from urllib.parse import parse_qs, urlparse


def paginate_customer_listing(api_client, website: str, page_size: int = 200):
    cursor = None
    while True:
        response = get_customer_listing(
            api_client,
            website=website,
            cursor=cursor,
            page_size=page_size,
            quality_filter=True,
        )
        yield response
        next_page = response.get("next_page") if isinstance(response, dict) else getattr(response, "next_page", None)
        if not next_page:
            break
        cursor_values = parse_qs(urlparse(next_page).query).get("cursor", [])
        cursor = cursor_values[0] if cursor_values else None
        if not cursor:
            break
```
Parse only what matters
For lead generation, I care about the relationship buckets differently.
| Relationship type | How to use it |
|---|---|
| Customer | Primary prospect pool |
| Investor | Good for market mapping, less useful as direct buyer signal |
| Partner platform | Useful for integrations and ecosystem plays |
And I flatten the response like this:
```python
def flatten_customer_response(source_company: str, payload) -> list[ProspectAccount]:
    rows = []
    data = payload.to_dict() if hasattr(payload, "to_dict") else payload
    for key, relationship_type in {
        "customers": "customer",
        "investors": "investor",
        "partner_platforms": "partner_platform",
    }.items():
        for item in data.get(key, []) or []:
            rows.append(
                ProspectAccount(
                    source_company=source_company,
                    relationship_type=relationship_type,
                    name=item.get("name", "Unknown"),
                    website=item.get("website"),
                    description=item.get("description"),
                    specialties=item.get("specialties") or [],
                    x_profile=item.get("x_profile"),
                )
            )
    return rows
```
Here is a sample response shape using documented NinjaPear fields:
```json
{
  "customers": [
    {
      "name": "Apple",
      "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
      "tagline": "Think different.",
      "website": "https://www.apple.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.apple.com",
      "id": "abc123",
      "industry": 45202030,
      "specialties": ["Technology", "Consumer Electronics"],
      "x_profile": "https://x.com/Apple"
    }
  ],
  "investors": [
    {
      "name": "Sequoia Capital",
      "description": "Sequoia Capital is a venture capital firm focused on technology companies.",
      "tagline": null,
      "website": "https://www.sequoiacap.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Fwww.sequoiacap.com",
      "id": "def456",
      "industry": 40203010,
      "specialties": ["Venture Capital", "Growth Equity"],
      "x_profile": "https://x.com/sequoia"
    }
  ],
  "partner_platforms": [
    {
      "name": "Amazon Web Services",
      "description": "Amazon Web Services provides cloud computing platforms and APIs.",
      "tagline": null,
      "website": "https://aws.amazon.com",
      "company_logo_url": "https://nubela.co/api/v1/company/logo?website=https%3A%2F%2Faws.amazon.com",
      "id": "ghi789",
      "industry": 45101010,
      "specialties": ["Cloud Computing", "Infrastructure"],
      "x_profile": "https://x.com/awscloud"
    }
  ],
  "next_page": "https://nubela.co/api/v1/customer/listing?website=https://www.stripe.com&cursor=abc123"
}
```
What I do not do is dump investors and partner platforms straight into the outreach queue. That is how people end up bragging about lead volume while quietly burning sender reputation.
Stop Using Apollo for Leads: the Database Is Already Burned
by u/Singh_Acquisitions in coldemail
“When thousands of marketers, agencies, and freelancers are all buying from the same database, those contacts get absolutely hammered with pitches long before you ever hit send.”
Yep. That is exactly why customer-graph-first lead generation beats database-first lead generation.
Dedupe the account list
Dedupe by normalized domain.
Not company name. Not the CRM label your rep typed in a hurry. Domain.
```python
from collections import OrderedDict


def dedupe_accounts(accounts: list[ProspectAccount]) -> list[ProspectAccount]:
    deduped = OrderedDict()
    for account in accounts:
        key = normalize_domain(account.website) if account.website else account.name.lower()
        if key not in deduped:
            deduped[key] = account
    return list(deduped.values())
```
Most teams do not have a lead generation shortage. They have a data hygiene shortage.
Loop 2: Expand competitors
If Stripe and Shopify are your starting map, this loop is how you find the adjacent terrain.
This is where the system stops being a list puller and starts becoming a market mapper.
Find adjacent competitors
Use the Competitor Listing endpoint to pull adjacent and overlapping companies.
The docs show this clearly enough to act on:
- Endpoint: `GET /api/v1/competitor/listing`
- Cost: 2 credits per competitor returned
- Minimum request cost applies even when the result is weak
A thin wrapper is enough:
```python
def get_competitors(api_client, website: str, cursor: str | None = None, page_size: int = 50):
    api = ninjapear.CompetitorAPIApi(api_client)
    return api.get_competitor_listing(website=website, cursor=cursor, page_size=page_size)
```
Sample response shape:
```json
{
  "competitors": [
    {
      "company_name": "Adyen",
      "website": "https://www.adyen.com",
      "product_category_overlap": ["payment processing", "checkout", "fraud tools"],
      "organic_seo_overlap": ["payment gateway", "merchant of record"]
    },
    {
      "company_name": "Checkout.com",
      "website": "https://www.checkout.com",
      "product_category_overlap": ["payments infrastructure"],
      "organic_seo_overlap": ["online payments"]
    }
  ],
  "next_page": null
}
```
That is already enough to do useful work. You do not need a giant ontology sermon here. You need plausible adjacent companies and the reasons they overlap.
Expand one layer at a time
Do not recurse like a lunatic.
One layer is usually enough to improve lead generation coverage without turning the queue into broad-market mush.
```python
def expand_competitors(api_client, seed_domains, max_depth=1):
    visited = set()
    frontier = list(seed_domains)
    discovered = []
    depth = 0
    while frontier and depth < max_depth:
        next_frontier = []
        for website in frontier:
            if website in visited:
                continue
            visited.add(website)
            payload = get_competitors(api_client, website=website)
            data = payload.to_dict() if hasattr(payload, "to_dict") else payload
            for item in data.get("competitors", []) or []:
                comp_website = item.get("website")
                if comp_website and comp_website not in visited:
                    discovered.append(comp_website)
                    next_frontier.append(comp_website)
        frontier = next_frontier
        depth += 1
    return discovered
```
Track provenance
Every discovered account should carry provenance:
- which seed exposed it
- which competitor chain exposed it
- what relationship type got it into the graph
That becomes useful later for scoring and messaging. If an account appears as a Stripe customer and again through an adjacent payment competitor, that is not duplication. That is signal.
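A minimal way to carry that provenance, sketched with a frozen dataclass (the field names are mine, not NinjaPear's):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    """One path by which an account entered the graph."""

    seed_domain: str          # which seed exposed it
    chain: tuple[str, ...]    # competitor hops between seed and account
    relationship_type: str    # customer / investor / partner_platform


def merge_provenance(existing: list[Provenance], new: Provenance) -> list[Provenance]:
    # Two independent paths to the same domain is signal, not duplication,
    # so keep every distinct (seed, chain) pair.
    if any(p.seed_domain == new.seed_domain and p.chain == new.chain for p in existing):
        return existing
    return existing + [new]
```

Attach the list of `Provenance` records to each account, and scoring later gets the path count for free.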
Stop the loop early
Use three brakes:
- `max_depth`
- domain dedupe
- account caps
If you do not cap this, you end up exploring the entire category and pretending that is pipeline generation.
A concurrency-safe version is fine later. Get the logic right first.
```
# pseudocode: same loop body, scheduled concurrently later
for website in frontier:
    if website not in visited and len(discovered) < account_cap:
        fetch competitor listing
        append new domains
```
I found a quote on X that says this pain more honestly than most GTM decks do:
This is why agency bros don’t impress me

Super easy to blast the TAM and get some results

Super hard to work a set of 100 qualified accounts for the year and penetrate 10-20% of them

The latter is what I’m paid to do

— Khalifa (@saleskhalifa), April 10, 2026
That is the whole point. Better lead generation is usually about better market mapping, not just more names.
Loop 3: Find similar people
This loop is where a lot of people get stupid.
If you run people discovery before account scoring, you just built a faster bullshit cannon.
Start with the account
Only run people discovery on accounts that already passed some fit threshold.
For example:
- competitor customer
- sensible headcount band
- recent public updates
- category relevance from the source chain
Do that first. Then go looking for humans.
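As a concrete gate, something like this works (the thresholds and dict keys are illustrative, populated from Loops 1, 2, and 4), keeping people discovery strictly behind account qualification:

```python
def passes_fit_threshold(
    account: dict,
    min_headcount: int = 50,
    max_headcount: int = 2000,
) -> bool:
    """Only accounts that clear this gate get people discovery."""
    if account.get("relationship_type") != "customer":
        return False  # investors and partners are not direct buyers
    headcount = account.get("employee_count")
    if headcount is not None and not (min_headcount <= headcount <= max_headcount):
        return False  # outside the sensible headcount band
    if account.get("recent_update_count", 0) < 1:
        return False  # no recent public activity, weak timing
    return True
```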
Pull role-alike people
The Similar People endpoint is built for role-alike discovery.
That means you can start with a target role and pull similar profiles at relevant companies.
```python
class ProspectPerson(BaseModel):
    account_domain: str
    full_name: str | None = None
    role: str | None = None
    work_email: str | None = None
    x_handle: str | None = None
    score: float = 0.0


def find_similar_people(api_client, employer_website: str, role: str):
    api = ninjapear.EmployeeAPIApi(api_client)
    return api.get_similar_people(employer_website=employer_website, role=role)
```
If I were a founder building this for a payments-adjacent product, I would start with one clean role like Head of Revenue Operations or VP Sales Operations.
Not vague mush like “operations lead.”
Sample response shape:
```json
{
  "results": [
    {
      "full_name": "Jane Doe",
      "role": "Head of Revenue Operations",
      "work_email": "[email protected]",
      "x_handle": "janedoe",
      "x_profile_url": "https://x.com/janedoe",
      "employer_website": "https://shopify.com"
    },
    {
      "full_name": "Alex Kim",
      "role": "VP Sales Operations",
      "work_email": null,
      "x_handle": null,
      "x_profile_url": null,
      "employer_website": "https://www.apple.com"
    }
  ]
}
```
Enrich the person record
Then use the Person Profile endpoint to enrich from:
- work email
- name + company
- role + company
NinjaPear’s own Person Profile write-up gave useful match-rate guidance:
| Input method | Profiles found | Accuracy |
|---|---|---|
| Work email | 10/10 | 100% |
| First name + last name + company | 9/10 | 90% |
| Role + company | 7/10 | 70% |
That is exactly what I would expect. Role + company is useful, but weaker than deterministic inputs.
```python
def get_person_profile(api_client, work_email=None, employer_website=None, role=None, first_name=None, last_name=None):
    api = ninjapear.EmployeeAPIApi(api_client)
    return api.get_employee_profile(
        work_email=work_email,
        employer_website=employer_website,
        role=role,
        first_name=first_name,
        last_name=last_name,
    )
```
Sample response shape from the documented Person Profile style:
```json
{
  "id": "a3xK9mP2",
  "full_name": "Patrick Collison",
  "first_name": "Patrick",
  "last_name": "Collison",
  "bio": "Co-founder and CEO of Stripe",
  "profile_pic_url": "https://pbs.twimg.com/...",
  "country": "IE",
  "city": "IELIM",
  "x_handle": "patrickc",
  "x_profile_url": "https://x.com/patrickc",
  "personal_website": "https://patrickcollison.com",
  "work_experience": [
    {
      "role": "Co-founder & CEO",
      "company_name": "Stripe",
      "company_website": "stripe.com",
      "start_date": "2010-01",
      "end_date": null
    }
  ],
  "education": [
    {
      "major": "B.S. Mathematics",
      "school": "MIT",
      "start_date": "2006-09",
      "end_date": "2009-06"
    }
  ]
}
```
Build the contact queue
Turn the best accounts into a contact queue, not an export dump.
```python
def build_contact_queue(api_client, accounts, target_role="Head of Revenue Operations"):
    people = []
    for account in accounts:
        if not account.website:
            continue
        payload = find_similar_people(api_client, employer_website=account.website, role=target_role)
        data = payload.to_dict() if hasattr(payload, "to_dict") else payload
        for item in data.get("results", []) or data.get("people", []) or []:
            people.append(
                ProspectPerson(
                    account_domain=normalize_domain(account.website),
                    full_name=item.get("full_name"),
                    role=item.get("role"),
                    work_email=item.get("work_email"),
                    x_handle=item.get("x_handle"),
                )
            )
    return people
```
And because fake personalization is a plague now:
How do you personalize at scale without it becoming fake or too "AI"
by u/LifeSuccessful9302 in coldemail
“The secret is to stop using AI for 'openers' and start using it for 'relevance logic'...”
Exactly. Use AI for routing and relevance. Not for one more creepy sentence about a podcast appearance.
Loop 4: Enrich and monitor
If your workflow has no timing layer, it is not agentic lead generation. It is a nicer spreadsheet.
This is the part I care about most, because this is where the system stops being static.
Enrich the account
Use these NinjaPear endpoints:
- Company Details for context
- Employee Count for fit and segmentation
- Company Funding for capital signal
- Company Updates for recent public changes
Here is the endpoint map I would actually hand to an engineer:
| Endpoint | Purpose | Loop |
|---|---|---|
| Customer Listing | Pull customers, investors, partners | Loop 1 |
| Competitor Listing | Expand market graph | Loop 2 |
| Similar People | Find role-alike contacts | Loop 3 |
| Person Profile | Enrich target person | Loop 3 |
| Company Details | Add firmographic and narrative context | Loop 4 |
| Employee Count | Fit and segment by size | Loop 4 |
| Company Funding | Detect capital signal | Loop 4 |
| Company Updates | Detect changes worth acting on | Loop 4 |
| Monitor API | Turn updates into an ongoing feed | Loop 4 |
Wrappers:
```python
def get_company_details(api_client, website: str):
    api = ninjapear.CompanyAPIApi(api_client)
    return api.get_company_details(website=website)


def get_employee_count(api_client, website: str):
    api = ninjapear.CompanyAPIApi(api_client)
    return api.get_employee_count(website=website)


def get_company_updates(api_client, website: str):
    api = ninjapear.CompanyAPIApi(api_client)
    return api.get_company_updates(website=website)
```
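Once those calls return, folding the payloads into one record is plain dict work. A sketch over already-fetched dicts shaped like the samples below; the output keys are my naming, not the API's:

```python
def merge_enrichment(account: dict, details: dict, employee_count: dict, updates: dict) -> dict:
    """Fold the Loop 4 payloads into a single enriched account record."""
    enriched = dict(account)  # copy so the input record is not mutated
    enriched["description"] = details.get("description")
    enriched["specialties"] = details.get("specialties") or []
    enriched["employee_count"] = employee_count.get("employee_count")
    # count of recent updates feeds straight into the scoring stub later
    enriched["recent_update_count"] = len(updates.get("updates") or [])
    return enriched
```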
Sample Company Details-style shape:
```json
{
  "name": "Stripe",
  "website": "https://stripe.com",
  "description": "Financial infrastructure platform for businesses.",
  "industry": 40205020,
  "specialties": ["payments", "billing", "fraud prevention"],
  "x_profile": "https://x.com/stripe"
}
```
Sample Employee Count-style shape:
```json
{
  "website": "https://stripe.com",
  "employee_count": 8000
}
```
Sample Company Funding-style shape:
```json
{
  "company_name": "Stripe",
  "funding_rounds": [
    {
      "round_type": "Series H",
      "announced_date": "2023-03-15",
      "money_raised": 6500000000,
      "lead_investors": ["Andreessen Horowitz"]
    }
  ]
}
```
Add company changes
This is where lead generation becomes timely instead of decorative.
NinjaPear’s Updates docs are refreshingly direct. The product tracks:
- blog posts
- X posts
- meaningful website changes
And the output can be consumed as RSS. Good. That means you can pipe it into Slack, a CRM worker, or a queue without inventing another ugly polling layer.
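Consuming that feed needs nothing exotic. A stdlib sketch for standard RSS 2.0 items; the exact fields NinjaPear emits may differ, so treat the keys here as assumptions:

```python
import xml.etree.ElementTree as ET


def parse_update_feed(rss_xml: str) -> list[dict]:
    """Turn an RSS payload into update dicts a queue worker can route."""
    root = ET.fromstring(rss_xml)
    updates = []
    for item in root.iter("item"):  # finds <item> nodes anywhere under <channel>
        updates.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published_at": item.findtext("pubDate"),
        })
    return updates
```

From there, posting each dict to Slack or writing it to a CRM table is a one-liner in whatever worker you already run.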
Sample Company Updates response shape:
```json
{
  "updates": [
    {
      "title": "New Checkout Experience for Global Payments",
      "category": "blog",
      "summary": "Stripe announced a new checkout experience focused on international merchants.",
      "link": "https://stripe.com/blog/new-checkout",
      "published_at": "2026-02-27T10:00:00Z"
    },
    {
      "title": "Pricing page updated, new Enterprise tier added",
      "category": "website update",
      "summary": "A new enterprise pricing tier was detected on the public pricing page.",
      "link": "https://stripe.com/pricing",
      "published_at": "2026-02-27T07:00:00Z"
    },
    {
      "title": "We just launched Stripe Billing v3",
      "category": "x",
      "summary": "Product launch announcement on X.",
      "link": "https://x.com/stripe/status/1839200000000000000",
      "published_at": "2026-02-26T14:30:00Z"
    }
  ]
}
```
The pricing examples in NinjaPear’s Company Monitor launch post are also useful sanity checks:
| Scenario | Targets | Frequency | Credits/month |
|---|---|---|---|
| VC tracking portfolio | 20 | Weekly | ~346 |
| Startup monitoring competitors | 10 | Daily | ~1,203 |
| Sales team watching prospect accounts | 5 | Daily | ~453 |
Those are low enough that a founder or a small GTM engineering team can test timing-based lead generation without making it a budgeting drama.
Set up monitoring
The docs describe the Monitor API as returning an RSS feed of company changes.
If the exact SDK call shape moves around later, keep the wrapper isolated so you only change one file.
```python
def create_monitor_feed(api_client, websites: list[str], frequency: str = "daily"):
    api = ninjapear.CompanyAPIApi(api_client)
    return api.create_monitor_feed({"targets": websites, "frequency": frequency})
```
Sample monitor response shape:
```json
{
  "rss_feed_url": "https://nubela.co/api/v1/monitor/feed/demo-stripe.xml",
  "targets": [
    "https://stripe.com",
    "https://shopify.com"
  ],
  "frequency": "daily"
}
```
Build a score
The scoring stub from the plan is good enough to start:
```python
def score_account(is_competitor_customer: bool, employee_count: int | None, recent_updates: int, role_match: bool) -> float:
    score = 0.0
    if is_competitor_customer:
        score += 40
    if employee_count and 50 <= employee_count <= 2000:
        score += 20
    score += min(recent_updates * 10, 20)
    if role_match:
        score += 20
    return score
```
And here is the output signal table I would use:
| Output signal | Scoring impact | Why it matters |
|---|---|---|
| Competitor customer | +40 | Strongest evidence of category demand |
| Employee count in target band | +20 | Filters out bad-fit tiny and giant accounts |
| Recent updates | up to +20 | Improves timing and relevance |
| Role match found | +20 | Confirms a likely route to action |
If you want a person score too:
```python
def score_person(person: ProspectPerson, account: ProspectAccount) -> float:
    score = 0.0
    if person.role and any(k in person.role.lower() for k in ['revenue', 'sales', 'marketing', 'growth', 'operations']):
        score += 40
    if person.work_email:
        score += 20
    if person.x_handle:
        score += 10
    score += min(account.score / 3, 30)
    return round(score, 2)
```
More outreach is not better outreach. Timed outreach is better outreach.
Most email outreach agencies are just spam factories in disguise.
by u/Difficult-Arrival665 in b2b_sales
“The real issue is agencies optimize for volume because that's what they can sell, not what's actually gonna work for your specific ICP.”
Exactly. Timing and fit beat volume theater.
Wire the PydanticAI agent
Do not turn this into a full PydanticAI tutorial. You only need enough structure for typed orchestration.
I keep orchestration in one place. Not one script for customers, another notebook for people, and a mystery automation in some no-code tool nobody wants to own.
Typed output model:
```python
from pydantic import BaseModel


class LeadGenResult(BaseModel):
    accounts: list[ProspectAccount]
    people: list[ProspectPerson]
    summary: str
```
Agent initialization:
```python
from pydantic_ai import Agent
from pydantic_ai.models.openrouter import OpenRouterModel
from pydantic_ai.providers.openrouter import OpenRouterProvider

model = OpenRouterModel(
    settings.openrouter_model,
    provider=OpenRouterProvider(api_key=settings.openrouter_api_key),
)
agent = Agent(
    model,
    output_type=LeadGenResult,
    system_prompt="Build an agentic lead generation queue using NinjaPear data and return typed results only.",
)
```
Tool-wrapped function:
```python
from pydantic_ai import RunContext


@agent.tool
async def build_queue(ctx: RunContext[AgentDeps], seeds: list[str], target_role: str = "Head of Revenue Operations") -> dict:
    with make_client(ctx.deps.settings.ninjapear_api_key) as api_client:
        result = run_pipeline(api_client, seeds=seeds, target_role=target_role)
    return result.model_dump()
```
That is enough.
What matters is not the agent ceremony. What matters is that your lead generation system returns typed structures, not a blob of optimistic prose.
Add NinjaPear tools
This is the practical bridge between docs and working code.
I put the wrappers in app/tools/ninjapear.py.
Required wrappers in the project:
- `get_customer_listing`
- `get_competitors`
- `get_company_details`
- `get_employee_count`
- `get_company_updates`
- `get_company_funding`
- `get_person_profile`
- `find_similar_people`
- `create_monitor_feed`
The rule I used in the repo is simple:
- NinjaPear code is real
- non-NinjaPear helpers are mocked on purpose
Model table:
| Model | Fields | Why it exists |
|---|---|---|
| `ProspectAccount` | source, relationship, website, description, specialties, score | Ranked account object for the queue |
| `ProspectPerson` | domain, name, role, email, X handle, score | Human routing layer after account qualification |
| `CompanyUpdate` | title, category, summary, link, date | Timing signal for recency-based prioritization |
| `LeadGenResult` | accounts, people, summary | Typed output for the agent and CLI |
And yes, I mock these helpers:
```python
def verify_email(email: str) -> dict:
    return {
        "email": email,
        "is_valid": email.endswith("@example.com") is False,
        "provider": "mock",
    }


def send_email(to: str, subject: str, body: str) -> dict:
    return {
        "to": to,
        "subject": subject,
        "status": "mocked-not-sent",
    }
```
Be honest in your build. Mock what you do not actually provide yet. Fake integration theater is how teams confuse themselves.
Build the full GitHub project
I built the starter repo for this article here:
GitHub: https://github.com/NinjaPear-Shares/agentic-lead-generation
Two implementation details are handled properly:

- the repo root itself has a `README.md`, not a nested project folder pretending to be the root
- the `README.md` links back to this article at https://nubela.co/blog/agentic-lead-generation
Repo tree:
```
agentic-lead-generation/
├── README.md
├── .env.example
├── requirements.txt
├── app/
│   ├── config.py
│   ├── models.py
│   ├── main.py
│   ├── scoring.py
│   ├── tools/
│   │   ├── ninjapear.py
│   │   └── mock_email.py
│   └── workflows/
│       └── pipeline.py
└── examples/
    └── run_demo.py
```
A few short excerpts.
config.py
```python
from pydantic import BaseModel, Field


class Settings(BaseModel):
    ninjapear_api_key: str = Field(..., alias='NINJAPEAR_API_KEY')
    openrouter_api_key: str = Field(..., alias='OPENROUTER_API_KEY')
    openrouter_model: str = Field(default='anthropic/claude-sonnet-4-5', alias='OPENROUTER_MODEL')
```
models.py
```python
from typing import Literal

from pydantic import BaseModel


class ProspectAccount(BaseModel):
    source_company: str
    relationship_type: Literal['customer', 'investor', 'partner_platform', 'competitor']
    website: str | None = None
    normalized_domain: str | None = None
    name: str
    score: float = 0.0
```
tools/ninjapear.py
```python
import ninjapear


def make_client(api_key: str):
    configuration = ninjapear.Configuration(host='https://nubela.co', access_token=api_key)
    return ninjapear.ApiClient(configuration)
```
scoring.py
```python
def score_account(is_competitor_customer: bool, employee_count: int | None, recent_updates: int, role_match: bool) -> float:
    score = 0.0
    if is_competitor_customer:
        score += 40
```
workflows/pipeline.py
```python
def run_pipeline(api_client, seeds: list[str], target_role: str = 'Head of Revenue Operations') -> LeadGenResult:
    accounts = pull_competitor_customers(api_client, seeds)
    competitors = expand_competitors(api_client, seeds, max_depth=1)
```
main.py
```python
result = agent.run_sync(
    'Build the lead generation queue for Stripe and Shopify competitors.',
    deps=AgentDeps(settings=settings),
)
```
examples/run_demo.py
```python
result = run_pipeline(
    api_client,
    seeds=['https://stripe.com', 'https://shopify.com'],
    target_role='Head of Revenue Operations',
)
```
That repo is the lead magnet. Not a spreadsheet. Not a PDF full of fluff. Actual code.
Run an end-to-end example
Now let’s run the whole lead generation flow from the perspective of a founder using Stripe and Shopify as the competitive wedge.
Input:
```python
seeds = ["https://stripe.com", "https://shopify.com"]
target_role = "Head of Revenue Operations"
```
Runnable demo shape:
```python
from app.config import Settings
from app.tools.ninjapear import make_client
from app.workflows.pipeline import run_pipeline


def main():
    settings = Settings.from_env()
    with make_client(settings.ninjapear_api_key) as api_client:
        result = run_pipeline(
            api_client,
            seeds=['https://stripe.com', 'https://shopify.com'],
            target_role='Head of Revenue Operations',
        )
    print(result.model_dump_json(indent=2))
```
What happens, in order:
- Pull customer lists for Stripe and Shopify.
- Expand competitor graph by one layer.
- Re-run customer pull on selected adjacent competitors.
- Keep top accounts.
- Find similar people at those accounts.
- Enrich with headcount and updates.
- Score and rank.
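One way to keep that ordering honest is to compose the loops as injected callables, so the wiring is testable without burning credits. This is my structuring sketch, not the repo's exact `run_pipeline`:

```python
def run_loops(seeds, target_role, *, pull_customers, expand, find_people, enrich, score):
    """Steps 1-7 above, with each loop passed in as a callable."""
    accounts = pull_customers(seeds)                            # 1. customer lists for the seeds
    accounts += pull_customers(expand(seeds))                   # 2-3. expand one layer, re-pull customers
    accounts = sorted(accounts, key=score, reverse=True)[:50]   # 4. keep top accounts
    people = find_people(accounts, target_role)                 # 5. role-alike humans
    return enrich(accounts, people)                             # 6-7. enrich, score, rank
```

In the real pipeline, those callables are the NinjaPear-backed functions from the loops above; in tests, they are lambdas over fixtures.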
Representative intermediate account output after Loop 1:
```json
[
  {
    "source_company": "https://stripe.com",
    "relationship_type": "customer",
    "name": "Apple",
    "website": "https://www.apple.com",
    "normalized_domain": "apple.com",
    "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide.",
    "score": 0
  },
  {
    "source_company": "https://shopify.com",
    "relationship_type": "customer",
    "name": "Toyota",
    "website": "https://toyota.com",
    "normalized_domain": "toyota.com",
    "description": "Global automotive manufacturer.",
    "score": 0
  }
]
```
Representative intermediate people output after Loop 3:
```json
[
  {
    "account_domain": "shopify.com",
    "full_name": "Jane Doe",
    "role": "Head of Revenue Operations",
    "work_email": "[email protected]",
    "x_handle": "janedoe",
    "score": 0
  }
]
```
Representative final JSON shape:
```json
{
  "accounts": [
    {
      "website": "https://example.com",
      "source_company": "https://stripe.com",
      "relationship_type": "customer",
      "score": 78
    }
  ],
  "people": [
    {
      "full_name": "Jane Doe",
      "role": "Head of Revenue Operations",
      "account_domain": "example.com"
    }
  ],
  "summary": "Found 14 high-fit accounts and 9 likely buyer profiles from 2 seed competitors."
}
```
If you want to generate an outreach task after that, fine. Just keep the email send mocked until the queue quality is real.
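If you do take that step, a score-gated task builder like this keeps the send mocked while the queue matures (the threshold and field names are illustrative):

```python
def build_outreach_tasks(people: list[dict], min_score: float = 60.0) -> list[dict]:
    """Turn the final queue into send-ready tasks, gated on score."""
    tasks = []
    for person in people:
        # no verified email or weak fit: skip rather than burn sender reputation
        if person.get("score", 0.0) < min_score or not person.get("work_email"):
            continue
        tasks.append({
            "to": person["work_email"],
            "subject": f"Quick question for {person.get('role', 'your team')}",
            "status": "queued",  # hand this to the mocked send_email, not a real sender
        })
    return tasks
```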
Because this is also true:
Why did AI SDRs fail?

Because their data was wrong

they were optimizing for leads

not for revenue generated

...

Data Pipeline + Data Warehouse + Agent Infrastructure

— Cody Schneider (@codyschneider), April 15, 2026
I agree with the important part. The bottleneck is usually not the model. It is the data layer and the workflow.
Use the NinjaPear Skill
If you already use an AI coding agent, install the NinjaPear Skill first.
Seriously. It will save you dumb mistakes on auth, pagination, endpoint selection, and error handling.
NinjaPear’s docs say the Skill teaches coding agents:
- auth setup
- endpoint selection with cost awareness
- Python and JavaScript SDK usage
- pagination handling
- rate limit handling
- error handling for `401`, `403`, `404`, `429`, `500`, `503`
- timeout configuration
Install commands:
```shell
npx skills add NinjaPear/ninjapear-skill -a claude-code
npx skills add NinjaPear/ninjapear-skill -a codex
npx skills add NinjaPear/ninjapear-skill -a opencode
```
That belongs in this guide because the entire point is shipping agentic lead generation that actually works.
Not prompt theater. Not “look what my SDR agent wrote.” Real integration code.
If you want the blunt version, here it is.
Most lead generation content is still explaining funnels to people who need systems.
What actually matters is whether your stack can answer four things, in order:
- Who is already buying in the category?
- Which adjacent accounts should I care about next?
- Which humans are worth routing to?
- Why is now the right moment?
If your system cannot answer all four, your “agentic lead generation” stack is mostly branding.
So here is the honest next step: clone the repo, plug in your API keys, and run it against two competitors you already know. Start with Stripe and Shopify if you want the exact wedge from this article. You will learn more from one afternoon of real implementation than from ten generic lead generation posts pretending CSVs are strategy.
And if you are building with Claude Code, Codex, or Opencode, install the NinjaPear Skill before you touch the code. It will save you a bunch of stupid mistakes.