Discovery coverage
Honest measure of where our agents come from and how much each source contributes. More sources = more coverage. Overlap between sources = higher confidence.
| Metric | Count |
|---|---|
| A2A agents | 2627 |
| Alive | 1239 |
| MCP servers | 54277 |
| Sources active | 19 |
Contribution per source
| Source | Agents |
|---|---|
| recrawl_warm | 850 |
| agentic_market | 798 |
| mcp_registry | 719 |
| recrawl_hot | 659 |
| lists | 444 |
| github_code | 425 |
| recrawl_cold | 260 |
| manifests | 174 |
| registry | 151 |
| github_topics | 131 |
| smithery | 65 |
| a2aregistry | 50 |
| crtsh | 38 |
| seeds | 8 |
| cdp_discovery | 4 |
| submitted | 3 |
| erc8004 | 2 |
| ct | 2 |
| huggingface_spaces | 1 |
Source overlap
How many sources discovered the same agent. More overlap = higher cross-source confidence in that agent's existence.
| Sources per agent | Agents |
|---|---|
| 1 source | 1072 |
| 2 sources | 1038 |
| 3 sources | 458 |
| 4 sources | 48 |
| 5 sources | 8 |
| 6 sources | 5 |
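The contribution and overlap figures can be cross-checked: an agent found by k sources is counted k times across the per-source totals, so those totals should add up to the overlap histogram weighted by source count. A minimal Python sketch using only the numbers published on this page (the variable names are illustrative, not the crawler's own):

```python
# Per-source agent counts, copied from "Contribution per source".
contribution = {
    "recrawl_warm": 850, "agentic_market": 798, "mcp_registry": 719,
    "recrawl_hot": 659, "lists": 444, "github_code": 425,
    "recrawl_cold": 260, "manifests": 174, "registry": 151,
    "github_topics": 131, "smithery": 65, "a2aregistry": 50,
    "crtsh": 38, "seeds": 8, "cdp_discovery": 4, "submitted": 3,
    "erc8004": 2, "ct": 2, "huggingface_spaces": 1,
}

# Overlap histogram: number of sources per agent -> number of agents.
overlap = {1: 1072, 2: 1038, 3: 458, 4: 48, 5: 8, 6: 5}

# Every (agent, source) pair counted once from each side.
total_sightings = sum(contribution.values())
weighted_overlap = sum(k * n for k, n in overlap.items())

# Distinct agents covered by the histogram, and two derived ratios.
distinct_agents = sum(overlap.values())
mean_sources = total_sightings / distinct_agents
multi_source_share = 1 - overlap[1] / distinct_agents

print(total_sightings, weighted_overlap, distinct_agents)
print(f"{mean_sources:.2f} sources per agent, "
      f"{multi_source_share:.1%} seen by two or more sources")
```

Both sides come to 4784 sightings, so the two tables agree with each other. The histogram sums to 2629 agents, two more than the 2627 headline figure; the page does not say why, so the sketch only surfaces the difference.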
What we crawl
- registry — a2aregistry.org JSON API
- lists — 7 curated GitHub awesome-lists for A2A + MCP
- mcp_registry — official MCP registry (deep paginated)
- smithery — Smithery MCP server registry (deep paginated)
- glama — Glama MCP server + connector sitemaps, stored as metadata-only unless a remote endpoint can be probed
- direct_mcp — MCP-looking URLs found in registries, manifests, and lists, probed directly via Streamable HTTP/SSE
- manifests — machine-readable manifests on known hosts: agent-directory.json, agents.json, mcp.json, llms.txt, agents.txt
- github_topics — GitHub repos tagged a2a-agent / a2a-protocol / mcp-server (~15K repos)
- agntcy — AGNTCY's public OASF directory (when reachable)
- github_code — GitHub code search for agent-card.json (requires token)
- ct — CertSpotter Certificate Transparency for ~24 PaaS providers (Render, Fly, Railway, Vercel, Netlify, Deno, Replit, Modal, Heroku, Cloudflare Workers, …) matching agent/a2a/mcp keywords
- crtsh — crt.sh CT corpus full-text search for a2a, agent-card, mcp-server, agentapi
- seeds — manual seed list
- submitted — direct opt-in via /submit form
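The ct source in the list above narrows Certificate Transparency hostnames to PaaS domains that match agent/a2a/mcp keywords. One way such a filter might look is sketched below in Python. The suffix list is a hypothetical subset of the roughly 24 providers, and PAAS_SUFFIXES, KEYWORDS and is_candidate are names invented for illustration, not the crawler's actual configuration.

```python
# Hypothetical subset of PaaS hostname suffixes (assumption, not the real list).
PAAS_SUFFIXES = (
    ".onrender.com", ".fly.dev", ".up.railway.app", ".vercel.app",
    ".netlify.app", ".deno.dev", ".replit.app", ".modal.run",
    ".herokuapp.com", ".workers.dev",
)

# Keywords named in the source description.
KEYWORDS = ("agent", "a2a", "mcp")


def is_candidate(hostname: str) -> bool:
    """Return True if a CT hostname sits on a known PaaS and mentions a keyword."""
    host = hostname.lower().rstrip(".")
    # CT entries often carry wildcard names such as "*.example.fly.dev".
    if host.startswith("*."):
        host = host[2:]
    suffix = next((s for s in PAAS_SUFFIXES if host.endswith(s)), None)
    if suffix is None:
        return False
    # Only the tenant-chosen part of the name is checked for keywords.
    label = host[: -len(suffix)]
    # Plain substring match: cheap, but also hits names like "magenta".
    return any(keyword in label for keyword in KEYWORDS)


if __name__ == "__main__":
    for name in ("my-mcp-server.fly.dev", "blog.vercel.app", "*.a2a-demo.onrender.com"):
        print(name, is_candidate(name))
```

Substring matching keeps the filter simple at the cost of false positives, which is tolerable if every candidate is probed afterwards anyway.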
Big-tech A2A endpoints — what we tried
Direct probes of canonical well-known paths on the major vendor surfaces (Google, OpenAI, Anthropic, Microsoft, AWS, Salesforce, SAP, ServiceNow). Most return 401/403/404 because real tenant agents live behind project IDs and OAuth — this table is the audit trail. Re-run weekly; last status from the most recent crawler cycle.
| URL | Status |
|---|---|
| https://agentforce.salesforce.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
| https://agents.microsoft.com/.well-known/agent-card.json | unreachable: ConnectError |
| https://agentspace.google.com/.well-known/agent-card.json | no_endpoint |
| https://anthropic.com/.well-known/agent-card.json | no_endpoint |
| https://api.anthropic.com/.well-known/agent-card.json | no_endpoint |
| https://api.openai.com/.well-known/agent-card.json | no_endpoint |
| https://assistant.google.com/.well-known/agent-card.json | no_endpoint |
| https://bedrock-agents.aws.amazon.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
| https://chatgpt.com/.well-known/agent-card.json | no_endpoint |
| https://claude.ai/.well-known/agent-card.json | not json |
| https://copilot.microsoft.com/.well-known/agent-card.json | not json |
| https://copilotstudio.microsoft.com/.well-known/agent-card.json | not json |
| https://dialogflow.cloud.google.com/.well-known/agent-card.json | not json |
| https://einstein.salesforce.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
| https://joule.sap.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
| https://now-assist.service-now.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
| https://openai.com/.well-known/agent-card.json | no_endpoint |
| https://q.amazonaws.com/.well-known/agent-card.json | unsafe: dns: [Errno -2] Name or service not known |
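Several rows in the table fail with an "unsafe: dns: ..." status, which suggests a hostname check runs before any HTTP request is sent. The Python sketch below shows one way such a pre-flight check could work. The check_host function, its resolver parameter and the "unsafe: private address" status are hypothetical, and the crawler's real logic may differ.

```python
import ipaddress
import socket
from typing import Callable, Optional


def check_host(host: str, resolver: Callable = socket.getaddrinfo) -> Optional[str]:
    """Return None if host resolves only to public addresses, else a status string.

    Hypothetical helper: a probe would call this before fetching
    https://<host>/.well-known/agent-card.json.
    """
    try:
        infos = resolver(host, 443)
    except socket.gaierror as exc:
        # str(exc) looks like "[Errno -2] Name or service not known".
        return f"unsafe: dns: {exc}"
    for info in infos:
        # info[4] is the sockaddr tuple; its first item is the IP string.
        # Strip any IPv6 zone suffix ("%eth0") before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            # Assumed extra guard against probing internal networks.
            return f"unsafe: private address {addr}"
    return None
```

Passing the resolver as a parameter is a design choice for the sketch: it lets the check be exercised offline with a stub instead of real DNS.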
What we don't crawl (and why)
- NPM/PyPI package registries — packages are code, not deployed agents. We catch many through GitHub, official registries, Smithery, Glama, and manifest links.
- Random .well-known crawl of all domains — would require Common Crawl scale, and the signal-to-noise ratio among candidates is too low (mostly 404s).
- Auth-gated agents — agents behind OAuth/API-keys are visible in directories but un-probeable anonymously.
- Truly private / unpublished agents — fundamentally unindexable. By design.