Discovery coverage

An honest measure of where our agents come from and how much each source contributes. More sources mean broader coverage; overlap between sources means higher confidence that an agent really exists.

A2A agents 2627
Alive 1239
MCP servers 54277
Sources active 19

Contribution per source

Source Agents
recrawl_warm 850
agentic_market 798
mcp_registry 719
recrawl_hot 659
lists 444
github_code 425
recrawl_cold 260
manifests 174
registry 151
github_topics 131
smithery 65
a2aregistry 50
crtsh 38
seeds 8
cdp_discovery 4
submitted 3
erc8004 2
ct 2
huggingface_spaces 1

Source overlap

How many sources independently discovered the same agent. The more sources that found an agent, the higher our cross-source confidence that it exists.

Sources per agent Agents
1 source 1072
2 sources 1038
3 sources 458
4 sources 48
5 sources 8
6 sources 5
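
The table can be cross-checked with a little arithmetic. The sketch below is illustrative and is not the crawler's code: `overlap_histogram` and its `agent_sources` input (an agent id mapped to the names of the sources that found it) are hypothetical. Applied to the published rows, it shows that the rows sum to 2629 agents and 4784 agent-source attributions, the same total as the per-source contribution counts above.

```python
from collections import Counter

def overlap_histogram(agent_sources):
    """Map 'number of distinct sources' -> 'number of agents'.

    agent_sources is a hypothetical dict: agent id -> iterable of source names.
    """
    return Counter(len(set(sources)) for sources in agent_sources.values())

# Rows copied from the "Sources per agent" table.
published = {1: 1072, 2: 1038, 3: 458, 4: 48, 5: 8, 6: 5}

total_agents = sum(published.values())                        # 2629
total_attributions = sum(k * n for k, n in published.items())  # 4784
multi_source = total_agents - published[1]                    # 1557

print(f"agents in table:        {total_agents}")
print(f"agent-source pairs:     {total_attributions}")
print(f"found by 2+ sources:    {multi_source / total_agents:.1%}")       # 59.2%
print(f"mean sources per agent: {total_attributions / total_agents:.2f}")  # 1.82
```

Roughly three in five agents in the table were found by at least two sources, which is the group the overlap signal applies to.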

What we crawl

  • registry — a2aregistry.org JSON API
  • lists — 7 curated GitHub awesome-lists for A2A + MCP
  • mcp_registry — official MCP registry (deep paginated)
  • smithery — Smithery MCP server registry (deep paginated)
  • glama — Glama MCP server + connector sitemaps, stored as metadata-only unless a remote endpoint can be probed
  • direct_mcp — MCP-looking URLs found in registries, manifests, and lists, probed directly via Streamable HTTP/SSE
  • manifests — machine-readable manifests on known hosts: agent-directory.json, agents.json, mcp.json, llms.txt, agents.txt
  • github_topics — GitHub repos tagged a2a-agent / a2a-protocol / mcp-server (~15K repos)
  • agntcy — AGNTCY's public OASF directory (when reachable)
  • github_code — GitHub code search for agent-card.json (requires token)
  • ct — CertSpotter Certificate Transparency for ~24 PaaS providers (Render, Fly, Railway, Vercel, Netlify, Deno, Replit, Modal, Heroku, Cloudflare Workers, …) matching agent/a2a/mcp keywords
  • crtsh — crt.sh CT corpus full-text search for a2a, agent-card, mcp-server, agentapi
  • seeds — manual seed list
  • submitted — direct opt-in via /submit form
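
As a rough illustration of the Certificate Transparency sources, the sketch below turns hostnames seen in CT entries into probe candidates. It is a minimal sketch under stated assumptions, not our crawler: `candidate_card_urls` is a hypothetical name, the suffix tuple is only an illustrative subset of the ~24 providers, and matching keywords against the part of the hostname in front of the PaaS suffix is an assumed rule. The `/.well-known/agent-card.json` path is the one probed elsewhere on this page.

```python
# Illustrative subset only; the real provider list is longer and is not shown here.
PAAS_SUFFIXES = (
    ".onrender.com", ".fly.dev", ".vercel.app", ".netlify.app",
    ".deno.dev", ".herokuapp.com", ".workers.dev",
)
KEYWORDS = ("agent", "a2a", "mcp")

def candidate_card_urls(hostnames):
    """Return agent-card URLs for PaaS hostnames whose name contains a keyword."""
    seen = set()
    urls = []
    for raw in hostnames:
        host = raw.strip().lower()
        if host.startswith("*."):      # CT entries often list wildcard names
            host = host[2:]
        if host in seen:
            continue
        seen.add(host)
        suffix = next((s for s in PAAS_SUFFIXES if host.endswith(s)), None)
        if suffix is None:
            continue                   # not hosted on a listed PaaS provider
        name = host[: -len(suffix)]    # part in front of the provider suffix
        if any(keyword in name for keyword in KEYWORDS):
            urls.append(f"https://{host}/.well-known/agent-card.json")
    return urls

print(candidate_card_urls([
    "*.mcp-demo.fly.dev",
    "blog.vercel.app",
    "My-Agent.onrender.com",
    "my-agent.onrender.com",
    "a2a.example.com",
]))
```

Each returned URL is only a candidate; it still has to be probed before anything is counted as an agent.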

Big-tech A2A endpoints — what we tried

Direct probes of canonical well-known paths on the major vendor surfaces (Google, OpenAI, Anthropic, Microsoft, AWS, Salesforce, SAP, ServiceNow). Most return 401/403/404 because real tenant agents live behind project IDs and OAuth. This table is the audit trail. Probes are re-run weekly; the Status column shows the result from the most recent crawler cycle.

URL Status
https://agentforce.salesforce.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://agents.microsoft.com/.well-known/agent-card.json unreachable: ConnectError
https://agentspace.google.com/.well-known/agent-card.json no_endpoint
https://anthropic.com/.well-known/agent-card.json no_endpoint
https://api.anthropic.com/.well-known/agent-card.json no_endpoint
https://api.openai.com/.well-known/agent-card.json no_endpoint
https://assistant.google.com/.well-known/agent-card.json no_endpoint
https://bedrock-agents.aws.amazon.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://chatgpt.com/.well-known/agent-card.json no_endpoint
https://claude.ai/.well-known/agent-card.json not json
https://copilot.microsoft.com/.well-known/agent-card.json not json
https://copilotstudio.microsoft.com/.well-known/agent-card.json not json
https://dialogflow.cloud.google.com/.well-known/agent-card.json not json
https://einstein.salesforce.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://joule.sap.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://now-assist.service-now.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://openai.com/.well-known/agent-card.json no_endpoint
https://q.amazonaws.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
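
To summarize the audit trail, the status strings can be grouped by the label in front of the first colon. The sketch below is a hypothetical helper (`status_category` and `tally` are illustrative names, not part of the crawler) run on a five-row sample copied from the table. Over the full table the same grouping gives 7 `no_endpoint`, 6 `unsafe` (all DNS failures), 4 `not json`, and 1 `unreachable`.

```python
from collections import Counter

# Five rows copied from the table above, as "URL status" lines.
SAMPLE_ROWS = """\
https://agentforce.salesforce.com/.well-known/agent-card.json unsafe: dns: [Errno -2] Name or service not known
https://agents.microsoft.com/.well-known/agent-card.json unreachable: ConnectError
https://agentspace.google.com/.well-known/agent-card.json no_endpoint
https://claude.ai/.well-known/agent-card.json not json
https://openai.com/.well-known/agent-card.json no_endpoint
"""

def status_category(status):
    """Return the part of a status string before the first colon."""
    return status.split(":", 1)[0].strip()

def tally(rows):
    """Count rows per status category; each row is 'URL<space>status'."""
    counts = Counter()
    for line in rows.splitlines():
        line = line.strip()
        if not line:
            continue
        _url, status = line.split(" ", 1)  # URLs contain no spaces
        counts[status_category(status)] += 1
    return counts

for category, count in tally(SAMPLE_ROWS).most_common():
    print(f"{category}: {count}")
```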

What we don't crawl (and why)

  • NPM/PyPI package registries — packages are code, not deployed agents. We catch many through GitHub, official registries, Smithery, Glama, and manifest links.
  • Random .well-known crawl of all domains — would require Common Crawl scale, and the signal-to-noise ratio among candidates is currently too low (mostly 404s).
  • Auth-gated agents — agents behind OAuth or API keys are visible in directories but cannot be probed anonymously.
  • Truly private / unpublished agents — fundamentally unindexable. By design.