GoldenMatch
io.github.benzsevern/goldenmatch
Find duplicate records in 30 seconds. Zero-config entity resolution, 97.2% F1 out of the box.
Tools · 42
Profile data, detect domain, recommend ER strategy
Run AutoConfigController on a CSV; return the committed GoldenMatchConfig (incl. negative_evidence / Path Y when chosen) plus telemetry — stop_reason, health, decision trace, indicator column priors. …
Return the AutoConfigController telemetry from the most recent `auto_configure` or `agent_deduplicate` call in this MCP session. Same JSON shape as the web /api/v1/controller/telemetry endpoint.
Run full ER pipeline with confidence gating and reasoning
Match two files with intelligent strategy selection
Natural language explanation for a record pair
Explain why records are in the same cluster
Get borderline pairs awaiting approval
Approve or reject a review queue pair
Compare ER strategies on your data
Check if data needs privacy-preserving matching
Run GoldenCheck data quality scan on a CSV file. Returns issues found (encoding errors, Unicode problems, format violations) without applying fixes. Requires goldencheck: pip install goldenmatch[quali…
Run GoldenCheck scan and apply fixes to a CSV file. Returns the fixed data summary and a manifest of all fixes applied. Requires goldencheck: pip install goldenmatch[quality]
Run GoldenFlow data transforms on a CSV file. Normalizes phone numbers (E.164), dates (ISO), categorical spelling, and Unicode issues. Returns a manifest of transforms applied. Requires goldenflow: pi…
List stored Learning Memory corrections, optionally filtered by dataset. Returns id_a, id_b, decision, source, trust, reason, matchkey_name, dataset, original_score, created_at.
Add a pair correction to Learning Memory. Source is set to 'agent' with trust=0.5, lower than human steward decisions, which have trust=1.0. The pair (id_a, id_b) is canonicalized to (min, max) before storage.
Force a MemoryLearner pass over accumulated corrections. Returns the list of LearnedAdjustments produced (matchkey_name, threshold, sample_size, learned_at). Requires >= 10 corrections per matchkey be…
Return Learning Memory status: total correction count, last learn time, and current learned adjustments. Cheap; safe for status checks.
Return all corrections as a list of dicts (CSV-shaped). Caller is responsible for writing the file. Optionally filter by dataset.
Resolve a record_id to its durable identity. Returns the full identity view (members, evidence edges, recent events) or null when no identity exists for that record.
List identities, optionally filtered by dataset/status.
Return the temporal event log for an identity.
List evidence edges marked `conflicts_with`.
Manually merge two identities. All records from `absorb_entity_id` are reassigned to `keep_entity_id`.
Split a subset of records off an identity into a brand-new identity. The original keeps the remaining records.
Get dataset statistics: record count, cluster count, match rate, cluster sizes.
Find duplicate matches for a record. Provide field values to search against the loaded dataset.
Explain why two records match or don't match. Shows a per-field score breakdown.
List duplicate clusters found in the dataset. Returns cluster IDs, sizes, and member counts.
Get details of a specific cluster: all member records and their field values.
Get the merged golden (canonical) record for a cluster.
Match a single record against the loaded dataset in real-time. Paste a record's fields and instantly see if it matches any existing record. Uses the configured matchkeys, scorers, and thresholds. Exam…
Remove a record from its cluster. The record becomes a singleton. Remaining cluster members are re-clustered using stored pair scores. Use this to fix bad merges.
Break an entire cluster into individual records. All members become singletons. Use when a cluster is completely wrong.
Analyze bad merges and suggest config changes. Provide examples of incorrect merges (pairs that should NOT have matched) and GoldenMatch will identify which fields/thresholds to tighten. Example: [{"r…
Get data quality profile: column types, null rates, unique counts, sample values.
Export matching results to a file (CSV or JSON).
List available domain extraction rulebooks (built-in + user-defined).
Create a custom domain extraction rulebook. Define patterns for a specific data domain (medical devices, automotive parts, real estate, etc.).
Test a domain extraction rulebook against sample records. Shows what features would be extracted from the loaded data.
Analyze the loaded dataset and recommend optimal PPRL (privacy-preserving record linkage) configuration. Returns recommended fields, bloom filter parameters, threshold, and explanation.
Run privacy-preserving record linkage between two parties' data. Computes bloom filters, matches records without sharing raw data. Specify fields, threshold, and security level.
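The Learning Memory tool descriptions state three concrete rules: a pair is canonicalized to (min, max) before storage, agent corrections carry trust=0.5 while human steward decisions carry 1.0, and a MemoryLearner pass needs at least 10 corrections per matchkey. The sketch below illustrates only those rules in plain Python, using an in-memory list in place of GoldenMatch's real store. `Correction`, `add_correction`, `learnable_matchkeys`, the "steward" source label and the decision strings are all hypothetical, not the library's API.

```python
from collections import Counter
from dataclasses import dataclass

# Trust levels as stated in the tool descriptions; "steward" is a hypothetical label.
TRUST_BY_SOURCE = {"agent": 0.5, "steward": 1.0}
MIN_CORRECTIONS_PER_MATCHKEY = 10  # threshold quoted for the MemoryLearner pass


@dataclass(frozen=True)
class Correction:
    # Hypothetical record shape, loosely mirroring the fields the list tool returns.
    id_a: str
    id_b: str
    decision: str  # assumed labels such as "match" / "no_match"
    source: str
    trust: float
    matchkey_name: str


def add_correction(store, id_a, id_b, decision, matchkey_name, source="agent"):
    # Canonicalize to (min, max) so (a, b) and (b, a) are stored as the same pair.
    lo, hi = min(id_a, id_b), max(id_a, id_b)
    corr = Correction(lo, hi, decision, source, TRUST_BY_SOURCE[source], matchkey_name)
    store.append(corr)
    return corr


def learnable_matchkeys(store):
    # Only matchkeys with enough accumulated corrections are eligible for learning.
    counts = Counter(c.matchkey_name for c in store)
    return sorted(k for k, n in counts.items() if n >= MIN_CORRECTIONS_PER_MATCHKEY)
```

How the learner turns eligible corrections into a new threshold is not described here, so the sketch stops at the eligibility gate.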
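The identity merge and split tools describe two reassignment rules: a merge moves every record of `absorb_entity_id` to `keep_entity_id`, and a split moves a subset of records into a brand-new identity while the original keeps the rest. The sketch below models identities as a plain `record_id -> entity_id` dict, which is an assumption. GoldenMatch's durable identity store also tracks evidence edges and events, and those are not modeled here. The function names and the `ent-…` id format are hypothetical.

```python
import uuid


def merge_identities(assignments, keep_entity_id, absorb_entity_id):
    """Reassign every record of absorb_entity_id to keep_entity_id; return the moved ids."""
    moved = [r for r, e in assignments.items() if e == absorb_entity_id]
    for r in moved:
        assignments[r] = keep_entity_id
    return moved


def split_identity(assignments, entity_id, record_ids):
    """Move a subset of an identity's records into a new identity; return the new id."""
    members = {r for r, e in assignments.items() if e == entity_id}
    subset = set(record_ids)
    if not subset or not subset <= members:
        raise ValueError("record_ids must be a non-empty subset of the identity's members")
    # Assumption: the original must keep at least one record ("keeps the remaining records").
    if subset == members:
        raise ValueError("a split must leave at least one record on the original identity")
    new_id = f"ent-{uuid.uuid4().hex[:8]}"  # hypothetical id format
    for r in subset:
        assignments[r] = new_id
    return new_id
```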
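The record-removal tool says that after a record is pulled out, the remaining members "are re-clustered using stored pair scores". The clustering method is not stated. The sketch below assumes the simplest reading: connected components over stored pairs scoring at or above a threshold, computed with union-find. The break-cluster tool then reduces to turning every member into a singleton. `unmerge_record`, `break_cluster`, the `frozenset` pair keys and the threshold parameter are all hypothetical.

```python
def unmerge_record(members, record_id, pair_scores, threshold):
    """Remove record_id from a cluster and re-cluster the remaining members.

    Assumes re-clustering = connected components over stored pair scores >= threshold.
    pair_scores maps frozenset({a, b}) (two distinct ids) -> score.
    Returns a list of clusters (sets), with the removed record as a singleton first.
    """
    remaining = [m for m in members if m != record_id]
    parent = {m: m for m in remaining}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, score in pair_scores.items():
        a, b = tuple(pair)
        # Ignore pairs that involve the removed record or fall below the threshold.
        if score >= threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for m in remaining:
        groups.setdefault(find(m), set()).add(m)
    return [{record_id}] + list(groups.values())


def break_cluster(members):
    """Split an entire cluster so every member becomes a singleton."""
    return [{m} for m in members]
```

In this model, removing a "bridge" record can split the rest into several clusters. That is why re-clustering from stored scores, rather than simply dropping one member, matters when fixing a bad merge.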
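The PPRL tools mention Bloom filter parameters and matching "without sharing raw data". The sketch below shows the widely used general technique: encode each value's character bigrams into a Bloom filter using hash functions keyed by a secret both parties share, exchange only the filters, and compare them with the Dice coefficient. It is not GoldenMatch's implementation. The filter size `m`, hash count `k`, HMAC-SHA256 hashing and bigram tokenization are illustrative assumptions, and the tool's "security level" option is not modeled.

```python
import hashlib
import hmac


def bigrams(value):
    # Normalize and pad with spaces so first/last characters also form bigrams.
    s = f" {value.strip().lower()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(value, secret, m=256, k=5):
    """Encode a string as the set of bit positions set in an m-bit Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for i in range(k):
            # k keyed hashes per bigram; without the secret the positions cannot be recomputed.
            digest = hmac.new(secret, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % m)
    return bits


def dice(bits_a, bits_b):
    # Dice coefficient on set bits: 2|A & B| / (|A| + |B|).
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))
```

Each party runs `bloom_encode` locally and sends only the bit sets. A score above an agreed threshold counts as a match. Larger `m` reduces accidental bit collisions at the cost of bigger filters.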
How to use
Add to your Claude Desktop / Cursor / Cline MCP config:
{
  "mcpServers": {
    "goldenmatch": {
      "url": "https://goldenmatch-mcp-production.up.railway.app/mcp/",
      "transport": "streamable-http"
    }
  }
}
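If your MCP config already lists other servers, the goldenmatch entry has to be merged into the existing `mcpServers` object rather than pasted over the whole file. The sketch below does that on the config's JSON text. The config file's location differs between Claude Desktop, Cursor and Cline, so reading and writing the file is left out. `add_goldenmatch_server` is a hypothetical helper.

```python
import json

GOLDENMATCH_ENTRY = {
    "url": "https://goldenmatch-mcp-production.up.railway.app/mcp/",
    "transport": "streamable-http",
}


def add_goldenmatch_server(config_text):
    """Return config JSON with the goldenmatch server added, keeping any existing servers."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["goldenmatch"] = dict(GOLDENMATCH_ENTRY)  # overwrite only this one entry
    return json.dumps(config, indent=2)
```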