Schema Design for Sanctions Screening: Modeling the OFAC SDN List as a Knowledge Graph for Real-Time DD Lookups

The U.S. Treasury’s Office of Foreign Assets Control publishes the Specially Designated Nationals and Blocked Persons (SDN) list as XML and CSV files containing tens of thousands of entries: individuals, entities, vessels, and aircraft subject to sanctions. The naïve screening implementation joins a counterparty file against the SDN list on exact name match, finds nothing, and moves on. The problem is that real sanctioned persons appear in counterparty data with name variants, transliterations, alias-only references, address-only matches, and date-of-birth variants, and an exact-name join misses all of them. The correct screening framework treats the SDN list as a knowledge graph and the counterparty file as a structured query against that graph, returning ranked match candidates with explainable contributions from each match dimension.

This article walks through the schema. SDN entries are modeled as SanctionedParty nodes with Alias, Address, DateOfBirth, and IdentificationDocument sub-nodes connected via typed relationships. Neo4j full-text indexes, populated with transliteration-normalized name forms, provide fuzzy lookup against names and aliases. Composite scoring weights each match dimension using configurable, institution-specific calibration. The deliverable is a screening engine with a disciplined audit trail. Every returned candidate carries the reasoning that produced it, which is the documentation discipline examiners require under the FFIEC BSA/AML Examination Manual.

The article’s framing is technical methodology, not regulatory advice. Operational OFAC-compliance questions belong with the institution’s compliance counsel and OFAC’s licensing-application processes.

The sanctions-screening problem at scale

A mid-size financial institution onboards tens of thousands of new counterparties per year. It also re-screens an existing book of hundreds of thousands of relationships against weekly SDN updates. That volume rules out analyst-by-analyst screening. The institution needs an engine that runs every counterparty against the SDN list in under 100 ms per query. The engine should surface ranked candidates for analyst review only when composite confidence crosses a threshold. It has to handle:

Name variants: Mohammad Al-Bahri vs. Mohammed Al Bahri vs. Mohamed Albahri. Latin-script transliteration alone produces dozens of orthographic variants for non-Latin source names.
Alias references: sanctioned individuals frequently operate under multiple aliases, each of which OFAC documents. A counterparty file may carry an alias rather than the canonical SDN name.
Partial information: onboarding KYC may capture name and country but not date of birth. The screening must still produce a defensible match score from the partial information.
Address-only and DOB-only matches: when name confidence is low but the address or DOB matches a known sanctioned address or DOB cohort, the case warrants analyst review.
Sub-list distinctions: the SDN list is the headline sanctions list, but OFAC also maintains other lists. These include the Non-SDN Palestinian Legislative Council (NS-PLC) List, the Foreign Sanctions Evaders (FSE) List, and the Sectoral Sanctions Identifications (SSI) List. Screening obligations differ across these.

The regulatory expectation is documented in the OFAC section of the FFIEC BSA/AML Examination Manual. It is also documented in OFAC’s 2019 publication A Framework for OFAC Compliance Commitments. Examiners ask not only whether the institution screens but whether the screening framework is risk-calibrated, documented, and reviewable.

The SDN list as source data

OFAC publishes the SDN list in several machine-readable formats. These include:

the legacy SDN.XML
the more recent OFAC Specially Designated Nationals List Data Specification (the “advanced” or “consolidated” format published under the Open Financial Sanctions Data initiative)
CSV exports for entry-level integration

The Open Financial Sanctions Data format is the recommended ingestion target for new builds. It already expresses entries as a normalized graph, with features (names, addresses, identifications) attached to canonical party records.

A representative entry carries:

uid: OFAC’s stable identifier for the listing
firstName / lastName / title (for individuals) or entityName (for entities)
sdnType: Individual, Entity, Vessel, or Aircraft
An array of alias features, each with a type (a.k.a., f.k.a., n.k.a.) and a category (strong or weak alias)
An array of address features with street, city, stateOrProvince, postalCode, and country
An array of identification features (passport, national ID, registration number, etc.)
Date-of-birth features carrying a single DOB value or a DOB range
Citizenship and place-of-birth features
Program designations (e.g., SDGT, IRAN, CYBER2, NS-PLC) indicating which sanctions authority applies
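The following is a minimal parsing sketch that makes the field list concrete, using only the Python standard library. The XML fragment is synthetic and the party is fictional. Its element names (sdnEntry, akaList, addressList, dateOfBirthList, programList) are an assumption, loosely following the legacy SDN.XML layout. The real file declares an XML namespace, formats dates differently, and carries many more fields, so the paths would need checking against OFAC's published schema before use.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Synthetic, namespace-free fragment for a fictional party. Element names loosely
# follow legacy SDN.XML and are an assumption, not the authoritative schema.
SAMPLE = """
<sdnList>
  <sdnEntry>
    <uid>90001</uid>
    <firstName>Mohammad</firstName>
    <lastName>Al-Bahri</lastName>
    <sdnType>Individual</sdnType>
    <programList><program>SDGT</program></programList>
    <akaList>
      <aka><uid>90002</uid><type>a.k.a.</type><category>strong</category>
        <firstName>Mohamed</firstName><lastName>Albahri</lastName></aka>
    </akaList>
    <addressList>
      <address><uid>90003</uid><city>Exampleville</city><country>Examplestan</country></address>
    </addressList>
    <dateOfBirthList>
      <dateOfBirthItem><uid>90004</uid><dateOfBirth>1970-03-15</dateOfBirth></dateOfBirthItem>
    </dateOfBirthList>
  </sdnEntry>
</sdnList>
"""

@dataclass
class Alias:
    uid: str
    alias_type: str   # a.k.a., f.k.a., n.k.a.
    category: str     # strong or weak
    full_name: str

@dataclass
class SdnEntry:
    uid: str
    sdn_type: str
    full_name: str
    programs: list[str] = field(default_factory=list)
    aliases: list[Alias] = field(default_factory=list)
    countries: list[str] = field(default_factory=list)
    dobs: list[str] = field(default_factory=list)

def _name(el: ET.Element) -> str:
    """Join the direct-child firstName and lastName when present."""
    parts = (el.findtext("firstName"), el.findtext("lastName"))
    return " ".join(p.strip() for p in parts if p)

def parse_entries(xml_text: str) -> list[SdnEntry]:
    root = ET.fromstring(xml_text.strip())
    return [
        SdnEntry(
            uid=e.findtext("uid"),
            sdn_type=e.findtext("sdnType"),
            full_name=_name(e),
            programs=[p.text for p in e.iterfind("programList/program")],
            aliases=[Alias(a.findtext("uid"), a.findtext("type"), a.findtext("category"), _name(a))
                     for a in e.iterfind("akaList/aka")],
            countries=[c.text for c in e.iterfind("addressList/address/country")],
            dobs=[d.text for d in e.iterfind("dateOfBirthList/dateOfBirthItem/dateOfBirth")],
        )
        for e in root.iter("sdnEntry")
    ]

entries = parse_entries(SAMPLE)
print(entries[0].full_name, [a.full_name for a in entries[0].aliases])
# Mohammad Al-Bahri ['Mohamed Albahri']
```

Each alias keeps its own uid here, which is what lets the graph load give every Alias its own sub-node rather than an array element.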

The update cadence is weekly for the full publication and intra-week for incremental adds and removes via the OFAC Recent Actions feed. Production screening pipelines ingest the full file weekly and the incremental changes daily. The Update workflow section below covers the ingestion module.

Graph schema design

A central design choice is whether to model alias / address / DOB / ID features as properties on the SanctionedParty node (as array properties) or as sub-nodes connected via typed relationships. The sub-node modeling is the right choice for screening at scale.

// Schema constraints (one-time setup)
CREATE CONSTRAINT sanctioned_party_uid IF NOT EXISTS
  FOR (s:SanctionedParty) REQUIRE s.uid IS UNIQUE;
CREATE CONSTRAINT alias_id IF NOT EXISTS
  FOR (a:Alias) REQUIRE a.id IS UNIQUE;

// Index for direct lookups
CREATE INDEX sanctioned_party_program IF NOT EXISTS
  FOR (s:SanctionedParty) ON (s.program);
CREATE INDEX address_country IF NOT EXISTS
  FOR (a:Address) ON (a.country_code);

// Full-text index for fuzzy name matching across canonical names AND aliases
CREATE FULLTEXT INDEX sanction_name_search IF NOT EXISTS
  FOR (n:SanctionedParty|Alias) ON EACH [n.full_name, n.normalized_name];

The SanctionedParty node carries minimal identifying properties: uid, sdn_type, program, and the canonical full_name. Every feature attaches via a typed relationship:

(s:SanctionedParty)-[:ALIAS_OF]->(a:Alias)
(s:SanctionedParty)-[:HAS_ADDRESS]->(addr:Address)
(s:SanctionedParty)-[:HAS_DOB]->(d:DateOfBirth)
(s:SanctionedParty)-[:HAS_IDENTIFICATION]->(i:IdentificationDocument)
(s:SanctionedParty)-[:CITIZEN_OF]->(c:Country)

Sub-node modeling is what enables independent indexing. The full-text index on SanctionedParty|Alias covers both the canonical entry and every alias under a single index. A query for Al-Bahri therefore matches whether the SDN entry records that spelling as the canonical name or as an alias. Address country has its own index. A country-restricted screening (e.g., “match this name only against parties with Iranian addresses”) can therefore start from an index lookup rather than a full traversal. Audit-trail discipline benefits too: the query result names the specific Alias node that produced the match, not “some alias in the array.”
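The sketch below is a toy, in-memory analogue of that design, showing why separate sub-nodes matter for the audit trail. All names are hypothetical. Canonical names and aliases are indexed as distinct nodes in one lookup table, so a hit reports exactly which node matched and which party owns it. It mirrors the Neo4j index only conceptually: there is no fuzzy matching here, only case-folded exact lookup.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class NameNode:
    label: str       # "SanctionedParty" or "Alias", mirroring the graph labels
    node_id: str     # uid of the party, or id of the alias sub-node
    party_uid: str   # owning SanctionedParty uid (equals node_id for the party itself)
    full_name: str

def build_name_index(nodes: list[NameNode]) -> dict[str, list[NameNode]]:
    """One index spanning both labels, keyed on a case-folded name (hypothetical helper)."""
    index: dict[str, list[NameNode]] = defaultdict(list)
    for n in nodes:
        index[n.full_name.casefold()].append(n)
    return index

def lookup(index: dict[str, list[NameNode]], name: str) -> list[str]:
    """Return one explanation per hit, naming the exact node that matched."""
    return [f"{n.label} {n.node_id} of SanctionedParty uid={n.party_uid}"
            for n in index.get(name.casefold(), [])]

nodes = [
    NameNode("SanctionedParty", "90001", "90001", "Mohammad Al-Bahri"),
    NameNode("Alias", "90002", "90001", "Mohamed Albahri"),
]
idx = build_name_index(nodes)
print(lookup(idx, "MOHAMED ALBAHRI"))
# ['Alias 90002 of SanctionedParty uid=90001']
```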

Full-text indexes and fuzzy matching

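Neo4j’s full-text indexes are Lucene-backed. What they do out of the box depends on the analyzer configured for the index:

the default analyzer tokenizes and case-folds
language-specific analyzers such as english add stemming
folding diacritics to ASCII requires an analyzer that includes it

For sanctions screening, three additional capabilities are required: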

Transliteration normalization. Mohammad, Mohammed, and Muhammad are all Latin-script renderings of the same Arabic name (محمد). Production screening typically maintains a normalization table that maps each variant to a canonical token. The normalized_name property on the index stores the normalized form; the full_name property stores the as-received form for analyst review.
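Component-level matching. A query for Mohammad Al-Bahri should match a record stored as Al-Bahri, Mohammad (last-first order). It should also match a record stored as Mohammad Bahri (the Al- prefix dropped). Lucene query syntax handles this in three ways. Unquoted terms are matched independently of order. The ~N suffix sets a per-term edit distance. A quoted phrase with slop (for example "mohammad bahri"~2) tolerates reordered or intervening tokens when positional matching is wanted: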

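// Fuzzy terms (~N) are not tokenized by the analyzer, so split hyphenated names into
// the components the default analyzer indexes ("Al-Bahri" -> "al", "bahri").
CALL db.index.fulltext.queryNodes(
  'sanction_name_search',
  'mohammad~1 al bahri~1'   // ~1 = edit-distance 1 per token; unquoted terms are order-free
) YIELD node, score
RETURN node.full_name AS matched_name, labels(node) AS kind, score
ORDER BY score DESC;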

Sound-alike matching. For severe orthographic mismatches that survive normalization, Soundex or Double Metaphone phonetic encoding catches the residual cases. These are typically pre-computed at ingestion as a phonetic_name property and indexed separately.
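The first two capabilities can be sketched in plain Python rather than Lucene. This is a minimal illustration with several assumptions:

- A small, hypothetical normalization table maps transliteration variants to a canonical token.
- Hyphens and commas split tokens, and a hypothetical particle list drops "al"/"el".
- Tokens are compared with plain Levenshtein distance at a tolerance of one edit.

This approximates, but does not reproduce, Lucene's fuzzy operator, which by default also counts a transposition as a single edit. The matching rule (every query token needs a partner token, in any order) is also an illustrative choice.

```python
import re

# Hypothetical normalization table: transliteration variant -> canonical token.
TRANSLIT = {"mohammed": "mohammad", "muhammad": "mohammad", "mohamed": "mohammad"}
# Hypothetical particles ignored when matching ("Al-", "El-"); an institutional choice.
PARTICLES = {"al", "el"}

def normalize(name: str) -> list[str]:
    """Case-fold, split on whitespace/commas/hyphens, drop particles, map variants."""
    tokens = re.split(r"[\s,\-]+", name.casefold())
    return [TRANSLIT.get(t, t) for t in tokens if t and t not in PARTICLES]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fuzzy_token_match(query: str, candidate: str, max_edits: int = 1) -> bool:
    """True if every query token is within max_edits of some candidate token, in any order."""
    q_tokens, c_tokens = normalize(query), normalize(candidate)
    return bool(q_tokens) and all(
        any(levenshtein(q, c) <= max_edits for c in c_tokens) for q in q_tokens)

print(fuzzy_token_match("Mohammad Al-Bahri", "Al-Bahri, Mohammad"))  # True (last-first order)
print(fuzzy_token_match("Mohammad Al-Bahri", "Mohammad Bahri"))      # True (Al- prefix dropped)
print(fuzzy_token_match("Muhammad Al Bahri", "Mohammed Al-Bahry"))   # True (variant + one edit)
```

One edit is generous on short tokens (e.g., "ali" vs. "alo"). This is one reason the name score feeds a composite rather than deciding a match on its own.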

The combination of these three capabilities (normalization, edit-distance tolerance, and phonetic encoding) is the floor for sanctions-screening fuzzy matching. Vendors layer proprietary refinements on top. This baseline, though, is the auditable, open-toolkit floor against which vendor capability can be compared.
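For the phonetic layer, the sketch below implements classic American Soundex as a stand-in. Double Metaphone is considerably more involved and is usually taken from a library. The per-token codes are the kind of value that could populate the phonetic_name property at ingestion. The tokenization rule (split on whitespace, commas, hyphens) is an assumption. Non-ASCII letters are simply dropped, so this assumes names have already been ASCII-folded.

```python
import re

# American Soundex digit classes; vowels, y, h and w have no code.
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    last = CODES.get(letters[0], "")   # an identical code right after the first letter is skipped
    digits = []
    for c in letters[1:]:
        if c in "hw":
            continue                    # h and w do not separate identical codes
        code = CODES.get(c, "")         # vowels and y reset the run
        if code and code != last:
            digits.append(code)
        last = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def phonetic_name(name: str) -> str:
    """Space-joined per-token codes, e.g. as a value for a phonetic_name property."""
    codes = (soundex(t) for t in re.split(r"[\s,\-]+", name))
    return " ".join(c for c in codes if c)

print(phonetic_name("Mohammad Al-Bahri"))   # M530 A400 B600
print(phonetic_name("Muhammad Al Bahry"))   # M530 A400 B600
```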

Composite scoring

A name-only score is not sufficient. A name match against a different person who happens to share the sanctioned name (false positive) and a missed name against the correct person whose name was variant-rendered (false negative) both fail the screening framework. The fix is composite scoring across multiple match dimensions:

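// Inputs: $counterparty_name (string), $counterparty_dob (ISO-8601 date string or null),
// $counterparty_country (country code or null). DateOfBirth.value is assumed ISO-8601.
// Lucene scores are not bounded to [0, 1]; the weights assume name_score has been
// normalized by an institution-calibrated step before use.
WITH $counterparty_name AS query_name,
     $counterparty_dob  AS query_dob,
     $counterparty_country AS query_country
CALL db.index.fulltext.queryNodes('sanction_name_search', query_name)
  YIELD node, score
// WITH drops unlisted variables, so carry the query inputs forward explicitly.
WITH node, score, query_dob, query_country
MATCH (node)<-[:ALIAS_OF*0..1]-(party:SanctionedParty)
WHERE party.removed_at IS NULL                     // active listings only
// One party can be hit through several aliases; keep the best-scoring node.
WITH party, node, score, query_dob, query_country
ORDER BY score DESC
WITH party, collect(node)[0] AS matched_node, max(score) AS name_score,
     query_dob, query_country
OPTIONAL MATCH (party)-[:HAS_DOB]->(d:DateOfBirth)
OPTIONAL MATCH (party)-[:HAS_ADDRESS]->(a:Address)
// Any one DOB or address may match; dob_match stays NULL when either side lacks a DOB.
WITH party, matched_node, name_score,
     max(CASE WHEN d.value IS NULL OR query_dob IS NULL THEN NULL
              WHEN abs(duration.inDays(date(d.value), date(query_dob)).days) <= 365 THEN 1.0
              ELSE 0.0 END) AS dob_match,
     max(CASE WHEN a.country_code = query_country THEN 1.0 ELSE 0.0 END) AS country_match
WITH party, matched_node, name_score, dob_match, country_match,
     (name_score * 0.6
      + coalesce(dob_match, 0.0) * 0.3
      + country_match * 0.1) AS composite_score
WHERE composite_score >= 0.5
RETURN party.uid, party.full_name, matched_node.full_name AS matched_name,
       labels(matched_node) AS matched_label,
       name_score, dob_match, country_match, composite_score
ORDER BY composite_score DESC
LIMIT 25;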

The weights (0.6 on name, 0.3 on DOB, and 0.1 on country in this example) are not canonical. Each institution calibrates the weights against its own risk appetite, historical match-outcome data, and the regulatory profile of the lines of business it supports. A retail-banking screening engine may weight name highest because most onboarding captures only name. A correspondent-banking screening engine may weight DOB and country higher because the counterparty data quality is higher and false-positive cost is asymmetric. The framework is configurable; the weights are an institutional input, not a methodology constant.

The 365-day DOB tolerance window is a sanctions-screening industry convention reflecting documented variability in DOB data quality across source jurisdictions. A tighter tolerance (e.g., 90 days) flags fewer candidates. That reduces false positives and analyst-review burden, at the cost of more false negatives when a recorded DOB is approximate or wrong. A looser tolerance (e.g., 730 days) catches more true matches with imprecise DOB data, at the cost of more false positives routed to analyst review. The right tolerance is a matter of institutional risk appetite, not a technical default.

Every returned candidate carries the contributions: name_score, dob_match, country_match. An analyst reviewing a flagged candidate sees not “this person matched the SDN list” but “this person’s name has Lucene-score 0.78 against alias ‘X’ of SanctionedParty uid=12345, their DOB is within 365 days of the recorded DOB, their country does not match.” The audit trail is the screening output itself.
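The same arithmetic can be reproduced outside the database, which helps when reviewing a calibration change. The sketch below assumes a few things:

- The name score has already been normalized to [0, 1].
- The weights and tolerance are the example values from the query above, not recommendations.
- The helper names are hypothetical.

It uses the analyst-facing numbers from the paragraph above (name score 0.78, DOB in window, country mismatch) and shows how tightening the DOB tolerance moves the same candidate across the 0.5 threshold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Weights:
    """Example weights from the query above; each institution calibrates its own."""
    name: float = 0.6
    dob: float = 0.3
    country: float = 0.1

def dob_match(query_dob: Optional[date], listed_dobs: list[date],
              tolerance_days: int = 365) -> Optional[bool]:
    """None when either side has no DOB (unknown, not a mismatch); else any DOB within tolerance."""
    if query_dob is None or not listed_dobs:
        return None
    return any(abs((d - query_dob).days) <= tolerance_days for d in listed_dobs)

def composite(name_score: float, dob_hit: Optional[bool], country_hit: bool,
              w: Weights = Weights()) -> dict:
    """Weighted sum with per-dimension contributions kept for the audit trail.

    name_score is assumed to be already normalized to [0, 1].
    """
    contributions = {
        "name": w.name * name_score,
        "dob": w.dob * (1.0 if dob_hit else 0.0),
        "country": w.country * (1.0 if country_hit else 0.0),
    }
    return {"contributions": contributions, "dob_match": dob_hit,
            "composite": sum(contributions.values())}

listed = [date(1970, 3, 15)]
query = date(1970, 9, 1)                       # 170 days from the listed DOB
result = composite(0.78, dob_match(query, listed), country_hit=False)
strict = composite(0.78, dob_match(query, listed, tolerance_days=90), country_hit=False)
print(round(result["composite"], 3), round(strict["composite"], 3))   # 0.768 0.468
```

With the 365-day window the candidate scores 0.768 and is flagged. With a 90-day window the same person scores 0.468 and falls below threshold, which is the false-negative side of the trade-off described above.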

Update workflow

The SDN list updates weekly with intra-week incremental adds and removes. The compliance officer manages three operational concerns.

Ingestion of full and incremental updates. The full weekly file is processed via a two-pass LOAD CSV (parties first; features second). Daily incremental updates use the OFAC Recent Actions feed, parsed and applied as targeted node-and-relationship inserts. The ingestion module is idempotent: re-running the same file against the database produces no spurious updates. Idempotency is enforced by using MERGE rather than CREATE for every node, keyed on the OFAC uid. The auditor confirms idempotency during the walkthrough by re-running a representative weekly ingestion in a staging environment and inspecting the resulting node-version trail.
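The idempotency property an auditor checks can be illustrated outside the database. The sketch below is a hypothetical in-memory stand-in for the party store, not the LOAD CSV pipeline. Upserting a record keyed on uid counts as a change only when a property actually differs. Applying the same weekly batch twice therefore reports zero changes on the second run, mirroring MERGE with ON CREATE / ON MATCH SET.

```python
from typing import Any

def upsert_batch(store: dict[str, dict[str, Any]],
                 batch: list[dict[str, Any]]) -> dict[str, int]:
    """MERGE-like upsert keyed on 'uid'; counts only records that actually change."""
    created = updated = 0
    for record in batch:
        current = store.get(record["uid"])
        if current is None:
            store[record["uid"]] = dict(record)          # analogous to MERGE ... ON CREATE
            created += 1
        elif any(current.get(k) != v for k, v in record.items()):
            current.update(record)                       # analogous to MERGE ... ON MATCH SET
            updated += 1
    return {"created": created, "updated": updated}

store: dict[str, dict[str, Any]] = {}
weekly = [{"uid": "90001", "full_name": "Mohammad Al-Bahri", "program": "SDGT"},
          {"uid": "90005", "full_name": "Example Trading Co", "program": "IRAN"}]
print(upsert_batch(store, weekly))   # {'created': 2, 'updated': 0}
print(upsert_batch(store, weekly))   # {'created': 0, 'updated': 0}  (re-run is a no-op)
```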

Delisting. When OFAC removes a sanctioned party from the SDN list, the compliance officer retains the entry in the screening database for historical-audit purposes. The screening engine was used to make production decisions while the party was listed, and that decision trail has to remain queryable. The model is to set a removed_at property (and a removal_reason, if known) rather than delete the node. The AML analyst’s active-screening query filters WHERE s.removed_at IS NULL; the historical-audit query omits the filter.
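A minimal sketch of the soft-delete pattern, with plain Python objects standing in for nodes. The removed_at and removal_reason fields come from the text. The listed_at field and both helper names are hypothetical additions. They show how a retained record can answer "was this party listed on the decision date?" as well as the active-screening question.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Listing:
    uid: str
    full_name: str
    listed_at: date                      # hypothetical property, used for "as of" queries
    removed_at: Optional[date] = None    # set on delisting; the record is never deleted
    removal_reason: Optional[str] = None

def delist(listing: Listing, on: date, reason: Optional[str] = None) -> None:
    """Soft delete: stamp the removal instead of dropping the record."""
    listing.removed_at = on
    listing.removal_reason = reason

def active_uids(listings: list[Listing]) -> list[str]:
    """Active-screening view, the analogue of WHERE s.removed_at IS NULL."""
    return [p.uid for p in listings if p.removed_at is None]

def listed_as_of(listings: list[Listing], as_of: date) -> list[str]:
    """Historical-audit view: parties that were on the list on a past decision date."""
    return [p.uid for p in listings
            if p.listed_at <= as_of and (p.removed_at is None or p.removed_at > as_of)]

book = [Listing("90001", "Mohammad Al-Bahri", date(2019, 5, 1)),
        Listing("90005", "Example Trading Co", date(2020, 1, 10))]
delist(book[1], date(2024, 2, 1), reason="hypothetical delisting")
print(active_uids(book))                      # ['90001']
print(listed_as_of(book, date(2023, 6, 30)))  # ['90001', '90005']
```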

Reconciliation against prior versions. Every screening result that flagged a candidate carries the timestamp at which the screening ran and the SDN list version it queried. Suppose the analyst screened a counterparty on date D against the SDN list as of date D-7, and the SDN list was updated on D-5. The screening result’s validity is then bounded by the version it queried. Re-screening against the current list closes the gap. The compliance officer’s reconciliation against prior versions then documents that any change in screening output reflects a change in the list, not a change in the counterparty data.
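The reconciliation logic can be sketched as a comparison of two version-stamped runs. Both runs carry a fingerprint of the counterparty data as screened. A changed output is attributed to a list update only when the input fingerprint is unchanged and the list version differs. All field and function names here are hypothetical. SHA-256 over sorted-key JSON is one reasonable fingerprint choice among several. The dates follow the D, D-7, D-5 scenario above with an arbitrary D.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date

def fingerprint(record: dict) -> str:
    """Stable hash of the counterparty data exactly as screened (sorted keys)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class ScreeningRun:
    counterparty_id: str
    screened_on: date
    list_version: date              # publication date of the SDN snapshot queried
    input_fingerprint: str
    candidate_uids: frozenset[str]

def reconcile(old: ScreeningRun, new: ScreeningRun) -> dict:
    """Attribute a change in output to a list update or to changed counterparty data."""
    same_input = old.input_fingerprint == new.input_fingerprint
    if old.candidate_uids == new.candidate_uids:
        cause = "no change"
    elif same_input and old.list_version != new.list_version:
        cause = "list update"
    elif not same_input:
        cause = "counterparty data changed"
    else:
        cause = "unexplained"        # same input, same list, different output: investigate
    return {"cause": cause,
            "added": sorted(new.candidate_uids - old.candidate_uids),
            "dropped": sorted(old.candidate_uids - new.candidate_uids)}

# D = 2024-03-08: screened against the D-7 snapshot; the list changed on D-5.
cp = {"id": "CP-1", "name": "Mohamed Albahri", "country": "XX"}
first = ScreeningRun("CP-1", date(2024, 3, 8), date(2024, 3, 1), fingerprint(cp), frozenset())
again = ScreeningRun("CP-1", date(2024, 3, 9), date(2024, 3, 3), fingerprint(cp),
                     frozenset({"90001"}))
print(reconcile(first, again))
# {'cause': 'list update', 'added': ['90001'], 'dropped': []}
```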

Worked example

The companion notebook generates a synthetic 50-entry SDN-like list with deliberately seeded variants (transliteration, alias relationships, DOB-day-of-month flips, hyphenation drops) and a synthetic 1,000-counterparty file containing 50 “true” sanctioned parties hidden under variants. The screening framework recovers 47 of 50; the AML analyst walks the 3 misses through standard analyst-resolution workflows.

The three miss categories are instructive:

Miss 1: Phonetic-only similarity below threshold. A counterparty name in the synthetic file was a heavy non-English-language orthographic variant of a sanctioned alias. The Lucene full-text + edit-distance match scored below the institutional name-confidence floor; the phonetic match (Double Metaphone) was close but the address and DOB were absent. Composite score below threshold; case missed.
Miss 2: Address and DOB match with no name signal. A counterparty record described a sanctioned individual under a slightly different name spelling. The address and DOB matched the SDN record, but the name Lucene score was 0.0 (the transliteration was not in the normalization table). Composite score below threshold; case missed.
Miss 3: Alias only. A counterparty was a sanctioned vessel referenced by its operating name rather than its registered name. The operating name was in the SDN alias list but with a typographical drift that exceeded edit-distance tolerance.

These three miss categories drive operational improvements: expanding the transliteration normalization table (Miss 1), adding name-omitted variant matching paths (Miss 2), and widening edit-distance on Alias-node matches specifically (Miss 3). The composite-scoring framework’s transparency makes each miss a directed engineering target, not an unexplained gap.

Bridge to Articles 005 and 007

Two extensions sharpen the framework.

Personalized PageRank for ownership-chain screening (forthcoming article: Random Walks, PageRank, and Personalized PageRank for Cascade-Exposure Ranking). Direct-match screening catches the named sanctioned parties. The OFAC 50 Percent Rule extends blocking to any entity owned 50 percent or more, in the aggregate, directly or indirectly, by one or more blocked persons. Catching those entities requires scoring not only direct attributes but also the counterparty’s graph proximity to any sanctioned ancestor. Personalized PageRank, with the SDN list as the source set, computes a proximity score of that kind under random-walk semantics. That article walks through the math.

Temporal graph modeling for delisting history (forthcoming article: Temporal-Graph Patterns for Ownership Lifecycle). The delisting-but-retain pattern described in the Update workflow section produces a graph in which some SanctionedParty nodes are active and some are historical. Temporal graph modeling makes active-as-of-date semantics first-class: every screening query carries an “as of” date, and the framework returns the parties sanctioned as of that date. That article covers the modeling.
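The ownership rule described above can be sketched, before any ranking, as a deterministic fixpoint. This is not Personalized PageRank and not the forthcoming article's method. It assumes direct ownership percentages are known. It reads the rule as aggregating the stakes of all blocked owners, including entities that are themselves blocked by the rule. Entity names and stakes are hypothetical.

```python
def blocked_by_ownership(sdn: set[str], stakes: dict[str, dict[str, float]]) -> set[str]:
    """Entities blocked by aggregate ownership of blocked parties (fixpoint iteration).

    stakes[entity][owner] is the percentage of entity held directly by owner.
    Entities blocked by this rule count as blocked owners in later passes.
    """
    blocked = set(sdn)
    changed = True
    while changed:                             # each pass adds at least one entity or stops
        changed = False
        for entity, owners in stakes.items():
            if entity in blocked:
                continue
            held = sum(pct for owner, pct in owners.items() if owner in blocked)
            if held >= 50.0:                   # "50 percent or more", in the aggregate
                blocked.add(entity)
                changed = True
    return blocked - set(sdn)

# Hypothetical ownership data: X and Y are listed parties.
stakes = {
    "A": {"X": 60.0, "Z": 40.0},               # 60% held by X -> blocked
    "B": {"A": 30.0, "Y": 20.0, "Z": 50.0},    # 30% via blocked A + 20% Y = 50% -> blocked
    "C": {"X": 40.0, "Z": 60.0},               # 40% -> not blocked
}
print(sorted(blocked_by_ownership({"X", "Y"}, stakes)))   # ['A', 'B']
```

A binary fixpoint like this answers "is this entity blocked?". The ranking approach is aimed at the softer question of how exposed a counterparty is when ownership data is incomplete.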

Audit-trail and regulatory framing

The audit-trail discipline this article enforces (explainable composite scoring with per-dimension contributions, version-stamped screening results, and retained historical entries) is the technical methodology layer. Operational compliance with the BSA/AML and OFAC regulatory frameworks requires institution-specific risk assessment, compliance program documentation, and OFAC licensing where authority exists. This article does not address those layers and should not be read as regulatory advice. Compliance counsel and OFAC’s licensing application processes are the authoritative paths for regulatory questions.

References

Source data:

U.S. Department of the Treasury, Office of Foreign Assets Control. Specially Designated Nationals and Blocked Persons List. home.treasury.gov/policy-issues/financial-sanctions/specially-designated-nationals-and-blocked-persons-list-sdn-human-readable-lists.
OFAC. Specially Designated Nationals List Data Specification (Open Financial Sanctions Data format).
OFAC. A Framework for OFAC Compliance Commitments (2019).
OFAC. Recent Actions feed.

Regulatory framing:

Federal Financial Institutions Examination Council. BSA/AML Examination Manual, OFAC section.
FinCEN (2014). Guidance on Risk-Based Approach to Combating Money Laundering and Terrorist Financing.

Entity resolution and matching:

Christen, P. (2012). Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer.
Cohen, W., Ravikumar, P., & Fienberg, S. (2003). “A Comparison of String Distance Metrics for Name-Matching Tasks.” Proceedings of the IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03).

Implementation reference:

Neo4j Documentation: Full-Text Search (current release).

Reproducible code: Companion notebook at github.com/noahrgreen/dd-tech-lab-companion/notebooks/003_sanctions_screening_sdn_kg.ipynb.

Companion fit-diagnostic acceptance criteria. The notebook is considered correctly reproduced when, against the seeded 50-entry synthetic sanctions list and 1,000-counterparty synthetic file, the framework meets three conditions. It (a) returns exactly 47 true-positive matches at composite-score threshold 0.5, (b) returns the 3 miss categories described in the Worked example section at the same threshold, and (c) flags no synthetic counterparty among the 950 non-sanctioned ones as a false positive above composite score 0.6.

Sheepdog Prosperity Partners LLC