Random Walks on DD Graphs: PageRank, Personalized PageRank, and Influence Ranking for Counterparty Investigations

The Schema Design for Sanctions Screening article in this sub-series handled the direct-match screening problem: counterparty name matches an SDN entry, score the match, route for analyst review. Many sanctioned exposures are not direct. OFAC’s 50 Percent Rule treats any entity owned 50 percent or more — directly or indirectly, by one or more sanctioned persons — as itself sanctioned, even if the entity is not on the SDN list. Indirect exposure cascades through ownership chains; a clean-looking counterparty can be 70 percent owned by a holding company that is 60 percent owned by a sanctioned person, and the counterparty is sanctioned by transitivity. Detecting this requires a graph computation that follows ownership edges and accumulates exposure scores.

Personalized PageRank (PPR) is a natural tool for ranking that exposure at scale. Standard PageRank ranks every node in a graph by its long-run probability under a random walk with restart, measuring overall structural influence. Personalized PageRank concentrates the restart distribution on a designated source set (in the OFAC application, the SDN-listed persons), and the resulting PPR scores measure each downstream node’s exposure to that source set. This article walks through the algorithm, the OFAC application, the Neo4j Graph Data Science (GDS) implementation, and the audit-defensibility framing that converts a continuous PPR score into a binary “subject to 50 Percent Rule review” determination.

The indirect-exposure problem

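A counterparty’s direct sanctions status is straightforward: its name appears on the SDN list, or it doesn’t. The indirect-exposure problem is harder. Consider a holding-company structure where Person A (SDN-listed) owns 60% of Entity X, Entity X owns 70% of Entity Y, and Entity Y owns 80% of Entity Z. Person A’s multiplicative look-through stake in Entity Z is 0.60 × 0.70 × 0.80 = 0.336, below 50%, but that product is not the test OFAC’s guidance applies. The guidance evaluates ownership link by link: X is blocked because A owns 60% of it, Y is blocked because blocked X owns 70% of it, and Z is blocked because blocked Y owns 80% of it, the same transitivity described above. The look-through product still matters as an exposure signal where a link falls short of 50%. Person A’s 40% stake in Entity W does not block W, so W’s 30% stake in Entity Z does not count toward Z’s 50% aggregate, yet it adds 0.40 × 0.30 = 0.12 of sanctioned-party economic interest, bringing A’s total look-through stake in Z to 0.336 + 0.12 = 0.456. Exposure of this kind, which the link-by-link test ignores, is what a continuous exposure score is meant to surface.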
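The two readings of this ownership structure can be checked mechanically. The sketch below is illustrative only: the edge list and the helpers blocked_set and look_through are hypothetical, stakes are encoded as fractions (0.60 for 60%), and blocked_set assumes the aggregation reading of OFAC’s guidance, in which an entity becomes blocked once already-blocked owners together hold 50% or more of it.

```python
from collections import defaultdict

# Hypothetical edge list for the example: (owner, owned, fraction owned).
EDGES = [
    ("A", "X", 0.60),
    ("X", "Y", 0.70),
    ("Y", "Z", 0.80),
    ("A", "W", 0.40),
    ("W", "Z", 0.30),
]
SDN = {"A"}


def blocked_set(edges, sdn):
    """Fixed-point propagation: an entity becomes blocked once the stakes held
    in it by already-blocked owners sum to 50% or more (aggregation reading)."""
    blocked = set(sdn)
    changed = True
    while changed:
        changed = False
        held = defaultdict(float)
        for owner, owned, frac in edges:
            if owner in blocked:
                held[owned] += frac
        for owned, total in held.items():
            if owned not in blocked and total >= 0.5:
                blocked.add(owned)
                changed = True
    return blocked


def look_through(edges, source, target):
    """Sum over all directed paths source -> target of the product of stakes.
    Assumes an acyclic ownership graph; cross-holdings would recurse forever."""
    out = defaultdict(list)
    for owner, owned, frac in edges:
        out[owner].append((owned, frac))

    def walk(node):
        if node == target:
            return 1.0
        return sum(frac * walk(nxt) for nxt, frac in out[node])

    return walk(source)


print(sorted(blocked_set(EDGES, SDN)))          # ['A', 'X', 'Y', 'Z']
print(round(look_through(EDGES, "A", "Z"), 3))  # 0.456
```

The fixed-point loop mirrors the transitivity described in the opening section, while look_through is the path-sum calculation. W stays unblocked at 40%, so its contribution shows up only in the look-through figure.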

This computation does not fit a fixed set of SQL joins on the counterparty file, because ownership chains have no fixed depth. It requires graph traversal along the ownership edges, weighted multiplication along each path, summation across paths from a sanctioned source, and a configurable threshold for triage. The Beneficial-Ownership Lists to Cypher article in this sub-series introduced the cumulative-ownership Cypher query that handles this for known sanctioned sources at moderate graph sizes. PPR complements that query: it scales to large graphs and produces a ranked exposure score for every node at once, rather than one figure per source-target pair.

Random walks on directed graphs

A random walk on a directed graph is a stochastic process where, at each time step, a “walker” at node v chooses one of v’s outgoing edges, uniformly at random or with probability proportional to an edge weight, and moves to its target. On a finite graph, the walker’s location distribution converges to a unique stationary distribution if the chain is irreducible (every node is reachable from every other) and aperiodic (the lengths of a node’s possible return paths share no common divisor greater than 1, so the walker cannot cycle on a fixed schedule). The stationary distribution π, written as a row vector, satisfies π = π M, where M is the row-stochastic transition matrix derived from the graph’s adjacency structure (each row holds a node’s outgoing-edge weights divided by their sum, which requires every node to have at least one outgoing edge).
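Here is a minimal pure-Python sketch of the objects in this paragraph. It assumes edges arrive as (source, target, weight) triples, and the function names are illustrative rather than taken from any library. transition_matrix row-normalizes the weights into M, and stationary approximates π by repeatedly applying π ← π M. The three-node example has cycles of length 2 (a → c → a) and 3 (a → b → c → a), so it is irreducible and aperiodic and the iteration converges.

```python
def transition_matrix(nodes, edges):
    """Row-stochastic M: M[i][j] = weight(i -> j) / total outgoing weight of i."""
    idx = {n: i for i, n in enumerate(nodes)}
    M = [[0.0] * len(nodes) for _ in nodes]
    for u, v, w in edges:
        M[idx[u]][idx[v]] += w
    for name, row in zip(nodes, M):
        total = sum(row)
        if total == 0:
            raise ValueError(f"dangling node {name!r}: row cannot be normalized")
        row[:] = [x / total for x in row]
    return M


def stationary(M, steps=500):
    """Approximate pi with pi = pi M by applying pi <- pi M from a uniform start."""
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi


nodes = ["a", "b", "c"]
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0), ("c", "a", 1.0)]
M = transition_matrix(nodes, edges)
pi = stationary(M)
print([round(x, 3) for x in pi])  # [0.4, 0.2, 0.4]
```

The ValueError on a zero row is deliberate. A node with no outgoing edges has no valid row in M, and that gap is what teleportation fills.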

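PageRank introduces a damping factor d (typically 0.85): at each step, with probability d the walker follows a random outgoing edge, and with probability 1 − d it “teleports” to a node drawn from a prescribed restart distribution. A dangling node (one with no outgoing edges, such as an operating company that owns nothing) has no edge to follow, so a common convention is to teleport from it with probability 1. When the restart distribution gives every node positive probability, as the uniform distribution does, teleportation makes the chain irreducible (every node is reachable in one step from anywhere) and aperiodic (any node can return to itself in a single teleport), so the stationary distribution exists and is unique regardless of graph structure.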
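The teleporting walk can be simulated directly. This Monte Carlo sketch illustrates the idea under stated assumptions and is not how GDS computes scores. It uses unweighted edges, a hypothetical four-node graph in which b ↔ c is a trap with period 2 and e is dangling, a uniform restart distribution, and the convention that a dangling node always teleports. The long-run visit frequencies approximate the PageRank vector.

```python
import random
from collections import Counter


def simulate_walk(out_edges, restart, d=0.85, steps=200_000, seed=7):
    """Estimate long-run visit frequencies of a random walk with teleportation.
    Each step: with probability d follow a uniformly chosen outgoing edge,
    otherwise teleport to a node drawn from `restart`. Dangling nodes always teleport."""
    rng = random.Random(seed)
    nodes, weights = zip(*restart.items())
    node = rng.choices(nodes, weights)[0]
    visits = Counter()
    for _ in range(steps):
        visits[node] += 1
        targets = out_edges.get(node, [])
        if targets and rng.random() < d:
            node = rng.choice(targets)  # follow an outgoing edge
        else:
            node = rng.choices(nodes, weights)[0]  # teleport
    return {n: visits[n] / steps for n in nodes}


# Hypothetical graph: b <-> c is a trap with period 2; e is dangling.
out_edges = {"a": ["b"], "b": ["c"], "c": ["b"]}
restart = {n: 0.25 for n in "abce"}
freq = simulate_walk(out_edges, restart)
print({n: round(f, 3) for n, f in freq.items()})
```

With the default seed the frequencies should land close to the exact stationary values a ≈ e ≈ 0.048, b ≈ 0.463, c ≈ 0.441. Without teleportation, a walker starting at a would be trapped bouncing between b and c forever.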

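The PageRank vector r satisfies the fixed-point equation r = (1 − d) p + d M^T r, where p is the restart distribution (uniform 1/n for standard PageRank) and M^T is the transpose of the transition matrix (with the rows of nodes that have no outgoing edges replaced by p). Because r sums to 1, this is equivalently an eigenvector equation for the matrix d M^T + (1 − d) p 1^T. The iterative solution is canonical power iteration: start with r_0 uniform and repeatedly compute r_{k+1} = (1 − d) p + d M^T r_k until ||r_{k+1} − r_k|| < ε. Each iteration shrinks the L1 error by at least a factor of d, so the iteration count depends mainly on d and ε rather than on graph size: at d = 0.85, reaching ε = 10^−6 takes roughly ln(10^−6) / ln(0.85) ≈ 85 iterations. For the ownership-graph sizes typical of a single mid-size DD engagement (under 100,000 nodes), convergence happens in 50-150 iterations.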
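Here is a minimal pure-Python power-iteration sketch of this equation. It makes several assumptions. out_edges maps each node to hypothetical (target, weight) pairs, and the weights are row-normalized inside the loop. The stopping rule is the L1 change between iterates. Dangling nodes return their mass to p, which is one common convention; GDS’s internal handling may differ.

```python
def pagerank(out_edges, p, d=0.85, eps=1e-6, max_iter=1000):
    """Power iteration for r = (1 - d) p + d M^T r.

    out_edges: node -> list of (target, weight); weights are row-normalized here.
    p: restart distribution over every node (values sum to 1).
    Dangling nodes (no outgoing edges) send their mass to p.
    Returns (r, iterations used)."""
    nodes = list(p)
    r = {n: 1.0 / len(nodes) for n in nodes}  # r_0 uniform
    for k in range(1, max_iter + 1):
        dangling = sum(r[n] for n in nodes if not out_edges.get(n))
        nxt = {n: (1 - d) * p[n] + d * dangling * p[n] for n in nodes}
        for u, targets in out_edges.items():
            total = sum(w for _, w in targets)
            for v, w in targets:
                nxt[v] += d * r[u] * w / total  # d * (M^T r)[v]
        delta = sum(abs(nxt[n] - r[n]) for n in nodes)  # L1 change
        r = nxt
        if delta < eps:
            return r, k
    return r, max_iter


out_edges = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": [("b", 1.0)]}
p = {n: 0.25 for n in "abce"}  # standard PageRank: uniform restart
r, iters = pagerank(out_edges, p)
print({n: round(x, 4) for n, x in r.items()}, iters)
```

The loop stops within the bound of about 90 iterations that the contraction argument gives (2 · d^(k−1) < 10^−6 at d = 0.85). The scores sum to 1 at every iteration because each node passes on all of its mass.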

The connection to Markov stationary distributions is exact and worth flagging: PageRank is the stationary distribution of a particular Markov chain on the graph’s nodes. Readers who came in through the Higher-Order Markov Models article in the Stochastic / Markov sub-series will recognize the underlying apparatus. Readers who arrived through this sub-series are meeting the random-walk framework for the first time; both paths end up in the same theoretical place.

Standard PageRank

The Page-Brin formulation (Page et al., 1999; Brin & Page, 1998) was developed for the web-page ranking problem, but the algorithm is general: any directed graph with weighted edges can be ranked by long-run random-walk probability. The damping factor 0.85 is an empirical convention rather than a derived constant: Brin and Page report using it, and it has remained the common default across two decades of subsequent application. The intuition is that 0.85 keeps the walker mostly following links (preserving the graph’s structural signal) while still teleporting often enough (15% of steps, an average run of 1/(1 − d) ≈ 6.7 steps between teleports) to avoid getting permanently stuck in cycles or dead ends.

For DD ownership graphs, standard PageRank produces a structural-importance ranking: which entities receive the most ownership influence in the network. This is interesting but not the question the OFAC application asks. OFAC doesn’t care about which entity is structurally most important; it cares about which entities receive ownership influence specifically from sanctioned persons. Standard PageRank’s uniform teleportation treats all sources equally; the OFAC question requires a teleportation distribution concentrated on the SDN nodes.

Personalized PageRank

Personalized PageRank replaces the uniform restart distribution p = 1/n with a source-concentrated distribution that assigns probability mass only to a chosen subset of nodes (the “source set” or “seed set”). The personalization vector already appears in Page et al. (1999); Haveliwala’s (2002) topic-sensitive PageRank is a closely related variant that precomputes scores for a set of topic-biased restart vectors. The equation keeps the same form, but p is now non-uniform.
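The effect of the restart vector is easiest to see on a toy graph. This sketch uses a fixed-iteration power method as a pure-Python illustration, not GDS. The hypothetical graph has S as the sanctioned seed, S → H → C as its ownership chain, and P → Q as an unrelated structure. Only the personalization vector p changes between the two runs.

```python
def ppr(out_edges, p, d=0.85, iters=200):
    """Fixed-iteration power method for r = (1 - d) p + d M^T r.
    Edge weights are row-normalized; dangling mass is returned to p."""
    r = dict(p)
    for _ in range(iters):
        dangling = sum(r[n] for n in p if not out_edges.get(n))
        nxt = {n: (1 - d + d * dangling) * p[n] for n in p}
        for u, targets in out_edges.items():
            total = sum(w for _, w in targets)
            for v, w in targets:
                nxt[v] += d * r[u] * w / total
        r = nxt
    return r


# Hypothetical toy graph: S (sanctioned) -> H -> C, plus an unrelated P -> Q.
out_edges = {"S": [("H", 1.0)], "H": [("C", 1.0)], "P": [("Q", 1.0)]}
nodes = ["S", "H", "C", "P", "Q"]

uniform = {n: 1 / len(nodes) for n in nodes}
seeded = {n: (1.0 if n == "S" else 0.0) for n in nodes}  # restart only at S

for name, p in [("uniform", uniform), ("personalized", seeded)]:
    r = ppr(out_edges, p)
    print(name, {n: round(r[n], 3) for n in nodes})
```

Under uniform restart, Q receives a positive score from its own owner P. Under the personalized vector, P and Q score exactly zero because no ownership path leads to them from S, and the whole distribution sits on S, H, and C.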

For the OFAC application, the source set is the set of SDN-listed persons (plus the sanctioned entities derived from them under the 50 Percent Rule’s first-order treatment), and the resulting PPR scores at downstream entities measure exposure to sanctioned parties specifically. An entity with a high PPR score is structurally exposed to the sanctioned source set through ownership paths; an entity with a score at or near zero has little or no ownership path from it. The continuous score then maps to operational triage: PPR above an institution-calibrated threshold routes to analyst review for 50 Percent Rule determination; PPR below the threshold remains in the standard counterparty pool.

GDS library implementation

Neo4j’s Graph Data Science (GDS) library provides production-grade implementations of PageRank and Personalized PageRank. The pattern is two-phase: project the ownership subgraph into a GDS in-memory representation (which decouples the algorithm runtime from the Neo4j transaction layer), then run the algorithm against the projection.

// Phase 1: project the ownership graph into a GDS in-memory representation
// (OWNS keeps its natural owner -> owned direction, so walks flow downstream)
CALL gds.graph.project(
  'ownership_graph_v1',
  ['Entity', 'Person'],
  {
    OWNS: {
      type: 'OWNS',
      properties: {
        // PageRank normalizes each node's outgoing weights, so only relative stakes matter
        weight: { property: 'percentage', defaultValue: 0.0 }
      }
    }
  }
);

// Phase 2: compute Personalized PageRank with sanctioned persons as the source set
MATCH (s:Person {sanctioned: true})
// sourceNodes accepts node objects, which avoids the deprecated id() function
WITH collect(s) AS source_nodes
CALL gds.pageRank.stream(
  'ownership_graph_v1',
  {
    sourceNodes: source_nodes,
    relationshipWeightProperty: 'weight',
    dampingFactor: 0.85,
    maxIterations: 100,
    tolerance: 1e-6
  }
)
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE score >= 0.001                        // exposure threshold; calibrate per engagement
  AND NOT coalesce(node.sanctioned, false)  // rank downstream exposure, not the seeds themselves
RETURN
  labels(node) AS node_type,
  coalesce(node.legal_name, node.full_name) AS name,
  score AS exposure_score
ORDER BY exposure_score DESC
LIMIT 50;

Three parameters earn the most attention in production tuning. relationshipWeightProperty: 'weight' directs the algorithm to weight propagation by the percentage property on the OWNS edge. PageRank divides each edge weight by the sum of the owner’s outgoing weights, so the percentages set how an owner’s score is split across its holdings. Among one owner’s stakes, a majority stake receives more propagation than a minority stake, but a 10% stake that is the owner’s only holding propagates as strongly as a 100% stake. The weighting therefore approximates, rather than encodes, the 50 Percent Rule’s threshold. dampingFactor: 0.85 is the Brin-Page default; tuning between 0.70 and 0.95 is the practitioner’s lever, with lower damping producing more locally concentrated exposure scores (good for tight ownership-chain detection) and higher damping producing more diffuse scores (good for capturing distant cascade effects). tolerance: 1e-6 is the convergence threshold; a tighter tolerance produces more stable rankings at modest computational cost.
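The damping trade-off shows up on the simplest possible structure. This pure-Python sketch runs personalized PageRank at damping factors of 0.70, 0.85, and 0.95. It assumes a hypothetical five-node chain S → A → B → C → D seeded at S, with unit weights and dangling mass returned to the seed.

```python
def ppr(out_edges, p, d, iters=500):
    """Fixed-iteration power method for r = (1 - d) p + d M^T r (unit weights);
    dangling mass is returned to p."""
    r = dict(p)
    for _ in range(iters):
        dangling = sum(r[n] for n in p if not out_edges.get(n))
        nxt = {n: (1 - d + d * dangling) * p[n] for n in p}
        for u, targets in out_edges.items():
            for v in targets:
                nxt[v] += d * r[u] / len(targets)
        r = nxt
    return r


# Hypothetical single ownership chain S -> A -> B -> C -> D, seeded at S.
chain = ["S", "A", "B", "C", "D"]
out_edges = {u: [v] for u, v in zip(chain, chain[1:])}
seed = {n: (1.0 if n == "S" else 0.0) for n in chain}

for d in (0.70, 0.85, 0.95):
    r = ppr(out_edges, seed, d)
    print(d, [round(r[n], 3) for n in chain])
```

In a single chain, each hop multiplies the score by d. The score at depth k is therefore d^k times the seed’s score, which here equals (1 − d)/(1 − d^5). Lower damping raises the scores of near nodes such as A and lowers those of distant nodes such as D, which is the locality effect described in the parameter notes.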

OFAC 50 Percent Rule application

The OFAC 50 Percent Rule’s authoritative text (OFAC, 2014) treats any entity owned 50 percent or more — directly or indirectly, individually or in the aggregate — by one or more sanctioned persons as itself sanctioned. The legal threshold is binary at 50%; PPR is a continuous score; converting between them is the operational discipline.

The conventional approach is threshold calibration against historical match outcomes. Run PPR against a graph snapshot, sort entities by PPR score, and intersect the ranking with the institution’s prior manual sanctions-review determinations. The institution adopts the PPR threshold that recovers the most historical true positives while keeping false-positive review volume tractable. In our internal benchmark on the synthetic 500-entity test graph, the PPR threshold 0.001 recovers 11 of 12 seeded cascade-exposed entities. Two entities that were not seeded as exposed also cross the threshold for human review, and both are correctly cleared once the exact ownership-chain percentages are inspected.
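The calibration step can be sketched as a scan over candidate thresholds. The function, scores, labels, and queue cap below are hypothetical. The rule shown maximizes recall of historical true positives subject to a maximum review-queue size and prefers the highest threshold among ties. It is one reasonable way to formalize “recovers the most true positives while keeping review volume tractable”, not a prescribed method.

```python
def calibrate_threshold(scores, positives, max_queue):
    """Return (threshold, recall, queue_size) maximizing recall of known positives
    subject to queue_size <= max_queue, or None if no threshold fits the cap.
    An entity is queued when score >= threshold; candidate thresholds are the
    observed scores, scanned from highest to lowest."""
    best = None
    for t in sorted(set(scores.values()), reverse=True):
        queued = {e for e, s in scores.items() if s >= t}
        if len(queued) > max_queue:
            break  # lowering the threshold further only grows the queue
        recall = len(queued & positives) / len(positives)
        if best is None or recall > best[1]:  # strict: keep the highest threshold on ties
            best = (t, recall, len(queued))
    return best


# Hypothetical PPR scores and prior review determinations.
scores = {"E1": 0.020, "E2": 0.009, "E3": 0.004, "E4": 0.0012,
          "E5": 0.0011, "E6": 0.0009, "E7": 0.0004}
positives = {"E1", "E2", "E4", "E6"}
print(calibrate_threshold(scores, positives, max_queue=5))  # (0.0012, 0.75, 4)
```

In practice, the threshold should be chosen on one snapshot and checked on a held-out validation set. Otherwise the reported recall may be an artifact of fitting the same labels used to pick the threshold.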

The miss in the benchmark (1 of 12) is instructive. The missed entity is owned at exactly the 50% threshold by a sanctioned source through a single chain. With damping factor 0.85, the score attenuates by a factor of d (and by each owner’s weight share) at every hop, so the entity’s PPR score falls below the institutional threshold. Two fixes are available. The first is to re-tune the damping factor for the OFAC-specific projection: lowering it toward 0.75 concentrates score on nodes within a few hops of the sanctioned sources, raising their scores at the cost of more false positives nearby and a weaker signal further out. The second is to supplement PPR with the cumulative-product Cypher query from the Beneficial-Ownership Lists to Cypher article, which handles 50%-threshold edges deterministically. The safer design runs both: PPR for ranked screening triage, and the deterministic Cypher query for the formal 50 Percent Rule determination on entities surfaced by triage.

Worked example

The companion repository ships a synthetic 500-entity ownership graph with 80 person-nodes (5 designated as SDN-sanctioned) and 12 seeded cascade-exposed entities placed at depths 2-5 from the sanctioned sources, with cumulative effective ownership between 40% and 75% (some above and some below the 50% threshold). PPR with the parameters from the GDS library implementation section recovers 11 of 12 seeded targets at threshold 0.001.

The remaining benchmark assets — parameter-sensitivity study at damping factors 0.70 / 0.85 / 0.95, threshold-calibration utility against a held-out validation set, and a runtime profile across graph sizes from 500 to 50,000 entities — are checked into the companion repository for practitioners to reproduce on their own data.

Audit-defensibility framing

PPR produces a continuous score; OFAC compliance requires a binary determination; the gap between them is where the audit-defensibility risk lives. A counterparty flagged by PPR is NOT automatically sanctioned. PPR is a screening-triage tool that produces a ranked review queue. The formal 50 Percent Rule determination requires applying the cumulative-ownership calculation to the specific entity chain, evaluating the OFAC guidance’s “individually or in the aggregate” provision, and obtaining a credentialed compliance professional’s review and sign-off.

The acceptable operational framing is: “PPR produces a ranked review queue for compliance review under the 50 Percent Rule.” The unacceptable framing is: “PPR identifies sanctioned entities.” The first is a documented analytical procedure with an explicit review step; the second is a determination the algorithm is not authorized to make. The institution’s compliance officer and its OFAC compliance program remain the controlling authorities; PPR informs but does not decide.

The forthcoming Community Detection article in this sub-series takes the PPR-flagged entities and groups them into related-party clusters for engagement-team workflow. The Temporal Graph Patterns article carries PPR into the time dimension — tracking exposure trajectories as ownership changes over reporting periods, which is essential for FinCEN BOI continuing-reporting reconciliation.

References

PageRank theory:

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). “The PageRank Citation Ranking: Bringing Order to the Web.” Stanford InfoLab Technical Report.
Brin, S., & Page, L. (1998). “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks and ISDN Systems, 30(1-7), 107-117.
Haveliwala, T. H. (2002). “Topic-Sensitive PageRank.” Proceedings of the 11th International Conference on World Wide Web, 517-526.
Langville, A. N., & Meyer, C. D. (2006). Google’s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press.

Markov chain theory:

Norris, J. R. (1997). Markov Chains. Cambridge University Press, §1.7-1.8.

Regulatory framework:

U.S. Department of the Treasury, Office of Foreign Assets Control (2014). Revised Guidance on Entities Owned by Persons Whose Property and Interests in Property Are Blocked. (The 50 Percent Rule.)
FFIEC. BSA/AML Examination Manual — OFAC Compliance Program section.

Implementation reference:

Neo4j Graph Data Science Library Documentation — gds.pageRank.stream, gds.graph.project, weighted-edge configuration.

Reproducible code: Companion repository at github.com/noahrgreen/dd-tech-lab-companion ships the PPR exposure-screening module: GDS graph-projection scripts, the damping-factor parameter-sensitivity analysis, the threshold-calibration utility, and the synthetic 500-entity ownership graph with 5 seeded SDN sources for benchmark testing.


Sheepdog Prosperity Partners LLC
