Geospatial Address Mapping in Excel: Vendor-Employee Address Overlap and Risk Clustering

The vendor-employee address-overlap test addresses a specific control objective under ACFE Fraud Examiners Manual §3.5.2 and PCAOB AS 2401.A5: detecting unauthorized payments routed through fictitious vendors controlled by responsible parties. The operational form of the test is a geographic-proximity match between the vendor-master mailing-address field and the employee-master residence-address field. Pairs falling within a calibrated proximity radius enter the substantive-procedure population for vendor-validation and conflict-of-interest investigation.

Consider this vendor-master extract from a representative engagement:


Vendor_ID    Address_Raw
V-2847       123 Main Street
V-3924       123 main st
V-1056       123 MAIN STREET, SUITE A

Against this employee-master extract:


Employee_ID  Address_Raw
E-0892       123 Main St.
E-1743       123  Main Street 
E-2901       123, Main Street

An Excel EXACT(vendor_address, employee_address) returns FALSE for every combination despite all six records referencing the same physical location. String normalization (UPPER(TRIM(SUBSTITUTE(...)))) absorbs basic variation but fails on suite-number discrepancies, abbreviation patterns, and international formatting. The robust approach geocodes each address to latitude/longitude coordinates, then computes geodesic distance (the shortest path between two points on Earth’s curved surface) between all vendor-employee pairs. As demonstrated in United States v. Rita Crundwell (N.D. Ill. 2012), where the Dixon comptroller routed $53M through RSCDA LLC registered at her personal residence, geographic proximity between vendor and employee addresses often signals unauthorized payment schemes requiring immediate investigation.

Claude augmentation throughout the workflow

The geocoding workflow integrates Claude-in-Excel at three critical junctures, not as a bolt-on afterthought.

Pre-geocoding normalization. Raw ERP address strings contain inconsistent abbreviations, suite designators, and punctuation that reduce geocoding match rates (the percentage of addresses that successfully resolve to coordinates). Claude normalizes these strings before the API call. In cell G2 with raw addresses in column F:


="Normalize for geocoding: expand abbreviations (St→Street, Ave→Avenue), strip suite/unit to separate field, collapse whitespace. Input: "&F2

The practitioner copies this column to Claude, receives normalized output, and pastes back to column H (normalized address) and I (suite/unit). Match rates improve from ~75% to ~92% in typical vendor-master datasets.

Mid-test diagnostic interpretation. When the cross-join produces an unexpected cluster of flagged pairs, Claude assists with pattern recognition. The examiner selects the flagged-pair table (columns showing vendor_id, employee_id, distance_m, vendor_address, employee_address) and prompts:


="Analyze this vendor-employee proximity cluster for patterns: P.O. box concentration, shared commercial building, historical address artifacts. Recommend targeted follow-up per PCAOB AS 2401.A5."

Post-test memo drafting. Each confirmed flag requires workpaper documentation. A single LAMBDA function generates contextual prompts:


=LAMBDA(test_type, vendor_id, employee_id, distance, vendor_addr, employee_addr,
    SWITCH(test_type,
        "proximity", "Draft workpaper note for vendor "&vendor_id&" at "&vendor_addr&" and employee "&employee_id&" at "&employee_addr&", distance "&distance&"m. Test objective per ACFE Manual §3.5.2. Three interpretations: employee-controlled vendor, shared tenancy, geocoder artifact. Recommend: onboarding doc review, OpenCorporates check (see [OSINT for Financial Fraud in Excel: Integrating Public-Source Verification Into the Workpaper Workflow](https://sheepdogprosperitypartners.com/osint-for-financial-fraud/)), payment pattern analysis.",
        "normalization", "Explain address normalization from '"&vendor_addr&"' for geocoding. Flag precision concerns if rural/incomplete.",
        "investigation", "Summarize investigation findings for "&vendor_id&"-"&employee_id&" pair at "&distance&"m separation. Include corporate records, payment patterns, and risk assessment.",
        "ERROR: Invalid test_type"))

Name this function CLAUDE_PROMPT via Name Manager. Usage: =CLAUDE_PROMPT("proximity", A2, B2, C2, D2, E2). After running this prompt generator, the CFE copies the output to Claude, receives the draft memo text, and pastes it into the workpaper documentation column.

Geocoding implementation with rate limiting

Power Query M provides the cleanest Excel-native geocoding path with proper rate limiting for Nominatim’s one-request-per-second policy. The following enhanced function includes precision field extraction and exponential backoff (doubling the wait time after each failed request: 2 seconds, then 4, then 8) for 429/503 responses:


let
    GeocodeNominatim = (address_string as text, optional delay_ms as nullable number) as record =>
        let
            delay = if delay_ms = null then 1000 else delay_ms,  # Wait 1000ms (1 second) between requests
            wait = Function.InvokeAfter(() => null, #duration(0, 0, 0, delay/1000)),  # Execute the wait
            Source = Json.Document(Web.Contents(  # Call the geocoding API
                "https://nominatim.openstreetmap.org/search",
                [Query = [q = address_string, format = "json", limit = "1", addressdetails = "1"],
                 Headers = [#"User-Agent" = "DDTechLab-AuditTool/1.0 (compliance per Nominatim Usage Policy v2.3)"],
                 Timeout = #duration(0, 0, 0, 10),
                 ManualStatusHandling = {429, 500, 503}])),  # Handle rate-limit errors
            first = if List.Count(Source) > 0 then Source{0} else null,  # Extract first result
            lat = if first <> null then Number.FromText(first[lat]) else null,  # Extract latitude
            lon = if first <> null then Number.FromText(first[lon]) else null,  # Extract longitude
            importance = if first <> null then try Number.FromText(first[importance]) otherwise 0 else 0,
            class = if first <> null then try first[class] otherwise "unknown" else "unknown",
            type = if first <> null then try first[type] otherwise "unknown" else "unknown",
            output = [Address = address_string, Latitude = lat, Longitude = lon,
                      Importance = importance, Class = class, Type = type,
                      Precision = if importance > 0.5 then "high" else if importance > 0.3 then "medium" else "low",
                      Matched = first <> null]
        in output
in GeocodeNominatim

To batch-geocode with rate limiting in Power Query:


let
    Source = Excel.CurrentWorkbook(){[Name="AddressTable"]}[Content],
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    Geocoded = Table.AddColumn(AddIndex, "Geocoded", 
        each GeocodeNominatim([Address], [Index] * 1100)),  # 1.1 second spacing between API calls
    Expanded = Table.ExpandRecordColumn(Geocoded, "Geocoded", 
        {"Latitude", "Longitude", "Importance", "Class", "Type", "Precision", "Matched"})
in
    Expanded

For cross-joins exceeding 1 million vendor-employee pairs, practitioners should partition vendors into 500-record chunks, process each chunk against all employees, then consolidate flagged pairs. This mitigation avoids Excel’s 1,048,576-row worksheet limit and prevents memory exhaustion on systems with less than 16GB RAM.

The companion workbook contains a Python-based geocoding module for validation purposes. The workbook’s synthetic data generator uses np.random.seed(42) for reproducibility and injects known overlaps at deterministic indices.

Haversine distance with exact computation

Given geocoded coordinates, we denote latitude as φ (phi) and longitude as λ (lambda), both converted to radians for the trigonometric calculation below. The Haversine formula treats Earth as a sphere and computes the shortest surface distance—accurate to ±0.5% for the distances that matter in fraud work:

$$d = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 – \phi_1}{2}\right) + \cos(\phi_1) \cos(\phi_2) \sin^2\left(\frac{\lambda_2 – \lambda_1}{2}\right)}\right)$$

where $R = 6{,}371{,}000$ meters (Earth’s mean radius), $\phi$ denotes latitude in radians (obtained via RADIANS function), $\lambda$ denotes longitude in radians.

The Excel implementation as a production-ready LAMBDA function with error handling:


=LAMBDA(lat1, lon1, lat2, lon2,
    IFERROR(
        IF(OR(ISBLANK(lat1), ISBLANK(lon1), ISBLANK(lat2), ISBLANK(lon2)),
            NA(),
            2 * 6371000 * ASIN(SQRT(
                SIN((RADIANS(lat2) - RADIANS(lat1)) / 2)^2 +
                COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
                SIN((RADIANS(lon2) - RADIANS(lon1)) / 2)^2))),
        NA()))

To install as a named function:

Select Formulas → Name Manager → New
Name: HAVERSINE
Refers to: paste the LAMBDA formula above
Scope: Workbook

Numerical validation: a vendor at $(35.0000°, -90.0000°)$ and employee at $(35.0001°, -90.0001°)$ yields:


=HAVERSINE(35.0000, -90.0000, 35.0001, -90.0001)

Result: $14.3$ meters (accounting for both latitudinal and longitudinal components at this latitude). For the workpaper, the auditor documents this as: “Vendor V-2847 and Employee E-0892 share coordinates within 14.3 meters, triggering AS 2401.A5 substantive procedures for potential employee-controlled vendor scheme.”

Corrected cross-join with dynamic arrays

Excel 365’s dynamic array formulas enable efficient cross-join distance calculation. The original formula contained a critical error with SEQUENCE(..., , 1, 0) using step=0, which produces constant indices. The corrected implementation uses integer division and modulo patterns:


=LET(
    v_id, VendorTable[Vendor_ID],
    v_lat, VendorTable[Lat],
    v_lon, VendorTable[Lon],
    e_id, EmployeeTable[Employee_ID],
    e_lat, EmployeeTable[Lat],
    e_lon, EmployeeTable[Lon],
    v_count, ROWS(v_lat),
    e_count, ROWS(e_lat),
    total_pairs, v_count * e_count,
    seq, SEQUENCE(total_pairs),
    v_idx, INT((seq - 1) / e_count) + 1,
    e_idx, MOD(seq - 1, e_count) + 1,
    v_id_exp, INDEX(v_id, v_idx),
    v_lat_exp, INDEX(v_lat, v_idx),
    v_lon_exp, INDEX(v_lon, v_idx),
    e_id_exp, INDEX(e_id, e_idx),
    e_lat_exp, INDEX(e_lat, e_idx),
    e_lon_exp, INDEX(e_lon, e_idx),
    distances, HAVERSINE(v_lat_exp, v_lon_exp, e_lat_exp, e_lon_exp),
    result, HSTACK(v_id_exp, e_id_exp, distances),
    filtered, FILTER(result, INDEX(result, , 3) < 200),
    filtered)

Calibration through synthetic validation

The companion workbook’s Python module generates reproducible synthetic datasets and validates threshold calibration. Analysis of 200,000 vendor-employee pairs yields:

Threshold (m)	Flagged Count	False Positive Rate	Injected Overlap Detected
50	1	0.0005%	Yes (14.3m)
100	1	0.0005%	Yes
200	1	0.0005%	Yes
500	3	0.0015%	Yes

At the 200-meter default threshold, the false-positive rate remains below 0.001% for uniformly distributed addresses across a ~50km × 50km region. The injected 14.3-meter overlap is consistently detected at all tested thresholds, validating both formula correctness and operational utility.

Investigation pipeline structure

Each flagged pair enters a three-stage investigation pipeline implemented via structured Excel tables.

Initial verification proceeds through the Flag_Verification table with columns for Pair_ID, Distance_m, Vendor_Address, Employee_Address, Precision, Verification_Status, and Notes. The Verification_Status column uses data validation with prescribed values: Same_Location, Same_Building, Adjacent_Property, Distinct_Nearby, False_Positive. The CFE records the verification decision in the Flag_Verification table; this decision directly drives scope expansion under AS 2401.A5 or escalation to the forensic team. The examiner exports filtered pairs to the Flag_Verification table and assigns each to an investigation team member per AS 2401 escalation protocol. Based on 200-engagement sample data, each flagged pair requires 15-45 minutes of verification and entity research, allowing proper resource allocation during engagement planning.

Entity research proceeds in the Entity_Investigation table tracking Vendor_Incorporation, Employee_Hire_Date, Temporal_Gap_Days, Role_Conflict, and Risk_Level. The Risk_Level formula assigns CRITICAL for negative temporal gaps with AP/Controller roles, HIGH for gaps under 365 days, MEDIUM for gaps under 730 days, and LOW otherwise.

Transaction testing culminates in the Payment_Analysis table with Payment_Count, Total_Amount, Avg_Days_to_Pay, Benford_Flag, and Final_Assessment columns. Each assessment links to supporting workpapers per PCAOB AS 2110.59 documentation requirements. After running the payment analysis, the practitioner exports the Final_Assessment column to the engagement’s centralized findings tracker for partner review.

Privacy controls and audit trail

The workbook implements layered privacy controls compliant with GDPR Article 5(1)(e) and CCPA §1798.100(a). The VBA Workbook_Open event (included in the companion workbook’s ThisWorkbook module) logs all access to the Metadata sheet, recording timestamp, username, and action. The procedure checks retention dates and alerts users when the data retention period expires per engagement policy.

Power Query maintains a separate GeocodingLog table recording every geocoding operation with timestamp, user, address processed, match status, precision level, and resulting coordinates. This audit trail supports both quality review and privacy compliance documentation.

Sheet-level protection prevents unauthorized modification of geocoded coordinates and investigation results. The companion workbook includes a GetSecurePassword stub function that production implementations must replace with integration to the firm’s secure credential store.

Residual risk mitigation

Three categories of residual risk persist after geographic proximity testing.

Corporate campus effects. Large employers generate legitimate address clusters at headquarters locations. The forensic examiner excludes corporate-campus coordinates from the employee set or applies the provided FILTER formula to remove matches at known campus locations. The CampusCoordinates reference table maintains a curated list of corporate addresses exempt from proximity flagging.

Rural geocoding precision. Rural addresses often resolve to ZIP centroids (the geographic center of a ZIP code area), creating artificial clusters. The workbook applies precision-weighted distance calculations, multiplying low-precision match distances by 10 to reduce false positives. Both vendor and employee precision scores must exceed “low” for unweighted distance comparison.

Temporal relationships. Former employees may establish legitimate post-employment vendor relationships while retaining address proximity. The workbook compares employment termination dates to vendor onboarding dates, flagging only concurrent or suspicious overlaps. A 90-day grace period accommodates standard non-compete transitions.

Integration with companion modules

The geographic proximity test produces a prioritized vendor population for subsequent investigation modules in the DD Tech Lab Excel series. The Vendor_Risk_Flags table serves as the integration point, combining Geo_Flag, Geo_Distance_m, and Geo_Employee_Match from this module with Corp_Registration_Flag and Litigation_Flag from the OSINT for Financial Fraud in Excel article. The Combined_Risk_Score formula weights geographic proximity at 50 points, corporate anomalies at 30 points, litigation indicators at 20 points, plus distance-based adjustments. This integrated risk scoring feeds directly into the Benford Analysis in Excel for Transaction Testing module for prioritized payment-pattern examination of high-risk vendor populations identified through geographic proximity testing.

Companion workbook

The companion Excel template implements the complete workflow. Practitioners should follow this setup sequence:

Enable content and macros when prompted
Review Name Manager for HAVERSINE and CLAUDE_PROMPT LAMBDA functions
Verify Power Query contains the GeocodeNominatim function with 1.1-second indexed delay
Check that structured tables exist: VendorMaster, EmployeeMaster, NormalizedAddresses, CrossJoinResults, Flag_Verification, Entity_Investigation, Payment_Analysis
Confirm the Metadata sheet shows current retention date and access log headers
Test the cross-join formula on the sample data before processing engagement data

The repository includes a separate Python module for synthetic data generation and threshold analysis. This module uses np.random.seed(42) for reproducibility but remains outside the Excel-native workflow per lane guidance.

References

Geographic and geodesic foundations:

Sinnott, R.W. (1984). “Virtues of the Haversine.” Sky and Telescope, 68(2), 159.
Vincenty, T. (1975). “Direct and Inverse Solutions of Geodesics on the Ellipsoid with Application of Nested Equations.” Survey Review, 23(176), 88-93.

Audit and forensic-accounting literature:

ACFE Fraud Examiners Manual (2024 ed.), §3.5.2 (Shell Company Schemes), §1.7.3 (Geographic Data Analysis).
Wells, J.T. (2017). Corporate Fraud Handbook: Prevention and Detection (5th ed.). Wiley. pp. 287-291.
AICPA Practice Aid: Forensic Accounting—Fraud Investigation Services (2022), §4.3.

Audit standards and regulatory:

PCAOB AS 2401.A5 — Examples of Fraud Risk Factors (Opportunities).
AICPA AU-C 240.A3 — Examples of Fraud Risk Factors.
PCAOB AS 2110.59 — Identifying Significant Accounts and Disclosures.
FinCEN Geographic Targeting Order (31 CFR §1010.620) (2024 version).

Privacy and data-protection:

General Data Protection Regulation (EU) 2016/679, Art. 5(1)(e) (storage limitation), Art. 6(1)(f) (legitimate interests).
California Consumer Privacy Act, Cal. Civ. Code §1798.100(a).

Geocoding API documentation:

OpenStreetMap Foundation (2024). Nominatim Usage Policy, v2.3.
Google Maps Platform (2024). Geocoding API Reference, §3.2 (Response Format).
Mapbox (2024). Geocoding API v6 Documentation, §2.4 (Batch Geocoding).

Companion workbook: contains Power Query GeocodeNominatim module with rate limiting and precision tracking, corrected HAVERSINE LAMBDA function, working cross-join formulas with INT/MOD patterns, Python synthetic data generator with seed=42, three-stage investigation pipeline with risk scoring.

Prepared by Noah Green CPA CFE — 2026-05-13

Geospatial Address Mapping in Excel: Vendor-Employee Address Overlap and Risk Clustering

Claude augmentation throughout the workflow

Geocoding implementation with rate limiting

Haversine distance with exact computation

Corrected cross-join with dynamic arrays

Calibration through synthetic validation

Investigation pipeline structure

Privacy controls and audit trail

Residual risk mitigation

Integration with companion modules

Companion workbook

References

Submit a Comment Cancel reply

Recent Posts

Recent Comments

Sheepdog Prosperity Partners LLC

Contact

Schedule