DD Tech Lab: Build a Medicare Outlier Screen, Step by Step

This walkthrough describes a public-records research method. It makes no allegation against any provider; every operator-level specific has been generalized, and the output is a list of leads that require records-level review before anything is called a finding. Not legal advice.

In the companion piece, Catching Fraud With Public Data, I described outsiders (data-analytics firms like Lincoln Analytics, reimbursement analysts, fraud examiners) who turned public Medicare data into multimillion-dollar whistleblower recoveries. The natural question for anyone with the skills in our tech lab is the one the title of this series keeps asking: can we actually do that ourselves?

The front half, yes. This article is the proof. In about a hundred lines of Python, on free public data, we will build the core of what a data-miner relator runs first: a peer-relative outlier screen that finds the providers who bill a given procedure far more intensely than the rest of their specialty. It is the same instinct behind the cases: the Lincoln Analytics settlement over unnecessary vascular procedures began as exactly this kind of billing-pattern question.

And then we will do the harder, more important half: learn why the screen’s loudest output is usually not fraud, and why the discipline to clear a lead is what separates a forensic practitioner from a story generator. This is the second entry in DD Tech Lab; the first, a Medicaid market screen, built an entity-clustering screen. This one builds the billing-outlier screen. Different method, same discipline.

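Everything below runs on free public data and two packages beyond pandas. No credentials, no vendor platform. (The Step 1 excerpt fetches with the standard library's urllib, so requests is not needed for the snippets shown here.)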

pip install requests duckdb pandas

The method, and the trap built into it

A data-miner’s screen rests on one idea: within a peer group, most providers cluster, and a few sit far outside. If a hundred vascular surgeons each perform a dialysis-circuit angioplasty about 1.2 times per patient, and one performs it 3.5 times per patient, that one is a statistical outlier worth a look.

The trap is built into that same sentence, and the ACFE (Association of Certified Fraud Examiners) discipline of professional skepticism is the way out of it. An outlier is equally consistent with two stories: a provider doing unnecessary procedures, and a legitimate subspecialty referral magnet who sees the hardest cases in the region. The Fifth and Ninth Circuits made this the law for data-driven whistleblower suits when they dismissed the Integra Med Analytics cases: a statistical anomaly, standing alone, does not state a claim, because it is “consistent with both” fraud and a lawful explanation. Hold that the entire time you read the output. The screen’s job is to rank questions, not to answer them.

The schemes this maps to live on the corruption and asset-misappropriation branches of the ACFE Fraud Tree: billing for services not rendered, medically unnecessary volume, upcoding. And the most important number in fraud detection is not in any of our queries: the ACFE Report to the Nations finds that tips, not analytics, catch the most occupational fraud, roughly three times as often as the next method. Data does not replace the whistleblower. It aims them.

What the screen keys on	Field (CMS, the Centers for Medicare & Medicaid Services)	Why it matters
The procedure	`HCPCS_Cd`	One service, so peers are comparable
The peer group	`Rndrng_Prvdr_Type` (specialty)	A nephrologist and a radiologist have different baselines
Intensity (the signal)	`Tot_Srvcs` / `Tot_Benes`	Procedures per patient, the overuse pattern
Scale (the filter)	`Tot_Srvcs`, `Avg_Mdcr_Alowd_Amt`	A high rate on three patients is noise

Step 1 (beginner): get the data

CMS publishes Medicare Physician & Other Practitioners, by Provider and Service: one row per provider, per HCPCS (Healthcare Common Procedure Coding System) procedure code, per year, with volumes and payment averages (the file also splits rows by place of service, facility versus office, so a provider can appear more than once for the same code). It is free and keyless. Here is the real call that pulls every provider who billed HCPCS 36902 (percutaneous angioplasty within a dialysis circuit), part of the vascular-access procedure family at the center of the Lincoln Analytics matter:

import json, urllib.parse, urllib.request
import pandas as pd

CMS = "https://data.cms.gov/data-api/v1/dataset/92396110-2aed-4d63-a6a2-5d6207d46a29/data"

def fetch_hcpcs(hcpcs, state=None, page=5000):
    rows, offset = [], 0
    while True:
        params = {"filter[HCPCS_Cd]": hcpcs, "size": page, "offset": offset}
        if state:
            params["filter[Rndrng_Prvdr_State_Abrvtn]"] = state
        url = CMS + "?" + urllib.parse.urlencode(params)
        req = urllib.request.Request(url, headers={"User-Agent": "ddlab/1.0"})
        with urllib.request.urlopen(req, timeout=60) as r:
            batch = json.loads(r.read().decode())
        rows.extend(batch)
        if len(batch) < page:
            break
        offset += page
    return pd.DataFrame(rows)

That is the whole data layer. One filtered, paginated call returns every matching row nationwide, a result set in the low thousands for this code. (CMS suppresses any provider/service row with fewer than 11 beneficiaries, so the smallest-volume rows are already gone before you start.)
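The stop rule in fetch_hcpcs (keep requesting until a page comes back shorter than the page size) is easy to check offline. The sketch below isolates that loop; `paginate`, `fake_page`, and `FAKE_ROWS` are hypothetical names that stand in for the CMS endpoint so the example runs without a network connection.

```python
def paginate(fetch_page, page_size):
    """Collect rows page by page; stop at the first short (or empty) page."""
    rows, offset = [], 0
    while True:
        batch = fetch_page(offset, page_size)
        rows.extend(batch)
        if len(batch) < page_size:  # short page: nothing left to fetch
            break
        offset += page_size
    return rows


# Hypothetical stand-in for the API: 12 made-up rows served in slices.
FAKE_ROWS = [{"Rndrng_NPI": str(i)} for i in range(12)]


def fake_page(offset, size):
    return FAKE_ROWS[offset:offset + size]


rows = paginate(fake_page, 5)
print(len(rows))  # 12
```

When the row count is an exact multiple of the page size, the loop makes one extra request that returns an empty page before it stops. That costs a single call and keeps the logic simple.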

Step 2 (intermediate): the intensity metric, and your denominator

Identity tells you who exists; volume tells you who is paid; intensity tells you who looks unusual. The signal is services per beneficiary: how many times, on average, a provider performed this one procedure on each patient who got it.

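df = fetch_hcpcs("36902")  # the Step 1 call; the API returns values as strings

# Convert counts to numbers; anything unparseable becomes NaN
df["Tot_Benes"] = pd.to_numeric(df["Tot_Benes"], errors="coerce")
df["Tot_Srvcs"] = pd.to_numeric(df["Tot_Srvcs"], errors="coerce")
df["srvcs_per_bene"] = df["Tot_Srvcs"] / df["Tot_Benes"]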

Know your denominator: this is the rule that catches most beginners. Tot_Benes is the count of distinct Medicare beneficiaries; Tot_Srvcs is the count of services. Their ratio is procedures per patient, a clean clinical-intensity figure. Confuse one for the other and you will manufacture outliers that are not real. (In the Medicaid screen, the analogous trap was a “total patients” field that was actually patient-months.)
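A small worked example makes the denominator rule concrete. The two rows below are made up for illustration, and `srvcs_per_bene` and `inverted` are hypothetical helper names; only the field names Tot_Benes and Tot_Srvcs come from the CMS file.

```python
# Hypothetical rows; the values are invented for illustration only.
rows = [
    {"provider": "P1", "Tot_Benes": 40, "Tot_Srvcs": 48},
    {"provider": "P2", "Tot_Benes": 25, "Tot_Srvcs": 80},
]


def srvcs_per_bene(row):
    """Procedures per patient: services divided by distinct beneficiaries."""
    return row["Tot_Srvcs"] / row["Tot_Benes"]


def inverted(row):
    """The beginner's mistake: beneficiaries divided by services."""
    return row["Tot_Benes"] / row["Tot_Srvcs"]


for r in rows:
    print(r["provider"], round(srvcs_per_bene(r), 2), round(inverted(r), 2))
# P1 1.2 0.83
# P2 3.2 0.31
```

With the ratio the right way up, P2 is the high-intensity provider. Inverted, P2 has the lowest number in the group and would never surface. A quick sanity check follows from the definitions: for a procedure code, one would generally expect the ratio to sit at or above 1, since each counted beneficiary received the service at least once.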

Step 3 (intermediate): score against peers, robustly

Now the only statistically interesting line in the screen. We compare each provider’s intensity to the median of their own specialty, scaled by how spread out that specialty is, using a robust z-score built on the median and the median absolute deviation (MAD), not the mean and standard deviation. Why robust? Because the outliers we are hunting would themselves inflate a mean-and-standard-deviation benchmark, hiding behind the distortion they create. The median barely moves when a few billers go extreme.

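# Group each provider's intensity by specialty: that is the peer group
peer = df.groupby("Rndrng_Prvdr_Type")["srvcs_per_bene"]
median = peer.transform("median")                                   # peer median, per row
mad    = peer.transform(lambda s: (s - s.median()).abs().median())  # median absolute deviation
n      = peer.transform("size")                                     # peers in the specialty

# 0.6745 puts MAD on the same scale as a standard deviation;
# where MAD is 0 the score is undefined and is left as NaN
df["robust_z"] = (0.6745 * (df["srvcs_per_bene"] - median) / mad).where(mad > 0)
df["peer_median"] = median.round(2)

# Keep only stable peer groups, material volume, and extreme intensity
leads = df[(n >= 30) & (df["Tot_Srvcs"] >= 20) & (df["robust_z"] >= 3.5)]
leads = leads.sort_values("robust_z", ascending=False)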

The three filters in the last line are the judgment of the screen: benchmark only specialties with enough peers to be stable (n >= 30), ignore providers without material volume (Tot_Srvcs >= 20), and flag intensity that sits three and a half robust deviations above the peer median. The full, runnable version of all of this (fetch, score, rank, write) is medicare_outlier_screen.py, shipped with this article.
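To see why the robust score matters, it helps to compare it with the classical one on a toy peer group. The sketch below uses only the standard library and a synthetic list of procedures-per-patient values: eleven ordinary providers plus three extreme billers. The numbers are invented, and `classical_z` and `robust_z` are illustrative helpers rather than the functions in the shipped script.

```python
from statistics import mean, median, pstdev


def classical_z(values, x):
    """Ordinary z-score: distance from the mean in standard deviations."""
    return (x - mean(values)) / pstdev(values)


def robust_z(values, x):
    """Robust z-score: distance from the median, scaled by 0.6745 / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return 0.6745 * (x - med) / mad if mad > 0 else None


# Synthetic peer group: 11 ordinary providers and 3 extreme ones.
peers = [1.0, 1.1, 1.1, 1.2, 1.2, 1.2, 1.2, 1.3, 1.3, 1.4, 1.4, 3.4, 3.5, 3.6]

print(round(classical_z(peers, 3.5), 1))  # about 1.9: under the 3.5 cutoff
print(round(robust_z(peers, 3.5), 1))     # about 10.1: flagged
```

The three extreme values pull the mean up and widen the standard deviation, so each of them scores under 2 on the classical measure and would pass unnoticed. The median and MAD barely register them, and the same provider scores about 10.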

Step 4: run it, and read it honestly

python3 medicare_outlier_screen.py --hcpcs 36902 --min-peers 30 --z 3.5

On a recent run of the national data, the screen read 1,653 providers for HCPCS 36902 and returned 56 leads. Here is the top of that list, with every identity and re-identifying detail generalized, because naming a provider from a lead-stage screen is exactly the error this article exists to warn against:

Lead	Specialty	Procedures / patient	Specialty peer median	Robust z
Provider A	Diagnostic Radiology	3.5	1.2	14
Provider B	Vascular Surgery	3.3	1.3	11
Provider C	Nephrology	2.8	1.2	11
Provider D	Nephrology	2.6	1.2	9
Provider E	General Surgery	2.4	1.3	9
Provider F	Diagnostic Radiology	2.6	1.2	9

Read that table the way a forensic accountant does, not the way a headline writer does. Provider A performs this procedure about three times as often per patient as the typical radiologist who bills it. That is a real, reproducible, public-data anomaly. It is also not evidence of anything yet. It is the first sentence of a question.

Step 5 (scale): the same screen over the whole file with DuckDB

The API call is perfect for one code. When you want every code, every year, or the full multi-gigabyte download, switch the data layer to DuckDB, which queries the file in place without loading it into memory. It is the same engine the Medicaid screen used over a quarter-billion rows:

import duckdb

con = duckdb.connect()
# Query the Parquet file in place and keep the result as a pandas DataFrame
df = con.execute("""
    SELECT Rndrng_NPI, Rndrng_Prvdr_Type,
           SUM(Tot_Srvcs)*1.0 / SUM(Tot_Benes) AS srvcs_per_bene
    FROM 'medicare_provider_service.parquet'
    WHERE HCPCS_Cd = '36902'
    GROUP BY 1, 2
""").df()

If you can write that query, you can run a market-wide screen on a laptop. The scoring logic from Step 3 sits on top unchanged.

Step 6: the cross-checks that add context

Two free joins turn an intensity outlier into a sharper question. The HHS-OIG (the Department of Health and Human Services Office of Inspector General) List of Excluded Individuals and Entities (LEIE) flags any provider barred from federal programs: join it on NPI (National Provider Identifier) and look for billing after an exclusion date. And CMS Open Payments records the money device and drug makers pay physicians: a high intensity of a procedure that uses a specific manufacturer’s hardware, paired with large payments from that manufacturer, is the shape of a kickback question. Neither join proves anything. Both tell you where a records request would be justified.
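The LEIE join can be sketched in a few lines. This version makes several assumptions that should be checked against the actual download: that the LEIE file exposes columns named NPI and EXCLDATE, that dates are YYYYMMDD strings, and that an all-zero NPI means "not recorded". The function names and the NPIs in the example are made up. Because the public Medicare file is annual, the sketch takes the conservative reading and flags a lead only when the exclusion began before the data year started.

```python
from datetime import date, datetime


def parse_excl_date(text):
    """Parse a YYYYMMDD string; return None if blank or malformed."""
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except (TypeError, ValueError):
        return None


def flag_excluded(leads, leie_rows, data_year):
    """Return leads whose NPI was excluded before data_year began."""
    excl = {}
    for row in leie_rows:
        npi = (row.get("NPI") or "").strip()       # assumed column name
        when = parse_excl_date(row.get("EXCLDATE"))  # assumed column name
        if npi and npi != "0000000000" and when:   # assumed placeholder NPI
            excl[npi] = min(when, excl.get(npi, when))
    year_start = date(data_year, 1, 1)
    return [
        {**lead, "excl_date": excl[lead["Rndrng_NPI"]]}
        for lead in leads
        if lead["Rndrng_NPI"] in excl and excl[lead["Rndrng_NPI"]] < year_start
    ]


# Made-up identifiers, for illustration only.
leads = [{"Rndrng_NPI": "1111111111"}, {"Rndrng_NPI": "2222222222"},
         {"Rndrng_NPI": "3333333333"}]
leie = [{"NPI": "1111111111", "EXCLDATE": "20190620"},
        {"NPI": "2222222222", "EXCLDATE": "20230301"},
        {"NPI": "0000000000", "EXCLDATE": "20100101"}]

flagged = flag_excluded(leads, leie, data_year=2022)
print([f["Rndrng_NPI"] for f in flagged])  # ['1111111111']
```

A hit here is still a lead. It says a records request is justified, and it says nothing about reinstatements, waivers, or a mismatched identifier until someone checks.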

Reading the output: why most leads should clear

This is the part the cases turn on. Each of those six leads has an innocent story that the data cannot rule out, and a forensic practitioner is obligated to consider it before escalating:

The referral magnet. A radiologist or vascular surgeon who runs the regional dialysis-access program will see the hardest, highest-intensity cases by design. High volume of a legitimate service is what a center of excellence looks like from the outside.
The sicker panel. Some patients genuinely need repeat interventions to keep a dialysis access functioning. The screen sees procedures per patient; it cannot see whether the patient needed them.
The coding artifact. Specialty labels are coarse. An “interventional” subspecialty buried inside a broad specialty code will look anomalous against its mislabeled peers.

This is the Integra problem restated in your own output: a number that is “consistent with both” a problem and good, lawful medicine. The screen cannot break that tie. Only the next layer (medical records, the operative notes, the patient’s clinical history, the program rules) can. A screen that escalates every outlier is not analysis; it is a machine that confirms whatever you went looking for. The professional standard, the one the ACFE keeps restating, is to be able to write “this cleared” with the same rigor you bring to “this merits a records request.”
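One of the innocent stories, the coding artifact, can at least be probed from inside the data. The sketch below is an illustrative sensitivity check and not part of the shipped screen: it rescores a lead against every biller of the code, ignoring the specialty label, and compares that with the score against its labeled peers. The rows are synthetic, the NPI "X" is made up, and `rescore` is a hypothetical helper.

```python
from statistics import median


def robust_z(values, x):
    """Robust z-score: distance from the median, scaled by 0.6745 / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return 0.6745 * (x - med) / mad if mad > 0 else None


def rescore(rows, npi):
    """Score one provider against its labeled specialty, then against all billers."""
    target = next(r for r in rows if r["Rndrng_NPI"] == npi)
    rate = target["srvcs_per_bene"]
    same_label = [r["srvcs_per_bene"] for r in rows
                  if r["Rndrng_Prvdr_Type"] == target["Rndrng_Prvdr_Type"]]
    everyone = [r["srvcs_per_bene"] for r in rows]
    return robust_z(same_label, rate), robust_z(everyone, rate)


def make(label, rates, start):
    """Build synthetic rows with made-up identifiers."""
    return [{"Rndrng_NPI": f"{label}{start + i}", "Rndrng_Prvdr_Type": label,
             "srvcs_per_bene": r} for i, r in enumerate(rates)]


# Synthetic data: a broad specialty, one of whose members ("X") practices
# like the interventional group but carries the broad label.
rows = make("Broad", [1.1, 1.2, 1.2, 1.3, 1.2, 1.1, 1.3], 0)
rows.append({"Rndrng_NPI": "X", "Rndrng_Prvdr_Type": "Broad", "srvcs_per_bene": 2.6})
rows += make("Interventional", [2.4, 2.5, 2.6, 2.7, 2.8, 2.5, 2.7], 0)

own, pooled = rescore(rows, "X")
print(round(own, 1), round(pooled, 1))  # about 9.4 against the label, 0.3 pooled
```

A score that collapses when the label is ignored is a hint that the peer group, not the provider, may be what is unusual. It clears nothing and confirms nothing; it tells you which question to ask first.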

What this screen cannot see

The public file has no diagnoses, no medical necessity, no operative notes, no modifiers, no beneficiary identity. It cannot prove a procedure was unwarranted, that a patient did not need it, or that anyone acted knowingly, and “knowingly” is the entire liability question under the False Claims Act. It produces leads. The last mile, proving a knowing false claim, needs records, often an insider, a whistleblower attorney, and years of patience. That is the honest boundary, and it is the same boundary the relators in the offense piece had to cross.

Which is the point of the whole series. The barrier to finding a billing anomaly has collapsed: it is a hundred lines of Python and a free dataset, and you just built it. The barrier to proving fraud is exactly as high as it has always been. The numbers find the lead; forensic work and counsel make the case. If you are on the buyer’s side of a deal instead, the same screen becomes a diligence instrument: Don’t Inherit the Fraud points it at the company you are about to acquire.

By Noah Green CPA CFE, for Sheepdog Prosperity Partners. This walkthrough describes a public-records research method; it makes no allegation against any person or entity, and all provider-level specifics have been generalized. Code is illustrative and provided as-is.

Primary sources & tools: CMS Medicare Physician & Other Practitioners · DuckDB · HHS-OIG Exclusions (LEIE) · CMS Open Payments · ACFE Report to the Nations · ACFE Fraud Tree · DOJ: Lincoln Analytics / vascular settlement · Integra Med Analytics dismissal
