Not legal advice; educational only. Cases are described according to their public posture, and civil settlements resolve allegations without an admission of liability unless a court found otherwise.


The public-data screen produces a ranked list of providers who bill far above their peers. It is tempting to read that list as a list of defendants. It is not. It is a list of questions, and the law has been unusually clear about why. This article is the reality check that keeps the whole enterprise honest: most cases built on statistics alone lose, and the reasons they lose are also the blueprint for the ones that win.

The cautionary tale is Integra Med Analytics, the data-forensics firm whose entire business was mining public Medicare claims for upcoding outliers. It lost both of its flagship cases on appeal. Understanding exactly why is the most useful legal knowledge a data-driven relator can have.
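For readers who have not built one, here is a minimal sketch of what such a screen might look like. It assumes a single metric (each provider's share of claims billed at the top severity level) and compares each provider against the mean and standard deviation of all other providers (a leave-one-out z-score). The `Provider` class, the `peer_outlier_screen` function, the provider names, the claim counts, and the `z_threshold` default of 3.0 are all hypothetical, invented for illustration; this is not Integra's method or any agency's, and a real screen would likely need better-defined peer groups and adjustments.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Provider:
    name: str               # hypothetical provider label (assumed unique)
    high_level_claims: int  # claims billed at the top severity level
    total_claims: int       # all claims in the comparison period


def peer_outlier_screen(providers, z_threshold=3.0):
    """Return (name, share, z) for providers billing far above their peers.

    Each provider's top-level billing share is compared with the mean and
    population standard deviation of every *other* provider (leave-one-out),
    so an extreme outlier cannot inflate its own baseline. Results are sorted
    by z, highest first. A flag is a question to investigate, not a finding.
    """
    if len(providers) < 3:
        return []  # fewer than two peers per provider: no meaningful baseline
    share = {p.name: p.high_level_claims / p.total_claims for p in providers}
    flagged = []
    for name, s in share.items():
        peers = [v for other, v in share.items() if other != name]
        sigma = pstdev(peers)
        if sigma == 0:
            continue  # identical peers: z is undefined, so skip instead of dividing by zero
        z = (s - mean(peers)) / sigma
        if z >= z_threshold:
            flagged.append((name, round(s, 3), round(z, 1)))
    return sorted(flagged, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Invented counts: eight hypothetical providers, 100 claims each.
    counts = {"A": 10, "B": 12, "C": 11, "D": 9, "E": 10, "F": 13, "G": 11, "H": 40}
    providers = [Provider(f"Provider {k}", v, 100) for k, v in counts.items()]
    for row in peer_outlier_screen(providers):
        print(row)
```

With these invented numbers only Provider H is flagged, and that is where the legal analysis begins rather than ends: nothing in the output can tell whether H is upcoding or is simply treating sicker patients and documenting them more completely. The leave-one-out baseline is a design choice to keep one extreme provider from masking itself; it does nothing to rule out a lawful explanation.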

The doctrine that sinks most data cases: the “obvious alternative explanation”

When the Fifth Circuit affirmed dismissal of Integra’s case against Baylor Scott & White (2020), and the Ninth Circuit ordered Integra’s case against Providence dismissed (2021), neither court said statistics are inadmissible. They said something more precise: a statistical outlier is, by itself, equally consistent with fraud and with a lawful explanation, and that is not enough to state a claim.

This is the Twombly/Iqbal plausibility standard. When a defendant’s conduct has an “obvious alternative explanation” that is lawful, a plaintiff must plead facts that tend to exclude that innocent explanation. A hospital coding more high-severity diagnoses than its peers might be committing fraud, or it might simply be an early, accurate adopter of coding guidance that CMS itself encouraged. Integra’s complaints showed the anomaly but did not plead facts ruling out the lawful story, so they failed. As one court put it, the data was “consistent with both” the hospital cheating and the hospital being better at lawful coding than its peers.

Two things make these rulings even more sobering for a would-be data relator. First, both decisions are unpublished and non-precedential: influential and widely followed, but not binding law, which means the doctrine is still hardening. Second, in one case Integra had more than statistics (a former insider coder) and still lost, because the insider material was never tied to a specific, particularized false claim. Partial corroboration is necessary but not sufficient; it has to produce facts.

Rule 9(b): you usually need to point at an actual false claim

False Claims Act fraud must be pleaded “with particularity” under Federal Rule of Civil Procedure 9(b), which courts describe as the “who, what, when, where, and how” of the fraud. Data inference collides with this head-on, because the analyst typically cannot name a single specific false invoice. Here the courts have genuinely split, and the circuit you file in can decide your case:

Pleading standard | What it demands | Circuits
“Representative sample” (strict) | Identify at least one actual, specific false claim (time, place, claim submitted, submitter) | 1st, 4th, 6th, 8th, 11th
“Reliable indicia” (lenient) | Particular details of a scheme, plus “reliable indicia” supporting a strong inference that false claims were actually submitted | 3rd, 5th, 7th, 9th, 10th, D.C.

The Supreme Court has repeatedly declined to resolve this split (denying review in cases out of the Sixth, Seventh, and Eleventh Circuits), so it remains live law. The practical lesson: in a “representative sample” circuit, a complaint that cannot point to even one identified false claim is at acute risk of dismissal at the threshold. And note the trap: even in the lenient circuits (the Fifth and Ninth among them), Integra still lost, because “reliable indicia” is not satisfied by statistics that fail to exclude the innocent explanation. The lenient standard helps; it is not a safe harbor.

The public-disclosure bar: the doctrine aimed straight at public data

This is the rule most directly pointed at the outside analyst working public files. Under 31 U.S.C. § 3730(e)(4), a court must dismiss a case if substantially the same allegations or transactions were already publicly disclosed (in a federal hearing in which the government is a party, a federal report, audit, or investigation, or the news media), unless the government opposes dismissal or the relator is an “original source.”

Does building a case from public CMS or SBA data trip this bar? It depends, and the nuance matters. The Supreme Court held in Schindler Elevator Corp. v. United States ex rel. Kirk (2011) that a federal agency’s written FOIA response is a “report” within the bar: you cannot launder a public-records request into a qui tam suit. But raw data is not automatically a “public disclosure” of the fraud; the bar triggers only when the public source reveals the allegation itself, not merely the raw material from which an analyst might infer one. That gap helps explain why the Integra courts decided on plausibility grounds and expressly declined to reach the public-disclosure question: it was contestable. (The doctrine’s reach into modern data sources is the subject of serious scholarship; see Andrew Nassar, Modern Public Disclosure: Reading “News Media” in the False Claims Act, 123 Colum. L. Rev. (2023), which analyzes the bar against precisely these data-mining facts.)

The escape hatch is the one a data miner must plan for from day one. Since the 2010 amendments, one route to “original source” status is knowledge that “is independent of and materially adds to” the publicly disclosed information, voluntarily provided to the government before filing. Congress deliberately deleted the old “direct knowledge” requirement, which favors outside analysts: your own investigation and analysis can qualify. But the circuits split again on what “materially adds” means, from the relator-friendly Third Circuit, which asks whether the relator adds to the “essential factual background,” to the defense-friendly view that “substantially similar” allegations cannot add anything. Sophisticated proprietary analysis helps only if it transforms public data into a previously undisclosed, specific allegation. The operational takeaway: make a documented, voluntary, pre-filing disclosure of your analysis to the government, so that if a public source does exist, you have satisfied the disclosure requirement and have a record of how your work materially adds to it.

First to file: data anomalies are reproducible, so the clock is brutal

Under § 3730(b)(5), once a relator files, no one other than the government may intervene or bring a related action based on the facts underlying the pending action. The Supreme Court clarified in Kellogg Brown & Root Services v. United States ex rel. Carter (2015) that this bar is not perpetual: it blocks later suits only while the first is pending. For data miners the risk is acute and specific, because a public-data anomaly is, by definition, reproducible by your competitors. Two analysts can independently find the same outlier in the same public file, and only the first to the courthouse gets to proceed. Speed and confidentiality are not niceties; they are the asset.

What the survivors did differently

So what clears the bar? The instructive case is United States ex rel. Customs Fraud Investigations, LLC v. Victaulic Co. (3d Cir. 2016), where a data-driven complaint survived because the relator paired the statistics with concrete, particular, partly non-public evidence: import records, photographs of specific unmarked products, 221 product listings researched over six months, and an expert declaration. The data was the connective tissue; hard facts of actual falsity were the case. Even so, Victaulic was a 2-1 decision, with a dissent warning courts not to be “fooled by the numbers” and cautioning that an expert declaration can dress conclusory allegations “in more technical-sounding terms.” The best pro-data precedent is still contested and fact-specific.

From the wreckage and the survivors, the playbook writes itself:

  1. Add a non-public thread. Insider corroboration, internal documents, physical evidence, or specifics the public data cannot show, and make the corroboration particular, not just “someone was unhappy.”
  2. Affirmatively kill the innocent explanation. Plead facts excluding the “they’re just better at lawful billing” story: for example, records showing the service was medically unsupported, not merely higher-coded. This is the precise gap that sank Integra.
  3. Tie the data to a specific program rule and to at least one identified false claim. That is mandatory in the strict circuits and advisable everywhere.
  4. Lock in original-source status with a documented, voluntary pre-filing disclosure to the government.
  5. File first, and fast: the anomaly is reproducible.
  6. Mind the forum. The “reliable indicia” circuits are friendlier to inference-based pleading, but statistics must still be paired with particular falsity facts.

The government itself says the quiet part out loud

None of this is the firm’s editorializing. When the Justice Department launched its FOCUS initiative in 2026 to court data-miner relators, it said the Department would prioritize working with data miners who demonstrate “an investment in pre-filing diligence and commitment to analytical rigor, familiarity with program rules, and legally sufficient allegations.” The release names no cases, but that message tracks exactly what the Integra line of decisions already teaches: analytical rigor and particularized, legally sufficient allegations, not raw statistics, are what win. The agency recruiting these cases is, in its own words, asking for precisely what the case law demands.

This is also, at bottom, the ACFE’s discipline of professional skepticism restated as litigation strategy: you withhold belief from both the innocent story and the guilty one until the evidence converges. A screen that only escalates is not analysis. The forensic work of converting an outlier into a particularized allegation of a knowingly false claim, one that excludes the lawful explanation, is the entire difference between a complaint that gets dismissed and one the government joins. The screen finds the question; the process and the proof answer it. And if you are buying a company rather than investigating one, the same doctrines tell you how exposed a target’s past billing really is; that is the subject of Don’t Inherit the Fraud.

By Noah Green CPA CFE, for Sheepdog Prosperity Partners. Educational only; not legal advice and not a substitute for qualified counsel. Court decisions are summarized for a general audience; consult the opinions and a lawyer before acting.


Primary sources: 31 U.S.C. § 3730 · Integra Med Analytics v. Baylor Scott & White (5th Cir. 2020) · Integra Med Analytics v. Providence (9th Cir. 2021) · Customs Fraud Investigations v. Victaulic (3d Cir. 2016) · Schindler Elevator v. Kirk (2011) · KBR v. Carter (2015) · DOJ FOCUS initiative (2026) · Andrew Nassar, Modern Public Disclosure, 123 Colum. L. Rev. (2023)