Coarse-band threshold testing catches major asymmetric clusters but misses two refinements requiring dedicated analysis. Bimodality (two distinct peaks in the distribution) means the file behaves as if two different payment systems were mixed together: a normal operating stream plus a second, less legitimate stream riding beside it. Segment-of-one concentration (a single exact amount appearing far more frequently than statistically expected) means one exact dollar amount repeats far more often than it should in a population where most amounts ought to vary naturally; that pattern often points to amounts being chosen rather than calculated.
The full-resolution diagnostic builds the histogram of recorded amounts at fine bin width, compares the empirical distribution to a log-normal baseline via the Kolmogorov-Smirnov goodness-of-fit test, and separately scans the data for exact-amount duplicates that exceed the expected-frequency floor. Each component runs in Excel; Claude augmentation appears at three workflow points — pre-test parameter calibration, mid-test bin-anomaly inspection, and post-test vendor-investigation prioritization. This technique extends the coarse-band threshold detection from Round-Number Bias and Threshold-Avoidance Pattern Detection, providing the fine-resolution complement that picks up subtler manipulation patterns.
Baseline distribution mechanics
Real disbursement-amount distributions in legitimate operating activity are approximately log-normal. The probability density function:
$$f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x – \mu)^2}{2\sigma^2}\right), \quad x > 0$$
with $\mu$ and $\sigma$ being the mean and standard deviation of $\ln x$. For a $5,000 payment, if $\mu = 8$ and $\sigma = 1$, the probability density equals $\frac{1}{5000 \times 1 \times \sqrt{2\pi}} \exp\left(-\frac{(\ln 5000 – 8)^2}{2 \times 1^2}\right) = \frac{1}{12533} \exp(-0.0085) \approx 0.000079$. The empirical fit of $\mu$ and $\sigma$ to a disbursement file uses the natural-log transformation:
' Compute log-transformed amounts in column B:
=LN(A2)
' Estimate mu and sigma from the log-transformed series:
=AVERAGE(B2:B5251) ' returns mu
=STDEV.S(B2:B5251) ' returns sigma
A clean, well-fit log-normal has $\mu$ in the range 7-9 (corresponding to median amounts of $1,000-$8,000 typical of commercial disbursements) and $\sigma$ in the range 0.8-1.3 (corresponding to roughly two orders of magnitude of spread).
Plain-English translation: if the payment file is mostly ordinary operating disbursements, the amounts usually pile up around a middle band and thin out on both sides. The log-normal model is just the statistical version of that common-sense picture.
Claude-assisted parameter calibration
Before constructing the histogram, the examiner leverages Claude to assess whether the entity’s disbursement profile warrants special bin-width treatment. The prompt template:
Entity disbursement profile analysis for [Entity]:
– Total payments in test period: 5,251
– Min amount: $12.35
– Max amount: $487,922.18
– Median amount: $3,182.44
– Mean log(amount): 8.065
– StDev log(amount): 1.021
Recommend bin-width selection for histogram analysis:
1. Assess whether standard Freedman-Diaconis rule applies or if entity characteristics (heavy tail, multi-currency, known recurring payments) warrant adjustment
2. If adjustment warranted, specify alternative (Scott’s rule for lighter tail, custom fixed-width for known recurring-payment amounts)
3. Provide Excel formula for recommended bin-width calculation
This pre-test calibration ensures the histogram captures the relevant structure without analyst-driven tuning.
Histogram construction with expected-count derivation
Bin-width selection starts with three standard rules.
Sturges’ rule. $k = \lceil \log_2 N + 1 \rceil$ bins. For $N = 5{,}000$: $k = 14$ bins.
Scott’s rule. Bin width $h = 3.5 \sigma / N^{1/3}$. For a log-transformed series with $\sigma = 1.0$ and $N = 5{,}000$: $h \approx 0.20$ on the log scale, corresponding to bins spanning a factor of $e^{0.20} \approx 1.22$ on the original scale.
Freedman-Diaconis rule. $h = 2 \cdot \text{IQR} / N^{1/3}$. Less sensitive to outliers than Scott’s rule. For heavy-tailed disbursement data common in audit contexts, Freedman-Diaconis produces more stable histograms by reducing the influence of extreme values on bin width.
For the working Excel practitioner, the default should be simple: use Freedman-Diaconis first unless the file is tiny or unusually clean. It is the safest default for messy payment data because one or two extreme disbursements do less damage to the bin width.
The Excel computation:
' IQR of the log-transformed series:
=QUARTILE.INC(B2:B5251, 3) - QUARTILE.INC(B2:B5251, 1)
' Freedman-Diaconis bin width on the log scale:
=2 * C1 / (5251^(1/3))
Worked example: if Q3 = 8.92 and Q1 = 7.21 on the log scale, the interquartile range IQR = 1.71, yielding a suggested bin width of 2 * 1.71 / 5251^(1/3) = 3.42 / 17.38 = 0.197, or about 0.20 on the log scale. That means adjacent bins differ by about e^0.20, or roughly 22%, on the original dollar scale.
The histogram itself uses Excel’s FREQUENCY function. With bin upper edges in column E (computed on the log scale as $\mu – 3\sigma + j \cdot h$ for $j = 0, 1, \ldots$ until the upper edge exceeds $\mu + 3\sigma$):
' FREQUENCY returns a spill array (Excel 365) applied to log-transformed column B:
=FREQUENCY(B2:B5251, E2:E50)
' Expected count per bin under fitted log-normal (column G):
=5251 * (LOGNORM.DIST(EXP(E3), C1, D1, TRUE) - LOGNORM.DIST(EXP(E2), C1, D1, TRUE))
The expected-count column enables direct comparison with observed counts for the Claude mid-test diagnostic. Note that bin edges are computed on the log scale (column E contains log values), while the FREQUENCY function is applied to the log-transformed payment amounts in column B, and the LOGNORM.DIST function requires exponentiation of the bin edges to return to the original dollar scale. Parameters mu and sigma are stored in cells C1 and D1 respectively.
Kolmogorov-Smirnov implementation with Lilliefors adjustment
K-S compares the empirical cumulative distribution function (ECDF) to a theoretical CDF (in this case, the fitted log-normal). The test statistic is:
$$D = \sup_x |F_n(x) – F(x)|$$
where $F_n$ is the ECDF and $F$ is the theoretical CDF. Under the null that the data follows the theoretical distribution with pre-specified parameters, $D$ has a known distribution that depends on $N$. The Kolmogorov critical value at $\alpha = 0.05$:
$$D_{\text{crit}} = \frac{1.36}{\sqrt{N}}$$
For $N = 5{,}000$, $D_{\text{crit}} \approx 0.0192$. An observed $D > 0.0192$ rejects the log-normal null at conventional significance.
Critical caveat: When parameters $\mu$ and $\sigma$ are estimated from the test sample itself (as in this workflow), the test becomes a Lilliefors test (Lilliefors 1967, p. 399), and the critical values are tighter. For $N = 5{,}000$ with estimated parameters, the Lilliefors critical value at $\alpha = 0.05$ is approximately 0.0125 rather than 0.0192. The approximation formula for large samples (Dallal & Wilkinson 1986, p. 295):
' Lilliefors critical value approximation (Dallal & Wilkinson 1986):
=0.895 / (SQRT(5251) + 0.01 + 0.85/SQRT(5251))
In practice, the auditor does not need to become a distribution theorist here. The operational rule is: if you estimated the parameters from the same file you are testing, use the tighter Lilliefors-style cutoff and record that choice in the workpaper. If the result is only barely over the line, treat it as a screening signal and confirm with the exact-amount and vendor-level diagnostics before escalating.
In Excel, the ECDF computation:
' Sort amounts ascending, then in column F place sorted log-amounts
' ECDF in column G:
=ROW(F2) / 5251
' Theoretical log-normal CDF in column H:
=LOGNORM.DIST(EXP(F2), C1, D1, TRUE)
' Cell-level K-S deviation in column I:
=ABS(G2 - H2)
' Test statistic:
=MAX(I2:I5251)
' Lilliefors critical value at alpha=0.05:
=0.895 / (SQRT(5251) + 0.01 + 0.85/SQRT(5251))
The companion workbook contains pre-populated synthetic data with seed=42 deterministic generation, demonstrating $D = 0.046$ for a population with injected threshold-avoidance and round-amount anomalies. The sort order must be on either original amounts or log-transformed amounts consistently; the workbook uses log-transformed amounts throughout.
Claude prompt library excerpt for bin-anomaly inspection
The companion workbook’s Prompt Library tab contains three diagnostic templates. The mid-test bin-anomaly inspection prompt:
Histogram anomaly detection for [Entity] disbursements [Period]:
Bin_Lower | Bin_Upper | Observed | Expected
$850 | $1,040 | 142 | 138
$1,040 | $1,271 | 168 | 165
$1,271 | $1,554 | 201 | 194
…
$8,177 | $9,999 | 487 | 241
$9,999 | $12,220 | 58 | 198
$12,220 | $14,938 | 144 | 151
Flag bins where:
1. Observed exceeds Expected by factor >2.0
2. Observed falls below Expected by factor <0.5
3. Adjacent bins show opposite deviations (suggesting artificial boundary)
For each flag, provide interpretation keyed to PCAOB AS 2401.94 (fraud risk factors related to override of controls). Reference [Entity]’s documented approval limits at $10,000 and $25,000.
Implementation reveals both the $8,177-$9,999 spike (threshold avoidance below $10K limit) and the $9,999-$12,220 deficit (suppression just above the limit). Each identified anomaly maps to a specific control weakness: the spike indicates override of the purchase-order approval-limit control (COSO Principle 11 — selecting and developing control activities that mitigate risks), while the deficit indicates potential collusion to split transactions below the threshold, bypassing the three-way match control (COSO Principle 10 — selecting and developing control activities). Similar patterns appeared in SEC v. Computer Associates (2004), where expense-timing manipulation produced comparable histogram anomalies at quarter-end cutoffs.
Exact-amount concentration scanning
K-S detects aggregate-distribution anomalies but does not isolate specific exact-value clusters. For segment-of-one detection, examine counts of each unique amount:
' Column J: unique amounts (using UNIQUE in modern Excel):
=UNIQUE(A2:A5251)
' Column K: count of each unique amount:
=COUNTIF($A$2:$A$5251, J2)
' Column L: flag if count exceeds threshold:
=IF(K2 > 3, "FLAG", "")
Under a continuous distribution, the expected count of any specific exact amount is approximately zero. In real disbursement data, some exact amounts naturally recur (rent, leases, subscriptions, retainers); these produce legitimate exact-amount counts.
The diagnostic flags amounts where the count materially exceeds the legitimate-recurrence baseline. For an unknown amount with no obvious recurring-vendor explanation, a count exceeding 3 in a 5,000-payment file warrants investigation. For amounts matching known recurring patterns, the threshold calibrates against the entity’s vendor master.
The vendor-investigation prompt from the Prompt Library:
Segment-of-one vendor investigation for [Entity]:
– Amount: $5,000.00
– Count in population: 87
– Expected count under fitted log-normal: <1
– Vendors with this exact amount:
– ABC Consulting LLC (42 payments)
– XYZ Services Inc (31 payments)
– Professional Advisory Group (14 payments)
Generate investigation protocol per ACFE Fraud Examiners Manual §3.506 (Shell Company Schemes):
1. OpenCorporates lookup: incorporation date, registered agent, officer overlap with [Entity] employees
2. SEC EDGAR search: any [Entity] employee names appearing as officers/directors of these vendors
3. State business registry: active/inactive status, filing delinquencies
4. Cross-reference vendor addresses against [Entity] employee home addresses
5. Bank endorsement review: sample 10 checks to each vendor, examine endorsement patterns
Frame findings as investigation hypotheses per AICPA AU-C 240.A11, not conclusions.
The output produces the full vendor-validation packet for downstream OSINT execution.
What goes into the workpaper
At minimum, the workpaper should preserve four things per AS 2401.94 documentation requirements:
- the chosen bin-width rule and the actual bin width used,
- the fitted
muandsigma, - the K-S or Lilliefors screening result, and
- the exact-amount vendors or bins selected for follow-up.
Example workpaper entry: “Freedman-Diaconis width = 0.20 on the log scale; fitted log-normal rejected at Lilliefors screening threshold (D = 0.046 versus critical value 0.0125); $5,000.00 repeated 87 times across three vendors flagged for investigation; follow-up expanded to vendor-master validation and source-document pull under AS 2401.94 fraud-risk-factor protocol.”
Bin-width sensitivity documentation
Histogram-of-counts conclusions depend on bin-width choice. Manipulation of bin width could produce or suppress apparent peaks. The workpaper discipline preventing this: commit to Freedman-Diaconis (or document rationale for alternative) before examining the data, and record the bin width as a procedural parameter alongside test results.
For robustness, run a second pass at Scott’s bin width and confirm K-S statistic stability. Instability across bin widths signals artifact sensitivity requiring deeper review.
' Named function BIN_SENSITIVITY for bin-width sensitivity test (requires Excel 365 for LAMBDA/LET/SEQUENCE):
' Returns ratio of maximum to average bin count as peakedness measure
=LAMBDA(bin_width,
LET(
bins, SEQUENCE(50, 1, MIN(B2:B5251), bin_width),
counts, FREQUENCY(B2:B5251, bins),
MAX(counts) / AVERAGE(counts)
)
)
The named function BIN_SENSITIVITY returns the ratio of maximum to average bin count, quantifying histogram peakedness. Applied across a range of bin widths, it documents sensitivity to methodological choice. Note: This function requires Excel 365 for LAMBDA, LET, and SEQUENCE support. If the team is on an older Excel version, the same check can be done with a short manual table of candidate bin widths and observed peak ratios.
Operational failure modes requiring calibration
Three patterns produce false positives. The analyst who ignores these will generate noise that wastes investigative time — a downside that experienced practitioners avoid through proper pre-test stratification.
Multi-currency conversion artifacts. Exchange-rate conversion produces apparent exact-amount duplicates when source amounts share common rounding. This failure mode is particularly acute for entities with subsidiaries reporting in different functional currencies — the consolidated file shows clustering that disappears when each currency is tested separately. Solution: stratify by transaction currency before testing, or examine source-currency amounts directly.
The simplest Excel implementation is usually enough:
=FILTER(tblPayments[Amount], tblPayments[Currency]="USD")
If the file is too large for dynamic arrays, sort by currency and run the histogram once per currency block. The key point is procedural, not elegant: do not let USD, EUR, and GBP payments share the same baseline if exchange-rate rounding is part of the posting flow.
Legitimate recurring payments. Software licenses, facility leases, and professional retainers produce valid exact-amount recurrence. In practice, failure to exclude known-recurring vendors generates investigative false positives that degrade the test’s operational value — a predictable downside for any examiner who has run this test without proper vendor-master calibration. Solution: maintain a recurring-payment register and exclude documented recurrences from flagging.
ERP posting-rule rounding. Some systems apply automatic rounding at journal entry, creating pseudo-round amounts without deliberate choice. SAP’s automatic penny-rounding on certain tax calculations is a classic example that produces histogram spikes unrelated to fraud. Solution: document the ERP’s posting configuration and adjust expected-count baselines accordingly.
Each false-positive pattern maps to a detective control that should have caught it: multi-currency artifacts should trigger the foreign-currency translation control (COSO Principle 12 — deploying through policies and procedures), recurring payments should appear on the recurring-vendor exception report (segregation of duties control), and ERP rounding should be documented in the system-configuration audit trail (IT general controls).
Integration considerations
Frequency-of-amount analysis complements coarse-band threshold testing. The coarse approach catches structural distortions at known approval limits via $500 bands. This article’s method catches both fine-scale bimodality (two generative processes) and segment-of-one fabrication (repeated exact amounts). Together they span the amount-distribution anomaly space documented in ACFE’s Report to the Nations occupational fraud taxonomy.
References
Distribution and goodness-of-fit foundations:
- Kolmogorov, A. (1933). “Sulla determinazione empirica di una legge di distribuzione.” Giornale dell’Istituto Italiano degli Attuari, 4, 83-91.
- Smirnov, N. (1948). “Table for Estimating the Goodness of Fit of Empirical Distributions.” The Annals of Mathematical Statistics, 19(2), 279-281.
- Lilliefors, H.W. (1967). “On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown.” Journal of the American Statistical Association, 62(318), 399-402.
- Dallal, G.E., & Wilkinson, L. (1986). “An Analytic Approximation to the Distribution of Lilliefors’s Test Statistic for Normality.” The American Statistician, 40(4), 294-296.
- Aitchison, J., & Brown, J.A.C. (1957). The Lognormal Distribution. Cambridge University Press.
Bin-width rules:
- Sturges, H.A. (1926). “The Choice of a Class Interval.” Journal of the American Statistical Association, 21(153), 65-66.
- Scott, D.W. (1979). “On Optimal and Data-Based Histograms.” Biometrika, 66(3), 605-610.
- Freedman, D., & Diaconis, P. (1981). “On the Histogram as a Density Estimator: L2 Theory.” Probability Theory and Related Fields, 57(4), 453-476.
Audit and forensic-accounting literature:
- ACFE Fraud Examiners Manual (2024 ed.), §3.5 (Vendor Schemes), §3.506 (Shell Company Schemes).
- Nigrini, M.J. (2012). Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. Wiley.
- ACFE (2024). Report to the Nations: 2024 Global Study on Occupational Fraud and Abuse.
- COSO (2013). Internal Control — Integrated Framework, Principles 10-12 (Control Activities).
Audit standards:
- PCAOB AS 2305 — Substantive Analytical Procedures.
- PCAOB AS 2401 — Consideration of Fraud in a Financial Statement Audit, ¶94 (fraud risk factors).
- AICPA AU-C 240 — Consideration of Fraud in a Financial Statement Audit, ¶A11 (professional skepticism).
- LN, EXP, AVERAGE, STDEV.S, QUARTILE.INC, FREQUENCY, UNIQUE, COUNTIF, LOGNORM.DIST, LAMBDA, LET, SEQUENCE.
Companion workbook: with pre-populated synthetic data (seed=42), containing: Data tab (5,251 rows), Logs tab (LN transforms + parameter estimates), Histogram tab (Freedman-Diaconis bins + FREQUENCY + expected counts), KS tab (sorted ECDF + LOGNORM.DIST + MAX deviation), Duplicates tab (UNIQUE + COUNTIF + flag threshold), Prompt Library tab (Claude templates with cell references).
