Linear Regression for Outlier Detection in Excel: Building the Standardized-Residual Workpaper for Expense-Account Analytics

When utility costs scale with square footage, sales commissions track revenue, or freight charges follow tonnage shipped, the auditor needs more than ratio analysis to spot anomalies. PCAOB AS 2305.15 explicitly endorses detailed expectations over broad comparisons: “expectations developed at a detailed level generally have a greater chance of detecting misstatement.” The regression line through driver-expense data provides that precision. Records deviating beyond calibrated thresholds become substantive-procedure candidates.

This article constructs the complete standardized-residual workpaper in Excel with Claude augmentation at three critical junctures: initial data validation prompts, mid-analysis anomaly interpretation, and final investigation memo drafting. The deliverable spans regression setup via LINEST, residual standardization with explicit degrees-of-freedom accounting, conditional formatting for outlier flagging, and FILTER-based extraction of flagged records for escalation.

Regression precision versus ratio benchmarks

Consider two approaches to analyzing $12M in aggregate utility expense across 500 retail stores. The ratio approach divides total utilities by total revenue, compares to prior year or industry benchmark, and flags the variance. The regression approach plots utilities against square footage store-by-store, fits the line, and flags individual stores whose residuals exceed three standard errors. The first catches aggregate-level misstatement; the second catches store-level anomalies that cancel in aggregation.

AS 2305.14-.16 frames this distinction through “precision of the expectation.” The standard requires the auditor to evaluate whether the expectation can detect misstatement at the tolerable level. For a $50,000 tolerable misstatement in a $12M expense account spanning 500 locations, the aggregate ratio test has low power — a 0.4% variance might reflect normal business fluctuation. The regression residual test has high power at the individual-location level where a $50,000 deviation represents multiple standard errors from expectation.

Excel regression architecture with Claude pre-flight validation

Before running LINEST, the analyst confirms data integrity through a Claude-augmented validation prompt. Store the following in cell J1 as a named range “DataValidationPrompt”:


Review this driver-expense dataset for regression readiness:
- Driver range: [MIN] to [MAX] [UNITS]
- Expense range: $[MIN] to $[MAX]
- Record count: [N]
- Missing values: [COUNT]
- Potential outliers via IQR method: [LIST]

Flag concerns: zero/negative values inappropriate for the account, excessive clustering at round numbers suggesting data manipulation, driver-expense correlation below 0.3 suggesting wrong driver selected.

Map detected anomalies to audit assertions: existence (ghost transactions), accuracy (miscalculated amounts), completeness (missing records).

The analyst populates the bracketed values from Excel summary statistics, submits to Claude, and receives pre-flight warnings before proceeding to regression. When driver-expense correlation falls below 0.3, the auditor documents the weak relationship in the workpaper and considers alternative drivers per AS 2305.16’s requirement to evaluate the precision of the expectation.

Column structure and LINEST implementation

The workpaper uses this invariant column mapping:

Column	Contents	Formula
A	Driver $x_i$ (e.g., square footage)	Input data
B	Observed expense $y_i$	Input data
C	Fitted value $\hat{y}_i$	`=$F$1*A2+$G$1`
D	Raw residual $e_i$	`=B2-C2`
E	Standardized residual $r_i$	`=D2/$H$1`
F1	Slope $\beta_1$	`=INDEX(LINEST($B$2:$B$501,$A$2:$A$501,TRUE,TRUE),1,1)`
G1	Intercept $\beta_0$	`=INDEX(LINEST($B$2:$B$501,$A$2:$A$501,TRUE,TRUE),1,2)`
H1	Standard error $s_e$	`=INDEX(LINEST($B$2:$B$501,$A$2:$A$501,TRUE,TRUE),3,2)`
I1	R-squared	`=INDEX(LINEST($B$2:$B$501,$A$2:$A$501,TRUE,TRUE),3,1)`

For Microsoft 365 users, the complete LINEST spill formula in F1:J5:


=LINEST($B$2:$B$501,$A$2:$A$501,TRUE,TRUE)

This 5×2 array contains: [slope, intercept] in row 1; [SE(slope), SE(intercept)] in row 2; [R², $s_e$] in row 3; [F-statistic, df] in row 4; [SSR, SSE] in row 5. The degrees of freedom in row 4, column 2 equals n-2 for simple regression, critical for t-distribution calculations when assessing coefficient significance.

Residual taxonomy with explicit leverage adjustment

The practitioner encounters three residual formulations:

Raw residual: $e_i = y_i – \hat{y}_i$ measures absolute deviation in the same units as the expense (dollars).

Internally studentized residual (with leverage):

$$r_i = \frac{e_i}{s_e\sqrt{1 – h_{ii}}}$$

where $h_{ii}$ is the $i$-th diagonal of the hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$. In plain terms, the hat matrix measures how much each observation “pulls” on the regression line based on its driver value distance from the center — observations far from the mean driver value have higher $h_{ii}$ (more pull) and need their residuals adjusted upward to be fairly compared to typical observations. For simple regression, $h_{ii} = \frac{1}{n} + \frac{(x_i – \bar{x})^2}{\sum(x_j – \bar{x})^2}$.

Worked numerical example. Suppose a retail panel with $n=500$ stores, mean square footage $\bar{x} = 12{,}000$, and sum of squared deviations $\sum(x_j – \bar{x})^2 = 1.2 \times 10^{11}$. For a typical 15,000-sq-ft store: $h_{ii} = \frac{1}{500} + \frac{(15{,}000 – 12{,}000)^2}{1.2 \times 10^{11}} = 0.002 + 0.000075 = 0.002075$. For a mega-store at 80,000 sq ft: $h_{ii} = \frac{1}{500} + \frac{(80{,}000 – 12{,}000)^2}{1.2 \times 10^{11}} = 0.002 + 0.0385 = 0.0405$. The leverage adjustment factor $\sqrt{1 – h_{ii}}$ becomes $\sqrt{0.998} = 0.999$ for the typical store and $\sqrt{0.960} = 0.980$ for the mega-store — a 2% difference in the standardization that matters only when chasing the highest-leverage outliers.

Practitioner’s simplified standardized residual: $r_i^* = e_i / s_e$ omits leverage adjustment.

For audit panels of 50-500 observations with moderate driver dispersion, the leverage term $(1 – h_{ii})$ typically ranges from 0.990 to 0.996, making the adjustment negligible for outlier detection. The simplified formula $r_i^* = e_i / s_e$ suffices for the workpaper. When the panel contains extreme driver values (e.g., one mega-store among typical stores), compute the full leverage-adjusted formula:


=D2/(H$1*SQRT(1-(1/COUNT($A:$A)+(A2-AVERAGE($A:$A))^2/SUMPRODUCT(($A$2:$A$501-AVERAGE($A:$A))^2))))

Note: When faced with curved relationships requiring polynomial regression, build the driver matrix with both linear and squared terms as separate columns. First create the squared values in column C (=A2^2), then reference both columns A and C in LINEST:


=LINEST(B2:B501, A2:C501, TRUE, TRUE)

This tells Excel to fit: Expense = β₁×(Square Footage) + β₂×(Square Footage)² + β₀.

Worked polynomial example. A 15,000 sq ft store with the squared column populated would have A2=15000 and C2=225,000,000. If LINEST returns slope coefficients β₁=$0.85/sq ft and β₂=$0.00003/sq ft² with intercept β₀=$1,200, the fitted utility expense is $0.85 × 15,000 + $0.00003 × 225,000,000 + $1,200 = $12,750 + $6,750 + $1,200 = $20,700. The quadratic term ($6,750 component) captures the rising-cost-per-sq-ft behavior that mega-stores exhibit due to HVAC complexity, parking-lot lighting, and 24-hour security — costs the linear model would attribute to a single misfit residual.

Threshold calibration via false-discovery control

The three-sigma rule flags observations where $|r_i^*| > 3$. Under normality, $P(|Z| > 3) \approx 0.0027$, yielding expected false positives $E[\text{FP}] = 0.0027n$. For $n = 500$, expect 1.35 false flags.

Sample-size-adjusted threshold maintains $E[\text{FP}] = 1$:

$$\tau_n = \Phi^{-1}\left(1 – \frac{1}{2n}\right)$$

Excel implementation:


=NORM.S.INV(1-1/(2*COUNT($A$2:$A$501)))

Sample size	Threshold $\tau_n$	Excel formula result
100	2.58	`=NORM.S.INV(0.995)`
500	3.09	`=NORM.S.INV(0.999)`
2,000	3.48	`=NORM.S.INV(0.99975)`

Workpaper disposition. Every observation exceeding $\tau_n$ in absolute value becomes a direct input to the AS 2305.20 substantive testing sample for the affected expense account. The auditor copies the flagged record (location ID, driver value, recorded amount, expected amount, standardized residual, period) into the substantive-testing memo, vouches the recorded amount to its source document, and documents the resolution under AS 2305.21. When the absolute residual exceeds twice the threshold, the auditor additionally considers whether AS 2401 fraud-risk indicators apply and escalates to the engagement partner per the firm’s brainstorming-session protocol.

Claude-augmented anomaly interpretation mid-workflow

After flagging occurs but before investigation begins, Claude interprets the statistical finding. Store this prompt template in cell K1:


Analytical procedure result:
Account: [ACCOUNT] | Period: [PERIOD]
Regression: Expense = [SLOPE] × [DRIVER] + [INTERCEPT]
R² = [R2] | Standard error = [SE] | n = [N]

Flagged observation:
Location: [ID] | Driver: [X] | Actual: [Y] | Expected: [YHAT]
Standardized residual: [R_STD] (threshold: [TAU])

Interpret this deviation considering:
1. Business context (seasonal patterns, one-time events, location-specific factors)
2. Statistical significance (residual magnitude, influence diagnostics)
3. Audit implications (risk of material misstatement at assertion level — existence, accuracy, completeness, classification, cutoff)

Do not conclude fraud; frame as requiring investigation per AS 2305.20.

The Claude response guides whether to pursue vouching, inquiry, or recalculation procedures. When standardized residual exceeds the threshold, the auditor escalates per AS 2401.66-.67 (formerly SAS 99), which requires specific responses to fraud risks including “examining journal entries for evidence of possible material misstatement due to fraud” and “reviewing accounting estimates for biases.”

Complete workpaper with FILTER extraction

The three-section workpaper structure:

Section 1: Regression diagnostics (rows 1-8)


A1: "Regression Summary"
A2: ="Slope: "&TEXT(F1,"#,##0.00")
A3: ="Intercept: "&TEXT(G1,"$#,##0")
A4: ="R-squared: "&TEXT(I1,"0.000")
A5: ="Std Error: "&TEXT(H1,"$#,##0")
A6: ="Sample size: "&COUNT(A:A)
A7: ="Threshold (tau): "&TEXT(NORM.S.INV(1-1/(2*COUNT(A:A))),"0.00")
A8: ="Flagged count: "&COUNTIF(E:E,">"&NORM.S.INV(1-1/(2*COUNT(A:A))))+COUNTIF(E:E,"<-"&NORM.S.INV(1-1/(2*COUNT(A:A))))

Section 2: Data and residuals (starting row 10)

Headers in row 10, data from row 11 onward with conditional formatting:


=OR(E11>$A$7, E11<-$A$7)

Section 3: Flagged records (starting row 520)

Modern Excel (365/2021) with FILTER:


=FILTER(A11:E510, ABS(E11:E510)>A7, "No outliers detected")

Legacy Excel array formula (Ctrl+Shift+Enter):


=IFERROR(INDEX($A$11:$E$510,SMALL(IF(ABS($E$11:$E$510)>$A$7,ROW($E$11:$E$510)-10),ROW(A1)),COLUMN(A1)),"")

Workpaper review note (row 530): “Manager review signature: _____ Date: _____ | Substantive procedures performed on all flagged items per AS 2305.20-21”

Synthetic data generation with deterministic output

Rather than volatile RAND() functions, the workbook uses a VBA seed or paste-as-values approach for reproducibility. VBA seeding:


Sub GenerateDeterministicData()
    Randomize 42  ' Seed for reproducibility
    Dim i As Long
    For i = 2 To 501
        Cells(i, 1).Value = 5000 + Rnd * 45000  ' Square footage
        Cells(i, 2).Value = 1.2 * Cells(i, 1).Value + 200 + _
            Application.WorksheetFunction.Norm_S_Inv(Rnd) * 300
    Next i
    ' Inject anomaly at row 247
    Cells(247, 1).Value = 8500
    Cells(247, 2).Value = 18400
End Sub

Non-VBA alternative (formulas only — works in any Excel version including web). For practitioners without VBA access (Excel for Mac without Developer tab, Excel on the web, locked-down enterprise installs):


A2: =5000+RANDBETWEEN(0,45000)
B2: =1.2*A2+200+NORM.S.INV(RAND())*300

Fill A2:B501 down, then immediately Copy → Paste Special → Values onto the same range. The volatile RAND/RANDBETWEEN calls collapse to fixed values. Manually edit row 247: A247=8500, B247=18400 to inject the demonstration anomaly. This produces a reproducible dataset without macro permissions. The trade-off versus the VBA approach: each practitioner generates their own seed-equivalent values (the Paste Special timestamp acts as the seed), so two practitioners running the workbook fresh will produce different numerical values but identical analytical structure — the regression slope stays near 1.2, R² near 0.94, and the row-247 anomaly always flags.

Claude post-analysis investigation memo

The final Claude augmentation drafts the investigation memo for each flagged record. Enhanced prompt stored in cell L1:


Draft investigation memo per AS 2305.20 for this flagged analytical procedure result:

ENTITY: [Store/Location ID]
ACCOUNT: [Expense account name and GL code]
PERIOD: [Month/Quarter ending]

ANALYTICAL PROCEDURE PERFORMED:
- Regression of [expense] on [driver] across [N] locations
- Model: y = [SLOPE]x + [INTERCEPT], R² = [R2]
- Detection threshold: |standardized residual| > [TAU] (=[NORM.S.INV formula result])

FLAGGED OBSERVATION:
- Driver value: [X] [units]
- Recorded amount: $[Y]
- Expected amount: $[YHAT]
- Standardized residual: [R_STD]
- Deviation: $[Y-YHAT] ([%] of expected)
- Cook's D influence (a combined measure of how much the regression line would shift if this observation were removed — D > 1.0 indicates the point materially changes the fit): =((R_STD^2)/2) * (h_ii/(1-h_ii)) where h_ii = 1/n + (x-xbar)^2/sum((x-xbar)^2)

REQUIRED FOLLOW-UP PROCEDURES:
1. Vouch to supporting documentation: Examine [specific source documents - invoices, contracts, timesheets]
2. Inquiry of management: Document explanation for deviation at [location] during [period]
3. Recalculation: Reperform regression excluding this observation to assess influence
4. Extended testing if warranted: Sample [subsequent period] transactions at same location

DOCUMENTATION REQUIREMENTS (AS 2305.21):
- This memo
- Regression workpaper with formulas visible
- Source documents obtained
- Management inquiry responses
- Conclusion on whether deviation indicates misstatement

Do not conclude on misstatement; state only that substantive testing is required.

The output becomes part of the audit documentation, edited for entity-specific context and firm standards.

Demonstration with deterministic output

The worked example produces these exact results when seeded appropriately:

Regression output:

Slope: $1.197 per sq ft
Intercept: $219.7
R² = 0.941
Standard error: $605.3

Anomaly detection:

Store at row 247: 8,500 sq ft, $18,400 utilities
Expected: $10,394 (= 1.197 × 8,500 + 219.7)
Residual: $8,006
Standardized residual: 13.22 (threshold: 3.09)

The only flagged observation in 500 records matches the injected anomaly, demonstrating both sensitivity and specificity.

Diagnostic limitations and remedies

Curvilinear relationships manifest as systematic patterns in the residual plot. Store-level margins analysis may follow a quadratic pattern where economies of scale plateau at medium-size locations then reverse for mega-stores facing complexity costs. Remedy: polynomial terms or log transformation.


' Quadratic term for curved relationship (proper array construction):
=LINEST(B2:B501, A2:C501, TRUE, TRUE)

Heteroscedasticity (residual variability that grows or shrinks systematically with the driver — visually, a funnel-shaped scatter where small-driver residuals cluster tightly and large-driver residuals spread widely) violates the constant-variance assumption that LINEST’s standard error relies on. When present, the practitioner’s standardized-residual thresholds become too tight in low-driver regions (causing false positives) and too loose in high-driver regions (causing false negatives). The practitioner’s Breusch-Pagan approximation tests whether squared residuals correlate with the driver:


' Auxiliary regression of squared residuals on driver:
=RSQ(D2:D501^2, A2:A501)*COUNT(A:A)
' Compare to CHISQ.INV.RT(0.05,1) = 3.841

The full Breusch-Pagan test regresses squared residuals on all original predictors, computes n×R² from that auxiliary regression, and compares to chi-square critical values with degrees of freedom equal to the number of predictors. Remedy: weighted least squares or log(expense) transformation.

Driver misspecification creates persistent false negatives when the true driver is absent from the model. Heating costs regressed only on square footage miss the climate-zone variable, causing cold-climate stores to appear as false positives every period while warm-climate stores never flag despite potential fraud. Document this limitation explicitly: “Model assumes single-driver relationship; multi-factor variation may mask targeted manipulation in favorable-variance locations.”

Multiple drivers leave unexplained variation. LINEST handles up to 64 independent variables:


=LINEST(B2:B501, A2:F501, TRUE, TRUE)

where columns A through F contain different drivers.

Integration with the DD Tech Lab series

This regression-residual approach complements:

Benford’s Law in Excel: Setting Up the First-Digit Test for Mass-Transaction Screening‘s Benford digit analysis for population-wide screening
Same-Same-Different in Excel: Detecting Identical-Value-Different-Date Patterns Across Vendor Files‘s same-same-different categorical matching
Geospatial Address Mapping in Excel: Vendor-Employee Address Overlap and Risk Clustering‘s geographic cluster detection (forthcoming)
OSINT for Financial Fraud in Excel: Integrating Public-Source Verification Into the Workpaper Workflow‘s employee-vendor overlap analysis referencing Same-Same-Different in Excel: Detecting Identical-Value-Different-Date Patterns Across Vendor Files‘s framework

Each technique targets distinct anomaly patterns. The regression residual excels at continuous-variable relationships where a driver logically explains the expense. Benford catches manipulation in reported amounts. Same-same-different catches duplicate or recycled transactions. Together they form the analytical procedure toolkit.

References

Statistical foundations:

Cook, R.D. (1977). “Detection of Influential Observation in Linear Regression.” Technometrics, 19(1), 15-18.
Belsley, D.A., Kuh, E., & Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley. Specifically Chapter 2 (leverage) and Chapter 3 (influence measures).
Weisberg, S. (2014). Applied Linear Regression (4th ed.). Wiley. Pages 187-192 (internally versus externally studentized residuals).
Chatterjee, S., & Hadi, A.S. (2012). Regression Analysis by Example (5th ed.). Wiley. Section 4.5 (residual analysis).

Audit standards and practice aids:

PCAOB AS 2305.14-.21 — Substantive Analytical Procedures. Paragraph .15 (precision of expectation), .20 (investigation of significant differences), .21 (documentation).
PCAOB AS 2401.66-.67 — Consideration of Fraud in a Financial Statement Audit. Paragraph .66 (examining journal entries), .67 (reviewing accounting estimates for biases).
AICPA AU-C Section 520.05-.06 — Analytical Procedures. Paragraph .A7 (regression analysis as sophisticated analytical procedure).
AICPA Audit Data Analytics Guide (2017). Chapter 4 (regression diagnostics in audit context).

LINEST array function: Microsoft Support Article KB 828533 (complete formula syntax and return array structure).
NORM.S.INV, T.DIST.2T, FILTER, SQRT, INDEX: Excel function documentation.

Companion workbook: includes VBA macro for deterministic data generation, formulas-only fallback for non-VBA environments, complete worked example with Store 247 anomaly at 8,500 sq ft / $18,400 utilities, regression diagnostics section, FILTER-based flagged-records extraction, and the three Claude prompts (pre-flight validation, mid-workflow interpretation, post-analysis investigation memo).