First-Order Markov Modeling for Transaction-Stream Analysis in Audit

Financial-statement auditors and due-diligence (DD) analysts work with sequences: journal-entry posting streams, period-over-period account-reconciliation states, customer-vendor transaction chains, adjusting-entry timing through close. The standard analytical toolkit (descriptive statistics, ratio analysis, Benford’s Law tests, regression-based expectation models) treats observations as if they were independent and identically distributed. That assumption is convenient. It is also wrong about most of the data that crosses an auditor’s desk.

A sequence carries information that an unordered bag does not. The order in which transactions occur, the order in which a reconciliation deteriorates, the order in which intercompany balances accumulate — these orderings are themselves evidence. Markov chains are the simplest formal apparatus for extracting that evidence at scale, and they map cleanly onto the risk-assessment and substantive-procedure structure that PCAOB AS 2301 and the COSO monitoring component already require. This article walks the apparatus: where first-order Markov modeling earns its keep in audit and DD, what the math actually says, how to compute it on production audit data, and — equally important — where the approach breaks and a different tool is required.

Subsequent articles in this sub-series extend the framework: Hidden Markov Models for regime-switching detection (the Hidden Markov Models for Earnings-Management Regime Detection in Public-Company Financials article), Markov mixture models for round-tripping and lapping (the Markov Mixture Models for Round-Tripping and Lapping Detection article), random-walk tests for account-reconciliation drift (the Random-Walk and Stationarity Tests on Account Reconciliations article), Markov decision processes for risk-based audit sampling (the Markov Decision Processes for Risk-Based Audit Sampling Under Cost-of-Type-II Constraints article). This first article establishes the foundation those build on.

The Markov property

A discrete-time stochastic process $\{X_t\}_{t \geq 0}$ taking values in a state space $S$ satisfies the Markov property if for all $t \geq 0$ and all states $i_0, i_1, \ldots, i_t, j \in S$:

$$P(X_{t+1} = j \mid X_t = i_t, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i_t)$$

In plain English: the conditional distribution of the next state depends only on the current state, not on the path the process took to get there. The process is memoryless.

The full first-order Markov chain is summarized by its transition matrix $P$, where:

$$P_{ij} = P(X_{t+1} = j \mid X_t = i)$$

$P$ is row-stochastic: each row sums to 1, since the next state must be some state in $S$.

One property that matters for the audit applications below:

Stationary distribution. If $\pi$ is a probability distribution over $S$ satisfying $\pi P = \pi$, then $\pi$ is a stationary distribution of the chain. For an irreducible aperiodic chain on a finite state space, $\pi$ exists, is unique, and is the long-run frequency with which the chain visits each state regardless of the starting state (Norris, 1997, §1.7-1.8). In audit terms: $\pi$ is the long-run account-class distribution the entity’s posting practice converges on; deviations from $\pi$ in a specific period are themselves potential indicators.
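A minimal sketch of the definition above: solve $\pi P = \pi$ together with the normalization $\sum_i \pi_i = 1$ as a small linear system. The 3-state matrix and the function name `stationary_distribution` are illustrative assumptions, not taken from any engagement data.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (assumes an irreducible, aperiodic chain)."""
    n = P.shape[0]
    # Stack (P^T - I) pi = 0 on top of the normalization row sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Illustrative 3-state chain (hypothetical values)
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])
pi = stationary_distribution(P)
long_run = np.linalg.matrix_power(P, 50)  # every row approaches pi

print("pi:", pi.round(4))
print("row of P^50:", long_run[0].round(4))
```

For this matrix both printed vectors come out as 0.25, 0.5, 0.25, which illustrates the "regardless of the starting state" part of the statement: every row of a high power of $P$ converges to $\pi$.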

The memorylessness assumption is strong. It says that once the current state is known, the entire history adds no predictive information. For many audit applications, this is a useful approximation. For others, it fails — and the failure is itself diagnostic. The article returns to this in the section on where the approach breaks.

Three canonical audit applications

Three settings where first-order Markov modeling earns its keep, each mapped to a specific control objective in PCAOB AS 2301 risk-assessment terms.

Journal-entry posting patterns (substantive analytical procedure under AS 2305). Let $S$ be a coarse account-class partition (e.g., revenue, COGS, operating expense, accrued liability, prepaid asset, cash, intercompany). For each posted journal entry, define the “state” as the account class touched. The sequence of posted entries over an accounting period forms a discrete-time chain. Under business-as-usual posting practice, the transition matrix $P$ has a characteristic shape — heavy diagonal for the same-class follow-on entries that close out a transaction, with specific off-diagonal structure reflecting the chart-of-accounts business logic. Period-end adjusting entries and material fraud schemes both perturb this matrix in identifiable ways. The technique operationalizes the AS 2401 (Consideration of Fraud in a Financial Statement Audit) journal-entry testing requirement at population scale rather than the typical sample.

Account-reconciliation drift (monitoring activities under COSO 2013, Principle 16). Let $S = \{\text{reconciled, partial, unreconciled, suspense}\}$. Track each subsidiary account’s reconciliation state across reporting periods. Healthy operations exhibit fast transitions back to “reconciled” after temporary excursions. Pre-restatement accounts often show drift toward “partial” or “unreconciled” states well before the restatement is announced, with reduced transition probabilities back to “reconciled.” This is a leading indicator under the Dechow et al. (2011) F-score framework, recast in transition-matrix form, and it should be treated as a risk-scoping analytic that informs control-testing scope — not as direct proof that the reconciliation control did or did not operate effectively.
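As a rough sketch of the reconciliation-state tracking described above, transition counts can be pooled across accounts and the "back to reconciled" column read off directly. The account identifiers and histories below are hypothetical, and `pooled_transition_matrix` is an illustrative helper name.

```python
import numpy as np

STATES = ["reconciled", "partial", "unreconciled", "suspense"]
IDX = {s: i for i, s in enumerate(STATES)}

def pooled_transition_matrix(histories):
    """Pool transition counts across accounts. Each history is one account's
    period-by-period state list; transitions never span two accounts."""
    N = np.zeros((len(STATES), len(STATES)), dtype=int)
    for states in histories.values():
        for prev, curr in zip(states[:-1], states[1:]):
            N[IDX[prev], IDX[curr]] += 1
    row_sums = N.sum(axis=1, keepdims=True)
    # Rows with no observations stay at zero
    P_hat = np.divide(N, row_sums, out=np.zeros(N.shape, dtype=float),
                      where=row_sums > 0)
    return P_hat, N

# Hypothetical histories, one list entry per reporting period
histories = {
    "acct_1001": ["reconciled", "partial", "reconciled", "reconciled", "partial", "reconciled"],
    "acct_2040": ["reconciled", "partial", "partial", "unreconciled", "unreconciled", "partial"],
}
P_hat, N = pooled_transition_matrix(histories)
recovery = P_hat[:, IDX["reconciled"]]  # P(next state is reconciled | current state)
for state, p in zip(STATES, recovery):
    print(f"{state:>12} -> reconciled: {p:.2f}")
```

In practice the same function could be run separately on two groups of accounts (for example, a prior-period set and the period under review) so that the recovery probabilities can be compared side by side.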

Customer-vendor transaction chains (related-party testing under AS 2410). Round-tripping schemes generate characteristic cyclic structure in the transaction graph: revenue is recognized against a customer who, directly or through intermediaries, returns the cash to the seller through purchases. When the transaction sequence is encoded as a Markov chain on counterparty states, round-tripping appears as elevated probability of cycles — measurable via the cycle structure of $P$. This is the topic of the Markov Mixture Models for Round-Tripping and Lapping Detection article in this sub-series.
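One simple way to read "cycle structure" is the diagonal of $P^k$, which gives the probability of being back in the starting state after exactly $k$ steps. The sketch below uses that reading; it is an assumption made for illustration, and the matrices and counterparty labels are hypothetical.

```python
import numpy as np

def return_probabilities(P, max_len):
    """Row k-1 holds diag(P^k): the chance of being back in the starting
    state after exactly k steps, for each state."""
    n = P.shape[0]
    out = np.zeros((max_len, n))
    Pk = np.eye(n)
    for k in range(1, max_len + 1):
        Pk = Pk @ P
        out[k - 1] = np.diag(Pk)
    return out

# Hypothetical counterparty states: seller, customer, intermediary
P_diffuse = np.full((3, 3), 1 / 3)        # no preferred routing
P_cyclic = np.array([[0.1, 0.8, 0.1],     # seller -> customer
                     [0.1, 0.1, 0.8],     # customer -> intermediary
                     [0.8, 0.1, 0.1]])    # intermediary -> seller

r_diffuse = return_probabilities(P_diffuse, 3)
r_cyclic = return_probabilities(P_cyclic, 3)
print("3-step return, diffuse:", r_diffuse[2].round(3))
print("3-step return, cyclic: ", r_cyclic[2].round(3))
```

The cyclic matrix returns to its starting state after three steps with probability 0.562, against 0.333 for the diffuse one. An elevated $k$-step return probability of this kind is a candidate round-tripping signal, not evidence of it.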

Building a transition matrix from data

The empirical transition matrix estimator is straightforward. Given a sequence $x_0, x_1, \ldots, x_T$, define:

$$N_{ij} = \sum_{t=0}^{T-1} \mathbf{1}[x_t = i, x_{t+1} = j]$$

Then the maximum-likelihood estimator of $P_{ij}$ is:

$$\hat{P}_{ij} = \frac{N_{ij}}{\sum_{k \in S} N_{ik}}$$

with the convention $\hat{P}_{ij} = 0$ if state $i$ is never observed.
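As a quick worked check of the estimator, take a hypothetical two-state sequence (the labels $A$ and $B$ are illustrative only):

$$x = (A, B, A, A, B, B, A), \qquad T = 6$$

The six observed transitions are $A \to B$, $B \to A$, $A \to A$, $A \to B$, $B \to B$, $B \to A$, so:

$$N = \begin{pmatrix} N_{AA} & N_{AB} \\ N_{BA} & N_{BB} \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad \hat{P} = \begin{pmatrix} 1/3 & 2/3 \\ 2/3 & 1/3 \end{pmatrix}$$

Each row of $\hat{P}$ sums to 1, as the row-stochastic property requires.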

A minimal Python implementation:


import numpy as np
import pandas as pd

def transition_matrix(sequence, states):
    """
    Empirical first-order transition matrix from a sequence of states.

    Parameters
    ----------
    sequence : iterable of state labels
    states   : list of all possible state labels (defines row/col ordering)

    Returns
    -------
    P_hat : np.ndarray of shape (n_states, n_states), row-stochastic
    N     : np.ndarray of transition counts (same shape)
    """
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    N = np.zeros((n, n), dtype=int)
    seq = list(sequence)  # accept any iterable, not only sliceable sequences
    for prev, curr in zip(seq[:-1], seq[1:]):
        N[idx[prev], idx[curr]] += 1
    row_sums = N.sum(axis=1, keepdims=True)
    # Avoid division by zero for never-observed states
    with np.errstate(invalid='ignore', divide='ignore'):
        P_hat = np.where(row_sums > 0, N / row_sums, 0.0)
    return P_hat, N

For audit data, the “sequence” is typically a chronologically ordered journal-entry stream filtered to the relevant period and entity. The state-encoding step — how account classes are bucketed — is where domain knowledge matters most. A too-coarse partition (three states: revenue/expense/balance-sheet) hides everything. A too-fine partition (every individual GL account) makes $\hat{P}$ statistically degenerate. Practical experience: a 7-12 state partition aligned to the entity’s natural chart-of-accounts groupings gives the best signal-to-noise.
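The state-encoding step might look like the sketch below. Everything specific in it is an assumption: the column names `entry_id`, `posting_ts`, and `gl_account`, and the first-digit mapping to account classes, are hypothetical. A real chart of accounts needs its own mapping.

```python
import pandas as pd

# Hypothetical mapping from the first digit of the GL account to an account class
ACCOUNT_CLASS_BY_PREFIX = {
    "1": "cash", "2": "receivable", "4": "revenue", "5": "cogs", "6": "opex",
}

def encode_states(entries):
    """Return the chronologically ordered account-class sequence."""
    # Sort by posting time, breaking ties by entry id for a reproducible order
    ordered = entries.sort_values(["posting_ts", "entry_id"])
    classes = ordered["gl_account"].astype(str).str[0].map(ACCOUNT_CLASS_BY_PREFIX)
    if classes.isna().any():
        unmapped = sorted(ordered.loc[classes.isna(), "gl_account"].astype(str).unique())
        raise ValueError(f"Unmapped GL accounts: {unmapped}")
    return classes.tolist()

# Illustrative journal-entry extract, deliberately out of order
entries = pd.DataFrame({
    "entry_id": [3, 1, 2, 4],
    "posting_ts": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-03"]),
    "gl_account": ["1000", "4000", "2000", "5000"],
})
sequence = encode_states(entries)
print(sequence)
```

Raising on unmapped accounts, rather than dropping them silently, is a deliberate choice here: an account that falls outside the partition changes the sequence, and that should be visible in the workpapers.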

Detecting anomalies via deviation from baseline

The auditor’s question is not “what is the transition matrix?” — it is “does this period’s transition matrix look like the baseline?” Two operational tests.

Frobenius norm of the difference. Given a baseline $P^{(0)}$ (from a prior period or a peer-group benchmark) and an observed $\hat{P}$:

$$d_F(\hat{P}, P^{(0)}) = \sqrt{\sum_{i,j} (\hat{P}_{ij} - P^{(0)}_{ij})^2}$$

This gives a single scalar deviation score. Useful for ranking entities or periods. Less useful for diagnosis — it averages away the location of the anomaly.
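A small sketch of the ranking use case, and of the diagnostic limitation. The entity names and matrices are made up for illustration.

```python
import numpy as np

def frobenius_deviation(P_hat, P_base):
    """Frobenius norm of the difference between two transition matrices."""
    return float(np.linalg.norm(P_hat - P_base, ord="fro"))

P_base = np.array([[0.8, 0.2], [0.3, 0.7]])
observed = {
    "entity_A": np.array([[0.8, 0.2], [0.3, 0.7]]),  # matches the baseline
    "entity_B": np.array([[0.7, 0.3], [0.3, 0.7]]),  # deviates in row 0
    "entity_C": np.array([[0.5, 0.5], [0.6, 0.4]]),  # deviates in both rows
}
scores = {name: frobenius_deviation(P, P_base) for name, P in observed.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)

# Same score, different location: a deviation of equal size in row 1 instead of row 0
entity_D = np.array([[0.8, 0.2], [0.4, 0.6]])
score_D = frobenius_deviation(entity_D, P_base)
print(round(scores["entity_B"], 4), round(score_D, 4))
```

`entity_B` and `entity_D` receive the same score even though different transitions moved, which is why the scalar is suited to triage rather than diagnosis.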

Chi-squared test on transition counts. Under the null hypothesis that the observed sequence comes from the baseline transition matrix:

$$\chi^2 = \sum_{i,j} \frac{(N_{ij} - E_{ij})^2}{E_{ij}}, \quad E_{ij} = N_{i \cdot} \cdot P^{(0)}_{ij}$$

where $N_{i \cdot} = \sum_j N_{ij}$ is the count of transitions originating from state $i$. When the baseline $P^{(0)}$ is specified independently of the observed data (not re-estimated from the same period under test), the statistic is approximately $\chi^2$-distributed with $|S|(|S|-1)$ degrees of freedom under standard regularity conditions — subtracting one constraint per row from the unconstrained $|S|^2$. When $P^{(0)}$ is estimated from a prior period and substantively close to the period under test, or when expected counts are thin in the cells that matter most, the asymptotic distribution is fragile. In those settings, a Monte Carlo null calibrated from the baseline transition matrix is the conservative choice and produces a cleaner audit artifact than bespoke cell pooling.

The chi-squared test localizes the anomaly to specific transitions $(i,j)$ via standardized residuals $(N_{ij} - E_{ij}) / \sqrt{E_{ij}}$. This is the second value-add over Frobenius: not just “is the period anomalous” but “which transitions drive the anomaly.” That’s the diagnostic that survives audit review.
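The residual-based localization can be sketched as follows. The two-state counts and baseline are invented for illustration, `flag_cells` is a hypothetical helper name, and the $|z| > 2$ cutoff is a screening convention rather than a formal test.

```python
import numpy as np

def flag_cells(N, P_base, states, z_threshold=2.0):
    """List (from_state, to_state, z) for cells whose standardized residual
    exceeds the threshold in absolute value, largest first."""
    expected = N.sum(axis=1, keepdims=True) * P_base
    z = np.zeros_like(expected, dtype=float)
    mask = expected > 0  # skip cells the baseline says cannot occur
    z[mask] = (N[mask] - expected[mask]) / np.sqrt(expected[mask])
    flagged = [(states[i], states[j], round(float(z[i, j]), 2))
               for i, j in zip(*np.where(np.abs(z) > z_threshold))]
    return sorted(flagged, key=lambda cell: -abs(cell[2]))

states = ["A", "B"]
P_base = np.array([[0.9, 0.1], [0.5, 0.5]])
N = np.array([[80, 20], [50, 50]])  # observed transition counts

flagged = flag_cells(N, P_base, states)
print(flagged)
```

Here row A has 100 transitions, so 10 transitions from A to B are expected and 20 are observed, giving $z = 10/\sqrt{10} \approx 3.16$. The A-to-A cell falls short by the same 10 transitions but against an expectation of 90, so its residual stays near $-1.05$ and is not flagged.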

Practical implementation. Under small-sample conditions (any $E_{ij} < 5$ is the standard warning threshold), exact tests or simulation-based p-values are preferable — the asymptotic $\chi^2$ approximation is unreliable in the cells that matter most. For multi-period testing, Holm-Bonferroni correction across the tested periods or entities controls family-wise error rate at the conventional 0.05 level. At the single-entity worked-example level, the more important discipline is using a null distribution that matches the baseline-generation story.
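For reference, a minimal sketch of the Holm step-down procedure (Holm, 1979) applied to a set of per-period p-values. The p-values below are made up, and the function name is illustrative.

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Return a boolean array: True where the hypothesis is rejected
    under Holm's step-down procedure at family-wise level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)  # smallest p-value first
    reject = np.zeros(m, dtype=bool)
    for rank, i in enumerate(order):
        if p[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

# Hypothetical p-values for four tested periods
p_values = [0.03, 0.001, 0.20, 0.012]
reject = holm_bonferroni(p_values)
print(reject.tolist())
```

The period with $p = 0.03$ would pass an unadjusted 0.05 threshold but is not rejected here, because by the time it is reached the step-down threshold is $0.05 / 2 = 0.025$.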

Where Markov chains fail in audit

The first-order memoryless assumption is the binding constraint. Several common audit scenarios violate it.

Long-memory schemes. Lapping (concealing misappropriated customer receipts by applying later customers’ payments to earlier customers’ accounts) builds up over weeks or months. A first-order model conditions each transition only on the immediately preceding state, so it cannot see the multi-step build-up, miscategorizes the pattern as ordinary receivables activity, and underdetects. Solutions: higher-order Markov chains (the state space grows exponentially with the order), or an HMM with a latent “lapping regime” state (the topic of the Hidden Markov Models for Earnings-Management Regime Detection in Public-Company Financials article).
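To make the state-space growth concrete, the sketch below re-encodes a sequence so that each state is a (previous, current) pair, which turns a second-order chain into a first-order chain on pairs. It also counts free parameters by order. The helper names and the 7-state figure are illustrative assumptions.

```python
def second_order_sequence(sequence):
    """Re-encode a sequence so each state is the pair (previous, current)."""
    seq = list(sequence)
    return list(zip(seq[:-1], seq[1:]))

def free_parameters(n_states, order):
    """Free transition probabilities in an order-k chain:
    n_states**order histories, each with n_states - 1 free probabilities."""
    return n_states ** order * (n_states - 1)

seq = ["Revenue", "AR", "Cash", "AR", "Cash"]
pairs = second_order_sequence(seq)
print(pairs)

# Parameter count for a 7-state partition at orders 1, 2, 3
for order in (1, 2, 3):
    print(order, free_parameters(7, order))
```

At 7 states the count goes from 42 to 294 to 2,058 free parameters, so the data needed to estimate each cell reliably grows just as quickly.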

Regime changes. A legitimate business change — entering a new market, completing an acquisition — produces a genuine shift in the transition matrix that is not fraud. Detecting “shift happened” is straightforward; classifying “shift is legitimate vs. fraudulent” requires either external information (auditor knowledge of the underlying business) or auxiliary models (e.g., Beneish M-score or Dechow F-score factors applied to the same period).

Small samples. A monthly period with only a few hundred posted entries against a 12-state partition produces $\hat{P}$ estimates with high cell-level variance. The chi-squared test is underpowered; deviation scores look noisy. Aggregating across periods restores power but trades away temporal resolution.
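A rough way to quantify the cell-level variance is the large-sample binomial approximation $\sqrt{\hat{P}_{ij}(1 - \hat{P}_{ij}) / N_{i \cdot}}$ for each cell. The sketch below uses that approximation; the counts are invented, and 300 entries is an assumed stand-in for "a few hundred".

```python
import numpy as np

def transition_standard_errors(P_hat, N):
    """Approximate standard error sqrt(p(1-p)/n_i) per cell, where n_i is the
    number of transitions leaving state i. NaN for never-observed rows."""
    row_counts = N.sum(axis=1, keepdims=True).astype(float)
    with np.errstate(invalid="ignore", divide="ignore"):
        se = np.sqrt(P_hat * (1.0 - P_hat) / row_counts)
    return np.where(row_counts > 0, se, np.nan)

# Illustrative counts: a thin row (25 transitions) and a thicker row (100)
N = np.array([[5, 20], [50, 50]])
P_hat = N / N.sum(axis=1, keepdims=True)
se = transition_standard_errors(P_hat, N)
print(se.round(3))

# Average transitions per cell: 300 entries, 12-state partition (assumed figures)
avg_per_cell = (300 - 1) / 12 ** 2
print(round(avg_per_cell, 2))
```

With 25 transitions leaving a state, an estimated probability of 0.20 carries a standard error of about 0.08. At roughly two transitions per cell on average, most cells in the 12-state case are far thinner than that.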

Non-stationarity. Seasonality, growth, and ordinary business-cycle variation all produce structured non-stationarity in transition matrices. Naive comparison to the immediately preceding period overstates anomalies, because seasonal shifts such as quarter-end close activity register as deviations. Comparing against a same-period-prior-year baseline, or against a seasonally-decomposed expected matrix, mitigates but doesn’t fully solve the problem.
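A toy illustration of why the choice of baseline period matters under seasonality. The matrices are hypothetical and constructed so that the seasonal pattern repeats exactly from year to year, which real data will not do.

```python
import numpy as np

def frobenius(P, Q):
    """Frobenius norm of the difference between two transition matrices."""
    return float(np.linalg.norm(P - Q, ord="fro"))

# Hypothetical matrices: mid-quarter posting vs. quarter-end close activity
P_mid_quarter = np.array([[0.8, 0.2], [0.4, 0.6]])
P_quarter_end = np.array([[0.5, 0.5], [0.2, 0.8]])

# Keyed by (year, month); the seasonal pattern repeats across years
matrices = {
    (2023, 11): P_mid_quarter, (2023, 12): P_quarter_end,
    (2024, 11): P_mid_quarter, (2024, 12): P_quarter_end,
}

vs_prior_month = frobenius(matrices[(2024, 12)], matrices[(2024, 11)])
vs_prior_year = frobenius(matrices[(2024, 12)], matrices[(2023, 12)])
print(f"December vs. prior month:       {vs_prior_month:.3f}")
print(f"December vs. prior-year December: {vs_prior_year:.3f}")
```

In this stylized setup the prior-month comparison reports a deviation of about 0.51 that is purely seasonal, while the same-period-prior-year comparison reports none. With real data, growth and business-cycle effects would leave a residual in the second comparison as well.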

A worked example on synthetic data

To make this concrete, consider a synthetic dataset: 1,000 journal entries across a 5-state account partition $S = \{\text{Cash, AR, Revenue, COGS, Inventory}\}$. The baseline (fraud-free) transition matrix reflects normal sales-cycle posting:


             Cash    AR      Revenue   COGS    Inventory
Cash       [ 0.10    0.10    0.05      0.05    0.70 ]    (mostly Cash → Inventory purchases)
AR         [ 0.60    0.05    0.30      0.03    0.02 ]    (mostly AR → Cash collections + Revenue postings)
Revenue    [ 0.10    0.70    0.05      0.10    0.05 ]    (Revenue → AR is the dominant transition)
COGS       [ 0.05    0.02    0.03      0.15    0.75 ]    (COGS → Inventory reconciliation)
Inventory  [ 0.20    0.05    0.10      0.55    0.10 ]    (Inventory → COGS on sales recognition)

We generate 950 entries from this baseline and inject 50 entries forming a round-tripping pattern: Revenue → AR → Cash → COGS → Revenue → AR → Cash → COGS → … (synthetic cycle that should be implausible under the baseline).


import numpy as np
import pandas as pd

# All sampling below uses explicitly seeded np.random.default_rng generators,
# so no global NumPy seed is needed.

states = ['Cash', 'AR', 'Revenue', 'COGS', 'Inventory']
P_baseline = np.array([
    [0.10, 0.10, 0.05, 0.05, 0.70],   # Cash: mostly → Inventory purchases
    [0.60, 0.05, 0.30, 0.03, 0.02],   # AR: → Cash collections + Revenue postings
    [0.10, 0.70, 0.05, 0.10, 0.05],   # Revenue: → AR is dominant
    [0.05, 0.02, 0.03, 0.15, 0.75],   # COGS: → Inventory reconciliation
    [0.20, 0.05, 0.10, 0.55, 0.10],   # Inventory: → COGS on sales recognition
])
assert np.allclose(P_baseline.sum(axis=1), 1.0)

def transition_matrix(sequence, states):
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    N = np.zeros((n, n), dtype=int)
    for prev, curr in zip(sequence[:-1], sequence[1:]):
        N[idx[prev], idx[curr]] += 1
    row_sums = N.sum(axis=1, keepdims=True)
    with np.errstate(invalid='ignore', divide='ignore'):
        P_hat = np.where(row_sums > 0, N / row_sums, 0.0)
    return P_hat, N

def sample_chain(P, states, n, start=0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    seq = [start]
    for _ in range(n - 1):
        seq.append(rng.choice(len(states), p=P[seq[-1]]))
    return [states[s] for s in seq]

def chi2_transition_statistic(N, P_baseline):
    expected = N.sum(axis=1, keepdims=True) * P_baseline
    mask = expected > 0
    stat = float(((N[mask] - expected[mask]) ** 2 / expected[mask]).sum())
    residuals = np.zeros_like(expected, dtype=float)
    residuals[mask] = (N[mask] - expected[mask]) / np.sqrt(expected[mask])
    return stat, expected, residuals

def monte_carlo_transition_pvalue(P_baseline, states, n_states_in_sequence, observed_stat,
                                  n_simulations=2000, seed=123):
    rng = np.random.default_rng(seed)
    null_stats = np.zeros(n_simulations)
    for i in range(n_simulations):
        sim_seq = sample_chain(P_baseline, states, n_states_in_sequence, rng=rng)
        _, sim_N = transition_matrix(sim_seq, states)
        null_stats[i], _, _ = chi2_transition_statistic(sim_N, P_baseline)
    p_value = (1 + np.sum(null_stats >= observed_stat)) / (n_simulations + 1)
    return float(p_value), null_stats

baseline_seq = sample_chain(P_baseline, states, 950, rng=np.random.default_rng(42))
round_trip = (['Revenue', 'AR', 'Cash', 'COGS'] * 13)[:50]
observed_seq = baseline_seq + round_trip  # 1000 states → 999 transitions

P_obs, N_obs = transition_matrix(observed_seq, states)
d_F = np.linalg.norm(P_obs - P_baseline, ord='fro')
chi2_stat, expected, residuals = chi2_transition_statistic(N_obs, P_baseline)
mc_p_value, null_stats = monte_carlo_transition_pvalue(
    P_baseline, states, len(observed_seq), chi2_stat, n_simulations=2000, seed=123
)

print(f"Frobenius distance: {d_F:.4f}")
print(f"Transition chi-squared statistic: {chi2_stat:.2f}")
print(f"Monte Carlo p-value: {mc_p_value:.4f}")
print("Standardized residuals (|z| > 2 indicates anomalous cells):")
print(pd.DataFrame(residuals, index=states, columns=states).round(2))

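Running this with the fixed seeds above produces deterministic output. The expected pattern is a Frobenius distance elevated relative to what pure baseline sampling yields, a transition chi-squared statistic in the upper tail of the Monte Carlo null, and the largest positive standardized residuals in the (Cash, COGS) and (COGS, Revenue) cells. Those are precisely the transitions that the injected round-tripping pattern over-represents: each has a baseline probability of at most 0.05, yet each occurs 12 times within the 50 injected entries.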

In a real audit setting, the same workflow runs against the entity’s posting stream filtered to the period of interest, with the baseline either drawn from the prior year’s clean period or from a peer-group average. The output is two artifacts: a single deviation score for triage ranking across many entities, and a cell-level residual map that points the engagement team at the specific account-class transitions to investigate.

Practitioner close

This first-order apparatus is most useful when the engagement team needs a disciplined way to ask whether posting order itself contains signal. It is not a substitute for control testing, walkthroughs, or journal-entry support. Its role is narrower and more valuable: identify which transition cells, periods, or entities deserve deeper work, and document that triage in a way another reviewer can reproduce. If the baseline is unstable, if the sequence encoding is weak, or if the fraud hypothesis depends on memory longer than one step, the right answer is to escalate to the higher-order, HMM, or mixture models rather than force first-order Markov to do work it cannot do.

Authority:

Mathematical foundations:

Norris, J.R. (1997). Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
Ross, S.M. (2019). Introduction to Probability Models (12th ed.). Academic Press.
Holm, S. (1979). “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics, 6(2), 65-70.

Audit and fraud-detection literature:

Beneish, M.D. (1999). “The Detection of Earnings Manipulation.” Financial Analysts Journal, 55(5), 24-36.
Dechow, P.M., Ge, W., Larson, C.R., & Sloan, R.G. (2011). “Predicting Material Accounting Misstatements.” Contemporary Accounting Research, 28(1), 17-82.
Cecchini, M., Aytug, H., Koehler, G.J., & Pathak, P. (2010). “Detecting Management Fraud in Public Companies.” Management Science, 56(7), 1146-1160.

Standards and regulatory framework:

PCAOB AS 2301 — The Auditor’s Responses to the Risks of Material Misstatement.
PCAOB AS 2305 — Substantive Analytical Procedures.
PCAOB AS 2401 — Consideration of Fraud in a Financial Statement Audit.
PCAOB AS 2410 — Related Parties.
COSO (2013). Internal Control — Integrated Framework, Principle 16 (Monitoring Activities).

Sheepdog Prosperity Partners LLC
