Machine Learning Techniques for Detecting Financial Statement Fraud Patterns
Keywords: financial fraud detection, graph neural networks, anomaly detection, narrative analysis, hybrid machine learning, forensic accounting
Abstract
This research introduces a novel, hybrid methodology for detecting financial statement fraud by integrating machine learning with principles from forensic accounting
and behavioral finance. Conventional approaches rely primarily on quantitative financial ratios or on supervised learning over labeled datasets, which are scarce and
often biased. This paper instead proposes a three-tiered detection framework. First, we employ
an unsupervised anomaly detection system using a modified Isolation Forest algorithm
that incorporates temporal consistency checks, designed to identify outliers not just in
magnitude but in the evolution of financial metrics over time. Second, we develop a
semi-supervised graph neural network (GNN) model that constructs a relational graph
of a firm based on its disclosed transactions, director affiliations, and auditor history,
learning to propagate potential fraud signals across structurally similar entities. Third,
we introduce a ‘narrative coherence’ layer, which uses a simplified transformer architecture to analyze the qualitative disclosures in management's discussion and analysis
(MD&A) sections, flagging inconsistencies between the quantitative results and the qualitative explanations. Our dataset, a proprietary compilation of SEC filings from 1995
to 2004, includes both confirmed fraud cases and a large set of non-fraudulent controls.
Results demonstrate that our hybrid model achieves a 94.7% detection rate on a holdout test set, with a false positive rate of 3.2%, significantly outperforming benchmark
models like logistic regression on Beneish M-Score components (78.1% detection) and
a standard autoencoder anomaly detector (85.3% detection). The GNN component
proved particularly effective in identifying ‘contagion’ patterns, where fraud in one entity is linked to similar reporting anomalies in affiliated firms. This work’s primary
novelty lies in its multi-modal, relational, and explainable approach, moving beyond
treating financial statements as isolated numerical tables and instead modeling the firm
as a complex, interconnected system of quantitative data, qualitative narratives, and
inter-entity relationships. The findings suggest that the next generation of fraud detection tools must account for the contextual and relational fabric of financial reporting
to keep pace with increasingly sophisticated fraudulent schemes.
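
As a rough illustration of the first tier, the sketch below scores firm-years with a plain isolation forest whose inputs are both the level of a metric and its year-over-year change, so that a point can be isolated because of how it evolved and not only because of its magnitude. This is not the modified algorithm evaluated in the paper: the abstract does not specify how the temporal consistency checks are built in, so the feature construction, the function names (`temporal_rows`, `anomaly_scores`, `demo_rows`), the parameter defaults, and the synthetic ratio series are all assumptions made for illustration.

```python
import math
import random

def temporal_rows(firm_id, series):
    """Pair each year's level with its change from the prior year (assumed feature set)."""
    return [(firm_id, t, (series[t], series[t] - series[t - 1]))
            for t in range(1, len(series))]

def avg_path(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def grow(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return len(points)  # leaf: store how many points ended up here
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return len(points)  # cannot split on a constant feature
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    return (dim, cut,
            grow(left, depth + 1, max_depth, rng),
            grow(right, depth + 1, max_depth, rng))

def path_length(tree, point):
    """Depth at which the point reaches a leaf, plus a correction for the leaf size."""
    depth = 0
    while isinstance(tree, tuple):
        dim, cut, left, right = tree
        tree = left if point[dim] < cut else right
        depth += 1
    return depth + avg_path(tree)

def anomaly_scores(points, n_trees=200, sample_size=256, seed=0):
    """Isolation-forest score in (0, 1); values near 1 mean the point is easy to isolate."""
    k = min(sample_size, len(points))
    if k < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(k))
    trees = [grow(rng.sample(points, k), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(k)
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]

def demo_rows():
    """Synthetic ratio series for 20 hypothetical firms; firm 7 jumps in its last year."""
    rows = []
    for firm in range(20):
        series = [0.20 + 0.02 * math.sin(1.3 * firm + 0.7 * t) for t in range(6)]
        if firm == 7:
            series[-1] = 0.60  # made-up discontinuity, not data from the paper
        rows.extend(temporal_rows(firm, series))
    return rows

rows = demo_rows()
scores = anomaly_scores([features for _, _, features in rows])
top = max(range(len(rows)), key=scores.__getitem__)
print("most anomalous firm-year:", rows[top][:2])
```

Including the change as a separate feature is only one simple way to expose temporal behaviour to the trees. A production system would more likely rely on an established implementation such as scikit-learn's `IsolationForest` and a richer set of lagged or trend features.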