Developing a Deep Learning–Based Model for Predicting and Detecting Fraud in Financial Statements
Pages 1-10
https://doi.org/10.22034/kes.2026.2083828.1098
Narges Mehrabi Hashtchin, Gholamreza Soleymani Amiri
Abstract
This study develops a data-driven framework for financial statement fraud detection by benchmarking classical machine learning, deep learning, and hybrid classifiers under a unified, leakage-resistant evaluation protocol. Fraud cases are identified from the U.S. Securities and Exchange Commission's Accounting and Auditing Enforcement Releases (AAERs) and matched with Compustat data for 1991–2014, yielding 122,526 firm-year observations, of which 902 are confirmed fraud cases. Four structured-input configurations are evaluated: 28 raw financial statement items, 14 financial ratios, their combined set (28 + 14), and a parsimonious seven-feature subset (six ratios plus Altman's Z-score). Features are selected using minimum redundancy–maximum relevance (mRMR), class imbalance is addressed through cost-sensitive learning, and performance is assessed with a firm-level 80/20 train–test split and stratified group-based five-fold cross-validation within the training set. The empirical results indicate that deep and hybrid models consistently outperform classical tabular baselines, a pattern that points to non-linear, interaction-driven fraud signals. The Transformer delivers the highest and most stable overall performance, reaching an accuracy of 0.98898 and an F1-score of 0.51087 under the seven-feature configuration. The combined raw-item and ratio inputs outperform ratios alone, indicating that raw accounting items carry incremental predictive value, while the best overall results are obtained with the parsimonious seven-feature subset. Collectively, the findings support the study's hypotheses and demonstrate the effectiveness of attention-based modeling for financial statement fraud detection.
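The seven-feature configuration includes Altman's Z-score. The abstract does not say which Z-score variant is used, so the sketch below assumes the original 1968 specification for publicly traded manufacturers; the `FirmYear` container and its field names are hypothetical placeholders, not Compustat mnemonics.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical container; field names are illustrative, not Compustat mnemonics."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    total_liabilities: float
    sales: float
    total_assets: float


def altman_z(f: FirmYear) -> float:
    """Original (1968) Altman Z-score, assumed here; other variants
    (e.g. for private or non-manufacturing firms) use different weights."""
    if f.total_assets <= 0 or f.total_liabilities <= 0:
        raise ValueError("total_assets and total_liabilities must be positive")
    x1 = f.working_capital / f.total_assets            # liquidity
    x2 = f.retained_earnings / f.total_assets          # cumulative profitability
    x3 = f.ebit / f.total_assets                       # operating return on assets
    x4 = f.market_value_equity / f.total_liabilities   # equity cushion over liabilities
    x5 = f.sales / f.total_assets                      # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


if __name__ == "__main__":
    firm = FirmYear(20.0, 30.0, 10.0, 120.0, 60.0, 150.0, 100.0)
    print(round(altman_z(firm), 2))  # 3.69
```

In the toy firm above the ratios are 0.2, 0.3, 0.1, 2.0 and 1.5, so the score is 0.24 + 0.42 + 0.33 + 1.2 + 1.5 = 3.69.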
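Features are ranked with mRMR. As a minimal sketch, assuming the difference form of the criterion and scikit-learn's nearest-neighbour mutual-information estimators (the abstract specifies neither), greedy selection adds at each step the feature whose relevance to the fraud label, minus its mean redundancy with the features already chosen, is largest. The function `mrmr_select` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression


def mrmr_select(X: np.ndarray, y: np.ndarray, k: int, random_state: int = 0) -> list:
    """Greedy mRMR, difference form: at each step add the feature j that
    maximizes I(x_j; y) - mean_{s in S} I(x_j; x_s) over the selected set S."""
    n_features = X.shape[1]
    k = min(k, n_features)
    # Relevance: MI between each (continuous) feature and the binary label.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean MI between candidate j and each selected feature.
            redundancy = mutual_info_regression(
                X[:, selected], X[:, j], random_state=random_state
            ).mean()
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Toy data: x1 nearly duplicates x0; x0 and x2 both drive y; x3 is noise.
    rng = np.random.default_rng(1)
    x0, x2, x3 = rng.normal(size=(3, 2000))
    x1 = x0 + rng.normal(scale=0.05, size=2000)
    X = np.column_stack([x0, x1, x2, x3])
    y = (x0 + x2 > 0).astype(int)
    print(mrmr_select(X, y, k=2))  # expected to hold 2 and exactly one of 0 or 1
```

The difference form is one of two common mRMR criteria (the other divides relevance by redundancy); which one the study applies to its 42 candidate features is not stated in the abstract.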