Developing a Deep Learning–Based Model for Predicting and Detecting Fraud in Financial Statements

Mehrabi Hashtchin, Narges; Soleymani Amiri, Gholamreza

doi:10.22034/kes.2026.2083828.1098


Document Type: Original Article

Authors

Narges Mehrabi Hashtchin ¹

Gholamreza Soleymani Amiri ²

¹ Ph.D. in Accounting, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran.

² Professor, Faculty of Social Sciences and Economics, Alzahra University, Tehran, Iran.


Abstract

This study develops a data-driven framework for financial statement fraud detection by benchmarking machine learning, deep learning, and hybrid classifiers under a unified, leakage-resistant evaluation protocol. Fraud cases are identified from the U.S. Securities and Exchange Commission’s Accounting and Auditing Enforcement Releases (AAERs) and matched with Compustat data for 1991–2014, yielding 122,526 firm-year observations, of which 902 are confirmed fraud cases. Four structured-input configurations are evaluated: 28 raw financial statement items, 14 financial ratios, the combined set (28+14), and a parsimonious seven-feature subset (six ratios plus Altman’s Z-score). Features are selected using minimum redundancy–maximum relevance (mRMR), class imbalance is addressed through cost-sensitive learning, and performance is assessed on a firm-level 80/20 train–test split, with stratified group-based five-fold cross-validation within the training set. The empirical results indicate that deep and hybrid models consistently outperform classical tabular baselines, which suggests that fraud signals are non-linear and driven by feature interactions. The Transformer achieves the highest and most stable overall performance, reaching an accuracy of 0.98898 and an F1-score of 0.51087 under the seven-feature configuration. Combining raw items with ratios outperforms ratios alone, implying that raw accounting items carry incremental predictive value, although the best overall results are obtained with the parsimonious seven-feature subset. Collectively, the findings support the study’s hypotheses and demonstrate the effectiveness of attention-based modeling for financial statement fraud detection.
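The sample described above contains 902 fraud cases among 122,526 firm-years. That means fraud makes up only about 0.74% of observations. On the full sample, a trivial classifier that never flags fraud would already reach roughly 0.9926 accuracy. Under this imbalance, the F1-score is therefore the more informative of the two reported figures.

The standard-library Python sketch below makes that arithmetic explicit. It also illustrates, under stated assumptions, two ingredients the abstract names:

- Inverse-frequency class weights, as one common form of cost-sensitive learning.
- Firm-level splitting, so that no firm's years appear in both training and evaluation data.

The function names (`imbalance_summary`, `firm_level_split`, `stratified_group_kfold`), the weighting scheme, and the greedy fold-assignment heuristic are hypothetical illustrations, not the authors' implementation. The study's actual cost matrix and the fraud ratio of its test split are not given in the abstract.

```python
import random
from collections import defaultdict

# Sample sizes reported in the abstract (full 1991-2014 sample).
N_OBS = 122_526
N_FRAUD = 902


def imbalance_summary(n_obs: int, n_fraud: int) -> dict:
    """Fraud prevalence, the accuracy of a model that never flags fraud,
    and inverse-frequency ("balanced") class weights (an assumed cost scheme)."""
    n_clean = n_obs - n_fraud
    prevalence = n_fraud / n_obs
    return {
        "prevalence": prevalence,
        "majority_baseline_accuracy": 1.0 - prevalence,
        # Each class receives the same total weight (n_obs / 2).
        "weight_nonfraud": n_obs / (2 * n_clean),
        "weight_fraud": n_obs / (2 * n_fraud),
    }


def f1_score(y_true, y_pred) -> float:
    """F1 for the fraud class (label 1): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def firm_level_split(firm_ids, test_frac=0.2, seed=0):
    """Hold out a random fraction of firms, so no firm's years straddle train and test."""
    firms = sorted(set(firm_ids))
    random.Random(seed).shuffle(firms)
    test_firms = set(firms[: round(test_frac * len(firms))])
    train_idx = [i for i, f in enumerate(firm_ids) if f not in test_firms]
    test_idx = [i for i, f in enumerate(firm_ids) if f in test_firms]
    return train_idx, test_idx


def stratified_group_kfold(firm_ids, labels, k=5, seed=0):
    """Greedy heuristic (hypothetical): assign whole firms to k folds, spreading
    fraud firm-years evenly; returns each fold's validation indices
    (the fold's training set is the complement)."""
    fraud_per_firm = defaultdict(int)
    rows_per_firm = defaultdict(int)
    for f, y in zip(firm_ids, labels):
        fraud_per_firm[f] += y
        rows_per_firm[f] += 1
    firms = sorted(rows_per_firm)
    random.Random(seed).shuffle(firms)
    # Stable sort: fraud-heavy firms are placed first, random order within ties.
    firms.sort(key=lambda f: fraud_per_firm[f], reverse=True)
    fold_fraud, fold_rows, fold_of = [0] * k, [0] * k, {}
    for f in firms:
        if fraud_per_firm[f] > 0:
            # Fraud firms go to the fold with the fewest fraud cases so far.
            j = min(range(k), key=lambda i: (fold_fraud[i], fold_rows[i]))
        else:
            # Clean firms go to the fold with the fewest rows so far.
            j = min(range(k), key=lambda i: fold_rows[i])
        fold_of[f] = j
        fold_fraud[j] += fraud_per_firm[f]
        fold_rows[j] += rows_per_firm[f]
    return [[i for i, f in enumerate(firm_ids) if fold_of[f] == j] for j in range(k)]


summary = imbalance_summary(N_OBS, N_FRAUD)
print(f"fraud prevalence: {summary['prevalence']:.4%}")
print(f"never-flag-fraud accuracy: {summary['majority_baseline_accuracy']:.4f}")
```

Where scikit-learn is available, `GroupShuffleSplit` and `StratifiedGroupKFold` in `sklearn.model_selection` provide maintained versions of the same two splitting ideas. Likewise, `class_weight="balanced"` produces the same inverse-frequency weights as `imbalance_summary`.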

Keywords

Fraudulent Financial Statements

Machine Learning

Deep Learning

Imbalanced Datasets

Subjects

Artificial Intelligence and Ethics in Financial Affairs

Volume 3, Issue 1
April 2026
Pages 1-10


Knowledge Economy Studies
