Machine Learning Models Evaluating Environmental Provisions Reporting Consistency

Authors

  • Rachel Reed

Keywords

environmental reporting, machine learning, consistency evaluation, natural language processing, anomaly detection, corporate sustainability

Abstract

This research introduces a novel methodological framework that applies machine learning techniques to the previously unexplored problem of evaluating the internal consistency of environmental provisions reporting within corporate sustainability documents. Traditional assessments of environmental, social, and governance (ESG) disclosures have relied heavily on manual content analysis, expert scoring, and checklist-based audits, which are often subjective, resource-intensive, and limited in their ability to detect subtle inconsistencies across different sections of lengthy reports. This paper proposes a cross-disciplinary application of natural language processing (NLP) and anomaly detection algorithms, originally developed for software code analysis and network security, to the domain of corporate environmental communication. We formulate the problem not as a simple classification of report quality but as a multi-dimensional consistency evaluation, examining alignment between quantitative targets, qualitative commitments, temporal references, and risk disclosures across a single document. Our methodology employs a hybrid pipeline combining transformer-based embeddings for semantic similarity, graph neural networks to model relational dependencies between report sections, and isolation forest algorithms to flag anomalous discrepancies that may indicate greenwashing or unintentional misreporting. We train and validate our models on a unique corpus of 1,200 corporate sustainability reports drawn from the Global Reporting Initiative database covering 1999-2004, manually annotated for consistency by a panel of environmental accounting experts. Results demonstrate that our ensemble model achieves a 0.89 F1-score in identifying materially inconsistent reports, significantly outperforming traditional keyword-matching baselines (0.62 F1-score) and human expert agreement benchmarks (0.78 Fleiss' kappa). Furthermore, the model uncovers previously unrecognized patterns of 'selective consistency', in which companies exhibit high internal alignment on easily achievable targets while showing significant dissonance on more stringent or costly environmental commitments. This research contributes a fully automated, scalable tool for regulators, investors, and auditors to assess reporting integrity, and establishes a new paradigm for applying computational inconsistency detection to qualitative corporate disclosures. The findings also provide novel empirical evidence on the structural patterns of environmental reporting in the early 2000s, offering a diagnostic baseline prior to the widespread standardization of ESG frameworks.
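The core idea of the pipeline — embedding report sections and flagging the ones that are semantically out of line with the rest of the document — can be illustrated with a minimal toy sketch. This is not the paper's implementation: the section names, hand-made vectors, cosine-similarity scoring, and threshold below are illustrative stand-ins for transformer embeddings and the isolation-forest step described in the abstract.

```python
# Toy sketch of section-level consistency flagging (illustrative only).
# Real pipelines would embed each section with a transformer model and
# score anomalies with an isolation forest; here hand-made vectors and a
# mean-cosine-similarity cutoff stand in for both.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def flag_inconsistent(sections, threshold=0.5):
    """Return names of sections whose mean similarity to every other
    section falls below `threshold` (a hypothetical cutoff)."""
    flagged = []
    names = list(sections)
    for name in names:
        others = [sections[n] for n in names if n != name]
        mean_sim = sum(cosine(sections[name], o) for o in others) / len(others)
        if mean_sim < threshold:
            flagged.append(name)
    return flagged

# Hypothetical section "embeddings": the risk section points away
# from the targets/commitments/timeline cluster.
doc = {
    "targets":     [0.9, 0.1, 0.0],
    "commitments": [0.8, 0.2, 0.1],
    "timeline":    [0.85, 0.15, 0.05],
    "risk":        [-0.1, 0.9, 0.4],
}
print(flag_inconsistent(doc))  # prints ['risk']
```

In the paper's framing, a flagged section is not automatically greenwashing; it is a candidate discrepancy for an auditor to review, which is why the abstract reports precision/recall trade-offs (F1) against expert annotations rather than treating the flags as ground truth.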

Published

2025-08-15

Section

Articles

How to Cite

Machine Learning Models Evaluating Environmental Provisions Reporting Consistency. (2025). Gjstudies, 1(1), 4. https://gjrstudies.org/index.php/gjstudies/article/view/348