Leakage and the reproducibility crisis in machine-learning-based science

AI-Generated Summary

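Machine-learning (ML) methods have gained prominence in the quantitative sciences. The paper includes a reproducibility study of civil war prediction, a field where complex ML models are believed to vastly outperform traditional statistical models such as logistic regression (LR). When the errors in those studies are corrected, the complex ML models do not perform substantively better than decades-old LR models.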


Key Findings

1. ML-based science is subject to many known methodological pitfalls, including data leakage.
2. The authors systematically investigate reproducibility issues in ML-based science.
3. A survey of the literature in fields that have adopted ML methods identifies 17 fields where leakage has been found, collectively affecting 294 papers and, in some cases, leading to wildly overoptimistic conclusions.
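To make "data leakage" concrete, here is a minimal, illustrative Python sketch (not code from the paper) of one commonly described form: selecting features on the full dataset, test rows included, before evaluating on a held-out test set. The data are pure noise, so features carry no real signal and any above-chance test accuracy is spurious. The dataset sizes, the scoring rule, the nearest-centroid classifier, and all function names are hypothetical choices made for this example.

```python
# Illustrative sketch of feature-selection leakage on pure-noise data.
# All helper names and parameters are hypothetical, chosen for this example.
import random


def make_noise_data(rng, n_samples, n_features):
    """Features are independent Gaussian noise, unrelated to the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    # Alternating labels keep every contiguous half class-balanced.
    y = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
    return X, y


def select_features(X, y, k):
    """Indices of the k features with the largest |sum_i y_i * x_ij|."""
    n_features = len(X[0])
    scores = [abs(sum(yi * row[j] for row, yi in zip(X, y))) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, feats):
    """Fit class centroids on training rows only; return test accuracy."""
    def centroid(label):
        rows = [row for row, yi in zip(X_train, y_train) if yi == label]
        return [sum(row[j] for row in rows) / len(rows) for j in feats]

    pos, neg = centroid(1), centroid(-1)
    correct = 0
    for row, yi in zip(X_test, y_test):
        x = [row[j] for j in feats]
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        pred = 1 if d_pos < d_neg else -1
        correct += pred == yi
    return correct / len(y_test)


def run_trial(seed, n_train=50, n_test=50, n_features=1000, k=20):
    """Return (leaky_accuracy, clean_accuracy) for one random dataset."""
    rng = random.Random(seed)
    X, y = make_noise_data(rng, n_train + n_test, n_features)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    # Leaky: feature selection sees the test rows and their labels.
    leaky_feats = select_features(X, y, k)
    # Clean: feature selection sees the training rows only.
    clean_feats = select_features(X_tr, y_tr, k)

    return (nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, leaky_feats),
            nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, clean_feats))


if __name__ == "__main__":
    results = [run_trial(seed) for seed in range(10)]
    leaky = sum(r[0] for r in results) / len(results)
    clean = sum(r[1] for r in results) / len(results)
    print(f"leaky feature selection: mean test accuracy {leaky:.2f}")
    print(f"clean feature selection: mean test accuracy {clean:.2f}")
```

Because the labels are random, the clean pipeline should land near 50% accuracy. The leaky pipeline typically scores well above that, because its features were chosen partly for how well they happen to separate the test labels, so the held-out set is no longer independent of model construction.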

Why It Matters

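This research shows how data leakage can make ML-based scientific results look far more accurate than they are, with direct implications for how much trust to place in published ML findings and for how such studies should be evaluated and reported.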

This summary is based on the paper's publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex


Article Details

Source	OpenAlex
Category	🤖 Artificial Intelligence
Published	Aug 4, 2023
Journal	Patterns
DOI	10.1016/j.patter.2023.100804
Citations	649
Authors	Sayash Kapoor, Arvind Narayanan
