Leakage and the reproducibility crisis in machine-learning-b...

🤖 Plain-English Summary

Machine-learning (ML) methods have gained prominence in the quantitative sciences. Finally, we conduct a reproducibility study of civil war prediction, where complex ML models are believed to vastly outperform traditional statistical models such as logistic regression (LR).

🔑 Key Findings

However, there are many known methodological pitfalls, including data leakage, in ML-based science.
We systematically investigate reproducibility issues in ML-based science.
Through a survey of literature in fields that have adopted ML methods, we find 17 fields where leakage has been found, collectively affecting 294 papers and, in some cases, leading to wildly overoptimistic conclusions.

💡 Why This Matters

This research advances how AI systems learn, reason, and solve problems — with direct implications for automation and scientific discovery.

Read the full paper
Access the original peer-reviewed research via OpenAlex.

View on DOI ↗

📜 Copyright Notice: This page shows only metadata (title, authors, journal, date) and an original AI-generated summary. No abstract or full article text is copied. The original research is the intellectual property of its authors and publisher. ScienceTrace does not reproduce copyrighted content.

← More Artificial Intelligence All Research Articles

📋 Article Details

Category	🤖 Artificial Intelligence
Published	Aug 04, 2023
Journal	Patterns
Authors	Sayash Kapoor, Arvind Narayanan
DOI	10.1016/j.patter.2023.100804
Citations	649
Source	OpenAlex

🗂️ Research Categories

🤖 Artificial Intelligence 🧬 Medicine & Biology ⚛️ Physics & Space Science ⚙️ Engineering & Technology ∑ Mathematics

Leakage and the reproducibility crisis in machine-learning-based science

🤖 Plain-English Summary

🔑 Key Findings

💡 Why This Matters

📋 Article Details

🗂️ Research Categories

🔗 Related Resources

More 🤖 Artificial Intelligence Research