Exploring the limits of predicting births using survey and administrative data

Talk
Fertility
Prediction

Predicting life outcomes is important because it can inform theory and enable targeted interventions. Unlike fields such as climate science or medicine, social science struggles to predict lifecourse outcomes. Perhaps such outcomes are inherently unpredictable due to chance, or perhaps the conditions necessary for accurate prediction (large, high-quality datasets) have not been met. We study the predictability of one key life event, having a child within three years, under ideal conditions by organising the PreFer [Predicting Fertility] data challenge, which draws on high-quality surveys and rich administrative records from the Netherlands. Over a hundred researchers applied advanced techniques to the prediction task, offering a rare test of the predictability of fertility. We show that predictions based on survey data outperform those based on administrative records, although predictive performance remained far below the theoretical maximum. We show how we can capitalise on the information from different models and datasets. We end by discussing the role of prediction in the social sciences.

Author

Gert Stulp

Published

July 9, 2025

Summary


     Max Planck Institute for Demographic Research

     Rostock, Germany

