Measuring poverty is notoriously difficult. Collecting detailed data on households is time-consuming and expensive. But the marriage of machine learning techniques to lighter collection instruments may transform how the World Bank and its development partners approach poverty measurement. Predicting a household’s poverty status from a handful of easy-to-collect qualitative variables lowers costs, shortens turnaround times, and, ultimately, creates a more solid empirical foundation for policy.
Poverty prediction typically relies on regression models. In this talk, Olivier Dupriez will report on a comparative assessment of machine learning classification algorithms applied to poverty prediction. He will discuss preliminary results from three approaches to building predictive models: crowdsourcing through a data-science competition in which more than 1,500 data scientists are already working to develop the best poverty prediction model for three countries; contracting experts; and using some of the newest approaches, such as automated machine learning, for model development.
Olivier will also discuss his work applying machine learning to the Bank’s own knowledge base by automatically extracting topics from 145,000 documents published in the Bank’s Documents and Reports repository. This project aims to improve data and knowledge discovery systems. Natural language processing tools are also useful for showing how the coverage of topics such as agriculture, energy, health, and climate change has evolved over time and how it differs across regions and by type of document.