An international research team led by Dr Paolo Andrich, a visiting researcher at WorldPop based at the University of Oxford, has developed a novel statistical framework that turns anonymised, biased social media data into reliable population estimates. This approach provides a timely alternative to traditional censuses, which are often static and struggle to capture rapid demographic changes during humanitarian emergencies.
Accurate information on where people are is the foundation of effective rescue and resource allocation. Traditional demographic sources such as housing censuses provide only periodic snapshots and fail to capture short-term spatial mobility during crises. By focusing on the Philippines, a country with high social media adoption and frequent exposure to natural disasters, the researchers demonstrated how to calibrate Facebook user counts against 2020 census figures to provide dynamic population monitoring.
A primary challenge in using digital traces is that platforms often mask counts in low-population areas to protect users' identities, which leaves those values censored. To solve this, the team developed a Bayesian imputation approach: a statistical method that estimates missing or censored values by combining historical data distributions with prior information, while accounting for the underlying uncertainty. This technique restored data coverage for 5.5% of rural areas that were previously “invisible” in the digital dataset.
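As a rough illustration of the idea of imputing a censored count, the sketch below treats a masked area as a count known only to fall below a masking threshold and combines that knowledge with a prior. It is not the team's model: the threshold of 10, the Gamma prior on the underlying rate, and the function names are all assumptions made for this example.

```python
import math


def neg_binomial_pmf(k, shape, rate):
    """Prior-predictive probability of count k when a Poisson rate
    has a Gamma(shape, rate) prior (an assumed prior, not the study's)."""
    log_p = (
        math.lgamma(shape + k) - math.lgamma(shape) - math.lgamma(k + 1)
        + shape * math.log(rate / (rate + 1.0))
        - k * math.log(rate + 1.0)
    )
    return math.exp(log_p)


def impute_censored(threshold, shape, rate):
    """Distribution and mean of a count known only to be below `threshold`."""
    # Keep only the values the masking rule allows, then renormalise.
    weights = [neg_binomial_pmf(k, shape, rate) for k in range(threshold)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(posterior))
    return posterior, mean


# Hypothetical inputs: counts below 10 are masked, and historical data
# suggest a prior mean of shape / rate = 8 users in this kind of area.
posterior, imputed_mean = impute_censored(threshold=10, shape=2.0, rate=0.25)
print(f"Imputed mean user count: {imputed_mean:.2f}")
```

Because the result is a full distribution rather than a single number, the uncertainty about each masked area can be carried forward into later estimates instead of being discarded.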
The research integrates several high-resolution predictors, including nighttime satellite radiance, degree of urbanisation, and demographic composition. These factors act as proxies for socio-economic status, allowing the model to link social media signals to true population levels.
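To give a sense of what calibrating social media counts against census figures can look like in its simplest form, the sketch below fits a straight line between the logarithm of user counts and the logarithm of census population, then uses it to predict population from a new user count. This is a deliberately reduced stand-in, not the study's framework: it uses a single predictor, ordinary least squares rather than a Bayesian spatial model, and entirely made-up numbers.

```python
import math


def fit_log_linear(users, census):
    """Least-squares fit of log(census) = intercept + slope * log(users)."""
    xs = [math.log(u) for u in users]
    ys = [math.log(c) for c in census]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_population(users, intercept, slope):
    """Map a user count to a population estimate on the original scale."""
    return math.exp(intercept + slope * math.log(users))


# Hypothetical calibration data for three municipalities (not real figures).
toy_users = [100, 1000, 10000]
toy_census = [500, 5000, 50000]

intercept, slope = fit_log_linear(toy_users, toy_census)
estimate = predict_population(2000, intercept, slope)
print(f"Estimated population for 2,000 users: {estimate:.0f}")
```

In a fuller model, covariates such as nighttime radiance or degree of urbanisation would enter as additional predictors, letting the ratio of users to residents vary between places.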
“Having up-to-date information on where people are is critical for humanitarian response in disaster-prone regions,” says Dr Shengjie Lai, Principal Research Fellow at WorldPop. “Our model provides a general framework for using biased social media signals to generate the accurate and timely population data necessary for effective aid delivery in low- and middle-income countries.”
The methodology prioritises data ethics by using strictly anonymised and aggregated information. The model accounts for spatial correlations and achieves high accuracy, with errors as low as about 18% for urban areas and 24% for rural municipalities.
The results were particularly robust for the period of the 2020 lockdowns and travel restrictions, when users were likely to be at their primary residences and the estimates could therefore be checked against census data. As the 2030 deadline for global development goals approaches, this ability to monitor population dynamics at a fine spatial resolution offers a vital tool for ensuring no community is left behind.
The study relied on several critical open-access sources to ensure findings were timely and accurate:
- OCHA Centre for Humanitarian Data: Provided essential subnational administrative boundaries and verified census population figures.
- Eurostat (Statistical Office of the European Union): Established the degree of urbanisation methodology used to classify settlement types across the Philippines and Ethiopia.
- Earth Observation Group at the Payne Institute for Public Policy: Supplied monthly cloud-free nighttime satellite radiance measurements, which served as a proxy for economic activity.
- The Open Data team at Ookla: Provided network performance metrics that allowed the team to model technology uptake and internet access density.
This work was supported by UNFPA, the EPSRC, Horizon Europe, and the National Institutes of Health (MIDAS Mobility).
Headline image: Checking cell phone in Divisoria Market, Adam Cohn, 2016 CC BY-NC-ND 2.0

