WorldPop ‘Global2’: Global high-resolution population estimates for 2015-2030

Tomorrow we’re launching WorldPop’s new global gridded population datasets – which we’re calling ‘Global 2’. In this blog I aim to provide an overview of what these new data are, why they matter, how you can use them, and a glimpse of what’s coming next.

What was Global 1?

Back in 2018, the WorldPop team finished building the first ever set of multi-annual global 100x100m gridded estimates of residential population, broken down by age and sex classes – ‘Global1’. This was the culmination of more than a decade of research on scalable geospatial modelling methods that could leverage satellite imagery and other mapping datasets to spatially disaggregate administrative unit-level population counts. By doing so we could produce high resolution gridded population estimates across national, continental and global scales. The annual datasets covering the 2000-2020 period were made openly available to download and have been in wide use by governments, UN agencies, academics, Non-Governmental Organisations and the private sector ever since.

Why make gridded population data?

A major advantage of gridded population data over the more traditional administrative unit-based counts is their flexibility in being able to be summarised at any scale, decision-making unit or landscape feature (e.g. voting ward, flood area or settlement extent) and integrated with other spatially referenced data, as illustrated below. This means you can use these data to calculate things like numbers of women of childbearing age residing within 1 hour travel time of a health facility, providing a vital source of information on people’s vulnerabilities, service access, and use. The flexibility of these data and the fact that small area population data underlie so many fields of research, governance, resource allocation and beyond, meant that open Global 1 data saw wide uptake.
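To make the idea of summarising at any unit concrete, here is a minimal sketch of a zonal sum in plain Python. The tiny grid, the zone labels ("flood", "ward_a") and the function name are invented for illustration and are not part of any WorldPop tool. In practice you would read the GeoTIFFs with a GIS library, but the underlying operation is the same: add up the cell values that fall inside each zone.

```python
from collections import defaultdict


def zonal_population(pop_grid, zone_grid):
    """Sum gridded population counts within each zone.

    pop_grid and zone_grid are equally sized 2D lists. zone_grid holds a
    zone label for each cell, or None where the cell is outside every zone.
    Cells with a population of None (no data) are skipped.
    """
    totals = defaultdict(float)
    for pop_row, zone_row in zip(pop_grid, zone_grid):
        for pop, zone in zip(pop_row, zone_row):
            if zone is None or pop is None:
                continue
            totals[zone] += pop
    return dict(totals)


# Toy 3x4 grid of people per cell (None marks no-data cells).
pop = [
    [12.5, 0.0, None, 3.0],
    [40.0, 8.5, 2.0, None],
    [None, 1.0, 0.5, 6.0],
]

# Hypothetical zones: a flood extent and a voting ward.
zones = [
    ["flood", "flood", None, "ward_a"],
    ["flood", "ward_a", "ward_a", "ward_a"],
    [None, "ward_a", "ward_a", None],
]

totals = zonal_population(pop, zones)
print(totals)  # {'flood': 52.5, 'ward_a': 15.0}
```

The same function works for any zone layer that shares the grid, which is what makes gridded data easy to combine with flood extents, catchments or travel-time surfaces.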

How have Global 1 data been used?

Global 1 data formed the demographic basis of Covid-19 transmission models used by governments to plan intervention policies in the UK, US and many other countries. Moreover, the data represent the subnational population data used in health information systems by more than 80 ministries of health covering 3.2 billion people, and as the basis for UN estimates of populations affected by natural disasters and hazards, as well as conflicts. The data were also incorporated into ESRI’s Living Atlas, UN-OCHA’s HDX and UNFPA’s Population Data Portal, among others. Further reading on Global 1’s uses and impacts can be found on our global engagement page and in a recent independent impact report by Dev Afrique.

Why are we moving on from Global 1?

While Global 1’s 2000-2020 estimates remain in widespread use, many aspects of them have become outdated. Moreover, new datasets, increases in computing power and advances in AI since Global 1’s release offer opportunities for substantial improvements. Key reasons for moving on from Global 1 include:

Global 1 was built using the GPWv4 database of subnational census data, which primarily consists of projections built on population totals from the circa-2010 round of censuses, and there is now a huge amount of data published from 200+ countries from the circa-2020 round.
Age and sex classes for Global 1 were derived from surveys, censuses and estimates which are increasingly outdated.
Advances in satellite imagery have enabled rapid progress in the accurate mapping of buildings over continental scales. This has substantially improved our ability to map populations accurately.
We are receiving increasing demand for gridded estimates covering the 2020-2030 period.

This all prompted the Gates Foundation to support WorldPop and partners to develop new global estimates over the past two years.

What are the ‘Global 2’ data?

The construction approach employed for Global 2 is broadly similar to Global 1; we use a machine learning algorithm to do country-specific spatial disaggregations of administrative unit-level population counts. The schematic below highlights this process. Most aspects have, however, undergone substantial updates and improvements as described in the following sections. The Global 2 release statement and accompanying academic papers and tables provide more details.

Population data: Global 1 utilised simple projections of population counts from circa-2010 census data. For Global 2 we constructed a multi-year database of circa-2010 and 2020 boundary-matched subnational population counts, broken down by age and sex classes. These were obtained from censuses, supplemented with data from the United Nations, US Census Bureau, and official government subnational estimates and projections, where census data were lacking. A range of demographic projection methods were then implemented to interpolate and project age/sex-structured estimates between and beyond data timepoints across the 2010-2030 period. This aimed to better capture subnational demographic trends, improving upon the approach used in Global 1. This is an important feature given that we are increasingly seeing population declines in parts of the world after a period of predominant increases. These projection methods all ensured that national totals aligned with estimates published in the UN’s World Population Prospects 2024 edition.
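As a rough illustration of interpolating between two data timepoints and projecting beyond them, the sketch below assumes a constant exponential growth rate between two counts for one administrative unit. This is only one simple method chosen for illustration. The range of projection methods actually used for Global 2 is described in the release statement, and the counts and function names here are invented.

```python
import math


def growth_rate(pop_start, pop_end, year_start, year_end):
    """Annual exponential growth rate implied by two counts."""
    return math.log(pop_end / pop_start) / (year_end - year_start)


def estimate(pop_start, year_start, rate, year):
    """Population in `year`, assuming the rate stays constant."""
    return pop_start * math.exp(rate * (year - year_start))


# Invented counts for one subnational unit with a declining population.
pop_2011, pop_2021 = 10_000.0, 9_000.0
rate = growth_rate(pop_2011, pop_2021, 2011, 2021)

# Interpolate between the two timepoints and project beyond the second.
for year in (2015, 2016, 2021, 2030):
    print(year, round(estimate(pop_2011, 2011, rate, year)))
```

With only the first count and an assumed positive growth rate, the projection would keep rising. Adding the second timepoint is what lets the estimate follow the observed decline.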

Mastergrid: Producing gridded estimates of residential population across the planet requires a ‘mastergrid’ of cells to map to, which also defines land/water classifications and which cells fall into each country. A new mastergrid of ~100x100m cells was constructed for Global 2 using updated and more detailed satellite-derived water datasets to precisely map coastlines and waterbodies.

Built settlement: Humans primarily reside within built structures, clustered together in settlements. Mapping these accurately is therefore vital for accurate population mapping. For Global 2 we took advantage of the substantial recent advances in satellite imagery, AI algorithms and computing power. This involved bringing together satellite-derived settlement and building footprint, height and classification datasets from Google, Microsoft, the Joint Research Centre of the European Commission, OpenStreetMap and the German Space Agency to obtain a global picture of built settlements and changes over time. A machine learning approach was then used to produce annual gridded measures of built settlement for 2015-2030, aligned to the new mastergrid.

Covariates: In addition to built settlements, human population density is known to be influenced by, and correlated with, a variety of environmental and physical phenomena. For Global 1 we assembled a unique set of harmonised 100x100m gridded covariate datasets, but these are now outdated. For Global 2, a completely new open library of multi-year geospatial datasets matched to the updated mastergrid was built, covering aspects such as land cover, roads, waterbodies, topography, nighttime lights and protected areas. A paper describing these data is currently available as a preprint on VeriXiv.

Modelling: Over recent decades different approaches have been developed to go from census-based administrative unit-level population counts to small area or gridded estimates. More recently, methods have been developed and implemented for the use of sparsely spread sample or partial census data. Depending on inputs, each method can result in outputs that look quite different. A ‘random forest’ machine learning model was chosen here given its performance against simpler methods, its ability to utilise multiple covariates, and its computational efficiency and scalability. It was used here to learn relationships between the population data for each country and year combination and their respective built settlement and other covariate datasets. For future years beyond the availability of covariates, solely the built settlement growth model metrics were used. These relationships were used to disaggregate population counts from administrative units into the ~100x100m grid cells of the mastergrid for each country. Model fit statistics were produced and grid cell level population predictions were made, ‘constrained’ only to those cells mapped as containing built settlement. These country-specific gridded population estimate datasets were then mosaiced to construct ~1x1km global datasets.
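The final disaggregation step can be sketched in a few lines. The example below is a simplification under stated assumptions: the per-cell weights stand in for the relative densities a trained random forest would predict, and all values and the function name are invented. It shows how one administrative unit's count is shared out in proportion to those weights, with cells lacking built settlement forced to zero (the ‘constrained’ part).

```python
def disaggregate(unit_total, weights, built):
    """Spread one administrative unit's count over its grid cells.

    weights: predicted relative density per cell (stand-in for model output)
    built:   True where the cell is mapped as containing built settlement
    """
    # Constrain: cells without built settlement receive no population.
    masked = [w if b else 0.0 for w, b in zip(weights, built)]
    weight_sum = sum(masked)
    if weight_sum == 0:
        raise ValueError("no built settlement cells with positive weight")
    # Share the unit total in proportion to each cell's weight.
    return [unit_total * w / weight_sum for w in masked]


# Invented example: five cells in a unit with a count of 1,500 people.
weights = [0.5, 2.0, 1.5, 4.0, 0.8]
built = [False, True, True, True, False]

cells = disaggregate(1500.0, weights, built)
print(cells)
```

Because the weights are normalised within each unit, the cell values always sum back to the input count. This is why problems in the input counts carry straight through to the gridded estimates.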

How do these new data compare to Global 1?

Comparisons are ongoing between Global 1 and Global 2 datasets, but these can be difficult to undertake meaningfully given the substantial differences in inputs and methods used. However, visual comparisons provide some insights into the differences and the impacts of the new datasets and methods adopted for Global 2.

Comparison 1: Mapped settlement between Global 1 and Global 2

For a rural area, the maps show the built settlement data used as the basis for population mapping for Global 1 (top) and Global 2 (bottom). Built settlement grid cells for 2020 are mapped in black, overlaid on satellite imagery. The images highlight how many buildings and areas of human settlement were missed for Global 1 due to the limitations in imagery and algorithm capabilities at the time of construction. For Global 2, by contrast, advances in building feature detection from high resolution satellite imagery using AI techniques have enabled accurate mapping of buildings across the area.

Comparison 2: Mapped population between Global 1 and Global 2

For a rural area, the two images show grid cells estimated to be populated in 2020 by Global 1 (top) and Global 2 (bottom), overlaid on satellite imagery. The single green square in the south of the top image shows the only grid cell estimated to contain people by Global 1. However, areas of human settlement are clearly visible in the satellite image where Global 1 has failed to capture them. In contrast, the Global 2 dataset appears to accurately cover all areas with visible structures and provides population count estimates for each cell.

Comparison 3: Urban population mapping between Global 1 and Global 2

The images above compare Global 1 with Global 2 population estimates for Niamey, Niger in 2020. A high resolution satellite image is shown in the top image for context. The middle image shows the 100x100m gridded Global 1 population estimates, and the bottom image shows the equivalent estimates from Global 2 overlaid on the satellite image. The impacts of improvements in settlement/building mapping and covariates are clear: variations in population within the city are mapped more precisely in Global 2, and populations in surrounding rural towns, villages and isolated buildings that were missed in Global 1 are now captured.

Comparison 4: Urban populations across land covers/uses between Global 1 and Global 2

The images above compare Global 1 (middle) with Global 2 (bottom) population estimates for Southampton, UK in 2020. A high resolution satellite image is shown in the top image for context, with selected land cover/use types identified. In the middle and bottom images, the population estimates are made transparent to enable viewing of the satellite image underneath. These images highlight the improvements in population mapping made possible through the introduction of building classification (e.g. residential, commercial, industrial) data into the Global 2 modelling. The Global 1 process utilised simple settlement mapping extents and covariates and hence predicted roughly the same population densities across the city. The detailed building and classification data utilised for Global 2 ensured that populations were not assigned to parkland or fields, and that much lower numbers of people were mapped to predominantly commercial and industrial areas than to residential areas.

Comparison 5: Population time series mapping between Global 1 and Global 2

The images and graph above show population estimates for 2015-2020 for a border region of southern Romania. The top two images show Global 1 gridded population estimates for the southern Romania region in 2015 and 2020 overlaid on satellite imagery, highlighting predicted huge growth in the population numbers residing in settlements in the east and west. The two images below these show Global 2 gridded population estimates for the same region in 2015 and 2020 overlaid on satellite imagery. These show little change in population numbers between the two timepoints. The graph plots out predicted population trends from Global 1 and Global 2 for 2015-2021 for the region, with the red circle indicating the population count from the 2021 census. It highlights how the predicted substantial population growth in Global 1 that was based on simple growth assumptions following the 2011 census resulted in significant overestimation of population numbers. In contrast, by utilising both 2011 and 2021 census data, Global 2 estimates align with observed population numbers and trends.

How can I access Global 2 data?

We are continuing to work on expanding the ways that WorldPop data can be openly accessed and used. A number of updates have been made to provide access to the Global 2 data, and we are working on expanding and improving these – this is a key focus of ongoing funding from the Wellcome Trust.

Geotiff format datasets for use in Geographical Information Systems (GIS) and beyond can be obtained through the WorldPop data catalog, as well as UN OCHA’s Humanitarian Data Exchange (HDX). Don’t worry if you still need to download the old Global 1 data – we’ve archived these for population counts here and age/sex structures here. We’re also working with ESRI and Google to make Global 2 data available in Living Atlas and Earth Engine, and you can already explore the new data through this Google Earth Engine App. The WorldPop Application Programming Interface (API) provides access to WorldPop data and has been updated for the Global 2 data.

There are also some new ways to access Global 2 data. We’ve developed a plugin for QGIS – see this video for more information. Finally, we’ve developed a Spatiotemporal Asset Catalog (STAC) API implementation for Global 2 to simplify browsing, search and download of the data.
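For readers new to STAC, the sketch below builds the kind of JSON body that the STAC API item search accepts (the `collections`, `bbox`, `datetime` and `limit` fields come from the STAC API specification). The collection identifier is a placeholder and the bounding box is only a rough box around Niamey. This post does not give the real endpoint or collection names, so check the WorldPop STAC catalogue for those before sending anything.

```python
import json


def build_stac_search(collection_id, bbox, year, limit=10):
    """Build a STAC API item-search body for one calendar year.

    bbox is [west, south, east, north] in decimal degrees.
    """
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be [west, south, east, north]")
    return {
        "collections": [collection_id],
        "bbox": [west, south, east, north],
        # Closed interval covering the whole year, in RFC 3339 format.
        "datetime": f"{year}-01-01T00:00:00Z/{year}-12-31T23:59:59Z",
        "limit": limit,
    }


# Placeholder collection ID (hypothetical) and an approximate Niamey bbox.
query = build_stac_search(
    "hypothetical-global2-collection", [2.0, 13.4, 2.25, 13.6], 2020
)
payload = json.dumps(query, indent=2)
print(payload)
```

This body would be sent as a POST request to the catalogue's `/search` endpoint. The items returned then link to the GeoTIFF assets to download.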

What do I need to be aware of when using Global 2 data?

Estimating numbers of people, their demographic characteristics and changes over a 16-year period across the billions of 100x100m grid cells that cover the planet is no easy task! Constructing a multi-year time series of small area population estimates requires a range of input data decisions, methodological assumptions and trade-offs, producing output estimates that are likely to be less accurate than those built for just a single, recent time point or individual country. It is important to think about what you need small area population data for, and whether other datasets may be more appropriate. Global 2 data have a lot of limitations, and this page is designed to help you.

We remain a relatively small team in the School of Geography and Environmental Science at the University of Southampton and cannot check and validate every area of the planet across multiple years, so we rely on those of you who may be reading this to alert us to potential anomalies, errors and inconsistencies. You may have local knowledge or access to datasets that we were unable to include in our data production process. We are always grateful to hear about this and to get your feedback, and we will try our best to fix any issues with the estimates.

Some key assumptions, limitations and issues to be aware of:

The construction of a new and improved mastergrid means that there is a mis-alignment of grid squares between Global 1 and Global 2 that may impact comparisons or analyses that are being transferred to the new data.
These data are ‘top-down’ model disaggregations of larger area counts, and so if these input counts are problematic, then so are our gridded estimates. For some countries where it has been many years since the last census and a lot has changed, you may want to consider our ‘bottom-up’ estimates as an alternative source of census-independent gridded estimates.
The input census and estimate population data represent a global mosaic of data sources across geography and time. The substantial differences between countries in numbers of datasets, administrative unit levels, types of data and quality of data mean that output gridded estimates vary in accuracy between countries and years.
Each country is modelled separately. Past work has shown that this is vital for capturing local relationships with covariates that are missed with global or regional models, which can be computationally infeasible anyway. This brings trade-offs, though, with some country models trained only on coarse input population data or uncertain estimates. The result can be inconsistencies between countries, with some showing better model fit statistics and capturing realistic ranges of population densities better than others.
Across time, different demographic projection methods were used. For some countries where recent censuses have not been undertaken, or where census data were not available, published estimates from NSOs or UN agencies were used, and these bring substantial uncertainties with them. See the Global 2 release statement for details on these sources.
Where population projections were made, it is expected that the uncertainty will increase for population datasets representing years further away from the input population dataset timepoints. This is also expected to be true for the built settlement growth model, in which several timepoints of input settlement data are interpolated.
We have tried where possible to get two timepoints of subnational census/projection datasets to better capture subnational patterns and trends of demographic change, rather than being reliant on the single timepoint extrapolations from the GPWv4 database that led to many problems with Global 1 datasets. For some countries this has led to compromises, where less spatially detailed population data than for Global 1 are used as input to enable capturing temporal trends. This can result in Global 2 output population maps looking less spatially detailed than Global 1 or not capturing high urban population densities well, but instead providing more reliable population totals and demographic breakdowns across broader subnational areas.
For consistency, all datasets were produced using a fixed set of covariates that were available globally. Therefore, a limited selection of factors considered to be related to population distributions in each country have been considered. This represents a trade-off in the production of generalisable models, in which the accuracy of gridded population datasets for some countries could be improved by considering additional, locally specific factors.
The census-based input population data may not have captured changes caused by rapid onset events responsible for sudden fluctuations of population numbers (e.g. forced displacements due to natural disasters or conflict). Likewise, our projections do not account for future rapid onset events or seasonal and intra-annual population mobility between administrative units.
The Global 2 data are aligned to match the UN World Population Prospects 2024 edition estimates at the national scale, and these represent estimates themselves with their own uncertainties. If your preference is for alternative national totals (e.g. US Census Bureau, IHME, IIASA), then adjustments are straightforward to undertake – just get in contact with the WorldPop team to inquire.
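The kind of adjustment mentioned in the last point can be sketched as a uniform rescaling: every cell in a country is multiplied by the ratio of the preferred national total to the current one, so the spatial pattern is unchanged. This is an illustrative assumption about how such an adjustment could work rather than WorldPop's documented procedure, and the numbers and function name are invented.

```python
def rescale_to_total(cell_counts, target_total):
    """Scale a country's cell counts so that they sum to target_total.

    None marks no-data cells, which are left untouched.
    """
    current_total = sum(c for c in cell_counts if c is not None)
    if current_total <= 0:
        raise ValueError("current total must be positive")
    factor = target_total / current_total
    return [None if c is None else c * factor for c in cell_counts]


# Invented example: four cells summing to 1,000, adjusted to a
# hypothetical alternative national total of 1,050.
cells_wpp = [100.0, 250.0, None, 650.0]
cells_alt = rescale_to_total(cells_wpp, 1050.0)
print(cells_alt)
```

If you work with the age and sex layers, one option would be to apply the same idea to each class separately using class-specific totals from your preferred source.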

What’s coming next?

Global 2 represents one component of WorldPop’s wider ongoing activities around improving spatial demographic methods and data, and their uptake and use. We have started with population counts, broken down by a set of age and sex classes at 100m and 1km resolution, but will make available additional datasets over the coming months. These will include administrative unit summaries, population density, additional age classes, degree of urbanisation breakdowns and integer versions, among others.

While we are here introducing the new Global 2 datasets, we’re aiming to transition to a more ‘live’ set of global population data products. As new censuses and surveys are conducted, new building maps released and new methods developed, we’re working towards these being more dynamically incorporated into outputs. So, look out for country updates – and Global 2.1, 2.2, etc.

Improving our abilities to more accurately estimate and map populations in the most remote rural and highest density urban settings is a primary research focus. Ongoing work on mapping slum areas, refugee and nomadic populations, and demographic dynamics using AI, geo-embeddings and other new digital datasets will feed into future outputs. Research is also ongoing on mapping future population scenarios following recent support from the Wellcome Trust to establish FuturePop.

Finally, ensuring that methods and datasets can be accessed, understood, adopted and used by decision makers remains a priority. Co-development of country and application-specific small area population estimates is ongoing with national statistical offices, ministries of health and UN agencies around the world. A key component of this is capacity strengthening, and our new training manual and book of methods are now online, with more materials to be added soon. We’re also developing an online WorldPop community of practice, refreshing our demographic portal and adapting AI large language models to construct a WorldPop AI assistant, enabling simple natural language queries of the data in multiple languages.

Acknowledgements

Global 2 has involved many people and organisations over the past few years. We would like to recognise all the hard work put in by so many across the WorldPop team in the School of Geography and Environmental Science at the University of Southampton, as well as our colleagues in Social Statistics and Demography. The WorldPop portfolio management and operations team have undertaken fantastic work in setting up and running the project work, while Southampton’s iSolutions team, who run the Iridis high performance computing cluster, have been vital in constructing the new data. Global 2 also represents a wider collaboration, including researchers at the Università degli Studi di Milano Statale in Italy, Jade University in Germany and Columbia University in the US. We are also grateful for the support of staff from the UN Population Division and Google Research in sharing data and feedback, and to those from UN-OCHA’s HDX for hosting and promoting the new data. We are also thankful to those who gave extensive feedback during our Beta testing phase, including those from UNICEF and the UN Convention to Combat Desertification. Finally, we thank the Gates Foundation for their support throughout the production of Global 2.