Population Weighted Density

Population Weighted Density (PWD) is an alternative to conventional approaches to population density that is arguably better suited to some types of research in the social sciences and epidemiology. In this release WorldPop publishes what we believe may be the first set of global estimates for PWD, which we offer at national and subnational levels for five-year snapshots from 2000 onwards. An accompanying paper pending submission (Robin Edwards*, Alessandro Sorichetta**, Maksym Bondarenko** 2021) will provide a full methodological description.

^{* GIS consultant, MRes from UCL Centre for Advanced Spatial Analysis}
^{** WorldPop}

In the above map, each dot represents the location of the Population Weighted Centroid (PWC) of an administrative unit.

The traditional and most widely understood method for calculating an aggregate measure of human population density within any geographical region is simply to divide its total population by the total area (i.e. d = ΣP/ΣA). It has long been recognised in the field of geography and by many other scholars that this method has significant shortcomings for certain types of research, particularly in the human sciences and where the subject matter of interest may be related to the typical density levels experienced by the population, such as in epidemiology.

Population Weighted Density (PWD), proposed by John Craig in 1984, is a family of methods that, as the name suggests, weight the density values by their corresponding population sizes in the aggregation process. We have used three distinct methods to generate PWD estimates:

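PWD-A (Arithmetic Mean)

PWD-A denotes population weighted arithmetic mean density, in which \( P_i \) is the population and \( d_i \) the density of each constituent unit \( i \).

\[d_{AM} = \sum(P_i \cdot d_i)/\sum(P_i) \tag{1} \]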
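As an illustration of equation (1), the following minimal Python sketch compares the conventional density with the population weighted arithmetic mean. It assumes each unit's density is its population divided by its area; the function names and the three-cell region are hypothetical and are not taken from the WorldPop processing code.

```python
def conventional_density(populations, areas):
    """Total population divided by total area (d = sum(P) / sum(A))."""
    return sum(populations) / sum(areas)


def pwd_arithmetic(populations, areas):
    """Population weighted arithmetic mean density, as in equation (1).

    Assumption: the density of unit i is d_i = P_i / A_i.
    """
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = 0.0
    for p, a in zip(populations, areas):
        if p == 0:
            continue  # unpopulated units carry zero weight
        weighted_sum += p * (p / a)
    return weighted_sum / total_pop


# Hypothetical region: three cells of 1 km^2 each.
populations = [9000, 900, 100]
areas = [1.0, 1.0, 1.0]

print(conventional_density(populations, areas))  # about 3333 people per km^2
print(pwd_arithmetic(populations, areas))        # 8182 people per km^2
```

In this made-up region most people live in the densest cell, so the density experienced by a typical resident (PWD-A) is well above the conventional figure.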

PWD-G (Geometric Mean)

PWD-G denotes population weighted geometric mean density, based on the weighted geometric mean. The most practical calculation relies on log arithmetic and hence requires all density values to be greater than zero.

\[d_{GM} = \exp\left( {\sum(P_i \cdot \log d_i)\over \sum{P_i}} \right) \tag{2} \]
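Equation (2) can be sketched in Python as follows. The function name is hypothetical, and skipping units with zero population or zero density is an assumption made here to keep the logarithm defined; the release does not state how such units are treated.

```python
import math


def pwd_geometric(populations, densities):
    """Population weighted geometric mean density, as in equation (2)."""
    total_pop = 0.0
    weighted_log_sum = 0.0
    for p, d in zip(populations, densities):
        if p <= 0 or d <= 0:
            continue  # assumption: drop units where log(d) is undefined
        total_pop += p
        weighted_log_sum += p * math.log(d)
    if total_pop == 0:
        raise ValueError("no populated units with positive density")
    return math.exp(weighted_log_sum / total_pop)


# Two equally populated units with densities 1 and 100:
# the geometric mean is sqrt(1 * 100) = 10.
print(pwd_geometric([1, 1], [1.0, 100.0]))
```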

PWD-M (Median)

To the above established methods we add a third, population weighted median density, based on the weighted median, as suggested by Ottensmann (2018). The map above gives our subnational PWD-M estimates for 2020.

PWD-M is the median value of the population density, taking into account the population weights of the observations. Mathematically, the weighted median problem is the following: given \( n \) unordered real numbers \( \{x_1, x_2, \ldots, x_n\} \), \( x_i \in \mathbb{R} \), and associated positive real weights \( \{w_1, w_2, \ldots, w_n\} \), the weighted median is a value \( x \) that solves

\[ \min\limits_{x \in \mathbb{R}} f(x) = \sum_{i=1}^{n} w_i \left| x - x_i \right| \]
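The weighted median can be found without numerical optimisation: sort the values and return the first one at which the cumulative weight reaches half of the total. The sketch below takes that approach and returns the lower weighted median when several values minimise the objective; the function names are hypothetical and this is not necessarily the procedure used to produce the published estimates.

```python
def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches half of the total weight."""
    if not values:
        raise ValueError("values must not be empty")
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


def objective(x, values, weights):
    """f(x) = sum of w_i * |x - x_i|, the quantity being minimised."""
    return sum(w * abs(x - v) for v, w in zip(values, weights))


# Hypothetical units: densities weighted by their own populations.
densities = [9000, 900, 100]
populations = [9000, 900, 100]
print(weighted_median(densities, populations))  # 9000
```

For PWD-M the values are the unit densities and the weights are the unit populations.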

Comparison

PWD-A seems to be the most commonly used method for PWD estimation (it is the method adopted by the US Census), probably because it is the most intuitive to understand. However, it shares the arithmetic mean’s inherent vulnerability of being highly sensitive to outliers, which are common in population density data. Hence any statistical error in the top end of density values also has a disproportionate effect on the PWD-A estimate. PWD-G is a more robust metric that, for lognormal data distributions, will tend towards the median value, and is therefore less vulnerable (though not immune) to outliers. PWD-M, by contrast, is largely immune to outliers, though this does come at the cost of being largely insensitive to the region’s top-end densities.
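The difference in outlier sensitivity can be seen on a toy example. The sketch below uses compact, hypothetical helper functions and made-up data: it inflates the density of the densest unit tenfold, as a stand-in for an error at the top end, and reports how each measure responds.

```python
import math


def pwd_a(pops, dens):
    """Population weighted arithmetic mean density."""
    return sum(p * d for p, d in zip(pops, dens)) / sum(pops)


def pwd_g(pops, dens):
    """Population weighted geometric mean density (densities must be > 0)."""
    return math.exp(sum(p * math.log(d) for p, d in zip(pops, dens)) / sum(pops))


def pwd_m(pops, dens):
    """Population weighted (lower) median density."""
    half = sum(pops) / 2.0
    cumulative = 0.0
    for d, p in sorted(zip(dens, pops)):
        cumulative += p
        if cumulative >= half:
            return d


# Hypothetical region of three units; only the top density changes.
pops = [300, 600, 100]
baseline = [100.0, 1000.0, 10000.0]
inflated = [100.0, 1000.0, 100000.0]

for name, fn in (("PWD-A", pwd_a), ("PWD-G", pwd_g), ("PWD-M", pwd_m)):
    before, after = fn(pops, baseline), fn(pops, inflated)
    print(f"{name}: {before:.1f} -> {after:.1f} (x{after / before:.2f})")
```

For this made-up data PWD-A rises roughly 6.5-fold, PWD-G by about 26%, and PWD-M does not change.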

What’s in the release?

In this release we publish four sets of PWD estimates: two based on WorldPop’s 3 arc-second grid (approximately 100m cell size at the equator) and two on its 30 arc-second grid (approximately 1km at the equator). Estimates are provided for top-level subnational regions as well as at national level. For each set we offer these estimates for each five-year snapshot since 2000.

In addition to PWD estimates we are offering quantile breakdowns of the PWD distributions: percentiles for national level and deciles for subnational regions. We hope this will be a rich resource for any research that might benefit from a more granular perspective on population density.
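One way to read such a breakdown is as population weighted quantiles of density: the k-th decile is the density below which (roughly) k tenths of the population live. The sketch below follows one common convention, returning the smallest value whose cumulative population share reaches each cut point. The function name and the convention are assumptions made for illustration; the release does not specify the exact quantile definition used.

```python
def weighted_quantiles(values, weights, n):
    """Return the n - 1 cut points that split the weighted distribution
    into n parts (n = 10 for deciles, n = 100 for percentiles)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    cuts = []
    cumulative = 0
    k = 1
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        # Emit every cut point whose share k / n has now been reached.
        while k < n and cumulative * n >= k * total:
            cuts.append(value)
            k += 1
    return cuts


# Hypothetical example: four equally populated units.
densities = [10, 20, 30, 40]
populations = [1, 1, 1, 1]
print(weighted_quantiles(densities, populations, 10))
# [10, 10, 20, 20, 20, 30, 30, 40, 40]
```

With n = 2 the single cut point is the weighted median.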

Source data resolution	National level	Subnational regions (ADM1)
3 arc-second (“100m”)	PWD-A, PWD-G, PWD-M, PWD percentiles	PWD-A, PWD-G, PWD-M, PWD deciles
30 arc-second (“1km”)	PWD-A, PWD-G, PWD-M, PWD percentiles	PWD-A, PWD-G, PWD-M, PWD deciles

We also include the population weighted centroids (longitude, latitude) for national and subnational levels, based on the arithmetic mean.
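A population weighted centroid based on the arithmetic mean can be sketched as the population weighted average of cell-centre coordinates. The function name and inputs below are hypothetical. Averaging longitude and latitude directly is a simplifying assumption: it ignores the curvature of the Earth and would need special handling for regions that straddle the antimeridian.

```python
def population_weighted_centroid(lons, lats, populations):
    """Population weighted arithmetic mean of cell-centre coordinates.

    Assumption: plain averaging in degrees, with no projection and
    no antimeridian handling.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    lon = sum(p * x for p, x in zip(populations, lons)) / total
    lat = sum(p * y for p, y in zip(populations, lats)) / total
    return lon, lat


# Two hypothetical cells; the second holds three times the population,
# so the centroid sits three quarters of the way towards it.
print(population_weighted_centroid([0.0, 10.0], [0.0, 20.0], [1, 3]))
# (7.5, 15.0)
```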

Usage

We encourage users to explore the data as well as the theory to come to a view on which PWD method and which source resolution are most suited to their research, though for general usage we would advise using PWD-G or PWD-M. For any queries, or even just to let us know how the data is used, please email us.