Population-weighted european state centers

If you don't want to read a lot, and just want the results of this page, jump to the results section (and be sure to read the disclaimer).

My wife was working on a research project where she needed to find the population-weighted centers (or centroids) of a number of states in Europe. Here are several questions that ask the same thing (I list them since I had to use many different questions to find what I needed from reference librarians):

We wanted the answer to these questions in terms of latitude and longitude. Of course, the geographic center of each country is published, but this might be radically different from the population-weighted center, if the population is mostly to one side of the geographic center. To use an example from the USA, most of New York state's population lives in New York City, so the population-weighted center of New York state would be closer to New York City than the geographic center.

I was employed to help her find this information, but a week of searching turned up nothing that we could find in any published source. So I decided to see if I could calculate these from other information. After sending email to Robert Leddy at the US Census Bureau, I was directed to this very useful website: SEDAC. This has population and position data for most world countries at a county-level, which is suitable for what we want to do. It's not perfect, but very close to it. The idea, then, is to calculate the population-weighted center of a country based on a weighted average of the location of each county in that country, where the weights come from the population of each county. I'm not sure that the word "county" is always applicable here; perhaps "administrative unit" is better.

Finding the data

To get the data I used, I did the following. I am putting the full list of steps to get to the data in case the website changes; if you want just go to the end. At this point, you can choose among several datasets. What I did was choose the region "Europe" (from the menu at the top right), then choose "Centroids" under the menu which says "Get GPWv3 data" (and not the GRUMP data). Then I selected ".csv" format, and "circa 2000" for the year. Then select "GET DATA". The next page will request that you register yourself, but you can also use the "Guest downloads" link if you prefer not to register.

Note that there is a lot of data at SEDAC which can be processed in the way I am using here, I just didn't need to do anything but Europe.

Preparing the data

After I downloaded this ZIP file, I needed to quickly get it into a computable format. Unfortunately, there are more than 98,000 records, so Excel and other spreadsheets didn't do well on it. The best way I found to process it is by dumping it into an SQL database, and then performing queries. Thanks to Greg Speegle for this suggestion, it was much easier than what I was doing before with Perl.

I did the following from within MySQL:

drop table if exists centroids;

create table centroids (     
    ADMINID     float,          -- unique (numeric) id
    NAME1       varchar(100),   -- first  administrative level name*
    NAME2       varchar(100),   -- second administrative level name*
    NAME3       varchar(100),   -- third  administrative level name*
    NAME4       varchar(100),   -- fourth administrative level name*
    NAME5       varchar(100),   -- fifth  administrative level name*
    CODE        varchar(10),    -- unit type code (L=Land, IW=Inland Water)
    ISO3V10     varchar(10),    -- 3-letter country/state code
    COUNTRYNM   varchar(100),   -- English country/state name
    LONG_CEN    float,          -- longitude of the administrative unit centroid in decimal degrees (this is what we need)
    LAT_CEN     float,          -- latitude  of the administrative unit centroid in decimal degrees (this is what we need)
    LONG_LBL    float,          -- longitude of the administrative unit labelpoint in decimal degrees** (this is NOT what we want; see the readme that comes with the data)
    LAT_LBL     float,          -- latitude  of the administrative unit labelpoint in decimal degrees** (this is NOT what we want; see the readme that comes with the data)
    AREASQKM    float,          -- administrative unit area in square km
    P90A        float,          -- UN-adjusted population counts, 1990 (these population counts are what we want)
    P95A        float,          -- UN-adjusted population counts, 1995
    P00A        float,          -- UN-adjusted population counts, 2000
    P05A        float,          -- UN-adjusted population estimates, 2005
    P10A        float,          -- UN-adjusted population estimates, 2010
    P15A        float,          -- UN-adjusted population estimates, 2015
    P90ADENS    float,          -- UN-adjusted population density, 1990, persons per square km
    P95ADENS    float,          -- UN-adjusted population density, 1995, persons per square km
    P00ADENS    float,          -- UN-adjusted population density, 2000, persons per square km
    P05ADENS    float,          -- UN-adjusted population density, 2005, persons per square km
    P10ADENS    float,          -- UN-adjusted population density, 2010, persons per square km
    P15ADENS    float           -- UN-adjusted population density, 2015, persons per square km
);

load data local infile 'centroids/eu_centroids.csv' into table centroids 
        fields terminated by ',' optionally enclosed by '\'' lines terminated by '\n';
delete from centroids where name1 = 'NAME1'; -- get rid of the first line

The key step here is the "load data" command. Of course, for this command to work your data file must be located in the subdirectory "centroids", in a file named "eu_centroids.csv". This is how it comes from SEDAC.

Calculating the population-weighted centers

After getting the data into the database, the rest is easy. Some simple SQL queries tell us what we need to know.

To find the population-weighted center of each country for the year 2000, we just issue the following SQL command:

select countrynm, 
       (sum(lat_cen * p00a) / sum(p00a)) as latitude,
       (sum(long_cen * p00a) / sum(p00a)) as longitude, 
       sum(p00a) as population 
  from centroids 
  group by countrynm;

Results

The reason I wrote all this is because the time it took to find the data and then calculate it was not trivial. However, this seems like something other people might want to do, so I am providing it for others to use. Perhaps you don't care and you just want the data. Here it is: Europe population weighted state centers. This has the same type of data calculated four times — once for each of the population estimates for 1990, 1995, 2000, and 2005 (predicted). Note that this data includes Cyprus, which is not in the Europe dataset from SEDAC; I took it from the same type of dataset for Asia from SEDAC.

Further analysis

After finding the locations, we wanted to know more information like "how far is each population center away from Brussels?". To accomplish this, I fed the information I obtained above into this web page. To automate this, I used curl to do something like the following:

curl -d "lat1=0.0n&lon1=0.0w&lat2=1.0n&lon2=1.0w&units=km" http://jan.ucc.nau.edu/cvm-cgi-bin/latlongdist.pl
Note that I am using the POST method here, as that is the only thing the CGI script understands, but it is easy with curl. Here is a program to calculate the distance from Brussels to each state's population-weighted center, and here are the results: distances from Brussels to Europe's population-weighted state centers.

Disclaimer

The code and the data provided from this web page are my effort, and they are not construed to be accurate or definitive or authoritative. You may use them if you acknowledge me (as well as the sources of the data I used). If you find the information on this page useful, please let me know.

Copyright © 2006 Greg Hamerly
Computer Science Department
Baylor University

valid html and css