Population-weighted european state centers
My wife was working on a research project where she needed to find the population-weighted centers (or centroids) of a number of states in Europe. Here are several questions that ask the same thing (I list them since I had to use many different questions to find what I needed from reference librarians):
- Where would each European country balance if the only thing that had any weight was where people lived?
- Where does the average person live in each country? Here "average" of course does not refer to economics, of course, but refers to "over the whole population".
- Where is the location of the center of population? Here we don't mean the largest city, of course.
I was employed to help her find this information, but a week of searching turned up nothing that we could find in any published source. So I decided to see if I could calculate these from other information. After sending email to Robert Leddy at the US Census Bureau, I was directed to this very useful website: SEDAC. This has population and position data for most world countries at a county-level, which is suitable for what we want to do. It's not perfect, but very close to it. The idea, then, is to calculate the population-weighted center of a country based on a weighted average of the location of each county in that country, where the weights come from the population of each county. I'm not sure that the word "county" is always applicable here; perhaps "administrative unit" is better.
Finding the dataTo get the data I used, I did the following. I am putting the full list of steps to get to the data in case the website changes; if you want just go to the end.
- Go to the main SEDAC web page
- Follow the link to "Data resources"
- Follow the link to "Gridded population of the world"
- Follow the link to "Downloadable data"
Note that there is a lot of data at SEDAC which can be processed in the way I am using here, I just didn't need to do anything but Europe.
Preparing the data
After I downloaded this ZIP file, I needed to quickly get it into a computable format. Unfortunately, there are more than 98,000 records, so Excel and other spreadsheets didn't do well on it. The best way I found to process it is by dumping it into an SQL database, and then performing queries. Thanks to Greg Speegle for this suggestion, it was much easier than what I was doing before with Perl.
I did the following from within MySQL:
drop table if exists centroids; create table centroids ( ADMINID float, -- unique (numeric) id NAME1 varchar(100), -- first administrative level name* NAME2 varchar(100), -- second administrative level name* NAME3 varchar(100), -- third administrative level name* NAME4 varchar(100), -- fourth administrative level name* NAME5 varchar(100), -- fifth administrative level name* CODE varchar(10), -- unit type code (L=Land, IW=Inland Water) ISO3V10 varchar(10), -- 3-letter country/state code COUNTRYNM varchar(100), -- English country/state name LONG_CEN float, -- longitude of the administrative unit centroid in decimal degrees (this is what we need) LAT_CEN float, -- latitude of the administrative unit centroid in decimal degrees (this is what we need) LONG_LBL float, -- longitude of the administrative unit labelpoint in decimal degrees** (this is NOT what we want; see the readme that comes with the data) LAT_LBL float, -- latitude of the administrative unit labelpoint in decimal degrees** (this is NOT what we want; see the readme that comes with the data) AREASQKM float, -- administrative unit area in square km P90A float, -- UN-adjusted population counts, 1990 (these population counts are what we want) P95A float, -- UN-adjusted population counts, 1995 P00A float, -- UN-adjusted population counts, 2000 P05A float, -- UN-adjusted population estimates, 2005 P10A float, -- UN-adjusted population estimates, 2010 P15A float, -- UN-adjusted population estimates, 2015 P90ADENS float, -- UN-adjusted population density, 1990, persons per square km P95ADENS float, -- UN-adjusted population density, 1995, persons per square km P00ADENS float, -- UN-adjusted population density, 2000, persons per square km P05ADENS float, -- UN-adjusted population density, 2005, persons per square km P10ADENS float, -- UN-adjusted population density, 2010, persons per square km P15ADENS float -- UN-adjusted population density, 2015, persons per square km ); load data local infile 'centroids/eu_centroids.csv' into table centroids fields terminated by ',' optionally enclosed by '\'' lines terminated by '\n'; delete from centroids where name1 = 'NAME1'; -- get rid of the first line
The key step here is the "load data" command. Of course, for this command to work your data file must be located in the subdirectory "centroids", in a file named "eu_centroids.csv". This is how it comes from SEDAC.
Calculating the population-weighted centers
After getting the data into the database, the rest is easy. Some simple SQL queries tell us what we need to know.
To find the population-weighted center of each country for the year 2000, we just issue the following SQL command:
select countrynm, (sum(lat_cen * p00a) / sum(p00a)) as latitude, (sum(long_cen * p00a) / sum(p00a)) as longitude, sum(p00a) as population from centroids group by countrynm;
The reason I wrote all this is because the time it took to find the data and then calculate it was not trivial. However, this seems like something other people might want to do, so I am providing it for others to use. Perhaps you don't care and you just want the data. Here it is: Europe population weighted state centers. This has the same type of data calculated four times — once for each of the population estimates for 1990, 1995, 2000, and 2005 (predicted). Note that this data includes Cyprus, which is not in the Europe dataset from SEDAC; I took it from the same type of dataset for Asia from SEDAC.
After finding the locations, we wanted to know more information like "how far is each population center away from Brussels?". To accomplish this, I fed the information I obtained above into this web page. To automate this, I used curl to do something like the following:
curl -d "lat1=0.0n&lon1=0.0w&lat2=1.0n&lon2=1.0w&units=km" http://jan.ucc.nau.edu/cvm-cgi-bin/latlongdist.plNote that I am using the POST method here, as that is the only thing the CGI script understands, but it is easy with curl. Here is a program to calculate the distance from Brussels to each state's population-weighted center, and here are the results: distances from Brussels to Europe's population-weighted state centers.