Nice work...but this is not a perfect database of countries.
* Territories such as Réunion are listed as countries
* Countries that ceased to exist are not listed (Yugoslavia, Soviet Union) although the world's most recent new country (South Sudan) is listed.
* Kosovo is listed, although not recognised by all the world. Transnistria is not listed, even though it has de facto sovereignty.
* Total land area is a disputable amount in several circumstances.
There are several more such examples I could list.
Yes I'm being pedantic. But I have a special interest in this area, and I'm aware of just how hard it is. People are born in one country (Soviet Union) and die in another (Ukraine), even though they never moved from their home city (Odessa).
This is a problem even in professional data sources. I recently did an admittedly somewhat silly analysis, to see what happens if you take the "voting with your feet" metaphor for emigration seriously, and interpret each country's emigrants as a ranked-ballot "vote" for most appealing destination country, with all the ballots tallied using a Condorcet ranking. Results: http://www.kmjn.org/notes/voting_with_feet.html
I thought this might produce complexities I'd have to deal with, since over the 50 year period this data set covers, countries have come and gone a number of times. But to my surprise, the data set somehow uses only current countries, even when giving data points for 1960 or 1970. It doesn't really explain how that happened, either. Did they somehow impute past data to current countries? What does the 1970 figure for Germany, or Serbia, actually mean? Something was too complex and didn't fit the database, so it got swept under the rug? Made my job writing this off-the-cuff blog post much easier, but now I'm left wondering what the data I used actually means, despite it coming from a highly respected source, and used in many academic papers.
edit: Found a paper describing the construction of the World Bank data. It looks like they use what they describe as an admittedly crude method of attributing based on population shares, so e.g. immigrants to "Yugoslavia" are apportioned to Croatia/Serbia/etc. based on population share, assuming the immigration was uniform throughout. Not horrible as a first approximation, probably, though clearly not "right" either. Source: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=969242
> Not horrible as a first approximation, probably, though clearly not "right" either.
You really do have to leave any hope of perfectionism at the door when it comes to migration data.
Even if record-keeping were consistent, a considerable amount of migration goes unrecorded — mass migration due to environmental disasters or conflict, illegal immigration, etc.
There are very few countries that can claim to know the exact number of people within their borders at any given time, let alone where they came from.
People are born in one country (Soviet Union) and die in another (Ukraine), even though they never moved from their home city (Odessa).
People are even born in one city (Leningrad) in one country (Soviet Union) and die in another city (St-Petersburg) in another country (Russia), although they never moved.
> Transnistria is not listed, even though it has de facto sovereignty.
if by 'de facto sovereignty' you mean not being recognized by any country aka their passports being worthless. a database should not make political statements. if they earn recognition, they'll be listed. there are dozens of separatist territories in the world who claim sovereignty. if Transnistria were to be listed as a country - why not Karabakh? not to mention autonomous regions like Iraqi Kurdistan.
Is it even possible to build a database of countries without making political statements? Taiwan is one example where you can either use the ISO database and piss users off (and they will perceive it as a political statement), or you can make a change to the ISO database and make a political statement the other way around.
Indeed, a database of countries is politically tough.
IIRC, the Syrian government banned Facebook from Syria, because Golan Heights (occupied and annexed by Israel, but not recognised by most (or any?) other countries) was listed in Facebook as being in Israel.
By de facto sovereignty, I mean that they control their borders and the land within their borders. I don't mean to make a political statement, but rather acknowledge that despite their lack of recognition they do control this break-away region.
Iraqi Kurdistan is a different situation. They are autonomous, not a breakaway region.
I know it's different, I specified their status. and I have been to Transnistria, so I know how it works there. Also, even the break-away regions behave differently themselves when it comes to their own recognition - f.ex. Transnistria doesn't stamp your passport and on the Trans./Moldova border you don't get an entry stamp of Moldova either (because, well, formally, you are in Moldova), while Karabah gives stamps and with that stamp you can't go to Azerbaijan (my friend was denied entry).
Where I work, we have a very thoroughly compiled list of countries for our user profile "Country" dropdown. It never ceases to amuse me that people signing up for our services are given choices such as:
- Neutral Zone
- Union of Soviet Socialist Republics
- Unknown or Invalid Region
I think it's from the framework we use or something. I suppose it is somewhat reassuring that we're prepared if the USSR ever re-emerges or another Gulf War breaks out.
We have a similarly thorough list of possible languages for another dropdown which is used to indicate the language of the things our users create. If SignWriting ever gets absorbed into Unicode, we'll be prepared with our "Sign Language" dropdown option.
For those who are pedantic when it comes to countries, the OxJS library comes with a useful database. See https://oxjs.org/#doc/Ox.COUNTRIES -- or open the console and type:
In our case we had to simply use two different country tables. One for geolocation and film rights that matches our CDN's database, and one for historical film data such as the production country.
This is one case where I think DRY would be worse because the special cases that apply to one do not apply to the other semantically, and therefore the logic would get very subtle and muddy.
TimeZoneDB provides free time zone database for cities of the world. The database is licensed under Creative Commons Attribution 3.0 License. It contains countries name, time zones, abbreviation, GMT offset, and Daylight Saving Time (DST). The data is available in CSV and SQL format. You can download and implement into your projects for free.
The tz database that this is sourced for is one of the most impressive compilations of information that I have ever seen. Timezones seem like they should be simple, but the reality is very impressive: http://upload.wikimedia.org/wikipedia/commons/9/9f/Tz_map_wo...
This list also has currencies and some other stuff, but a Ruby script to extract what you want should not take more than 10 lines.
This list has the added advantage of being available in pretty much all the languages in the world, in case you need localisation for your project.
Side note: if you want to check if a list is up to date, check for "Aruba" and country code BQ (Caribbean). They changed the status of independent country a year ago so many lists don't have it yet (used to be a Dutch colony).
Some countries have more than one currency code, China has CNY and CNH. http://en.wikipedia.org/wiki/Renminbi#CNH. I'm not going to debate the fact that CNH isn't official even though it is traded.
Only two comments:
1. It is Czech Republic, not Czechia. At least in English.
2. Having the mulitple spoken languages in a single column is against normalization a bit. If one would have to use those values, they would need to parse the csv, etc.
Primary key on idCountry, no unique constraints on name/code/isoAlpha3, and 'default' blank values? It's certainly a start, but it's also a bit questionable.
Geonames has a bunch of freely-available location datasets. I've used it in various apps for cities, countries, postcodes, etc: http://www.geonames.org/export/
Cool work. I thought this was going to be a static .SQL file or something from the title. I was pleasantly surprised to find that it's actually a program[1] to dynamically generate a countries table based on current data. In addition: it's possible to define the columns you want to grab like name of the country, population, languages spoken, and so on.
It would likely also be useful to have the local language name for the countries as an optional column. You can probably source this from the http://naturalearthdata.com source mentioned elsewhere here.
Thanks for this. I think it sounds worthwhile to move my own country table work in PostgreSQL to an extension so everyone else can use it. Thanks for the inspiration/motivation!
I actually faced a similar problem and I ended up writing an almost identical script which generated a JSON or XML file (depending on the user's need).
* Territories such as Réunion are listed as countries
* Countries that ceased to exist are not listed (Yugoslavia, Soviet Union) although the world's most recent new country (South Sudan) is listed.
* Kosovo is listed, although not recognised by all the world. Transnistria is not listed, even though it has de facto sovereignty.
* Total land area is a disputable amount in several circumstances.
There are several more such examples I could list.
Yes I'm being pedantic. But I have a special interest in this area, and I'm aware of just how hard it is. People are born in one country (Soviet Union) and die in another (Ukraine), even though they never moved from their home city (Odessa).