Convert country names and codes to a standard.
```ruby
require "normalize_country"

NormalizeCountry("America")                        # "United States"
NormalizeCountry("United States of America")       # "United States"
NormalizeCountry("USA", :to => :official)          # "United States of America"
NormalizeCountry("Iran", :to => :official)         # "Islamic Republic of Iran"
NormalizeCountry("U.S.", :to => :alpha2)           # "US"
NormalizeCountry("U.S.", :to => :numeric)          # "840"
NormalizeCountry("US", :to => :fifa)               # "USA"
NormalizeCountry("US", :to => :emoji)              # "🇺🇸"
NormalizeCountry("US", :to => :shortcode)          # ":flag-us:"
NormalizeCountry("Iran", :to => :alpha3)           # "IRN"
NormalizeCountry("Iran", :to => :ioc)              # "IRI"
NormalizeCountry("DPRK", :to => :short)            # "North Korea"
NormalizeCountry("North Korea", :to => :iso_name)  # "Korea, Democratic People's Republic Of"

# Or
NormalizeCountry.convert("U.S.", :to => :alpha2)   # "US"

# Set the default
NormalizeCountry.to = :alpha3
NormalizeCountry.convert("Mexico")                 # "MEX"
NormalizeCountry.convert("United Mexican States")  # "MEX"
```
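The examples above resolve several spellings ("America", "USA", "U.S.") to one name. As a rough illustration of that idea, here is a minimal sketch built on a hypothetical alias table and a hypothetical `lookup_key` helper. It is not how NormalizeCountry is implemented; it assumes that lowercasing and dropping punctuation is enough to match the variants shown in the examples.

```ruby
# Hypothetical sketch, not the gem's implementation: map spelling variants
# onto one lookup key, then resolve that key through a small alias table.
# The aliases are limited to names used in the usage examples.
ALIASES = {
  "america"               => "United States",
  "unitedstatesofamerica" => "United States",
  "usa"                   => "United States",
  "us"                    => "United States",
  "dprk"                  => "North Korea",
  "northkorea"            => "North Korea"
}.freeze

# Lowercase the input and drop every character outside a-z and 0-9, so that
# "U.S.", "US" and "u s" all produce "us". This is ASCII-only: accented
# letters are dropped too, so real data needs more careful handling.
def lookup_key(name)
  name.downcase.gsub(/[^a-z0-9]/, "")
end

# Returns the canonical short name, or nil when the input is unknown.
def normalize_name(name)
  ALIASES[lookup_key(name)]
end

normalize_name("U.S.")                     # => "United States"
normalize_name("united states of america") # => "United States"
normalize_name("Atlantis")                 # => nil
```

Returning `nil` for unknown input lets the caller decide whether to keep the original text or report it.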
RubyGems (part of Ruby):

```
gem install normalize_country
```
In addition to trying to convert from common, non-standardized names and abbreviations,
NormalizeCountry will convert to and from the following formats:
- `:alpha2`: ISO 3166-1 alpha-2
- `:alpha3`: ISO 3166-1 alpha-3
- `:emoji`: The country's emoji
- `:fifa`: FIFA (International Federation of Association Football)
- `:ioc`: International Olympic Committee
- `:iso_name`: Country name used by ISO 3166-1
- `:numeric`: ISO 3166-1 numeric code
- `:official`: The country's official name
- `:short`: A shortened version of the country's name, commonly used when speaking and/or writing (US English)
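One way to picture "convert to and from" these formats is to treat each country as a single record keyed by format symbol: find the record that contains the input under any format, then read the requested format from it. The sketch below assumes exactly that model; `COUNTRIES` and `convert_country` are hypothetical names, the data is limited to values from the usage examples (plus the ISO alpha-3 code USA), and the gem's real data layout may differ.

```ruby
# Hypothetical sketch: each country is one record keyed by format symbol.
# Values come from the usage examples; this is not the gem's data.
COUNTRIES = [
  { alpha2: "US", alpha3: "USA", numeric: "840", fifa: "USA",
    official: "United States of America", short: "United States" },
  { alpha3: "IRN", ioc: "IRI",
    official: "Islamic Republic of Iran", short: "Iran" },
  { alpha3: "MEX", official: "United Mexican States", short: "Mexico" }
].freeze

# Find the record that has +name+ under any format (case-insensitive),
# then return that record's value for the requested format, or nil.
def convert_country(name, to:)
  wanted = name.strip.downcase
  record = COUNTRIES.find do |country|
    country.values.any? { |value| value.downcase == wanted }
  end
  record && record[to]
end

convert_country("USA", to: :official)                  # => "United States of America"
convert_country("Iran", to: :ioc)                      # => "IRI"
convert_country("united mexican states", to: :alpha3)  # => "MEX"
convert_country("Mexico", to: :ioc)                    # => nil (no IOC code in this table)
```

Because every format lives in the same record, any format can serve as input, which is what lets a single call handle names, codes and numbers alike.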
A list of valid formats can be obtained by calling
```ruby
NormalizeCountry.to_a                         # Defaults to NormalizeCountry.to
NormalizeCountry.to_a(:ioc)                   # Array of IOC codes in ascending order
NormalizeCountry.to_h(:ioc)                   # :ioc => NormalizeCountry.to
NormalizeCountry.to_h(:ioc, :to => :numeric)  # :ioc => :numeric
```
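To make the comments on `to_a` and `to_h` concrete, here is a sketch of what such list helpers could compute over a tiny stand-in table. `RECORDS`, `DEFAULT_FORMAT`, `list_of` and `mapping_of` are hypothetical names, and this is an assumption about the behavior based only on the comments above, not the gem's implementation.

```ruby
# Hypothetical sketch over a tiny stand-in table; not the gem's code or data.
RECORDS = [
  { alpha2: "US", alpha3: "USA", short: "United States" },
  { alpha2: "MX", alpha3: "MEX", short: "Mexico" },
  { alpha2: "IR", alpha3: "IRN", short: "Iran" }
].freeze

DEFAULT_FORMAT = :alpha3 # stands in for NormalizeCountry.to

# Like to_a: every value for +format+, skipping records that lack it,
# sorted in ascending order.
def list_of(format = DEFAULT_FORMAT)
  RECORDS.map { |record| record[format] }.compact.sort
end

# Like to_h: map each record's +key+ value to its +to+ value.
def mapping_of(key, to: DEFAULT_FORMAT)
  RECORDS.each_with_object({}) do |record, hash|
    hash[record[key]] = record[to] if record[key] && record[to]
  end
end

list_of                          # => ["IRN", "MEX", "USA"]
list_of(:alpha2)                 # => ["IR", "MX", "US"]
mapping_of(:alpha2)              # => {"US"=>"USA", "MX"=>"MEX", "IR"=>"IRN"}
mapping_of(:alpha2, to: :short)  # => {"US"=>"United States", "MX"=>"Mexico", "IR"=>"Iran"}
```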
The gem includes a small script, normalize_country, that can convert country names stored in a DB table or in a set of XML or CSV files:
```
shell > normalize_country -h
usage: normalize_country [options] SOURCE
    -h, --help                       Show this message
    -f, --format FORMAT              The format of SOURCE
    -t, --to CONVERSION              Convert country names to this format (see docs for valid formats)
    -l, --location LOCATION          The location of the conversion
```
```shell
normalize_country -t alpha2 -l 'Country Name' -f csv data.csv
normalize_country -t numeric -l countries.code -f db postgres://usr:[email protected]/conquests
normalize_country -t fifa -l "//teams[@sport = 'fútbol americano']//country" -f xml data.xml
```
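In the CSV example, LOCATION appears to name a header column (`'Country Name'`). The following is a minimal sketch of that kind of column conversion using Ruby's standard `csv` library. `convert_csv_column` is a hypothetical helper, not part of the gem; the converter is passed in as a block so the sketch runs without the gem installed, and unrecognized values are assumed to be left unchanged.

```ruby
require "csv"

# Hypothetical sketch: convert every cell in the header column +column+
# by passing it to the block; keep the original text when the block
# returns nil (i.e. the name was not recognized).
def convert_csv_column(csv_text, column)
  table = CSV.parse(csv_text, headers: true)
  unless table.headers.include?(column)
    raise ArgumentError, "no column #{column.inspect}"
  end

  table.each do |row|
    value = row[column]
    next if value.nil? || value.strip.empty?
    row[column] = yield(value) || value
  end
  table.to_csv
end

input = "Country Name,Total\nU.S.,10\nIran,3\n"
convert_csv_column(input, "Country Name") { |name| { "U.S." => "US", "Iran" => "IR" }[name] }
# => "Country Name,Total\nUS,10\nIR,3\n"
```

With the gem installed, the block could instead be `{ |name| NormalizeCountry.convert(name, :to => :alpha2) }`.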
If the format is csv, you can specify a directory instead of a filename:

```shell
normalize_country -t alpha2 -l 'Country Name' -f csv /home/sshaw/capital-losses/2008
```
With a format of
csv it will read all files with an extension of
xml the original file(s) will be overwritten with new file(s) containing the converted country names.
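For XML, LOCATION is an XPath query. The sketch below rewrites the text of every element the query matches, using REXML, which ships with Ruby; the script itself may well use a different XML library. `convert_xml_nodes` is a hypothetical helper, and the converter is again injected as a block so no gem is required.

```ruby
require "rexml/document"

# Hypothetical sketch: replace the text of each element matched by +xpath+
# with the block's result, keeping the original text when the block returns
# nil. Returns the rewritten document as a string.
def convert_xml_nodes(xml_text, xpath)
  doc = REXML::Document.new(xml_text)
  REXML::XPath.each(doc, xpath) do |element|
    current = element.text
    next if current.nil?
    element.text = yield(current.strip) || current
  end
  out = +""
  doc.write(out)
  out
end

xml = '<league><teams sport="fútbol americano"><country>U.S.</country></teams>' \
      '<teams sport="soccer"><country>U.S.</country></teams></league>'
convert_xml_nodes(xml, "//teams[@sport = 'fútbol americano']//country") do |name|
  name == "U.S." ? "USA" : nil
end
```

Only the first `country` element changes here, because the predicate restricts the match to the `fútbol americano` team list.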
To convert an XML file with namespaces just include the namespace prefix defined in the file in the XPath query (
With a format of db, the SOURCE argument must be a Sequel connection string. Here
LOCATION is in the format
table.column; that column will be updated with the converted names.
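A database conversion needs two pieces: splitting LOCATION into a table and a column, and deciding which stored values to rewrite. Here is a minimal pure-Ruby sketch of both, under the assumption that each distinct value is converted once; `parse_location` and `update_plan` are hypothetical helpers, not the script's code.

```ruby
# Hypothetical sketch, not the script's code.

# "countries.code" => [:countries, :code]; anything else is rejected.
def parse_location(location)
  table, column, extra = location.split(".")
  if table.to_s.empty? || column.to_s.empty? || extra
    raise ArgumentError, "expected table.column, got #{location.inspect}"
  end
  [table.to_sym, column.to_sym]
end

# Build { old_value => new_value } for each distinct value the block can
# convert, skipping nils, unknown values and values already converted.
def update_plan(values)
  values.uniq.each_with_object({}) do |old, plan|
    next if old.nil?
    new_value = yield(old)
    plan[old] = new_value if new_value && new_value != old
  end
end

parse_location("countries.code")  # => [:countries, :code]
codes = { "U.S." => "840", "USA" => "840", "840" => "840" }
update_plan(["U.S.", "USA", "U.S.", "840", nil]) { |v| codes[v] }
# => {"U.S."=>"840", "USA"=>"840"}
```

With Sequel, each pair in the plan could then be applied as `DB[table].where(column => old).update(column => new)`, where `DB = Sequel.connect(SOURCE)`.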
Please submit a patch or open an issue.
This code was, to some extent, part of a larger project that allowed users to perform a free-text search by country. Country names were stored in the DB by their ISO names.
Several years later, at work, we had to extract country names from a web service that didn't standardize them: sometimes they used UK, other times U.K. It then occurred to me that this code could be useful outside of the original project. The web service was fixed but, nevertheless…
Upon further investigation, I found the following:
Carmen: ISO country names and states/subdivisions
countries: ISO country names, states/subdivisions, currency, E.164 phone numbers and language translations
country_codes: ISO country names and currency data
i18n_data: ISO country names in different languages; includes alpha codes
ModelUN: similar to this gem but with less support for conversion; it does include US states