Open-source project: 4gh/WorldBankData.jl
Repository: https://github.com/4gh/WorldBankData.jl
Language: Julia 100.0%

World Bank Data in Julia

The World Bank provides access to global development data at data.worldbank.org. The primary collection of development indicators is called World Development Indicators (WDI). This module provides two functions to access and download the data: wdi() and search_wdi().
It roughly follows the R WDI package.

Installation

using Pkg
Pkg.add("WorldBankData")

Basic Examples

Get a DataFrame of the U.S. population:

julia> using WorldBankData
julia> df = wdi("SP.POP.TOTL", "US")
60×4 DataFrame
│ Row │ iso2c │ country │ year │ SP_POP_TOTL │
│ │ String │ String │ Float64 │ Float64? │
├─────┼────────┼───────────────┼─────────┼─────────────┤
│ 1 │ US │ United States │ 1960.0 │ 1.80671e8 │
│ 2 │ US │ United States │ 1961.0 │ 1.83691e8 │
⋮
│ 59 │ US │ United States │ 2018.0 │ 3.26688e8 │
│ 60 │ US │ United States │ 2019.0 │ 3.2824e8 │

The WDI indicator SP.POP.TOTL becomes the column SP_POP_TOTL in the DataFrame: each "." is replaced by "_".

Get a DataFrame of the U.S. population from 1980 until 2012 including region data:

julia> df = wdi("SP.POP.TOTL", "US", 1980, 2012, extra=true)
33×12 DataFrame. Omitted printing of 5 columns
│ Row │ iso2c │ country │ SP_POP_TOTL │ year │ iso3c │ name │ region │
│ │ String │ String │ Float64? │ Float64 │ String │ String │ String │
├─────┼────────┼───────────────┼─────────────┼─────────┼────────┼───────────────┼───────────────┤
│ 1 │ US │ United States │ 2.27225e8 │ 1980.0 │ USA │ United States │ North America │
│ 2 │ US │ United States │ 2.29466e8 │ 1981.0 │ USA │ United States │ North America │
⋮
│ 32 │ US │ United States │ 3.11557e8 │ 2011.0 │ USA │ United States │ North America │
│ 33 │ US │ United States │ 3.13831e8 │ 2012.0 │ USA │ United States │ North America │

ISO 3 letter country codes are also supported:

df = wdi("SP.POP.TOTL", "USA", 1980, 2012)

Multiple indicators and countries can be requested:

julia> df = wdi(["SP.POP.TOTL", "NY.GDP.MKTP.CD"], ["US","BR"], 1980, 2012)
66×5 DataFrame
│ Row │ iso2c │ country │ year │ NY_GDP_MKTP_CD │ SP_POP_TOTL │
│ │ String │ String │ Float64 │ Float64? │ Float64? │
├─────┼────────┼───────────────┼─────────┼────────────────┼─────────────┤
│ 1 │ BR │ Brazil │ 1980.0 │ 2.35025e11 │ 1.20694e8 │
│ 2 │ BR │ Brazil │ 1981.0 │ 2.63561e11 │ 1.2357e8 │
⋮
│ 65 │ US │ United States │ 2011.0 │ 1.55426e13 │ 3.11557e8 │
│ 66 │ US │ United States │ 2012.0 │ 1.6197e13 │ 3.13831e8 │

By default a wide DataFrame is returned (indicators are columns). The data can also be returned in long format, which might be more useful if many indicators are requested:

julia> df = wdi(["SP.POP.TOTL", "NY.GDP.MKTP.CD"], ["US","BR"], 1980, 2012, dflong=true)
132×5 DataFrame
│ Row │ iso2c │ country │ year │ indicator │ value │
│ │ String │ String │ Float64 │ String │ Float64? │
├─────┼────────┼───────────────┼─────────┼────────────────┼────────────┤
│ 1 │ BR │ Brazil │ 1980.0 │ SP.POP.TOTL │ 1.20694e8 │
│ 2 │ BR │ Brazil │ 1980.0 │ NY.GDP.MKTP.CD │ 2.35025e11 │
⋮
│ 131 │ US │ United States │ 2012.0 │ SP.POP.TOTL │ 3.13831e8 │
│ 132 │ US │ United States │ 2012.0 │ NY.GDP.MKTP.CD │ 1.6197e13 │

Get a DataFrame of the total population for all countries from 1980 to 2012:

using WorldBankData
df = wdi("SP.POP.TOTL", "all", 1980, 2012)

Arguments

The function wdi has the following signature:

wdi(indicators::Union{String,Array{String,1}}, countries::Union{String,Array{String,1}},
    startyear::Integer=-1, endyear::Integer=-1;
    extra::Bool=false, sourceid::Integer=2, dflong::Bool=false, verbose::Bool=false)::DataFrame

It needs a minimum of two arguments: the indicators and the countries. The remaining arguments are optional.
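
In the wide-format outputs shown earlier, the indicator SP.POP.TOTL appears as the column SP_POP_TOTL and NY.GDP.MKTP.CD as NY_GDP_MKTP_CD. A minimal sketch of that naming rule, using only Base Julia; the helper name `indicator_column` is hypothetical and not part of WorldBankData:

```julia
# Hypothetical helper (an assumption, not a WorldBankData function):
# derive the wide-format column name from an indicator code by
# replacing every "." with "_".
indicator_column(indicator::AbstractString) = Symbol(replace(indicator, "." => "_"))

pop_col = indicator_column("SP.POP.TOTL")
gdp_col = indicator_column("NY.GDP.MKTP.CD")
```

With such a helper a column could presumably be selected as `df[!, indicator_column("SP.POP.TOTL")]` without typing the underscored name by hand.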
Searching

The most convenient way to explore the database is probably through a web browser at data.worldbank.org. However, the module also provides a search function, search_wdi(). One can search for "countries" or "indicators" data.

Example for country search by name

julia> using WorldBankData
julia> res=search_wdi("countries","name",r"united"i)
julia> res[!, :name]
3-element DataArray{UTF8String,1}:
"United Arab Emirates"
"United Kingdom"
"United States"
julia> res[!, :iso2c]
3-element DataArray{ASCIIString,1}:
"AE"
"GB"
"US" Example for indicator search by descriptionjulia> using WorldBankData
julia> res=search_wdi("indicators","description",r"gross national expenditure"i)
6x5 DataFrame
...
julia> res[!, :name]
6-element DataArray{UTF8String,1}:
"Gross national expenditure deflator (base year varies by country)"
"Gross national expenditure (current US\$)"
"Gross national expenditure (current LCU)"
"Gross national expenditure (constant 2005 US\$)"
"Gross national expenditure (constant LCU)"
"Gross national expenditure (% of GDP)"
julia> res[!, :indicator]
6-element DataArray{UTF8String,1}:
"NE.DAB.DEFL.ZS"
"NE.DAB.TOTL.CD"
"NE.DAB.TOTL.CN"
"NE.DAB.TOTL.KD"
"NE.DAB.TOTL.KN"
"NE.DAB.TOTL.ZS" The search_wdi() functionThe search_wdi(data::String, entry::String, regx::Regex)::DataFrame
"countries" can be searched for "name", "region", "capital", "iso2c", "iso3c", "income", and "lending". "indicators" can be searched for "name", "description", "topics", "source_database", and "source_organization". The search function uses two DataFrames Note that the last argument to Examples of country searchesjulia> search_wdi("countries","iso2c",r"TZ"i)
1×9 DataFrame. Omitted printing of 2 columns
│ Row │ iso3c │ iso2c │ name │ region │ capital │ longitude │ latitude │
│ │ String │ String │ String │ String │ String │ Float64? │ Float64? │
├─────┼────────┼────────┼──────────┼─────────────────────┼─────────┼───────────┼──────────┤
│ 1 │ TZA │ TZ │ Tanzania │ Sub-Saharan Africa │ Dodoma │ 35.7382 │ -6.17486 │
julia> search_wdi("countries","income",r"upper middle"i)
...
julia> search_wdi("countries","region",r"Latin America"i)
...
julia> search_wdi("countries","capital",r"^Ka"i)
3×9 DataFrame. Omitted printing of 2 columns
│ Row │ iso3c │ iso2c │ name │ region │ capital │ longitude │ latitude │
│ │ String │ String │ String │ String │ String │ Float64? │ Float64? │
├─────┼────────┼────────┼─────────────┼─────────────────────┼───────────┼───────────┼──────────┤
│ 1 │ AFG │ AF │ Afghanistan │ South Asia │ Kabul │ 69.1761 │ 34.5228 │
│ 2 │ NPL │ NP │ Nepal │ South Asia │ Kathmandu │ 85.3157 │ 27.6939 │
│ 3 │ UGA │ UG │ Uganda │ Sub-Saharan Africa │ Kampala │ 32.5729 │ 0.314269 │
julia> search_wdi("countries","lending",r"IBRD"i)
...
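
The patterns in the country searches above are ordinary Julia regular expressions, so their behaviour can be checked without touching the World Bank data. The sketch below applies `r"^Ka"i` to a hand-written list of capital names (the list is illustrative, not package output) and contrasts it with an unanchored pattern:

```julia
# Illustrative input; not downloaded from the World Bank.
capitals = ["Kabul", "Kathmandu", "Kampala", "Dodoma", "Lusaka"]

# "^" anchors the match at the start of the string; the trailing "i"
# makes the match case-insensitive.
starts_with_ka = filter(c -> occursin(r"^Ka"i, c), capitals)

# Without the anchor, "ka" may match anywhere in the string.
contains_ka = filter(c -> occursin(r"ka"i, c), capitals)
```

Here `starts_with_ka` keeps Kabul, Kathmandu and Kampala, while `contains_ka` additionally keeps Lusaka.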
Examples of indicator searches

julia> search_wdi("indicators","name",r"gross national expenditure"i)
...
julia> search_wdi("indicators","description",r"gross national expenditure"i)
...
julia> search_wdi("indicators","source_database",r"Sustainable"i)
...
julia> search_wdi("indicators","source_organization",r"Global Partnership"i)
Tips and Tricks

Extracting country data from results

df = wdi("SP.POP.TOTL", ["US","BR"], 1980, 2012, extra=true)
us_pop = df[df[!, :iso2c] .== "US", :]

Year format

For similarity with the R WDI package the year column is stored as Float64. You can easily convert this to a Date series:

using WorldBankData
using Dates
df = wdi("AG.LND.ARBL.HA.PC", "US", 1900, 2011)
df[!, :year] = map(Date, df[!, :year])

Plotting

Install the StatsPlots.jl package with Pkg.add("StatsPlots").

using WorldBankData
using StatsPlots
df = wdi("SP.POP.TOTL", "US", 1980, 2010)
@df df scatter(:year, :SP_POP_TOTL)

Empty/Missing results

julia> df = wdi("EN.ATM.CO2E.KT", "AS")
60×4 DataFrames.DataFrame
│ Row │ iso2c │ country │ EN_ATM_CO2E_KT │ year │
│ │ String │ String │ Missing │ Float64 │
├─────┼────────┼────────────────┼────────────────┼─────────┤
│ 1 │ AS │ American Samoa │ missing │ 1960.0 │
│ 2 │ AS │ American Samoa │ missing │ 1961.0 │
│ 3 │ AS │ American Samoa │ missing │ 1962.0 │
...
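
In the American Samoa result above the indicator column has element type Missing because no value is available at all. A small sketch, using plain vectors in place of DataFrame columns (the numbers are illustrative), of how such a column might be detected and how missing entries can be skipped with Base functions:

```julia
# Stand-ins for DataFrame columns; values are illustrative only.
all_missing = [missing, missing, missing]
partly_missing = [missing, 5.15916e6, 5.25428e6, missing]

# A column with no data at all has element type Missing.
nothing_available = all(ismissing, all_missing)
column_type = eltype(all_missing)

# skipmissing drops the missing entries before further computation.
observed = collect(skipmissing(partly_missing))
average = sum(observed) / length(observed)
```

Checking `all(ismissing, column)` before plotting or averaging avoids working with a column that carries no information.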
julia> df = wdi("EN.ATM.CO2E.KT", ["AS","US"])
120×4 DataFrames.DataFrame
│ Row │ iso2c │ country │ EN_ATM_CO2E_KT │ year │
│ │ String │ String │ Float64⍰ │ Float64 │
├─────┼────────┼────────────────┼────────────────┼─────────┤
│ 1 │ AS │ American Samoa │ missing │ 1960.0 │
│ 2 │ AS │ American Samoa │ missing │ 1961.0 │
...
│ 114 │ US │ United States │ 5.15916e6 │ 2013.0 │
│ 115 │ US │ United States │ 5.25428e6 │ 2014.0 │
│ 116 │ US │ United States │ missing │ 2015.0 │
│ 120 │ US │ United States │ missing │ 2019.0 │
...

Cache

The data in the World Bank database changes infrequently. Therefore it makes little sense to download it every time a script is run.

Metadata

Download and store the country and indicator information in csv files:

using WorldBankData
using DataFrames
using CSV
CSV.write("country_cache.csv",WorldBankData.get_countries())
CSV.write("indicator_cache.csv", WorldBankData.get_indicators())

These can be used in the script to set the WorldBankData cache variables:

using WorldBankData
using DataFrames
using CSV
# Recent CSV.jl versions require the sink type (DataFrame) as the second argument.
WorldBankData.set_country_cache(CSV.read("country_cache.csv", DataFrame))
WorldBankData.set_indicator_cache(CSV.read("indicator_cache.csv", DataFrame))

From then on the data read from disk is used. The caches can also be reset.

Indicator data

In a similar way the indicator data itself can be cached.

using WorldBankData
using DataFrames
using CSV
function update_us_pop_totl()
df = wdi("SP.POP.TOTL", "US")
CSV.write("us_pop_totl.csv",df)
end
df = CSV.read("us_pop_totl.csv", DataFrame)

Occasionally update the data by running the update_us_pop_totl() function.
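
The occasional update could also be triggered by the age of the cached file. The following is only a sketch under stated assumptions: `refresh_if_stale` is a hypothetical helper, not part of WorldBankData, and the demo uses a placeholder writer in a temporary directory instead of a real download:

```julia
# Hypothetical helper (not part of WorldBankData): call `update()` only when
# the file at `path` is missing or older than `max_age` seconds.
function refresh_if_stale(update::Function, path::AbstractString; max_age::Real=30 * 24 * 3600)
    stale = !isfile(path) || (time() - mtime(path)) > max_age
    stale && update()
    return stale   # true if `update()` was called
end

# Stand-in for an update function; writes a placeholder file, no network access.
demo_path = joinpath(mktempdir(), "us_pop_totl.csv")
demo_update() = write(demo_path, "placeholder\n")

first_call = refresh_if_stale(demo_update, demo_path)    # file missing, so it is created
second_call = refresh_if_stale(demo_update, demo_path)   # file is fresh, so nothing is done
```

In a real script one would presumably pass the update function defined above, for example `refresh_if_stale(update_us_pop_totl, "us_pop_totl.csv")`, and then read the CSV file as before.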