Classifying Countries Based on Life Expectancy
In this project, we process data from the World Health Organization on a country-by-country basis and develop a model to predict whether a country's life expectancy is greater than or less than the world's median life expectancy.
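As a minimal sketch of how the binary target could be built, the snippet below labels each country by comparing its life expectancy to the median. The toy data, the column names `life_expectancy` and `above_median`, and the choice to put a country sitting exactly on the median in the lower class are all assumptions for illustration, not details taken from the project.

```python
import pandas as pd

# Hypothetical toy data; the real WHO table has many more rows and columns.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "life_expectancy": [52.0, 61.5, 70.0, 76.3, 82.1],
})

# The median across countries serves as the classification threshold.
median_life = df["life_expectancy"].median()

# 1 if strictly above the median, 0 otherwise (tie handling is an assumption).
df["above_median"] = (df["life_expectancy"] > median_life).astype(int)

print(median_life)
print(df)
```

Splitting at the median gives two classes of roughly equal size, which keeps plain accuracy a meaningful metric.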
Project Details
- Flatiron School
- Tech: Pandas, Altair, scikit-learn
Details:
- Clean NaN values from the data and check that the data for each country is stationary.
- Altair visual, which required getting country code data:
    import altair as alt
    from vega_datasets import data

    # World map geometry; df_2015_map must carry an 'id' column matching the map's country ids
    source = alt.topo_feature(data.world_110m.url, 'countries')
    map_plot = alt.Chart(source).mark_geoshape().encode(
        color=alt.Color('Life expectancy :Q', legend=alt.Legend(title='Years')),
        tooltip='tooltip:N'
    ).transform_lookup(
        lookup='id',
        from_=alt.LookupData(df_2015_map, 'id', ['Life expectancy ', 'tooltip'])
    ).project(
        type='equirectangular'
    ).properties(
        width=900,
        height=540,
        title='Life Expectancy in Years'
    )
- Implement (a) a decision tree and (b) a random forest classifier (91% and 93% accuracy, respectively).
- Determine from the models that schooling is the most important feature, followed by HIV/AIDS prevalence. Our conclusions are careful not to treat these features as causal.
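The modeling steps above can be sketched roughly as follows. This is an illustration on synthetic data, not the project's actual pipeline: the feature names, the hyperparameters, the train/test split, and the use of scikit-learn's impurity-based `feature_importances_` are assumptions, and the numbers it prints will not match the accuracies reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned WHO features (hypothetical column names).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "schooling": rng.normal(12, 3, n),
    "hiv_aids": rng.exponential(1.0, n),
    "noise": rng.normal(0, 1, n),
})

# Synthetic binary label: above or below the median of a made-up score.
score = X["schooling"] - 2 * X["hiv_aids"]
y = (score > score.median()).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # mean accuracy on held-out rows
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    results[name] = (accuracy, importances)
    print(name, round(accuracy, 3))
    print(importances)
```

Impurity-based importances rank how much each feature helps the trees split the classes; they describe the fitted model, not a causal effect on life expectancy.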