Classifying Countries Based on Life Expectancy

In this project, we process data from the World Health Organization on a country-by-country basis and develop a model to predict whether a country's life expectancy is greater than or less than the world's median life expectancy.
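As a minimal sketch of how the binary target could be built with pandas: the toy rows and the `above_median` column name below are hypothetical, and assigning countries exactly at the median to the lower class is an assumption, not something the project states. The `Life expectancy ` column name (with its trailing space) follows the map code in this write-up.

```python
import pandas as pd

# Hypothetical toy rows; the project itself uses WHO data per country.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Life expectancy ": [52.0, 61.5, 70.0, 78.3, 82.1],
})

# Median life expectancy across all countries in the table
median_le = df["Life expectancy "].median()

# 1 = above the median, 0 = at or below it (tie handling is an assumption)
df["above_median"] = (df["Life expectancy "] > median_le).astype(int)

print(df[["Country", "above_median"]])
```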

Project Details

  • Flatiron School
  • Tech: Pandas, Altair, scikit-learn

Details:

  1. Clean the data of NaNs and check that each country's data is stationary.
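  2. Build an Altair map of life expectancy. This required adding numeric country codes to the data so that each row matches the id of a country on the world map.
    import altair as alt
    from vega_datasets import data

    # World map boundaries; each country feature carries a numeric id
    source = alt.topo_feature(data.world_110m.url, 'countries')

    # df_2015_map is the prepared DataFrame (defined elsewhere) with an 'id'
    # column of country codes; note the trailing space in 'Life expectancy '
    map_plot = alt.Chart(source).mark_geoshape().encode(
        color=alt.Color('Life expectancy :Q', legend=alt.Legend(title='Years')),
        tooltip='tooltip:N'
    ).transform_lookup(
        lookup='id',
        from_=alt.LookupData(df_2015_map, 'id', ['Life expectancy ', 'tooltip'])
    ).project(
        type='equirectangular'
    ).properties(
        width=900,
        height=540,
        title='Life Expectancy in Years'
    )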
    
  3. Implement (a) decision tree and (b) random forest classifier (91% and 93% accuracy, respectively).
  4. Determine from the models that schooling is the most important feature, followed by HIV/AIDS prevalence. Our conclusions are careful not to treat these features as causal.
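
Steps 3 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the data is synthetic, the feature names `Schooling`, `HIV/AIDS`, and `noise` are stand-ins, and the hyperparameters are arbitrary, so the accuracies it prints will not match the 91% and 93% reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned WHO features (assumption, not real data)
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "Schooling": rng.normal(12, 3, n),
    "HIV/AIDS": rng.exponential(1.5, n),
    "noise": rng.normal(0, 1, n),
})

# Synthetic binary target: 1 if a made-up score is above its median
score = 0.8 * X["Schooling"] - 1.0 * X["HIV/AIDS"] + rng.normal(0, 1, n)
y = (score > score.median()).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters here are illustrative choices, not the project's settings
models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
    importances = pd.Series(
        model.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    results[name] = (accuracy, importances)
    print(f"{name}: accuracy = {accuracy:.2f}")
    print(importances)
```

The `feature_importances_` values rank how much each feature contributed to the splits in the fitted trees. They describe the model, not the world, which is why a high importance should not be read as a causal effect.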