Walkability is the ease with which people can access amenities in a place without using a car. It is associated with equitable access to key resources as well as positive outcomes in health, social bonding and community-building, sustainability, and the economy. While urban areas in the US initially developed around transportation by foot, the mass introduction of cars and other motorized vehicles in the 1950s led to urban sprawl, an expansion pattern characterized by low-density areas and car-dependent lifestyles.¹ A report by the Institute for Transportation and Development Policy evaluated the walkability of nearly 1,000 cities globally. The report identified London, Hong Kong, Paris, and Bogotá as the cities with the highest walkability scores, and placed US cities particularly low on the list as a result of urban sprawl.²
The same report found that Washington, D.C. was the only US city to make the top 25 in any category. This matches our personal experience: living in Washington, D.C., we find it relatively walkable compared to other US cities. Some of us have also lived in walkable cities around the world and saw first-hand the impact of high walkability on well-being. As our graduate program has brought us together from around the world to Washington, D.C., exploring the city’s walkability is not just professionally rewarding but also personally meaningful.
To explore this topic further, we aim to answer the following data science questions:
How is walkability associated with socioeconomic outcomes in Washington, D.C.?
How is walkability associated with health outcomes in Washington, D.C.?
How accessible are neighborhoods in Washington, D.C. by bike?
What is metro ridership in Washington, D.C. like?
What is public sentiment around walkability in Washington, D.C.?
We answer these questions by first collecting, cleaning, and exploring US Census Tract, Capital Bike Share, PLACES Census Health Data Estimation, Washington Metro (WMATA) ridership, and Reddit data. Based on our initial data exploration, we investigate the ease of access of different neighborhoods, or census tracts, by foot, which is our primary measure of walkability. We also explore the correlation between walkability in different tracts and social and health outcomes. We then proceed to investigate the ease of access of tracts by bike by examining the distribution of Capital Bike Share stations and buffered bike lanes throughout the city. Next, we look at the usage of the WMATA metro. Finally, we check sentiment around walkability in the city by analyzing Reddit threads related to this topic.
To learn more about the methodology, please visit our methods page.
1. How is walkability associated with socioeconomic outcomes in Washington, D.C.?
Code
import altair as alt
import pandas as pd
import geopandas as gpd
from pathlib import Path
import requests
import numpy as np
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

"""IMPORT DATA"""
# define data directory
data_dir = Path().absolute().parent.absolute().parent/"data"
raw_data_dir = Path().absolute().parent.absolute().parent/"data"/"raw_data"
img_dir = Path().absolute().parent.absolute().parent/"data"/"img"

# import data
walkability = pd.read_csv(raw_data_dir/"joined_depression_cre_walkability.csv")
walkability.loc[:, 'geoid_tract_20'] = walkability.geoid_tract_20.astype(str)
nation = pd.read_csv(data_dir/"cleaned_data"/"nation-joined_depression_cre_walkability.csv")

# Ingest GEOJSON file of census tracts in DC and grab json
req_dc = requests.get('https://raw.githubusercontent.com/arcee123/GIS_GEOJSON_CENSUS_TRACTS/master/11.geojson')
json_dc = req_dc.json()

# create geopandas dataframe and add in the walkability / outcomes data
geo_df = gpd.GeoDataFrame.from_features((json_dc))
merged_df = geo_df.merge(walkability, how='left', left_on='GEOID', right_on='geoid_tract_20')

"""NORMALIZE SCORES ACROSS ALL METRICS"""
# convert the walkability score into a scale from 0 to 100 to make it easier to interpret
# original range 1-20
# new desired range: 0-100
original_range_min = 1
original_range_max = 20
new_range_max = 100
new_range_min = 0
merged_df.loc[:, 'walkability_score_scaled'] = merged_df.loc[:, 'walkability_score'].apply(lambda x: ((x - original_range_min) / (original_range_max - original_range_min)) * (new_range_max - new_range_min) + new_range_min)
nation.loc[:, 'walkability_score_scaled'] = nation.loc[:, 'walkability_score'].apply(lambda x: ((x - original_range_min) / (original_range_max - original_range_min)) * (new_range_max - new_range_min) + new_range_min)

# convert the income inequality index score into a scale from 0 to 100 to make it easier to interpret
# original range 0-1
# new desired range: 0-100
original_range_min = 0
original_range_max = 1
new_range_max = 100
new_range_min = 0
merged_df.loc[:, 'income_inequality_gini_index'] = merged_df.loc[:, 'income_inequality_gini_index'].apply(lambda x: x if x >= 0 else np.nan)
merged_df.loc[:, 'income_inequality_gini_index_scaled'] = merged_df.loc[:, 'income_inequality_gini_index'].apply(lambda x: ((x - original_range_min) / (original_range_max - original_range_min)) * (new_range_max - new_range_min) + new_range_min)
nation.loc[:, 'income_inequality_gini_index'] = nation.loc[:, 'income_inequality_gini_index'].apply(lambda x: x if x >= 0 else np.nan)
nation.loc[:, 'income_inequality_gini_index_scaled'] = nation.loc[:, 'income_inequality_gini_index'].apply(lambda x: ((x - original_range_min) / (original_range_max - original_range_min)) * (new_range_max - new_range_min) + new_range_min)

# define columns to report
outcomes_cols = ['walkability_score_scaled',
                 'below_poverty_level_perc',
                 'income_inequality_gini_index_scaled',
                 'hs_grad_perc',
                 'households_no_vehicle_perc']

# treat negative values as missing
for i in outcomes_cols:
    merged_df[i] = merged_df[i].apply(lambda x: x if x >= 0 else np.nan)
    nation[i] = nation[i].apply(lambda x: x if x >= 0 else np.nan)

# flip metric to be percent of households with a car
merged_df.loc[:, 'households_w_vehicle'] = 100 - merged_df['households_no_vehicle_perc']
nation.loc[:, 'households_w_vehicle'] = 100 - nation['households_no_vehicle_perc']

"""CLEAN COLUMN NAMES"""
col_mapping = {'below_poverty_level_perc': '% Below Poverty Level',
               'income_inequality_gini_index_scaled': 'Income Inequality Gini Score',
               'hs_grad_perc': '% HS or Higher Degree',
               'households_w_vehicle': '% with a Vehicle',
               'walkability_score_scaled': 'Walkability Score',
               'neighborhood_name': 'Neighborhood Name'}
merged_df = merged_df.rename(col_mapping, axis='columns')

"""RE-FORMAT DATA"""
# turn the dataframe into long data so that the bar chart can be created with each outcome as a bar
neighborhood_df = pd.melt(merged_df, id_vars='Neighborhood Name', value_vars=col_mapping.values())
neighborhood_df = neighborhood_df.groupby(['Neighborhood Name', 'variable'])['value'].mean().reset_index()
walk_scores = dict(zip(list(neighborhood_df[neighborhood_df.variable == 'Walkability Score']['Neighborhood Name']),
                       list(neighborhood_df[neighborhood_df.variable == 'Walkability Score']['value'])))
neighborhood_df.loc[:, 'Walkability Score'] = neighborhood_df['Neighborhood Name'].map(walk_scores)

# reformat to get the averages
nation = nation[outcomes_cols + ['households_w_vehicle']]
nation.drop('households_no_vehicle_perc', axis='columns', inplace=True)
nation_avg = pd.melt(nation, value_vars=[i for i in col_mapping.keys() if 'neighborhood_name' not in i])
nation_avg = nation_avg.groupby('variable')['value'].mean().reset_index()

# create cleaned column for plotting the national averages
nation_avg['National Average'] = nation_avg['variable'].map(col_mapping)

# create DC average walkability score
neighborhood_df['dc_avg_walk'] = merged_df['Walkability Score'].mean()

# add URL to the american flag icon
nation_avg['flag_url'] = 'https://upload.wikimedia.org/wikipedia/commons/d/de/Flag_of_the_United_States.png'

"""CREATE VISUALIZATION"""
# define a click on the choropleth map so that it can filter the bar chart
click = alt.selection_multi(fields=['Neighborhood Name'])

# create the choropleth map
choropleth = (
    alt.Chart(merged_df, title="Walkability of DC Census Tracts")
    .mark_geoshape(stroke='white')
    .transform_lookup(
        lookup='geoid_tract_20',
        from_=alt.LookupData(merged_df, 'geoid_tract_20', ['Walkability Score', 'Neighborhood Name'])
    )
    .encode(
        alt.Color('Walkability Score:Q',
                  scale=alt.Scale(scheme='redyellowblue', reverse=True),
                  title="DC Walkability"),
        opacity=alt.condition(click, alt.value(1), alt.value(0.2)),
        tooltip=['Neighborhood Name:N', 'Walkability Score:Q']
    )
    .add_selection(click)
)

bars = (
    alt.Chart(neighborhood_df, title='Outcomes of DC Neighborhoods')
    .mark_bar()
    .encode(
        x=alt.X('variable:N', axis=alt.Axis(labelAngle=-45)),
        color='mean(Walkability Score):Q',
        y=alt.Y('mean(value):Q', sort='x', scale=alt.Scale(domain=[0, 100])),
        tooltip=['variable:N', 'mean(value):Q']
    )
    .properties(width=200, height=300)
    .transform_filter(click)
)

# modify the axes and title labels
bars.encoding.y.title = 'Avg. Value Across All Census Tracts'
bars.encoding.x.title = 'Outcome'

nation_avg_lines = (
    alt.Chart(nation_avg)
    .mark_tick(
        color="black",
        thickness=3,
        size=39,  # controls width of tick
        strokeDash=[1, 2]
    )
    .encode(x='National Average:N', y='value:Q')
)

nation_avg_img = (
    alt.Chart(nation_avg)
    .mark_image(width=15, height=15)
    .encode(
        x='National Average:N',
        y='value:Q',
        url='flag_url',
        tooltip=['National Average', 'value:Q']
    )
)

# plot the two graphs together
alt.hconcat(choropleth, (bars + nation_avg_lines + nation_avg_img))
Figure 1 (linked view)
First, we want to investigate whether walkability is related to other aspects of people’s lives. This visualization dives into the data science question of whether a more walkable neighborhood is associated with better socioeconomic outcomes. One of the theories behind this question is that a more walkable neighborhood may put residents in closer proximity to higher-paying job opportunities. Our second theory is that the design of walkable neighborhoods often results in higher economic activity due to increased foot traffic, which might generate more business within an area.
The left plot is a map of every census tract¹ in the District of Columbia, with each tract colored by its walkability score. Some D.C. neighborhoods are made up of several census tracts, depending on the population density of the neighborhood.² Hovering over a census tract displays the name of the neighborhood it belongs to and the walkability score for that tract. The bar graph on the right shows several social outcomes, initially averaged across the entire district. Clicking a neighborhood on the map (which may include more than one census tract) highlights that neighborhood and updates the bar graph to show the social outcomes averaged across just that neighborhood. The national average for each outcome is marked on its bar by an American flag icon and a horizontal line. Hovering over a bar shows the value of that social outcome averaged across all census tracts in the selection, and hovering over a flag shows the national average for that outcome.
Overall, we can see that DC is a highly walkable city, especially in comparison to the rest of the United States: its walkability score is almost double the national average. Correspondingly, far fewer households in DC have vehicles than the national average. Interestingly, DC fares about average on the other social outcomes reported. The most walkable parts of the city are concentrated in the city center around downtown, and walkability decreases as one ventures out from the center. Although walkability decreases toward all edges of the city, the topmost edges (Wards 3 and 4) have higher car ownership, very low rates of poverty, and higher high school degree attainment. The lower edges of the city (Wards 7 and 8) also have lower walkability scores, but have lower rates of car ownership, higher poverty, and lower high school degree attainment than Wards 3 and 4. This suggests that car ownership may be a key factor in economic success in less walkable areas. In contrast, highly walkable neighborhoods such as Logan Circle / Shaw have significantly lower car ownership, even compared to the DC average, yet have lower rates of poverty and higher rates of high school degree attainment.
¹ A census tract is a geographic region defined for the purpose of taking a census. There are 179 census tracts in Washington, D.C.
² Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. A census tract usually covers a contiguous area; however, the spatial size of census tracts varies widely depending on the density of settlement.
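The rescaling and neighborhood averaging behind Figure 1 can be sketched in a few lines. This is a minimal illustration rather than the project's actual pipeline: the `rescale` helper and the three-row `tracts` table are hypothetical, and only the 1-20 source range and the 0-100 target range come from the code above.

```python
import pandas as pd


def rescale(values, old_min, old_max, new_min=0.0, new_max=100.0):
    """Linearly map values from [old_min, old_max] onto [new_min, new_max]."""
    return (values - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Hypothetical tract-level scores on the original 1-20 walkability scale
tracts = pd.DataFrame({
    "neighborhood": ["A", "A", "B"],
    "walkability_score": [1.0, 20.0, 10.5],
})

# Put the score on a 0-100 scale so it is comparable with percentage outcomes
tracts["walkability_score_scaled"] = rescale(tracts["walkability_score"], 1, 20)

# Average the tract scores within each neighborhood, as the bar chart does
neighborhood_avg = tracts.groupby("neighborhood")["walkability_score_scaled"].mean()
print(neighborhood_avg)
```

Applying the function to a whole column avoids the row-by-row `apply` with a lambda and keeps the source and target ranges explicit at the call site.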
2. How is walkability associated with health outcomes in Washington, D.C.?
Code
# IMPORT RELEVANT LIBRARIES
import json
import warnings
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo
import plotly.subplots as sp
import requests
import scipy.stats
import statsmodels.api as sm
from sklearn.metrics import r2_score

warnings.simplefilter(action='ignore', category=FutureWarning)

# render plotly figures inline in the notebook
pio.renderers.default = "plotly_mimetype+notebook_connected"

# import the csv
data_dir = Path().absolute().parent.absolute().parent/"data"
img_dir = Path().absolute().parent.absolute().parent/"data"/"img"
raw_data_dir = Path().absolute().parent.absolute().parent/"data"/"raw_data"
dc_health_df = pd.read_csv(raw_data_dir/"PLACES__Census_Tract_Data__GIS_Friendly_Format___2022_release (1).csv")

# filter for where StateAbbr = DC
dc_health_df = dc_health_df[dc_health_df['StateAbbr'] == 'DC']

# theming variables
# choose the figure font
font_dict = dict(family='Arial', size=14, color='black')

# readable labels for the CrudePrev columns (used for both figures)
health_metric_labels = {
    'ACCESS2_CrudePrev': '% of Adults without Health Insurance',
    'ARTHRITIS_CrudePrev': '% of Adults with Arthritis',
    'BINGE_CrudePrev': '% of Adults who Binge Drink',
    'BPHIGH_CrudePrev': '% of Adults with High Blood Pressure',
    'BPMED_CrudePrev': '% of Adults with High Blood Pressure who take Blood Pressure Medication',
    'CANCER_CrudePrev': '% of Adults who were Diagnosed with Cancer',
    'CASTHMA_CrudePrev': '% of Adults who were Diagnosed with Asthma',
    'CERVICAL_CrudePrev': '% of Women who had a Pap Smear in the Past 3 Years',
    'CHD_CrudePrev': '% of Adults who were Diagnosed with Coronary Heart Disease',
    'CHECKUP_CrudePrev': '% of Adults who had a Routine Checkup in the Past Year',
    'CHOLSCREEN_CrudePrev': '% of Adults who had Cholesterol Checked in the Past 5 Years',
    'COLON_SCREEN_CrudePrev': '% of Adults who had a Colonoscopy or similar test in the Past 10 Years',
    'COPD_CrudePrev': '% of Adults who were Diagnosed with COPD (Chronic Obstructive Pulmonary Disease)',
    'COREM_CrudePrev': '% Prevalence of Older Adult Men aged >=65 years who are up to date on preventative health',
    'COREW_CrudePrev': '% Prevalence of Older Adult Women aged >=65 years who are up to date on preventative health',
    'CSMOKING_CrudePrev': '% of Adults who Currently Smoke',
    'DENTAL_CrudePrev': '% of Adults who had a Dental Visit in the Past Year',
    'DEPRESSION_CrudePrev': '% of Adults who were Diagnosed with Depression',
    'DIABETES_CrudePrev': '% of Adults who were Diagnosed with Diabetes',
    'GHLTH_CrudePrev': '% of Adults who reported their Health as not Good',
    'HIGHCHOL_CrudePrev': '% of Adults who were Diagnosed with High Cholesterol',
    'KIDNEY_CrudePrev': '% of Adults who were Diagnosed with Kidney Disease',
    'LPA_CrudePrev': '% of Adults who are Physically Inactive',
    'MAMMOUSE_CrudePrev': '% Women aged 50-74 years who had a Mammogram in the Past 2 Years',
    'MHLTH_CrudePrev': '% of Adults who reported their Mental Health as not Good',
    'OBESITY_CrudePrev': '% of Adults who were Obese',
    'PHLTH_CrudePrev': '% of Adults who reported their Physical Health as not Good',
    'SLEEP_CrudePrev': '% of Adults who reported their Sleep as not Good',
    'STROKE_CrudePrev': '% of Adults who were Diagnosed with Stroke',
    'TEETHLOST_CrudePrev': '% of Adults who have lost all of their Natural Teeth',
}

"""FIRST GRAPH: HEALTH METRIC VS. HEALTH METRIC"""
# isolate only columns with CrudePrev in the name
dc_health_df_prev = dc_health_df.filter(regex='CrudePrev')
df = dc_health_df_prev

# Rename columns
df = df.rename(columns=health_metric_labels)

# list of health metrics for drop down menu
column_names = df.columns

# Creating the initial scatter plot
fig = go.Figure(go.Scatter(x=df[column_names[0]], y=df[column_names[1]], mode='markers'))

# Label axes
fig.update_xaxes(title_text='X Axis')
fig.update_yaxes(title_text='Y Axis')

# Setting the range for x and y axes
fig.update_xaxes(range=[0, 100])
fig.update_yaxes(range=[0, 100])

# add one hidden trace per pair of health metrics
for col in column_names:
    for col2 in column_names:
        x = df[col]
        y = df[col2]
        fig.add_trace(go.Scatter(x=x, y=y, mode='markers', name=col + ' vs ' + col2, showlegend=False, visible=False))

# Update the visibility of the traces
def update_visibility(selected_col, selected_col2):
    for i, trace in enumerate(fig.data):
        if trace.name == selected_col + ' vs ' + selected_col2:
            trace.visible = True
        elif trace.name == selected_col + ' vs ' + selected_col2 + ' Best Fit':
            trace.visible = True
        else:
            trace.visible = False

# Create the drop-down menus for x (col) and y (col2) axes of the scatter plot
col_dropdown = [{'label': col, 'value': col} for col in column_names]
col2_dropdown = [{'label': col2, 'value': col2} for col2 in column_names]

# Define the dropdown menu for x-axis
button_layer_1_height = 1.08
x_axis_dropdown = go.layout.Updatemenu(
    buttons=list([dict(args=[{'x': [df[col]]}, update_visibility(col, col2)], label=col, method='update') for col in column_names]),
    direction="down",
    pad={"r": 10, "t": 10},
    showactive=True,
    x=0.06,
    xanchor="left",
    y=button_layer_1_height + 0.05,
    yanchor="top"
)

# Define the dropdown menu for y-axis
y_axis_dropdown = go.layout.Updatemenu(
    buttons=list([dict(args=[{'y': [df[col2]]}, update_visibility(col, col2)], label=col2, method='update') for col2 in column_names]),
    direction="down",
    pad={"r": 10, "t": 10},
    showactive=True,
    x=0.06,
    xanchor="left",
    y=button_layer_1_height,
    yanchor="top"
)

# Update the layout to include the dropdown menus
fig.update_layout(
    updatemenus=[x_axis_dropdown, y_axis_dropdown],
    font=font_dict,
)

# Label axes
fig.update_xaxes(title_text='X Axis')
fig.update_yaxes(title_text='Y Axis')

# Setting the range for x and y axes
fig.update_xaxes(range=[0, 100])
fig.update_yaxes(range=[0, 100])

# Update plot sizing
fig.update_layout(
    width=900,
    height=900,
    autosize=False,
    #margin=dict(t=100, b=0, l=0, r=0),
)

# add annotations
fig.update_layout(
    annotations=[
        dict(text="X Axis:", x=0, xref="paper", y=button_layer_1_height + 0.025, yref="paper", align="left", showarrow=False),
        dict(text="Y Axis:", x=0, xref="paper", y=button_layer_1_height - 0.025, yref="paper", align="left", showarrow=False)
    ]
)

# Change background color to defined colors
fig.update_layout(plot_bgcolor='rgb(230, 230, 230)')

# Change scatter point color to defined colors
fig.update_traces(marker=dict(color='rgb(112, 14, 1)'))

# # # Create a function to update the visibility of the traces based on selected columns
# def update_visibility(selected_col, selected_col2):
#     for i, trace in enumerate(fig.data):
#         trace.visible = (trace.name == selected_col + ' vs ' + selected_col2)
#         trace.visible = (trace.name == selected_col + ' vs ' + selected_col2 + ' Best Fit')

# Display the scatter plot with dropdown menus
fig.show()

"""SECOND GRAPH: WALKABILITY VS. HEALTH METRIC"""
# Import walkability data
df_walk = pd.read_csv(raw_data_dir/"joined_depression_cre_walkability.csv")
dc_health_df.rename(columns={'TractFIPS': 'census_tract'}, inplace=True)
df_walk.rename(columns={'geoid_tract_20': 'census_tract'}, inplace=True)

# Merge the two dataframes
df_merged = pd.merge(dc_health_df, df_walk, on='census_tract', how='left')

# isolate only columns with CrudePrev in the name
dc_health_df_prev = df_merged.filter(regex='CrudePrev')

# add the walkability_score column back in
dc_health_df_prev['walkability_score'] = df_merged['walkability_score']
df = dc_health_df_prev

# Rename columns
df = df.rename(columns=health_metric_labels)

# list of health metrics for drop down menu
column_names = df.columns

# Creating the initial scatter plot
fig = go.Figure(go.Scatter(x=df[column_names[0]], y=df[column_names[1]], mode='markers'))

# Label axes
fig.update_xaxes(title_text='Walkability Score')
fig.update_yaxes(title_text='Y Axis')

# Setting the range for x and y axes
#fig.update_xaxes(range=[0, 100])
fig.update_xaxes(range=[0, max(df['walkability_score'])])
fig.update_yaxes(range=[0, 100])

# add one hidden trace per health metric, plotted against walkability
for col in column_names:
    fig.add_trace(go.Scatter(x=df['walkability_score'], y=df[col], mode='markers', name='Walkability vs ' + col, visible=False))

def update_visibility(selected_col, selected_col2):
    return [(trace.name == selected_col + ' vs ' + selected_col2) for trace in fig.data]

# Create the drop-down menu for the y axis of the scatter plot
col_dropdown = [{'label': col, 'value': col} for col in column_names]

# Define the dropdown menu for y-axis
button_layer_1_height = 1.08
y_axis_dropdown = go.layout.Updatemenu(
    buttons=list([
        dict(
            args=[{"y": [df[col]], "visible": [(trace.name == "Walkability vs " + col) for trace in fig.data]}],
            label=col,
            method="update",
        )
        for col in column_names
    ]),
    direction="down",
    pad={"r": 10, "t": 10},
    showactive=True,
    x=0.06,
    xanchor="left",
    y=button_layer_1_height,
    yanchor="top"
)

# Update the layout to include the dropdown menus
fig.update_layout(
    updatemenus=[y_axis_dropdown],
    font=font_dict
)

# Label axes
fig.update_xaxes(title_text='X Axis')
fig.update_yaxes(title_text='Y Axis')

# Update plot sizing
fig.update_layout(
    width=900,
    height=900,
    autosize=False,
    #margin=dict(t=100, b=0, l=0, r=0),
)

# add annotations
fig.update_layout(
    annotations=[
        dict(text="X Axis: Walkability Score of the Neighborhood", x=0, xref="paper", y=button_layer_1_height + 0.025, yref="paper", align="left", showarrow=False),
        dict(text="Y Axis:", x=0, xref="paper", y=button_layer_1_height - 0.025, yref="paper", align="left", showarrow=False)
    ]
)

# Change background color to grey
fig.update_layout(plot_bgcolor='rgb(230, 230, 230)')

# Change scatter point color to red
fig.update_traces(marker=dict(color='rgb(112, 14, 1)'))

# Display the scatter plot with dropdown menus
fig.show()
Figure 2
After examining the relationship between walkability and socioeconomic outcomes, we wanted to investigate the relationship between walkability and health outcomes as well. In the graph above, the x-axis shows the walkability score and the y-axis shows the selected health metric. The data is presented at the neighborhood level; in other words, each point on the scatter plot represents a neighborhood in Washington, D.C. as defined by U.S. Census Tract data.
Upon exploring the different health metrics, we can see that neighborhoods with low walkability fare worse on many of them, such as the percentage of adults who have lost all of their natural teeth and the percentage of adults who are physically inactive. This makes sense because lower walkability makes it harder to be physically active. In addition, oral health is correlated with a range of other health outcomes. However, there are interactions between health outcomes, such as the impact of poor oral health on physical activity, so the data might not point to a straightforward causal relationship between walkability and a particular health outcome.³ The first graph, which shows the relationship between pairs of health metrics, can be used to explore these interactions further.
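One way to put a number on the relationships visible in the scatter plot is a Pearson correlation coefficient with a least-squares trend line. The sketch below uses made-up values for five tracts (the `walkability` and `pct_inactive` arrays are hypothetical, not taken from the PLACES data), and it measures association only, not causation.

```python
import numpy as np

# Hypothetical walkability scores (1-20 scale) and % of physically inactive adults
walkability = np.array([4.0, 8.0, 12.0, 16.0, 20.0])
pct_inactive = np.array([30.0, 26.0, 21.0, 18.0, 15.0])

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(walkability, pct_inactive)[0, 1]

# Least-squares line: change in the health metric per walkability point
slope, intercept = np.polyfit(walkability, pct_inactive, deg=1)

print(f"Pearson r = {r:.3f}, slope = {slope:.2f} percentage points per walkability point")
```

A strongly negative `r` here would mean that more walkable tracts tend to report less physical inactivity; confounders such as income or age would still need to be examined before drawing causal conclusions.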
3. How accessible are neighborhoods in Washington, D.C. by bike?
Code
import pandas as pd
import numpy as np
import altair as alt
import plotly.graph_objects as go
from vega_datasets import data
import requests
import json
import warnings
from pathlib import Path
warnings.filterwarnings('ignore')

raw_data_dir = Path().absolute().parent.absolute().parent/"data"/"raw_data"
bikeshare_df = pd.read_csv(Path().absolute().parent.absolute().parent/"data/cleaned_data/bikeshare_cleaned.csv")

# Create list of bikeshare stations outside of DC
nondc_stations = [
    32256,32251,32237,32241,32210,32225,32259,32223,32209,32240,32239,32245,32220,32214,32219,32224,
    32217,32213,32239,32246,32247,32250,32248,32246,32228,32215,32238,32252,32249,32260,32234,32231,
    32235,32255,32200,32208,32201,32211,32227,32207,32229,32221,32206,32233,32205,32204,32205,32203,
    32206,32222,32230,32232,32600,32602,32603,32608,32605,32604,32607,32609,31948,31904,32606,32601,
    31921,31905,31902,31901,31976,31036,31977,31900,31920,31049,31037,31926,31919,31035,31973,31069,
    31023,31022,31021,31019,31020,31094,31092,31079,31030,31029,31080,31093,31014,31062,31077,31073,
    31024,31040,31028,31017,31924,31027,31947,31066,31075,31949,31053,31971,31067,31058,31923,31063,
    31068,31951,31945,31095,31006,31005,31091,31004,31936,31071,31090,31950,31064,31935,31011,31012,
    31009,31944,31052,31010,31959,31916,31088,31960,31956,31910,31083,31915,31087,31085,31913,31915,
    31970,31969,31906,31098,31048,31081,31084,31082,31974,31930,31932,31953,31942,31967,32406,32423,
    32415,32407,32405,32401,32400,32405,32404,32413,32418,32410,32403,32408,32421,32402,32417,32422,
    32420,32414,32412,32416,32059,32061,32026,32011,32049,32082,32058,32025,32001,32058,32082,32024,
    32043,32036,32012,32034,32035,32050,32056,32426,32425,32424,32426,32085,32094,32089,32093,32091,
    32090,32087,32088,32086,32092,32022,32066,32064,32062,32065,32073,32063,32084,32054,32051,32040,
    32046,32029,32055,32002,32021,32003,32048,32013,32000,32008,32028,32027,32053,32039,32057,32078,
    32075,32077,32076,32079,32080,32074,32081,32032,32047,32044,32017,32007,32009,32023,32033,32016,
    32004,32005,32072,32041,32052,32071,32038,32037,32045,32067,32069,32068,32018,32253,32236,32243,
    32258,32216,32212,32218,32019,32411,31929,31914,31907,31903,31958,31933,31041,31042,31968,31044,
    31045,31955,31046,31047,31099,31043,31097,31931,31918,31086,31927,31966,21943,31963,31952,31964,
    31962,31908,31072,31941,31961,31928,31054,31033,31059,31057,31061,31056,31055,31909,31912,31065,
    31032,31074,31078,32419,31957,31954,31946,31972,31060,31938,31013,31002,31007,31000,31003,31096,
    31070,31039,31034,31025,31038,31026,31050,31940,31089,31031,31051,31937,31016,31018,31039,31015,
    31917,31076,31939,32409,
]

alt.data_transformers.enable('default', max_rows=None)

#### BACKGROUND FOR DC MAP
# Define background of Washington D.C.
response1 = requests.get('https://raw.githubusercontent.com/arcee123/GIS_GEOJSON_CENSUS_TRACTS/master/11.geojson')
background = alt.Chart(
    alt.Data(values=response1.json()),
    title="Map of D.C. Bike Lanes, Capital Bikeshare Stations, & Routes in March 2023"
).mark_geoshape(
    fill="lightgray",
    stroke='white',
    strokeWidth=1
).encode(
).properties(
    width=600,
    height=600
)

#### BACKGROUND FOR DC BIKE LANE LOCATIONS
# Open GeoJSON file for bicycle lanes
with open(raw_data_dir/'Bicycle_Lanes.geojson') as f:
    data = json.load(f)

# Create bike lane layer for D.C.
background_lanes = alt.Chart(alt.Data(values=data)).mark_geoshape(
    stroke='#d6a320',
    strokeWidth=1
).properties(
    width=600,
    height=600
)

#### MOUSEOVER SELECTION
# Create mouseover selection
select_station = alt.selection_single(
    on="mouseover", nearest=True, fields=["start_station_name"], empty='none'
)

#### NETWORK CONNECTIONS FOR MAP
# Filter non-DC stations
tmp1 = bikeshare_df[~bikeshare_df['start_station_id'].isin(nondc_stations)]
tmp1 = tmp1[~tmp1['end_station_id'].isin(nondc_stations)]

# Keep only relevant columns and drop duplicates to have one row per route
tmp1 = tmp1[['start_station_name', 'start_station_id', 'end_station_name', 'end_station_id',
             'start_lat', 'start_lng', 'end_lat', 'end_lng']].drop_duplicates()

# Define connections
connections = alt.Chart(tmp1).mark_rule(opacity=0.35).encode(
    latitude="start_lat:Q",
    longitude="start_lng:Q",
    latitude2="end_lat:Q",
    longitude2="end_lng:Q"
).transform_filter(
    select_station
)

#### POINTS FOR MAP
# Filter non-DC stations
tmp2 = bikeshare_df[~bikeshare_df['start_station_id'].isin(nondc_stations)]
tmp2 = tmp2[~tmp2['end_station_id'].isin(nondc_stations)]

# Create indicator columns for rideable type
tmp2['classic_bike'] = np.where(tmp2['rideable_type'] == 'classic_bike', 1, 0)
tmp2['electric_bike'] = np.where(tmp2['rideable_type'] == 'electric_bike', 1, 0)
tmp2['docked_bike'] = np.where(tmp2['rideable_type'] == 'docked_bike', 1, 0)

# Temporary dataframe showing unique station locations with ride count
tmp2 = tmp2[['start_station_name', 'start_station_id', 'start_lng', 'start_lat', 'ride_id',
             'classic_bike', 'electric_bike', 'docked_bike']].groupby(
    ['start_station_name', 'start_station_id', 'start_lng', 'start_lat']
).agg(
    {'ride_id': 'count', 'classic_bike': 'sum', 'electric_bike': 'sum', 'docked_bike': 'sum'}
).reset_index()
tmp2.rename(columns={'ride_id': 'count_rides', 'classic_bike': 'count_classic',
                     'electric_bike': 'count_electric', 'docked_bike': 'count_dock'}, inplace=True)
tmp2['color'] = 'Bike Station'

points = alt.Chart(tmp2).mark_circle().encode(
    latitude="start_lat:Q",
    longitude="start_lng:Q",
    color=alt.Color('color:N', title="Legend",
                    scale=alt.Scale(domain=['Bike Station', 'Bike Lane'], range=['#962e2ec8', '#d6a320'])),
    size=alt.Size("count_rides:Q", scale=alt.Scale(range=[15, 250]), legend=None),
    order=alt.Order("count_rides:Q", sort="descending"),
    tooltip=[
        alt.Tooltip('start_station_id:Q', title='\U0001f4cd Start Station ID'),
        alt.Tooltip('start_station_name:N', title='\U0001f6e3 Start Station Name'),
        alt.Tooltip('count_rides:Q', title='\U0001f50d Ride Count'),
        alt.Tooltip('count_classic:Q', title='\U0001f6b4 Classic Bike Count'),
        alt.Tooltip('count_electric:Q', title='\U0001f6b5 Electric Bike Count'),
        alt.Tooltip('count_dock:Q', title='\U0001f6b2 Docked Bike Count')
    ]
).add_selection(
    select_station
)

alt.themes.enable('vox')  # add theme

# Show visualization
(background + background_lanes + connections + points).configure_view(stroke=None)
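The per-station ride counts that size the points on the map come from a group-and-aggregate step. A minimal sketch of that step follows, using a hypothetical four-ride table in place of the cleaned Capital Bikeshare data; the column names mirror the ones used above, while the `is_classic` and `is_electric` helper columns are illustrative.

```python
import pandas as pd

# Hypothetical rides; the real data has one row per Capital Bikeshare trip
rides = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4"],
    "start_station_name": ["Station A", "Station A", "Station B", "Station A"],
    "rideable_type": ["classic_bike", "electric_bike", "classic_bike", "classic_bike"],
})

# Indicator columns so each bike type can be counted with a simple sum
rides["is_classic"] = rides["rideable_type"].eq("classic_bike")
rides["is_electric"] = rides["rideable_type"].eq("electric_bike")

# One row per start station with total rides and rides by bike type
station_counts = rides.groupby("start_station_name", as_index=False).agg(
    count_rides=("ride_id", "count"),
    count_classic=("is_classic", "sum"),
    count_electric=("is_electric", "sum"),
)
print(station_counts)
```

Named aggregation produces the final column names directly, which removes the need for a separate `rename` call after grouping.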