Code
GitHub Repository
Please visit our GitHub Repository for all code, data, and website references.
File Structure
Code folder
This section covers the files present in the code/ directory.
avgPF-PAPlotlyScriptwColors.ipynb creates the interactive plotly scatterplot of points for/points against by NFL season since 1999. This script uses the avgPFavgPA1999-2021wColors.csv data set.
avgPtsDt.Rmd creates the interactive data table (using the R package DT) from the avgPFavgPA1999-2021wColors.csv data set.
Combine_and_Draft_Cleaning.ipynb includes the initial data cleaning for the raw NFL combine data sourced from the nfl-data-py package.
Combine_and_Draft.ipynb performs further cleaning and processing of the data from Combine_and_Draft_Cleaning.ipynb, and creates the linked Altair charts for draft position and combine data.
player-stats.qmd processes and cleans the receiving EPA data and creates the heatmap shown in the final project website. The data used in this file were acquired via the nflreadr package, and the resulting heatmap is saved to the visualizations directory in the website folder.
radar-plot.qmd processes and cleans the combine data and creates an interactive radar chart using data cleaned in a separate file in this repository, namely combine_10yr.csv. This visualization is saved in the visualizations directory in the website folder.
timeseries-cleaning.Rmd covers the processing and cleaning of cumulative offensive efficiency by play over the course of the 2022 NFL season. These data were acquired via the nflreadr package, and the resulting modified CSV files are saved in the data/ directory as timeseries_logos.csv and timeseries_epa.csv.
timeseries-vis.ipynb creates a linked Altair time series plot that visualizes cumulative offensive efficiency by team by play over the course of the 2022 NFL season, using data cleaned in timeseries-cleaning.Rmd. The resulting visualization is saved in the visualizations directory in the website folder.
win-total-cleaning.qmd covers the importation, processing, cleaning, and visualization of NFL win total data over the past ~20 seasons (2003-present). These data were acquired via the nflreadr package, and the cleaned data are saved as win-totals.csv in the data/ directory. The visualization created in this file is an interactive plotly line plot that changes depending on the NFL division the user selects (AFC East, NFC South, etc.) and is saved in the visualizations directory in the website folder.
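The cumulative offensive efficiency described for timeseries-cleaning.Rmd amounts to a running sum of EPA per team, taken in play order. The following is a minimal Python sketch of that idea, not the repository's actual code: the (team, epa) record layout and the function name cumulative_epa are assumptions made for illustration, and the sample values are invented.

```python
from collections import defaultdict


def cumulative_epa(plays):
    """Return {team: [running EPA total after each of that team's plays]}.

    `plays` is an iterable of (team, epa) pairs already sorted in play order.
    This layout is a hypothetical stand-in, not the nflreadr schema.
    """
    totals = defaultdict(float)   # running EPA total per team
    series = defaultdict(list)    # running totals recorded play by play
    for team, epa in plays:
        totals[team] += epa
        series[team].append(totals[team])
    return dict(series)


# Invented values, purely illustrative
sample_plays = [("BUF", 0.5), ("KC", -0.25), ("BUF", 1.0), ("KC", 0.75)]
result = cumulative_epa(sample_plays)
print(result)
```

Each team's list can then be plotted against its play index to get one line per team, which is the shape of data a time series plot like the one in timeseries-vis.ipynb would need.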
Data folder
This section covers the files present in the data/ directory. Many of these files are cleaned/subsetted versions of data pulled from the nflreadr or nfl-data-py packages.
avgPFavgPA1999-2021wColors.csv contains the cumulative points for and points against totals for each team in each season from 1999 to 2021. These data are a subsetted version of data pulled from the nflverse GitHub data repository.
combine_10yr.csv contains all combine data (e.g., player names, measurables, performance metrics) for all participating players in the past 10 NFL seasons. The raw combine data are cleaned in the Combine_and_Draft_Cleaning.ipynb file, and these data are further cleaned in the Combine_and_Draft.ipynb file and saved as combine_clean_10yr.csv.
combine_clean_10yr.csv is the cleaned combine data from combine_10yr.csv, processed in Combine_and_Draft.ipynb.
draft_10yr.csv contains all NFL draft data (e.g., draft position, teams, players) for the past 10 NFL drafts. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.
ids_10yr.csv contains identifying information for collegiate players entering the draft for the past 10 NFL drafts. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.
snap_10yr.csv contains NFL snap data (i.e., how many snaps a player has in a given game) for the past 10 NFL seasons. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.
timeseries_epa.csv contains the EPA added/subtracted on each play for each NFL team over the 2022 regular season. The raw play-by-play (pbp) data were pulled from the nflreadr package, cleaned in the timeseries-epa-cleaning.Rmd file, and saved as timeseries_epa.csv.
timeseries_logos.csv contains the logo information (e.g., team picture URLs, colors, etc.) that is later used in the EPA time series plot. The raw team description (teams) data were pulled from the nflreadr package, cleaned in the timeseries-epa-cleaning.Rmd file, and saved as timeseries_logos.csv.
win-totals.csv contains the win/loss/tie totals as well as playoff outcomes and division rankings for each NFL team since 2003 (when divisions were realigned to their present status). The raw data sourced from the nfl-data-py package are cleaned and plotted in the win-total-cleaning.qmd file, and the cleaned data are additionally used for plots in the win-totals-playoffs.ipynb file.
Image folder
This section covers the files present in the img/ directory.
ANLY-503-Group23-Poster.pdf is our group’s poster that we presented on 05/01/2023.
Website folder
This section covers the files present in the website/ directory.
The _book/ directory contains the rendered website itself, including the index.html, coding.html, and data.html files, which are described later in this document. The website is too large to have resources embedded within the index.html file, so there are additional resources in this directory.
custom.scss makes some basic stylistic changes and creates the theme of the website, mainly using color.
index.ipynb constitutes the majority of the website: it covers the actual visual analysis and hosts all of the visualizations of interest. This file is rendered as index.html in the _book directory.
code.ipynb includes links to the project GitHub repository and a copy of this document. This file is rendered as code.html in the _book directory.
data.ipynb includes a brief description of the data used in this project as well as the packages of interest (used to pull our raw data). This file is rendered as data.html in the _book directory.
references.bib contains all citations used in this project.
Visualizations folder
This section covers the files present in the website/visualizations/ directory. This folder contains all visualizations used in this project.
avgPFavgPAPlotlyScriptColors.html is the interactive plotly scatterplot of points for/points against from 1999-2021.
chart1.html is the linked-view altair draft/combine chart.
combine-radar-chart.html is the interactive plotly radar chart (AKA spider plot) of combine event percentiles by position.
player-stats.png is the static ggplot2-generated receiving EPA heatmap.
ptsDT.html is the interactive data table for points for/points against.
timeseries-epa-vis.html is the linked altair view of cumulative offensive EPA by offensive plays.
win-total-plot.html is the interactive plotly line chart showcasing NFL win totals by division since 2003.
win-totals-playoffs.html showcases playoff outcomes given a regular season win total (e.g., what % of 14-win regular season teams made the playoffs/won the Super Bowl).
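
The playoff-outcome summary described for win-totals-playoffs.html can be thought of as a grouped proportion: for each regular-season win total, the share of teams that reached the playoffs. Below is a minimal Python sketch under stated assumptions. The (wins, made_playoffs) record layout and the function name playoff_rate_by_wins are hypothetical and do not reflect the actual columns of win-totals.csv, and the sample records are invented.

```python
from collections import defaultdict


def playoff_rate_by_wins(seasons):
    """Map each regular-season win total to the share of teams that made the playoffs.

    `seasons` is an iterable of (wins, made_playoffs) pairs, one per team-season.
    This layout is an assumption for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # wins -> [playoff teams, all teams]
    for wins, made_playoffs in seasons:
        counts[wins][0] += int(made_playoffs)
        counts[wins][1] += 1
    return {wins: made / total for wins, (made, total) in counts.items()}


# Invented team-seasons, purely illustrative
sample = [(14, True), (14, True), (9, True), (9, False), (9, False), (9, False)]
rates = playoff_rate_by_wins(sample)
print(rates)
```

The same grouping would work for any other outcome flag (for example, a Super Bowl win) by swapping the boolean passed in each pair.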