Code

Authors: Landon Carpenter, Matthew Moriarty, Alex Pattarini, Michael Varnerin

Affiliation: Georgetown University, M.S. Data Science and Analytics

GitHub Repository

Please visit our GitHub Repository for all code, data, and website references.

File Structure

Code folder

This section covers the files present in the code/ directory.

  • avgPF-PAPlotlyScriptwColors.ipynb creates the interactive plotly scatterplot of points for/points against by NFL season since 1999. This script uses the avgPFavgPA1999-2021wColors.csv data set.

  • avgPtsDt.Rmd creates the interactive data table (using the R package DT) from the avgPFavgPA1999-2021wColors.csv data set.

  • Combine_and_Draft_Cleaning.ipynb includes the initial data cleaning for the raw NFL combine data sourced from the nfl-data-py package.

  • Combine_and_Draft.ipynb performs further cleaning and processing of the data from Combine_and_Draft_Cleaning.ipynb and creates the linked Altair charts for draft position and combine data.

  • player-stats.qmd involves the processing/cleaning of receiving EPA data and the creation of the heatmap as shown in the final project website. The data used in this file were acquired via the nflreadr package, and the resulting heatmap is saved to the visualizations directory in the website folder.

  • radar-plot.qmd further processes the combine data in combine_10yr.csv (cleaned in a separate file in this repository) and creates an interactive radar chart. This visualization is saved in the visualizations directory in the website folder.

  • timeseries-cleaning.Rmd covers the processing and cleaning of cumulative offensive efficiency by play over the course of the 2022 NFL season. These data were acquired via the nflreadr package, and the resulting cleaned CSV files are saved in the data/ directory as timeseries_logos.csv and timeseries_epa.csv.

  • timeseries-vis.ipynb involves the creation of a linked Altair time series plot that visualizes cumulative offensive efficiency by team by play over the course of the 2022 NFL season using data cleaned in timeseries-cleaning.Rmd. The resulting visualization is saved in the visualizations directory in the website folder.

  • win-total-cleaning.qmd covers the importation, processing, cleaning, and visualization of NFL win total data over the past ~20 seasons (2003-present). These data were acquired via the nflreadr package, and the cleaned data are saved as win-totals.csv in the data/ directory. The visualization created in this file is an interactive plotly line plot that updates based on the user-selected NFL division (AFC East, NFC South, etc.) and is saved in the visualizations directory in the website folder.
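
The "cumulative offensive efficiency by play" handled in timeseries-cleaning.Rmd and timeseries-vis.ipynb amounts to a running total of EPA per team. The following is a minimal sketch of that idea in plain Python, not the project's actual code: the `plays` records, the team abbreviations, and the `cumulative_epa` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical play records: (offensive team, EPA on that play), in game order.
# These values are made up and are not taken from the project's data.
plays = [
    ("KC", 0.5),
    ("BUF", -0.25),
    ("KC", 1.0),
    ("BUF", 0.75),
    ("KC", -0.5),
]


def cumulative_epa(plays):
    """Return {team: [running EPA total after each of that team's plays]}."""
    per_team = defaultdict(list)
    for team, epa in plays:
        per_team[team].append(epa)
    # accumulate() turns each list of per-play EPA values into a running sum.
    return {team: list(accumulate(values)) for team, values in per_team.items()}


series = cumulative_epa(plays)
```

Each list in `series` gives the y-values of one team's line, with the play index as the x-axis.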

Data folder

This section covers the files present in the data/ directory. Many of these files are cleaned/subsetted versions of data pulled from the nflreadr or nfl-data-py packages.

  • avgPFavgPA1999-2021wColors.csv contains the cumulative points for and points against totals for each team of each season from 1999 to 2021. These data are a subsetted version of data pulled from the nflverse GitHub data repository.

  • combine_10yr.csv contains all combine data (e.g., player names, measurables, performance metrics) for any participating players in the past 10 NFL seasons. The raw combine data are cleaned in the Combine_and_Draft_Cleaning.ipynb file, and these data are further cleaned in the Combine_and_Draft.ipynb file and saved as combine_clean_10yr.csv.

  • combine_clean_10yr.csv is the cleaned combine data from combine_10yr.csv, processed in Combine_and_Draft.ipynb.

  • draft_10yr.csv contains all NFL draft data (e.g., draft position, teams, players) for the past 10 NFL drafts. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.

  • ids_10yr.csv contains identifying information for collegiate players entering the draft for the past 10 NFL drafts. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.

  • snap_10yr.csv contains NFL snap data (i.e., how many snaps a player has in a given game) for the past 10 NFL seasons. The raw data sourced from the nfl-data-py package are cleaned in the Combine_and_Draft_Cleaning.ipynb file.

  • timeseries_epa.csv contains the EPA gained or lost on each play for each NFL team over the 2022 regular season. The raw play-by-play (pbp) data were pulled from the nflreadr package and cleaned in the timeseries-epa-cleaning.Rmd file, then saved as timeseries_epa.csv.

  • timeseries_logos.csv contains the logo information (e.g., team picture URLs, colors, etc.) that is later used in the EPA time series plot. The raw team description (teams) data were pulled from the nflreadr package and cleaned in the timeseries-epa-cleaning.Rmd file, then saved as timeseries_logos.csv.

  • win-totals.csv contains the win/loss/tie totals as well as playoff outcomes and division rankings for each NFL team since 2003 (when divisions were realigned to their present status). The raw data sourced from the nfl-data-py package are cleaned and plotted in the win-total-cleaning.qmd file, and the cleaned data are additionally used for plots in the win-totals-playoffs.ipynb file.
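
As an illustration of what the per-team, per-season points for/points against totals in avgPFavgPA1999-2021wColors.csv represent, the sketch below shows one way such totals could be derived from game-level scores. It is an assumption-laden example, not the project's method: the `games` tuples, their field order, and the `season_points` helper are hypothetical, and the scores are invented.

```python
from collections import defaultdict

# Hypothetical game results:
# (season, home_team, away_team, home_score, away_score). Scores are made up.
games = [
    (2021, "NE", "MIA", 16, 17),
    (2021, "MIA", "BUF", 0, 35),
    (2021, "BUF", "NE", 10, 14),
]


def season_points(games):
    """Return {(season, team): {"pf": points for, "pa": points against}}."""
    totals = defaultdict(lambda: {"pf": 0, "pa": 0})
    for season, home, away, home_score, away_score in games:
        # Each game contributes to both teams' season totals.
        totals[(season, home)]["pf"] += home_score
        totals[(season, home)]["pa"] += away_score
        totals[(season, away)]["pf"] += away_score
        totals[(season, away)]["pa"] += home_score
    return dict(totals)


totals = season_points(games)
```

Each `(season, team)` entry then corresponds to one point in a points for versus points against scatterplot.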

Image folder

This section covers the files present in the img/ directory.

  • ANLY-503-Group23-Poster.pdf is our group’s poster that we presented on 05/01/2023.

Website folder

This section covers the files present in the website/ directory.

  • The _book/ directory contains the rendered website itself, including the index.html, coding.html, and data.html files which are described later on in this document. The website is too large to have resources embedded within the index.html file, so there are additional resources in this directory.

  • custom.scss applies some basic stylistic changes and creates the theme of the website, mainly through color.

  • index.ipynb constitutes the majority of the website: it covers the actual visual analysis and hosts all of the visualizations of interest. This file is rendered as index.html in the _book directory.

  • code.ipynb includes links to the project GitHub repository and a copy of this document. This file is rendered as code.html in the _book directory.

  • data.ipynb includes a brief description of the data used in this project as well as the packages of interest (used to pull our raw data). This file is rendered as data.html in the _book directory.

  • references.bib contains all citations used in this project.

Visualizations folder

This section covers the files present in the website/visualizations/ directory. This folder contains all visualizations used in this project.

  • avgPFavgPAPlotlyScriptColors.html is the interactive plotly scatterplot of points for/points against from 1999-2021.

  • chart1.html is the linked-view Altair draft/combine chart.

  • combine-radar-chart.html contains the interactive plotly radar charts (AKA spider plots) of combine event percentiles by position.

  • player-stats.png is the static ggplot2 generated receiving EPA heatmap.

  • ptsDT.html is the interactive data table for points for/points against.

  • timeseries-epa-vis.html is the linked Altair view of cumulative offensive EPA by offensive plays.

  • win-total-plot.html is the interactive plotly line chart showcasing NFL win totals by division since 2003.

  • win-totals-playoffs.html showcases playoff outcomes given a regular season win total (e.g., what % of 14-win regular season teams made the playoffs/won the Super Bowl).
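
The quantity shown in win-totals-playoffs.html is a conditional percentage: of all team-seasons with a given win total, what share reached each outcome. The sketch below illustrates that calculation in plain Python under stated assumptions; the `team_seasons` records, their field layout, and the `outcome_rates` helper are hypothetical and the values are invented, so this is not the project's implementation.

```python
from collections import defaultdict

# Hypothetical team-seasons:
# (regular-season wins, made_playoffs, won_super_bowl). Values are made up.
team_seasons = [
    (14, True, True),
    (14, True, False),
    (10, True, False),
    (10, False, False),
    (10, False, False),
    (10, True, False),
]


def outcome_rates(team_seasons):
    """Return {wins: {"playoff_pct": ..., "title_pct": ...}} as percentages."""
    counts = defaultdict(lambda: [0, 0, 0])  # seasons, playoff berths, titles
    for wins, made_playoffs, won_title in team_seasons:
        counts[wins][0] += 1
        counts[wins][1] += made_playoffs  # True counts as 1, False as 0
        counts[wins][2] += won_title
    return {
        wins: {"playoff_pct": 100 * p / n, "title_pct": 100 * t / n}
        for wins, (n, p, t) in counts.items()
    }


rates = outcome_rates(team_seasons)
```

Grouping by win total first and dividing by the group size keeps each percentage conditional on that win total rather than on the whole data set.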