A Visual Journey through the Women’s FIFA World Cup

Authors: Amelia Baier, Sonali Dabhi, Mia Mayerhofer, Tereza Martinkova
Institution: Georgetown University
Check out our GitHub!

Introduction

The 2019 Women’s FIFA World Cup in France saw unprecedented interest in the tournament, with viewership, attendance, and digital engagement reaching record heights across the globe. Four years later, the Women’s FIFA World Cup will take place this July in New Zealand and Australia. Despite this progress, significant inequality persists between men’s and women’s football. During our data gathering process, we found it far harder to locate concise information and digestible visualizations on women’s football than on men’s football. We therefore highlight this disparity in our first set of visualizations, which cover topics including popular phrases over time, historical differences in spending between World Cups, and more.

As we approach the Women’s World Cup, we are especially motivated to gain a deeper understanding of the characteristics of women’s football. With our second set of visualizations, we hope to take you on a visual journey into the past and present of women’s football. This set focuses solely on women’s FIFA and delves into winning trends over time, on-field game statistics, player attributes, and more. By examining these visualizations, we aim to gain insight into what is in store for the 2023 Women’s World Cup.

The FIFA World Cup: A Uniting Global Event

The FIFA World Cup is widely regarded as one of the largest sporting events in the world and receives extensive media attention. What is the public discussing regarding FIFA and the World Cup?

A Game of Inequality: Highlighting Gender Disparity in FIFA

Even today, inequality between men’s and women’s football persists, especially in funding, media coverage, salaries, and infrastructure. With the visualizations in this section, we hope not only to explore this disparity but also to emphasize it.

Financial Spending: How Men’s and Women’s FIFA Differ

Qatar made headlines for the exorbitant amount it spent on the 2022 World Cup. However, it is common for countries to spend heavily when selected to host the Men’s FIFA World Cup; with the influx of tourism and other factors, hosting the World Cup has its economic advantages. The plot below maps the relationship between the amount a host country spent and the total, average, and highest attendance at its World Cup.

Code
import pandas as pd
import altair as alt
import warnings
warnings.simplefilter('ignore')
df = pd.read_csv('../data/finan_data_long.csv')
df2 = pd.read_csv('../data/finan_data.csv', index_col=0)
df2['country_year'] = df2['country'] + ',' + df2['year'].astype(str)
df['country_year'] = df['country'] + ',' + df['year'].astype(str)
df3 = df2[df2['country'] != 'Qatar']
selection = alt.selection_single(fields=['country_year'],name='Random')
#color selection for bar plot
list_country = ['Qatar,2022','France,2019','Russia,2018','Canada,2015','Brazil,2014','Germany ,2011','South Africa,2010','China,2007', 'Germany ,2006','USA,2003','S. Korea/Japan,2002', 'USA,1999', 'France,1998','USA,1996']
list_country.reverse()
color = alt.condition(selection,
                      alt.value('#009643'),
                      alt.value('lightgray'), 
                      )
#bar plot for the amount spent (including Qatar)
bar1 = (alt.Chart(df2).mark_bar(size = 20)
       .encode(
        y=alt.Y('amount:Q'),
        x=alt.X('country_year:N', sort = list_country),
        color = color
        ).add_selection(selection)).properties(width=400,height=565, title = {
                                    'text' : ["Spending By Each Country"],
                                    "fontSize": 18,
                                    'subtitle': ["",""], })
#bar customization 
bar1.encoding.x.title = 'Year and Country FIFA Was Held'
bar1.encoding.y.title = 'Money Spent (Billions USD)'

#the second bar plot not including Qatar: 
bar2 = (alt.Chart(df3).mark_bar(size = 20)
       .encode(
        y=alt.Y('amount:Q'),
        x=alt.X('country_year:N', sort = list_country),
        color = color
        ).add_selection(selection)).properties(width=400,height=565, title = {
      "text": ["Spending By Each Country"] ,
      "fontSize": 18,
      "subtitle": ["(Not Including Qatar)","*units different than previous plot"], 
    })

#bar customization 
bar2.encoding.x.title = 'Year and Country FIFA Was Held'
bar2.encoding.y.title = 'Money Spent (Billions)*'

#color selection 2 for the bubble plot
color2 = alt.condition(selection,
                      alt.Color('Gender:N', scale = alt.Scale(range = ['#0099FF', '#FD5109'])),
                      alt.value('lightgray'))

#creating the selection box
category_select = alt.selection_single(fields=['type'], bind=alt.binding_select(options=df['type'].unique()))

#creating the circles: 
plot = alt.Chart(df).mark_circle().encode(
    x=alt.X('x:Q', axis = None),
    y=alt.Y('y:Q', axis = None),
    size=alt.Size('area:Q', scale=alt.Scale(domain=[0, 0.8], range=[0, 130000]), legend=None),
    color=color2,
    tooltip=['country:N', 'year:N', 'attendance:Q'], 
).transform_filter(
     category_select
).properties(
    width=750,
    height=565, 
    title = {
    "text": ["","Attendance for Each Country", ""],
    "fontSize": 16
    }
)

#creating the text so the bubbles are labeled
text = alt.Chart(df).mark_text(align='center', baseline='middle', color = '#F1F0DA', fontSize= 14).encode(
    x=alt.X('x:Q', axis = None),
    y=alt.Y('y:Q', axis = None),
    text='country:N'
).transform_filter(
   category_select
)

source = alt.Chart().mark_text(align='center', baseline='middle', fontSize= 14).encode().properties(
    title = {
    'text': ["", "", "", "", "Data Source: Wikipedia [4] and FIFA Financial Reports [6]"],
    'fontWeight': 'normal',
    'fontSize': 12}
)

bar_comb = alt.concat(bar1, bar2)

# Create chart1 without configuration settings so the linked color selection still works
# chart1 layers the bubbles and their text labels
chart1 = alt.layer(plot, text)

# The final chart: all configuration settings are added here
chart = alt.VConcatChart(vconcat=[bar_comb, chart1, source], 
                         title=alt.TitleParams(text=['Country Spending Compared to Game Attendance', ""], anchor='middle', fontSize=24),
                         background = '#F1F0DA',
                         center = True,
                         spacing = 10,
                         padding = {"left": 40, "top": 30, "right": 40, "bottom": 60},
                         config={
                             'view': {
                                 'stroke': '#F1F0DA',
                                 'fill': '#F1F0DA'
                             },
                             'axis': {
                                 'grid': False, 
                                 'labelFontSize': 14,
                                 'titleFontSize': 15,
                                 'labelAngle' : -45

                             },
                             'legend': {
                                'titleFontSize' :16,
                                'labelFontSize': 12,
                                'orient': 'bottom'
                             }
                         })
# Add the dropdown selection to the final chart so it filters both the bubbles and their labels
chart.add_selection(category_select)

Figure 2: A Linked View on FIFA Financial Spending Based on Attendance

The ‘total’ option shows the combined attendance across all stadium events in the tournament, ‘average’ shows the mean attendance per event (total attendance divided by the number of events), and ‘highest’ shows the largest attendance at a single stadium event during the tournament. Total attendance is higher for the Men’s World Cups, but when looking at average and highest attendance, Men’s and Women’s FIFA are much closer. However, when the amount spent is compared with average and highest attendance, there is a massive discrepancy between the genders. One might attribute this to each host country simply having its own fixed budget regardless of gender, but France, which hosted both, spent $2 billion preparing for the 1998 Men’s World Cup and less than one-fourth of that for the 2019 Women’s World Cup. The goal of this visualization is to let the viewer see the gender disparity between the two World Cups. To see exact attendance figures, hover over the circles.
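The three dropdown options differ only in how per-match attendance is aggregated. The minimal pandas sketch below uses made-up attendance figures for two hypothetical tournaments (not the data behind Figure 2) to show how a tournament with more matches can lead clearly in total attendance while its average and highest figures stay closer:

```python
import pandas as pd

# Hypothetical per-match attendance (NOT the project's data), for illustration only
matches = pd.DataFrame({
    "tournament": ["Tournament A"] * 4 + ["Tournament B"] * 3,
    "attendance": [80000, 60000, 40000, 60000, 50000, 40000, 30000],
})

# Named aggregation: one row per tournament, one column per dropdown option
summary = matches.groupby("tournament")["attendance"].agg(
    total="sum",     # combined attendance over every match
    average="mean",  # total divided by the number of matches
    highest="max",   # largest crowd at a single match
)
print(summary)
```

Here Tournament A has twice the total attendance of Tournament B, partly because it has more matches, while its average is only 1.5 times higher.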

The Pay Gap: Comparing Men’s and Women’s Player Salaries

One of the first things that comes to mind when people think about gender disparity, not only in football but in most other sports as well, is the pay gap. In the visualization below, we hope to shed light on how drastic the salary difference was between the best-paid men’s and women’s football players over the last year. The table used for the plot is also included below.

Code
# Read in packages
import pandas as pd
import altair as alt
import numpy as np
import warnings
warnings.simplefilter('ignore')
warnings.simplefilter(action='ignore', category=FutureWarning)

# Get the data
salariestable = pd.read_excel("../data/salary_comparison.xlsx")
salariestable.columns = [x.title() for x in list(salariestable.columns)]
# Convert all columns to strings (used for text later)
salariestable = salariestable.astype(str)
salariestable.index += 1  # number rows from 1 instead of 0
# Render the table as Markdown with readable column headers
from IPython.display import Markdown
from tabulate import tabulate
Markdown(tabulate(
  salariestable, 
  headers = ["Player Name", "Salary", "Year", "Gender", "Country"]
))
Table 1: Salaries (USD) of the Top Nine Male and Female Football Players, 2022-2023

| | Player Name | Salary | Year | Gender | Country |
|---|---|---|---|---|---|
| 1 | Cristiano Ronaldo | 200000000 | 2022 | Male | Portugal |
| 2 | Kylian Mbappé | 110000000 | 2022 | Male | France |
| 3 | Lionel Messi | 65000000 | 2022 | Male | Argentina |
| 4 | Neymar | 55000000 | 2022 | Male | Brazil |
| 5 | Mohamed Salah | 35000000 | 2022 | Male | Egypt |
| 6 | Erling Haaland | 35000000 | 2022 | Male | Norway |
| 7 | Robert Lewandowski | 27000000 | 2022 | Male | Poland |
| 8 | Eden Hazard | 27000000 | 2022 | Male | Belgium |
| 9 | Andres Iniesta | 25000000 | 2022 | Male | Spain |
| 10 | Sam Kerr | 513000 | 2023 | Female | Australia |
| 11 | Alex Morgan | 450000 | 2023 | Female | United States |
| 12 | Megan Rapinoe | 447000 | 2023 | Female | United States |
| 13 | Julie Ertz | 430000 | 2023 | Female | United States |
| 14 | Ada Hegerberg | 425000 | 2023 | Female | Norway |
| 15 | Marta Vieira | 400000 | 2023 | Female | Brazil |
| 16 | Amandine Henry | 394000 | 2023 | Female | France |
| 17 | Wendie Renard | 392000 | 2023 | Female | France |
| 18 | Christine Sinclair | 380000 | 2023 | Female | Canada |
Code
# Load packages
import pandas as pd
import altair as alt
import numpy as np
import warnings
warnings.simplefilter('ignore')
warnings.simplefilter(action='ignore', category=FutureWarning)

# Get the data
salaries = pd.read_excel("../data/salary_comparison.xlsx")
salaries.columns = [x.title() for x in list(salaries.columns)]
# Use .copy() so the subsets can be modified without pandas' SettingWithCopyWarning
female_salaries = salaries[salaries["Gender"] == "Female"].copy()
male_salaries = salaries[salaries["Gender"] == "Male"].copy()

# Prepare the data
female_salaries.reset_index(drop = True, inplace = True)
male_salaries["Salary"] = male_salaries["Salary"].astype(int)
female_salaries["Salary"] = female_salaries["Salary"].astype(int)

# Make our custom color scheme
color_scheme = ["#009643", "#CB4349"]

# Add selection fields
selection = alt.selection_single(fields = ["Player"])

both = alt.Chart(salaries).mark_bar().encode(
    x = alt.X("Salary:Q",
            title = "Salary (USD)",
            sort = "descending",
            scale = alt.Scale(domain = (200500000, 0))),
    y = alt.Y("Player:N", axis = alt.Axis(title = "Player Name", orient = "left"), sort = alt.Sort(field = "Salary", order = "descending")),
    color = alt.Color("Salary:Q", scale = alt.Scale(range = color_scheme))
).properties(
    title = ["Pay Gap Between the Highest Paid Men's and Women's Football Players"], 
    width = 900, 
    height = 225
)

text1 = both.mark_text(
    align = "left",
    baseline = "middle",
    dx = 1
).encode(
    text = "Country:N"
)

both_final = (both + text1)

# Add caption for when salaries were collected
both_final = alt.concat(both_final).properties(title = alt.TitleParams(
        ["**Note: men's salaries are from December 2022 and women's salaries are from March 2023**", " "],
        baseline = "top",
        orient = "top",
        anchor = "end",
        fontWeight = "normal",
        fontSize = 11
    ))


# Men's chart
men_chart = alt.Chart(male_salaries).mark_bar().encode(
    x = alt.X("Salary:Q",
            title = "Salary (USD)",
            sort = "descending",
            scale = alt.Scale(domain = (0, 230000000))),
    y = alt.Y("Player:N", axis = alt.Axis(title = "Player"), sort = alt.Sort(field = "Salary", order = "descending")),
    color = alt.Color("Salary:Q", scale = alt.Scale(range = color_scheme))
).properties(
    title = ["Highest Paid Male Football Players"], 
    width = 390, 
    height = 225
)

textm = men_chart.mark_text(
    align = "right",
    baseline = "middle",
    dx = -3
).encode(
    text = "Country:N"
)

men_chart_final = (men_chart + textm)

women_chart = alt.Chart(female_salaries).mark_bar().encode(
    x = alt.X("Salary:Q",
            title = "Salary (USD)",
            sort = "descending",
            scale = alt.Scale(domain = (600000, 0))),
    y = alt.Y("Player:N", axis = alt.Axis(title = "Player Name", orient = "right"), sort = alt.Sort(field = "Salary", order = "descending")),
    color = alt.Color("Salary:Q", scale = alt.Scale(range = color_scheme))
).properties(
    title = ["Highest Paid Female Football Players (ZOOMED IN)"], 
    width = 390, 
    height = 225
)

textw = women_chart.mark_text(
    align = "left",
    baseline = "middle",
    dx = 3
).encode(
    text = "Country:N"
)

women_chart_final = (women_chart + textw)

# Add caption with data source
women_chart_final = alt.concat(women_chart_final).properties(title = alt.TitleParams(
        [" ", " ", "Data Source: Statista [9] and AS USA [10]"],
        baseline = "bottom",
        orient = "bottom",
        anchor = "end",
        fontWeight = "normal",
        fontSize = 11
    ))

bottom = alt.hconcat(men_chart_final , women_chart_final, spacing = 0)

final = alt.vconcat(both_final, bottom)
final.configure(background = "#F1F0DA").configure_title(fontSize = 15) 

Figure 3: Comparison of Top Men’s and Women’s Player Salaries in the Last Year

From the two plots on top, it is clear that women’s salaries are drastically lower than men’s. When the two are plotted on the same scale, the women’s bars are not even visible because their salaries are so low in comparison. To put this into perspective, the highest-paid female player, Sam Kerr, earned roughly 49 times less over the last year than the ninth highest-paid male player, Andres Iniesta. While the pay gap has received a lot of attention in recent years, unfortunately much more needs to be done to close it. One other important note is how difficult it was to find recent data on female players’ salaries. It is crucial to give more attention to this disparity in order to facilitate meaningful change.
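The comparison between Kerr and the top men’s salaries is simple arithmetic on the figures in Table 1. The short sketch below recomputes it; the variable names are ours, and the salaries are copied from the table (men’s figures from December 2022, women’s from March 2023):

```python
# Salaries in USD, copied from Table 1
kerr = 513_000          # highest-paid female player (2023)
iniesta = 25_000_000    # ninth highest-paid male player (2022)
ronaldo = 200_000_000   # highest-paid male player (2022)

# How many times larger each male salary is than Kerr's
ratio_ninth = iniesta / kerr
ratio_top = ronaldo / kerr

print(f"Iniesta earned {ratio_ninth:.1f} times Kerr's salary")  # Iniesta earned 48.7 times Kerr's salary
print(f"Ronaldo earned {ratio_top:.1f} times Kerr's salary")    # Ronaldo earned 389.9 times Kerr's salary
```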

Infrastructure: Exploring Match Locations for Women’s Tournaments

Many competitions lead up to the FIFA Women’s World Cup, and there is a noticeable trend in where their matches are held. Besides the World Cup itself, which is considered the main event in women’s football, we look at three major competitions:

  • FA Women’s Super League
  • National Women’s Soccer League (NWSL)
  • UEFA Women’s Euro

First, we want to see which countries and specific stadiums are preferred for each competition. This serves as the basis for the next step: identifying which countries have the most highly ranked players.

For this visualization, we look at the period between 2018 and 2022. The FA Women’s Super League has the most data, with stadium locations for 2018, 2019, 2020, and 2021. As its name suggests, the National Women’s Soccer League plays its matches in stadiums in the United States. The FIFA Women’s World Cup location changes the most, as it depends on the host country. For example, the 2019 tournament was hosted in France, with its main stadium being the Parc Olympique Lyonnais, also called the Groupama Stadium, in Décines-Charpieu.

Code
#IMPORTING LIBRARIES
import plotly.graph_objects as go
import plotly.io as pio
import numpy as np
import pandas as pd
pio.renderers.default = "plotly_mimetype+notebook_connected"

# Importing data
df = pd.read_csv("../data/clean_matches.csv")

# CHANGE DATA TYPE OF YEAR TO STRING SO IT CAN BE ITERABLE
df["year"] = df["year"].astype("str")

# Each trace: (competition name in the data, year filter or None, marker color, legend name)
trace_specs = [
    ("FA Womens Super League", "2018", "#FCC92B", "FA Women's Super League 2018"),
    ("FA Womens Super League", "2019", "#009643", "FA Women's Super League 2019"),
    ("FA Womens Super League", "2020", "#009AFE", "FA Women's Super League 2020"),
    ("FA Womens Super League", "2021", "#FF818D", "FA Women's Super League 2021"),
    ("NWSL", None, "#FF5108", "National Women's Soccer League 2018"),
    ("UEFA Womens Euro", None, "#8538B1", "UEFA Women's Euro 2022"),
    ("Womens World Cup", None, "#960505", "Women's World Cup 2019"),
]

# Build one Scattergeo trace per subset, in the same order as the dropdown's visibility lists
traces = []
for competition, year, color, name in trace_specs:
    subset = df[df["competition_name"] == competition]
    if year is not None:
        subset = subset[subset["year"] == year]
    traces.append(go.Scattergeo(
        lat=subset["lat"],
        lon=subset["long"],
        text=subset["text"],
        mode="markers",
        # Size each marker by this subset's own match counts so sizes line up with its stadiums
        marker=dict(size=subset["freq"], color=color, opacity=0.8, symbol="circle"),
        name=name,
        visible=True))

# INITIALIZE GRAPH OBJECT
fig = go.Figure(data=traces)
        
# VARIABLES FOR BUTTON LOCATION
button_height = 0.15
x1_loc = 0.00
y1_loc = 1

# ADD ANNOTATION
fig.add_annotation(
    x=0,
    y=0,
    text='Data Source: StatsBomb [3], Wikipedia [4]',
    showarrow=False,
    visible=True,
)


#DROPDOWN MENUS
fig.update_layout(
    title = 'Location of Stadiums by Tournament and Number of Matches Played', 
    plot_bgcolor='#F1F0DA',
    paper_bgcolor='#F1F0DA',
    #annotations=annotations,
    geo=dict(
        lonaxis=dict(
            range=[-130, 20],  
        ),
        lataxis=dict(
            range=[20,65], 
        ),
        showocean=True,
        oceancolor='#F1F0DA'
    ),
    updatemenus=[
        dict(
            buttons=[
                dict(
                    label="FA Women's Super League",           
                    method="update",                
                     args=[{"visible": [True, True, True, True, False, False, False]}
                    ]
                     ),
                dict(
                    label="National Women's Soccer League 2018",              
                    method="update",          
                     args=[{"visible": [False, False, False, False, True, False, False]}
                         ]
                     ),
                dict(
                    label="UEFA Women's Euro 2022",     
                    method="update",           
                     args=[{"visible": [False, False, False, False, False, True, False]}
                         ]
                     ),    
                dict(
                    label="Women's World Cup 2019",               
                    method="update",          
                     args=[{"visible": [False, False, False, False, False, False, True]}
                        ]
                     )           
            ],
            direction="down",
            showactive=True,  
            pad={"r": 10, "t": 10},  
            x=x1_loc, 
            y=y1_loc,
            xanchor="left",  
            yanchor="top"
        )
    ],
            width = 900
)

# SHOW FIGURE
fig.show()

Figure 4: Stadium Location for Different Women’s Tournaments

Overall, the stadiums for these tournaments are concentrated in Europe. The one obvious exception, as mentioned earlier, is the National Women’s Soccer League, which plays its matches exclusively in the United States. Most of its stadiums are on the East Coast, which is also where the most matches were played, while a few stadiums are on the West Coast and very few are in the Midwest and South.

For the FA Women’s Super League, different stadiums were selected in different years; however, all of them are located in the United Kingdom, specifically England. Marker size represents the number of matches played at each stadium, counted separately for each tournament. If we were to compare the FA Women’s Super League with the National Women’s Soccer League, the marker sizes would not be comparable, since the most matches played at any National Women’s Soccer League stadium is 5, while for the FA Women’s Super League it is over 30. We therefore added hover text so the user can see exactly how many matches were played at any stadium. For the FA Women’s Super League, we can see England’s four major football cities with major clubs: London, Manchester, Liverpool, and Birmingham, with Manchester and Liverpool close enough to overlap on the map. We can also see some stadiums west of London, which we would expect to be in other major English cities such as Southampton, Bath, or Bournemouth. Hovering over these stadiums, however, shows that many are in smaller, less well-known towns, which is also true for some of the UEFA Women’s Euro stadiums.
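The `freq` column behind the marker sizes is a count of matches per stadium within each competition. A minimal pandas sketch of how such a column could be derived, plus one possible way to rescale sizes within each competition, is shown below. The match rows, stadium names, and `max_size` value are hypothetical, and this is not necessarily how `clean_matches.csv` was produced:

```python
import pandas as pd

# Hypothetical match-level rows, one row per match (NOT the project's data)
matches = pd.DataFrame({
    "competition_name": ["NWSL"] * 3 + ["FA Womens Super League"] * 4,
    "stadium": ["Stadium A", "Stadium A", "Stadium B",
                "Stadium C", "Stadium C", "Stadium C", "Stadium D"],
})

# Number of matches played at each stadium, counted within each competition
freq = (matches.groupby(["competition_name", "stadium"])
               .size()
               .reset_index(name="freq"))

# Optional rescaling (an assumption, not the original design): the busiest stadium
# of every competition gets the same marker size, so small counts stay visible
max_size = 30  # hypothetical largest marker size in pixels
busiest = freq.groupby("competition_name")["freq"].transform("max")
freq["marker_size"] = freq["freq"] / busiest * max_size
print(freq)
```

The trade-off of rescaling is that marker sizes then show relative rather than absolute match counts, so hover text is still needed for the exact numbers.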

As with the FA Women’s Super League, the UEFA Women’s Euro 2022 stadiums are highly concentrated in England, which hosted the tournament. We again see stadiums in the London and Manchester/Liverpool areas; however, hovering over the individual stadiums shows that they are usually located in a smaller city near one of the major cities.

Finally, the host country of the FIFA Women’s World Cup changes with each tournament, and in 2019, as mentioned before, France hosted it. The stadiums are therefore spread fairly evenly across France. On the map, we can identify stadiums around major French cities, such as Paris in the north, Lyon in the east, and Nice on the south coast. Further west along the coast, we can also see Montpellier, which has one of France’s bigger stadiums. One interesting fact about this tournament is that although the Groupama Stadium in Décines-Charpieu is considered its main stadium, only three matches were played there, compared with seven at the Parc des Princes in Paris. Hovering over the stadiums, we see once again that many of the locations are near major cities, but only a few are actually in the city.

In conclusion, many of the stadiums are located close to large cities, but they are not the large stadiums that would likely host the equivalent men’s tournaments. Funding for sporting events remains unequal between men and women, and women’s professional sports are known to be underfunded, which may be one reason for the choice of smaller, less prestigious stadiums. However, this hypothesis requires further analysis.

The Rise of Women’s FIFA: Celebrating the Players and their Skill

Now that we have looked more generally into the FIFA World Cup and highlighted the persistent inequality between men’s and women’s football, we would like to focus solely on these women and their teams leading up to the 2023 Women’s World Cup. These visualizations will showcase the performance of the teams and players during past World Cup games, while specifically focusing on the key players set to participate in the upcoming tournament.

Geographical Dynamics: Key Players for the 2023 Women’s World Cup

As previously discussed, the Women’s World Cup is a large-scale and highly anticipated event this year. Many players are currently at the top of their game, and we would like to showcase them. In the summer of 2022, ESPN published a list of the top 50 players to watch in this World Cup [2]. The table below reproduces the list in the order in which ESPN ranked the players. The choropleth globe below is based on this list and shows which countries these key players will be representing this July.

Code
import warnings
warnings.simplefilter('ignore')
warnings.simplefilter(action='ignore', category=FutureWarning)

# Read in packages
import pandas as pd
import numpy as np
import plotly
import requests
import plotly.graph_objects as go
from plotly.offline import plot
import io
# Read in data
top50table = pd.read_csv("../data/top50_women_espn.csv")
# Convert all columns to strings (used for text later)
top50table = top50table.astype(str)
top50table.index += 1  # number rows from 1 instead of 0
# Render the table as Markdown with readable column headers
from IPython.display import Markdown
from tabulate import tabulate
Markdown(tabulate(
  top50table, 
  headers=["Name", "Country", "Club", "Age", "Position", "Rank"]
))
Table 2: ESPN’s List of the Top 50 Upcoming Players

| | Name | Country | Club | Age | Position | Rank |
|---|---|---|---|---|---|---|
| 1 | Alexia Putellas | Spain | Barcelona | 28 | Midfielder | 22 |
| 2 | Sam Kerr | Australia | Chelsea | 28 | Forward | 2 |
| 3 | Vivianne Miedema | Netherlands | Arsenal | 25 | Forward | 3 |
| 4 | Caroline Graham Hansen | Norway | Barcelona | 27 | Midfielder | 9 |
| 5 | Pernille Harder | Denmark | Chelsea | 29 | Forward | 4 |
| 6 | Catarina Macario | United States | Lyon | 22 | Midfielder | Not ranked |
| 7 | Marie-Antoinette Katoto | France | Paris Saint-Germain | 23 | Forward | 19 |
| 8 | Jennifer Hermoso | Spain | Pachuca | 32 | Forward | 17 |
| 9 | Aitana Bonmati | Spain | Barcelona | 24 | Midfielder | Not ranked |
| 10 | Ada Hegerberg | Norway | Lyon | 26 | Forward | Not ranked |
| 11 | Wendie Renard | France | Lyon | 31 | Defender | 11 |
| 12 | Christiane Endler | Chile | Lyon | 30 | Goalkeeper | 30 |
| 13 | Magdalena Eriksson | Sweden | Chelsea | 28 | Defender | 34 |
| 14 | Fran Kirby | England | Chelsea | 29 | Forward | 12 |
| 15 | Lieke Martens | Netherlands | Paris Saint-Germain | 29 | Forward | 28 |
| 16 | Lauren Hemp | England | Manchester City | 21 | Forward | Not ranked |
| 17 | Mapi Leon | Spain | Barcelona | 27 | Defender | Not ranked |
| 18 | Irene Paredes | Spain | Barcelona | 30 | Defender | Not ranked |
| 19 | Rose Lavelle | United States | OL Reign | 27 | Midfielder | 15 |
| 20 | Beth Mead | England | Arsenal | 27 | Forward | Not ranked |
| 21 | Debinha | Brazil | North Carolina Courage | 30 | Forward | 10 |
| 22 | Lindsey Horan | United States | Lyon | 28 | Midfielder | 23 |
| 23 | Stina Blackstenius | Sweden | Arsenal | 26 | Forward | Not ranked |
| 24 | Patri Guijarro | Spain | Barcelona | 24 | Midfielder | Not ranked |
| 25 | Ji So-Yun | South Korea | Suwon FC | 31 | Midfielder | 18 |
| 26 | Kadeisha Buchanan | Canada | Chelsea | 26 | Defender | 33 |
| 27 | Ellie Carpenter | Australia | Lyon | 22 | Defender | Not ranked |
| 28 | Ashley Lawrence | Canada | Paris Saint-Germain | 27 | Defender | Not ranked |
| 29 | Amandine Henry | France | Lyon | 32 | Midfielder | 16 |
| 30 | Kim Little | Scotland | Arsenal | 31 | Midfielder | 39 |
| 31 | Lucy Bronze | England | Barcelona | 30 | Defender | 5 |
| 32 | Fridolina Rolfo | Sweden | Barcelona | 28 | Defender | Not ranked |
| 33 | Trinity Rodman | United States | Washington Spirit | 20 | Forward | Not ranked |
| 34 | Jessie Fleming | Canada | Chelsea | 24 | Midfielder | Not ranked |
| 35 | Kadidiatou Diani | France | Paris Saint-Germain | 27 | Forward | 36 |
| 36 | Alex Morgan | United States | San Diego Wave | 32 | Forward | 38 |
| 37 | Sam Mewis | United States | Kansas City Current | 29 | Midfielder | 1 |
| 38 | Millie Bright | England | Chelsea | 28 | Defender | Not ranked |
| 39 | Sara Dabritz | Germany | Lyon | 27 | Midfielder | Not ranked |
| 40 | Barbara Bonansea | Italy | Juventus | 31 | Forward | Not ranked |
| 41 | Delphine Cascarino | France | Lyon | 25 | Midfielder | 21 |
| 42 | Caroline Weir | Scotland | Free agent | 26 | Midfielder | 25 |
| 43 | Asisat Oshoala | Nigeria | Barcelona | 27 | Forward | 27 |
| 44 | Jess Fishlock | Wales | OL Reign | 35 | Midfielder | Not ranked |
| 45 | Tabea Wassmuth | Germany | Wolfsburg | 25 | Forward | Not ranked |
| 46 | Lea Schuller | Germany | Bayern Munich | 24 | Forward | Not ranked |
| 47 | Leah Williamson | England | Arsenal | 25 | Defender | Not ranked |
| 48 | Caitlin Foord | Australia | Arsenal | 27 | Forward | Not ranked |
| 49 | Christine Sinclair | Canada | Portland Thorns | 39 | Forward | Not ranked |
| 50 | Jill Roord | Netherlands | Wolfsburg | 25 | Midfielder | Not ranked |
Code
# Read in data
top50 = pd.read_csv("../data/top50_women_espn.csv")
# Convert all columns to strings (used for text later)
top50 = top50.astype(str)
top50.columns = ["name", "country", "club", "age", "position", "rank"]
def is_uk(country):
    if country in ["Scotland", "England", "Wales"]: return "United Kingdom"
    else: return country
top50["country2"] = top50["country"].apply(lambda x: is_uk(x))

# Get summary stats per country
top50_2 = top50.copy()
# Make age column an integer
top50_2["age"] = top50_2["age"].astype(int)
# Make rank column numeric, replacing "Not ranked" with NaN so it is skipped by the mean
def replace_not_ranked(string):
    if string == "Not ranked": return np.nan
    else: return int(string)
top50_2["rank"] = top50_2["rank"].apply(replace_not_ranked)
# Select the two columns with a list (tuple-style selection is not supported in pandas 2.x)
stats = top50_2.groupby(["country2"])[["age", "rank"]].mean().reset_index()
stats["age"] = stats["age"].round(1)
stats["rank"] = stats["rank"].round(1).fillna("Not Ranked")
stats.columns = ["country", "mean_age", "mean_rank"]

# Getting the number of players for each country
country_counts = top50["country2"].value_counts().reset_index()
country_counts.columns = ["country", "count"]
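# Map country names to ISO 2- and 3-letter codes (go.Choropleth uses ISO-3 by default)
iso2 = {"United Kingdom": "GB", "Spain": "ES", "United States": "US", "France": "FR",
        "Canada": "CA", "Australia": "AU", "Netherlands": "NL", "Sweden": "SE",
        "Germany": "DE", "Norway": "NO", "Denmark": "DK", "Chile": "CL",
        "Brazil": "BR", "South Korea": "KR", "Italy": "IT", "Nigeria": "NG"}
iso3 = {"United Kingdom": "GBR", "Spain": "ESP", "United States": "USA", "France": "FRA",
        "Canada": "CAN", "Australia": "AUS", "Netherlands": "NLD", "Sweden": "SWE",
        "Germany": "DEU", "Norway": "NOR", "Denmark": "DNK", "Chile": "CHL",
        "Brazil": "BRA", "South Korea": "KOR", "Italy": "ITA", "Nigeria": "NGA"}
# Merge with count data frame and stats data frame, then look up codes by country name
merged_counts = pd.merge(country_counts, stats)
merged_counts["code"] = merged_counts["country"].map(iso2)
merged_counts["code2"] = merged_counts["country"].map(iso3)
merged_counts = merged_counts.rename(columns = {"count": "counts"})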

# Adding a text column with player information for each country
merged_counts["text"] = ""
# Loop through each country in the country count data
for i in range(len(merged_counts)):
    # Set the current country
    country = merged_counts.country[i]
    # Add summary stats for that country
    merged_counts.text[i] = "SUMMARY STATS: " + country.upper()+ "<br>Number in the Top 50: " + str(merged_counts.counts[i]) + "<br>Mean Player Age: " + str(merged_counts.mean_age[i]) + "<br>Mean Player Ranking: " + str(merged_counts.mean_rank[i])

# Make a custom color map
colors = ["#27297F", "#404E7E", "#405C7E", "#406C7E", "#40787E", "#407E78", "#407E6D", "#407E5A", "#437E40"]

# Create the choropleth trace
choropleth_trace = go.Choropleth(
    locations = merged_counts["code2"],
    z = merged_counts["counts"],
    text = merged_counts["text"],
    colorscale = colors,
    autocolorscale = False,
    reversescale = True,
    marker_line_color = "black",
    marker_line_width = 0.5,
    colorbar_title = "Number of Players <br>in the Top 50",
    hovertemplate = "<b>%{text}</b><br>",
    hoverinfo = "name",
)
# Create the figure
globe = go.Figure(data = [choropleth_trace])
globe.update_layout(
    plot_bgcolor = "#F1F0DA",
    paper_bgcolor = "#F1F0DA",
    geo = dict(
        projection_type = "orthographic",
        showland = True,
        landcolor = "#9FC5AA",
        oceancolor = "rgb(152, 190, 217)",
        showcountries = True,
        showlakes = False,
        showocean = True,
        countrycolor = "rgb(30, 56, 38)",
        lakecolor = "rgb(135, 206, 250)",
        bgcolor = "#F1F0DA"
    ),
    title = "Origin Countries of the Top 50 Players to <br>Watch Out for in the 2023 FIFA Women's World Cup",
    title_x = 0.5,
    width = 900,
    height = 700, 
    annotations=[
        dict(
            x = 0.5,
            y = -0.05,
            showarrow = False,
            text = '<br>Data Source: "ESPN FC Women\'s Rank: The 50 best footballers in the world today" [2]',
            xref = "paper",
            yref = "paper",
            font = dict(
                size = 12,
                color = "black"
            )
        )
    ]
)
# Show the figure
globe.show()

Figure 5: Locations of the Key Players Participating in the 2023 World Cup

This globe is a choropleth map showing where the key players to watch come from. Many of these top players come from the United Kingdom, which on this map combines England, Scotland, and Wales. Spain and the United States are also home to several of these key players. Hovering over a country shows summary statistics for the key players who call that country home. For example, the six American players in ESPN’s list have an average age of about 26.3, while the six Spanish players have an average age of about 27.5. It will be exciting to see how these key players perform in the upcoming World Cup, and the following visualizations may offer some clues.
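The hover text depends on per-country summary statistics and on ISO codes that the code above assigns by position. The sketch below is a hedged alternative, not the authors' method. It uses hypothetical toy rows and an assumed name-to-ISO-3 dictionary, and builds the same statistics with a pandas named aggregation. Each code is looked up by country name, so a reordering of `value_counts()` cannot shift codes onto the wrong country.

```python
import pandas as pd

# Hypothetical toy rows standing in for ESPN's top-50 list [2]; values are illustrative only.
players = pd.DataFrame({
    "country": ["England", "Scotland", "Spain", "Spain", "Netherlands"],
    "age": [25, 26, 27, 28, 25],
    "rank": ["12", "Not ranked", "5", "Not ranked", "Not ranked"],
})

# Group the home nations into one map region, as is_uk() does above.
UK_NATIONS = {"England", "Scotland", "Wales"}
players["region"] = players["country"].where(~players["country"].isin(UK_NATIONS), "United Kingdom")

# "Not ranked" is coerced to NaN, so mean() skips it.
players["rank"] = pd.to_numeric(players["rank"], errors="coerce")

# Assumed lookup from region name to ISO 3166-1 alpha-3 code (go.Choropleth's default location mode).
ISO3 = {"United Kingdom": "GBR", "Spain": "ESP", "Netherlands": "NLD"}

summary = (
    players.groupby("region")
    .agg(counts=("age", "count"), mean_age=("age", "mean"), mean_rank=("rank", "mean"))
    .round(1)
    .reset_index()
)
# Unmapped regions become NaN here instead of silently receiving another country's code.
summary["code2"] = summary["region"].map(ISO3)
print(summary)
```

Any country missing from the dictionary appears as NaN in `code2`, which makes the gap easy to spot before plotting.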

Player Attributes: Comparing Performance at the 2019 Women’s World Cup

We expect that a closer look at the most recent Women’s World Cup, held in 2019, will offer insight into what to expect this July, since most of the key players on ESPN FC’s rank list are set to participate. We took the subset of the 2019 World Cup data covering the women who made ESPN’s list. Of the 50 players on the list, 13 scored during the 2019 World Cup. The visualization below compares their performance metrics when scoring their goals.
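The subset only works if StatsBomb’s player names match ESPN’s. The code below standardizes one name (Alex Morgan) for this reason. A minimal sketch of the filtering step follows, using hypothetical shot rows and placeholder player names. Apart from the Morgan mapping, which comes from the code below, `NAME_FIXES` is an assumed, illustrative mapping. The sketch also lists ESPN names that never matched, as a cheap check for silent drops.

```python
import pandas as pd

# Hypothetical shot events standing in for the StatsBomb 2019 data [3]; values are illustrative.
shots = pd.DataFrame({
    "player.name": ["Alexandra Morgan Carrasco", "Player B", "Player C", "Alexandra Morgan Carrasco"],
    "minute": [11, 68, 61, 80],
    "second": [30, 0, 15, 45],
})
# Hypothetical excerpt of the ESPN names.
top50_names = ["Alex Morgan", "Player B", "Player D"]

# Map StatsBomb's full names to the names ESPN uses; only the Morgan entry comes from the code below.
NAME_FIXES = {"Alexandra Morgan Carrasco": "Alex Morgan"}

shots = shots.rename(columns={"player.name": "player_name"})
shots["player_name"] = shots["player_name"].replace(NAME_FIXES)
shots["timestamp_minute"] = (shots["minute"] + shots["second"] / 60).round(2)

subset = shots[shots["player_name"].isin(top50_names)].reset_index(drop=True)
print(subset["player_name"].nunique(), "top-50 players in the subset")

# ESPN names that never matched any shot: either the player did not score or the spelling differs.
unmatched = sorted(set(top50_names) - set(shots["player_name"]))
print(unmatched)
```

The unmatched list mixes players who did not score with spelling mismatches, so it still needs a manual look. It does, however, shrink the search from all 50 names.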

Code
# Load packages
import pandas as pd
import altair as alt
import numpy as np

# Get the data
stats = pd.read_csv("../data/wwc_2019_match_shots.csv")
top50 = pd.read_csv("../data/top50_women_espn.csv")

# Make our custom color scheme
color_scheme = ['#0099FF', '#009643', '#CB4349', '#FF818C', '#FCC92B', '#FD5109', '#CE6DD3','#FA8F38', '#8538B1', '#4983F8', '#A9DDD6', '#A2F17D', '#0C0582', '#960505']

# Make a new minute timestamp column
stats["timestamp_minute"] = stats["minute"] + stats["second"]/60
stats["timestamp_minute"] = stats["timestamp_minute"].apply(lambda x: round(x, 2))
# Rename player column
stats = stats.rename(columns = {"player.name": "player_name"})
# Standardize Alex Morgan name
stats["player_name"] = stats["player_name"].replace("Alexandra Morgan Carrasco", "Alex Morgan")
# Get the top 50 player names
top50_names = list(top50.Name)
# Filter match data for only the players in top 50
stats_top50 = stats[stats["player_name"].isin(top50_names)].reset_index(drop = True)

# Add selection fields
selection = alt.selection_single(fields = ["player_name"], name = "Random")

## UPPER LEFT BAR CHART - # goals per player
bar1 = (alt.Chart(stats_top50)
 .mark_bar()
 .encode(y = "count()",
         x = alt.X("player_name:N",
         sort = alt.EncodingSortField(field = "timestamp_minute", op = "count", order = "ascending"), axis = alt.Axis(labelAngle = 45)),
         color = alt.condition(selection, alt.Color("player_name:N", scale = alt.Scale(range = color_scheme), title = "Player Name"), alt.value("lightgray"))
).add_selection(selection
).properties(
    title = {"text": "Number of 2019 World Cup Goals by Player"}, 
    width = 450, 
    height = 225
))
bar1.encoding.x.title = "Player Name"
bar1.encoding.y.title = "Number of World Cup Goals"

# LOWER LEFT BAR CHART - # mean timestamp of player's goals
bar2_initial = (alt.Chart(stats_top50)
 .mark_bar()
 .encode(y = 'mean(timestamp_minute):Q',
         x = alt.X('player_name:N',
         sort = alt.EncodingSortField(field = "timestamp_minute", op = "mean", order = "ascending"), axis = alt.Axis(labelAngle = 45)),
         color = alt.condition(selection, alt.Color("player_name:N", scale = alt.Scale(range = color_scheme), title = "Player Name"), alt.value("lightgray"))
).add_selection(selection
).properties(
    title = {"text": "Average Time These Players Scored"}, 
    width = 450, 
    height = 225
))
bar2_initial.encoding.x.title = "Player Name"
bar2_initial.encoding.y.title = "Mean Time of Goal (min)"


# HALFTIME LINE AND OVERTIME HORIZONTAL LINES 
halftime_text_h = alt.Chart(pd.DataFrame({"y": [47], "text": ["Halftime"]})).mark_text(color = "#505050",  align = "left").encode(x = alt.value(5), y = "y", text = "text")
overtime_text_h = alt.Chart(pd.DataFrame({"y": [92], "text": ["Overtime"]})).mark_text(color = "#505050",  align = "left").encode(x = alt.value(5), y = "y", text = "text")
halftime_h = alt.Chart(pd.DataFrame({"y": [45]})).mark_rule(color = "#505050").encode(y = "y")
overtime_h = alt.Chart(pd.DataFrame({"y": [90]})).mark_rule(color = "#505050").encode(y = "y")

bar2 = alt.layer(bar2_initial, halftime_h, overtime_h, halftime_text_h, overtime_text_h)

# Add caption with data source
bar2 = alt.concat(bar2).properties(title = alt.TitleParams(
        [" ", " ", "Data Source: ESPN FC Women's Rank 2022 Article [2] and Statsbomb [3]"],
        baseline = "bottom",
        orient = "bottom",
        anchor = "end",
        fontWeight = "normal",
        fontSize = 11
    ))

# HALFTIME LINE AND OVERTIME VERTICAL LINES 
halftime_text_v = alt.Chart(pd.DataFrame({"x": [47], "text": ["Halftime"]})).mark_text(color = "#505050",  align = "left").encode(y = alt.value(5), x = "x", text = "text")
overtime_text_v = alt.Chart(pd.DataFrame({"x": [92], "text": ["Overtime"]})).mark_text(color = "#505050",  align = "left").encode(y = alt.value(5), x = "x", text = "text")
halftime_v = alt.Chart(pd.DataFrame({"x": [45]})).mark_rule(color = "#505050").encode(x = "x")
overtime_v = alt.Chart(pd.DataFrame({"x": [90]})).mark_rule(color = "#505050").encode(x = "x")
 

## SCATTER PLOT 1
scatter1_i = (alt.Chart(stats_top50)
 .mark_circle(size = 45)
 .encode(x = alt.X("timestamp_minute:Q"),
         y = "TimeInPoss:Q",
         color = alt.condition(selection, alt.Color("player_name:N", scale = alt.Scale(range = color_scheme), title = "Player Name"), alt.value("lightgray"))
  ).properties(
    title = {"text": "Possession Time Before Scoring by Player and Game Minute"}, 
    width = 450, 
    height = 167
))
scatter1_i.encoding.x.title = "Game Minute"
scatter1_i.encoding.y.title = "Time in Possession (ms)"

scatter1 = alt.layer(scatter1_i, halftime_v, overtime_v, halftime_text_v, overtime_text_v)


## SCATTER PLOT 2
scatter2_i = (alt.Chart(stats_top50)
 .mark_circle(size = 45)
 .encode(x = alt.X("timestamp_minute:Q"),
         y = "avevelocity:Q",
         color = alt.condition(selection, alt.Color("player_name:N", scale=alt.Scale(range = color_scheme)), alt.value("lightgray"))
).properties(
    title = {"text": "Average Ball Velocity When Scoring by Player and Game Minute"}, 
    width = 450, 
    height = 167
))
scatter2_i.encoding.x.title = "Game Minute"
scatter2_i.encoding.y.title = "Average Ball Velocity (m/s)"

scatter2 = alt.layer(scatter2_i, halftime_v, overtime_v, halftime_text_v, overtime_text_v)

## SCATTER PLOT 3
scatter3_i = (alt.Chart(stats_top50)
 .mark_circle(size = 45)
 .encode(x = alt.X("timestamp_minute:Q"),
         y = "DistToGoal:Q",
         color = alt.condition(selection, alt.Color("player_name:N", scale=alt.Scale(range = color_scheme)), alt.value("lightgray"))
).properties(
    title = {"text": "Distance to Goal When Scoring by Player and Game Minute"}, 
    width = 450, 
    height = 167
))
scatter3_i.encoding.x.title = "Game Minute"
scatter3_i.encoding.y.title = "Distance to Goal (m)"

scatter3 = alt.layer(scatter3_i, halftime_v, overtime_v, halftime_text_v, overtime_text_v)



chart1 = alt.vconcat(bar1 , bar2)
chart2 = alt.vconcat(scatter1 , scatter2, scatter3)
alt.hconcat(chart1, chart2, spacing = 5).configure(background = "#F1F0DA").configure_title(fontSize = 15) 

Figure 7: A Linked View of Key Player Performance Metrics at the 2019 World Cup

The first bar plot shows the total number of 2019 World Cup goals scored by each of these players. The three players with the most goals in 2019 were Vivianne Miedema, Alex Morgan, and Caroline Graham Hansen. The second bar plot shows the average game minute at which each player scored, so we can see which players tended to score earlier in games and which scored later. Jill Roord, for example, scored her goals toward the end of games.
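Both bar plots are per-player aggregations of the same goal table: a count for the upper plot and a mean game minute, sorted ascending, for the lower one. The sketch below reproduces that aggregation in pandas on hypothetical goal events with placeholder player names. The `late_scorer` flag, which compares the mean minute against the 45-minute halftime line, is an illustrative addition and does not appear in the figure.

```python
import pandas as pd

# Hypothetical goal events for three placeholder players; values are illustrative only.
goals = pd.DataFrame({
    "player_name": ["Player A", "Player A", "Player B", "Player C", "Player C", "Player C"],
    "timestamp_minute": [10.0, 50.0, 88.5, 20.0, 30.0, 70.0],
})

# One row per player: goal count (upper bar plot) and mean goal minute (lower bar plot).
per_player = (
    goals.groupby("player_name")["timestamp_minute"]
    .agg(goals="count", mean_minute="mean")
    .sort_values("mean_minute")  # ascending, like the EncodingSortField in the chart
)
# Flag players whose goals come after halftime on average.
per_player["late_scorer"] = per_player["mean_minute"] > 45
print(per_player)
```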

The three scatter plots on the right are linked to these bar plots. Clicking a player’s bar in either bar plot on the left highlights that player’s goals in the three scatter plots on the right, while all other points turn gray. Across a standard game’s time period (including overtime), the scatter plots show each player’s time in possession before the goal, the ball’s average velocity when scoring, and the player’s distance to goal when scoring. To reiterate, this data is based on the goals scored in the 2019 Women’s World Cup.

Field Stats: Shots & Outcomes at the 2019 Women’s World Cup Final

Below is a modified scatter plot, built with Plotly’s graph_objects API (not Plotly Express), in which we converted the coordinate plane into a football field. It gives the audience a bird’s-eye view of the actual shots taken during the 2019 Women’s FIFA World Cup Final. It also highlights how the US players’ offensive play, which produced more shots than the Netherlands team managed, contributed to their win.

Now that we have viewed some players’ overall game statistics, especially Alex Morgan’s dominant presence, let’s look at field statistics from the 2019 Women’s FIFA World Cup Final between the United States and the Netherlands. The plot below shows a soccer field with the locations of all shots taken in the match. The left side shows shots taken by players on the Netherlands team, and the right side shows shots taken by US players. Color indicates the outcome of each shot: “Blocked” is a shot stopped by a defender; “Goal” is a shot that officials ruled had crossed the goal line; “Off T” (off target) is a shot whose initial trajectory ended outside the posts; and “Saved” is a shot stopped by the opposing team’s goalkeeper. Hovering over a shot reveals more statistics, such as the player’s name, the time the ball was in her possession, the game minute, the body part used to take the shot, and her distance from the goal.
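To put the Netherlands shots on the left half, the code below mirrors them with `120 - x`. The small sketch that follows contrasts that reflection with a 180-degree rotation on hypothetical coordinates. It assumes the plot's 120 by 80 coordinate system and that both teams' shots are recorded while attacking toward x = 120, which is what the mirroring step implies. The rotation is an alternative design choice, not what the figure does.

```python
import pandas as pd

PITCH_LENGTH, PITCH_WIDTH = 120, 80  # axis ranges used by the field plot

# Hypothetical Netherlands shots, assumed to be recorded while attacking toward x = 120.
ned = pd.DataFrame({"location.x": [105.0, 96.0], "location.y": [45.0, 20.0]})

# Mirror, as in the figure code: flip x only, which moves each shot to the left half.
mirrored = ned.assign(**{"location.x": PITCH_LENGTH - ned["location.x"]})

# Alternative: rotate 180 degrees by also flipping y, which keeps each shot on the
# same wing relative to the shooter's direction of attack.
rotated = mirrored.assign(**{"location.y": PITCH_WIDTH - ned["location.y"]})

print(mirrored)
print(rotated)
```

Reflecting only x is enough to separate the two teams visually. Rotating as well preserves which flank a shot came from, which matters only if the plot is read for left-wing versus right-wing play.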

Code
import pandas as pd
import plotly.graph_objects as go
 
# read in the data
shots = pd.read_csv("../data/shots.csv")
net_shots = pd.read_csv("../data/net_shots.csv")

# subset the data (.copy() so the mirrored x-coordinates below are written to an independent frame)
cols = ['location.x','location.y','shot.outcome.name','player.name','shot.body_part.name','TimeInPoss','DistToGoal','minute']
us_shots = shots[cols].copy()
net_shots = net_shots[cols].copy()

# subtract the x-coordinates from the maximum x-coordinate value to get mirror image on field
net_shots['location.x'] = 120 - net_shots['location.x']

# join two dataframes
data_merge = pd.concat([us_shots,net_shots],axis=0) 

# round time in poss
data_merge['TimeInPoss'] = data_merge['TimeInPoss'].round()
data_merge['DistToGoal'] = data_merge['DistToGoal'].round()

#009643 (green) #FF818C (pink) #F1F0DA (cream) #0099FF (blue) #FCC92B (yellow) #FD5109 (orange) 

# create one trace per shot outcome (legend order follows this dict)
outcome_colors = {'Goal': '#FF818C', 'Off T': '#0099FF', 'Blocked': '#FCC92B', 'Saved': '#FD5109'}
hover = ('<br><br>Player: %{customdata[0]}<br>Time in Possession (ms): %{customdata[1]}'
         '<br>Distance to goal (m): %{customdata[2]}<br>Body part: %{text}<br>Game minute: %{customdata[3]}')

fig = go.Figure()
for outcome, color in outcome_colors.items():
    subset = data_merge[data_merge['shot.outcome.name'] == outcome]
    fig.add_trace(go.Scatter(
        x=subset['location.x'],
        y=subset['location.y'],
        name=outcome,
        mode='markers',
        marker_color=color,
        customdata=subset[['player.name', 'TimeInPoss', 'DistToGoal', 'minute']],
        text=subset['shot.body_part.name'],
        hovertemplate=hover,
    ))


# update traces
fig.update_traces(mode='markers', marker_size=12,marker_line_width=0.5)

# update layout
fig.update_layout(plot_bgcolor='#009643', 
                xaxis=dict(gridcolor='#009643',range=[0, 120],showticklabels=False), 
                height=600,
                width=800,
                paper_bgcolor='#F1F0DA',
                yaxis=dict(range=[0, 80], gridcolor='#009643',showticklabels=False),
                annotations=
                [dict(x=0.5,y=1.15,xref='paper',yref='paper',text='Shot Locations and Outcomes',showarrow=False,
                font=dict(family='Arial',size=18,color='black')),
                dict(x=0.5,y=1.08,xref='paper',yref='paper',text="Women's World Cup 2019 - USA vs Netherlands",showarrow=False,
                font=dict(family='Arial',size=14,color='black')),
                dict(x=0.1,y=-0.10,xref='paper',yref='paper',text='Netherlands Team',showarrow=False,
                font=dict(family='Arial',size=18,color='black')),
                dict(x=0.9,y=-0.10,xref='paper',yref='paper',text='United States Team',showarrow=False,
                font=dict(family='Arial',size=18,color='black')),
                dict(x=1.16,y=-0.18,xref='paper',yref='paper',text='Data source: StatsBomb [3]',showarrow=False,
                font=dict(family='Arial',size=10,color='black'))
                ])

# add shapes
fig.add_shape(type='line', x0=60, y0=0, x1=60, y1=80, line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=0, y0=0, x1=120, y1=80,
              line=dict(color='white', width=0.5))
fig.add_shape(type='rect',
              x0=0, y0=0, x1=60, y1=80,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=0, y0=18, x1=18, y1=62,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=102, y0=18, x1=120, y1=62,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=0, y0=30, x1=6, y1=50,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=114, y0=30, x1=120, y1=50,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=0, y0=36, x1=0.5, y1=44,
              line=dict(color='white', width=0.6))
fig.add_shape(type='rect',
              x0=120, y0=36, x1=119.5, y1=44,
              line=dict(color='white', width=0.6))

fig.add_shape(type='circle',
              x0=49, y0=29, x1=71, y1=50,
              line=dict(color='white', width=0.6)
             )

fig.show()

Figure 8: Overview of Shots and Outcomes at the 2019 Women’s World Cup Final

Overall, the plot clearly shows that US players took more shots than Netherlands players. No shot by the Netherlands resulted in a goal, while two shots by US players did. This matches the final score of 2-0 in favor of the United States.
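Checking the shot table against the scoreline comes down to counting shots and goals per team. Here is a minimal pandas sketch on a hypothetical shot table; the `team` column and its values are illustrative assumptions, since the figure code keeps the two teams in separate files.

```python
import pandas as pd

# Hypothetical shot table, one row per shot; outcomes and counts are illustrative only.
shots = pd.DataFrame({
    "team": ["United States"] * 4 + ["Netherlands"] * 2,
    "shot.outcome.name": ["Goal", "Goal", "Saved", "Off T", "Blocked", "Off T"],
})

# Shots per team, and how many of them were goals (which should reproduce the scoreline).
summary = shots.groupby("team")["shot.outcome.name"].agg(
    shots="count",
    goals=lambda outcomes: int((outcomes == "Goal").sum()),
)
print(summary)
```

In event data, own goals are typically not recorded as shots, so this check would undercount a team that benefited from one.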

An interesting observation is that most Netherlands players who took a shot had very little time on the ball compared to US players. On average, Netherlands players who took a shot had been in possession of the ball for only 9 ms, while US players had been in possession for 23 ms. Most notably, Alex Morgan had the longest possession of any shooter, on a shot that ended up outside the posts. Because Alex Morgan is such a prominent US player, her moves on the field are worth a closer look. In this game, she shot with her left foot every time, always in the second half. Although none of her shots resulted in a goal, Alex Morgan accounted for over a quarter of all shots taken during the final, a very high share that reflects how central she was to the US attack.
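The possession averages, the longest-possession shooter, and one player's share of all shots are simple aggregations over the shot table. The sketch below computes all three on hypothetical rows with placeholder players and an assumed `team` column. The values are illustrative and are not the match data.

```python
import pandas as pd

# Hypothetical shots with placeholder players; TimeInPoss values are illustrative, in the data's units.
shots = pd.DataFrame({
    "team": ["United States", "United States", "United States", "Netherlands", "Netherlands"],
    "player.name": ["Player X", "Player X", "Player Y", "Player Z", "Player W"],
    "TimeInPoss": [30.0, 20.0, 10.0, 8.0, 12.0],
})

# Average possession time before a shot, per team.
mean_poss = shots.groupby("team")["TimeInPoss"].mean()

# Shooter with the longest possession before a shot.
longest = shots.loc[shots["TimeInPoss"].idxmax(), "player.name"]

# A boolean mask's mean is the fraction of rows where it is True: here, one player's share of all shots.
share_x = (shots["player.name"] == "Player X").mean()

print(mean_poss)
print(longest)
print(f"Player X took {share_x:.0%} of all shots")
```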

With this plot in mind, we are excited to see what the 2023 World Cup final has in store. Which team will ultimately take the crown as the 2023 Women’s FIFA champions?

Conclusion

Summary of Plots

We began this analysis by looking into which terms are most associated with the FIFA World Cup and found that the main terms appear to relate more to men’s FIFA. We therefore explored the differences in funding between the men’s and women’s FIFA World Cups relative to attendance. Total attendance is higher for the men’s tournament, but the men’s and women’s tournaments are closer in average and highest attendance. Nevertheless, there exists a massive funding discrepancy between the genders. We then wanted to see whether this financial difference is reflected in the tournament over the years, so Figure 3 depicts individual teams’ development and potential for growth. With the rankings over the years in mind, we looked into how the financial discrepancies affect the venues of different tournaments. Across the four main women’s tournaments, many of the stadiums that host women’s matches are located close to large cities. This is likely due to a lack of funds for venues, since we know from Figure 2 that it is not due to low attendance.

Looking back at our first figure, the plot showed Mbappé to be one of the terms most associated with FIFA. According to Forbes, he is the highest-paid football player in the world [1]. We therefore decided to look into the key female players heading into this next World Cup. Figure 5 is an interactive globe showing where the key female players come from. Alex Morgan, who is ranked 38th and plays for the United States, has become very well known for her talent in recent years. Figure 7 explores more of her skills, including how many goals she scored in the 2019 World Cup and when in a game she typically scores.

Final Thoughts

The Women’s World Cup is the world’s largest women’s sporting tournament. Even so, concise information and digestible visualizations about it are far harder to find than for men’s football. This analysis therefore aims to catalyze investigation into the world of women’s football and to visually explore its essential aspects. These visualizations can be helpful tools for team managers, betting agencies, or football fans who want to explore previous trends and anticipate trends for the 2023 Women’s FIFA World Cup. In this analysis, we explored the strengths and weaknesses of different teams, players, and tournaments. We also explored the overall financial discrepancy between the men’s and women’s FIFA tournaments. We hope your main takeaway is how underrated, underfunded, and under-analyzed women’s football is. Over the next few years, we hope that coverage of and spending on women’s football will increase. FIFA does currently seem to be trying to increase coverage of the Women’s World Cup. In 2022, it published its first-ever comprehensive analysis of the elite women’s football landscape [7]. With trends showing the growth that women’s football has made over the past 20+ years, one can hope that the discrepancy between men’s and women’s football will one day disappear.

References

[1] Birnbaum, J. (2022, October 7). The World’s Highest-Paid Soccer Players 2022: Kylian Mbappé Claims No. 1 While Erling Haaland Debuts. Forbes. https://www.forbes.com/sites/justinbirnbaum/2022/10/07/the-worlds-highest-paid-soccer-players-2022-kylian-mbapp-claims-no-1-while-erling-haaland-debuts/?sh=360a1ee8629d

[2] ESPN. (2022, June 27). ESPN FC Women’s Rank: The 50 best footballers in the world today. ESPN Internet Ventures. https://www.espn.com/soccer/blog-espn-fc-united/story/4685632/espn-fc-womens-rank-the-50-best-footballers-in-the-world-today

[3] StatsBomb. (2023, April 11). StatsBomb data: Event data. StatsBomb. Retrieved April 24, 2023, from https://statsbomb.com/what-we-do/hub/free-data/

[4] Wikipedia. (2023, April 16). Latitude and longitude of cities. Wikipedia. Retrieved April 24, 2023, from https://en.wikipedia.org/wiki/Geographic_coordinate_system

[5] Twitter. (2019). Get Twitter API. Twitter. Retrieved April 24, 2023, from https://api.twitter.com/2/tweets

[6] FIFA. (2022). Finances. Retrieved April 24, 2023, from https://www.fifa.com/about-fifa/organisation/finances

[7] FIFA. (2022, March 22). FIFA publishes first-ever comprehensive analysis of the elite women’s football landscape. FIFA.com. https://www.fifa.com/media-releases/fifa-publishes-first-ever-comprehensive-analysis-of-the-elite-women-s-football-l

[8] Jürisoo, M. (2022, August 1). Women’s international football results. Kaggle. Retrieved April 24, 2023, from https://www.kaggle.com/datasets/martj42/womens-international-football-results

[9] Statista Research Department. (2023, January 18). Highest paid footballers worldwide 2022. Statista. Retrieved May 2, 2023, from https://www.statista.com/statistics/266636/best-paid-soccer-players-in-the-2009-2010-season/

[10] Gorostieta, D. (2023, March 14). Who are the highest-paid women’s soccer players in the world? Diario AS. Retrieved May 2, 2023, from https://en.as.com/soccer/who-are-the-highest-paid-womens-soccer-players-in-the-world-n/

Appendix

Our Color Scheme

We crafted our color palette based on the 2023 FIFA World Cup colors seen at the top of this webpage. Below is the exact color palette used throughout our visualizations. We selected colors from this palette case by case, depending on whether a plot required a discrete or continuous color palette and on how many colors it needed overall.