2024 ANNUAL SHOWCASE

We are excited to announce our annual showcase schedule, which will feature 10 (!!) amazing sports analytics projects. Our showcase will take place on March 27th, from 6 to 9 pm, in room BA2145 of the Bahen Centre for Information Technology (40 St George St, Toronto, Ontario). The presentations are listed in order below:

SMT Data Challenge: This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs. This project won $1,000 as the winner of the Graduate Division. Authors: David Awosoga, Aaron White, Daniel Hocevar, Jaden Majumdar. University of Toronto, University of Waterloo.

2024 Big Data Cup: Zone Entries. Authors: Zeke Weng, Samarth Sharma, Daniel Du, Alexander Khan. University of Toronto.

2024 NFL Big Data Bowl: Every Step You Take: Measuring A Defender’s Moves. This project won $5000 as runners-up! Authors: Hassaan Inayatali, Aaron White, Daniel Hocevar, Jaden Majumdar. University of Toronto.

2023 Linköping Hockey Analytics Conference: Comparing Defensive Pressures Using Possession Retention Probability and Expected Goals. This project was a finalist for the student competition! Authors: David Awosoga, Justin Cue, Aujin Li, Jaden Majumdar, Juno Wang. University of Toronto, University of Waterloo.

NBA Player Scorecards. Author: Fauzan Lodi, University of Waterloo

Evaluating Player Influence in the National Hockey League using Player Puck Tracking Data. Author: Hassaan Inayatali, University of Toronto.

NHL Positivity Index. Authors: Jacob Winch, Tanmay Munjal, Heiby Lau, Alexander Bradley, Yukesh Subedi, Arden Monaghan, William Luo. University of Alberta.

2024 NFL Big Data Bowl: Tackles Above Average. Authors: Jacob Winch, Colton Schneider, Siddhartha Chitrakar. University of Alberta.

2024 Big Data Cup: Special Teams. Authors: Victor Hau, Jason Alizadeh, Kishan Vethanayagam, Matthew Braho. University of Toronto.

2024 NFL Big Data Bowl: Tackle Redistribution: ‘T’ in Tackle is for Team. Author: Zeke Weng. University of Toronto.

While this is an in-person event, we plan to stream all presentations using the following Zoom link:

Zoom Link: https://utoronto.zoom.us/j/88541237672
Password: UTSPAN

BRENDAN KUMAGAI – WINNING SUBMISSION FOR THE 2022 BIG DATA DERBY

Wednesday, January 17th, 2024, 7pm EST

BRENDAN KUMAGAI (Data Scientist in Zelus Analytics’ Hockey Research & Development team)

Check out his winning submission:

https://www.kaggle.com/code/bkumagai/bayesian-velocity-models-for-horse-race-simulation

David Radke – Presenting Multiagent Challenges in Team Sports Analytics

Wednesday, November 15th, 2023, 7pm EST

Dr. David Radke (SR RESEARCH SCIENTIST WITH CHICAGO BLACKHAWKS)

Check out the slides from the seminar:

https://drive.google.com/file/d/1z-3xeY-PuL-nOv5QR3CPWBCNwpC6v400/view?usp=sharing

Multiagent systems is a sub-field of artificial intelligence (AI) concerned with how multiple agents interact in an environment. In the first part of this talk, Dr. David Radke presented methods to characterize the available passing space between any two players using NHL puck and player tracking data. In the second part, he discussed some of the ways researchers anticipate multiagent systems will be used to revolutionize team sports analytics in sports classified as “invasion games” (e.g., ice hockey, basketball, and soccer). Success in invasion game sports requires high degrees of inter-player coordination and cooperation. Dr. Radke made connections between areas of research in multiagent systems and existing problems in team sports analytics and, in turn, showed how sports will help drive multiagent systems research forward.

UTSPAN Hour #26: Panel Data Regression to Project NBA Player Performance

Wednesday, March 15, 2023 8pm EST
Ismail Benchekroun

Resources:
https://colab.research.google.com/drive/1FcRWp3UiRshDK2stzT0u7QXXgdG7uwrB?usp=sharing
https://docs.google.com/presentation/d/185ATJdxkNjShxuB9HFt8KxKGuHFbBR1aqA01WnBDFno/edit#slide=id.p

2023 Annual Showcase

Members of the UTSPAN community got together to present captivating sports analytics projects that they have been working on.

The program was:

6:10pm – 6:30pm
Hayden Groer, Bhanu Mamillapalli, Daniel Lim, Siddhi Bansal, Adam Mainella
A Metric-Driven Analysis of Defending the Golden State Warriors
Submitted to the 2023 Syracuse University Basketball Analytics Competition

6:30pm – 7:00pm
Hassaan Inayatali, Daniel Hocevar, Aaron White
Between the Lines: How Do We Measure Pressure in the NFL?
Finalist for the 2023 NFL Big Data Bowl ($10,000 prize money)

7:00pm – 7:15pm
Dinner

7:15pm – 7:40pm
David Awosoga, University of Waterloo
Beyond the Boxscore: Applications of Team and Individual Performance Evaluation Metrics in U SPORTS Basketball
Undergraduate thesis, University of Lethbridge
Slides

7:40pm – 8:00pm
Aaron White, Ethan Baron, (Daniel Hocevar, Kabir Malik)
RIPP: Holistic Player Evaluation with Region-Based Isolated Player Performance
Winner of the undergraduate category at the 2022 Big Data Cup ($1,000 prize money)
Slides

UTSPAN Hour #8: March Madness Bracket: How Odds, Probabilities, and Machine Learning Can Be Used to Predict Game Winners

Thursday, March 10th, 2022, 8pm EST
Colin Conant

Resources:

UTSPAN Hour #1: How Can I Get Data For Sports Analytics Projects?

Monday, January 10th, 2022, 8pm EST
Hassaan Inayatali

Resources:

Julien Guyon – Risk of Collusion in FIFA World Cup and a New Tournament Design

Friday, December 4th, 2020, 6pm EST
Julien Guyon (Bloomberg, NYU, Columbia)
(1) Risk of Collusion: Will Groups of Three Ruin the FIFA World Cup?
(2) “Choose Your Opponent”, a New Tournament Design

Please find a recording of the presentation below.

In 2026, the FIFA World Cup will for the first time gather 48 men’s national teams. It will consist of a group stage made of 16 groups of three, with the best two teams in each group advancing to the knockout stage. Using groups of three raises several fairness issues, including risk of match fixing and schedule imbalance. In this article we examine the risk of collusion. The two teams who play the last game in the group know exactly what results will let them advance to the knockout stage. Suspicion of match fixing occurs when a result qualifies both of them at the expense of the third team of the group, and can seriously tarnish the tournament. We quantify how often this is expected to happen and explain how to build the match schedule so as to minimize the risk of collusion. We also quantify how the risk of collusion depends on competitive balance. Moreover, we show that forbidding draws during the group stage (a rule considered by FIFA) does not eliminate the risk of match fixing, and that surprisingly when draws are forbidden the 3-2-1-0 point system does not do a better job at decreasing the risk of collusion than the 3-0 point system. Finally we describe alternate formats for a 48 team World Cup that would eliminate or strongly decrease the risk of collusion. Then, we present a new knockout format for sports tournaments, that we call “choose your opponent”, where the teams that have performed best during a preliminary group stage can choose their opponents during the subsequent knockout stage.
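The collusion scenario described in the abstract can be illustrated with a small enumeration. The following is only a sketch under simplifying assumptions, not the authors' method: team A has already played both of its games, teams B and C meet in the last game, and the ranking uses points (3-1-0), then goal difference, then goals scored. The function name `colluding_scorelines` and the `max_goals` cap are hypothetical, and real FIFA tiebreakers include further criteria that are ignored here.

```python
from itertools import product


def points(goals_for, goals_against):
    """Standard 3-1-0 points for a single match."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def colluding_scorelines(a_vs_b, a_vs_c, max_goals=5):
    """List last-game scorelines (B goals, C goals) that put both B and C
    strictly ahead of A, who has already finished its two games.

    a_vs_b = (goals by A, goals by B); a_vs_c = (goals by A, goals by C).
    Ranking key (an assumption): points, goal difference, goals scored.
    """
    a_b, b_a = a_vs_b
    a_c, c_a = a_vs_c
    key_a = (points(a_b, b_a) + points(a_c, c_a),
             (a_b - b_a) + (a_c - c_a),
             a_b + a_c)
    risky = []
    for b_c, c_b in product(range(max_goals + 1), repeat=2):
        key_b = (points(b_a, a_b) + points(b_c, c_b),
                 (b_a - a_b) + (b_c - c_b),
                 b_a + b_c)
        key_c = (points(c_a, a_c) + points(c_b, b_c),
                 (c_a - a_c) + (c_b - b_c),
                 c_a + c_b)
        # Tuples compare lexicographically, matching the tiebreak order.
        if key_b > key_a and key_c > key_a:
            risky.append((b_c, c_b))
    return risky


# A beat B 1-0 and lost to C 0-1: a one-goal win for B in which B scores
# at least twice sends both B and C through on goals scored.
print(colluding_scorelines((1, 0), (0, 1)))
```

If A instead beat B 1-0 and drew 0-0 with C, A finishes on four points and no scoreline in the last game can eliminate it, so the returned list is empty.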

Ioannis Ntzoufras – A Unified Bayesian Model for Volleyball Data

Tuesday, November 24th, 2020, 11am EST
Ioannis Ntzoufras (Athens University of Economics and Business)
ntzoufras@aueb.gr
Bayesian Quest for Finding a Unified Model for Predicting Volleyball Games

See a recording of the presentation below.

Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. Conditionally on the winner of the set, in the second level, we use a truncated negative binomial distribution for the points earned by the losing team. An additional Poisson-distributed inflation component is introduced to model the extra points played in the case that the two teams have a point difference of less than two points. The number of points of the winner within each set is deterministically specified by the winner of the set and the points of the inflation component. The team-specific abilities and the home effect are used as covariates on all layers of the model (set, point, and extra inflated points). The implementation of the proposed model on the Italian Superlega 2017/2018 data shows an exceptional reproducibility of the final league table and a satisfactory predictive ability.
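To make the layered structure concrete, here is a rough forward simulation of a single set. It is a sketch under stated assumptions rather than the speaker's implementation: it does no Bayesian inference, the function and parameter names (`simulate_set`, `p_rally`, `p_extra`, `lam_extra`) are hypothetical, the truncation bound of 23 points and the way extra points are mixed in are simplifications, and abilities and the home effect enter only the set-winner layer, whereas the abstract uses them on all layers.

```python
import math
import random

SET_TARGET = 25  # points needed to win a regular set


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_truncated_negbin(rng, r, p, upper):
    """Failures before the r-th success (success prob p), rejecting draws > upper."""
    while True:
        successes = failures = 0
        while successes < r and failures <= upper:
            if rng.random() < p:
                successes += 1
            else:
                failures += 1
        if failures <= upper:
            return failures


def simulate_set(rng, ability_home, ability_away, home_effect,
                 p_rally, p_extra, lam_extra):
    """Return (home_points, away_points) for one simulated set."""
    # Level 1: logistic model for the set winner.
    home_wins = rng.random() < sigmoid(home_effect + ability_home - ability_away)
    if rng.random() < p_extra:
        # Inflation component (assumed form): the set went past 24-24.
        loser = SET_TARGET - 1 + sample_poisson(rng, lam_extra)
        winner = loser + 2  # the winner's score follows deterministically
    else:
        # Level 2: truncated negative binomial for the losing team's points.
        loser = sample_truncated_negbin(rng, SET_TARGET, p_rally, SET_TARGET - 2)
        winner = SET_TARGET
    return (winner, loser) if home_wins else (loser, winner)


rng = random.Random(0)
print([simulate_set(rng, 0.3, -0.1, 0.2, 0.55, 0.15, 1.5) for _ in range(5)])
```

Every simulated score is a valid set result: the winner has at least 25 points and leads by at least two, and by exactly two whenever extra points were played.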

Queen’s Geometric Sport Analysis Group – Geometrically Modeling Soccer and Basketball Games

Friday, November 20th, 2020, 6pm EST
Dan Forestall, Emily Hunter, Sara Stephens & Maia Gibbon (Queen’s Geometric Sport Analysis Group)
Geometrically Modeling Soccer and Basketball Games

To watch the presentation, see below.

With a focus on understanding the impact of player position, and how it evolves over time, we look at several 3-dimensional models of soccer and basketball games. This work builds on that of Pleurer, Spearman, and others.

David Perdomo Meza – Stylistic Representation of Team Playing Style Using Latent Dirichlet Allocation

Monday, November 16th, 2020, 4pm EST
David Perdomo Meza (Twenty3 Sport)
Email: dperdomomeza@gmail.com
Twitter: @dperdomomeza1
Title: Tactical Insight through Stylistic Representation of Team Playing Style Using Latent Dirichlet Allocation

Check out the recording below:

We’ll showcase the application of Latent Dirichlet Allocation as a topic modelling technique on football statistics to obtain a mixture model representation of “team playing style” – and go through its application to evaluating tactical choices in team head to heads in the English Championship.

Jeremy Alexander – Impact of Team Numerical Advantage in Australian Rules Football

Monday, November 2nd, 2020
Jeremy Alexander (Victoria University)
The Impact of a Team Numerical Advantage on Match Play in Australian Rules Football

See a recording below.

The advent of player tracking technologies has supported a more detailed approach to the match analysis of invasion sports. Studies to date that have investigated tactical team behaviour by measuring how players occupy different sub-areas on a playing field in football have inferred performance by assessing a team’s capacity to generate a numerical advantage over a specific area. As such, there is limited understanding of how a team’s numerical advantage affects match play in a continuous manner. Therefore, the primary aim of this study was to provide a proof of concept that determines the relationship between a team numerical advantage and match play in a continuous manner. The secondary aim was to determine how ball position and match phase influence how players occupy different sub-areas of play in Australian Rules Football.

Tutorial: Web Scraping

Information
Our first tutorial took place on Friday, October 23rd at 7:00pm EST. If you missed it, check out the notebook we used at https://utoronto-my.sharepoint.com/:u:/g/personal/eth_baron_mail_utoronto_ca/EaRu6yXz9M1MuO5VBZ0ed6EBocTIFdpsiTgoDMmasYtqNA?e=73asnW.

Learn how to web scrape
Web scraping is a valuable tool, especially for those who enjoy creating independent coding projects. It allows for more creative projects by giving the user the ability to obtain their own niche data. This tutorial will focus on the fundamentals of how to web scrape, with additional focus on how a webpage is broken down and how to clean your obtained data.
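As a rough illustration of those two steps (breaking a page down into its elements, then cleaning what comes out), here is a small sketch using only Python's standard library. It is not the tutorial notebook: the HTML snippet, the player names, and the helper names (`TableParser`, `clean_number`, `scrape_table`) are made up for this example, and the page is an inline string so that nothing is fetched over the network.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded stats page.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Player</th><th>Games</th><th>Points</th></tr>
  <tr><td> Player A </td><td>82</td><td>1,204</td></tr>
  <tr><td>Player B</td><td>75</td><td>-</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def clean_number(text):
    """Turn '1,204' into 1204; placeholders such as '-' become None."""
    text = text.replace(",", "").strip()
    return int(text) if text.isdigit() else None


def scrape_table(html, numeric_columns=("Games", "Points")):
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = []
    for row in body:
        record = dict(zip(header, row))
        for column in numeric_columns:
            record[column] = clean_number(record[column])
        records.append(record)
    return records


print(scrape_table(SAMPLE_HTML))
```

On a real site, the HTML string would come from an HTTP request, and third-party parsing libraries are a common alternative to writing a parser class by hand.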

Kenneth Brent Smale – Transitioning from Academia to Industry: Analytics in Pro Sports

Friday, October 30th, 2020
Kenneth Brent Smale (Los Angeles Angels, Apex Skating)
Transitioning from Academia to Industry: Analytics in Pro Sports

Check out the recording below.

As a student, the bulk of your training in analytics comes in the classroom and leans heavily on theory and simple examples with a strong signal-to-noise ratio. In reality, and particularly in sports, things get much noisier, with real data and different personalities and stakeholders. Kenneth Smale will talk through just how analytics differs between academia and industry and provide guidance on how to make the transition as easy as possible.

Dani Chu – It’s Fun Getting Into (Foul) Trouble

Wednesday, October 14th, 2020, 6:00pm EST
Dani Chu (Seattle Kraken)
It’s Fun Getting Into (Foul) Trouble

This project investigates the fouling time distribution of players in the National Basketball Association. A Bayesian analysis is presented based on the assumption that fouling times follow a Gamma distribution. Methods are developed that will allow coaches to better manage their players under the threat of fouling out.
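To give a feel for how a Gamma assumption on fouling times could feed a coaching decision, here is a small Monte Carlo sketch. It is not the Bayesian analysis from the talk: the shape and scale values are fixed, made-up numbers rather than posterior draws, the function name `prob_foul_out` is hypothetical, times between fouls are assumed independent, and time is measured in minutes on the court.

```python
import random

FOUL_LIMIT = 6  # an NBA player is disqualified on the sixth personal foul


def prob_foul_out(current_fouls, minutes_left, shape, scale,
                  n_sims=20000, seed=0):
    """Estimate the chance of reaching FOUL_LIMIT within minutes_left of playing
    time, assuming i.i.d. Gamma(shape, scale) gaps between consecutive fouls."""
    fouls_needed = FOUL_LIMIT - current_fouls
    if fouls_needed <= 0:
        return 1.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Time until the disqualifying foul is the sum of the remaining gaps.
        elapsed = sum(rng.gammavariate(shape, scale) for _ in range(fouls_needed))
        if elapsed <= minutes_left:
            hits += 1
    return hits / n_sims


# Illustrative parameters only: a mean of 8 playing minutes between fouls.
for fouls in (3, 4, 5):
    print(fouls, prob_foul_out(fouls, minutes_left=12, shape=2.0, scale=4.0))
```

Comparing these probabilities across foul counts and remaining minutes is one simple way to frame the choice between keeping a player on the floor and sitting them.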

See a recording of the presentation below.

Abdullah Zafar – Mathematical Modelling in Professional Sport

Friday, October 9th, 2020, 6:00pm EST
Abdullah Zafar (Sports Performance Analytics Inc.)
Mathematical Modelling in Professional Sport

How can we quantify actions in sport in order to build metrics, get insights, and drive performance? In this talk, we will overview, compare, and contrast approaches using football (soccer) data from the Danish Superliga, focusing on how we can model the movement of a team using flow fields and dynamical systems, derive metrics to quantify team tempo, and then demonstrate the utility and application to the physical training of players as well as team performance as a whole. We will then break down tempo further using topological time series analysis to better understand the dynamics of a football match and highlight the difference in teams during goal-scoring moments.

For a recording of the presentation see below.