Introduction to Analytics in… Soccer

Written by: Valentin Stolbunov

Soccer, or football, or footy, or “the beautiful game” is the world’s most popular sport. When trying to prove this to a fan of North American sports, a soccer fan’s best weapon is usually global TV audience numbers. The 2014 Super Bowl had an audience of about 160 million viewers worldwide. The same year, the FIFA World Cup final had a global audience of about 1 billion. So, yeah, soccer is popular.

The recent sports analytics movement, however, didn’t originate from the world’s most popular sport. Most would agree it started with baseball and then spread to other North American sports – hockey, basketball, and football (the one with helmets). Compared to these sports, the use of advanced or “fancy” stats in soccer is still in the early stages.


Introduction to Analytics in… Baseball

Written by: Kurtis Judd

Whether you’re simply interested in following home run races, or using programming languages to predict next year’s MVP, it’s hard to argue that baseball isn’t a statistics-driven sport. Every event in the game is so discrete that it’s a statistician’s dream of clean, easy-to-work-with data.


Introduction to UTSPAN

Written by: Valentin Stolbunov

Our mission statement

At the University of Toronto Sports Analytics Group, we aim to connect members who share an interest in the field. We also aim to support the analytics process and help members explore their own interests. Members can work together to find and manage data, to develop and test analytic models, and to present and publish their findings. Last but not least, we hope to connect members with industry professionals.

Our methodology

It is clear from our earlier introduction to sports analytics that Alamar’s structure of the sports analytics process is about feeding the decision makers. This is an indication of the place sports analytics has in a sports organization. However, because we do not operate within any particular organization, the structure needs to be revised slightly.

Fig3

 

We do not have decision makers to whom we would provide our findings. Instead, our work is motivated by a question or a general area of interest. In many ways, this is similar to a coach asking the analytics team a question along the lines of “who is the best player we can sign for around $3 million?” So rather than using a methodology designed to feed into a larger framework, we have altered the structure to begin with an element of motivation.

With no decision makers to facilitate, the information systems are no longer used to support the final element in the process. As the final element themselves, these systems must now focus on presenting the findings of the data and/or models in the same effective and efficient manner. Hence their new name: Presentation of Results. This presentation may be verbal (an essay-style argument that Player A is a better “scorer” than Player B), or visual (a visualization which shows the two players’ scoring habits), or both.

Our methodology will follow the sports analytics process above. A more specific, question-based, step-by-step process would look something like this:

  1. What is the area of interest we are looking to explore?
  2. Given (1), what type of data do we require? How do we best obtain and organize this data?
  3. Given (1) and (2), what type of analytic models, if any, should we use?
  4. Given (2) and (3), what is the best way to present our findings?

A basic example

Let us assume we are looking to determine who is currently the most dominant scorer in the NBA. This problem is ultimately as complicated as you would like to make it, but for the purposes of this demonstration we will keep it simple. The answers to our questions above would look something like this:

  1. Area of interest: offensive performances in the NBA
  2. Data required: points per game and field goal percentage this season
  3. Analytic models: none
  4. Presentation of results: rank players based on offensive data
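
The four answers above can be sketched in a few lines of Python. This is only a minimal illustration: the player names and numbers are made up, `PlayerSeason` and `rank_scorers` are hypothetical helpers, and breaking ties on field goal percentage is our own assumption about how the two statistics might be combined.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    points_per_game: float
    field_goal_pct: float  # fraction between 0 and 1


def rank_scorers(players):
    """Sort by points per game; ties are broken by field goal percentage."""
    return sorted(
        players,
        key=lambda p: (p.points_per_game, p.field_goal_pct),
        reverse=True,
    )


# Made-up numbers for illustration only, not real NBA data.
players = [
    PlayerSeason("Player A", 27.1, 0.48),
    PlayerSeason("Player B", 29.3, 0.45),
    PlayerSeason("Player C", 27.1, 0.51),
]

ranking = rank_scorers(players)
for position, p in enumerate(ranking, start=1):
    print(f"{position}. {p.name}: {p.points_per_game:.1f} PPG, {p.field_goal_pct:.1%} FG")
```

No model is involved here: the "presentation of results" is simply the sorted list.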

Fig4

A more advanced example

A more difficult problem would be determining the best “lock down defender” in the NBA. Some sort of model would probably be necessary this time, and the answers would look something like this:

  1. Area of interest: one-on-one defensive performances in the NBA
  2. Data required: defensive statistics (steals, blocks) as well as change in opponent’s shooting percentage, preferably adjusted for opponent’s time on the court and
  3. Analytic models: obtain distributions for defensive statistics and the defender’s impact on the opposing player’s offensive performance
  4. Presentation of results: use average statistics to compare defending ability and variance of statistics to compare consistency, rank players
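
As a rough sketch of steps 3 and 4, the snippet below summarizes each defender's impact with an average (ability) and a variance (consistency). Everything in it is an assumption made for illustration: the defender names, the per-game changes in opponent shooting percentage, and the `summarize_defenders` helper are all hypothetical, and a real analysis would also need the court-time adjustment mentioned in step 2.

```python
from statistics import mean, variance


def summarize_defenders(impact_by_defender):
    """Return (name, average change, sample variance) rows, best defender first.

    Each value is a list of per-game changes in the opponent's shooting
    percentage; a more negative average means the opponent shot worse.
    """
    rows = [
        (name, mean(changes), variance(changes))
        for name, changes in impact_by_defender.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical data: change in opponent shooting percentage per game.
impact = {
    "Defender A": [-0.06, -0.02, -0.04],
    "Defender B": [-0.05, -0.05, -0.05],
    "Defender C": [0.01, -0.03, -0.01],
}

summary = summarize_defenders(impact)
for name, avg_change, var_change in summary:
    print(f"{name}: average change {avg_change:+.3f}, variance {var_change:.5f}")
```

Ranking on the average alone hides consistency, which is why the variance is reported alongside it.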

Fig5

Introduction to Sports Analytics

Written by: Valentin Stolbunov

Defining: Analytics

Before jumping to Google and adding “wiki” to the end of my search query, I thought I’d try to define analytics myself. When I think of analytics, I usually think of finding patterns in data and using those patterns to answer questions. Wikipedia says I am not too far off:

Analytics is the discovery and communication of meaningful patterns in data.

The important thing to note at this point is that analytics is a process. In fact, it is an interdisciplinary process which usually brings together mathematics, statistics, computer science, predictive methods, data visualization, and other fields of study.

It is also important to note that analytics relies on the presence of data. This ultimately differentiates the term from “analysis” and unfortunately creates confusion when trying to decide if what you are doing is analytics or “data analysis”. For our intents and purposes, the two are essentially the same process. However, the field has been dubbed “sports analytics” and not “sports data analysis”, so we will accept the name and move on.

Before continuing to the sports side of things, it should be noted that the term “analytics” may also be used to describe the results of this process. For example, “the analytics of our last project suggest that…” is a perfectly valid sentence. In my experience, however, it is best to replace “analytics” here with “analytical findings” or “results” and reserve the term “analytics” for the process through which these results are obtained.

Defining: Sports Analytics

This is where Wikipedia does not offer much help – nor does it need to. Sports analytics is essentially the analytics process, as described above, applied to sports.

It is the process of using sports-related data (anything from player statistics to game day weather) to find meaningful patterns (strong correlations, hidden trends, etc.) and communicate those patterns (using graphs, charts, essays, etc.) to help make decisions.

Fig1

 

In his book, Benjamin Alamar presents a helpful graphic to illustrate the overall sports analytics process. In his framework, sports analytics consists of four elements: data management, analytic models, information systems, and the decision maker.

I have done my best to provide both Alamar’s definition of each element and my own thoughts on their uses and values:

  • Data management: This includes any and all processes associated with acquiring, verifying and storing data. The data management element is ultimately about facilitating the modelling and information-extraction elements. As mentioned earlier, you can’t have analytics without data.
  • Analytic models: This element is essentially the process of applying statistical tools to data. The use of models to “forecast” player or team performance is often the most popular goal, but it is by no means a necessity. The models may or may not offer insight into the future. It is most accurate to say that they are concerned with using mathematics and statistics to describe the data.
  • Information systems: Unlike the previous two elements, the information systems are slightly more abstract. The purpose of these systems is to extract and present the data and/or model results as effectively and efficiently as possible. A scouting report is a good basic example.
  • Decision makers: The end goal of analytics is to extract relevant and insightful information from the data and present it to the decision makers. In modern sports these tend to be the coaching staff or management; however, players themselves may also benefit from the whole process.

What is the history of the field?

Although many professionals believe that modern model-heavy sports analytics is at a point of exciting growth, the field of sports analytics is by no means new. Technically speaking, any time anyone has ever used data to make a decision related to sport, they were conducting analytics. However, the general consensus is that sports analytics began sometime in the 19th century with baseball. The data (basic statistics such as hits and pitches) was collected with good old pencil and paper. It was then used to create scouting reports which a coach or manager would use to make decisions about their team.

Referring back to Alamar’s graphic of the entire process, this type of analytics would lack a modelling element but still follow a logical flow toward the decision maker. The decisions to be made in 19th-century baseball were perhaps fewer and less detailed, but not necessarily easier.

Fig2

 

What is the current state of the field?

We now have two things that 19th-century baseball analytics did not. The first of these is more sports nerds. Sports have grown in popularity and fans have become much more demanding of information. More often than not, sports arguments include statistics, even if they are about whether or not “number of rings” is a statistic. Everyone and their parents have a fantasy team and compulsively refresh Twitter in hopes of finding out how long Derrick Rose will be out this season.

The second thing we now have, which in some ways overlaps with the first, is more data. The recent advances in technology have affected just about every aspect of life, and sports is no different. The following things have all contributed to the recent growth in the field:

  • The improvements in computing power and digital memory
  • The increased quantification of our world (aka the ultimate buzzword: “big data”)
  • The advances made in solving complex engineering problems like vision and inference

Modern sports analytics uses database management systems and things like SQL where pen and paper were once the norm. Analytical models from machine learning and data mining are now used to help sort through the data and find patterns. Models are now updated in real time and, together with innovative visualization techniques, form the new breed of information system.
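
To give a feel for what replacing pen and paper with SQL looks like, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The `game_log` table, its columns, and the rows in it are all hypothetical; a real data management setup would be far larger and more carefully designed.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game_log (player TEXT, points INTEGER)")

# Made-up game logs for illustration only.
conn.executemany(
    "INSERT INTO game_log VALUES (?, ?)",
    [
        ("Player A", 30),
        ("Player A", 20),
        ("Player B", 18),
        ("Player B", 26),
        ("Player B", 28),
    ],
)

# One query does the tallying a scorekeeper once did by hand.
rows = conn.execute(
    "SELECT player, AVG(points) AS ppg, COUNT(*) AS games "
    "FROM game_log GROUP BY player ORDER BY ppg DESC"
).fetchall()
conn.close()

for player, ppg, games in rows:
    print(f"{player}: {ppg:.1f} points per game over {games} games")
```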

With more data, and more people interested in sports analytics, organizations are doing their best to gain every possible advantage in every aspect of sports from training routines to player recruitment and valuation.

What does the future hold?

The field is growing.

More and more sports organizations are hiring analytics “teams” and “departments”, usually composed of professionals with STEM (science, technology, engineering, mathematics) degrees. The media appears to be following suit by recruiting data science professionals to find and visualize the unique trends that their viewers want to see. There is no reason to believe that these new opportunities will stop popping up or disappear altogether.

In addition, social media has helped connect fans and form communities of the analytically-inclined. Whether out of personal interest or in hopes of being noticed, more fans will create stats-based blogs and continue to explore the numbers of their sport.

The field has also not gone unnoticed in academia. If conferences like MIT’s Sloan, journals like the Journal of Quantitative Analysis in Sports, and new courses offered by top universities are any indication, institutions have noticed the growth in sports data and are interested in conducting research in the field.

Sports analytics is sometimes discounted as just an invention of weird metrics. But it is much more than that. From engineering solutions in data, like SportVU, to innovative information systems, like shot charts, the future of the field ultimately lies in advancing each step of the whole process.