Interpreting Shot-Based Expected Goals
This post highlights some flaws in how people think about shot-based expected goals modeling and offers a probability theory approach to goals in soccer games.
A really hot topic in soccer right now is shot-based expected goals, or simply expected goals. The rough notion is that for every shot you can calculate the probability that it ends up as a goal. Most teams and data analysts then sum all of the shot probabilities to produce a single number: the team's shot-based expected goals for a match. Here is a great video of a data analyst for NYCFC talking about expected goals. In the talk he analyzes a game in which NYCFC took 14 shots for 1.1 expected goals and Orlando City took 7 shots for 0.9 expected goals. If you are thinking this game probably ended in a tie, you are mistaken, and this article might help you understand the data better than that single number or the time graph shown later in the linked YouTube video.
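To make the summing step concrete, here is a minimal sketch in Python. The shot probabilities are made-up numbers chosen for illustration, not the values from the NYCFC match, and the function name `expected_goals` is my own.

```python
from math import fsum

def expected_goals(shot_probs):
    """Shot-based expected goals: the sum of each shot's scoring probability."""
    return fsum(shot_probs)

# Hypothetical per-shot scoring probabilities for one team in one match.
# These values are invented for illustration only.
hypothetical_shots = [0.05, 0.12, 0.30, 0.08, 0.45]

xg = expected_goals(hypothetical_shots)
print(f"{len(hypothetical_shots)} shots, {xg:.2f} expected goals")
```

Notice that the single number throws away how the probability is spread across the shots, which is exactly what the rest of this post looks at.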
First, I made up an example to understand the data better. Consider a game in which team A shoots 10 times, each with a 10% probability of scoring, and team B shoots twice, each with a 50% probability of scoring. Each team is expected to score 1 goal on average, but what is more interesting is the probability distribution of the total goals each team scores. For team B there are four possible outcomes: shots 1 and 2 both score, both miss, shot 1 scores and shot 2 misses, or shot 2 scores and shot 1 misses. Since each shot scores with 50% probability, these four outcomes are equally likely, so team B scores 0 goals 25% of the time, 1 goal 50% of the time, and 2 goals 25% of the time. We can do the same analysis for team A, but there are over one thousand combinations of shot outcomes (2^10 = 1,024), so instead I aggregate them by total goals to get a probability distribution.
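The enumeration described above can be sketched in a few lines of Python. This is one possible way to do it rather than the exact code behind the chart: it assumes every shot is independent of the others, and the function name `goal_distribution` is my own.

```python
from itertools import product
from math import prod

def goal_distribution(shot_probs):
    """Enumerate every make/miss combination and total the probability per goal count."""
    dist = [0.0] * (len(shot_probs) + 1)
    for outcome in product([0, 1], repeat=len(shot_probs)):  # 1 = goal, 0 = miss
        # Probability of this exact sequence, assuming the shots are independent
        prob = prod(p if scored else 1 - p for p, scored in zip(shot_probs, outcome))
        dist[sum(outcome)] += prob
    return dist

team_a = goal_distribution([0.10] * 10)  # 2**10 = 1024 combinations
team_b = goal_distribution([0.50] * 2)   # 2**2 = 4 combinations

for goals, prob in enumerate(team_a):
    print(f"Team A scores {goals:>2}: {prob:.4f}")
for goals, prob in enumerate(team_b):
    print(f"Team B scores {goals:>2}: {prob:.4f}")
```

Brute-force enumeration is fine for 10 shots, but the number of combinations doubles with every extra shot, so a longer shot list would call for a smarter approach.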
You can see here that the two probability distributions are not the same, even though both teams are expected to score the same number of goals on average. A quick sanity check: it is impossible to score 3 goals from 2 chances, so the red bars (team B) only go up to 2. The blue bars (team A) could in principle go up to 10, since all 10 shots could go in, but the probability of that happening is 0.1^10, or one in ten billion, which is almost like winning the lottery. The point of this visualization is that just because you are expected to score 1 goal does not mean it is most likely that you will score 1 goal.
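To put numbers on the gap between "expected" and "what actually happens", here is a small sketch that compares each team's expected goals with its chances of scoring zero, exactly one, or two or more. It again assumes independent shots; the helper names `goal_distribution` and `summarize` are mine, and the distribution is built one shot at a time instead of by listing every combination.

```python
def goal_distribution(shot_probs):
    """Probability of each goal total, built up one shot at a time."""
    dist = [1.0]  # before any shot is taken, 0 goals is certain
    for p in shot_probs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)  # this shot misses
            nxt[goals + 1] += prob * p    # this shot scores
        dist = nxt
    return dist

def summarize(shot_probs):
    """Expected goals plus the chance of 0, exactly 1, and 2 or more goals."""
    dist = goal_distribution(shot_probs)
    return {
        "expected": sum(goals * prob for goals, prob in enumerate(dist)),
        "p_zero": dist[0],
        "p_one": dist[1],
        "p_two_plus": sum(dist[2:]),
    }

summary_a = summarize([0.10] * 10)  # team A: ten 10% shots
summary_b = summarize([0.50] * 2)   # team B: two 50% shots

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Team {name}: expected {s['expected']:.2f}, "
          f"P(0) {s['p_zero']:.3f}, P(1) {s['p_one']:.3f}, P(2+) {s['p_two_plus']:.3f}")
```

Under these assumptions team A scores exactly one goal only about 39% of the time and is shut out about 35% of the time, while team B scores exactly one goal 50% of the time. Both teams have the same expected goals, yet neither scores exactly one goal more than half the time.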