Predicting the pass, in-game shape, player pressure: welcome to next gen of football analytics - The Athletic

2022-06-11 01:33:57 By : Mr. Chris Zhang

Football analytics continues to innovate.

For many years there have been two branches of football data. One is event data, which logs everything that happens on the pitch, such as passes, shots, interceptions or tackles. 

The other is tracking data, which logs the locations of every player on the pitch at a rate of 25 frames per second — in case you needed clarification, that is a lot of rows on a spreadsheet across a 90-minute game.
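To put a rough number on "a lot of rows", here is a back-of-the-envelope sketch. It assumes one row per player per frame, exactly 90 minutes of play and 22 players on the pitch throughout. Those are simplifications for illustration, not a description of any provider's actual file format.

```python
FRAMES_PER_SECOND = 25   # capture rate quoted in the article
MATCH_MINUTES = 90       # assumption: no stoppage time
PLAYERS_ON_PITCH = 22    # assumption: no red cards; ball and officials excluded


def tracking_rows(fps=FRAMES_PER_SECOND, minutes=MATCH_MINUTES, players=PLAYERS_ON_PITCH):
    """Rows in a one-row-per-player-per-frame table for a single match."""
    frames = fps * minutes * 60  # frames captured over the whole match
    return frames * players


print(tracking_rows())  # 135,000 frames x 22 players = 2,970,000 rows
```

Under those assumptions a single match produces close to three million player rows, before the ball is even counted.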

Much like listening to a football game on the radio, event data outlines the on-ball actions that an individual makes, which can yield plenty of valuable insight. However, focusing solely on such on-ball actions means that everything else happening on the field of play, including how players act and interact away from the ball, is not accounted for.

In contrast to this ‘spotlight’ event-based data, tracking data zooms out to provide a floodlit, high-definition view of the remaining 21 players on the pitch — crucially, contextualising how those 21 players influence the actions made by the one who’s on the ball.

Join the two branches together and the lens through which we can view football gains greater focus.

This “contextual event data” has increased in popularity in recent years, and opens up a Pandora’s box of possibilities to quantify more components of the game.

Sports data and analytics company Stats Perform has answered the latest call.

This week, it announced a new series of advanced football metrics from its live “Opta Vision” data feeds, creating a single, merged dataset across multiple leagues worldwide. 

So with clubs, media and fans seeking to understand football at the most detailed level, just how much of a game-changer could this be within the football landscape?

Pressure on the ball & ‘breaking the lines’

“We’ve had tracking data in various forms for probably 15 years through various providers and technology companies,” says Ben Mackriell, Stats Perform’s vice-president of data, AI and pro products. 

“On the professional-club side, tracking data has predominantly been used for physical performance analysis: assessing speeds, distances and comparing that to training data from GPS.

“There has been a rise in the use of it for tactical analysis, but it has required data-science capabilities to really get anything out of it.”

Of course, tracking data can be difficult to access to the same degree across all leagues.

While Premier League clubs have multiple in-built cameras within their stadiums, thanks to their official partner Second Spectrum, lesser-covered leagues — for example, the Cypriot First Division — don’t have the same luxury. 

To combat this, companies such as SkillCorner use artificial intelligence to extract tracking data remotely, that is, without relying on fixed stadium installations. Opta Vision’s tracking data sources combine both of these methodologies to provide both depth and breadth of coverage.

In recent years, companies such as SciSports, Second Spectrum, SkillCorner and StatsBomb (the latter with the release of its “360” dataset) have looked to combine event and tracking data. Crucially, Stats Perform’s new release looks to automate the process via computer vision, allowing it to feed this information to the user in real time.

So what new, in-depth insight can be gleaned from using this merged dataset?

Mackriell breaks it down simply into four major outputs.

The first is the increased understanding of the pressure a player gives and receives on the ball. Rather than simply log an event that a player made a pass, this new dataset shines a light on how many players were nearby, and how much pressure is being put on the ball in every event.

For example, as Manchester City’s Kevin De Bruyne receives a pass from Joao Cancelo below, with two Wolves players applying pressure to him…

…it is noticeably different to a pass he receives in a similar location of the pitch in this grab from a game against Arsenal, where very little pressure is being applied.

The locations of other players who didn’t touch the ball add further context to the event of a pass being received, and can importantly show which players are putting the most pressure on the ball. 
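As a rough illustration of how player locations can turn into a pressure figure, the sketch below counts opponents within a fixed radius of the player receiving the ball. The article does not describe how Opta Vision actually measures pressure, so the five-metre radius, the function name and the coordinates are all assumptions made up for this example.

```python
from math import hypot


def opponents_within(receiver, opponents, radius_m=5.0):
    """Count opponents within radius_m metres of the receiver.

    receiver is an (x, y) position in metres; opponents is a list of (x, y).
    The 5 m default is an illustrative assumption, not a provider's threshold.
    """
    rx, ry = receiver
    return sum(1 for ox, oy in opponents if hypot(ox - rx, oy - ry) <= radius_m)


# Made-up positions: the same receiving spot, under pressure and in space.
under_pressure = opponents_within((60.0, 34.0), [(62.0, 35.0), (58.5, 32.0), (80.0, 10.0)])
in_space = opponents_within((60.0, 34.0), [(75.0, 50.0), (40.0, 10.0)])
print(under_pressure, in_space)  # 2 0
```

A production model would presumably also weigh closing speed and direction, but even a simple count separates the two kinds of reception described above.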

The second output is a greater understanding of ‘line-breaking passes’ — i.e. how many players does the ball eliminate with a single pass?

There is no better case study than Liverpool’s Thiago.

As you can see in the example below, Thiago picks the ball up deep in his own half and takes out five of Manchester United’s players with a single pass to Jordan Henderson in the centre circle.

By contrast, in this next grab against Southampton, Thiago plays a simple pass to find Diogo Jota, also in the centre circle, but no players have been bypassed with this one despite the end location in both examples being very similar.

In the traditional event data, both balls would be logged as little more than a forward pass, but the location of the opposition players showed just how valuable the pass was in penetrating through United’s midfield in the first example.
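One crude way to count the players a pass takes out of the game is to ask how many opponents stood between the ball and their own goal before the pass but not after it. The sketch below does this along the length of the pitch only. It assumes the passing team attacks towards increasing x, and it is a simplified proxy rather than Stats Perform's definition of a line-breaking pass, which the article does not spell out. All positions are invented.

```python
def opponents_bypassed(pass_start_x, pass_end_x, opponent_xs):
    """Count opponents whose x position lies strictly between the start and end of a forward pass.

    Assumes x is measured in metres from the passing team's own goal line,
    so a forward pass has pass_end_x > pass_start_x.
    """
    if pass_end_x <= pass_start_x:
        return 0  # sideways or backwards passes bypass nobody in this proxy
    return sum(1 for x in opponent_xs if pass_start_x < x < pass_end_x)


# Made-up example: two passes ending in the centre circle (x = 52.5 on a 105 m pitch).
deep_pass = opponents_bypassed(30.0, 52.5, [35.0, 38.0, 41.0, 45.0, 50.0, 60.0, 70.0])
simple_pass = opponents_bypassed(45.0, 52.5, [55.0, 58.0, 60.0, 20.0])
print(deep_pass, simple_pass)  # 5 0
```

Both passes finish in the same place, yet the count differs sharply, which is exactly the context that event data alone cannot supply.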

Mackriell identifies the growing popularity of these modern, advanced metrics in the analytics world and how such methods of describing the game are well-established in the football lexicon — the only difference now is that these terms can be better quantified.

“We know that is something that has become more common from other providers recently, and is something we wanted to do through computer vision and an automated process, because it becomes more scalable and allows you to do it faster but also live,” Mackriell says.

“These things like ‘pressure’ and ‘line-breaking passes’ have become such common terminology for fans. Not just coaches and pundits, but your average fan.”

The third major output of the new dataset is a pass prediction model which allows you to better understand a player’s decision-making. 

What passing option does a player choose? What were the risks and the rewards of that decision? Did they pick the most threatening option, or the safest one?

“We do this with coaches after every game. We go back through the game and we evaluate, ‘Did the player make the right decision in these situations?’” Mackriell says. “You look at pass success — it doesn’t really give you any indication as to whether there was a better option, or whether that was the right option.”

The new model can help to quantify this.

Let’s look at another example.

Below, we see Real Madrid’s new signing Aurelien Tchouameni has three pass options when playing for previous club Monaco, all of which have a less than 50 per cent chance of being completed (a probability quantified as “Expected Pass Completion”, or xPass).

Pass models have been around for some time, but as Mackriell outlines, this new one allows you to see all of the options available to a player: “We’re making a prediction based on the location of the player, how much pressure and how much distance is between others around the ball, how likely are those players to make an interception or put significant pressure on the player receiving the ball.”

The pass options also carry a potential reward, quantified by “Expected Threat” or xThreat — as outlined previously by The Athletic — which is the likelihood of a shot occurring within the next 10 seconds if the pass is completed. 

For Tchouameni, the pass with the highest potential reward, from a threat perspective, is a pass into the feet of the centrally-placed Gelson Martins (0.27 xThreat). However, a pass to Cesc Fabregas at the far post has a higher likelihood of being completed (43 per cent xPass).

Tchouameni chose the latter option and completed the pass, as Fabregas got to the ball ahead of Paris Saint-Germain goalkeeper Keylor Navas to set up an open goal…

…for Kevin Volland to tap the ball into.

Two of the three options available to Tchouameni in our example offered a high reward (more than 0.2 xThreat) but had a less than 50 per cent chance of being completed. Ultimately, the Frenchman’s decision-making was correct when weighing up the situation for Monaco to potentially score.
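The “high risk, high reward” label can be expressed as a simple filter using the two thresholds quoted above: an xPass below 50 per cent and an xThreat above 0.2. The sketch below is a minimal illustration of that rule. The class name, field names and the option values are hypothetical and are not the figures from the Monaco example.

```python
from dataclasses import dataclass


@dataclass
class PassOption:
    label: str
    x_pass: float    # probability the pass is completed, 0 to 1
    x_threat: float  # probability of a shot within 10 seconds if completed


def is_high_risk_high_reward(option, max_x_pass=0.5, min_x_threat=0.2):
    """Apply the thresholds from the article: xPass < 50 per cent and xThreat > 0.2."""
    return option.x_pass < max_x_pass and option.x_threat > min_x_threat


# Invented values for illustration only.
options = [
    PassOption("A", x_pass=0.40, x_threat=0.22),
    PassOption("B", x_pass=0.30, x_threat=0.30),
    PassOption("C", x_pass=0.45, x_threat=0.05),  # risky, but little reward
    PassOption("D", x_pass=0.90, x_threat=0.01),  # the safe ball
]
flagged = [o.label for o in options if is_high_risk_high_reward(o)]
print(flagged)  # ['A', 'B']
```

Counting how often a team attempts a flagged option is then a matter of running the same filter over every pass in a season.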

Taking all the passes that met these “high risk, high reward” criteria in Ligue 1 during 2020-21, the season our example comes from, we see Monaco attempted more difficult pass options than any other team as they fought to a third-placed finish behind Lille and PSG.

At the player level, this analysis can help to re-evaluate how important an individual is to their team’s effort, as Mackriell highlights.

“Making the safe pass or keeping the ball moving might be the best option a player has,” he says, “but we often criticise in the stadium when they go backwards — but that might have been the best decision at that point, and we can now quantify that.”

A closer look at team shape

The fourth and final output of the new dataset drills deeper into the subtleties of a team’s shape within a game — given that our current explanation of team formations is ultimately flawed.

“We’ve been using formations forever as a way of describing the general structure of a team. But shape is the dynamic shift in that during 90 minutes. It’s based on ball location, the context of the game, in and out of possession,” Mackriell outlines. “A team playing 4-4-2 doesn’t tell you anywhere near enough information about what the shape of the team is and what they are trying to achieve.”

As football fans, we are familiar with starting formations, but the variations within these footballing systems are becoming more widely understood. Knowing that a team’s shape changes depending on the context of the game — as simple as the differences between their set-up with and without the ball — is more commonplace.

“We aimed to build something that, at any moment in the game, can tell you the true shape of the team, the nuances within that shape, and what shape is most effective in creating chances based on that shape,” Mackriell says.

Let’s look at another example from France’s Ligue 1.

It’s Monaco again, and this time their defender Benoit Badiashile, circled and highlighted, has the ball midway inside his own half.

During this phase of play, Stats Perform’s Shape Analysis model detects that Monaco are operating in a 4-2-4 shape in possession – with right-back Ruben Aguilar getting high upfield on the right-hand side to create a forward line of four offensive players.

This analysis can detect the most common shapes adopted during a match, or an entire season, but also establish which shapes are the most efficient in generating goal-scoring opportunities.

For example, analysing Monaco’s shape when they had the ball over the course of 2020-21 shows how frequently they featured wide players in advanced areas to create an attacking four, or five — 46 per cent of their time in possession.

This can then go a step further and establish which of these shapes was the most efficient, in terms of generating a goalscoring threat.

This is shown using another advanced metric, Possession Value (PV), which measures the probability that a team in possession will score a goal in the next 10 seconds – similar to the xThreat metric, but focusing on goal probability rather than shot probability.

According to the Possession Value output, the most efficient shape Monaco used during that 2020-21 season was 2-4-4, which generated 4.7 PV per 100 possessions — slightly ahead of the 4-4-2 comprising advanced wide players, as shown below.
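A “PV per 100 possessions” figure is a rate: the Possession Value accumulated in a given shape, divided by the number of possessions in that shape, scaled to 100. The sketch below shows that arithmetic and how shapes could be ranked by it. The shape labels, totals and possession counts are invented, since the article gives only the resulting rate for Monaco.

```python
def pv_per_100(total_pv, possessions):
    """Possession Value generated per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * total_pv / possessions


# Hypothetical season totals per in-possession shape: (summed PV, possessions).
shape_totals = {
    "shape_a": (9.4, 200),
    "shape_b": (6.3, 150),
}

rates = {shape: pv_per_100(pv, n) for shape, (pv, n) in shape_totals.items()}
most_efficient = max(rates, key=rates.get)

for shape, rate in rates.items():
    print(f"{shape}: {rate:.1f} PV per 100 possessions")
print("most efficient:", most_efficient)
```

Normalising by possessions matters because a shape a team uses rarely could otherwise never compete with the one it spends most of its time in.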

Capturing such fundamental aspects of the game with data has been central to the new release, which aims to reflect the conversations most common among coaching staffs.

“For me, it’s about trying to put this data into everyday football language,” Mackriell says. “What we’re trying to do with a lot of this content is take things that are already familiar to coaches and analysts, things that we talk about in training grounds every day, and try to translate that into something that can be accepted by fans.”

The next generation of football analytics is upon us.