Questioning Possession-Adjusting Metrics: Ben Griffis
Yes. I finally found someone smarter than me to write for The Weekly Rondo.
Hey guys, this is Cameron writing right now. I want you to welcome Ben Griffis as he writes the first ever guest piece for The Weekly Rondo!
Ben is mostly known for his statistical coverage of the entirety of Asian football - or at least most of it!
Ben will be talking about possession adjusting stats, or pAdj, and why he seems to be losing faith in them. He uses the Scottish Professional Football League (SPFL) as his case study.
Ben is one of my favorite creators on Twitter because of how well he can break down complicated concepts into easy-to-understand ideas. It’s because of Ben that my analytical understanding is so strong. I encourage all coaches to at least dabble in data analysis, or at least become familiar with it, for their own good.
It is because of data analysis that I’ve been able to take statistical findings and apply them to my training sessions.
Follow Ben on Twitter and visit his website. Tell him you came from The Weekly Rondo!
Thanks again Ben!
The use of possession-adjusted (pAdj) metrics in football, particularly for defensive metrics like tackles, interceptions, and the like, has become a big thing. Like Hansel in Zoolander, pAdj is so hot right now.
I’ve explored a lot of pAdj methods, even making a thread a while back on two main methods: StatsBomb’s complex formula and a much easier-to-understand formula. I’ve also done a lot of work in flipping the pAdj calculation to allow us to adjust in-possession metrics like passes, shots and whatnot.
Despite my previous uses of pAdj metrics, I’m in a phase now where I just don’t (really) have the same faith in pAdj metrics as I did a few months ago. Maybe one day I’ll come back around to it, but for now, I’ve been trying to refrain from using pAdj as widely as I used to. I’ve even gone as far as to not post any pAdj metrics outside of my player radars, where I have to include some form of pAdj metrics lest the wrath of football twitter descend upon me. So then, why do I dislike pAdj right now? I’m glad you asked.
Let’s start with the basic theory driving pAdj metrics: a player’s official game stats are not reflective of their true ability and can’t be accurately used in player-to-player comparisons. Adjusting for possession allows us to see what a player’s numbers might theoretically be if both teams (in all matches played) had exactly 50% possession. In theory, this is great and vital for comparing players, because it addresses a major factor limiting the comparability of players in vastly different systems: opportunity.
On average, Manchester City have almost 65% of the ball in their games, according to FBRef. This is the highest average possession recorded by any team in the UEFA “big 5” leagues. Logically, that means City players only have the opportunity to defend for about 35% of each game. Compare that to Nottingham Forest at the other end of the spectrum, who average about 38% possession. This means Forest players have the opportunity to defend for over 60% of the time they’re playing.
Naturally, if the exact same player spent half a season at City, and half at Forest, they should record more defensive actions per game while playing for Forest rather than City, all else equal. Conversely, they would likely have more passes during their time at City than at Forest. This is the opportunity I’m talking about.
Thus, adjusting metrics for possession has, at its core, the idea that we’re controlling for each player’s opportunity to defend or attack. In theory, that’s wonderful. And I think the math behind it all is sound.
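To make the opportunity idea concrete, here is a minimal sketch of one simple way to adjust for possession: a plain linear rescaling to a notional 50% share. This is an assumption for illustration, not StatsBomb’s formula or necessarily the simpler formula mentioned earlier. The function names and the per-90 player numbers are hypothetical; only the 65% and 38% possession figures come from the text above.

```python
def padj_defensive(actions_per90: float, team_possession_pct: float) -> float:
    """Rescale a defensive metric as if the opponent had 50% of the ball.

    Assumption: a simple linear scaling by opponent possession.
    """
    opponent_possession_pct = 100.0 - team_possession_pct
    return actions_per90 * 50.0 / opponent_possession_pct


def padj_in_possession(actions_per90: float, team_possession_pct: float) -> float:
    """The 'flipped' version: rescale an in-possession metric (passes, shots)
    as if the player's own team had 50% of the ball."""
    return actions_per90 * 50.0 / team_possession_pct


# Hypothetical player: 4.0 tackles + interceptions per 90 at City (65% possession),
# 7.0 per 90 at Forest (38% possession).
at_city = padj_defensive(4.0, 65.0)
at_forest = padj_defensive(7.0, 38.0)

print(f"City:   raw 4.0 -> pAdj {at_city:.2f}")    # pAdj 5.71
print(f"Forest: raw 7.0 -> pAdj {at_forest:.2f}")  # pAdj 5.65
```

Under this sketch, the raw numbers look very different (4.0 vs 7.0), but the adjusted numbers are nearly identical, which is exactly what the theory wants for the same player in two different systems.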
But by adjusting for possession and addressing this one issue in football data, we introduce another issue. This time, however, the issue is more gruesome, because many people don’t realize that we’ve just introduced another issue into the mix. This makes the interpretation of, and possibly the conclusions from, pAdj data even worse than for the non-adjusted data we’ve tried to fix.
That new issue? Team (and league) styles. There’s more than just team (and league) styles, but this is the biggest factor we can see.
Think of a random 55-60% possession team. How do they play with the ball? How do they pass? How often do they play long balls? How often do they play into the box? How often do they try risky passes? Now think of a random 40-45% possession team and think of the exact same questions.
The answers to these questions are the opposite of what you’d think. High-possession teams typically, as a proportion of their total passes, don’t play nearly as many risky passes as low-possession teams. Or long balls. Or sometimes even passes into the box. High-possession teams don’t tend to be nearly as direct as low-possession teams are. And that’s simply because low-possession teams are forced to be relatively more direct than high-possession teams, otherwise they’d score a handful of goals all season. Low-possession teams playing like high-possession teams would spend 20-25 of their 30 minutes of possession on methodical buildups, and get only a few shots off per match. Low-possession teams are forced, by nature of having relatively little possession, to maximize their limited time on the ball relative to high-possession teams.
If Forest played in the exact same style as City, but for 35% of the game not 65%, we would be able to adjust those two teams’ metrics for possession and compare. But Forest just can’t risk playing as methodically as City when they have less than 40% of the ball.
The same goes for Getafe. And for Bournemouth. For Troyes, Union Berlin, Mainz, Everton… The list goes on.
Despite ranking #1 for possession, City rank just 42nd in the big 5 leagues for number of long passes attempted. Getafe, who had the 2nd-lowest possession last season, rank 7th for number of long passes attempted. Despite having about 25% less possession, Getafe played over 11% more long passes than City.
This is just long passes, but the same idea holds for many different aspects/styles of play. The gist is that when we adjust for possession, we’re only adjusting for possession. But we’re magnifying other issues inherent in football metrics, both in-possession and out-of-possession metrics. And I’m not sure that the benefits of adjusting for possession outweigh the problems these other issues bring in.
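A rough sketch of that magnification, using the City and Getafe long-pass comparison. The assumptions here are loud ones: City’s long passes are indexed to 100 rather than taken from real counts, “about 25% less possession” is read as roughly 25 percentage points (so about 40% for Getafe against City’s 65%), and the adjustment is a simple linear rescaling to 50% possession, which may differ from any formula used in practice.

```python
def padj_in_possession(value: float, team_possession_pct: float) -> float:
    """Linear rescaling of an in-possession metric to a notional 50% possession.
    Assumption: illustrative formula only."""
    return value * 50.0 / team_possession_pct


# Indexed long-pass volumes (City = 100, Getafe = 11% more), not real counts.
city_long, city_poss = 100.0, 65.0
getafe_long, getafe_poss = 111.0, 40.0  # 40% is an assumed reading of the text

raw_ratio = getafe_long / city_long
adj_ratio = (padj_in_possession(getafe_long, getafe_poss)
             / padj_in_possession(city_long, city_poss))

print(f"Raw:  Getafe play {raw_ratio - 1:.0%} more long passes than City")
print(f"pAdj: Getafe play {adj_ratio - 1:.0%} more long passes than City")
```

Under these assumptions, an 11% raw gap becomes a gap of roughly 80% after adjusting. The adjustment is doing its job on possession, but the stylistic difference between the two teams is being stretched along with it.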
That’s why I’ve tried to refrain from using pAdj metrics recently.
But I wouldn’t be Ben Griffis without a couple visuals explaining this, right? So, let’s take one of the most extremely lopsided leagues in the world as a case study. That’s right, it’s the 1800s home of football, Scotland!
For those that don’t know, Celtic and Rangers completely run the league, and that goes for possession too. Celtic averaged an astronomical 69.5% possession last season. I can’t comprehend it myself. Celtic had over 70% of the ball in 18 matches last season. Eighteen! Behind them, Rangers had 64.9% of the ball. And in 3rd place, Hibs averaged 53.8%! Five of the 12 teams in the league averaged below 45%. Needless to say, the Scottish Premiership is a great league to use to illustrate the limits of possession-adjusting metrics and the issues that are magnified when adjusting.
At the risk of plugging my old work, I’m going to use my “passing danger index” (PDI) as the example. Essentially, it aims to show players who play different types of dangerous passes, and play them frequently. For this, you mainly just need to know that players who rank high can be called “dangerous passers”, but there is a footnote at the bottom of the graph with more info.
This first graph below is the unadjusted PDI for the 22/23 Premiership season.
As we should probably expect, given that the best players by and large play for Celtic or Rangers, 19 of the top 20 players are Celtic/Rangers players. Fair play to Barrie McKay of Hearts!
But of course, as we now know, these players will have much more opportunity to play dangerous passes when they see nearly 70% of the ball, and almost twice as much of the ball as players from teams like Kilmarnock, Ross County, St. Mirren and the like.
This second graph below shows PDI adjusted for possession (pAdj PDI).
I hope I don’t need to say this, but these just aren’t the most dangerous passers in the league. Rangers would never look at this and say, “hey you know what we should do? Swap Malik Tillman for Yan Dhanda and that’s how we’ll beat Celtic to the title next season”. Is Dhanda a good player? Absolutely. Is he likely much better than Tillman at playing dangerous passes? I don’t think so. Both are attacking midfielders, and Tillman ranks 5th in the unadjusted PDI while Dhanda ranks 2nd in the pAdj PDI. In theory, Dhanda could be presumed to be “better” than Tillman in this area, right?
But think of how Ross County and Rangers play. Think of what Tillman does when he gets the ball compared to Dhanda. Dhanda is fun to watch even in a poor Ross County side because nearly every single time he gets the ball he has to be overly attacking and direct. If he weren’t, Ross County absolutely would have been relegated. But Tillman, while very dangerous, direct, and attacking, has the luxury of being able to pick and choose when to play these passes, unlike Dhanda, who has to look to play these passes nearly every time he gets the ball.
Another issue that gets magnified in pAdj metrics is how central or important a player is to their team for the metric(s) adjusted. Continuing with Dhanda, he’s effectively the only Ross County player who is playing these passes consistently, as the team isn’t gifted with several good and dangerous passers like Rangers are. Tillman doesn’t have the same burden of being the sole midfield creative outlet, which will hurt his pAdj numbers relative to Dhanda’s: Dhanda plays that key role in a very low-possession side, whereas Tillman is one of several creators in a very high-possession side.
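Here is a small sketch of how that kind of ranking flip can happen. Everything except Rangers’ 64.9% possession is hypothetical: the raw PDI values are made up, Ross County’s possession is assumed to be 42% (the text above only says several teams averaged below 45%), and the adjustment is a simple linear rescaling to 50%, which is not necessarily how the pAdj PDI graph was built.

```python
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    raw_pdi: float          # hypothetical unadjusted PDI value
    team_possession: float  # team's average possession, in percent


def padj(value: float, team_possession_pct: float) -> float:
    """Assumed linear rescaling of an in-possession metric to 50% possession."""
    return value * 50.0 / team_possession_pct


players = [
    Player("Tillman", raw_pdi=10.0, team_possession=64.9),  # 64.9% from the text
    Player("Dhanda", raw_pdi=8.0, team_possession=42.0),    # 42% is an assumption
]

raw_ranking = sorted(players, key=lambda p: p.raw_pdi, reverse=True)
adj_ranking = sorted(players,
                     key=lambda p: padj(p.raw_pdi, p.team_possession),
                     reverse=True)

print("Raw order: ", [p.name for p in raw_ranking])
print("pAdj order:", [p.name for p in adj_ranking])
```

The adjustment knows nothing about why Dhanda’s raw number is high for a low-possession side (a direct team style, and being the sole creative outlet). It simply scales that number up further, and the order flips.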
This magnification of other issues is why I have recently cut back on using pAdj metrics. Of course, if I were making something for a club where I have the ability to explain the limitations of each and every metric I present, I would be fine using pAdj metrics as one of many variables. But out in the general public, I refrain from posting those same pAdj metrics for fear they’d be misinterpreted by others.
The final piece of this story is: can we address these issues? I think we can, but that also means we’d go away from possession adjusting and into comprehensive adjusting. We’ll call it cAdj, since football analysis loves our acronyms.
But this is beyond the scope of a simple, resource-constrained public hobby-analyst like myself, and beyond the realm of “tangibility” for general online football-data-consumers. If we were able to adjust for possession; for team style; for player style; for relative importance of a player to their team; even potentially league style and many more variables, we’d be able to almost completely and objectively compare players to find the “best” players in leagues, positions, etc. Clubs would be better able to ensure their transfers could do what they want them to do. We might even be able to know how a player would perform in any given league in the world.
This would be more akin to a dissertation, however, and when done right is something clubs would probably pay a pretty penny for. Which is why a) it might be done already and b) the general public won’t know if it’s been (or is being) done already! Isolating variables is nearly impossible in sports, given the droves of external factors that we can’t uncouple from currently-measured variables, so this would be a monumental, albeit ground-breaking, task.
Essentially, adjusting metrics for possession does address one small issue that’s inherent in football data. But it ends up magnifying other inherent issues in football data to a point where I think we’re being led astray by pAdj data as we see it right now. At least in the public sphere, that is. So, until we can adequately address several issues in one go, I’m personally refraining from over-using a method in the public sphere that I have too many reservations about.
Thanks for reading everyone! We hope you enjoyed the first ever guest piece on The Weekly Rondo.
Know that every subscription allows me to pay more writers to bring content to you that’s going to help you on your coaching journey in one way or another.