Analytics in Football - Official Thread

What's the use case for granular shot-by-shot data?
I want to break xG down to game state for as many teams and leagues as possible. I reckon (without being certain) that it's important for betting analysis. Understat covers this to a certain degree for the big five and, ehhh...Rü$$1ã(wtf?), but I want to be able to manipulate the data more vigorously than understat would allow.

I have a sheet for Brazil, using fbref data where I've been collecting the data manually, and it's working ok. I'd like to have something for Norway and Sweden too, but doing this manually with Fotmob would take forever, so automation would be the only way. Eventually, I'd love to have something set up where I'd have as much of the data that I want with minimal to zero work, so that I could have it ready to analyse.
then there are ways to use ChatGPT to help you iteratively develop versions of that script. But you'd need somewhere to run that script - e.g. on your local machine or even Google Colab)
Yeah, Google Colab is what GPT is recommending, but I'm clueless as fuck with this stuff. Just looking back at the conversation now, I think it might be offering me more of a shortcut than I originally thought, but I'll have to try it again tomorrow evening.

I realise that this stuff is potentially unethical but I think it can be set up in a way that it won't bombard the site with queries.

It feels like there's a treasure trove of data right there that I'd love to have, but I also don't know at what point I'd be biting off more than I could chew! 😁
 
I'd assume the potential usage would be for in-play betting then?

Across the last few weeks I've taken a copy of my SoT model and created an AGS model for some of the currently running leagues - Arg, Sweden, China, MLS - just to fill the gap while the big leagues are on a break. Where I'll struggle is that not knowing the leagues/players means I can't do a decent sniff test on guys who the model thinks are value.

Google Colab is pretty handy for running python and sharing notebooks.

If you want to get into scraping, my advice would be to start small and build it up incrementally:

e.g. try to scrape the shot data for one shot in one game
then try to extend it to cycle through all the shots in a game
then scrape a list of all games in a league
then run the match-level scraping logic against every game in the list

You'll get better results from ChatGPT etc. if you ask about small, precise things and then build out as you solve each mini problem.
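
To make the "start small" idea concrete, here's a rough Python sketch of how the steps can stack on top of each other. It deliberately doesn't touch any real site: `SAMPLE_MATCH` and all of its field names are made up for illustration, and whatever a real source returns will look different. The point is only the shape: one function per step, each one reusing the step before it.

```python
# Hypothetical match payload: these field names are invented for
# illustration and will differ from whatever a real site returns.
SAMPLE_MATCH = {
    "match_id": 101,
    "shots": [
        {"minute": 12, "team": "Home", "player": "A. Example", "xg": 0.08},
        {"minute": 47, "team": "Away", "player": "B. Example", "xg": 0.31},
    ],
}


def parse_shot(raw_shot):
    """Step 1: turn one raw shot into a flat record."""
    return {
        "minute": int(raw_shot["minute"]),
        "team": raw_shot["team"],
        "player": raw_shot["player"],
        "xg": float(raw_shot["xg"]),
    }


def parse_match(raw_match):
    """Step 2: apply the single-shot logic to every shot in one game."""
    rows = []
    for raw_shot in raw_match["shots"]:
        row = parse_shot(raw_shot)
        row["match_id"] = raw_match["match_id"]
        rows.append(row)
    return rows


def parse_league(raw_matches):
    """Steps 3 and 4: run the match-level logic over a list of games."""
    rows = []
    for raw_match in raw_matches:
        rows.extend(parse_match(raw_match))
    return rows


first_shot = parse_shot(SAMPLE_MATCH["shots"][0])
all_rows = parse_league([SAMPLE_MATCH])
print(first_shot)
print(len(all_rows), "shots parsed")
```

Getting `parse_shot` working on a single shot first means that when something breaks later, you know whether the problem is in the per-shot logic or in the looping around it.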


Technically, automated scraping of Fotmob is not permitted - but as long as it's done in a vaguely sensible way (including pauses so you aren't hitting the site with a high volume of requests in a short space of time) you're unlikely to be prevented from doing it.
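
As a minimal sketch of what "including pauses" could look like in Python: the helper below waits a random couple of seconds between consecutive requests. The function name `fetch_all_politely`, the 2 to 5 second range, and the `fetch` argument are all my own assumptions, not anything a particular site prescribes. `fetch` stands in for whatever function you already use to download one page, so the pacing can be tried out without hitting anything.

```python
import random
import time


def fetch_all_politely(urls, fetch, min_pause=2.0, max_pause=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    fetch is whatever function you already use to download one page;
    it is passed in so the pacing logic can be tested without a network.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Randomised pause so requests are spread out, not fired in a burst.
            sleep(random.uniform(min_pause, max_pause))
        results.append(fetch(url))
    return results


# Dry run with a stand-in fetch function; pauses are recorded, not slept.
pauses = []
pages = fetch_all_politely(
    ["match/1", "match/2", "match/3"],
    fetch=lambda url: f"fetched {url}",
    sleep=pauses.append,
)
print(pages)
print(len(pauses), "pauses taken")
```

In real use you'd leave `sleep` at its default so the script actually waits, and pass in your own download function as `fetch`.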

 
I'd assume the potential usage would be for in-play betting then?
No, it isn't. It's for getting a better grasp on who's "good" and who's "bad" in the short term.

Spurs and Forest were two big influences from last season. About 10-15 games into last season, regular xG would have said Spurs were "very good" and Forest were only "ok", yet level game state xG would have said Spurs were below average while Forest were excellent. I know which was more trustworthy in the short term, and which was better for making judgement calls in match odds betting markets.

Spurs were padding big time after going behind regularly, and Forest were generally good before taking the lead, and then defending to a point where they supposedly didn't deserve to win!

Season-long overall xG is good for making calls about the following season (Forest will probably pay a big price this coming season). But, within a season, you need to be breaking shit down in various ways to see what's really going on.
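
For anyone wondering what breaking xG down by game state looks like mechanically, here's a small Python sketch. The shots are made-up numbers for a single imaginary match, and the field names (`minute`, `team`, `xg`, `goal`) are assumptions rather than any provider's format. It walks through the shots in order, works out whether the shooting team was leading, level or trailing at that moment, and adds the shot's xG to that bucket. Own goals are ignored to keep it short.

```python
from collections import defaultdict

# Made-up shots from one imaginary match, in chronological order.
# Field names (minute, team, xg, goal) are assumptions for this sketch.
shots = [
    {"minute": 5, "team": "Home", "xg": 0.10, "goal": False},
    {"minute": 20, "team": "Away", "xg": 0.40, "goal": True},
    {"minute": 35, "team": "Home", "xg": 0.25, "goal": False},
    {"minute": 60, "team": "Home", "xg": 0.50, "goal": True},
    {"minute": 80, "team": "Away", "xg": 0.05, "goal": False},
]


def xg_by_game_state(shots, home="Home", away="Away"):
    """Sum xG per team, split by the shooting team's game state
    (leading / level / trailing) at the moment of each shot."""
    score = {home: 0, away: 0}
    totals = defaultdict(float)
    for shot in sorted(shots, key=lambda s: s["minute"]):
        team = shot["team"]
        opponent = away if team == home else home
        diff = score[team] - score[opponent]
        state = "leading" if diff > 0 else "trailing" if diff < 0 else "level"
        totals[(team, state)] += shot["xg"]
        # Update the score only after classifying the shot itself.
        if shot["goal"]:
            score[team] += 1
    return dict(totals)


result = xg_by_game_state(shots)
for (team, state), xg in sorted(result.items()):
    print(f"{team:5s} {state:9s} {xg:.2f}")
```

In this toy match, Home's plain xG total is 0.85, but 0.75 of that came while trailing and only 0.10 at level. That's the kind of padding the overall number hides.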

Thanks for all this. Sure I'll give it a go and see what happens. I didn't have time in the end this evening.
 
Ah well, it seems FotMob are blocking automated requests. The code on google colab kept running into errors, and that's what ChatGPT is telling me after several attempts to work around the errors.
 
I'll ping you my email via private message. If you email the colab notebook I'll see if I can get it running for you 👍🏻
 
Screenshot-2025-07-28-at-06-26-49-Facebook.png
 
Almost certain to have people frothing at the mouth if their favourite players aren't ranked highly enough, but an interesting undertaking nonetheless.


He's only published 500-380 so far, but my guess is Giggs comes out on top as it's a cumulative methodology - so longevity will weigh heavily.
 
It's interesting to see university research papers on niche football related topics.

This one is a deep learning model for foul prediction based on player pose data.

I'm surprised they got to an accuracy as high as 77% to be honest, given the subtleties around the nature of the contact (position / strength / angle) and whether it constitutes a foul or not.

Some details of the paper (link below)
  • Clip selection: 3s before and 1s after a foul → 4s segments.
  • Players tracked: 5 closest to the ball using ByteTrack for bounding boxes.
  • Pose features: 17 keypoints (shoulders, hips, knees, etc.) estimated with OCHuman.
  • Inputs:
    1. Video frames (resized to 64×64),
    2. Bounding box positions,
    3. Pose keypoints,
    4. Cropped bbox images.
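
To give a feel for the first two bullets, here's a rough Python sketch of the clip window and the "5 closest to the ball" selection. This is not the paper's code: the frame rate, the pixel-coordinate bounding boxes, and measuring distance from each box centre are all my own assumptions, and both function names are invented.

```python
import math


def clip_frame_range(foul_time_s, fps, before_s=3.0, after_s=1.0):
    """Return (start_frame, end_frame) for the window around a foul.

    end_frame is exclusive; the start is clamped to the first frame.
    """
    start = max(0, round((foul_time_s - before_s) * fps))
    end = round((foul_time_s + after_s) * fps)
    return start, end


def closest_players(ball_xy, player_boxes, k=5):
    """Pick the k track ids whose bounding-box centres are nearest the ball.

    player_boxes maps a track id to (x1, y1, x2, y2) in pixels.
    """
    def distance(box):
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        return math.hypot(cx - ball_xy[0], cy - ball_xy[1])

    ranked = sorted(player_boxes, key=lambda tid: distance(player_boxes[tid]))
    return ranked[:k]


# A foul at 10.0s in a 25 fps video (both values are made up).
start, end = clip_frame_range(10.0, fps=25)
print(start, end, (end - start) / 25, "seconds")

# Three made-up player boxes; keep the two nearest a ball at (0, 0).
boxes = {1: (0, 0, 10, 10), 2: (100, 100, 110, 110), 3: (20, 20, 30, 30)}
nearest = closest_players((0, 0), boxes, k=2)
print(nearest)
```

With the paper's 3s-before / 1s-after window, every clip comes out at 4 seconds regardless of frame rate, which is what keeps the model inputs a fixed length.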


Conclusions: This paper presents a soccer foul dataset containing pose and bbox and proposes a method combining video, bbox, and pose to predict foul events in soccer. The results show that the pose and bbox information play an auxiliary role in the prediction of foul behavior. However, there are some limitations to this study. Firstly, the accuracy of the soccer foul-related dataset itself can be improved. During target detection and tracking, target loss may affect the final prediction results. In addition, detecting pose data is also challenging in pose detection due to the potential overlap between poses. There may be some confounding factors in the current dataset that affect the learning process of bbox and pose data.

Despite these limitations, this study provides an important exploration and contribution to the understanding of soccer foul play. Future work could further improve the accuracy and quality of the dataset, improve algorithms for target detection, tracking, and pose detection, and explore the possibility of other combinations of data sources and models to improve the accuracy and reliability of foul prediction. This will help to provide further insight into soccer foul play and provide useful support for refereeing decisions and training in soccer matches.

 
IMG_7703.webp

I predict fouls when I see this ^^^
 