Brazilian Soccer

🤖 I built a simple application to predict match attendance for the Brazilian Championship, based only on historical attendance and revenue data.

To do that:

🧠 Asked Claude to generate Python code to test a Gradient Boosting model
🐍 Validated and tested the model
📱 Built a Streamlit app that uses the model's parameters
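
For readers unfamiliar with Gradient Boosting, here is a minimal from-scratch sketch of the idea: each round fits a tiny one-split tree to the current errors and adds a fraction of it to the prediction. This is a toy, not the model behind the app. The single feature (a team's recent average home attendance), the numbers, and all function names are assumptions made up for illustration.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Find the one-threshold split that best fits the residuals (squared error).
    Requires at least two distinct values in xs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]  # (threshold, left_value, right_value)

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)

# Made-up toy data: recent average home attendance -> attendance at the next match.
xs = [10000, 15000, 20000, 25000, 30000, 35000]
ys = [11000, 14000, 21000, 24000, 31000, 36000]

model = fit_gradient_boosting(xs, ys)
baseline_mse = mean([(y - mean(ys)) ** 2 for y in ys])
train_mse = mean([(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)])
print(f"baseline MSE: {baseline_mse:.0f} | boosted MSE: {train_mse:.0f}")
```

In practice a library implementation with several features per match would replace this toy, but the residual-fitting loop is the same idea.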

🔗 App link:
https://appestimadorbrasileirao-zxab6hase9fib9mhsbhpvt.streamlit.app/

🧪 Initial tests (7th round of the league):

⚽ Grêmio vs Vitória
Real: 19,963 | Prediction: 23,192

⚽ Bahia vs Red Bull
Real: 29,732 | Prediction: 34,396

📊 This is part of a broader project where I analyze attendance and revenue data from the Brazilian Championship, covering the period from 2018 to 2025 (excluding the pandemic years).

📌 The project started with data collection, since no consolidated dataset was available.

The main challenge was exactly that: the data existed, but it was completely scattered. Each match had its own PDF report on the CBF website 📄 in an unstructured format, and to make things even harder, each federation used a different template.

💡 Data pipeline:

🔗 Built a list of report links and used Free Download Manager to download the files
☁️ Uploaded all PDFs to Google Drive
🤖 Used the Gemini 2.5 Flash API to extract structured data
🐍 Automated processing with Python
→ 📊 Result: for each season, an Excel dataset was generated
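
The processing step after extraction can be sketched roughly as follows: parse each JSON answer, check that the expected fields are present, and group the records by season. This is a sketch under assumptions, not the actual pipeline. The field names, the sample values, and the CSV output (used here instead of Excel to stay within the standard library) are all hypothetical, and the extraction call itself is left out.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical schema: field name -> type the value is cast to.
REQUIRED_FIELDS = {
    "season": int,
    "home_team": str,
    "away_team": str,
    "attendance": int,
    "revenue": float,
}

def parse_record(raw_json):
    """Turn one extracted JSON string into a validated, typed record."""
    data = json.loads(raw_json)
    record = {}
    for field, cast in REQUIRED_FIELDS.items():
        if data.get(field) is None:
            raise ValueError(f"missing field: {field}")
        record[field] = cast(data[field])
    return record

def group_by_season(records):
    """Collect records into one list per season."""
    seasons = defaultdict(list)
    for record in records:
        seasons[record["season"]].append(record)
    return dict(seasons)

def to_csv(records):
    """Serialize one season's records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(REQUIRED_FIELDS))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Made-up sample responses, standing in for what an extraction step might return.
raw_responses = [
    '{"season": 2024, "home_team": "Team A", "away_team": "Team B", "attendance": 12000, "revenue": 350000.0}',
    '{"season": 2025, "home_team": "Team C", "away_team": "Team A", "attendance": "18500", "revenue": 512000.5}',
]

records = [parse_record(raw) for raw in raw_responses]
by_season = group_by_season(records)
for season, rows in sorted(by_season.items()):
    print(season, len(rows), "record(s)")
    print(to_csv(rows))
```

Validating every record before it is written matters here because the reports follow different templates, so a missing or mistyped field is a realistic failure mode.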

Additionally:

🧠 Standardized team names using Claude
📤 Uploaded the final dataset to Kaggle
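
Name standardization can also be done, or double-checked, with a plain lookup table. The sketch below is a deterministic alternative to the Claude-based step, not a description of it: it strips accents, case, and extra spaces, then maps known variants to one canonical name. The alias list is hypothetical.

```python
import unicodedata

def normalize_key(name):
    """Lowercase, strip accents, and collapse whitespace to build a lookup key."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())

# Hypothetical aliases: normalized variant -> canonical team name.
CANONICAL = {
    "gremio": "Grêmio",
    "gremio fbpa": "Grêmio",
    "vitoria": "Vitória",
    "ec vitoria": "Vitória",
    "bahia": "Bahia",
    "ec bahia": "Bahia",
}

def standardize(name):
    """Return the canonical name if the variant is known, else the trimmed input."""
    return CANONICAL.get(normalize_key(name), name.strip())

for raw in ["  GRÊMIO ", "EC Vitória", "bahia", "Unknown FC"]:
    print(repr(raw), "->", standardize(raw))
```

Unknown names pass through unchanged, which makes it easy to spot variants that still need an entry in the table.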

💰 Cost vs. time:

💶 Processing ~3,000 PDFs (2–3 pages each): ~8 euros (~$9)
⏱️ Manual estimate: 1 minute per document → ~50 hours of work
💸 Estimated manual cost: ~$60
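
The arithmetic behind those figures, as a quick sketch. All inputs are the approximate values quoted above; the variable names are illustrative.

```python
n_pdfs = 3000            # approximate number of reports processed
minutes_per_doc = 1      # manual estimate per document
api_cost_eur = 8         # approximate total API cost
manual_cost_usd = 60     # estimated manual cost

manual_hours = n_pdfs * minutes_per_doc / 60
cost_per_pdf_eur = api_cost_eur / n_pdfs
implied_hourly_rate_usd = manual_cost_usd / manual_hours

print(f"Manual effort: {manual_hours:.0f} hours")
print(f"API cost per PDF: {cost_per_pdf_eur:.4f} EUR")
print(f"Hourly rate implied by the manual estimate: {implied_hourly_rate_usd:.2f} USD")
```

That works out to well under one euro cent per report, and the manual estimate implies an hourly rate of about $1.20.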

📈 Interactive visualization on Tableau:
https://public.tableau.com/app/profile/felipe.nunes.menegotto/viz/Publico_Renda_Brasileiro/Acumulado
➡️ Tableau is a Business Intelligence (BI) tool that transforms raw data into interactive dashboards, enabling better data analysis and data-driven decision-making.

📂 Dataset available on Kaggle:
https://www.kaggle.com/datasets/felipemenegotto/campeonato-brasileiro-de-futebol-pblico-e-renda

⚠️ As this is an automated extraction and an initial modeling approach, results may contain inconsistencies and can be further improved.

✨ This project reinforces something I’ve been seeing more and more: the real value comes from combining tools — not just picking “the best” one.

#DataScience #AI #MachineLearning #Automation #Football #Analytics #BigData #DataEngineering #Python #APIs #CloudComputing #GoogleCloud #GenerativeAI #LLMs #DataAnalytics #BusinessIntelligence #BI #DataViz #Dashboard #OpenData #Kaggle #SportsAnalytics #DataDriven #Tech #Innovation #AIProjects #ETL #DataPipeline
