I dove into open data from the city of Porto Alegre, Brazil — and the project evolved into a two-part data journey: an exploratory dashboard and a predictive Machine Learning application.
I aggregated over 325,000 ITBI records from Porto Alegre to build a deep analysis of the city’s real estate market.
But first — what is ITBI? The ITBI (Imposto sobre Transmissão de Bens Imóveis) is a Brazilian municipal tax levied on every real estate transfer, such as sales or other paid transactions. Since it applies to the transaction value, the ITBI dataset becomes a goldmine for understanding property market behavior.
Part 1: Exploratory Analysis (Tableau)
To visualize the market, I had to overcome a few data engineering challenges:
Spatial Alignment: The ITBI dataset and the shapefile provided by the city used different spellings for neighborhoods. I had to cross-reference, map, and correct these inconsistencies so the data would align properly on the map.
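The reconciliation step above can be sketched in Python. This is a minimal illustration, not the author's actual cleaning script: the idea is to normalize accents and whitespace first, then patch the remaining mismatches with a small manual override table (the entries below are hypothetical).

```python
import unicodedata

def normalize(name: str) -> str:
    """Strip accents, collapse whitespace, and uppercase for matching."""
    nfkd = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in nfkd if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())

# Hypothetical manual overrides for spellings that still diverge
# between the ITBI records and the city's shapefile.
OVERRIDES = {"JARDIM ITU-SABARA": "JARDIM ITU"}

def match_neighborhood(itbi_name: str) -> str:
    """Map an ITBI neighborhood spelling to the shapefile's spelling."""
    key = normalize(itbi_name)
    return OVERRIDES.get(key, key)
```

Normalizing first keeps the override table short: only names that differ beyond accents and casing need an explicit entry.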
Inflation Adjustment: I consumed the Brazilian Central Bank’s API to pull IPCA (Brazil’s official inflation index) data and update apartment prices over time, enabling fair comparisons across different years.
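The adjustment itself reduces to compounding the monthly IPCA variations between the transaction date and today. A minimal sketch, with the network call omitted and illustrative (not real) IPCA values in the test data:

```python
# Monthly IPCA variations (%) can be pulled from the Central Bank's SGS API;
# series 433 is the one commonly used for the monthly IPCA, e.g.:
# https://api.bcb.gov.br/dados/serie/bcdata.sgs.433/dados?formato=json

def adjustment_factor(monthly_ipca_pct):
    """Cumulative inflation factor from a list of monthly IPCA variations (%)."""
    factor = 1.0
    for pct in monthly_ipca_pct:
        factor *= 1.0 + pct / 100.0
    return factor

def adjust_price(price, monthly_ipca_pct):
    """Bring a historical price up to present-day values."""
    return price * adjustment_factor(monthly_ipca_pct)
```

Multiplying rather than summing the monthly rates matters: inflation compounds, so two months of 10% is a 21% increase, not 20%.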
LOD Calculations: Inside Tableau, I used Level of Detail (LOD) calculations to correctly compute the price per square meter. This locked the granularity to ensure the average price per m² reflected each individual transaction, rather than distorted aggregations driven by the dashboard layout.
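The distortion the LOD calculation guards against is the classic "average of ratios vs. ratio of averages" problem. A small numeric illustration (the figures are invented) of why the granularity must be fixed at the transaction level:

```python
# Two transactions: (price in BRL, private area in m2)
transactions = [(500_000, 50), (1_000_000, 200)]

# Per-transaction price per m2, then averaged: what fixing the
# granularity at the transaction level computes.
per_txn = [price / area for price, area in transactions]
avg_of_ratios = sum(per_txn) / len(per_txn)

# Naive aggregate: total price / total area, dominated by the large unit.
ratio_of_sums = sum(p for p, _ in transactions) / sum(a for _, a in transactions)
```

Here the small, expensive-per-m² unit (10,000/m²) and the large, cheap one (5,000/m²) average to 7,500/m² per transaction, while the naive aggregate yields 6,000/m². Which figure a dashboard shows should be a deliberate choice, not an accident of the view's layout.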
Methodological Choice: For this dashboard, I decided not to remove outliers. Since these are official government records, removing extreme values, even those that look like data entry errors, could misrepresent the raw dataset.
Part 2: Predictive Modeling (Machine Learning App)
Taking the analysis a step further, I asked: what if we could predict this value? To answer that, I built the Real Estate Transfer Value Simulator.
While the Tableau dashboard visualizes the raw data, training a reliable predictive model required a different approach. For this phase, I filtered the dataset strictly for apartments and removed the extreme outliers (keeping the central 95%), resulting in a clean, IPCA-adjusted training set of ~125,000 transactions.
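The "central 95%" trim can be expressed as keeping everything between the 2.5th and 97.5th percentiles. A standard-library sketch of that filter (the exact percentile method the project used is an assumption):

```python
from statistics import quantiles

def trim_to_central_95(values):
    """Drop the extreme 2.5% on each tail, keeping the central 95%."""
    cuts = quantiles(values, n=40)  # 39 cut points at 2.5% steps
    lo, hi = cuts[0], cuts[-1]     # 2.5th and 97.5th percentiles
    return [v for v in values if lo <= v <= hi]
```

Trimming both tails symmetrically removes implausibly cheap transfers (e.g. symbolic R$1 transactions) as well as luxury outliers, which would otherwise dominate a squared-error loss during training.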
Behind the scenes of the App:
The Engine: I trained an XGBoost regression model orchestrated through a Scikit-Learn pipeline (handling imputation and OneHotEncoding). The algorithm predicts the property’s market value based on its neighborhood, exact street, private area, floor, and construction year.
The Interface: Built with Streamlit and containerized with Docker, the web app provides an instant, dynamic user experience. The street inputs are filtered in real-time based on the selected neighborhood (via a dynamic JSON mapping).
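The neighborhood-to-street lookup behind that filtering can be as simple as a JSON dictionary loaded at startup. A minimal sketch with a hypothetical mapping (in the app this would feed a second Streamlit selectbox):

```python
import json

# Hypothetical mapping file: each neighborhood lists its known streets.
MAPPING_JSON = """
{
  "Moinhos de Vento": ["Rua Padre Chagas", "Rua Fernando Gomes"],
  "Menino Deus": ["Av. Getulio Vargas"]
}
"""

street_map = json.loads(MAPPING_JSON)

def streets_for(neighborhood):
    """Return the streets available for the selected neighborhood."""
    return sorted(street_map.get(neighborhood, []))
```

Constraining the street options to the selected neighborhood both improves the UX and keeps user input inside the categories the model saw during training.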
Transparency: The app doesn’t just spit out a number. It features an expandable section explaining how the XGBoost algorithm works and displays the model’s performance metrics (R² and MAE) for full transparency.
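For readers unfamiliar with those two metrics, they are straightforward to compute. A plain-Python sketch of the standard definitions (the app itself would typically use `sklearn.metrics`):

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average distance between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R-squared: share of the target's variance the model explains."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

MAE is in the target's own unit (here, BRL), which makes it an intuitive headline number for end users; R² complements it by showing how much better the model is than simply predicting the average price.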