Real Estate – Porto Alegre

I dove into open data from the city of Porto Alegre, Brazil, and the project evolved into a two-part data journey: an exploratory dashboard and a predictive machine learning application.

I aggregated over 325,000 ITBI records from Porto Alegre to build a deep analysis of the city’s real estate market.

But first: what is ITBI? The ITBI (Imposto sobre Transmissão de Bens Imóveis, the real estate transfer tax) is a Brazilian municipal tax levied on paid transfers of real estate, such as sales. Because the tax is calculated on the transaction value, the ITBI dataset is a goldmine for understanding property market behavior.

Part 1: Exploratory Analysis (Tableau)

To visualize the market, I had to overcome a few data engineering challenges:

  • Spatial Alignment: The ITBI dataset and the shapefile provided by the city used different spellings for neighborhoods. I had to cross-reference, map, and correct these inconsistencies so the data would align properly on the map.

  • Inflation Adjustment: I used the Brazilian Central Bank’s API to pull IPCA (Brazil’s official inflation index) data and adjust apartment prices for inflation, enabling fair comparisons across different years.

  • LOD Calculations: Inside Tableau, I used Level of Detail (LOD) calculations to correctly compute the price per square meter. This fixed the granularity at the transaction level, so the average price per m² is computed from each individual transaction rather than from aggregations that shift with the dashboard layout.

  • Methodological Choice: For this dashboard, I decided not to remove outliers. These are official government records: although some extreme values most likely reflect data entry errors, removing them could misrepresent the raw dataset.
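
The first three steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names and the accent-stripping rule are assumptions, and the IPCA rates are assumed to be already fetched as monthly percentage changes.

```python
import unicodedata
from statistics import mean
from typing import Iterable, Tuple


def normalize_name(name: str) -> str:
    """Strip accents, collapse whitespace and upper-case a neighborhood name.

    Assumed matching rule: two spellings refer to the same neighborhood
    when their normalized forms are equal.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.upper().split())


def inflation_factor(monthly_rates_pct: Iterable[float]) -> float:
    """Compound monthly IPCA variations (in percent) into one multiplier."""
    factor = 1.0
    for rate in monthly_rates_pct:
        factor *= 1.0 + rate / 100.0
    return factor


def mean_price_per_m2(transactions: Iterable[Tuple[float, float]]) -> float:
    """Average of per-transaction price/area ratios.

    Each (price, area) pair is divided first and the ratios are averaged
    afterwards, so every transaction carries the same weight.
    """
    return mean(price / area for price, area in transactions)
```

As an invented example, two transactions of 100,000 for 50 m² and 900,000 for 100 m² average 5,500 per m² when each ratio is computed first, whereas dividing total price by total area gives about 6,667. That gap is the kind of distortion a transaction-level calculation avoids.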

Check out the interactive dashboard here: https://public.tableau.com/app/profile/felipe.nunes.menegotto/viz/ITBI-IPCA/dash_m2?publish=yes

Part 2: Predictive Modeling (Machine Learning App)

Taking the analysis a step further, I asked: what if we could predict this value? To answer that, I built the Real Estate Transfer Value Simulator.

While the Tableau dashboard visualizes the raw data, training a reliable predictive model required a different approach. For this phase, I filtered the dataset strictly for apartments and removed the extreme outliers (keeping the central 95%), resulting in a clean, IPCA-adjusted training set of ~125,000 transactions.
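
A central-95% filter like the one described above could look like the sketch below. It assumes the cut is made on the inflation-adjusted price alone, at the 2.5th and 97.5th percentiles with linear interpolation; the post does not say which variable or percentile method was actually used, and the function names are illustrative.

```python
from typing import List


def percentile(sorted_values: List[float], q: float) -> float:
    """Linearly interpolated percentile (q between 0 and 1) of a sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def keep_central_95(prices: List[float]) -> List[float]:
    """Drop values below the 2.5th or above the 97.5th percentile."""
    if not prices:
        return []
    ordered = sorted(prices)
    low = percentile(ordered, 0.025)
    high = percentile(ordered, 0.975)
    # Preserve the original order of the surviving records.
    return [p for p in prices if low <= p <= high]
```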

Behind the scenes of the App:

  • The Engine: I trained an XGBoost regression model wrapped in a scikit-learn pipeline (handling imputation and one-hot encoding). The model predicts the property’s market value based on its neighborhood, exact street, private area, floor, and construction year.

  • The Interface: Built with Streamlit and containerized with Docker, the web app responds instantly to user input. The street options are filtered in real time based on the selected neighborhood (via a dynamic JSON mapping).

  • Transparency: The app doesn’t just output a number. An expandable section explains how the XGBoost algorithm works and displays the model’s performance metrics (R² and MAE).
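
The neighborhood-to-street filtering mentioned above can be illustrated with a small lookup. The mapping contents and function names below are placeholders, since the app's actual JSON file is not shown; in a Streamlit app, the returned list would typically feed the options of a second st.selectbox.

```python
import json
from typing import Dict, List

# Placeholder mapping: the real file and its street names are not shown
# in the post, so these entries are purely illustrative.
STREETS_BY_NEIGHBORHOOD_JSON = """
{
  "NEIGHBORHOOD A": ["STREET 2", "STREET 1"],
  "NEIGHBORHOOD B": ["STREET 3"]
}
"""


def load_mapping(raw_json: str) -> Dict[str, List[str]]:
    """Parse the neighborhood -> streets mapping from a JSON string."""
    return json.loads(raw_json)


def streets_for(mapping: Dict[str, List[str]], neighborhood: str) -> List[str]:
    """Return the sorted street options for one neighborhood.

    Unknown neighborhoods yield an empty list instead of raising an error.
    """
    return sorted(mapping.get(neighborhood, []))
```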

Try the live simulator here: https://imovelpoa.fmind.app/


#DataScience #MachineLearning #DataAnalytics #Tableau #OpenData #DataViz #XGBoost #Streamlit #RealEstate #PropertyMarket #ITBI #PortoAlegre #Brazil #Python #ETL #DataAnalyst #LOD
