Vladislav Pomogaev

Engineering Physics Graduate

Craigslist Housing Scraper in Python

This project is a set of Python scripts that record and display trends in Craigslist posts, specifically rental housing around the Vancouver/UBC area.

By changing the search URL, the scraper can collect listings from any geographic area. The listings are saved to a TinyDB database (essentially a JSON file) and can be loaded into a Pandas DataFrame for analysis. The scraper extracts almost all of the information in a Craigslist post, including the tagged location as latitude/longitude coordinates. It can serve as the first step in a machine learning pipeline for finding good listings.
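TinyDB's default storage is a plain JSON file, so the saved listings can also be read back without TinyDB itself. The function below is a minimal sketch of that step. It assumes TinyDB's default JSON layout and default table name `_default`. The helper `load_listings` and any field names such as `price` are hypothetical. It flattens the file into a list of row dicts, which is a shape `pandas.DataFrame` accepts directly.

```python
import json
from pathlib import Path


def load_listings(path, table="_default"):
    """Read a TinyDB JSON file and return its documents as a list of dicts.

    Hypothetical helper. Assumes TinyDB's default JSON layout:
    {"_default": {"1": {...}, "2": {...}}}, where keys are document IDs.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    rows = []
    # Sort by numeric document ID so rows come back in insertion order
    for doc_id, doc in sorted(data.get(table, {}).items(), key=lambda kv: int(kv[0])):
        row = dict(doc)
        row["doc_id"] = int(doc_id)  # keep the TinyDB ID as its own column
        rows.append(row)
    return rows
```

With Pandas installed, `pandas.DataFrame(load_listings("listings.json"))` would give one row per listing. The filename here is only illustrative.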

Personally, I used these scripts to find outliers among rental units near UBC. I found a nice place priced one standard deviation below the mean!
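The "one standard deviation below the mean" cut can be expressed as a simple filter. The function below is a minimal sketch of it. It assumes each listing is a dict with a numeric `price` key, and the helper `cheap_outliers` is hypothetical. It uses the sample standard deviation from Python's `statistics` module.

```python
from statistics import mean, stdev


def cheap_outliers(listings, k=1.0):
    """Return listings priced at least k sample standard deviations below the mean.

    Hypothetical helper; `listings` is a list of dicts with a numeric "price" key.
    """
    prices = [item["price"] for item in listings]
    if len(prices) < 2:
        return []  # stdev needs at least two data points
    threshold = mean(prices) - k * stdev(prices)
    return [item for item in listings if item["price"] <= threshold]
```

A mean and standard deviation cut is sensitive to a few very expensive listings. A median-based threshold is a more robust alternative if the price distribution is heavily skewed.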

The project is split into a “scraper” and a “report generator”. The scraper can run standalone as a cron job, and the report generator can automatically query the latest data file to produce current reports. If you have a server to experiment with, you can run the scraper there and the report generator on your local machine. The README explains how to set this up.
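The "query the latest file" step can be done by picking the newest file in the output directory. The function below is a minimal sketch of that step. It assumes each scraper run writes a new JSON file into one directory, and the helper `latest_snapshot` is hypothetical. It selects the file by modification time.

```python
from pathlib import Path


def latest_snapshot(directory, pattern="*.json"):
    """Return the most recently modified file matching `pattern`, or None.

    Hypothetical helper: assumes each scraper run writes a new file
    into `directory`.
    """
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the newest modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

On the server, a crontab entry such as `0 * * * * python3 /path/to/scraper.py` would run the scraper at the start of every hour. The path is a placeholder. Sorting by modification time avoids depending on a particular filename format. If the files are copied between machines, that copying can change their timestamps. In that case, sorting by a timestamp embedded in the filename would be safer.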

See the GitHub repository here.

The average price of added and removed listings may be an indicator of future short-term trends.
Very loose correlation between these two variables.
Simple distribution graphs can be used to find outliers.
The active listing count can be used to find good times to start looking for a place. Students tend to look for places about a month before the semester starts.
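Both the added/removed averages and the active listing count can be computed by comparing two consecutive scrapes. The function below is a minimal sketch of that comparison. It assumes each scrape has been reduced to a dict mapping listing ID to price, and the helper `snapshot_changes` is hypothetical. Applying it to each pair of consecutive daily snapshots would give one data point per day for each curve.

```python
from statistics import mean


def snapshot_changes(previous, current):
    """Compare two scrapes, each a dict of listing ID -> price.

    Hypothetical helper. Returns the number of active listings in `current`
    and the average price of listings that appeared (added) or
    disappeared (removed) between the two scrapes.
    """
    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    return {
        "active": len(current),
        "added_avg": mean(current[i] for i in added) if added else None,
        "removed_avg": mean(previous[i] for i in removed) if removed else None,
    }
```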