This project is a set of Python scripts that record and display trends in Craigslist posts, specifically rental housing around the Vancouver/UBC area.
By changing the search URL, it can scrape listings in any geographical area. The listings are saved to a TinyDB instance (basically a JSON file) and can be loaded into a pandas DataFrame for analysis. The scraper extracts almost all of the information in a Craigslist post, including the tagged geographic location as coordinates. It can serve as the first step in a machine learning pipeline for finding good listings.
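To give a feel for the storage format, here is a minimal sketch that reads listings back out of the JSON file. It relies on TinyDB's default JSON storage layout (table name at the top level, document IDs as keys) and uses only the standard library. The function name `load_listings` and the field names (`title`, `price`, `lat`, `lon`) are assumptions for illustration, not necessarily what the scraper writes.

```python
import json
import os
import tempfile


def load_listings(db_path, table="_default"):
    """Return the documents of one TinyDB table as a list of dicts."""
    with open(db_path, encoding="utf-8") as f:
        raw = json.load(f)
    # TinyDB's default JSON storage maps table name -> {doc_id: document}.
    return list(raw.get(table, {}).values())


# Tiny stand-in database; the field names are hypothetical.
sample = {
    "_default": {
        "1": {"title": "1BR near UBC", "price": 1500, "lat": 49.26, "lon": -123.24},
        "2": {"title": "Studio in Kitsilano", "price": 1300, "lat": 49.27, "lon": -123.16},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "listings.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    listings = load_listings(path)

print(len(listings), "listings loaded")
```

A list of dicts like this can be passed straight to `pandas.DataFrame(listings)` to get one column per field.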
Personally, I used this set of scripts to find price outliers among rental units near UBC. I found a nice place priced one standard deviation below the mean!
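The outlier search can be sketched roughly as follows. This is an illustration rather than the project's actual analysis code: the function name `cheap_outliers`, the `price` field, and the sample data are made up, and it uses the sample standard deviation from the standard library.

```python
from statistics import mean, stdev


def cheap_outliers(listings, n_std=1.0):
    """Return listings priced more than n_std standard deviations below the mean."""
    prices = [item["price"] for item in listings]
    # stdev() is the sample standard deviation; it needs at least two prices.
    cutoff = mean(prices) - n_std * stdev(prices)
    return [item for item in listings if item["price"] < cutoff], cutoff


# Made-up listings for illustration only.
listings = [
    {"title": "A", "price": 1500},
    {"title": "B", "price": 1600},
    {"title": "C", "price": 1700},
    {"title": "D", "price": 1800},
    {"title": "E", "price": 1100},
]

cheap, cutoff = cheap_outliers(listings)
print([item["title"] for item in cheap])
```

In practice you would probably compare like with like (for example, only one-bedroom units) before computing the mean, since mixing unit sizes widens the spread.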
The project is split into a “scraper” and a “report generator”. The scraper can run on its own as a cron job, and the report generator can automatically query the latest file to produce up-to-date reports. If you have a server to mess around with, you can set up the scraper there and run the report generator on your local machine. The README explains how to set that up.
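As a rough sketch of the "query the latest file" step, the snippet below picks the most recently modified file in a directory. The function name `latest_file`, the file names, and the cron line in the comment are hypothetical; the README has the real setup.

```python
import os
import tempfile
from pathlib import Path

# A crontab entry like this (paths are hypothetical) would run the scraper hourly:
#   0 * * * * /usr/bin/python3 /path/to/scraper.py


def latest_file(directory, pattern="*.json"):
    """Return the most recently modified file matching pattern, or None."""
    files = list(Path(directory).glob(pattern))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


# Demo with two empty files whose modification times are set explicitly.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "listings_old.json"
    new = Path(tmp) / "listings_new.json"
    old.touch()
    new.touch()
    os.utime(old, (1_000_000_000, 1_000_000_000))
    os.utime(new, (1_700_000_000, 1_700_000_000))
    newest_name = latest_file(tmp).name

print(newest_name)
```

Sorting by modification time avoids depending on any particular file naming scheme, at the cost of being fooled if an old file is touched or copied later.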