Type: scraper. Language: Python. Status: Public.
210 lines of code. 479,697 rows of data.
Created 2 years, 1 month ago.
Made this at the #jdcny event at Columbia. It scrapes school budgets from the NYC Department of Education site; see for example http://schools.nyc.gov/AboutUs/funding/schoolbudgets/GalaxyAllocationFY2010.htm?BSSS_INPUT=M411 The tricky parts were handling markup that changed slightly each year and getting a list of school IDs. Thanks to Julian Todd for making the category part of the row key: this gives a more useful table with many rows and few columns. It still needs work to group the categories into "major" categories ...
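A rough sketch of the two ideas above, not the scraper's actual code. It assumes the fiscal year sits in the page filename and the school ID goes in the BSSS_INPUT parameter, as the example link suggests. The function names (budget_url, to_long_rows) and the column names are made up for illustration.

```python
from urllib.parse import urlencode

# Base path taken from the example link in the description.
BASE = "http://schools.nyc.gov/AboutUs/funding/schoolbudgets/"


def budget_url(fiscal_year, school_id):
    """Build a budget page URL (assumed pattern: year in filename, ID in query)."""
    page = "GalaxyAllocationFY%d.htm" % fiscal_year
    return BASE + page + "?" + urlencode({"BSSS_INPUT": school_id})


def to_long_rows(school_id, fiscal_year, allocations):
    """Turn one school's {category: amount} mapping into one row per category.

    Together, school_id + fiscal_year + category act as the row key, so the
    table grows in rows rather than in columns. Column names are hypothetical.
    """
    return [
        {
            "school_id": school_id,
            "fiscal_year": fiscal_year,
            "category": category,
            "amount": amount,
        }
        for category, amount in sorted(allocations.items())
    ]
```

With category in the key, a new budget category in a later year just adds rows; a wide layout would need a new column for every category that ever appears.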