Each of these tutorials keeps its instructions and example code in a single file that you can edit, run, and learn from entirely in your browser.
- How to Write a Screen Scraper: 1
Start here: check that the ScraperWiki interface is working, then learn how to download a web page.
- How to Write a Screen Scraper: 2
Slightly more advanced: scrape data from raw HTML, and save it to the ScraperWiki datastore.
- How to Write a Screen Scraper: 3
Doing it again and again: following 'next' links to scrape multiple pages.
- Advanced Scraping: .ASPX Pages
Scrape ASP.NET web pages (with an .aspx extension) using the Mechanize library.
- Advanced Scraping: Pages Behind Forms
Scrape pages behind forms using the Mechanize library.
- Advanced Scraping: Excel Files
Scrape Excel files using the spreadsheet library.
- Advanced Scraping: CSV Files
Scrape CSV files using the FasterCSV library.
- Advanced Scraping: PDFs
Scrape PDF files using PDF::Reader.
- Simple Table of Values
Create a simple table of values from a scraper's output.
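The CSV tutorial uses FasterCSV; since Ruby 1.9 that library has shipped in the standard library as `CSV`. As a minimal sketch, assuming a modern Ruby and a hypothetical two-column file with `name` and `count` headers, turning CSV text into a simple table of values might look like this. The sample data and the `parse_rows` helper are illustrative only and are not taken from the ScraperWiki tutorials.

```ruby
require 'csv'

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE = "name,count\napples,3\npears,5\n"

# Parse CSV text (with a header row) into an array of hashes,
# converting the hypothetical 'count' column to an Integer.
def parse_rows(text)
  CSV.parse(text, headers: true).map do |row|
    { 'name' => row['name'], 'count' => Integer(row['count']) }
  end
end

# Print a simple table of values, one "name: count" line per row.
parse_rows(SAMPLE).each { |r| puts "#{r['name']}: #{r['count']}" }
```

Using `headers: true` lets each row be read by column name rather than by position, which keeps the code working if the source adds or reorders columns.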