Each of these tutorials keeps its instructions and example code in a single file that you can read, edit, and run entirely in your browser.
- How to Write a Screen Scraper: 1
Start here: check the ScraperWiki interface is working, then learn how to download a web page.
- How to Write a Screen Scraper: 2
Slightly more advanced: scrape data from raw HTML, and save it to the ScraperWiki datastore.
- Advanced Scraping: Pages Behind Forms
We use the powerful Mechanize library to simulate the actions of a browser, including cookie handling.
- Advanced Scraping: Excel Files
Scrape Excel files using the xlrd library.
- Advanced Scraping: lxml
Demonstrates the use of lxml – an alternative to BeautifulSoup that is particularly useful for selecting elements by CSS class.
- How to Write a Screen Scraper: 3
Doing it again and again: following 'next' links to scrape multiple pages.
- Advanced Scraping: PDFs
Scrape PDF files using ScraperWiki's pdftoxml library.
- Advanced Scraping: .ASPX Pages
Pages ending in .aspx can be a nightmare to scrape. They are often impossible to crack until you learn the tricks presented here.
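
For a rough idea of what the "How to Write a Screen Scraper" tutorials build up to, here is a minimal sketch of the overall pattern: pull table rows out of raw HTML, then follow a 'next' link to repeat on the following page. It is not the tutorials' own code and does not use the ScraperWiki API. It relies only on Python's standard `html.parser` module, and the names `RowAndNextLinkParser`, `scrape_all`, and `FAKE_SITE` are hypothetical. The pages are held in a dictionary so the example runs without network access.

```python
from html.parser import HTMLParser


class RowAndNextLinkParser(HTMLParser):
    """Collects the cell text of each table row and the href of a link labelled 'next'."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self.next_href = None   # href of the 'next' link, if any
        self._row = None        # cells of the row being read
        self._in_cell = False   # True while inside a <td>
        self._link_href = None  # href of the <a> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")
        elif tag == "a":
            self._link_href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data
        elif self._link_href and data.strip().lower() == "next":
            self.next_href = self._link_href

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "a":
            self._link_href = None


def scrape_all(fetch, start_url, max_pages=50):
    """Scrape rows page by page, following 'next' links until none remains."""
    records, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = RowAndNextLinkParser()
        parser.feed(fetch(url))  # fetch(url) must return the page's HTML as a string
        records.extend(parser.rows)
        url = parser.next_href
    return records


# Hypothetical two-page site, stored in memory (assumption: hrefs double as keys).
FAKE_SITE = {
    "page1": "<table><tr><td>Alice</td><td>3</td></tr></table><a href='page2'>next</a>",
    "page2": "<table><tr><td>Bob</td><td>5</td></tr></table>",
}

records = scrape_all(FAKE_SITE.get, "page1")
print(records)
```

In a real scraper, `fetch` would download the page over HTTP, relative links would need resolving against the current URL (for example with `urllib.parse.urljoin`), and each record would be saved to a datastore rather than printed. The `seen` set and `max_pages` limit guard against 'next' links that loop back on themselves.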