Each of these tutorials keeps its instructions and example code in a single file that you edit, learn from, and run entirely in your browser.

Python scrapers

How to Write a Screen Scraper: 1

Start here: check the ScraperWiki interface is working, then learn how to download a web page.

How to Write a Screen Scraper: 2

Slightly more advanced: scrape data from raw HTML, and save it to the ScraperWiki datastore.
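
As a rough illustration of the idea (not the tutorial's own code), the sketch below pulls table cells out of raw HTML and saves them. It assumes only the Python standard library: `sqlite3` stands in for the ScraperWiki datastore, and the sample HTML, the `data` table, and its column names are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><td>Norway</td><td>81</td></tr>
  <tr><td>Japan</td><td>83</td></tr>
</table>
"""


class RowParser(HTMLParser):
    """Collect the text of every <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def save_rows(conn, rows):
    """Save rows keyed on country, so re-running the scraper overwrites old values."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data (country TEXT PRIMARY KEY, years INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO data VALUES (?, ?)",
        [(country, int(years)) for country, years in rows],
    )
    conn.commit()


rows = scrape_rows(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")  # in-memory stand-in for the datastore
save_rows(conn, rows)
saved = conn.execute("SELECT country, years FROM data ORDER BY country").fetchall()
```

Keying the table on one column means a scraper can be run repeatedly without piling up duplicate rows.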

Advanced Scraping: Pages Behind Forms

We use the powerful Mechanize library to simulate the actions of a browser, including its handling of cookies.

Advanced Scraping: Excel Files

Scrape Excel files using the xlrd library.

Advanced Scraping: lxml

Demonstrates the use of lxml – an alternative to BeautifulSoup that is particularly useful for selecting elements by CSS class.

How to Write a Screen Scraper: 3

Doing it again and again: following 'next' links to scrape multiple pages.
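
The loop behind following 'next' links might look something like the sketch below. It is an illustration under stated assumptions rather than the tutorial's code: the three-page site is a hypothetical in-memory dictionary so the example runs offline, the link is assumed to carry `class="next"`, and `fetch` is a placeholder for whatever function downloads a page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical three-page site held in memory so the sketch runs offline.
FAKE_SITE = {
    "http://example.com/list?page=1": '<p>alpha</p><a class="next" href="list?page=2">next</a>',
    "http://example.com/list?page=2": '<p>beta</p><a class="next" href="/list?page=3">next</a>',
    "http://example.com/list?page=3": "<p>gamma</p>",
}


class PageParser(HTMLParser):
    """Collect <p> text as items and remember the href of the 'next' link."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_href = None
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            self._in_p = True
        elif tag == "a" and attrs.get("class") == "next":
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.items.append(data)


def scrape_all(start_url, fetch, max_pages=50):
    """Scrape start_url, then keep following 'next' links until none is left."""
    items, url, seen = [], start_url, set()
    # The seen set and max_pages guard against link loops and runaway crawls.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        items.extend(parser.items)
        # Resolve relative hrefs against the page they were found on.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return items


all_items = scrape_all("http://example.com/list?page=1", FAKE_SITE.__getitem__)
```

Passing `fetch` in as an argument keeps the paging logic separate from the download step, so the same loop can be tried against saved pages before being pointed at a live site.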

Advanced Scraping: PDFs

Scrape PDF files using ScraperWiki's pdftoxml library.

Advanced Scraping: .ASPX Pages

Pages ending in .aspx can be a nightmare. They are often impossible to scrape until you learn the tricks presented here.

Python views

Simple table of values

Create a simple table of values from a scraper output.

Analysis of values

Quickly analyze the values of a datastore to find which ones are numeric, and which are choices from a small set of alternatives.
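
One way to picture the kind of analysis described above is the sketch below. It is not the view's actual code: the sample rows, the `analyze_columns` helper, and the `max_choices` threshold are hypothetical, and it assumes each row is a dictionary of string values.

```python
def is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def analyze_columns(rows, max_choices=3):
    """Label each column as 'numeric', 'choice', or 'free text'."""
    # Gather the values seen in each column.
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)

    report = {}
    for key, values in columns.items():
        distinct = set(values)
        if all(is_number(v) for v in values):
            report[key] = "numeric"
        elif len(distinct) <= max_choices and len(distinct) < len(values):
            # Few distinct values, with repeats: likely a fixed set of choices.
            report[key] = "choice"
        else:
            report[key] = "free text"
    return report


# Hypothetical rows, shaped like the output of a scraper.
rows = [
    {"name": "Ann", "age": "34", "status": "open"},
    {"name": "Bob", "age": "41.5", "status": "closed"},
    {"name": "Cy", "age": "29", "status": "open"},
    {"name": "Di", "age": "52", "status": "open"},
]
report = analyze_columns(rows)
```

The threshold for what counts as a "small set" is a judgment call; a real datastore with thousands of rows would likely want a larger `max_choices` than this toy example uses.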