Back to contents PHP Python Ruby Choose a language:

Every scraper comes with its own SQLite database which it can store data to. You can also read data from other scrapers.

Saving data, basic

Easy save function for most use.

ScraperWiki::save_sqlite(unique_keys=["a"], data={"a"=>1, "bbb"=>"Hi there"})

You can use symbols for keys if you prefer (new LXC sandbox only).

ScraperWiki::save_sqlite(unique_keys=[:a], data={:a=>1, :bbb=>"Hi there"})

If the values for the unique_keys matches a record already there, it will over-write.

ScraperWiki::save_sqlite(unique_keys=["a"], data={"a"=>1, "bbb"=>"Bye there"})

You can add new columns into the database and the table will extend automatically. (The print is so you can see the comment.)

require 'pp' pp ScraperWiki::save_sqlite(unique_keys=["a"], data={"a"=>2, "bbb"=>"Going home", "cccnew"=>-999.9})

Saving data, advanced

Each new column is given an affinity according to the type of the value it is first given (text, integer, real). It is okay to save a string in a column that was defined as an integer, but it will sometimes be converted if possible. You can define a column with no affinity by giving it the name ending in "_blob".

ScraperWiki::save_sqlite(unique_keys=["a"], data={"a"=>1, "dddd_blob"=>"999.999"}) pp ScraperWiki::save_sqlite("swdata")

Further parameters in the save function are table_name (the default table name is "swdata"), and verbose (which doesn't send messages to the data tab if set to 0

ScraperWiki::save_sqlite(unique_keys, data, table_name="swdata", verbose=2)

You can also list a list of dicts in the save for greater speed

data = { {"a"=>10}, {"a"=>20}, {"a"=>30} ] ScraperWiki::save_sqlite(["a"], data)

Saving data, variables

It's often useful to be able to quickly and easily save one metadata value. For example, to record which page the last run of the scraper managed to get up to. ScraperWiki::save_var('last_page', 27) puts ScraperWiki::get_var('last_page')

It's stored in a simple table called swvariables.

Finding out the schema

To see the dict of table_names mapping to schemas.

pp ScraperWiki::show_tables()

Info about a particular table (and its elements) can be queried.

info = ScraperWiki::table_info(name="swdata") for column in info puts +" "+ column.type end

Direct SQL for saving

You can execute direct SQL commands. Back-ticks ` are used to quote column names that are have spaces in them.

ScraperWiki::sqliteexecute("create table ttt (xx int, `yy` string)") ScraperWiki::sqliteexecute("insert into ttt values (?,?)", [9, 'hello']) ScraperWiki::sqliteexecute("insert or replace into ttt values (:xx, :yy)", {"xx"=>10, "yy"=>"again"})

Don't forget after doing your inserts you need to commit the result. (The save() command always automatically commits.)


Direct SQL for selecting

Selection can be done by execution of a select function.

pp ScraperWiki::sqliteexecute("select * from ttt") pp ScraperWiki::sqliteexecute("select min(xx), max(xx) from ttt group by yy")

The result will be a dict with a list for keys, and a list of rows (which are lists) for the corresponding values.

{ "keys"=> ["xx", "yy"], data=>[[9, 'hello'], [10, 'again']] }

The shorthand select command gives the results in dicts.

print"* from ttt") [{'yy'=> 'hello', 'xx'=> 9}, {'yy'=>'again', 'xx'=>10}]

Direct SQL for modifying schemas

You can also clean up by deleting rows or dropping tables

ScraperWiki::sqliteexecute("delete from ttt where xx=9") ScraperWiki::sqliteexecute("drop table if exists ttt") ScraperWiki::commit()

There's also a "clear datastore" button on the scraper page, which is useful for starting again during development if the schema is in a mess.

If you like, you can completely ignore the ScraperWiki save command, and construct all your schemas explicitly.

Reading data from other scrapers

To access data from other scrapers we attach to them, using their shortname (the name in the URL).

ScraperWiki::attach("new_americ_foundation_drone_strikes") pp ScraperWiki::select("* from new_americ_foundation_drone_strikes.swdata limit 2")

To make it easy, you can change the name of the database you import it as.

ScraperWiki::attach("new_americ_foundation_drone_strikes", "src") pp ScraperWiki::table_info("src.swdata")

Access to other scrapers data through the attach interface is read-only.