
ScraperWiki has some binaries installed that you can access from any language. You need to call them by spawning them as external processes.
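The sketch below shows one way to spawn an installed binary from Python and capture what it prints. It uses the standard library's `subprocess` and `shutil` modules. The helper name `run_binary` is hypothetical, and other languages have their own process-spawning APIs.

```python
import shutil
import subprocess
from typing import Sequence


def run_binary(args: Sequence[str]) -> str:
    """Run an external program and return its standard output as text.

    args[0] is the program name or absolute path (for example
    "/usr/bin/pdftotext"). The remaining items are passed as separate
    arguments, so no shell quoting is needed.
    """
    # Fail early with a clear message if the binary is not installed.
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"binary not found: {args[0]}")
    # check=True raises CalledProcessError if the program exits non-zero.
    result = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_binary(["pdftotext", "report.pdf", "-"])` would return the text of a local file `report.pdf`, which is a placeholder name. Passing the arguments as a list avoids going through a shell, so file names containing spaces need no escaping.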

If you would like us to add something that isn't listed here, please get in touch.

/usr/bin/pdftohtml, /usr/bin/pdftotext
Extract text and other information from PDF (Portable Document Format) files. Other binaries from poppler-utils are also available. In Python, there is a helper, scraperwiki.pdftoxml (see ScraperWiki library). Documentation: pdftohtml, pdftotext.
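Scrapers usually download a PDF as bytes, but pdftotext reads its input from a file. The minimal sketch below therefore writes the bytes to a temporary file, runs pdftotext on it and reads the text back from standard output. It assumes `pdftotext` is on the `PATH`, as it is under `/usr/bin` here. It relies on pdftotext's documented conventions: `-` as the output file means standard output, `-layout` keeps the physical layout and `-enc` sets the output encoding. The function names are hypothetical.

```python
import os
import subprocess
import tempfile


def pdftotext_command(pdf_path: str, layout: bool = True) -> list:
    """Build the pdftotext argument list for one PDF file."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    cmd += ["-enc", "UTF-8"]   # request UTF-8 output explicitly
    cmd += [pdf_path, "-"]     # "-" as the output file means standard output
    return cmd


def pdf_bytes_to_text(pdf_bytes: bytes, layout: bool = True) -> str:
    """Convert PDF bytes (e.g. a downloaded file) to text via pdftotext."""
    # pdftotext needs a real file, so write the bytes to a temporary one.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(pdf_bytes)
        # check=True raises CalledProcessError if pdftotext reports an error.
        result = subprocess.run(
            pdftotext_command(path, layout), capture_output=True, check=True
        )
        return result.stdout.decode("utf-8", errors="replace")
    finally:
        os.remove(path)  # always clean up the temporary file
```

Keeping the command construction in its own function makes it easy to change the options, for example to call `pdftohtml` with its own flags instead.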