Sarah Booker

29/11/2010

Introduction to ScraperWiki #hhhrbi

Filed under: journalism,technical — Sarah Booker Lewis @ 10:44 am
Tags: , , , , , , , ,

Francis Irving of Scraperwiki explains how it works.

Take the Gulf oil spill. You can find a list of oil fields around the UK, but it’s all in a strange lump.

He shows a piece of Python code reading the oil field pages and turns it into a piece of data.

It’s quite simple to make a map view, but also code to make more complicated views.

Scraperwiki is automatic data conversion.

 

Scrape internet pages, Parser it, organise it, collect it and model it into a view. It will keep running and give the dataset constantly.

 

There are two kinds of journalism to use with the data. You can make tools, specific tools and find a story.

In Belfast took a list of historic houses in the UK. The data scraper looked through a host of websites, using Python, can use Ruby.
There are a multitude of visuals available. The Belfast project showed a spike in 1979, this was explained due to a political sectarian issue.

Answering a question, Francis confirms you can scrape more than one website at a time.

Francis would like to see more linked data and merging datasets together.

Asked about licensing for commercial use. Francis says it’s mainly used for public data. Scraperwiki blocks scraping Facebook because it’s private data, but the code can be adjusted.

Interested areas for projects today are: farming, local government budgets, public sector salaries, mapping chemical companies and distributors, environment, transport, road transport crime, truckstops map, energy data, countryprofile link to carbon emissions, e-waste, airline data, plastics data, empty shops, infotainment to make user interested in the data, another visualisation on companies ranking based on customer reviews, using the crowd to share information with data and create interesting information, data annotating content and enriching content, health data… and anything else we’re doing.

 

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,102 other followers