Audio slideshow to illustrate Preston protests. Telling the story in a different way.
How your money is really being spent.
Wanted to look at local government spending in various areas. Looked at the government account figures published on the Number 10 website.
Government temp spending is triple its own staff budget.
Found a page with 190,000 individual data entries.
Had one person writing in Java and one in Ruby, and found the data was a bit rubbish.
Date columns were either not filled in or held only the number of days since 1900.
Cannot trust the data; you have to ask whether it's correct and can be validated.
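Dates stored as a count of days since 1900 look like Excel-style serial dates. A minimal sketch of converting such a column, assuming that interpretation (including Excel's phantom 1900 leap day):

```python
from datetime import date, timedelta

def excel_serial_to_date(serial):
    """Convert an Excel-style serial number (days since 1900) to a date.

    Excel's day 1 is 1900-01-01, and it wrongly treats 1900 as a leap
    year, so serials from 60 onwards are offset by one phantom day.
    """
    if serial >= 60:
        serial -= 1
    return date(1899, 12, 31) + timedelta(days=serial)
```

This is one plausible reading of the raw numbers; columns of cleaned dates would still need the validation step described above.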
Had a massive amount of data and tried to break it down by agency and temp staff, cutting back a massive spreadsheet.
Used Zoho(?) where you can see things pretty quickly.
Visuals were created once the costs were separated out. Need to dig deep into the data to find the quirks.
Takeaways: learning the accuracy of the data, a structured database, other axes of investigation, getting the data clean, automatic updating.
Is it worth it?
Took extensive salary data.
Put in a location and a job, and the function shows whether it's worth living there.
A Welsh teacher earning £45,000 is not competitive.
For someone in London working as an accountant at £45,000, the data showed 16 applicants per job, making it roughly a 50/50 prospect.
From the initial data service a map was created where you can choose a function, a job title and a region to find out visually whether a job is worth it. The data drills down per region.
Can also zoom in on regions using a slider system.
A splendid and complex visualisation. (The winner)
Started with the idea of truck stops and which ones were safe.
Started looking for data on the Highways Agency site and found it wanting.
Found a map with decent truck stop sites.
Had the XML source, developed a scraper on Scraperwiki and got a view on Google Maps.
Plotted all the points. A letter on each point shows how safe it is, based on analysing which sites had CCTV and various security measures.
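A scraper along these lines could be sketched as follows; the XML element names and security attributes here are invented, since the real feed's schema isn't given:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for the truck-stop XML feed (illustration only).
SAMPLE = """
<stops>
  <stop name="A1 Truckstop" lat="52.1" lon="-0.2" cctv="yes" fenced="yes"/>
  <stop name="Layby 14" lat="53.4" lon="-1.5" cctv="no" fenced="no"/>
</stops>
"""

def grade(stop):
    """Rough safety letter from the security attributes present."""
    score = sum(stop.get(attr) == "yes" for attr in ("cctv", "fenced"))
    return {2: "A", 1: "B", 0: "C"}[score]

def parse_stops(xml_text):
    """Turn the XML feed into rows ready for plotting on a map."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": s.get("name"),
            "lat": float(s.get("lat")),
            "lon": float(s.get("lon")),
            "grade": grade(s),
        }
        for s in root.findall("stop")
    ]
```

Each row carries coordinates plus a safety letter, which is the shape of data a Google Maps view needs.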
Further on, wanted to find out more about truck crime. Looked at the TruckPol website, took the data from PDFs and put it in a spreadsheet.
Updated the view with the information about crimes. Red points are not so great, blue are good and purple is okay.
(Winner of the best scraper award from Scraperwiki and third place overall).
Takeover watch
The UK Takeover Panel was the prime source of information, showing all takeovers in play. The aim was to create something to provide details about companies.
Had scraped data but needed to add sector and revenue to create context.
Also used Investigate.co.uk
Had a live table showing activity from the last two days.
Have different sectors and can pull information out to see what's happening in different areas.
Creates a map showing areas affected by snow and shows where the nearest snow hole is. (See snow hole blog)
How people move around the chemical world
Used Google Refine to play with the data. Pulled out the geocode to map where the companies were.
Google Fusion also used.
Top 100 chemical companies. Merged Google finance information with Isis.
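A merge like this can be done with a simple keyed join on the company name; the names and figures below are made-up placeholders, not the real Isis or Google Finance data:

```python
# Hypothetical snippets of the two datasets (field names are assumptions).
top_companies = [
    {"company": "ExampleChem", "rank": 1},
    {"company": "DemoPolymers", "rank": 2},
]
finance = {
    "ExampleChem": {"sales_2007": 58.0, "sales_2008": 29.0},
    "DemoPolymers": {"sales_2007": 40.0, "sales_2008": 21.5},
}

def merge_by_name(companies, finance):
    """Join the ranking list with finance figures on the company name."""
    merged = []
    for row in companies:
        figures = finance.get(row["company"], {})  # empty if no match
        merged.append({**row, **figures})
    return merged
```

In practice the hard part is that the two sources spell company names differently, which is exactly what Google Refine is good at cleaning up first.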
Created a visual showing how sales had gone down, with chemical industry sales halving from 2007 to 2008.
After spending the morning running before we could walk, the team I'm in (Mike Beardmore, Dominic Clay, Matt Holmes and I) has discussed putting together something simple.
Matt had used C# to pull out all the #uksnow tweets, and we plan to create a mash-up map using the Highways Agency RSS feed to build a regularly updated map.
The first process was removing all the non-postcode tweets.
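#uksnow tweets conventionally carried a partial postcode plus a snow rating, so the filter can be a postcode regex. A simplified sketch (the full UK postcode grammar is more involved than this pattern):

```python
import re

# Simplified outward-postcode pattern (e.g. "M1", "SW1A", "PR1").
OUTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][0-9A-Z]?\b")

def has_postcode(tweet):
    """True if the tweet appears to contain an outward postcode."""
    return bool(OUTCODE.search(tweet.upper()))

tweets = [
    "#uksnow PR1 8/10 heavy flakes",
    "#uksnow it is snowing somewhere",
]
with_codes = [t for t in tweets if has_postcode(t)]
```

Tweets without a postcode can't be placed on the map, so they are dropped before the mash-up stage.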
Mike has also suggested mashing the #uksnow data with a rewritten Scraperwiki scrape of details of Harvester restaurants in the UK.
However, we had an issue with Twitter, as too much information was coming in at once. It's snowing, #uksnow is a popular hashtag, and the API couldn't deal with it.
Mike took the Twitter feed for #uksnow with postcodes, extracted the postcode, took the Scraperwiki feed of Harvesters and extracted the postcodes from those, then created a datafile formatted as XML so it would show up on Google Maps.
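One XML format Google Maps will display is KML; a minimal sketch of writing points out that way (the place name and coordinates are invented):

```python
# Minimal KML writer: Google Maps can render a KML file of placemarks.
def to_kml(points):
    placemarks = "".join(
        "<Placemark><name>{name}</name>"
        "<Point><coordinates>{lon},{lat}</coordinates></Point>"
        "</Placemark>".format(**p)
        for p in points
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2">'
        "<Document>{}</Document></kml>".format(placemarks)
    )

# Invented example point: one "snow hole" restaurant.
snow_holes = [{"name": "Harvester, Preston", "lat": 53.76, "lon": -2.7}]
kml = to_kml(snow_holes)
```

Note KML wants coordinates in longitude,latitude order, the reverse of how people usually quote them.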
The plan is for a pointer showing the location of a restaurant in the snow: a snow hole, providing warmth.
Mike has managed to get it to work on his own because he’s very capable with the code and produced a map showing areas where heavy snow is reported and the location of the Harvester restaurants nearby.
The potential future for this map would be to show a wide variety of restaurants, service areas and places offering shelter to people who find themselves trapped by snow while travelling.
We were quite a large group to start with, so we've ended up splitting in two. One group is working on scraping details of registered care homes, and I'm in a group working on turning information already gathered into an interesting and informative visual.
Our first battle was making sure Scraperwiki could read our data so we could work with it.
First of all I uploaded the data to Google Docs, but the comma-separated values (CSV) scraper didn't like it. Even when the spreadsheet was published as a web page, as suggested, it still wasn't happy because it wanted us to be signed into Google.
Francis Irving also suggested scraping WhatDoTheyKnow, because it was Freedom of Information data.
After much fiddling, Matt managed to pull out the raw data by popping items (pulling them from the top of the list) with a Python scraper.
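"Popping" here just means taking items off the front of the list of raw rows one at a time; a trivial sketch with invented rows:

```python
# Invented raw rows standing in for the scraped data.
raw_rows = ["row one", "row two", "row three"]

processed = []
while raw_rows:
    row = raw_rows.pop(0)  # pull the next item from the top of the list
    processed.append(row.upper())  # stand-in for real processing
```

The queue empties as you go, so the loop ends when every raw row has been handled.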
It turned out the data we had was so unstructured it wasn’t possible to work with it.
After lunch we’re working on a different project.
Francis Irving of Scraperwiki explains how it works.
Take the Gulf oil spill. You can find a list of oil fields around the UK, but it’s all in a strange lump.
He shows a piece of Python code that reads the oil field pages and turns them into structured data.
It's quite simple to make a map view, and you can also code more complicated views.
Scraperwiki is automatic data conversion.
Scrape internet pages, parse the data, organise it, collect it and model it into a view. The scraper will keep running and update the dataset constantly.
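The scrape, parse and organise steps can be sketched with Python's standard HTML parser; here an inline snippet stands in for a fetched oil-field page, and the table layout is an assumption:

```python
from html.parser import HTMLParser

# In a real scraper this page would be fetched over HTTP; the field
# names below are illustrative.
PAGE = """
<table>
  <tr><td>Forties</td><td>North Sea</td></tr>
  <tr><td>Brent</td><td>North Sea</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect <td> cells into rows: the parse and organise steps."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
```

On Scraperwiki the resulting rows would then be saved to the datastore, and the scheduled re-run is what keeps the dataset current.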
There are two kinds of journalism to do with the data: you can make specific tools, or you can find a story.
In Belfast they took a list of historic houses in the UK. The scraper looked through a host of websites using Python; you can also use Ruby.
There are a multitude of visuals available. The Belfast project showed a spike in 1979, explained by a sectarian political issue.
Answering a question, Francis confirms you can scrape more than one website at a time.
Francis would like to see more linked data and merging datasets together.
Asked about licensing for commercial use. Francis says it’s mainly used for public data. Scraperwiki blocks scraping Facebook because it’s private data, but the code can be adjusted.
Interested areas for projects today are: farming, local government budgets, public sector salaries, mapping chemical companies and distributors, environment, transport, road transport crime, truckstops map, energy data, countryprofile link to carbon emissions, e-waste, airline data, plastics data, empty shops, infotainment to make user interested in the data, another visualisation on companies ranking based on customer reviews, using the crowd to share information with data and create interesting information, data annotating content and enriching content, health data… and anything else we’re doing.
There is always a hashtag search buzzing away in the background while at work.
It’s another window, or two, or three, on Tweetdeck monitoring my interest du jour.
The tag may be related to a journalism or social media conference I'm interested in, or a trending topic; it changes.
Yesterday (Friday, November 27) there was a great deal of activity on the #demo2010 tag as students started occupying more universities, and tweets were full of pictures and videos from demonstrations on Wednesday, November 24.
After updating colleagues on which of their old unis were taken over by students one asked me: “How do you know this stuff and find it on Twitter?”
Then I explained how I followed the hashtag. It’s a simple way to find everything posted on a particular theme, topic or event.
Hence his final comment: “So that’s how a hashtag works…”
Francis talking about two different stories on the internet.
It used to be the case you had to check the division list to find out how MPs voted.
Created a web scraper to pull out the information and built The Public Whip, showing how MPs voted.
Have to be a parliament nerd to understand, even when it’s broken down.
TheyWorkForYou simplifies the information even more; it tells you something about your MP.
Bring the division information together: take a list from Public Whip and create a summary of how each MP voted.
Checking how one MP voted on the Iraq War: he voted with the majority in favour of the war in three votes and abstained from the first and the final three. It's almost a deal with the electorate.
One MP asked to have "voted moderately" removed because he found it misleading. A number of MPs have complained, but the votes were checked.
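Those summary phrases can be produced by tallying an MP's divisions; the thresholds and division records below are illustrative guesses, not TheyWorkForYou's actual formula:

```python
from collections import Counter

# Invented division records for one MP; real Public Whip data lists
# every division with the MP's vote ("aye", "no", or absent).
divisions = [
    {"question": "Iraq vote 1", "vote": "absent"},
    {"question": "Iraq vote 2", "vote": "aye"},
    {"question": "Iraq vote 3", "vote": "aye"},
    {"question": "Iraq vote 4", "vote": "aye"},
]

def summarise(divisions):
    """Condense a division list into a short summary phrase."""
    tally = Counter(d["vote"] for d in divisions)
    cast = tally["aye"] + tally["no"]
    if cast == 0:
        return "never voted"
    share = tally["aye"] / cast  # fraction of cast votes in favour
    if share > 0.8:
        return "voted strongly for"
    if share > 0.5:
        return "voted moderately for"
    return "voted against"
```

Where exactly the "moderately" boundary sits is precisely the kind of editorial judgment the complaining MPs were objecting to.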
Richard Pope, founder of Scraperwiki, created the PlanningAlerts.com website after the demolition of his local pub (a fine-looking establishment called The Queen).
It helps people access information from outside the immediate catchment area. He wrote lots of web scrapers, one example being the different councils' planning application systems.
Scraperwiki is like Wikipedia, but for data. It's a technical product you can use when you're not technical. You can look at different data scrapers and copy what others are doing without learning Perl or Python.
Planning Alerts is being moved over to Scraperwiki. Can tag it on Scraperwiki and find information. Can find stories and in-depth information.
Can request a dataset and have something built for you.
Francis was asked, is it legal? In the UK if it’s public data, not for sale, you can reuse it. Would take things down if asked, but it’s open stuff.
Could it be stopped? Would be ill-advised to stop people, and journalists, reading public information.
Public Whip and TheyWorkForYou look at numerous votes.
Looking at ways to fund it, such as private scrapers, or scrapers in a cocoon. Looking at a white-label version for intranet use. There's a market for data and developers who want to give data; they want to match developers with data. Currently funded by Channel 4. Want to remain free for the public.
Does it make people lazy? No, the data is already published; it just makes it easier. There's a movement of people trying to get publishers of data to change. There's always a need to pull data out in a variety of formats.
Running Hacks and Hackers days, where the two work together finding stories and hunting around.
Have had data scraped from the WhatDoTheyKnow site.