Sarah Booker

29/11/2010

Creating something visually stimulating from data #hhhrbi

Filed under: journalism,technical — Sarah Booker Lewis @ 12:56 pm

We were quite a large group to start with, so we’ve ended up splitting in two. One group is working on scraping details of registered care homes, and I’m in a group working with information that has already been gathered, creating an interesting and informative visual from it.

Our first battle was making sure Scraperwiki could read our data so we could work with it.

First of all I uploaded to Google docs, but the comma separated values (CSV) scraper didn’t like it. Then when the spreadsheet was published as a web page, as suggested by  it still wasn’t happy because it wanted to be signed into Google.

Matt suggested putting the CSV onto his server, so I exported it and sent it over to him.

Francis Irving also suggested scraping What Do They Know, because it was Freedom of Information data.

After much fiddling Matt managed to pull out the raw data by popping (pulling from the top of the list) and using a Python scraper.
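Matt's scraper isn't reproduced here, so this is only a rough sketch of the general idea in Python 3: read CSV text into a list of rows, then pop rows off the list to get at the raw data. The sample data and the `split_header` helper are invented for illustration and are not the FOI data we were working with.

```python
import csv
import io

# Made-up sample standing in for the exported spreadsheet (hypothetical data).
SAMPLE_CSV = """name,count
alpha,12
beta,7
"""

def split_header(csv_text):
    """Parse CSV text and pop the header row off the front of the row list."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows.pop(0)  # pop(0) removes and returns the first row
    # Pair each remaining row with the header names: one dict per record.
    records = [dict(zip(header, row)) for row in rows]
    return header, records

header, records = split_header(SAMPLE_CSV)
print(header)
print(records)
```

Note that `pop()` with no argument takes the last item of a Python list, while `pop(0)` takes the first. Which one is useful depends on where the unwanted rows sit in the file.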

It turned out the data we had was so unstructured it wasn’t possible to work with it.

After lunch we’re working on a different project.

Introduction to ScraperWiki #hhhrbi

Filed under: journalism,technical — Sarah Booker Lewis @ 10:44 am

Francis Irving of Scraperwiki explains how it works.

Take the Gulf oil spill. You can find a list of oil fields around the UK, but it’s all in a strange lump.

He shows a piece of Python code that reads the oil field pages and turns them into data.

It’s quite simple to make a map view, and you can also write code to make more complicated views.

Scraperwiki is automatic data conversion.

Scrape internet pages, parse them, organise the results, collect them and model them into a view. The scraper will keep running and keep supplying the dataset.
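To make those steps concrete, here is a minimal sketch in Python 3 using only the standard library. It is not the code Francis showed and it does not use Scraperwiki's own library. The page content, the `fields` table and the `RowParser`, `scrape` and `view` names are all assumptions made for illustration.

```python
import sqlite3
from html.parser import HTMLParser

# Stand-in for a fetched page; a real scraper would download this instead.
PAGE = """
<table>
  <tr><td>Field A</td><td>1975</td></tr>
  <tr><td>Field B</td><td>1979</td></tr>
</table>
"""

class RowParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

def scrape(page, db):
    """Parse the page, organise cells into records and collect them in a table."""
    parser = RowParser()
    parser.feed(page)
    db.execute("CREATE TABLE IF NOT EXISTS fields (name TEXT PRIMARY KEY, year INTEGER)")
    # INSERT OR REPLACE lets the scraper be re-run without duplicating rows.
    db.executemany("INSERT OR REPLACE INTO fields VALUES (?, ?)",
                   [(name, int(year)) for name, year in parser.rows])
    db.commit()

def view(db):
    """A very simple 'view': the number of records per year."""
    return db.execute(
        "SELECT year, COUNT(*) FROM fields GROUP BY year ORDER BY year").fetchall()

db = sqlite3.connect(":memory:")
scrape(PAGE, db)
scrape(PAGE, db)  # running it again leaves the dataset unchanged
print(view(db))
```

Keying the table on the record name is one way to let a scraper run repeatedly without piling up duplicate rows; a real dataset would need a key that suits its own records.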

 

There are two kinds of journalism you can do with the data: you can make specific tools, or you can find a story.

A project in Belfast took a list of historic houses in the UK. The data scraper looked through a host of websites using Python, although Ruby can also be used.
A multitude of visuals is available. The Belfast project showed a spike in 1979, which was explained by a political sectarian issue.

Answering a question, Francis confirms you can scrape more than one website at a time.

Francis would like to see more linked data and merging datasets together.

Asked about licensing for commercial use, Francis says it’s mainly used for public data. Scraperwiki blocks scraping Facebook because it’s private data, but the code can be adjusted.

Areas of interest for projects today are: farming, local government budgets, public sector salaries, mapping chemical companies and distributors, environment, transport, road transport crime, truckstops map, energy data, countryprofile link to carbon emissions, e-waste, airline data, plastics data, empty shops, infotainment to make users interested in the data, another visualisation on company rankings based on customer reviews, using the crowd to share information with data and create interesting information, data annotating and enriching content, health data… and anything else we’re doing.

19/11/2010

Great people for journalists to follow on Twitter #ff

Alan Rusbridger’s article today, Why Twitter matters for media organisations, listed a great many reasons for using Twitter.

During my years on Twitter I have found it is a great way to learn and I continue to learn a great deal by following other digital journalists, educators and developers.

In an effort to help journalists stepping into the Twittersphere for the first time I have compiled a list of really useful people to follow and learn from.

Teaching and learning

Paul Bradshaw – Lecturer and social media consultant; Online Journalism Blog – great tips: twitter.com/ojblog

BBC Journalism College

Clay Shirky – Influential future media blogger

Glynn Mottershead – Journalism lecturer

Andy Dickinson – Online journalism lecturer and links; twitter.com/linkydickinson

Jeff Jarvis – The Buzz Machine blogger and journalism professor

Sue Llewellyn – BBC social media trainer and TV journo

Steve Yelvington – Newsroom trainer

Jay Rosen – Journalism lecturer at NYU

Roy Greenslade – City University, media commentator

Journalists

Alison Gow – Executive Editor, digital, for the Liverpool Daily Post & Liverpool Echo

Marc Reeves – The Business Desk, West Midlands

Richard Kendall – Web editor Peterborough Evening Telegraph

David Higgerson – Head of Multimedia, Trinity Mirror

Sam Shepherd – Bournemouth Echo digital projects

Jo Wadsworth – Brighton Argus web editor

Matt Cornish – journalist and author of Monkeys and Typewriters

Louise Bolotin – Journalist and hyperlocal blogger

Sarah Booker (me because I try to be useful)

Joanna Geary – Guardian digital development editor twitter.com/joannageary and  twitter.com/joannaslinks

Adam Tinworth –  Consultant and ex-Reed Business Information editorial development manager

Adam Westbrook – Lecturer and multimedia journalist

Patrick Smith – The Media Briefing

Shane Richmond – Telegraph Head of technology

Edward Roussel – Telegraph digital editor

Damian Thompson – Telegraph blogs editor

Kate Day – Telegraph communities editor

Ilicco Elia – Former Head of mobile Reuters

Sarah Hartley – Guardian local

Jemima Kiss – Guardian media/tech reporter

Kate Bevan – Guardian media/tech reporter

Josh Halliday – Media Guardian

Jessica Reid – Guardian Comment is Free

Charles Arthur – Tech Guardian editor

Heather Brooke – Investigative journalist, FOI campaigner

Kevin Anderson – Journalist, ex BBC, ex Guardian

Wannabehacks – Journalism students and trainees

Simon Rogers – Guardian data journalist and editor of the datastore

Jon Slattery – Journalist

Laura Oliver – Journalism.co.uk

Johann Hari – Journalist, The Independent (personal)

Guy Clapperton – Journalist and writer

Alan Rusbridger – Guardian editor

Specialists

George Hopkin – SEO evangelist

Nieman Journalism Lab – Harvard

Martin Belam – Guardian internet advisor

Tony Hirst – OU lecturer and data mash up artist

Christian Payne – Photography, video, mobile media

David Allen Green – Lawyer and writer

Judith Townend – Meeja Law & From the Online

Richard Pope – Scraperwiki director

Suw Charman-Anderson – social software consultant and writer

Scraperwiki – Data scraping and information

Chris Taggart – Founder of Openly Local and They Work for You

Suzanne Kavanagh – Publishing sector manager at Skillset, personal account

Greg Hadfield – Director of strategic projects at Cogapp, ex-Fleet Street

Francis Irving – Scraperwiki

Ben Goldacre – Bad Science

Philip John – Journal Local, Lichfield Blog, twitter.com/hyperaboutlocal

David McCandless – Information is Beautiful

Flying Binary – Cloud computing and visual analytics

Rick Waghorn – Journalist and founder of Addiply

News sources

Journalism news

Journalism blogs

Mike Butcher – Tech Crunch UK

Richard MacManus – Read Write Web

The Media Blog

Press Gazette

Hold the Front Page

Mashable – Social media blog

Media Guardian

Guardian tech weekly

Paid Content

The Media Brief

BBC news

Channel4 news

Channel4 newsroom blogger

Sky News

House of Twits –  Houses of Parliament

Telegraph Technology
