Sarah Booker


The Iraq War logs – How data drove the story (@jamesrbuk) #hhldn

Filed under: journalism — Sarah Booker Lewis @ 7:45 pm

James Ball from the Bureau of Investigative Journalism

He was the chief data analyst for Dispatches and Al Jazeera, turning the logs into plain English to help journalists working on the programmes.

Stories on torture; civilian deaths at US checkpoints; 109,032 dead; 183,991 one in 50 detailed; 1,300 allegations of torture against Iraqi troops, 300 against American forces

US helicopters gun down surrendering insurgents.

US claim to have killed 103 civilians.

Getting the data: Freedom of Information Act requests, web scrapers, or turning up at an undisclosed location at 1am on a Sunday and being told not to go straight home after picking up a USB stick.

It was a 400MB text file. Almost 400,000 documents and almost 40 million words of dense military jargon.

Couldn’t read it or open it up: it was a data cleaning problem. Had a text file and a comma-separated file, and neither worked. Dates were creeping into the wrong columns.
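One common reason for values creeping into the wrong columns is free text that itself contains commas or line breaks. The sketch below is only an illustration of that failure, not the method used on the logs: the sample records and column names are invented, and it compares naive splitting with Python's standard csv module.

```python
import csv
import io

# Invented sample: the summary field contains commas and a line break.
SAMPLE = (
    'report_id,date,summary\n'
    '1,2006-03-01,"Patrol stopped vehicle, no injuries"\n'
    '2,2006-03-02,"Line one\nline two, continued"\n'
)


def naive_rows(text):
    """Split on line breaks and commas only, ignoring quoting."""
    return [line.split(",") for line in text.splitlines()]


def parsed_rows(text):
    """Let the csv module honour quotes, embedded commas and embedded line breaks."""
    return list(csv.reader(io.StringIO(text, newline="")))


def misaligned(rows, expected_width):
    """Return the rows whose column count differs from the expected width."""
    return [row for row in rows if len(row) != expected_width]


naive = naive_rows(SAMPLE)
clean = parsed_rows(SAMPLE)
print(len(misaligned(naive, 3)), "misaligned rows with naive splitting")
print(len(misaligned(clean, 3)), "misaligned rows with csv.reader")
```

Counting columns per row like this is a quick way to find out whether a file is broken before trying to load it anywhere else.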

Had to scrap that and look at the MySQL file. Used UltraEdit, which worked really well.

Turning it into something workable meant knocking out bits of code.

Dates didn’t work and were also inconsistent. Found Google Refine a useful new tool for cleaning up information.
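Inconsistent dates can also be tidied up in code. This is a minimal sketch in Python rather than a description of how Google Refine works, and the list of formats is an assumption for illustration; the talk did not say which formats appeared in the logs.

```python
from datetime import datetime

# Assumed formats, for illustration only.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%Y%m%d",
)


def normalise_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for value in ["2006-03-01 14:30:00", " 01/03/2006 ", "20060301", "not a date"]:
    print(repr(value), "->", normalise_date(value))
```

Returning None instead of guessing keeps the unparseable values visible, so they can be checked by hand.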

Old Excel cuts off, so you can’t see more than a scrap. Needed to find a way to help people view it when there was a limited number of computers to look at it on.

Low tech solutions were small PDFs but these were really helpful.

People always asked what the data looked like, so exporting sections as 800-page PDFs gave them something to see. Not good for data crunching, but good for reading several hundred reports. Worked well for reporters, particularly when looking at a specific area or at torture records.

Used mail merge as a handy way to get the data out.

Ran a MySQL database and got a tech person to build a web interface.
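A searchable database of reports can be sketched in a few lines. This example uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, category names and sample rows are all invented for illustration; none of it is taken from the real logs or the Bureau's system.

```python
import sqlite3

# Invented sample rows: (category, date, summary).
SAMPLE_ROWS = [
    ("Escalation of Force", "2006-03-02", "Vehicle approached checkpoint, warning shots fired"),
    ("Friendly Action", "2006-03-01", "Patrol engaged vehicle near checkpoint"),
    ("IED Explosion", "2006-03-03", "Device detonated on route"),
]


def build_db(rows):
    """Create an in-memory database with one hypothetical reports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports ("
        "id INTEGER PRIMARY KEY, category TEXT, date TEXT, summary TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports (category, date, summary) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn, keyword, category=None):
    """Find reports whose summary contains keyword, optionally within one category."""
    sql = "SELECT id, category, date, summary FROM reports WHERE summary LIKE ?"
    params = ["%" + keyword + "%"]
    if category is not None:
        sql += " AND category = ?"
        params.append(category)
    sql += " ORDER BY date"
    return conn.execute(sql, params).fetchall()


conn = build_db(SAMPLE_ROWS)
print(len(search(conn, "checkpoint")), "reports mention a checkpoint")
print(len(search(conn, "checkpoint", "Friendly Action")), "of them are filed as Friendly Action")
```

Searching on the text as well as the category matters because the same kind of incident can be filed under more than one heading.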

War Logs diary dig is very neat but it’s not the best thing.

Searching for categories such as escalation of force, or blue on white, finds few reports. Searching for friendly actions finds more. These are attacks filed under civilian categories.

Asking the right questions and searching brought out the right stories. Had to be sure of asking the data the right question.

Searched for the Prime Minister’s name. Found out more about stories already reported. The data had them in depth. It covered all areas, not just those where the few journalists were embedded.

Used great software to show incidents over periods of time. Colour coded to show deaths, civilians, enemies, police, friendlies etc.

Ten thousand killed through ethnic cleansing murders. More people killed in murders than IED explosions, found in data.

Discovered a category of incident marked as causing media outcry. – Tutorial.

Used Tableau to visualise the data. The free version is limited to 100,000 records.

Searches of the data found civilians killed at checkpoints due to car bombs exploding. Had people reading 800 reports to get the real story behind the numbers, too.

Found it was great to use, particularly visually, without worrying about code.

People liked word highlights and PDF was the best way to use it.

Used the data as part of the research. Didn’t set out thinking “let’s do maps and data images”, but did them anyway.

Had maps showing where fatal incidents happened.

Powerful information, especially when you pull out from central Baghdad.

Team on the ground went out to Baghdad talking to people for Dispatches.

All the data was geocoded. Took an area and pulled out every report from the area. Used in a map view to see what had happened.
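Pulling out every report from an area can be sketched as a simple bounding-box filter over latitude and longitude. The field names, the box and the sample coordinates below are invented for illustration, and a rectangular box is an assumption; the talk did not say how areas were defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A hypothetical geocoded report."""
    report_id: int
    lat: float
    lon: float
    summary: str


@dataclass(frozen=True)
class BoundingBox:
    """A rectangular area given by its south, west, north and east edges."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def reports_in_area(reports, box):
    """Return the reports whose coordinates fall inside the box."""
    return [r for r in reports if box.contains(r.lat, r.lon)]


# Invented coordinates, roughly in and around Baghdad.
REPORTS = [
    Report(1, 33.31, 44.37, "Incident A"),
    Report(2, 33.35, 44.42, "Incident B"),
    Report(3, 34.20, 43.90, "Incident C"),
]
AREA = BoundingBox(south=33.25, west=44.30, north=33.40, east=44.50)

for report in reports_in_area(REPORTS, AREA):
    print(report.report_id, report.summary)
```

The filtered list is what would then be plotted on a map or handed to a reporter covering that neighbourhood.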

The map helped reporters speak to people on the ground.

Had video of a man getting out of a white sedan who was then gunned down by an Apache. Found the report in the Iraq logs mentioning the sedan using geodata. The report didn’t show the driver getting out and surrendering; the video did.

Checking the details found it was within range of the Apache, and a lawyer cleared the footage for Dispatches.

The information tells a story that doesn’t look like a data story. A man shot while surrendering is a stronger story, although he had a mortar tube in his car.

It wasn’t found with clever tricks but through 10 weeks of work, with 25 people reading detailed reports and working more than 18 hours a day. 30,000 reports read in detail. 5,000 read closely.


Richard Dixon from The Times asks if the leak will make this type of data more difficult to come across and unlock.

James suggests not because of the way it was leaked.

Francis Irving asked who paid. Funding came from the David and Elaine Potter Foundation. Dispatches paid a standard fee. Also took a fee from Al Jazeera. This gave a budget to cover research.

Mechanical Turk is used for mundane, repetitive tasks, but something like this is too sensitive for farming out to different nationalities. Needed researchers who were trusted and had been working on it for some time because the information was so sensitive.


Judith Townend asked if there were issues with mainstream media taking up the story. James said it was difficult but explaining the data and making it clear helped. Put across the idea that it’s battlefield data, but that the data can be trusted. The numbers change as you’re going through in data journalism.

As people became more comfortable with it, it didn’t become difficult to ‘sell’ at all.

Bureau of Investigative Journalism put all information, maps and animations on the web. Also put the raw data, heavily redacted, online. Wikileaks put it all online.

Great people for journalists to follow on Twitter #ff

Alan Rusbridger’s article today, Why Twitter matters for media organisations, listed a great many reasons for using Twitter.

During my years on Twitter I have found it is a great way to learn, and I continue to learn a great deal by following other digital journalists, educators and developers.

In an effort to help journalists stepping into the Twittersphere for the first time, I have compiled a list of really useful people to follow and learn from.

Teaching and learning

Paul Bradshaw – Lecturer and social media consultant, Online Journalism Blog – great tips

BBC Journalism College

Clay Shirky – Influential future media blogger

Glynn Mottershead – Journalism lecturer

Andy Dickinson – Online journalism lecturer and links

Jeff Jarvis – The Buzz Machine blogger and journalism professor

Sue Llewellyn – BBC social media trainer and TV journo

Steve Yelvington – Newsroom trainer

Jay Rosen – Journalism lecturer at NYU

Roy Greenslade – City University, media commentator


Alison Gow – Executive Editor, digital, for the Liverpool Daily Post & Liverpool Echo

Marc Reeves – The Business Desk, West Midlands

Richard Kendall – Web editor Peterborough Evening Telegraph

David Higgerson – Head of Multimedia, Trinity Mirror

Sam Shepherd – Bournemouth Echo digital projects

Jo Wadsworth – Brighton Argus web editor

Matt Cornish – journalist and author of Monkeys and Typewriters

Louise Bolotin – Journalist and hyperlocal blogger

Sarah Booker (me because I try to be useful)

Joanna Geary – Guardian digital development editor and

Adam Tinworth – Consultant and ex-Reed Business Information editorial development manager

Adam Westbrook – Lecturer and multimedia journalist

Patrick Smith – The Media Briefing

Shane Richmond – Telegraph Head of technology

Edward Roussel – Telegraph digital editor

Damian Thompson – Telegraph blogs editor

Kate Day – Telegraph communities editor

Ilicco Elia – Former Head of mobile Reuters

Sarah Hartley – Guardian Local

Jemima Kiss – Guardian media/tech reporter

Kate Bevan – Guardian media/tech reporter

Josh Halliday – Media Guardian

Jessica Reid – Guardian Comment is Free

Charles Arthur – Tech Guardian editor

Heather Brooke – Investigative journalist, FOI campaigner

Kevin Anderson – Journalist, ex BBC, ex Guardian

Wannabehacks – Journalism students and trainees

Simon Rogers – Guardian data journalist and editor of the datastore

Jon Slattery – Journalist

Laura Oliver –

Johann Hari – Journalist, The Independent (personal)

Guy Clapperton – Journalist and writer

Alan Rusbridger – Guardian editor


George Hopkin – SEO evangelist

Nieman Journalism Lab – Harvard

Martin Belam – Guardian internet advisor

Tony Hirst – OU lecturer and data mash up artist

Christian Payne – Photography, video, mobile media

David Allen Green – Lawyer and writer

Judith Townend – Meeja Law & From the Online

Richard Pope – Scraperwiki director

Suw Charman-Anderson – social software consultant and writer

Scraperwiki – Data scraping and information

Chris Taggart – Founder of Openly Local and They Work for You

Suzanne Kavanagh – Publishing sector manager at Skillset, personal account

Greg Hadfield – Director of strategic projects at Cogapp, ex Fleet Street

Francis Irving – Scraperwiki

Ben Goldacre – Bad Science

Philip John – Journal Local, Lichfield Blog

David McCandless – Information is Beautiful

Flying Binary – Cloud computing and visual analytics

Rick Waghorn – Journalist and founder of Addiply

News sources

Journalism news

Journalism blogs

Mike Butcher – TechCrunch UK

Richard MacManus – Read Write Web

The Media Blog

Press Gazette

Hold the Front Page

Mashable – Social media blog

Media Guardian

Guardian tech weekly

Paid Content

The Media Brief

BBC news

Channel4 news

Channel4 newsroom blogger

Sky News

House of Twits – Houses of Parliament

Telegraph Technology
