Sarah Booker

14/01/2011

Missing out on selling to the nationals

Filed under: journalism — Sarah Booker Lewis @ 11:46 am

My colleague Alex Therrien has found out the hard way about selling on a great story.

Picking up the phone paid off for him with a great story about Tammy Page from Worthing who had the tip of her finger bitten off by a fox.

It was such a great, quirky and topical tale a local agency picked it up and sold it on to the Daily Mail and The Sun.

The Worthing Herald hits the streets on a Wednesday afternoon, despite the Thursday publishing date, so any reporter who wants to make some much-needed cash needs to get on the phone pretty quickly before someone else jumps in.

At least Alex discovered he is not alone, as all of us could list great stories we had been just too late to sell on, but which had made it to the nationals.

29/11/2010

Introduction to ScraperWiki #hhhrbi

Filed under: journalism,technical — Sarah Booker Lewis @ 10:44 am

Francis Irving of Scraperwiki explains how it works.

Take the Gulf oil spill. You can find a list of oil fields around the UK, but it’s all in a strange lump.

He shows a piece of Python code that reads the oil field pages and turns them into structured data.

It’s quite simple to make a map view, and there is also code for making more complicated views.

Scraperwiki is automatic data conversion.

Scrape internet pages, parse the data, organise it, collect it and model it into a view. The scraper keeps running and updates the dataset constantly.

There are two kinds of journalism to do with the data: you can make tools, including specific tools, or you can find a story.

A project in Belfast took a list of historic houses in the UK. The scraper looked through a host of websites; it was written in Python, but Ruby can also be used.
There is a multitude of visuals available. The Belfast project showed a spike in 1979, which was explained by a political sectarian issue.
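As a rough illustration of what such a scraper does, here is a minimal Python sketch of the scrape-parse-store loop: read an HTML page, pull each table row out, and turn it into a record. The page snippet and field names are hypothetical stand-ins, not ScraperWiki’s actual API or the real oil-field data.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for a fetched oil-field page.
PAGE = """
<table>
  <tr><td>Brent</td><td>North Sea</td></tr>
  <tr><td>Forties</td><td>North Sea</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects each <tr> as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
# Model each row as a record, ready to store or map.
records = [{"field": name, "area": area} for name, area in scraper.rows]
```

On ScraperWiki itself the records would then be saved to a datastore and the scrape re-run on a schedule, which is what keeps the dataset constantly updated.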

Answering a question, Francis confirms you can scrape more than one website at a time.

Francis would like to see more linked data and merging datasets together.

Asked about licensing for commercial use, Francis says it’s mainly used for public data. Scraperwiki blocks scraping Facebook because it’s private data, but the code can be adjusted.

Areas of interest for projects today are: farming, local government budgets, public sector salaries, mapping chemical companies and distributors, environment, transport, road transport crime, a truckstops map, energy data, country profiles linked to carbon emissions, e-waste, airline data, plastics data, empty shops, infotainment to make users interested in the data, a visualisation ranking companies based on customer reviews, using the crowd to share information with data and create interesting information, data annotating and enriching content, health data… and anything else we’re doing.

27/11/2010

So that’s how a hashtag works…

Filed under: Twitter — Sarah Booker Lewis @ 9:45 am

There is always a hashtag search buzzing away in the background while at work.

It’s another window, or two, or three, on Tweetdeck monitoring my interest du jour.

The tag may be related to a journalism or social media conference I’m interested in, or a trending topic; it changes.

Yesterday (Friday, November 26) there was a great deal of activity on the #demo2010 tag as students started occupying more universities, and tweets were full of pictures and videos from demonstrations on Wednesday, November 24.

After updating colleagues on which of their old unis were taken over by students, one asked me: “How do you know this stuff and find it on Twitter?”

Then I explained how I followed the hashtag. It’s a simple way to find everything posted on a particular theme, topic or event.

Hence his final comment: “So that’s how a hashtag works…”

24/11/2010

The Iraq War logs – How data drove the story (@jamesrbuk) #hhldn

Filed under: journalism — Sarah Booker Lewis @ 7:45 pm

James Ball from the Bureau of Investigative Journalism

He was the chief data analyst for Dispatches and Al Jazeera, turning the logs into plain English to help journalists working on the programmes.

Stories on torture; civilian deaths at US checkpoints; 109,032 dead; 183,991, one in 50 detailed; 1,300 allegations of torture against Iraqi troops, 300 against American forces.

US helicopters gun down surrendering insurgents.

US claim to have killed 103 civilians.

Getting the data: Data.gov.uk; the Freedom of Information Act; web scrapers (ScraperWiki.com); or turn up at an undisclosed location at 1am on a Sunday, pick up a USB stick, and be told not to go straight home.

It was a 400MB text file: almost 400,000 documents and almost 40 million words of dense military jargon.

They couldn’t read it or open it up; it was a data cleaning problem. They had a text file and a comma-separated file, and these did not work, with dates creeping into the wrong columns.

They had to scrap that and look at a MySQL file. They used UltraEdit, which worked really well.

Turning it into something workable meant knocking out bits of code.

The dates didn’t work and were inconsistent. He finds Google Refine a useful new tool for cleaning up information.
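The kind of clean-up Google Refine was used for can be sketched in Python: try each known date format in turn and normalise everything to ISO. The formats and values below are hypothetical examples of the inconsistency, not the actual formats in the logs.

```python
from datetime import datetime

# Hypothetical mix of inconsistent date strings for the same day.
raw_dates = ["2006-07-14 18:30", "14/07/2006", "14 JUL 06"]

FORMATS = ["%Y-%m-%d %H:%M", "%d/%m/%Y", "%d %b %y"]

def normalise(value):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

cleaned = [normalise(d) for d in raw_dates]
```

Once every date is in one format, sorting, grouping and charting incidents over time becomes straightforward.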

Old versions of Excel cut the file off, so you couldn’t see more than a scrap. They needed a way to help people view it, with only a limited number of computers to look at it on.

A low-tech solution was small PDFs, and these were really helpful.

Always asked what data looks like, so by exporting sections as 800 page PDFs it worked to give something for people to see. Not good for data crunching, but good for reading several hundred reports. Worked well for reporters, particularly when looking at a specific area or torture records.

Mail merge was a handy way to pull out the data.

Ran a MySQL database and got a tech person to build a web interface.

The War Logs Diary Dig is very neat, but it’s not the best tool.

Searching for information such as “escalation of force”, or “blue on white”, finds few reports. Searching for “friendly actions” finds more. These are attacks with civilian categories.

Asking the right questions and searching brought out the right stories. They had to be sure they were asking the data the right question.
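In code, “asking the data the right question” amounts to filtering on the right fields with the right terms. A minimal sketch, with hypothetical report rows and field names (the real logs used military category codes):

```python
# Hypothetical report rows with category and free-text fields.
reports = [
    {"category": "friendly action", "text": "escalation of force at checkpoint"},
    {"category": "criminal event", "text": "murder reported"},
]

def search(rows, **filters):
    """Return rows where each named field contains its term (case-insensitive)."""
    def match(row):
        return all(term.lower() in row.get(field, "").lower()
                   for field, term in filters.items())
    return [row for row in rows if match(row)]

hits = search(reports, category="friendly", text="checkpoint")
```

Phrase the question slightly differently (the category term rather than the incident term) and a very different set of reports comes back, which is exactly the trap described above.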

They searched for the Prime Minister’s name and found out more about stories already reported; the data had the in-depth detail. It covered all areas, not just the places where the few embedded journalists were.

Great software was used to show incidents over periods of time, colour coded to show deaths of civilians, enemies, police, friendlies and so on.

Ten thousand were killed through ethnic cleansing murders. The data showed more people were killed in murders than in IED explosions.

Discovered a category of incident marked as causing media outcry.

http://v.gd/q4zxDz – Tutorial.

Tableau was used to see the data; the free version is limited to 100,000 records.

Searches of the data found civilians killed at checkpoints due to car bombs exploding. They also had people reading 800 reports to get the real story behind the numbers.

He found it was great to use, particularly visually, without worrying about code.

People liked word highlights and PDF was the best way to use it.

They used the data as part of the research. They didn’t set out thinking “let’s do maps and data images”, but they did.

Had maps showing where fatal incidents happened.

Powerful information, especially when you pull out from central Baghdad.

Team on the ground went out to Baghdad talking to people for Dispatches.

All the data was geocoded. They took an area and pulled out every report from it, used in a map view to see what had happened.

The map helped reporters speak to people on the ground.
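Pulling every report for an area out of geocoded data can be sketched as a simple bounding-box filter. The coordinates and records below are illustrative only, not real log entries:

```python
# Hypothetical geocoded reports: latitude, longitude, summary.
reports = [
    {"lat": 33.31, "lon": 44.37, "summary": "checkpoint incident"},
    {"lat": 36.19, "lon": 43.99, "summary": "IED found"},
]

def in_area(report, lat_min, lat_max, lon_min, lon_max):
    """True if the report's coordinates fall inside the bounding box."""
    return (lat_min <= report["lat"] <= lat_max
            and lon_min <= report["lon"] <= lon_max)

# Rough bounding box around central Baghdad (illustrative numbers).
baghdad = [r for r in reports if in_area(r, 33.2, 33.4, 44.2, 44.5)]
```

The filtered set is what gets plotted on the map view, giving reporters on the ground every incident recorded in the neighbourhood they are visiting.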

There was video of a man in a white sedan getting out of his vehicle and being gunned down by an Apache. The report in the Iraq logs mentioning the sedan was found using geodata. The report didn’t show the driver getting out and surrendering; the video did.

Checking the details confirmed it was within range of the Apache, and a lawyer cleared the footage for Dispatches.

The information tells a story that doesn’t look like a data story. A man shot while surrendering is a stronger story, although he had a mortar tube in his car.

It wasn’t found with clever tricks but with 10 weeks of work: 25 people reading detailed reports, working more than 18 hours a day. In all, 30,000 reports were read in detail and 5,000 read closely.

Richard Dixon from The Times asks if the leak will make this type of data more difficult to come across and unlock.

James suggests not because of the way it was leaked.

Francis Irving asked who paid. Funding came from the David and Elaine Potter Foundation. Dispatches paid a standard fee, and a fee was also taken from Al Jazeera. This gave a budget to cover the research.

Mechanical Turk is used for mundane repeat tasks, but something like this is too sensitive to farm out across different nationalities. They needed researchers who were trusted and had been working on it for some time, because the information was so sensitive.

Judith Townend asked if there were issues with mainstream media taking up the story. James said it was difficult, but explaining the data and making it clear helped. The idea put across was: it’s battlefield data, but trust the data. In data journalism the numbers change as you’re going through.

As people became more comfortable with it, it wasn’t difficult to ‘sell’ at all.

The Bureau of Investigative Journalism put all the information, maps and animations on the web. It also put the raw data, heavily redacted, online. Wikileaks put it all online.

03/05/2010

Confessions of an election geek

The first election I remember was in 1979. It was when Mr Callaghan stopped being Prime Minister and Mrs Thatcher took over. It was exciting to see a woman in charge, and I was amused by people chanting “Maggie, Maggie, Maggie, out out out” within days of her arrival.

I recall with fondness the Spitting Image election special of 1987 and, I think, the 1992 election night with Armando Iannucci and a choir performing television news theme tunes. Or was it 1997? That was the last election I watched on TV. I kept promising myself I would go to bed, but it was just too interesting.

For the last two general elections I have been at the count. It was a childhood dream. I always enjoyed going along to the polling station with my parents, and have always voted with enthusiasm. However, this count will be different, because this is the social media election.

Five years ago the newspaper I worked on didn’t have a website. Now, as digital editor for the Worthing Herald series I have encouraged reporters to use Scribblelive during hustings, giving readers the chance to watch from home, and participate.

A few of the candidates standing in the four constituencies the group covers have Twitter accounts. All bar one have a website, blog and Facebook presence.  All four seats are pretty safe Conservative ones.

This is where I’m lucky. I live in Brighton Pavilion, a constituency Labour have held since 1997, taking over from the Conservatives, with a council where the Greens are the biggest group. All three parties are fighting to take the seat. It’s the Greens’ number one target, hence Green leader and MEP Caroline Lucas is standing.

I can watch Tweets from the three main candidates throughout the day, every day.

The bulk of the election literature we have received has been Green and Labour. The occasional bit of Conservative material has arrived, and one general piece of Liberal Democrat. Brighton blogger and Labour Party activist Dan Wilson has written an interesting blog about Brighton politics, What does the LibDem surge mean in Brighton Pavilion? Brighton ought to be a Lib Dem stronghold, but it seems Lewes and Eastbourne have greater potential on that front.

Back to the point. I have uploaded photographs of leaflets to TheStraightChoice.org; the majority have been either Labour or Green. I started uploading after Richard Pope, one of the founders of The Straight Choice, spoke to the Brighton Future of News Group meeting. The site compares party leaflets in different postcode areas, across the country, and at varying times during the campaign.

Inspired by Richard’s presentation, I promoted the idea on the Worthing Herald website resulting in the first leaflets uploaded for the constituencies.

During this bank holiday weekend I’ve taken the process one step further by taking photographs of every campaign poster I’ve seen while walking around Brighton. I had planned to do it just once, when I walked to visit friends in Hove before meeting another friend in the city centre, a circular walk back home. The number of Green Party posters I saw was astounding. The idea was inspired by a Tweet from @mockduck to @jowadsworth.

This exercise prompted me to walk around where I live and map the photographs. The resulting mass of green makes Thursday’s result an interesting prospect.

I’ll be spending Thursday night and the early hours of Friday morning at Lancing Leisure Centre, waiting for the results of the East Worthing and Shoreham and Worthing West constituencies, but at least this time I can keep track of everywhere else online.

I’ll be using Scribblelive and Qik to bring live text and video from the count, and publicising the result online as it happens.

14/03/2010

Brighton Future of News election special #bfong

Brighton Future of News Group has an election theme for the second meeting on Monday, March 22.

Web developer Richard Pope will be talking about his work with Democracy Club, The Straight Choice and My Society.

I am an election geek living in a constituency where a three-horse race is developing between Nancy Platts (Labour), Caroline Lucas (Green) and Charlotte Vere (Conservative).

Sorry Bernadette Millam (Liberal Democrat) and Nigel Carter (UKIP), but it’s true.

All three have been campaigning. As a resident I’ve been aware of Nancy Platts beavering away in the background for a year or so now.  Charlotte Vere seems to have taken up a Saturday morning residency in London Road and Caroline Lucas has also been spotted.

One of the ideas suggested at the first Brighton Future of News Group was a candidate-tracking Google map. Jo Wadsworth, web editor of the Brighton Argus, had the Brighton Pavilion map up within a few days.

This is the sort of innovative idea that can come out of an event where journalists, bloggers and technical wizards can get together.

Personally, I’m very interested in The Straight Choice, and have started collecting up the few election leaflets my partner hasn’t thrown straight into the recycling.

Once this election is over it will be interesting to see the stories arising from the literature targeting our votes.

Join in the discussion at The Skiff at 7.30pm for a prompt start.

Your hosts are Judith Townend from Journalism.co.uk and me.

Brighton Future of News Group is open to all with an interest in news and journalism, from broadcasters to bloggers, PRs to podcasters, programmers, students, writers, journalists and all media folk.

09/02/2010

Brighton Future of News Group first meeting #bfong

The first Brighton Future of News Group took place yesterday (Monday, February 8), attracting a variety of journalists, writers, bloggers and techy folk, all interested in telling stories and relaying facts in new and interesting ways.

Jo Wadsworth

Our first speaker was Jo Wadsworth, web editor at the Brighton Argus, who spoke about building a community of bloggers writing on specific themes or hyperlocally, the sort of news that might not make it into the newspaper, but will be of wider interest.

Examples included the Bevendean Bulletin, which uses the Argus in lieu of its own website. Student reporters from the Journalist Works gain experience by writing patch blogs, and others are aspiring writers dipping their toes in the water.

Jo was keen to point out the bloggers aren’t considered a replacement for reporters, but rather augmenting the newspaper’s website.

After all, as Jo explained, these people will be blogging anyway, so why not utilise their enthusiasm and talent for the paper?

The bloggers benefit from a ready-made audience and technical support, the paper gets street-level coverage.

Jo cited the pothole paradox hypothesised by Steven Berlin Johnson, i.e. extremely local, small-scale news is interesting to people living in a certain street with potholes, but not to those living a few streets away.

When it comes to looking after a paper’s bloggers, Jo advised giving constructive but honest feedback and never be afraid to turn people down.

I was pretty pleased to hear there was a high turnover of bloggers, and some who didn’t even start, as I’ve had similar situations with a number of ex-bloggers and failed-to-starters.

Simon Willison

The second speaker, Simon Willison, initially talked about his work creating the software and database for The Guardian’s MPs’ expenses crowd-sourcing project, where more than 200,000 documents were studied in the search for interesting information.

The structure was put together in a week before 450,000 documents were dumped into the public domain during this act of government “transparency”.

It was a steep learning curve for the team behind the project, but it was developed on for the second release of MPs’ expenses information for 2008/9 and the first quarter of 2009/10.

A few thousand documents were torn through by the crowd. Simon and the team created a wider variety of tags for each page, such as food or soft furnishings.

Hand-written pages were often particularly interesting, such as a lengthy note from Jack Straw.

My personal favourite site Simon has created is Wildlifenearyou.com where people can share their pictures of wildlife, both wild and captive. It’s an amazing site where people can vote for their favourite pictures of animals, add their own, find creatures geographically. It really is imaginative.

A spin-off site is Owlsnearyou.com, which has had friends/fans hijacking the American Superbowl hashtag #superbowlday: superb-owl-day, geddit…

Simon also showed impressive crowd-sourced maps, particularly a post-earthquake map of Haiti, created by users of OpenStreetmap.org

It was pretty impressive to see what could be created by people with the imagination and skills to make something happen and not just draw ideas out on paper.

Break away

Both talks definitely fired the imaginations of everyone who took part in the break-away sessions at the end of the evening.

The four groups came up with multimedia ways to cover Brighton Pride, this year’s general election and transport issues.

A particular favourite of mine was creating a spot the candidate Google map. Now that’s an idea with legs.

Other blogs/posts about Brighton Future of News Group:

A document of all the event’s tweets featuring the hashtag #bfong.

Laura Oliver, editor of Journalism.co.uk also blogged about Jo Wadsworth’s and Simon Willison’s presentations, as did John Keenan.

Judith Townend, from Journalism.co.uk organised the event at The Skiff and put together a summary linked with the first Future of News Group West Midlands meeting, which took place on the same evening.

The original UK Future of News Group was set up by Adam Westbrook.

08/02/2010

Web headline relief

Filed under: Web journalism — Sarah Booker Lewis @ 4:18 pm

A four-hour flying visit to Grantham later and I’m feeling much happier about the new-look website than I did when I wrote Making the headlines work.

The content management system (CMS) is radically different to the current JP system, but at least I know it is possible to have specific web headlines and edits.

I think the new sites look cleaner and are/will be easier to navigate.

The majority of reporters, particularly the Worthing team, will be able to grasp this quite quickly.

It’s just a case of bringing everyone else on board.

07/02/2010

Making the headlines work

Late on Friday afternoon (February 5, 2010) I had an interesting discussion with a member of senior management about the difference between web news and the newspapers.

It’s no secret that Johnston Press is changing its operating systems, and part of that includes a redesigned website. The Grantham Journal already has its beta site up for feedback.

Tomorrow I am going to Grantham to see how it all works, hence the discussion with the senior manager. I explained I particularly wanted to know how the web edits and headlines are separated from print, as the system is integrated.

This was something he couldn’t get his head around. Why the difference? Writers, and sub-editors in particular, are aware of the need for the expressive, concise and witty headlines necessary for print.

However, those of us working with news websites are well aware of the need to simplify headlines for search engine optimisation (SEO).

When I was handed the newspaper websites I knew nothing of SEO, but as a web user of 12 years’ standing at that point, I had an instinctive understanding of how the headlines and stories needed to be written.

It seems obvious to me that journalists need to put themselves in the position of someone searching for a particular news story.

On Friday I had uploaded a video of fans queuing to see Peter Andre at the Holmbush Centre Tesco in Shoreham-by-Sea.

My chosen headline was Peter Andre thrills fans at Holmbush Centre signing and the picture caption included the words Tesco, and Shoreham.

The Shoreham Herald version of the story topped Google on Friday afternoon, and the story was the most popular on the three websites it appeared on. My work was done.

However, the senior member of management couldn’t understand why we wouldn’t use that headline in the print edition. It’s not dreadful, but it’s hardly the sort of pithy eye-catching stuff you might expect.

I suggested the print headline “Fans scream for six-pack star”, and pointed out it wouldn’t work on the web, but had an element of fun for the paper.

Unfortunately the senior manager couldn’t understand why it would work in the paper and not on the web, and vice versa.

When I checked the web analytics I found “Peter Andre” was the top search term and high in the rankings were the phrases “Peter Andre at Holmbush”, “Peter Andre in Shoreham” and “Peter Andre signing at Tesco”.

It may not seem like rocket science but I have tried to explain this to a number of reporters who still don’t get it, and use print style headlines instead.

An example I use during the online journalism workshops I host at Brighton City College is The Sun’s Gotcha headline.

I’ve read variations on the Gotcha isn’t good for the web theme, but the best is Shane Richmond’s post for British Journalism Review.

The “Gotcha” headline on a Sun front-page splash about the sinking of the General Belgrano is one of the most famous, or infamous depending on your taste, in the history of British journalism.

Yet no web producer with any experience would consider a headline like that today. The reason is search engine optimisation (SEO).

SEO has been around almost as long as search engines themselves, but journalists were quite late to cotton on.  It didn’t really reach newsrooms until a couple of years ago.

The concept is simple.  It’s about ensuring that your content is found by the millions of people every day who use search engines as their first filter for news and those who don’t search at all but trust an automated aggregator, such as Google News, to filter stories for them.

These people are essentially asking a computer to tell them the news.

If you want your story to be read, you’d better make sure the computer knows what you’re writing about.

It’s logical and simple; it’s strange how some people just can’t get it.

31/01/2010

Live and interactive

FOOTBALL fans enjoy transfer deadline day (apparently) so Lee Hall, my colleague at the Sunderland Echo, is sharing his live coverage of the sporting excitement with other Johnston Press titles. Here’s the Worthing Herald’s transfer day page.

Lee has put a lot of effort into this day, as he explains in his blog D-Day is upon me. It’s a great opportunity for University of Sunderland students, too, as they will experience the real buzz of a newsroom.

Cover It Live is the medium of choice. It is a fabulous piece of software with terrific functions, particularly the polls. I used it for Worthing International Birdman.

However, the Worthing Herald sport department favours ScribbleLive. Why? They have always been able to connect and have never lost service.

We started using ScribbleLive after I carried out a comparison test for Johnston Press’s digital higher powers.

Getting and maintaining a connection had been a problem from the start with CoverItLive (CiL).

During Worthing Birdman I couldn’t connect to CiL using a JP laptop and I had difficulty with my 3G connection, losing the page for an hour or so.

The sports team used CiL three times at matches, the rest of the time it wouldn’t work.

I had an email conversation with CiL president Keith McSpurren who didn’t think 3G connections should cause a white-out in the pop up page.

From our experience the one page does all functionality of ScribbleLive means we have been able to connect and write every time.

A phenomenal example of ScribbleLive in action is its white-label service used by Reuters. The Berlin Wall 20 event was fascinating.

A fine example of CiL used at local news level was the Manchester Evening News’ live blog of Manchester City Council’s meeting described by David Higgerson in his blog CoverItLive and Twitter covering councils in a new way.

I also find commenting is more obvious for readers in CiL, as it’s at the bottom of the feed, whereas it’s at the top with ScribbleLive.

I also miss CiL’s poll function, as I’d really like to use something like it during election coverage.

There is also a preloading function for text and photographs, which seems useful but in my experience is difficult to use.

Both platforms are useful tools for any news team, but I stick with what’s working for us. However, I would be interested to hear other people’s experiences, particularly teething or connection problems with either service.

With the way things are at the moment, we are keen to continue with ScribbleLive here, and miss out on the polls etc. unless CiL starts working on our laptops.

In the meantime I’m looking forward to seeing how well the transfer deadline day coverage goes and whether it proves popular with our readers in Sussex.
