From Idea to [the start of a] Story

This is an introductory guide on how to produce the beginnings of a piece of data journalism. We’re going to walk through it together: I’ll outline the key things to consider before starting, how to structure your work, and a basic process to follow, and then use a case study to show how the process works with a real story.

Be at ease, there is hope

The glitz and glamour of data journalism (the animations, the striking maps, those great infographics) are all over the Internet. It’s easy to think, then, that it’s all about the data and how cool you can make it look, sing, or dance. Our wise friends at Code4SA, Raymond and Adi, keep reminding us (and the salivating Internet-at-large) that the focus should be on data *journalism*, and not *data* journalism.

Data journalism is no different from the journalism we all know and consume every day. Where traditional journalism relies on human sources (insiders, experts, scholars, scientists), data journalism treats data sources (spreadsheets, websites, databases) with all the rigor and scrutiny that journalists apply to human sources.

The animations and snazzy work are part of communicating the final product – the story – but they will never replace the actual story.

The grand start

A data journalism story can start from an important event or it can simply be a question. You could have seen a breaking headline and wondered, how much x did it take for y to happen? Or, you start thinking about, say, food in a supermarket and wonder, what percentage of dog food features on the average shopper’s bill? Both questions are equally valid and are great starts for considering a piece of data journalism.

What I’ve learned so far in my work is that there is little difference between doing the work of basic science and that of data journalism. You make an observation, you come up with a question (a hypothesis, for purists and fancy people), and then you go about doing some work to answer that question. Your work will show either that your initial hypothesis was incorrect or that yes, it was indeed correct.

So, as I mentioned earlier, it’s not about the fancy graphic or how much data you trawled through. It’s about, what was your question and did you answer it or not?

Don’t believe the hype.

Who’s your data and what does he do?

I live and work in South Africa, so I’ll be basing this guide on data on the workforce from the country’s statistics agency. (The results of the quarterly survey were released just recently and the official unemployment rate is at a grim 25%.) The agency cares (in my head) about my feelings and thus has released the data in an Excel spreadsheet format. I will write other posts about how to deal with sources of data where the publishers don’t care as much about your feelings.

The dataset is here and there are enough sheets in there to warrant exploration. This exploration is important because an excited and hurried deep dive into the data, without knowing what it’s about, what it covers, and so on, may end up with you looking at the wrong data that doesn’t answer your question, attempting to answer the wrong question or – the nightmare of every data journalist – wasting hours and achieving little.

So, before we talk about the process, let’s look at the data and see what it tells us. We don’t usually work with all the data (unless our initial idea or question requires this). It’s better to first spend some time looking at all the data and then focus on a particular section that catches your attention.

Looking through the spreadsheet from the stats agency, the data looks at different characteristics of the workforce (by province, age, gender, and demographic group). Even if this is your first time and you’re following along, throw a quick glance at each of the sheets. It’s part of developing that methodical work ethic that will become invaluable as you progress in this type of journalism.

As an important sidenote, you’ll need to have only a basic working knowledge of Excel. I won’t be wielding any sort of magic on the worksheets, so anyone can follow along. For the sake of brevity (and so you don’t drop into a catatonic stupor from me detailing every single step), I will leave you to figure out how to do the basic manipulations in Excel after I explain them.

Now the journey begins

We’ve talked about what it really means to produce a work of data journalism, how we start considering an idea that will lead towards a piece, and some introductory remarks about how to look at a dataset. Finally. The process, the good stuff. How does it work?

Step 1: Take a bite out of the data

For this guide, I want to see the size of the workforce in all the provinces in South Africa and how it has fared between 2013 and the second quarter of 2015. That data is in the very first worksheet. (You’re welcome to look through all the others and see what other interesting insights you can mine from them.)

So we went from an original spreadsheet of more than 20 worksheets:


… to just this one entitled Table 1: Population of working age (15-64 years):


Let’s copy ’n’ paste the bottom part of it into a new sheet, since that’s the view of the data we want to work with. To move towards a clean dataset, I took out the heading and “thousands” rows, and the cell labelled “South Africa”. I also took out the totals row, so it doesn’t come up later to confuse us. (I will adjust all the values, to reflect millions, in a minute.) It should now look like this:


Now, let’s change all the cells to show full values, which run into the millions, rather than thousands. I created a column next to each original column and multiplied the value by 1000. It will now look like this:
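
If you’d rather script this step than do it by hand in Excel, the conversion can be sketched in a few lines of Python. This is only a rough sketch: the province names and figures below are made-up placeholders (not the agency’s data), and `to_full_values` is a hypothetical helper name of my own.

```python
# Placeholder figures in thousands, mimicking the layout of the source table.
# These are NOT the real numbers from the statistics agency.
rows_in_thousands = {
    "Province A": [1200.5, 1210.25],
    "Province B": [800.0, 805.75],
}


def to_full_values(table, factor=1000):
    """Multiply every value by `factor` and round to a whole number."""
    return {
        name: [round(value * factor) for value in values]
        for name, values in table.items()
    }


full = to_full_values(rows_in_thousands)

# Print with a comma as the thousands separator and no decimal places.
for name, values in full.items():
    print(name, [f"{value:,}" for value in values])
```

The `f"{value:,}"` formatting does the same job as setting the thousands separator in Excel.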

I also removed all borders and decimal places, and made the thousands separator a comma; this will help us make our charts readable and accessible later. At this point, you’d be ready (as I once was) to take this table and analyse it. Not quite yet. Although it is indeed cleaner, the data structure we need is not there. Why does this matter? Because the data needs to be organized in a way that lets us aggregate or group it. The wise old sages of data journalism say, if your data is not summarized [or aggregated], it is not ready for analysis.

Step 2: Transform the data into an analysis/visualization-ready structure

What factors are we ultimately looking to expose from this data set? They are province, year and the total number of workers. But, before that, we’re going to create this new data structure with the following columns:

If you studied database design or are a working programmer, you would have failed your database design test or received the chiding of your life if you proposed this dataset design. And your lecturer (or boss) would have been right; it’s not a normalized (computer science speak for optimised) dataset. However, this is data analysis for a piece of data journalism, so you may scorn those rules! We need to have duplicate rows in order to aggregate the data later (remember?).

Step 3: Produce the final dataset

In the screenshot above, I put in the structure to be followed for all years. So, copy in the totals for 2013, 2014 and 2015. You will then have a dataset that should look like this. You should have 91 rows and only Q1, Q2 for 2015.
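
For those who prefer code to copy ’n’ paste, here is a minimal Python sketch of the same reshaping, from one row per province to one row per province per quarter. The column names (“Province”, “Year”, “Quarter”, “Total”) are my assumption, and the values are placeholders rather than the real figures.

```python
# The nine provinces of South Africa.
PROVINCES = [
    "Western Cape", "Eastern Cape", "Northern Cape", "Free State",
    "KwaZulu-Natal", "North West", "Gauteng", "Mpumalanga", "Limpopo",
]

# Four quarters each for 2013 and 2014, but only Q1 and Q2 for 2015.
PERIODS = [(year, f"Q{q}") for year in (2013, 2014) for q in range(1, 5)]
PERIODS += [(2015, "Q1"), (2015, "Q2")]

# "Wide" table: one row per province, one value per period.
# The values are placeholders, NOT the real data.
wide = {province: [1000 + i for i in range(len(PERIODS))] for province in PROVINCES}


def wide_to_long(wide_table, periods):
    """Turn one row per province into one row per province per quarter."""
    long_rows = [("Province", "Year", "Quarter", "Total")]  # assumed column names
    for province, values in wide_table.items():
        for (year, quarter), total in zip(periods, values):
            long_rows.append((province, year, quarter, total))
    return long_rows


long_rows = wide_to_long(wide, PERIODS)
print(len(long_rows))  # 91: a header row plus 9 provinces x 10 quarters
```

Nine provinces times ten quarters gives 90 data rows, which matches the 91 rows mentioned above if you count the header row.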

We’re almost there! The last step is actually aggregating the data. So, take a deep breath and create a PivotTable in a new sheet. Your summarized data should look like this:


Clean up the table: put in the thousands separator, remove decimal places, and take out that cell labelled “Row values”. It should now look like this:
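
What the PivotTable does can also be sketched in Python: group the long rows by province and year, then summarize each group. I’m making an assumption here by averaging the quarters within each year; Excel’s PivotTable sums numeric fields by default, so the `summarize` parameter lets you pass `sum` instead. The rows are placeholders, not the real data.

```python
from collections import defaultdict
from statistics import mean

# Placeholder long-format rows: (province, year, quarter, total).
long_rows = [
    ("Province A", 2013, "Q1", 100),
    ("Province A", 2013, "Q2", 110),
    ("Province A", 2014, "Q1", 120),
    ("Province B", 2013, "Q1", 50),
    ("Province B", 2013, "Q2", 70),
    ("Province B", 2014, "Q1", 80),
]


def pivot(rows, summarize=mean):
    """Group totals by (province, year) and summarize each group."""
    groups = defaultdict(list)
    for province, year, _quarter, total in rows:
        groups[(province, year)].append(total)

    table = defaultdict(dict)
    for (province, year), totals in groups.items():
        table[province][year] = summarize(totals)
    return {province: dict(years) for province, years in table.items()}


summary = pivot(long_rows)            # average of the quarters (my assumption)
summed = pivot(long_rows, sum)        # what a default "Sum" PivotTable would give

for province, years in summary.items():
    print(province, years)
```

Whichever summary you pick, say so in your story, because the two give very different numbers for a figure that is measured every quarter.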


Step 4: Produce the visualisation

Congratulations! You have a dataset that is ready to be visualised.

We’re going to use an online infographic tool to produce an infographic. This guide won’t cover how to sign up for and use it, so (as with Excel) you’ll have to become acquainted with the tool. I do assure you that it’s straightforward and intuitive; you’ll use it like a professional in no time! You shall see.

Create a new infographic, choose any template you like, and look at the blank work area. It will look like this:


Give the infographic a title like “Total workforce in provinces, 2013 – 2015” or something similar, as you see fit. Then, add a grouped bar chart from the popup wizard. You’ll see the chart show up on the work area. (Delete the existing chart that comes with the template, which is now below the one you just created.)

Double-click on the chart and you’ll see an interface appear, not too different from Excel. Delete all the data you see, copy the data from the Excel worksheet we created in the last step (the PivotTable), and paste it into the spreadsheet interface. It should look like this:


When you pasted the data in, the graphic should have automatically updated itself. It’s starting to look great!

Have a look at the infographic. Everything is in there, but it may not be immediately understandable. You have to scroll down to the legend to see which colors denote which provinces. So, instead of re-formatting the data, click on the two-directional arrows icon in the top right-hand corner of the spreadsheet interface. This nifty feature will swap the rows and the columns, so that the provinces are now the rows and the years are now the columns.
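
Swapping rows and columns is a plain table transposition. I don’t know how the tool does it internally, but here is a small Python sketch of the idea, using placeholder values and assuming the pasted table has years as rows.

```python
# Placeholder summary table with years as rows and provinces as columns.
years_as_rows = [
    ["Year", "Province A", "Province B"],
    [2013, 105, 60],
    [2014, 120, 80],
]

# zip(*table) pairs up the first cells of every row, then the second cells,
# and so on, which turns each column into a row.
provinces_as_rows = [list(column) for column in zip(*years_as_rows)]

for row in provinces_as_rows:
    print(row)
```

After the swap, each province sits on its own row with one column per year.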


Always aim to show the values on the chart (where appropriate, obvs), so click the “Show values” switch and the totals will reflect on the chart. Also, click on the Settings button and scroll down to add “total (in millions)” in the X-axis textbox. This will help the reader (and you) understand the chart better.

If you click the “Publish” button, you can give your graphic a title and then choose whether you want it to be an interactive graphic or an image. This is what the final image would look like:


And you have produced your first visualisation. Pat yourself on the back, have a coffee or beer, and get ready because you’ve just started the process. 🙂

Before we look at the rest of the work needed, let’s review what we’ve done:

  1. We looked at a data source and extracted a view of the data that we want to look at. In this case, we asked the question, what was the size of the workforce in all of South Africa’s provinces between 2013 and 2015?
  2. We followed a basic process of cleaning, formatting, transforming, and summarizing the data until we produced a table showing the data we need to answer our question.
  3. We then inserted the data into our visualisation tool and produced an infographic, shown above.

At this point, you’re so excited that you jump on Twitter or email, and send out your work to everyone you know. Hold on! Not yet.

What do your findings really mean?

Yes, you analysed the data and you answered your question. Gauteng province has had the largest workforce within the time period we chose, but it’s been decreasing in size since 2013. The Northern Cape has been consistently below 5 million since the same year. Why is this?

That’s why the second part of the title for this guide has the disclaimer: “start of a story”, because now starts the work of journalism that you know or were trained to do. At this point, you would:

  • contact analysts, experts, and academics to interpret and comment on the data
  • look at other data sets or speak to experts to explain the context behind the findings, depending on the scope of the story or your editor’s instructions
  • analyse/visualise other datasets to test and refine your findings
  • do anything else required to make sure the piece is balanced and fair.

Once you’ve done any or all of these steps, you write the final article, include the infographic we produced above, and submit it for publication. If you run your own blog or website, you would just publish it live.

There’s no place like the end!

And the end, it is. I hope that you’ve come this far and that your appetite has been whetted for further (and more sophisticated) work in data journalism.

If anything hasn’t worked for you or you’d like some help with a certain section, follow me on Twitter @minaddotcom and we can figure it out together. Please also check out the Johannesburg chapter of Hacks/Hackers @HacksHackersJHB for more information and resources on data journalism.


I’ve included below all spreadsheets, tools, and links, so you can pick up this guide any time and see how I arrived at the final infographic.

Should You Report When the Public Doesn’t Care?

I apologise for my long silence. The few days’ illness turned into 2 weeks spent in bed. I’m re-establishing my rhythm.

This article summarizes my thoughts about reporting for countries and societies that have no interest in the truth, but rather confirmation for their own biases and opinions:

Covering wars for a polarized nation has destroyed the civic mission I once found in journalism. Why risk it all to get the facts for people who increasingly seem only to seek out the information they want and brand the stories and facts that don’t conform to their opinions as biased or inaccurate?

And without a higher purpose, what is a career as a reporter? It may count among the so-called “glamor jobs” sought after by recent graduates, but one careers website has listed newspaper reporting as the second worst job in America, based on factors such as stress, pay, and employment uncertainty; toiling as a janitor, dishwasher, or garbage collector all scored better. Even if you love the work, it’s hard not to get worn down by a job that sometimes requires you to risk life and limb for readers who wonder if maybe you suffer all the downsides and hazards just to support some hidden agenda.

Every day when I write or argue or think about Egypt, I wonder what the point is when even the most prominent activists are deflated and considering giving up. I’m coming to politics and journalism much later in life than most reporters; yet, I feel much of the same disillusionment, frustration, and futility.

I no longer call or consider myself a ‘revolutionary’ because I was never in the streets like others and I never fought on any of the frontlines: media, courts, social activism, and so on. This feeling that I am not at all worthy to be called an activist came from reading Alaa’s open letter published yesterday.

What are we reporting for?

Fact-Checking: ‘Hamas is Israel’s Frankenstein’

As you traverse social and mainstream media to understand the current Israel-Gaza war, you won’t be able to hop far on one foot before you stumble, fall over, or get hit in the face by a barrage of conspiracy theories, myths, and urban legends that have become ‘accepted fact’. (In the Middle East, this can be read as ‘I heard it, so it must be true because it’s anti-Israel and anti-America’.)

The only defense available, in the midst of the proverbial faecal shower, is to grab the myth or ‘accepted fact’ in your hand and crumble it, piece by piece.

I stumbled upon this piece by Hassane Zerouky the other day. It attempts to assert and prove that Hamas was created and fashioned by the Israeli regime, and then left to grow in size under its bemused eyes… while the regime twirls its mustache.

Origin of Piece and Authorship

There is no information online about the author. The piece is quoted across a myriad of message boards, conspiracy theory blogs, and alternative news websites. However, the identity of the author cannot be ascertained from either Google web or image results. I contacted L’Humanité to confirm the identity of Hassane Zerouky. [add response from them]

The first clue comes from one article on the alternative website War is Crime that references the piece:

The article below originally appeared in the French daily L’Humanité on December 14, 2001, translated to English by Global Outlook in 2002, and published by Global Research in March 2004. It shows how the so-called Islamic Resistance Movement (Hamas) was founded by Israel’s Institute for Intelligence and Special Tasks (Mossad) with the strategic purpose to prevent the creation of a Palestinian State.

On L’Humanité’s website, the only information about the author is a cryptic single phrase: “International News” with no further hyperlinks or explanatory paragraph. The link to the original French article points to the wrong piece; the real essay in question is found here.

I ran Google Translate over the article because my French is as impeccable as my Sanskrit. This is the opening paragraph from the original French:

For many Palestinians, people without territory, subject to repression, humiliation and repeated closures, the radicalism of the fundamentalist Hamas embodies the ultimate recourse against the occupation. How was created and developed the organization that took the late train “resistance” to Israel? It does not say enough, it is Israel that has basically created Hamas, “thinking ensures Zeev Sternell, historian, professor at the Hebrew University of Jerusalem, it was smart to play the Islamists against the PLO. “

Compare this with the opening paragraph in the alleged English translation:

Thanks to the Mossad, Israel’s “Institute for Intelligence and Special Tasks”, the Hamas was allowed to reinforce its presence in the occupied territories. Meanwhile, Arafat’s Fatah Movement for National Liberation as well as the Palestinian Left were subjected to the most brutal form of repression and intimidation

Let us not forget that it was Israel, which in fact created Hamas. According to Zeev Sternell, historian at the Hebrew University of Jerusalem, “Israel thought that it was a smart ploy to push the Islamists against the Palestinian Liberation Organisation (PLO)”.

I searched through the original French article for the phrase ‘Institute for Intelligence and Special Tasks’. It’s not there. A cursory glance shows that the English leans more towards a paraphrase of the original French, with some editorialized embellishments thrown in for good measure. I contacted Global Research for comment. [add their response]

Questions around the Publisher

The editor’s note above on the War is Crime article mentions that Global Outlook translated the article from the French original, and then Global Research published it on its website. It’s curious that the two links in the editor’s note point to the same translated piece. A search online didn’t bring up any results for Global Outlook.

Global Research (Centre for Research on Globalisation) does not list its editorial or production teams on the about page, but does outline its submission requirements. They stipulate that references and sources be made available and linked to citations. Yet for a controversial piece like Hamas is a Creation of Mossad, there are no footnotes or sources in either the English or the French article.

Let’s Go Through It… One by One

I will be using the English translation, referring to the original French in the event of a substantial discrepancy between the two versions.

Let us not forget that it was Israel, which in fact created Hamas. According to Zeev Sternell, historian at the Hebrew University of Jerusalem, “Israel thought that it was a smart ploy to push the Islamists against the Palestinian Liberation Organisation (PLO)”.

This is a very bold assertion. I contacted Dr. Zeev Sternell to confirm this and he replied with the following:

The quotation as far as I can remember is correct but totally out of context. I have said that various Israeli governments had preferred to play the religious elements against the nationalist, believing that religion was much less dangerous than nationalism. Must people did not understand so many years ago neither the nature of radical islam nor that of radical judaism. That does not mean that the Mossad has created Hamas, which is idiotic.

ABC Australia’s Stone Cold Justice report misses the mark

A friend shared this report on my Timeline yesterday. It talks about the IDF’s arrests of young Palestinian children in the West Bank and about two lawyers – one Australian and one Israeli – involved in their defense and in lobbying for an eventual end to this practice.

I watched the report with a critical eye and, despite certain sections being disturbing, I wasn’t convinced by either its angle or its execution. I’ve lately been researching a piece about the latest Gaza war, to be posted here soon, and from reading outside the main news channels I’ve started to develop an awareness of the intricacies of the overall Israel-Palestine question.

My critique of the report follows.

Although its contents cannot be disputed, and the fact that children are being targeted for both arrest and systematic abuse is fundamentally objectionable, this report suffers from the same problems as many Western reports. It lacks nuance and stops at emotional appeals rather than delving into the details.

Firstly, other than a few cursory reports from the Israeli international spokesperson, there’s no comment by analysts or the like from the IDF or the government. The IDF and Shin Bet (Israel’s homeland security) do not engage in any operation without a motive and a meticulous plan behind it. There is a more than 99% possibility that these children are being targeted for a specific reason. The report doesn’t tell you in which parts of the West Bank the children are being targeted, whether their families have indirect or direct links to Hamas or other political and militant Islamist groups, and whether these children are on some watchlist kept by the Israeli authorities. The report just glosses over all that nuance.

Secondly, deep within the report – around the 37th minute – you suddenly hear something about children being coerced to be either informants or collaborators. Why didn’t the journalists and the producers press to find out more about this? I’d say this is pretty key to the whole report, that is, the *motive* behind the surge against the children. Does the IDF want to recruit these children to be double agents or moles inside the PLO, Hamas, or other groups?

Just as Hamas and other political Palestinian groups recruit the young for their cadres, the IDF and Shin Bet may want to counteract that initiative by scaring or scarring the children. Whether this is moral or not is not the question here. It’s the why. So many why’s are unanswered and frankly ignored in this report to score emotional points against Big Bad IDF.

Finally, I picked up in one of the last interviews with the children that the boy wanted to go back to Amman. This may mean that these are displaced Palestinians from Jordan, now living in the West Bank. And that opens up a whole line of questioning: could the IDF be targeting Jordanian Palestinians?

There’s just too much gray area in an otherwise fairly conventional report about Palestinian victim vs. Israeli bad guy. The nuances that could have been explored, developed, and explained would have made for a report that shows the complexity of the geopolitics, history, and social realities of the conflict.