The Iraq War Diary – An Initial Grep

Posted: October 29th, 2010 | Filed under: Data Analysis, Philosophy of Data | 17 Comments

Editors’ Note: While data itself is rarely a source of controversy, dataists supports pursuing a data-centric view of conflict. Here, Mike Dewar examines the Wikileaks Iraq war documents with sobering results. Hackers should note the link to his source code and methods at the end of the post.

“…any man’s death diminishes me, because I am involved in mankind, and therefore never send to know for whom the bell tolls; it tolls for thee.” – John Donne, Meditation XVII

The Iraq War logs recently released by Wikileaks are contained in a 371729121 byte CSV file. It contains 390849 rows and 34 columns. The columns contain dates, locations, reports, category information and counts of the killed and wounded. The date range of the events spans from 2004-11-06 12:37:00 to 2009-04-23 12:30:00, and the events are located within the bounding box defined by (22.5,22.4), (49.6,51.8). Row 4 describes a female walking into a crowd and detonating the explosives and ball bearings she was wrapped in, killing 35 and wounding 36. Searching for events mentioning ‘ball bearing’ returns 503 events.
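A minimal sketch of how a summary like this might be gathered in Python. The column names (`date`, `latitude`, `longitude`) are assumptions rather than the real headers of the release, and the three-row sample is invented purely so the example runs on its own.

```python
import csv
import io

# Invented sample standing in for the real CSV. The column names are
# assumptions, not the actual headers of the released file.
SAMPLE = """date,latitude,longitude,summary
2005-01-02 08:00:00,33.3,44.4,first invented report
2006-03-04 09:30:00,30.5,47.8,second invented report
2007-05-06 17:45:00,36.3,43.1,third invented report
"""

def describe(handle):
    """Stream a CSV once, returning its shape, date range and bounding box."""
    reader = csv.DictReader(handle)
    rows = 0
    first = last = None
    south = west = float("inf")
    north = east = float("-inf")
    for row in reader:
        rows += 1
        stamp = row["date"]  # ISO-style timestamps compare correctly as strings
        first = stamp if first is None else min(first, stamp)
        last = stamp if last is None else max(last, stamp)
        lat, lon = float(row["latitude"]), float(row["longitude"])
        south, north = min(south, lat), max(north, lat)
        west, east = min(west, lon), max(east, lon)
    return {
        "rows": rows,
        "columns": len(reader.fieldnames),
        "date_range": (first, last),
        "bounding_box": ((south, west), (north, east)),
    }

summary = describe(io.StringIO(SAMPLE))
print(summary)
```

Reading row by row keeps memory use flat, which matters for a file of several hundred megabytes. Passing an open file handle instead of the `io.StringIO` sample would run the same pass over a real file.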

There were 65349 Improvised Explosive Device (IED) explosions between the start of 2004 and the end of 2009. Of these 1794 had one enemy killed in action. The month that saw the highest number of explosions was May of 2007, when Iraq experienced 2080 IED explosions. During this month 693 civilians were killed, 85 enemies were killed and 93 friendlies were killed. The ratio of civilian deaths to combatant deaths is 3.89 civilians per combatant. On the first day of May there were 49 IED explosions in which 3 people were killed.
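The ratio quoted above can be checked by hand. This sketch assumes that "combatant" here means enemy plus friendly deaths, which is the reading that reproduces the stated figure.

```python
# Deaths reported for May 2007 in the paragraph above.
civilians = 693
enemies = 85
friendlies = 93

# Assumption: "combatant" covers both enemy and friendly deaths.
combatants = enemies + friendlies
ratio = civilians / combatants

print(f"{ratio:.2f} civilians per combatant")
```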

IED explosions in Iraq

Location of all IED explosions as reported in the Wikileaks Iraq War Diary

108 different categories are used to categorise all but 6 events. The category with the most events is ‘IED explosion’ with 65439 events, followed by ‘direct fire’ with 57815 events. The category ‘recon threat’ has 1 event which occurred at 8am on the 17th of April, 2009, where 25 people were noticed with 6 cars in front of a police station in Basra. There are 325 ‘rock throwing’ events and 325 ‘assassination’ events.
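Category tallies of this kind are a simple counting exercise. The sketch below uses an invented list of labels, with an empty string standing in for an uncategorised event; it is one possible approach and not necessarily how the figures above were produced.

```python
from collections import Counter

# Invented labels for illustration only. An empty string stands in for
# an event that was left uncategorised.
categories = [
    "IED explosion", "direct fire", "IED explosion", "",
    "rock throwing", "direct fire", "IED explosion",
]

counts = Counter(label for label in categories if label)
uncategorised = sum(1 for label in categories if not label)
ranked = counts.most_common()  # largest category first

print(len(counts), "categories,", uncategorised, "uncategorised")
for label, n in ranked:
    print(label, n)
```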

There are 1211 mentions of the word ‘robot’, 4710 mentions of the word ‘UAV’, 1332 mentions of the word ‘predator’ and 443 mentions of the word ‘reaper’. The first appearance of one of these keywords is on the 3rd of October, 2006. There are 445 mentions of one or more of the words ‘contractor’, ‘blackwater’ or ‘triple canopy’.
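Counting reports that mention "one or more" of several words can be done with a single alternation pattern. This sketch assumes case-insensitive substring matching, which may differ from the matching rule behind the numbers above; the report texts are invented.

```python
import re

# Assumption: a case-insensitive substring match, so "contractors"
# also counts as a mention of "contractor".
PATTERN = re.compile(r"contractor|blackwater|triple canopy", re.IGNORECASE)

# Invented report texts for illustration only.
reports = [
    "Blackwater convoy reported contact",
    "UAV observed two vehicles",
    "Local contractors and a Triple Canopy team on site",
]

# A report counts once, however many of the keywords it contains.
matching = [text for text in reports if PATTERN.search(text)]
print(len(matching), "of", len(reports), "reports mention a keyword")
```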

Drones and contractors in Iraq

Density showing the distribution over time of events mentioning contractors and drones

The joint forces report that 108398 people lost their lives in Iraq during 2004-2010. 65650 civilians were killed, 15125 members of the host nation forces were killed, 23857 enemy combatants were killed, and 3766 friendly combatants were killed.
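The four groups above add up to the stated total, which a short check confirms. The percentage shares are derived here for illustration and do not appear in the original analysis.

```python
# Deaths by group, as reported in the paragraph above.
deaths = {
    "civilians": 65650,
    "host nation forces": 15125,
    "enemy combatants": 23857,
    "friendly combatants": 3766,
}

total = sum(deaths.values())

# Share of the total for each group, as a percentage to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in deaths.items()}

print(total)
for group, share in shares.items():
    print(f"{group}: {share}%")
```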

Deaths in Iraq over time

The number of deaths per month as reported in the Wikileaks Iraq War Diary

Please don’t believe any of this. Go instead to the data and have a look for yourself. All the code that generated this post is available on GitHub. You can also see what others have been saying; for example, the Guardian and the New York Times have great write-ups.

  • Dave Kincaid

    Not sure what the point is here. It’s a war. People die in wars. That’s kind of the point. I don’t find any of this particularly unique or innovative. If you’re surprised by any of this I have to wonder what you thought has been going on over there for the last 6 years. Have people really become so far removed from what’s going on around them that data like this is shocking?

  • Kevin Nuckolls

    Frightening data. Especially the civilian deaths. How were these visualizations produced? I’m specifically interested in the software package that produced the second “distribution of events” graph.

  • Matt

    It’s all in R – if you look in the last paragraph, Mike’s got his source code posted.

  • M Calderisi

    I’ve got a problem: I can’t download the data. Could you help me?
    I’ve got this error message:
    Problem connecting to tracker – , (at sab ott 30 ’10 @ 09:46:40 m.)

  • Mikedewar

    Hi, looks like you’ve got a problem with your BitTorrent client, which isn’t something that we can help with!

  • Infovore » Links for October 30th

    […] dataists » Blog Archive » The Iraq War Diary – An Initial Grep "Please don’t believe any of this. Go instead to the data and have a look for yourself." Which is, for this audience, a very good way of putting it. (tags: data iraqwar belief trust ) […]

  • Mark Bulling

    Mike, thanks for the Python code to clean up the data, again much appreciated.

    I’ve started to investigate it, using Tableau Public in the first instance (Tableau Public can only handle 100k rows, hence looking at only IEDs with it for now).

  • boddah

    Using your code: Iraq-War-Diary-Analysis / prep /
    I get this message:
    line 33
    print row
    SyntaxError: invalid syntax

    I have never used Python, so maybe you can help.

  • Mikedewar

    Hi! I can’t reproduce your error, I’m afraid, so I’m a bit stuck. The line you’ve printed is the last line in the preprocessing code, and is certainly not the cause of the error! I’d like to help, and the best bet would be to raise your question as an issue on GitHub, where we can sort out the error away from the blog. Thanks so much for having a go at running the code!

  • Robert Doherty

    It might be the version of Python being used.

    The syntax for “print” in more recent versions of Python (3.1) requires “()” around the print argument:
    #new syntax
    print (row)
    #older syntax
    print row

  • Robert Doherty

    Great article and analysis.

    I am new to R and trying to run the sample code. One quick question: Where did you get the .shp files?
