It’s been almost a month since I had the privilege of attending NICAR, the national data journalism conference run by Investigative Reporters and Editors, this year held in Jacksonville, Florida. It was an inspiring trip: I made some new friends and learned a bunch about what kinds of data-driven reporting are going on around the country and the techniques used to do the work and present it. Honestly, after three and a half days packed with content, some of it fairly dense, I felt a bit overwhelmed. But I came back to my job with fresh energy, which I applied to a data project at EdSource soon after, when the need to present California’s new school dashboard came up. The following post is adapted from a slideshow I gave to my colleagues after I came back from the trip.
Tech Meets Journalism
Having been to tech conferences before, I’ll start by saying this: NICAR isn’t just another tech conference. It’s a journalism event that is highly technical. It also covers a lot of ground. While data journalism (taken to mean reporting on data) is at the heart of it, it’s not the only thing. NICAR stands for National Institute for Computer-Assisted Reporting - and the CAR in NICAR is a term going back to the 60s which increasingly describes a wider variety of things than data and databases strictly speaking. Topics at NICAR include mapping and GIS, “signals journalism” which harnesses information from the Internet of Things, and all kinds of stuff related to the presentation of news and data, which involves the whole rainbow of web publishing technologies.
I’m not going to describe the technical workshops I went to in-depth in this post, but here’s where over half my time went - straight into the coding side, including:
- Hands-on workshops in data analysis (R, Python)
- Working with data files / formats (CSV, JSON in R, Python)
- Creating maps in Leaflet.js
- Tools / platforms for creating daily news graphics
The rest of my time was happily spent balancing that out by attending sessions that were less technical but no less substantive: sessions presenting specific data journalism projects, both the execution and delivery and the background on gathering the data and writing the story; sessions on the universe of data that’s out there, why it exists and how to get it; and sessions oriented towards specific beats, including education, criminal justice, race and civil rights, and housing / real estate.
Florida and the South made a strong showing
In both “The Year in Data Journalism” stand-out projects and the “Packaging a Data Story for Digital” session, some really strong work from Florida newspapers and others in the South jumped out at me. Another big theme at NICAR Jacksonville: data reporting on race and immigration, front and center, which felt really appropriate for this political moment.
Six years of work and $15,000 in data fees resulted in what’s probably one of the most empirically sound accounts of judicial bias and racial disparity in sentencing ever undertaken. For its Bias on the Bench project, the Sarasota Herald-Tribune mined millions of criminal justice documents to reveal in great detail - down to a county-by-county map - just how much more confinement black offenders in Florida get than white ones when they are convicted of a crime.
“Thousands of police calls. You paid the bill.” The Tampa Bay Times did its own innovative data-driven reporting on criminal justice in Florida with a project on the striking number of calls to police made by Walmart stores, with analysis indicating Walmart was offloading routine store security to the state. (Sound familiar? This is the company whose employees make so little they rely on food stamps and Medicaid, after all.) The Tampa Bay Times paired their sobering report with a really fun, inventive vintage-CRT monitor presentation for the intro, a theme followed in a series of original infographics in the story.
Innovation in Data Journalism
There were some projects at this year’s NICAR that I was told really raised the bar relative to past data journalism efforts: big data, new types of data, and pushing the envelope in terms of presentation. Here are a couple that stood out.
The data team over at the Center for Investigative Reporting, Eric Sagara and Scott Pham, did some intense work digging into satellite data to illuminate how wildfires in California spread, earning a Philip Meyer award for data reporting. With tinderbox woods close to urban areas in California, the danger has been exacerbated by years of drought and has in recent years yielded some startlingly rapid destruction.
This project featured:
- Multiple data sources: historical fire data covering three different fires
- Counterintuitive findings: fire spreading downhill
- Direct analysis of satellite signals data usually untouched by journalists
- Innovative story map presentation built on scrolling markers to pair a narrative with the interactive
- An alternate presentation for mobile devices
Peter Aldhous is a badass: a data journalist not only good at what he does, but who understands it well enough to explain it and teach it, which I can attest to from taking one of his R workshops as well as seeing him present this BuzzFeed News investigation reporting on aerial surveillance. The project found that flight patterns indicate a concentration of surveillance of urban, Muslim communities in the United States, which the FBI has denied.
This project featured:
- Both authoritative and crowdsourced data
- Huge volume of data on ~200 planes, churned through and crunched down with Python scripts
- Data then analyzed in R, then open-sourced on BuzzFeed’s GitHub
- Carto used to display
- Visual display of circling plane patterns which ended up being very revealing and attention-getting
- A nuanced analysis of the data and the findings
Education in focus
Since I work at a non-profit news organization covering education in the state of California, of course I was on the lookout for projects about education and schools.
The Houston Chronicle did an outstanding investigative package and web presentation of the findings, digging deep into special education policy and funding in Texas and the consequences of policies set a decade ago in the state. They made the data they used in the investigation available to the public, openly inviting follow-ups and fact-checking.
Not brand-new but remarkable work looking at resegregation in Florida schools, with major impact including increased funding for schools, parent and community involvement, limits on school suspension policies, and raises in teacher pay.
NPR uses an open-sourced toolchain to produce visually rich, quick-turnaround graphics for stories like this one on English Language Learners, which was the example used to introduce their dailygraphics rig. Dailygraphics lets you rapidly generate infographics using Python and assorted technologies. More about their cool tile map techniques here.
Lightning Talks Highlights
The lightning talks were a great way to take in the breadth of data journalism in the country in contrast to some of the more deep-dive sessions. Some of the most sobering talks underlined the fragility of the public data journalists rely on.
- Lots of discussion of data integrity / data refuges
- IRE announced it's going to create a data refuge on their website, helping to coordinate the efforts of the over 6,000 members of IRE - a recent article on Poynter discusses how they're tackling this
- “Rethinking the Story Model in the Age of Trump” presented the idea of “turning spectacle into evidence” using live annotation and transcription - including real-time fact-checking - using NPR's anno-docs app
Data Sources of Interest
One of the reasons to go to NICAR is to learn about not just how to work with data, but where it is and what it is. Getting into a room with other people (or as may be the case, wandering in and out of them) can aid discovery in a way that clicking all day around the Internet never could. There are all sorts of data to learn about, so the following are just a few that stood out to me from certain sessions, and for a lot more go to the NICAR Database Library.
- OBTS (Offender Based Transaction Statistics) are the data that Bias on the Bench, mentioned above, mined to uncover judicial bias. These data are available only in certain states, but those include some of the largest, with the largest prison systems, like California.
- EMMA is used for tracking municipal bonds. It's primarily a tool for investors, but also of use to journalists. Municipal bonds, despite the name, are used not just by cities but by a lot of other kinds of public entities.
- Zillow offers a surprising amount of all sorts of housing data publicly on their data portal. For reporters interested in housing affordability, home values, rents, and important metrics like negative equity that can help forecast economic trouble, this is a rich resource.
- Appearing at NICAR just days after damaging information about company culture was released, a representative from Uber was almost apologetic as she announced movement.uber.com to an audience of skeptical journalists. This open platform presents data on urban travel times that may be of interest to reporters digging into stories on commutes, urban congestion, transit times relative to driving times, and so on.
Enterprise Data Journalism in 2017
So in 2017 what makes for a bold and fresh data story? These are the characteristics that stood out to me, as someone very much approaching data journalism and CAR as a newcomer:
- Digging into under-explored (and often hard to get or interpret) datasets
- Analyzing new types of data emerging from our rapidly changing technological ecosystem
- Churning through Big Data for fresh insights
- Doing stand-out presentation of the work
- Combining datasets in novel ways no one has thought of
- Last but first: Combining data with traditional reporting / storytelling
Case study: Settling for Misconduct
I was already a fan of the Chicago Reporter’s Settling for Misconduct project, so it was a treat to be able to see Data Editor Matt Kiefer present about the project. It embodies a lot of what strikes me as the things that make a news app successful, from effective storytelling to intuitive navigation.
I also learned that the behind-the-scenes work included:
- Tons of FOIA requests
- Lots of data had to be manually keyed in
- Documentation - lots of it - which was key to success
- Effective use of frameworks: Backbone.js for rich web presentation UI
On the last day of the conference I took in one of my favorite sessions from the whole event: a crash course in command-line graphics from Jon Keegan. I was already familiar with some of the tools he covered, like imagemagick and ffmpeg, but it never occurred to me to use them in the way he suggested. Like a lot of the best things in technology, it’s more about the ingenious use of simple techniques than elaborate and overcomplicated solutions. Keegan showed how with these command line utilities and some basic scripting in Node or Python, you can take a directory of images and stitch them into a quilt, extract data from them, or make video from a sequence of photos.
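To give a flavor of that kind of scripting, here is a minimal Python sketch of my own (not Keegan's code). It only builds the command lines for ImageMagick's `montage` (the quilt) and for ffmpeg (the video) without running them. The function names, the `frames` directory, the `.jpg` extension, and the output filenames are all assumptions made for illustration, and the ffmpeg glob input pattern may not be available on every build.

```python
import shlex
from pathlib import Path


def montage_command(image_dir, output, columns=4):
    """Build an ImageMagick `montage` command that tiles every .jpg in
    image_dir into a grid ("quilt") with the given number of columns."""
    images = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    # -tile "4x" fixes the column count and lets montage pick the rows;
    # -geometry +0+0 keeps each tile's size and removes the gaps between tiles.
    return ["montage", *images, "-tile", f"{columns}x",
            "-geometry", "+0+0", str(output)]


def video_command(image_dir, output, fps=24):
    """Build an ffmpeg command that turns a sequence of photos into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),        # input frames per second
        "-pattern_type", "glob",       # let ffmpeg expand the wildcard itself
        "-i", f"{image_dir}/*.jpg",
        "-c:v", "libx264",             # H.264 video
        "-pix_fmt", "yuv420p",         # pixel format most players accept
        str(output),
    ]


if __name__ == "__main__":
    # Print the commands so they can be inspected before running them.
    print(shlex.join(montage_command("frames", "quilt.jpg")))
    print(shlex.join(video_command("frames", "timelapse.mp4")))
```

To actually run one of these, you could pass the list to `subprocess.run(..., check=True)` once ImageMagick and ffmpeg are installed.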
The hypnotic video Keegan made from 11 years of space satellite photos, mostly of the planet Saturn and its rings and moons, was an unexpectedly cosmic high note on the last day of NICAR.
I’m already looking forward to heading to Chicago for NICAR 2018.