As I noted in an earlier post, the powerful thing about data visualization (and analytics in general) is that it reveals insights that otherwise remain hidden. The objective with visualization is to see more clearly (and typically, we’re aiming to see problems more clearly so we can imagine solutions).
As a case in point, I’ve visualized some Philadelphia crime data from 2006-2013. What’s interesting is that I’ve never been to Philadelphia. I know very little about the city, but by using the right technique, you can start to see patterns.
To be clear, this doesn’t at all tell the full story; there are subtleties that could take days or weeks to uncover, and would likely require expertise in the city (or collaboration with a subject matter expert). That, however, is the nature of analytics. You can always dig a little more. Still, it doesn’t detract from the main point: data visualization works for seeing and communicating insightful patterns in data.
(Note: All of the following visualizations were produced using R’s ggplot2 and ggmap packages. Data is from www.opendataphilly.org)
As seen in the map above (which contains all Philadelphia crime, 2006-2013), crime appears to be centered around Center City. That’s not really the full story though. Thefts are the most frequent incident type, and it’s actually thefts that are most concentrated in Center City. Once you take a look at a few different crime categories, they show slightly different patterns. Below, I’ve broken out a few (but not all) crime categories for a closer look.
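A claim like “thefts are the most frequent incident type” comes from a simple tally by category. The maps in this post were made in R, so the Python below is only a rough sketch of the idea, not the code behind the post. The field name "category" and the toy records are assumptions made up for illustration; they are not rows from the real dataset.

```python
from collections import Counter

# Made-up toy records standing in for rows of the crime data.
# The "category" field name is an assumption, not the dataset's real column name.
incidents = [
    {"category": "Thefts"},
    {"category": "Thefts"},
    {"category": "Thefts"},
    {"category": "Burglary Residential"},
    {"category": "Homicide"},
]


def count_by_category(rows):
    """Tally how many incidents fall into each category."""
    return Counter(row["category"] for row in rows)


counts = count_by_category(incidents)

# List categories from most to least frequent.
for category, n in counts.most_common():
    print(category, n)
```

Running the same tally on the full dataset is what would tell you which category dominates the overall map.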
As I noted, thefts are strongly concentrated in the Center City area.
When we take a look at thefts year-by-year, there’s really not much change geographically.
Residential burglaries are most concentrated in Lower North Philadelphia and South Philadelphia, though year-by-year, there are slight changes. (Note: this data excludes non-residential burglaries. Non-residential burglaries exist in the overall crime dataset, but I haven’t included them in the two maps below.)
(Note: the overall incident counts are somewhat low when we break out the data year-by-year.)
Vehicle thefts are strongly concentrated in Kensington.
Keep in mind that here, “vehicle thefts” includes “recovered stolen motor Vehicle” and “motor vehicle theft.” They were separate categories in the original dataset, but for the purposes of analysis, I combined them into one category called “vehicle thefts.”
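Combining the two categories is a small recoding step. Here is a minimal Python sketch of it, assuming each incident carries a text label like the ones quoted above. The function name and the case-insensitive matching are my own choices for illustration; the original work was done in R and may have handled this differently.

```python
# Labels quoted in the post, lowercased so that matching ignores capitalization.
VEHICLE_THEFT_LABELS = {
    "recovered stolen motor vehicle",
    "motor vehicle theft",
}


def combine_vehicle_thefts(category):
    """Map both vehicle-related labels to one combined category.

    Any other label is returned unchanged.
    """
    if category.strip().lower() in VEHICLE_THEFT_LABELS:
        return "vehicle thefts"
    return category


# Hypothetical labels, for illustration only.
labels = ["Motor Vehicle Theft", "Recovered Stolen Motor Vehicle", "Thefts"]
print([combine_vehicle_thefts(label) for label in labels])
```

The same pattern would work for any other pair of categories that you want to treat as one for mapping purposes.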
When we look at vehicle thefts by year, there seems to be a decrease in incidents, although Kensington appears to have the most incidents in any given year.
I quickly charted the data over time, and sure enough, vehicle thefts have decreased over time.
Also note that in 2010 and 2011, there was a relative increase in the proportion of vehicle thefts that weren’t recovered (labeled as “Motor Vehicle Thefts” in the legend below).
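The proportion in question is, for each year, the number of unrecovered thefts divided by all vehicle-theft incidents (recovered plus unrecovered). Below is a hedged Python sketch of that calculation. The (year, label) row shape and the toy rows are assumptions invented for illustration, so the numbers they produce say nothing about the real Philadelphia data.

```python
from collections import defaultdict

UNRECOVERED = "motor vehicle theft"
RECOVERED = "recovered stolen motor vehicle"


def unrecovered_share_by_year(rows):
    """Return {year: share of vehicle-theft incidents that were not recovered}.

    rows is an iterable of (year, label) pairs; other labels are ignored.
    """
    totals = defaultdict(int)
    unrecovered = defaultdict(int)
    for year, label in rows:
        key = label.strip().lower()
        if key not in (UNRECOVERED, RECOVERED):
            continue  # not a vehicle-theft incident
        totals[year] += 1
        if key == UNRECOVERED:
            unrecovered[year] += 1
    return {year: unrecovered[year] / totals[year] for year in sorted(totals)}


# Made-up toy rows, not real counts.
toy_rows = [
    (2009, "Motor Vehicle Theft"),
    (2009, "Recovered Stolen Motor Vehicle"),
    (2010, "Motor Vehicle Theft"),
    (2010, "Motor Vehicle Theft"),
    (2010, "Motor Vehicle Theft"),
    (2010, "Recovered Stolen Motor Vehicle"),
    (2010, "Thefts"),
]

print(unrecovered_share_by_year(toy_rows))
```

Looking at the share alongside the raw yearly counts helps separate “fewer thefts overall” from “a different mix of recovered and unrecovered vehicles.”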
Homicides are strongly concentrated in Kensington and North Philadelphia.
(Keep in mind that when plotted geographically, the data counts are somewhat low.)
Similar to homicides, aggravated assaults are highly concentrated in Kensington and North Philadelphia, but they also show high concentrations in South Philadelphia and West Philadelphia.
(Note: for the purposes of analysis, I grouped together aggravated assault “with firearm” and “no firearm.” They are separate in the original dataset.)
All in all, there were about 730,000 rows in the original dataset, which is practically impossible to make sense of in spreadsheet format. Our brains are simply not wired to make sense of row-and-column data.
That said, our brains are absolutely wired for visual information. I won’t go into the cognitive science or neuroscience in this post (that’s another post for another time), but if you learn a little about how the brain processes visual information, you can apply that knowledge to create insightful data visualizations. You’re using visualization to communicate directly to the brain: your own as an analyst (when you’re doing exploratory analysis) and other people’s.
Visualization, then, is a way of translating data from a format that we really can’t understand (row-and-column data) into a format that we can. In fact, it’s translating the data into a format that we understand almost intuitively.
This is the power of visualization. It’s not just about making beautiful images out of data. At its best, data visualization provides a means for seeing hidden patterns in data and communicating them to others.