Demographics of 2015 U.S. Police Killings
The Dataset
In the following report, we investigate four key demographic-driven analyses using observations from 2015 US police killings data. The data come from FiveThirtyEight’s online database, which appends additional variables to a dataset collected by The Guardian in 2015. The dataset records police killings from January 1st to June 1st of 2015 and includes a number of variables that are helpful for our four analyses.
Each row in the dataset is one observation: one instance of a police killing. Each column is a piece of demographic information about the individual or the circumstances of the killing. Examples of individual demographic information include the individual’s name, race/ethnicity, age, and gender. Examples of variables describing the circumstances of the killing include the date, the exact latitude and longitude of the killing, the cause of death, and whether or not the victim was armed. Supplemental variables describe the county or locale in which the killing occurred, such as the local poverty rate, racial demographics, and unemployment rate. In total, the dataset contains 467 observations of police killings across the US from January 1st to June 1st of 2015, with information contained in 34 variables.
Below, we detail each research question specifically.
Research Questions
These were the research questions we focused on in relation to the dataset:
- What patterns exist in when and where police killings occur?
- What are the economic characteristics of the areas where victims were killed?
- Are there racial disparities in who is killed and how?
- Do armed victims die different deaths than unarmed victims?
Answering these four research questions should yield useful insight into the nature of police killings in the US in 2015. To that end, we produced the following graphs.
Graphs & Findings
1. What patterns exist in when and where police killings occur?
First, we explore what patterns may exist in where and when police killings occur across the continental US. To answer “where,” we plotted killings on a map, showing visually where they took place. The maps use the latitude and longitude variables, which let us plot the exact coordinates of each police killing. We also used the population and county variables to aggregate killings by county and to scale them by population.
To answer the second part of the question, “when” police killings were happening, we created an autocorrelation function (ACF) plot.
The two maps below suggest that the West Coast of the continental US saw more police killings on an absolute basis while the East Coast saw more police killings on a relative basis.
The map on the left aggregates the number of killings by county and plots them on a map of the US on an absolute basis: the larger the red circle, the more killings took place in that county. As highlighted on the map, western counties tended to have higher numbers of killings, led by Los Angeles County with around 20 police killings between January 1 and June 1, 2015.
However, our conclusions slightly change when we evaluate the number of police killings on a relative basis. On the map on the right, instead of simply aggregating observations by county, we scaled the number of killings per 10,000 population. We find, as highlighted, more counties from the eastern US are flagged under this measure. The leader in police killings per 10,000 population is St. Lucie County, FL, with around 20 killings per 10,000.
What may explain this discrepancy? One plausible explanation is that counties in the western US are generally larger and have much larger populations, so their raw counts are high even when their rates are not. Once we account for county population, counties on the East Coast saw more police killings per capita. This finding illustrates how geographic statistics can mislead depending on whether counts are used raw or scaled by population.
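The difference between the two maps comes down to the denominator. A minimal sketch of both ways of counting, assuming each killing record carries a county name and that a population figure for each county is available; the county names and populations below are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter

def killings_by_county(counties):
    """Absolute basis: number of killings recorded in each county."""
    return Counter(counties)

def killings_per_10k(counts, population):
    """Relative basis: killings per 10,000 residents of each county."""
    return {county: n / population[county] * 10_000 for county, n in counts.items()}

# Hypothetical records: one county name per killing.
records = ["Big County"] * 6 + ["Small County"] * 2
# Hypothetical county populations (not from the dataset).
population = {"Big County": 3_000_000, "Small County": 20_000}

counts = killings_by_county(records)
rates = killings_per_10k(counts, population)
print(counts.most_common(1))        # Big County leads on absolute counts
print(max(rates, key=rates.get))    # Small County leads per 10,000 residents
```

The per-capita figure is only meaningful when the population in the denominator covers the same area as the counts in the numerator.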
The ACF plot is based on the daily time series of killings from January 1 to June 1, 2015, treated as a series with a weekly period of 7 days, so one unit on the lag axis corresponds to one week. It shows the correlation between the number of killings on a given day and the number of killings at earlier lags. For example, at lag 1 on the x-axis, the y-axis value is the correlation between each day’s number of killings and the number of killings exactly one week earlier.
In the ACF plot, the spike at lag 1 exceeds the confidence band, indicating a statistically significant positive correlation between the number of killings on a given day and the number on the same weekday one week earlier. In other words, a day with more killings tends to be followed, one week later, by another day with more killings. The spike near lag 0.5 also exceeds the confidence band, but in the negative direction, indicating a significant negative correlation with killings roughly half a week (three to four days) earlier. Together, these patterns point to a weekly cycle, plausibly driven by differences between weekdays and weekends: counts on a given weekday persist from week to week, while days three to four apart, which often pair a weekday with a weekend day, tend to move in opposite directions.
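To show what the ACF plot computes, the sketch below calculates the sample autocorrelation of a daily series at lags measured in days, together with the approximate 95% band of ±1.96/√n that ACF plots typically draw. The daily counts are synthetic, built with an artificial 7-day cycle purely to illustrate the calculation; they are not the 2015 data. On the weekly lag axis described above, lag 1 corresponds to 7 days and lag 0.5 falls between 3 and 4 days.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (lags in days)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def confidence_bound(n, z=1.96):
    """Approximate 95% band under white noise, as drawn on typical ACF plots."""
    return z / math.sqrt(n)

# Synthetic daily counts (illustration only): 5 weekdays at 4, 2 weekend days at 1, 20 weeks.
daily = [4, 4, 4, 4, 4, 1, 1] * 20
r = acf(daily, max_lag=7)
bound = confidence_bound(len(daily))

for k in (3, 4, 7):
    flag = "outside" if abs(r[k]) > bound else "inside"
    print(f"lag {k} days (= {k / 7:.2f} weeks): r = {r[k]:+.2f} ({flag} the band)")
```

In this synthetic series the 7-day lag is strongly positive while the 3- and 4-day lags are negative, the same shape of pattern described for the real plot.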
The table of killings by day of the week shows more killings on weekdays, especially Wednesdays, and fewer on weekends.
To answer the research question, then, there is evidence of a weekly pattern in police killings: weekends see fewer police killings than weekdays, with Wednesdays the highest.
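A day-of-week table of this kind can be built by mapping each date to its weekday and tallying. A minimal sketch using Python’s standard datetime module; the dates are hypothetical examples rather than records from the dataset. Because there are five weekdays but only two weekend days, the sketch also reports average counts per day of the week, which make for a fairer weekday-versus-weekend comparison than raw totals.

```python
from collections import Counter
from datetime import date

NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

def killings_by_weekday(dates):
    """Count killings per weekday name, ordered Monday through Sunday."""
    counts = Counter(d.weekday() for d in dates)   # Monday == 0, Sunday == 6
    return {NAMES[i]: counts.get(i, 0) for i in range(7)}

# Hypothetical dates: Jan 7 and 14, 2015 were Wednesdays, Jan 8 a Thursday, Jan 10 a Saturday.
sample = [date(2015, 1, 7), date(2015, 1, 14), date(2015, 1, 8), date(2015, 1, 10)]
table = killings_by_weekday(sample)

weekday_total = sum(table[name] for name in NAMES[:5])
weekend_total = table["Saturday"] + table["Sunday"]
weekday_avg = weekday_total / 5   # average count per weekday
weekend_avg = weekend_total / 2   # average count per weekend day
print(table)
print(f"weekday avg {weekday_avg:.2f} vs weekend avg {weekend_avg:.2f}")
```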
2. What are the economic characteristics of the areas where victims were killed?
The histogram of tract unemployment rates shows a slightly right-skewed distribution, with a mode around 10% and a maximum of around 50%. The purpose of this graph is to visualize the distribution of unemployment rates in the tracts where police killings occurred. If that distribution differs from what we would expect under no relationship, it suggests some dependence between the frequency of police killings and a tract’s unemployment rate.
Adding a vertical line at the 2015 average national unemployment rate of 5.3% shows that the vast majority of tracts in the data, around 85%, had an unemployment rate above this national average. The average unemployment rate of the tracts in the dataset is closer to 11-12%. This suggests that where police killings occur is not independent of the local unemployment rate: police killings occur disproportionately often in areas with higher unemployment than would be expected if there were no relationship between unemployment rate and police killings.
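The two summary figures in this paragraph, the share of tracts above 5.3% and the mean tract rate, are straightforward to compute. A minimal sketch, assuming the tract unemployment rates are available as proportions; the values below are hypothetical and chosen only to illustrate the arithmetic.

```python
NATIONAL_RATE_2015 = 0.053  # 2015 average US unemployment rate, as cited in the text

def summarize_against_national(tract_rates, national=NATIONAL_RATE_2015):
    """Return (share of tracts above the national rate, mean tract rate)."""
    above = sum(1 for rate in tract_rates if rate > national)
    return above / len(tract_rates), sum(tract_rates) / len(tract_rates)

# Hypothetical tract unemployment rates (proportions, not percentages).
rates = [0.04, 0.08, 0.10, 0.12, 0.15, 0.20]
share_above, mean_rate = summarize_against_national(rates)
print(f"{share_above:.0%} of tracts above {NATIONAL_RATE_2015:.1%}; mean = {mean_rate:.1%}")
```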
A scatterplot lets us explore whether there is a relationship between the white population share and the poverty rate of tracts where police killings occurred. The negative linear trend in the scatterplot is noticeable, but adding the linear regression line, its slope (-0.218), and its 95% confidence band is much more informative. The slope means that when comparing two tracts, the one with a 10 percentage point higher white population share tends to have a poverty rate about 2.2 percentage points lower. The shaded confidence band is very narrow, which supports the presence of a statistically significant negative relationship.
Also included on the graph is a “rug”: marks along the x-axis showing how the white population share in the data is distributed. Although the values are not perfectly uniform, there is no large concentration of police killings at any particular white population share. This suggests that police killings are not markedly more likely to occur in tracts with especially high or low white population shares.
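To make the slope interpretation concrete: assuming both variables are measured on the same scale, a slope of -0.218 means a 0.10 increase in white population share corresponds to a predicted poverty rate about 0.0218 lower, roughly 2.2 percentage points. The sketch fits an ordinary least squares line to hypothetical points constructed to lie exactly on such a line, then applies that interpretation; it does not use the actual tract data.

```python
def ols_slope_intercept(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical tracts: white share and poverty rate, both as proportions.
white_share = [0.1, 0.3, 0.5, 0.7, 0.9]
poverty = [0.35 - 0.218 * s for s in white_share]   # exactly linear, for illustration

a, b = ols_slope_intercept(white_share, poverty)
change_for_10_points = b * 0.10   # predicted poverty change for +0.10 white share
print(f"slope = {b:.3f}; +10 points white share -> {change_for_10_points * 100:+.1f} points poverty")
```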
3. Are there racial disparities in who is killed and how?
The plot above displays the distribution of age stratified by race, which lets us compare the racial/ethnic groups. The distributions are clearly not the same; in particular, the white distribution differs from the other two.
The black and Hispanic distributions are practically identical, with modes and medians falling well below those of white victims. Both are shifted toward younger ages, potentially indicating that younger black and Hispanic individuals are more likely to be killed.
The white distribution, on the other hand, is much wider and has a noticeably fatter right tail. This indicates that older victims make up a larger share of white victims than of victims in the other racial/ethnic groups.
Pearson's Chi-squared test
data: table_data
X-squared = 11.302, df = 4, p-value = 0.02337

Expected counts under independence:
Race               Gunshot      Taser    Vehicle
Black            115.84834   6.867299   3.284360
Hispanic/Latino   59.76303   3.542654   1.694313
White            212.38863  12.590047   6.021327
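The expected-count table printed with the test follows E_ij = (row total × column total) / n. Summing its rows and columns recovers the margins: 126 black, 65 Hispanic/Latino, and 231 white victims; 388 gunshot, 23 Taser, and 11 vehicle deaths; 422 in total. The sketch below rebuilds the expected counts from those margins and checks the reported p-value with the closed-form chi-squared tail for 4 degrees of freedom, P(X > x) = e^(-x/2)(1 + x/2). It also lists cells with an expected count below 5, a common rule of thumb for when the chi-squared approximation becomes less reliable.

```python
import math

# Margins recovered by summing the rows and columns of the expected-count table.
row_totals = {"Black": 126, "Hispanic/Latino": 65, "White": 231}
col_totals = {"Gunshot": 388, "Taser": 23, "Vehicle": 11}
n = sum(row_totals.values())  # 422

# Expected count for each cell if race and cause of death were independent.
expected = {(race, cause): rt * ct / n
            for race, rt in row_totals.items()
            for cause, ct in col_totals.items()}

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-squared variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = chi2_sf_df4(11.302)
small_cells = [cell for cell, e in expected.items() if e < 5]

print(f"df = {df}, p = {p_value:.5f}")
print("expected counts below 5:", small_cells)
```

Three of the nine expected counts fall below 5, so the p-value is best read as approximate; a simulation-based p-value, such as R’s chisq.test with simulate.p.value = TRUE, is one common cross-check.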
This mosaic plot compares race and cause of death, exploring whether some racial/ethnic groups are more or less likely to be killed by a particular means.
Most of the plot is uncolored, which indicates that most combinations of race and cause of death do not deviate significantly from what independence would predict.
The only colored cell is for black victims killed by Taser, indicating that black victims died by Taser more often than expected if race and cause of death were independent. Although the plot is mostly uncolored, the chi-squared test of independence (X-squared = 11.302, df = 4, p = 0.023) leads us to reject, at the 0.05 significance level, the hypothesis that the two categorical variables are independent.
4. Do armed victims die different deaths than unarmed victims?
Lastly, the variable armed was interesting to explore because how victims died may depend on their armed status. We therefore looked at the variables armed (the victim’s armed status) and cause (the victim’s cause of death) to answer the question, “Do armed victims die different deaths than unarmed victims?”
This faceted proportional bar chart facets on the victim’s armed status. Armed status records whether the victim possessed a knife or a firearm, was driving a vehicle, and so on. In each facet, the x-axis is the cause of death and the y-axis is the proportion of victims in that facet with each cause of death. Only one bar per facet is colored, highlighting the most common cause of death in police killings: gunshot, for both armed and unarmed victims.
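Each facet’s y-axis is a within-group proportion: the count for each cause of death divided by the number of victims with that armed status, so the bars in each facet sum to 1. A minimal sketch of that calculation, using hypothetical (armed status, cause) pairs rather than the actual records:

```python
from collections import Counter, defaultdict

def proportions_within_groups(pairs):
    """For each group, return {category: share of that group's rows}."""
    counts = defaultdict(Counter)
    for group, category in pairs:
        counts[group][category] += 1
    return {group: {cat: k / sum(c.values()) for cat, k in c.items()}
            for group, c in counts.items()}

# Hypothetical records: (armed status, cause of death).
records = (
    [("Armed", "Gunshot")] * 8 + [("Armed", "Taser")] * 2
    + [("Unarmed", "Gunshot")] * 3 + [("Unarmed", "Taser")]
)
props = proportions_within_groups(records)

# The bar to highlight in each facet: the most common cause of death.
most_common = {group: max(p, key=p.get) for group, p in props.items()}
print(props)
print(most_common)
```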
To test this more formally, we performed a chi-squared test of independence on the victim’s armed status and cause of death. The resulting p-value of 0.425 indicates no significant association between the two variables, which is consistent with armed and unarmed victims having a similar distribution of causes of death.
Taken together, the graph and the test suggest that gunshots are the most common cause of death in police killings and that armed victims do not die differently from unarmed victims in terms of cause of death.
Having established that gunshots are the most common cause of death in police killings whether or not a victim is armed, a natural question is why deadly force is used so often in US policing. To explore this, we look specifically at armed victims killed by police gunfire and at the types of weapons those victims carried. The bar plot shows that firearms were the most common weapon by a fairly wide margin: 54.5% of all armed victims possessed firearms. Additionally, of all police killing victims recorded between January 1 and June 1, 2015, 42.6% possessed firearms.
Whether so much deadly force, especially with firearms, should be used in policing is arguably more a social and political question than a purely statistical one. Nonetheless, the data suggest one possible reason police carry and use deadly force: confrontations with suspects frequently involve firearms and the potential for deadly force, whether or not a given killing is justified in the eyes of the law.
Conclusion
Using 2015 US police killings data sourced from FiveThirtyEight and The Guardian, we investigated four key demographic-driven questions. To recap, these were:
- What patterns exist in when and where police killings occur?
- What are the economic characteristics of the areas where victims were killed?
- Are there racial disparities in who is killed and how?
- Do armed victims die different deaths than unarmed victims?
In investigating when and where police killings occur, we found through geospatial mapping that comparisons between geographic areas can lead to different conclusions depending on whether killings are counted in absolute terms or per capita. Our ACF plot and day-of-week counts also indicate that killings occur more often on weekdays than on weekends.
Next, we looked at economic characteristics of the locales in which police killings occurred. The histogram of tract unemployment rates showed that police killings occur disproportionately often in areas with higher unemployment than would be expected if there were no relationship between unemployment rate and police killings. Plotting poverty rate against white population share further showed that, among tracts with police killings, a tract with a 10 percentage point higher white population share tends to have a poverty rate about 2 percentage points lower.
We also examined whether there are racial disparities in who is killed and how. A ridge plot of the age distributions of white, Hispanic, and black victims showed that the white distribution is much wider with a noticeably fatter right tail, suggesting that older victims make up a larger share of white victims than of victims in the other racial/ethnic groups. A mosaic plot then compared race and cause of death to explore whether some racial/ethnic groups are more or less likely to be killed by a particular means, and a chi-squared test of independence (p = 0.023) led us to reject the hypothesis that race and cause of death are independent.
Lastly, we investigated whether armed victims die different deaths than unarmed victims, using a faceted proportional bar plot and a one-dimensional bar plot. Among both armed and unarmed victims, most were killed by police gunfire, and a chi-squared test found no significant association between armed status and cause of death. Firearms were also the most common weapon among armed victims, suggesting that police use of deadly force may partly reflect the deadly weapons carried by victims themselves.
Future Work
An interesting question that arose from investigating where and when police killings occur is how police staffing differs across the country. Why is there a disparity in per capita police killings between the West Coast and the East Coast? Could staffing factors influence how police respond to confrontations on weekdays versus weekends? Furthermore, in looking at whether armed and unarmed victims die different deaths, why is deadly force via firearms such a hallmark of US police work? Is it simply because private gun ownership is high in the US, or are there other factors, such as police training programs and guidelines?
These questions probe political, social, and perhaps ethical issues that cannot be answered with data alone, and even less so with the dataset available to us. We would first need more transparency and data on how police departments across the country function, and on what structures are in place that may produce some of the patterns uncovered in our analyses. Nonetheless, these questions are worth asking, and further statistical analysis may help us better understand these problems and leave policymakers better informed about how to use such findings to improve our society.