36-315 Report

Data Introduction

Our data comes from ACLED (Armed Conflict Location and Event Data), a non-profit organization that disaggregates conflict data and makes it publicly available. Specifically, we use their data set on the Middle East, which contains all political violence events, demonstration events, and strategic developments from 2015 to 2025. Each row records a single event, such as a protest, riot, or battle. The data set tracks 31 variables; our analysis focuses on the following:

Categorical Variables

disorder_type: Qualitative description of unrest cause
sub_event_type: Sub-category of event type
country: Country name
interaction: Parties involved, separated by “/”
inter1: One faction or group involved
civilian_targeting: Whether civilians were targeted
actor1: Primary actor orchestrating the attack
source_scale: Scale of media source (national, local, NGO, etc.)

Quantitative Variables

event_date: Date the unrest occurred (time-series index)
fatalities: Number of deaths
timestamp: UNIX epoch time (larger values are more recent)
longitude: Geographic coordinate
latitude: Geographic coordinate

Research Questions

In analyzing this data set, we sought to investigate the following research questions:

Has violence changed over time? If so, are there specific groups that can be attributed to these changes?
How do source scales of media influence reporting patterns of violent incidents with fatalities, and what factors can influence these reporting patterns?
Who are the biggest perpetrators of civilian violence in the Middle East, and where do civilian targeting incidents most often occur?
What types of civilian-targeted attacks were the most deadly, and how did their usage patterns vary across countries and over time?

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

ACLED_middle_east_data <- read_csv("MiddleEast_2015-2025_Mar14.csv")

Rows: 507557 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (22): event_id_cnty, disorder_type, event_type, sub_event_type, actor1,...
dbl   (8): year, time_precision, iso, latitude, longitude, geo_precision, fa...
date  (1): event_date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Research Question #1

We wanted to learn who the biggest perpetrators of civilian violence in the Middle East are, and where civilian targeting incidents most often occur. To answer this question we focused on the following variables from ACLED:

actor1: Primary actor orchestrating the attack
fatalities: Number of deaths
civilian_targeting: Whether civilians were targeted
longitude: Geographic coordinate
latitude: Geographic coordinate

library(dplyr)
library(ggplot2)
library(maps)


Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map

# Specify the countries I want to include
countries <- c("Bahrain","Iran","Iraq","Israel","Jordan","Kuwait","Lebanon","Oman",   
               "Palestine","Qatar", "Saudi Arabia","Syria", "United Arab Emirates","Yemen",
               "Egypt","Turkey", "Ethiopia","Somalia","Sudan","South Sudan","Afghanistan","Pakistan",
               "Turkmenistan","Georgia","Armenia","Azerbaijan","Uzbekistan","Russia","Bulgaria",
               "Kazakhstan")

# Grab the map data from the world and filter for the countries I specified above
me_borders <- map_data("world") |>
  filter(region %in% countries)

# Create a dataset that includes the counts of civilian targeting incidents and fatality counts for civilian targeting incidents for each actor. Then in this dataset, rank the incident and fatality counts so that we can label who the worst actors of civilian violence are.
worst_civilian_targeters <- ACLED_middle_east_data |>
  filter(civilian_targeting == "Civilian targeting") |>
  group_by(actor1) |>
  summarise(incident_count = n(),
            total_fatalities = sum(fatalities, na.rm = TRUE)) |>
  ungroup() |>
  mutate(incident_rank = dense_rank(desc(incident_count)),
         fatality_rank = dense_rank(desc(total_fatalities)),
         is_worst = (incident_rank <= 10) | (fatality_rank <= 10))

# Join the ME violence dataset with the worst actors dataset so that we can filter for civilian targeting incidents done by the worst actors.
points_for_worst_actors <- ACLED_middle_east_data |>
  filter(civilian_targeting == "Civilian targeting") |>
  left_join(select(worst_civilian_targeters, actor1, is_worst), by = "actor1") |>
  filter(is_worst)

# Point colors from the ColorBrewer "Paired" palette, plus black
point_colors <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a', '#ffff99', '#b15928', 'black')

ggplot() +
  # Plot the map
  geom_polygon(
    data = me_borders,
    aes(x = long, y = lat, group = group),
    color = "black", fill = "antiquewhite", linewidth = 0.2
  ) +
  # Plot points for worst actors
  geom_point(
    data = points_for_worst_actors,
    aes(x = longitude, y = latitude, color = actor1),
    size  = 1.5,
    alpha = 0.5
  ) +
  scale_color_manual(
    values = point_colors,
    name    = "Worst Perpetrators of Civilian Violence"
  ) +
  # Specify the range of the map
  coord_quickmap(xlim = c(25, 70), ylim = c(10, 45), expand = FALSE) +
  labs(
    title = "Civilian Targeting Incidents Across the Middle East",
    subtitle = "Dots represent incidents by the worst actors/perpetrators of civilian violence",
    caption = "Dots are colored by actor to show all civilian targeting incidents by the worst offenders, which is measured by number of incidents with civilian targeting and total number\nof civilians killed; the map highlights dense clusters in the Levant, Iraq, and Yemen.\n\nData Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com.",
    x = NULL,
    y = NULL
  ) +
  theme(
    plot.title    = element_text(size = 30),
    plot.subtitle = element_text(size = 25),
    plot.caption  = element_text(size = 20, hjust = 0, face = "italic"),
    legend.key.size = unit(1.5, "cm"),
    legend.text = element_text(size = 18),
    legend.title = element_text(size = 22),
    axis.text = element_blank()
  ) +
  # Increase the size of the colors in the legend
  guides(
    color = guide_legend(
      override.aes = list(size = 7)
    )
  )

Graph #1 Interpretation and Takeaways

This map plots every civilian targeting event committed by the worst perpetrators of civilian violence in the Middle East, meaning the actors that account for the largest shares of (a) incident counts or (b) civilian deaths in our data set. Each colored dot marks the longitude/latitude of an event, and dots share a color if they were carried out by the same actor.

We chose the “worst” perpetrators by doing the following:

Filtered the data set to include only events with civilian targeting.
For each actor we computed
1. incident_count: number of such events
2. total_fatalities: cumulative civilian deaths caused by these events
Ranked actors on both computed metrics and kept anyone appearing in the top 10 of either list.
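
The selection rule above can be sketched on a toy table. The actor names and counts below are invented for illustration and are not ACLED values, and the cutoff is lowered from 10 to a hypothetical top_k of 2 so the effect is easy to see.

```r
library(dplyr)

# Hypothetical per-actor summary (values invented for illustration)
toy_actors <- data.frame(
  actor1           = c("A", "B", "C", "D", "E"),
  incident_count   = c(50, 40, 5, 3, 3),
  total_fatalities = c(10, 200, 300, 1, 0)
)

top_k <- 2  # the report uses 10

toy_ranked <- toy_actors |>
  mutate(
    # dense_rank gives tied values the same rank and leaves no gaps
    incident_rank = dense_rank(desc(incident_count)),
    fatality_rank = dense_rank(desc(total_fatalities)),
    # keep an actor if it is in the top k on either metric
    is_worst      = incident_rank <= top_k | fatality_rank <= top_k
  )

worst_toy <- toy_ranked |>
  filter(is_worst) |>
  pull(actor1)

print(worst_toy)
```

Because the rule is an "or" across two lists and dense_rank keeps ties, the number of selected actors can exceed the cutoff. Here three actors qualify with a cutoff of 2. This matters when a fixed-length color vector is passed to scale_color_manual, since it needs at least one color per selected actor.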

The map shows that the densest clusters of civilian targeting incidents appear in the Levant, Iraq, and Yemen:

Levant: The main perpetrators are the Israel Defense Forces, ISIL, and rioters on both the Israeli and Palestinian sides, along with various other actors.
Iraq: Once again ISIL is the dominant force of civilian violence, joined by unidentified armed groups that account for a large share of attacks.
Yemen: Civilian violence is primarily conducted by Operation Restoring Hope (a Saudi-led coalition) and the Military Forces of Yemen.

Now, to get a better understanding of which actors cause the highest numbers of incidents and fatalities, let’s examine the next graph.

# Label only the worst actors/perpetrators of civilian violence
label_df <- worst_civilian_targeters |> filter(is_worst)

worst_civilian_targeters |>
  ggplot(aes(x = incident_count, y = total_fatalities)) +
    # Label points for worst actors/perpetrators with their name
    geom_text(
      data = label_df, 
      aes(label = actor1), 
      show.legend = FALSE, 
      nudge_x = 110, 
      size = 3, 
      hjust = 0
      ) +
    geom_point(
      aes(color = is_worst), 
      show.legend = FALSE,
      alpha = 0.7
      ) +
    # Color points for worst actors/perpetrators with red
    scale_color_manual(
      values = c("TRUE" = "red", "FALSE" = "black")
    ) +
    # Expand the scale to the right to include Israel's label
    scale_x_continuous(
      expand = expansion(add = c(0, 2200))
    ) +
    labs(
      title = "Incidents vs Total Civilian Fatalities For Worst Perpetrators of Civilian Violence",
      subtitle = "Each point is an actor’s total civilian targeting incidents vs. total civilian fatalities; red highlights the worst perpetrators of civilian violence",
      caption = "Data Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com.",
      x = "Incident Count",
      y = "Total Civilian Fatalities"
    ) +
    theme_minimal()

Graph #2 Interpretation and Takeaways

For every actor in our data set, this scatter plot shows the total number of incidents involving civilian violence on the x-axis, and the total number of civilian fatalities on the y-axis. The dots colored in red mark the “worst” actors of civilian violence (see description in Graph #1 for how this was coded), while black dots represent all other actors.

Several patterns are clear from this graph. First, as incident count increases, total civilian fatalities tend to increase as well. Furthermore, the dots colored in red lie well above or to the right of the main cluster in the bottom left of the plot, confirming that the “worst” actors are responsible for a disproportionate amount of both civilian attacks and deaths. Finally, we would like to highlight an extreme outlier in the top right of the plot, the Military Forces of Israel, who are responsible for over 9,000 civilian targeting incidents and well over 40,000 civilian fatalities. This stark outlier shows that, even with data available only from 2022 onward, Israel’s military is the region’s largest contributor to civilian violence by far.

Research Question #2

In this section we explore how different source scales of media influence reporting patterns of violent incidents with fatalities (which we deemed ‘serious’), and what factors can influence these reporting patterns. Throughout this section we focus on the following variables to explore this question from different viewpoints.

source_scale: Scale of media source (national, local, NGO, etc.)

year: Year in which the event occurred

fatalities: Number of deaths

interaction: Parties involved in the unrest, separated by “/”

There are many source scales, but most reports of fatal incidents come from three of them: National, Local partner-National, and Local partner-Other. We therefore focus on these three throughout this research question.

Graph #1 Interpretation and Takeaways

top_sources <- ACLED_middle_east_data %>%
  filter(fatalities > 0) %>%
  count(source_scale) %>%
  arrange(desc(n)) %>%
  slice_head(n = 3) %>%
  pull(source_scale)

# Filter and summarize data for top source scales
top_sources_data <- ACLED_middle_east_data %>%
  filter(source_scale %in% top_sources, fatalities > 0) %>%
  count(year, source_scale, name = "incident_count")

# Filter and summarize data for other source scales
other_sources_data <- ACLED_middle_east_data %>%
  filter(!source_scale %in% top_sources, fatalities > 0) %>%
  count(year, source_scale, name = "incident_count")

# Count background sources
background_sources <- unique(other_sources_data$source_scale)
num_background <- length(background_sources)

ggplot() +
  geom_line(data = other_sources_data, 
            aes(x = year, y = incident_count, group = source_scale),
            color = "gray80", alpha = 0.5, linewidth = 0.6) +
  geom_line(data = top_sources_data,
            aes(x = year, y = incident_count, color = source_scale),
            linewidth = 1.5) +
  scale_color_manual(values = c("#d73027", "#fc8d59", "#fee090")) +
  scale_x_continuous(breaks = function(x) seq(floor(min(x)), ceiling(max(x)), 1),
                     labels = function(x) as.integer(x)) +
  labs(title = "Fatality Incidents by Source Scale Over Time",
       subtitle = "Highlighting top 3 source scales in Middle East conflict reporting",
       x = "Year", y = "Number of Fatality Incidents",
       caption = paste0("This visualization captures different reporting patterns. ",
                        num_background, " other source scales shown in gray.\n\nData Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com.")) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank(),
        plot.caption = element_text(hjust = 0, face = "italic"),
        axis.text.x = element_text(angle = 0))

This graphic displays the number of fatal incidents reported by different source scales over time, illustrating how reporting patterns vary by year. The x-axis shows the year, and the y-axis shows the number of incidents that caused at least one fatality.

As noted earlier, three primary reporting sources—Local partner-National, Local partner-Other, and National—were emphasized, while all other categories were shown in gray for background context.

Interestingly, although there were slight differences in the timing and fluctuation of peaks, the overall reporting patterns across these three sources were fairly consistent. All three experienced a noticeable peak between 2016 and 2018, followed by a general decline. Notably, the National source scale exhibited an unexpected spike in 2024.

To further explain differences in reporting patterns across source scales, we next look at these source scales with reference to interaction type.

Graph #2 Interpretation and Takeaways

fatality_data <- ACLED_middle_east_data %>%
  filter(fatalities > 0, source_scale %in% top_sources)

# Get top 15 interaction types by total count
top_interactions <- fatality_data %>%
  count(interaction) %>%
  arrange(desc(n)) %>%
  slice_head(n = 15) %>%
  pull(interaction)

# Filter for only the top interactions
heatmap_data <- fatality_data %>%
  filter(interaction %in% top_interactions) %>%
  count(interaction, source_scale) %>%
  complete(interaction, source_scale, fill = list(n = 0))

# Make sure both variables are properly factored
heatmap_data <- heatmap_data %>%
  mutate(
    interaction = factor(interaction, levels = top_interactions),
    source_scale = factor(source_scale)
  )

# Create the matrix for heatmap
interaction_scale_table <- heatmap_data %>%
  pivot_wider(
    names_from = source_scale, 
    values_from = n,
    values_fill = 0
  ) %>%
  column_to_rownames("interaction") %>%
  as.matrix()

Since the interaction types of the incidents were numerous, we selected the 15 most common interaction types to show clean, identifiable trends for the 3 major source scales.

library(pheatmap)

pheatmap(interaction_scale_table,
         scale = "column",
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         fontsize_row = 7,
         fontsize_col = 8,
         color = colorRampPalette(c("white", "lightblue", "darkblue"))(50),
         main = "Fatal Interactions by Source Scale\nDarker blue indicates higher reporting frequency")

Differences in reporting patterns are more apparent in this visualization, which shows how often each of the three major source scales (x-axis) reported each interaction type (y-axis). Because the heatmap is scaled by column, color compares interaction types within a source scale rather than raw counts across source scales.
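
The effect of scale = "column" can be seen on a small example. The pheatmap documentation describes this option as centering and scaling values in the column direction. The sketch below mimics that with base R's scale() on invented counts (not ACLED values), so it is an approximation of what the heatmap colors encode rather than a reproduction of pheatmap's internals.

```r
# Hypothetical interaction-by-source counts (invented for illustration)
toy_counts <- matrix(
  c(100, 10,
     50, 20,
     30, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("interaction A", "interaction B", "interaction C"),
    c("National", "Local partner-Other")
  )
)

# Column-wise z-scores: subtract each column's mean, divide by its sd
toy_scaled <- scale(toy_counts)

print(round(toy_scaled, 2))
```

After scaling, every column has mean 0, so a dark cell means "frequent relative to that source's other interactions", not "more reports than another source". In the toy table, interaction C has 30 reports in both columns, yet it is the highest value for Local partner-Other and the lowest for National.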

Local partner-National and National sources often overlap in their coverage, frequently reporting fatal incidents involving State forces vs. State forces and State forces vs. Rebel groups. However, they rarely report State forces vs. Civilians incidents. In contrast, Local partner-Other sources are more likely to cover incidents involving State forces and Civilians.

Moreover, the Local partner–Other scale also captures a wider range of civilian-related interactions that do not involve state forces, such as Political militias vs. Civilians and Rebel groups vs. Civilians.

This pattern may suggest that Local partners without national affiliation are subject to fewer constraints or censorship influenced by national agendas. While this hypothesis would require validation in a controlled, experimental or causal research setting, the observed data clearly show that reporting patterns are correlated with interaction type.

Research Question #3

Another idea we wanted to explore was whether the frequency of violence has fluctuated over time in the Middle East. If it has, we also wanted to know whether specific groups or entities saw an increase or decrease in violent conflicts. To answer these questions, we used the following variables from the ACLED data:

fatalities: Number of deaths
timestamp: UNIX epoch time (larger values are more recent)
inter1: One faction or group involved
event_date: Date the unrest occurred (time-series index)
disorder_type: Qualitative description of unrest cause

ACLED_middle_east_data |>
  ggplot(aes(x = timestamp, y = fatalities)) +
  geom_point(alpha = 0.4, aes(color = inter1)) + 
  guides(color = guide_legend(title = "Belligerents")) +
  labs(title = "History of Political Violence in the Middle East",
       x = "Timestamp (UNIX)", y = "Fatalities", 
       subtitle = "Larger UNIX value -> more recent",
       caption = "Belligerent of each given incident is displayed. External forces with varying \n allegiances were involved in many conflicts, leading to many orange points.\n\nData Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com.") +
  theme(plot.caption.position = "plot", plot.caption = element_text(hjust = 0,
                                                              face = "italic"))

chisq.test(ACLED_middle_east_data$fatalities, ACLED_middle_east_data$timestamp)

Warning in chisq.test(ACLED_middle_east_data$fatalities,
ACLED_middle_east_data$timestamp): Chi-squared approximation may be incorrect


    Pearson's Chi-squared test

data:  ACLED_middle_east_data$fatalities and ACLED_middle_east_data$timestamp
X-squared = 2450740, df = 1857300, p-value < 2.2e-16

Graph #1 Interpretations and Takeaways

The above scatterplot displays the UNIX timestamp of each event on the x-axis and the number of fatalities on the y-axis, with each point colored by one of the factions involved. Upon inspection, the number of fatalities seems to gradually increase as the timestamp value increases, suggesting that violence has increased over time. In particular, the number of fatalities increases noticeably around the timestamp value of 1.70e+09, which corresponds to mid-November 2023. One caveat: ACLED's codebook describes timestamp as the time a record was uploaded to its database, so it is at best a proxy for when the event happened; event_date, used in the next graph, measures event time directly.
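
For reference, the conversion from a UNIX timestamp to a calendar date can be done in base R. This is a small sketch, and the variable names are placeholders that do not appear in the analysis code.

```r
# Convert a UNIX timestamp (seconds since 1970-01-01 UTC) to a calendar date
ts_value <- 1.70e9
ts_date  <- as.POSIXct(ts_value, origin = "1970-01-01", tz = "UTC")

print(format(ts_date, "%Y-%m-%d"))
# "2023-11-14"
```

Applying the same as.POSIXct() call to the timestamp column inside a mutate() would let the x-axis of the scatterplot show dates instead of raw epoch values.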

Specific factions involved in political events seem to have shifted as well. For instance, excluding external/other forces, the scatterplot shows a somewhat even mix of belligerents at the earliest timestamps. However, more and more events have had state forces involved, denoted by the abundance of pink points on the right side of the scatterplot.

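Additionally, we ran a chi-squared test at a significance level of 0.05 to formally assess whether there is an association between the number of fatalities and timestamp. The resulting p-value is approximately zero, so we reject the null hypothesis of independence and conclude that there is a relationship between the number of fatalities and timestamp. Note that the test treats each distinct value of both variables as its own category, and R warns that the chi-squared approximation may be incorrect, so this result should be read with caution.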
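
Since both variables are numeric, a rank-based test may be a closer match to the question of whether fatalities trend with time. The sketch below was not part of the analysis above. It uses simulated values rather than ACLED data, the names sim_timestamp and sim_fatalities are placeholders, and Spearman's rank correlation is offered as one possible alternative, not as the method used in this report.

```r
set.seed(315)

# Simulated stand-ins for timestamp and fatalities (not ACLED data)
n <- 500
sim_timestamp  <- sort(runif(n, min = 1.42e9, max = 1.74e9))
sim_fatalities <- rpois(n, lambda = seq(0.5, 3, length.out = n))

# Spearman's rank correlation tests for a monotone association.
# exact = FALSE requests the asymptotic p-value, which avoids the
# "cannot compute exact p-value with ties" warning for count data.
sp <- cor.test(sim_timestamp, sim_fatalities,
               method = "spearman", exact = FALSE)

print(sp$estimate)  # Spearman's rho
print(sp$p.value)
```

On the real data the call would take the timestamp (or event_date converted to numeric) and fatalities columns in place of the simulated vectors. Unlike the chi-squared test, the sign of rho also indicates the direction of the association.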

To get a better understanding of exactly how violence fluctuates over time, let’s examine the next visual.

ACLED_middle_east_data |> 
  mutate(event_date = as.Date(event_date)) |>
  group_by(event_date, disorder_type) |>
  summarize(total_fatalities = sum(fatalities, na.rm = TRUE)) |>
  filter(disorder_type != "Political violence; Demonstrations") |>
  ggplot(aes(x = event_date, y = total_fatalities, color = disorder_type)) + 
  geom_line(alpha = 0.5) +
  labs(title = "Number of Fatalities Per Disorder Type Over Time",
       x = "Year", 
       y = "Total Fatalities",
       caption = "Data Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com."
       ) +
  guides(color = guide_legend(title = "Type of Disorder"))

`summarise()` has grouped output by 'event_date'. You can override using the
`.groups` argument.

Graph #2 Interpretation and Takeaways

The time series above displays the total number of fatalities on each date for each type of disorder: demonstrations, political violence, and strategic developments.

Demonstrations (red) and strategic developments (blue) both have minor fluctuations in total number of fatalities between 2015 and 2025, but overall the shifts are minor and may be considered stable for the most part.

On the other hand, political violence (green) has proven to be erratic and extreme over the decade, marked by sharp and sudden increases and decreases in total fatalities. There seems to be an overall peak in violence between 2016 and 2020, followed by a very sharp and narrow spike in 2024.

This pattern may suggest that while most forms of disorder and events have been quite stagnant for most of the decade, violence has indeed been wildly fluctuating.
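
Daily totals are noisy, so one way to make the broad peaks easier to read is to sum fatalities by month before plotting. The following is a minimal base R sketch on invented daily totals (not ACLED values); toy_daily and toy_monthly are hypothetical names.

```r
# Hypothetical daily fatality totals (invented for illustration)
toy_daily <- data.frame(
  event_date       = as.Date(c("2024-01-03", "2024-01-20",
                               "2024-02-01", "2024-02-15", "2024-02-28")),
  total_fatalities = c(4, 6, 1, 0, 9)
)

# Label each row with the first day of its month, then sum within months
toy_daily$month <- as.Date(format(toy_daily$event_date, "%Y-%m-01"))
toy_monthly <- aggregate(total_fatalities ~ month, data = toy_daily, FUN = sum)

print(toy_monthly)
```

The same idea would apply per disorder type by adding disorder_type to the grouping. The trade-off is that a one-day spike, like the one in 2024, gets averaged into its month.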

Research Question #4

The last question we investigated was what types of civilian-targeted attacks were most deadly and how their usage patterns varied across countries and over time. The first graph shows civilian-targeted attack fatalities by the type of attack and the country it occurred in, and the second graph shows the number of these attacks by type, over time. To effectively answer these questions, we utilized the following variables from the ACLED data:

civilian_targeting: Whether civilians were targeted
fatalities: Number of deaths
sub_event_type: Subcategory of event type
country: Country the event occurred in
event_date: Date the unrest occurred (time-series index)

Graph #1 Interpretations and Takeaways

top_countries <- ACLED_middle_east_data |>
  filter(civilian_targeting == "Civilian targeting") |>
  group_by(country) |>
  summarise(total_fatalities = sum(fatalities, na.rm = TRUE)) |>
  arrange(desc(total_fatalities)) |>
  slice_max(total_fatalities, n = 8)

civ_top <- ACLED_middle_east_data |>
  filter(civilian_targeting == "Civilian targeting", country %in% top_countries$country) |>
  group_by(sub_event_type, country) |>
  summarise(total_fatalities = sum(fatalities, na.rm = TRUE), .groups = "drop")

civ_top |>
  ggplot(aes(x = reorder(sub_event_type, -total_fatalities), y = total_fatalities, fill = country)) +
  geom_col(position = "stack") +
  labs(
    title = "Civilian-Targeted Attack Fatalities by Event Type and Country (Top 8)",
    x = "Event Type",
    y = "Fatalities",
    fill = "Country",
    caption = "\nData Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com."
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6))

The graph shows that air/drone strikes cause the highest number of fatalities in civilian-targeted attacks, followed by attacks (use of armed force, but not battle), shelling/artillery/missile attacks, and remote explosive/landmine/IED events. The event types that caused the fewest fatalities were sexual violence and abduction/forced disappearance. Most of these fatalities occur in Palestine, Syria, and Iraq, which is consistent with these countries having been involved in broader military campaigns and having experienced significant turmoil over time.

The fatalities are largely dominated by large-scale and high-intensity civilian-targeted violence, such as air/drone strikes and shelling/artillery/missile attacks, rather than smaller-scale or individual attacks such as suicide bombing or sexual violence. The concentration of deaths in Palestine, Syria, and Iraq indicates that civilians in these regions are consistently targeted and caught in broader military conflicts.
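
To put a number on "dominated", each event type's share of total fatalities can be computed from the summarized table. The sketch below uses invented totals (not ACLED values) and a hypothetical toy_civ data frame shaped like a per-event-type summary.

```r
# Hypothetical fatalities per event type (invented for illustration)
toy_civ <- data.frame(
  sub_event_type   = c("Air/drone strike", "Attack",
                       "Shelling/artillery/missile attack", "Sexual violence"),
  total_fatalities = c(600, 300, 90, 10)
)

# Share of all civilian-targeting fatalities attributed to each event type
toy_civ$share <- toy_civ$total_fatalities / sum(toy_civ$total_fatalities)

# Sort from largest to smallest share
toy_civ <- toy_civ[order(-toy_civ$share), ]

print(toy_civ)
```

Reporting shares alongside the stacked bars would make it possible to state how much of the total the top event types account for.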

Understanding which types of attacks are used to target civilians can help governments prioritize developing defense and precautionary measures against these types of attacks. To get a better sense of the types of attacks and how frequently they have been used over the years, let’s look at our next graph.

Graph #2 Interpretation and Takeaways

data_filtered <- ACLED_middle_east_data |>
  filter(civilian_targeting == "Civilian targeting") |>
  mutate(event_date = as.Date(event_date)) |>
  mutate(year = as.numeric(format(event_date, "%Y"))) |>
  group_by(year, sub_event_type) |>
  summarise(num_events = n(), .groups = "drop")

data_filtered |>
  ggplot(aes(x = year, y = num_events, color = sub_event_type)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Civilian-Targeted Attacks by Event Type Over Time",
    x = "Year",
    y = "Number of Events",
    color = "Event Type",
    caption = "Data Source: Armed Conflict Location & Event Data Project (ACLED); accessed April 2025; available at acleddata.com."
  ) +
  scale_color_brewer(palette = "Paired") +
  scale_x_continuous(breaks = seq(min(data_filtered$year), max(data_filtered$year), by = 1)) +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 6))

The graph shows that air/drone strikes and attacks (use of armed force, but not battle) are the most frequent types of civilian-targeted events across the years. There are two sharp spikes in the number of air/drone strikes, one in 2017 and one in 2024, while attack events are consistently high over time. Other event types such as mob violence and abduction/forced disappearance steadily increase between 2017 and 2024, and less common event types such as grenades, suicide bombs, and sexual violence remain relatively low over time.

Aerial warfare (air/drone strikes) seems to occur in bursts rather than steady year-to-year increases, which suggests that air power may be used to respond to specific political or military situations instead of being used routinely. Ground-based attacks are used consistently and frequently for civilian targeting across all the years, as they are a more conventional form of violence, and the increase in mob violence and abduction/forced disappearance suggests increasing unrest and new tactics being used to target civilians.

Overall, civilian-targeted violence is dominated by air/drone strikes and attacks, with air/drone strikes being more event-driven and ground attacks remaining more consistent and widespread. Understanding patterns in the types of attacks may help countries better anticipate certain attacks and develop strategies to better protect their citizens.

Conclusion and Future Work

Overall, our analysis of the ACLED Middle East data set revealed several key findings. First, civilian-targeted violence is highly concentrated in regions like the Levant, Iraq, and Yemen, with a small set of actors, notably the Israeli military and ISIL, responsible for a disproportionate share of incidents and fatalities. Additionally, reporting patterns for fatal incidents vary with the scale of the media source, with local and independent outlets being more likely to cover civilian-targeted violence than national sources. Violence in the region has escalated over time, especially after 2023, and state forces have been increasingly involved. Lastly, air/drone strikes and conventional armed attacks have been the deadliest and most frequently used methods of targeting civilians, and usage patterns have varied in intensity across countries and years.

A potential direction for future work would be applying causal inference methods to better understand the variables that cause spikes in violence (outside the scope of this class), or integrating additional datasets to investigate the broader societal impacts of these violent events (refugee displacement, humanitarian aid records, etc.).

Data Source

Data sourced from the Armed Conflict Location & Event Data Project (ACLED), available at https://acleddata.com. Data accessed April 2025. Filters applied to select Middle East events (2015–2025).