Netflix’s global library evolves continuously. New titles arrive year-round, span diverse genres and markets, and are helmed by a wide range of directors. As in any other business, industry insights suggest underlying trends: viewership may swell during festive seasons, certain genres might resonate differently in the U.S. than in other countries, blockbuster revenues could cluster in key markets, and star directors often drive genre-specific success. By examining four dimensions together (seasonality over time, genre performance, market geography, and directorial influence), we aim to uncover a cohesive story of what makes Netflix movies thrive.
Netflix Movies and TV Shows till 2025
This dataset was sourced from TMDb (via Kaggle), including titles, genres, release dates, ratings, descriptions, and other relevant metadata. It provides a comprehensive overview of Netflix’s entire content library through 2025.
Each row of the dataset represents one Netflix movie title, and we focused our analysis on the following variables, which are features describing each title:
date_added: when the title became available on Netflix, formatted as YYYY-MM-DD
director: director of the movie
popularity: a numeric score indicating how popular the title was
genres: category/categories of the title (e.g. Animation, Crime, Fantasy, Romance)
country: location(s) where the film was produced
revenue: box-office revenue in USD
release_year: year the title was released
Other variables that were not used in analysis, but were provided by the dataset are:
show_id: unique identifier for each title
type: entry type, either “Movie” or “TV Show” (this dataset only has type == “Movie”)
title: name of the movie
cast: principal cast members
rating: numeric score between 0-10
duration: length of the movie, in minutes
language: primary audio language
description: brief synopsis
vote_count: number of user ratings
vote_average: average user rating, ranging from 0-10
budget: production budget in USD
Research Question #1: Does Netflix Movie Popularity Vary Throughout the Year?
Netflix releases content year-round. Many industries boom and slump at particular times of the year, and we wondered whether Netflix’s streaming business follows a similar pattern. To test whether movie popularity follows a recurring annual cycle, we ask: Does Netflix movie popularity vary systematically by calendar month? To answer this question, we examined the variables date_added and popularity.
# Libraries
library(forecast)
library(ggplot2)
library(dplyr)
library(lubridate)
library(tidyverse)   # inferred from the tidyverse_conflicts() message; supplies read_csv(), separate_rows(), str_trim(), map_chr(), fct_reorder()
library(scales)
library(dendextend)
# Reading in the data
movie <- read_csv("netflix_movies_detailed_up_to_2025.csv")
Rows: 16000 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): type, title, director, cast, country, genres, language, description
dbl (8): show_id, release_year, rating, popularity, vote_count, vote_averag...
lgl (1): duration
date (1): date_added
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Reformat date_added
movie <- movie %>%
  mutate(date_added = as.Date(date_added, "%Y-%m-%d"))

# Construct a monthly time series & perform STL decomposition
# (ts() assumes every month between the first and last is present)
mov_monthly <- movie %>%
  filter(!is.na(date_added)) %>%
  mutate(YearMonth = floor_date(date_added, "month")) %>%  # e.g. turns 2004-07-18 into 2004-07-01
  group_by(YearMonth) %>%
  summarize(mean_pop = mean(popularity, na.rm = TRUE), .groups = "drop") %>%
  arrange(YearMonth)

mov_ts <- ts(mov_monthly$mean_pop,
             start = c(year(min(mov_monthly$YearMonth)), month(min(mov_monthly$YearMonth))),
             frequency = 12)
stl_decomp <- stl(mov_ts, s.window = "periodic")

# Extract seasonal component; as.numeric() drops the ts class so ggplot2 can pick a scale
ts_df <- data.frame(Date = mov_monthly$YearMonth,
                    Seasonal = as.numeric(stl_decomp$time.series[, "seasonal"]))

# Plot (linewidth replaces the size aesthetic deprecated for lines in ggplot2 3.4.0)
ggplot(ts_df, aes(x = Date, y = Seasonal)) +
  geom_line(color = "steelblue", linewidth = 0.7) +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  labs(title = "Seasonal Component of Netflix Movies Popularity",
       x = "Year",
       y = "Seasonal Effect on Popularity",
       caption = paste("Movie popularity peaks in December, with a smaller rise in June–July.\n",
                       "April shows the lowest popularity, followed by a dip around August–September.")) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The first graph suggests that Netflix movie popularity exhibits a consistent annual cycle. We see peaks and dips around the same time each year. The zoomed view of the yearly pattern in the second graph provides a closer look at this cycle. There, one can see that popularity typically reaches its highest point in December—likely due to the holidays—but we also see a smaller peak around June–July, which may correspond with the timing of summer break. On the other hand, popularity reaches its lowest point around April—potentially indicative of final-exam season—and we see a smaller dip around August–September, which may reflect the back-to-school period.
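The yearly profile described above can be read directly off the STL seasonal component. The following is a minimal sketch on a synthetic monthly series, using base R only; the data and the names `toy_ts`, `month_effect`, and `profile` are illustrative assumptions, not the Netflix data or the authors' code for the zoomed plot.

```r
# Synthetic monthly series: 10 years with a December bump and an April dip
# (illustrative values only, not the Netflix data).
set.seed(1)
n_years <- 10
month_effect <- c(0, 0, -1, -3, -1, 1, 1, -1, -1, 0, 1, 4)
toy_ts <- ts(20 + rep(month_effect, n_years) + rnorm(12 * n_years, sd = 0.5),
             start = c(2010, 1), frequency = 12)

# Same decomposition call as in the analysis above
fit <- stl(toy_ts, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]

# With s.window = "periodic" every year carries the same 12 seasonal values,
# so the first 12 entries (January to December) are the full yearly profile.
profile <- setNames(as.numeric(seasonal[1:12]), month.abb)
print(round(profile, 2))

# Months with the highest and lowest seasonal effect
names(which.max(profile))
names(which.min(profile))
```

Plotting `profile` against the month labels would give a one-year view comparable to the zoomed graph, under the assumption that the series starts in January.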
To determine whether the mean seasonal effect differs by month, we used a one-way ANOVA in which the null hypothesis is that all monthly means are equal and the alternative is that at least one mean differs. The ANOVA yielded an extremely low p-value (p < 2e-16), so we reject the null hypothesis: month of year explains a reliable share of the variation in the seasonal component. One caveat applies: because the STL fit uses s.window = "periodic", the seasonal component repeats identically every year, which is why the residual sum of squares in the ANOVA table is zero and the F statistic is so large. A subsequent Tukey HSD test confirmed that all month-to-month differences, such as December versus April and June versus April, are statistically significant. Overall, this suggests that popularity is highly seasonal and rather predictable from year to year.
# ANOVA
anova_seasonal <- aov(Seasonal ~ month, data = seasonal_df)
summary(anova_seasonal)
             Df Sum Sq Mean Sq   F value Pr(>F)
month        11   1416   128.7 3.925e+31 <2e-16 ***
Residuals   180      0     0.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Test which months are different
TukeyHSD(anova_seasonal)
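One way to complement this test is to run the same one-way ANOVA on the detrended series (seasonal plus remainder), which keeps the year-to-year noise. The sketch below does this on synthetic data with 16 years of monthly values, matching the 11 and 180 degrees of freedom in the table. The construction of `seasonal_df` is not shown in the report, so `toy_df` and its columns are hypothetical stand-ins, and comparing against the detrended series is a suggestion rather than the authors' method.

```r
# Synthetic monthly series (illustrative only): 16 years = 192 observations
set.seed(2)
n_years <- 16
month_effect <- c(0, 0, -1, -3, -1, 1, 1, -1, -1, 0, 1, 4)
toy_ts <- ts(20 + rep(month_effect, n_years) + rnorm(12 * n_years, sd = 2),
             start = c(2010, 1), frequency = 12)
fit <- stl(toy_ts, s.window = "periodic")

toy_df <- data.frame(
  month     = factor(month.abb[as.integer(cycle(toy_ts))], levels = month.abb),
  seasonal  = as.numeric(fit$time.series[, "seasonal"]),
  detrended = as.numeric(toy_ts - fit$time.series[, "trend"])
)

# ANOVA on the periodic seasonal component: values are identical within each
# month, so the residual sum of squares is numerically zero.
aov_seasonal <- aov(seasonal ~ month, data = toy_df)
rss_seasonal <- sum(residuals(aov_seasonal)^2)

# ANOVA on the detrended series: the remainder supplies within-month variance,
# so the F test compares the monthly pattern against real noise.
aov_detrended <- aov(detrended ~ month, data = toy_df)
rss_detrended <- sum(residuals(aov_detrended)^2)

c(seasonal = rss_seasonal, detrended = rss_detrended)
summary(aov_detrended)
```

If the monthly effect remains significant on the detrended series, that is evidence of seasonality that does not rely on the periodic constraint.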
Research Question #2: How has the popularity of Netflix’s top three genres evolved in our four key markets?
We set out to explore how Netflix’s genre mix has shifted over time in our four focus markets (France, Japan, the United Kingdom, and the United States) and how that compares to the broader global picture. To answer this, we first look at the annual share (2010–2025) of each market’s three most prolific genres: Drama, Comedy, and Thriller, with Animation replacing Thriller in Japan.
movie <- read_csv("netflix_movies_detailed_up_to_2025.csv")
# 1. one row per title, country and genre, restricted to the four focus markets ----
movie_long <- movie %>%
  filter(!is.na(country)) %>%
  separate_rows(country, sep = ",") %>%   # one row per country
  mutate(country = str_trim(country)) %>%
  filter(country %in% c("France", "Japan", "United Kingdom", "United States of America")) %>%
  separate_rows(genres, sep = ",") %>%    # one row per genre
  mutate(genres = str_trim(genres))

# 2. top-3 per market + line data -------------------------------------------------
top3_genres <- movie_long %>%
  group_by(country, genres) %>%
  summarise(n_titles = n_distinct(show_id), .groups = "drop") %>%
  arrange(country, desc(n_titles)) %>%
  group_by(country) %>%
  slice_head(n = 3) %>%
  select(country, genres)

movie_top3 <- inner_join(movie_long, top3_genres, by = c("country", "genres"))

# share = distinct titles in a genre / distinct titles in that country and year
genre_trends <- movie_top3 %>%
  group_by(country, genres, release_year) %>%
  summarise(count = n_distinct(show_id), .groups = "drop") %>%
  left_join(
    movie_long %>%
      group_by(country, release_year) %>%
      summarise(total = n_distinct(show_id), .groups = "drop"),
    by = c("country", "release_year")
  ) %>%
  mutate(percent = count / total * 100)

# factor-order so facets always in FR/JP/UK/US order:
genre_trends <- genre_trends %>%
  mutate(country = factor(country, levels = c("France", "Japan", "United Kingdom", "United States of America")))

# 3. directly labeled faceted line chart ------------------------------------------
last_pts <- genre_trends %>%
  filter(release_year == max(release_year))

plot_lab <- ggplot(genre_trends, aes(release_year, percent, color = genres)) +
  geom_line(linewidth = 1.2) +
  facet_wrap(~ country) +
  scale_color_brewer(palette = "Set2") +
  coord_cartesian(xlim = c(2010, 2027)) +
  labs(title = "Evolution of Top-3 Netflix Genres by Country (2010–2025)",
       subtitle = "Each line is a genre’s share of that country’s yearly output",
       x = "Year",
       y = "Percentage of Titles (%)",
       caption = "Data: Netflix Movies & TV Shows (Kaggle)") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none",
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(),
        strip.text = element_text(face = "bold"),
        plot.title = element_text(face = "bold")) +
  geom_point(data = last_pts, aes(x = release_year, y = percent), size = 2) +
  # geom_text_repel() comes from the ggrepel package, called here with its namespace
  ggrepel::geom_text_repel(data = last_pts,
                           aes(label = genres),
                           nudge_x = 1.5,
                           direction = "y",
                           hjust = 0,
                           segment.size = 0.3,
                           segment.color = "grey80",
                           size = 3.5)
print(plot_lab)
This chart shows each country’s top three genres—Drama, Comedy, and Thriller (Japan substitutes Animation for Thriller)—as a percentage of all titles released each year. In France, the U.K., and the U.S., Drama consistently holds around 50–65 % of annual output, while Comedy and Thriller oscillate between 20–40 %.
Japan stands apart: Animation rises from roughly 40 % in 2010 to over 60 % by the late 2010s and remains the #1 genre, with Drama and Action each around 30–50 %.
Takeaway: Drama is the backbone of Netflix’s global catalogue, but Japan’s sustained preference for Animation (now its #1 genre) highlights the need for regionally tailored commissioning and marketing.
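The share plotted in the chart is, for each year, the number of distinct titles carrying a genre divided by the number of distinct titles released that year. A minimal base-R sketch of that calculation on a toy catalogue follows; the data frame `toy` and its values are invented for illustration, and the country dimension is left out to keep the example short.

```r
# Toy catalogue: one row per title, genres stored as a comma-separated string
toy <- data.frame(
  show_id      = 1:6,
  release_year = c(2020, 2020, 2020, 2021, 2021, 2021),
  genres       = c("Drama, Comedy", "Drama", "Thriller",
                   "Drama, Thriller", "Comedy", "Comedy, Drama"),
  stringsAsFactors = FALSE
)

# One row per (title, genre): split the string and trim whitespace
genre_list <- strsplit(toy$genres, ",")
long <- data.frame(
  show_id      = rep(toy$show_id, lengths(genre_list)),
  release_year = rep(toy$release_year, lengths(genre_list)),
  genre        = trimws(unlist(genre_list))
)

# Distinct titles per genre and year, and distinct titles per year
counts <- aggregate(show_id ~ genre + release_year, data = long,
                    FUN = function(x) length(unique(x)))
totals <- aggregate(show_id ~ release_year, data = toy, FUN = length)
names(counts)[3] <- "count"
names(totals)[2] <- "total"

shares <- merge(counts, totals, by = "release_year")
shares$percent <- 100 * shares$count / shares$total
shares[order(shares$release_year, -shares$percent), ]

# Shares within a year can sum to more than 100 because a title with
# several genres is counted once under each of them.
year_sums <- tapply(shares$percent, shares$release_year, sum)
year_sums
```

Because a multi-genre title contributes to several lines at once, the percentages in a facet are not expected to add up to 100.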
Next, we look at the annual percentage share of the top-10 genres worldwide (2010–2025).
# 4. global heatmap of top-10 genres over time -------------------------------------
# global share by genre & year (drop NA)
# note: movie_long is already restricted to the four focus markets above
global_trends <- movie_long %>%
  filter(!is.na(genres)) %>%
  group_by(genres, release_year) %>%
  summarise(count = n_distinct(show_id), .groups = "drop") %>%
  group_by(release_year) %>%
  mutate(total = sum(count)) %>%
  ungroup() %>%
  mutate(percent = count / total * 100)

# pick top-10 genres by total volume
top10_genres <- global_trends %>%
  group_by(genres) %>%
  summarise(total_count = sum(count), .groups = "drop") %>%
  arrange(desc(total_count)) %>%
  slice_head(n = 10) %>%
  pull(genres)

# filter + reorder so highest-peak genres sit at top
heat_data <- global_trends %>%
  filter(genres %in% top10_genres) %>%
  group_by(genres) %>%
  mutate(max_share = max(percent)) %>%
  ungroup() %>%
  mutate(genres = fct_reorder(genres, max_share))

# plot heatmap
ggplot(heat_data, aes(release_year, genres, fill = percent)) +
  geom_tile(color = "white") +
  scale_x_continuous(breaks = seq(2010, 2025, 5)) +
  scale_fill_viridis_c(option = "D", begin = .2, end = .8,
                       name = "Share of Titles (%)") +
  labs(title = "Global Evolution of Netflix’s Top-10 Genre Shares",
       subtitle = "Only the ten genres with the highest overall counts are shown",
       x = "Year",
       y = NULL) +
  theme_minimal(base_size = 14) +
  theme(panel.grid = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")
This heatmap displays the annual share of the ten most-common genres worldwide. Drama and Comedy dominate at 15–20 %, yet Horror spikes mid-decade, Action and Crime steadily climb, and Sci-Fi surges post-2020.
The key pattern is that, while Drama and Comedy dominate, Horror, Action, Sci-Fi, and Crime have each gained share, peaking in different years, which suggests that Netflix’s overall catalog is diversifying. Even as Drama remains king, these rising niches point to content-investment opportunities beyond the established top three.
Takeaway: Netflix must keep Drama front and center but also lean into both regional hits (e.g. Animation in Japan) and emerging global niches to fuel its next wave of growth.
Research Question #3: How does the average revenue of Netflix movies vary by genre and country?
Netflix releases a wide variety of movies across different genres and countries. Given the differences in media preferences around the world, we want to examine how the average revenue of Netflix movies varies by genre and country, and what this can reveal about global media markets. We examined the following variables: genres, country, and revenue.
# Keep titles with reported revenue; the first-listed genre and country are treated as the main ones
clean <- movie %>%
  filter(!is.na(genres), !is.na(country), revenue > 0) %>%
  mutate(main_genre = str_split(genres, ",") %>% map_chr(1) %>% str_trim()) %>%
  mutate(main_country = str_split(country, ",") %>% map_chr(1) %>% str_trim())

# Five countries with the most titles
top_5_countries <- clean %>%
  count(main_country, sort = TRUE) %>%
  slice_max(n, n = 5) %>%
  pull(main_country)

filtered <- clean %>%
  filter(main_country %in% top_5_countries)

# Mean revenue per country-genre pair
avg_revenue <- filtered %>%
  group_by(main_country, main_genre) %>%
  summarise(avg_revenue = mean(revenue), .groups = "drop")

# Three highest-revenue genres within each country
top_genres <- avg_revenue %>%
  group_by(main_country) %>%
  slice_max(avg_revenue, n = 3, with_ties = FALSE) %>%
  mutate(main_genre = fct_reorder(main_genre, avg_revenue)) %>%
  ungroup()

ggplot(top_genres, aes(main_genre, avg_revenue, fill = main_genre)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::label_number(scale = 1e-6, accuracy = 1)(avg_revenue)),
            vjust = -0.1, size = 2.5) +
  facet_wrap(~ main_country) +
  scale_fill_brewer(palette = "Set3", name = "Genre") +
  scale_y_continuous(labels = label_number(scale = 1e-6)) +
  labs(title = "Average Revenue of Top 3 Netflix Genres across Countries",
       subtitle = "based on the 5 countries with the most Netflix titles",
       x = NULL, y = "Average Revenue (Millions)") +
  theme_minimal() +
  theme(axis.text.x = element_blank())
This side-by-side bar graph presents the average revenue for the top three Netflix genres in each of the five countries with the most titles on Netflix: Canada, France, Japan, the United Kingdom, and the United States. Countries with larger film industries, such as the U.S. and U.K., tend to show higher average revenues and stronger genre specialization, with each of their top genres surpassing 200 million dollars. Notably, the Fantasy genre in the U.K. stands out with an average revenue of 643 million dollars, likely driven by globally popular franchises like Harry Potter. In contrast, France records the lowest revenues across its top genres—Fantasy, Adventure, and Action—each generating under $60 million, possibly reflecting a more localized media market with less emphasis on globally scaled productions. Fantasy appears among the top three genres in four of the five countries, frequently achieving the highest average revenue, indicating widespread global appeal. Adventure and Animation also perform well across multiple countries. Overall, the data highlights how cultural preferences, production capabilities, and market scale shape genre success in global media markets.
This heatmap presents a broader view of how average revenue varies across genres and countries. Each tile represents the average revenue for a specific genre-country combination, with brighter colors indicating higher revenues. The Fantasy genre in the United Kingdom and the Western genre in Belgium represent the extremes, with average revenues of $643 million and $0.1 million, respectively. The United States shows consistently high revenue across multiple genres, especially in Adventure, Animation, and Family, which aligns with the previous bar graph, confirming that its top genres also perform best in revenue terms. The United Kingdom and China also have bright colors across the rows. In contrast, countries like Belgium and Germany display darker tones across most genres, indicating relatively lower average revenues, likely due to smaller or more localized media markets. Looking across the columns, genres such as Action, Adventure, and Fantasy tend to perform well, while Documentaries show the lowest average revenues. Overall, the heatmap effectively illustrates the uneven distribution of genre success across global media markets, supporting the idea that both cultural and economic factors influence Netflix revenue patterns.
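Each tile or bar is a mean over however many titles fall into that genre-country cell, so a cell with few titles can be driven by a single blockbuster. The base-R sketch below computes the mean alongside the title count on toy data; the data frame `toy`, the helper `first_entry()`, and all revenue figures are invented for illustration and are not taken from the dataset.

```r
# Toy data (illustrative values only)
toy <- data.frame(
  country = c("United Kingdom, United States of America", "United Kingdom",
              "United Kingdom", "France", "France, Belgium", "France"),
  genres  = c("Fantasy, Adventure", "Fantasy", "Drama",
              "Comedy", "Comedy, Drama", "Drama"),
  revenue = c(900e6, 50e6, 20e6, 30e6, 10e6, 0),
  stringsAsFactors = FALSE
)

# Keep titles with reported revenue, as the analysis does (revenue > 0)
toy <- toy[toy$revenue > 0, ]

# Hypothetical helper: the first-listed entry is treated as the main one
first_entry <- function(x) trimws(sub(",.*$", "", x))
toy$main_country <- first_entry(toy$country)
toy$main_genre   <- first_entry(toy$genres)

# Mean revenue and number of titles per country-genre cell
avg <- aggregate(revenue ~ main_country + main_genre, data = toy, FUN = mean)
n   <- aggregate(revenue ~ main_country + main_genre, data = toy, FUN = length)
names(avg)[3] <- "avg_revenue"
names(n)[3]   <- "n_titles"

cells <- merge(avg, n)                       # joins on main_country and main_genre
cells$avg_millions <- cells$avg_revenue / 1e6
cells[order(-cells$avg_millions), ]
```

Reporting `n_titles` next to each average makes it easier to judge how much weight a given cell can bear.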
Research Question #4: What production patterns emerge when we examine top directors’ genre outputs across countries?
Netflix’s global catalog is driven by directors whose genre mixes may reflect regional tastes. We ask: Do the most prolific directors in the U.S., UK, Canada, France, and Japan show distinct patterns in their Comedy, Drama, and Thriller outputs? To answer this, we examined the variables director, country, and genres (filtered to top 3 genres: Comedy, Drama, Thriller) and focused on each country’s top three directors.
# Assumption: `movies` holds the same CSV that was read in earlier as `movie`
movies <- movie

# One row per title, country and genre, with shortened country labels
movies_expanded <- movies %>%
  mutate(country = str_trim(country),
         genres = str_trim(genres)) %>%
  separate_rows(country, sep = ",\\s*") %>%
  separate_rows(genres, sep = ",\\s*") %>%
  mutate(country = case_when(
    country == "United States of America" ~ "US",
    country == "United Kingdom" ~ "UK",
    TRUE ~ country
  ))

# Select top 5 countries by number of movies
top_countries <- movies_expanded %>%
  count(country, sort = TRUE) %>%
  slice_max(n, n = 5) %>%
  pull(country)
m1 <- movies_expanded %>%
  filter(country %in% top_countries)

# Select top 3 genres overall
top_genres <- m1 %>%
  count(genres, sort = TRUE) %>%
  slice_max(n, n = 3) %>%
  pull(genres)
m2 <- m1 %>%
  filter(genres %in% top_genres)

# For each country, pick its top 3 directors
top_directors <- m2 %>%
  count(country, director, sort = TRUE) %>%
  group_by(country) %>%
  slice_max(n, n = 3) %>%
  ungroup() %>%
  pull(director) %>%
  unique()
m3 <- m2 %>%
  filter(director %in% top_directors)

# Count number of movies per director-genre-country
counts <- m3 %>%
  count(country, director, genres, name = "movie_count")

# faceted bar plot
ggplot(counts, aes(x = movie_count, y = director, fill = genres)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.7) +
  facet_wrap(~ country, nrow = 2, scales = "free_y") +
  scale_fill_brewer("Genre", palette = "Set2") +
  labs(title = "Top Netflix Directors by Country and Genre",
       x = "Number of Movies",
       y = NULL) +
  theme_minimal(base_size = 12) +
  theme(axis.text.y = element_text(size = 8),
        panel.grid.major.y = element_blank())
The first graph, a faceted bar chart, shows that U.S. directors (Tyler Perry, Steven Soderbergh) overwhelmingly produce Drama, France’s Quentin Dupieux leads in Comedy, and the UK and Japan directors have more balanced mixes. Canada’s output is lower overall, with Uwe Boll leaning slightly toward Thriller.
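# hc (the hierarchical clustering result) and k (the number of clusters) come from a step not shown here
dend <- as.dendrogram(hc)
dend_k <- color_branches(dend, k = k)
plot(dend_k,
     main = paste("Dendrogram of Directors\nColored by", k, "Clusters"),
     ylab = "Height")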
The dendrogram reveals three natural groups:
1. Balanced/low-output (e.g. Quentin Dupieux, Uwe Boll), with under ~10 films and mixed genres.
2. Mid-volume, drama-weighted (Stephen Frears, Hirokazu Kore-eda, Cédric Klapisch), with moderate Drama plus some Comedy/Thriller.
3. High-volume drama specialists (Tyler Perry, Steven Soderbergh, François Ozon), whose Drama output far exceeds other genres.
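The code that builds `hc` and sets `k` is not shown in the report. One plausible construction, sketched here with base R on an invented director-by-genre count matrix, is to cluster directors on their genre counts and cut the tree into three groups. The matrix values, the Euclidean distance, and the Ward linkage are all assumptions of this sketch, not the authors' stated choices.

```r
# Hypothetical director x genre counts (illustrative, not from the dataset)
genre_counts <- matrix(
  c(2,  3, 2,
    3,  2, 3,
    4,  9, 3,
    3, 10, 4,
    5, 20, 4,
    4, 22, 6),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("Director_", LETTERS[1:6]),
                  c("Comedy", "Drama", "Thriller"))
)

# Euclidean distances between directors' genre profiles, then agglomerative
# clustering; the Ward linkage is an assumption of this sketch.
hc <- hclust(dist(genre_counts), method = "ward.D2")

# Cut the tree into k groups and list the members of each
k <- 3
clusters <- cutree(hc, k = k)
split(names(clusters), clusters)

# Base-graphics dendrogram (the report colors branches with dendextend instead)
plot(as.dendrogram(hc),
     main = paste("Dendrogram of Directors,", k, "Clusters"),
     ylab = "Height")
```

Clustering on raw counts groups directors by volume as well as by genre mix, which matches the low-output versus high-volume split described above; clustering on row proportions instead would isolate genre mix alone.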
Together, these visualizations show that Netflix’s top directors split into three clear genre‐production profiles, with the U.S. dominating drama specialists, France/Canada populating the balanced cluster, and UK/Japan directors sitting in the middle. This aligns with regional industry trends—Hollywood’s drama focus, France’s comedy heritage, and eclectic portfolios in the UK and Japan.
Conclusion
Overall, the analyses across all four research questions reveal consistent and well-supported insights into Netflix’s global movie trends.
First, movie popularity on Netflix exhibits a clear seasonal pattern, with peaks around December and summer months and dips around exam and back-to-school periods. Statistical testing confirms that month-to-month differences in popularity are significant and reliable, suggesting that seasonal cycles are a major factor in Netflix viewership. In examining genre evolution across key markets, Drama remains the dominant genre in most countries, but Japan’s unique and growing preference for Animation highlights the importance of regional tastes. The expansion of genres like Horror, Action, and Sci-Fi worldwide further suggests Netflix’s catalogue is becoming increasingly diverse. Average revenue analysis by genre and country shows that countries with larger film industries, such as the U.S. and U.K., not only dominate in the number of titles but also generate much higher revenues. The side-by-side bar graph and heatmap together show clear differences across countries and genres. Finally, the exploration of top directors’ genre outputs reveals distinct production patterns: U.S. directors predominantly specialize in Drama, while directors from France, the U.K., Canada, and Japan show more balanced or mixed outputs. The dendrogram analysis captures three natural groupings of directors by their genre focus and volume of production, aligning well with known regional filmmaking trends.
These findings confirm that Netflix’s global movie market is shaped by a mix of predictable cycles, evolving genre trends, revenue potential, and differentiated director profiles across countries. These insights are well-supported by the graphs and statistical analyses presented.
Unanswered Questions and Future Insights
Although our four research questions paint a coherent picture of Netflix’s recent evolution, they inevitably leave several avenues unexplored. Most notably, our analysis treats Netflix’s catalogue as a fixed universe and measures popularity in relative terms, yet it does not model why titles become popular. Future work could incorporate external drivers—marketing spend, social-media buzz, theatrical runs, or day-and-date releases—to isolate the causal mechanisms behind the seasonal peaks we observed. Doing so will likely require data that are not publicly available in the TMDb export and would benefit from time-series methods that go beyond STL decomposition, such as dynamic regression or state-space models that can accommodate advertising shocks and lagged word-of-mouth effects.
We have also focused on movies, top genres, and the five largest producing countries; as a result, television series, niche genres, and emerging markets appear only at the margins of our heatmaps and dendrograms. Extending the study to the full set of titles—including regional originals, limited series, and micro-budget films—could reveal whether the diversification we documented in global genre shares is accelerating in smaller territories. That expansion, however, will require more sophisticated text-mining to reconcile the long tail of genre labels and hierarchical modelling to stabilise estimates in markets with few yearly releases. Finally, we treated directors as independent units; future work could embed them in collaboration networks and use community-detection algorithms or mixed-membership models to capture cross-country creative clusters. Collectively, these extensions would deepen our understanding of Netflix’s content strategy—but they demand data sources and statistical techniques that lie beyond the scope of the current course.