Introduction

Data Description

The dataset that we chose to analyze consists of the IMDb scores and related information for four decades of movies. The data was scraped from IMDb and is available at https://www.kaggle.com/datasets/danielgrijalvas/movies?resource=download. It contains approximately 6,820 movies released from 1986 to 2016 and includes 5 quantitative and 10 qualitative variables. These cover budget, revenue, IMDb score and the associated number of votes, release date, genre, and rating, as well as production information such as company, cast, writers, directors, country of origin, and runtime. During data cleaning and preprocessing, we removed instances with missing values and created informative new variables such as profit, decade, and release season. The data and the analysis below provide an opportunity to explore the underlying trends in the movie industry.
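As a rough illustration of the preprocessing described above, the sketch below drops rows with missing financial values and derives profit and decade. The column names (`year`, `budget`, `gross`), the toy rows, and the helper `add_profit_and_decade` are assumptions made for this example; it treats revenue as a `gross` column and defines profit as revenue minus budget, which may differ from the exact steps used in our analysis.

```python
import pandas as pd

# Toy rows standing in for the Kaggle file; column names are assumed.
movies = pd.DataFrame({
    "name": ["A", "B", "C"],
    "year": [1986, 1999, 2014],
    "budget": [8_000_000, None, 50_000_000],
    "gross": [52_000_000, 10_000_000, 45_000_000],
})


def add_profit_and_decade(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing budget or gross, then add profit and decade columns."""
    out = df.dropna(subset=["budget", "gross"]).copy()
    # Assumed definition: profit is revenue (gross) minus budget.
    out["profit"] = out["gross"] - out["budget"]
    # Floor the year to its decade and label it, e.g. 1986 -> "1980s".
    out["decade"] = (out["year"] // 10 * 10).astype(int).astype(str) + "s"
    return out


cleaned = add_profit_and_decade(movies)
print(cleaned[["name", "profit", "decade"]])
```

Dropping incomplete rows before computing profit keeps the derived column free of missing values, at the cost of discarding movies whose budget or revenue was not reported.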

Research Questions

Through our analysis of the data, we have identified three overarching questions that help us understand the dataset and the relationships between the movie variables.

How do movie ratings (G, PG, R, etc.) impact IMDb scores and movie distributions for IMDb rated movies from 1986 through 2016?

This question addresses the relationships between movie ratings, IMDb scores, and the overall distribution of movies, which shows how the global audience feels about specific types of movies. Understanding these relationships can help identify well-received movies and may even help predict which upcoming movies will receive a high IMDb score.

How do movie production and success metrics differ based on movie release season (fall, winter, spring, summer)?

This question helps us better understand what movie producers and financiers consider when selecting a release date (season). Different types of movies are released at strategic times, and this question examines whether some release seasons are associated with more successful movies. [The season variable was not originally in the data; it was created by decomposing the release date string and grouping the resulting values.]
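A minimal sketch of how a season variable can be derived from a release date string is shown here. It assumes the date is stored as a 'YYYY-MM-DD' string in a column named `released` and that months are grouped meteorologically (December through February as winter, and so on); the column name, the date format, the month grouping, and the helper `release_season` are all assumptions for illustration rather than the exact procedure we used.

```python
import pandas as pd

# Assumed meteorological grouping of month numbers into seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def release_season(released: str) -> str:
    """Split a 'YYYY-MM-DD' string and map its month component to a season."""
    month = int(released.split("-")[1])
    return SEASON_BY_MONTH[month]


# Toy release dates in the assumed format.
movies = pd.DataFrame({"released": ["1986-08-22", "1999-12-17", "2014-04-04"]})
movies["season"] = movies["released"].map(release_season)
print(movies)
```

If the release dates are stored in a different format, only the string-splitting line inside `release_season` would need to change; the month-to-season lookup stays the same.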

Analysis

Answering the question of how movie ratings impact IMDb scores and the overall distribution of movies starts with an exploratory data analysis (EDA) to understand preliminary relationships in the data.

Movie Ratings Impact on IMDb Scores and Movie Distribution Analysis

IMDb Score Distributions by Rating and Decade

Our first chart, below, shows how the distribution of IMDb scores differs across ratings and release decades.