Netflix Dataset Analysis

Oyinkansola Awosan
4 min read · Mar 20, 2021


Entertainment is so much a part of our daily lives that we find ways to keep ourselves entertained, consciously or unconsciously. Nobody likes being bored, and everyone finds something that interests them and that they enjoy doing.


Often, we try to spend our lives entertained. Whether through work, games, movies, friends, or social media, one of our ultimate goals is to be and remain entertained.

Netflix Inc. is an American company that provides entertainment to users all over the world through a streaming service offering a plethora of films and television series.

I am a Netflix user, so I found this dataset very interesting and thoroughly enjoyed analyzing it. The dataset contains a list of TV shows and movies on Netflix. It comes from Kaggle and can be found here.

The data was analyzed using several Python libraries: NumPy, pandas, seaborn, Plotly, and Matplotlib.

Let’s get to it.

IMPORT NEEDED LIBRARIES.

I started with importing the libraries needed to analyze and eventually visualize the data.

IMPORT THE DATA FROM DESKTOP.

I downloaded the dataset from the site, extracted the file from the zip file, and proceeded to import it to the notebook.

Imported the dataset after downloading.
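
The import step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name load_titles, the file name netflix_titles.csv, and the column names in the small stand-in sample are all assumptions.

```python
import io

import pandas as pd

# Tiny stand-in for the downloaded CSV so the sketch runs anywhere.
# The column names are assumed to match the Kaggle file.
SAMPLE_CSV = """show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in
s1,Movie,Example Film,Director A,"Actor A, Actor B",United States,"January 1, 2020",2019,TV-MA,90 min,"Dramas, International Movies"
s2,TV Show,Example Series,,"Actor C",India,"March 5, 2019",2018,TV-14,2 Seasons,"TV Dramas"
"""


def load_titles(source):
    """Read the titles CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# With the real download this would be load_titles("netflix_titles.csv"),
# where the file name is an assumption.
df = load_titles(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.head())
```

Quoted fields such as the cast list keep their internal commas, and the empty director field in the second row is read as a missing value.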

After this, I wanted a concise summary of the dataset, so I called the DataFrame's .info() method, as seen below.
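
For readers without the screenshot, here is a small sketch of what a summary call looks like on a made-up DataFrame. The columns and values are placeholders, and the missing-value count is an extra step that the article does not mention.

```python
import io

import pandas as pd

# Placeholder data; the real dataset has many more rows and columns.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie"],
    "title": ["A", "B", "C"],
    "director": ["Director A", None, "Director B"],
    "release_year": [2019, 2018, 2005],
})

# info() prints to stdout by default; passing buf captures the text instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Missing values per column, a common companion to info().
missing = df.isna().sum()
print(missing)
```

The summary lists each column with its non-null count and dtype, which is a quick way to spot columns with gaps.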

I then sorted the data by rating, age, and date. After sorting, I classified the movies by country and then classified the movies and TV shows by genre, as seen below.

Classification by countries
Classification by genre
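
The sorting and classification steps might look something like the sketch below. It assumes the country and listed_in columns hold comma-separated values and that sorting uses the rating and release_year columns; the helper count_values and the sample rows are hypothetical.

```python
import pandas as pd

# Placeholder rows; column names are assumed to match the dataset.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["United States, India", "India", "United States", None],
    "listed_in": ["Dramas, Comedies", "Dramas", "Documentaries", "Comedies"],
    "rating": ["TV-MA", "TV-14", "PG", "TV-MA"],
    "release_year": [2019, 2015, 2020, 2011],
})

# Sort by rating, then newest release first within each rating.
sorted_df = df.sort_values(["rating", "release_year"], ascending=[True, False])


def count_values(frame, column):
    """Count titles per value in a comma-separated column."""
    values = frame[column].dropna().str.split(",").explode().str.strip()
    return values.value_counts()


by_country = count_values(df, "country")
by_genre = count_values(df, "listed_in")
print(sorted_df)
print(by_country)
print(by_genre)
```

Splitting before counting means a title produced in two countries contributes to both, which is one possible design choice; counting the raw strings instead would treat each combination as its own group.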

Next, I analyzed the entire dataset to find and visualize the most common movie ratings under the rating system. I used seaborn to plot the graph, as seen below.

Top movie ratings based on rating system.
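
A chart of this kind can be approximated with seaborn's countplot. The sketch below uses placeholder data and assumes the type and rating column names; the original plot may have been built differently.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder rows standing in for the real dataset.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "Movie"],
    "rating": ["TV-MA", "TV-14", "TV-MA", "TV-MA", "PG"],
})

movies = df[df["type"] == "Movie"]

# Order the bars from the most to the least common rating.
order = movies["rating"].value_counts().index.tolist()

fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(data=movies, x="rating", order=order, ax=ax)
ax.set_title("Movie ratings")
ax.set_xlabel("Rating")
ax.set_ylabel("Number of movies")
```

In a notebook the figure renders inline; calling fig.savefig with a file name of your choice exports it.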

Using Matplotlib, I visualized the number of viewers by maturity level, with maturity grouped into three categories: Adults, Kids, and Teens. Here, I plotted the number of viewers against the maturity level.

Number of viewers according to maturity level.
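
One way to build such a grouping is to map each rating to a maturity level and plot the counts. The mapping below is a hypothetical one, since the article does not show which ratings went into which group, and the sketch counts titles per group, which is an assumption about what the chart measured.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rating-to-group mapping; adjust it to your own grouping.
MATURITY = {
    "G": "Kids", "TV-Y": "Kids", "TV-Y7": "Kids", "TV-G": "Kids",
    "PG": "Kids", "TV-PG": "Kids",
    "PG-13": "Teens", "TV-14": "Teens",
    "R": "Adults", "NC-17": "Adults", "TV-MA": "Adults",
}

# Placeholder ratings, including one missing value.
df = pd.DataFrame(
    {"rating": ["TV-MA", "TV-14", "R", "PG", "TV-Y", "TV-MA", None]}
)

# Ratings without a mapping (or missing) become NaN and are not counted.
groups = df["rating"].map(MATURITY)
counts = groups.value_counts().reindex(["Adults", "Teens", "Kids"], fill_value=0)

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_xlabel("Maturity level")
ax.set_ylabel("Number of titles")
print(counts)
```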

Using Seaborn, I showed the ratings for the movies and shows. I plotted the total count of movies against the TV ratings.

Ratings for movies and shows

After this, I visualized the data for several categories, including Director, Country, Cast, and Listed_in.

Data visualization for different categories
5 top directors visualization.
5 top countries visualization.
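
A top-five ranking for any of these categories can be sketched with one small helper. The function top_n and its split parameter are hypothetical names, and the sample rows are placeholders. Whether to split comma-separated entries is a judgment call: leaving directors unsplit keeps co-directors together as one joint entry.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows; column names are assumed to match the dataset.
df = pd.DataFrame({
    "director": ["Director A", "Director B", "Director A", None],
    "country": ["United States", "India, United States", "India", "United States"],
    "cast": ["Actor X, Actor Y", "Actor X", None, "Actor Z, Actor X"],
})


def top_n(frame, column, n=5, split=True):
    """Return the n most frequent values in a column.

    With split=True, comma-separated entries are counted individually.
    """
    values = frame[column].dropna()
    if split:
        values = values.str.split(",").explode().str.strip()
    return values.value_counts().head(n)


top_directors = top_n(df, "director", split=False)
top_countries = top_n(df, "country")
top_cast = top_n(df, "cast")

# Horizontal bars read well when the labels are names.
fig, ax = plt.subplots()
ax.barh(top_countries.index, top_countries.values)
ax.set_xlabel("Number of titles")
ax.set_title("Top countries")
print(top_directors, top_countries, top_cast, sep="\n")
```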

Please check out the complete code here.

From the dataset, the analysis, and the visualizations, I noticed that:

About 69.05% of the titles on Netflix are movies, while the remaining 30.95% are TV shows.

Most of the content was released after 2010.

The US has the highest number of movies/shows on the platform, followed by India.

Raúl Campos and Jan Suter jointly top the list of directors, followed by Marcus Raboy, Jay Karas, Cathy Garcia-Molina, and Youssef Chahine, in that order.

All the top actors are from India, with Anupam Kher appearing 41 times in the dataset, followed by Shah Rukh Khan, Om Puri, Naseeruddin Shah, and Akshay Kumar, in that order.

The top categories are International Movies, followed by Dramas, Comedies, Documentaries, and Action & Adventure, in that order.
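
Percentages like the movie and TV show split, or the share of titles released after 2010, can be derived with a couple of pandas one-liners. The sketch below uses four placeholder rows, so its numbers are illustrative and are not the figures reported above; the type and release_year column names are assumptions.

```python
import pandas as pd

# Placeholder rows standing in for the real dataset.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "release_year": [2005, 2015, 2018, 2020],
})

# Share of each content type, as a percentage rounded to two decimals.
type_share = (df["type"].value_counts(normalize=True) * 100).round(2)

# Share of titles released after 2010: the mean of a boolean column
# is the fraction of True values.
after_2010_share = (df["release_year"] > 2010).mean() * 100

print(type_share)
print(after_2010_share)
```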


Written by Oyinkansola Awosan

Technical Writer, Open Source Enthusiast, Machine Learning & Site Reliability Engineer
