This is my second project working with sample data, completed as part of the freeCodeCamp Data Analysis with Python Certification. In it, I visualize time series data using a line chart, a bar chart, and box plots. I use Pandas, Matplotlib, and Seaborn to visualize a dataset containing the number of page views each day on the freeCodeCamp.org forum from 2016-05-09 to 2019-12-03, then look at the patterns in visits to identify yearly and monthly growth.
When importing the data, we will set the index to the date column. We will use df.head(5) to see if the data has been imported correctly.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
# Import Data
# parse_dates turns the date strings into timestamps, so the index is a DatetimeIndex
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/time_series_visualizer/fcc-forum-pageviews.csv', parse_dates=['date'], index_col='date')
df.head(5)
| date | value |
|---|---|
| 2016-05-09 | 1201 |
| 2016-05-10 | 2329 |
| 2016-05-11 | 1716 |
| 2016-05-12 | 10539 |
| 2016-05-13 | 6933 |
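To check that the index holds real dates rather than plain strings, a small sketch like the one below can help. It is only an illustration: `csv_text` is a hypothetical three-row stand-in for the real CSV file (using the first rows shown above), and `sample` is a name I made up for this example.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for fcc-forum-pageviews.csv
csv_text = "date,value\n2016-05-09,1201\n2016-05-10,2329\n2016-05-11,1716\n"

# parse_dates converts the date column to timestamps; index_col makes it the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# A DatetimeIndex exposes date parts such as .year directly
print(type(sample.index).__name__)
print(sample.index.year.tolist())
```

With a date-typed index, later steps can read the year and month directly from the index instead of converting strings first.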
Clean the data by filtering out days when the page views were in the top 2.5% of the dataset or bottom 2.5% of the dataset.
df = df[
    (df['value'] < df['value'].quantile(0.975)) &
    (df['value'] > df['value'].quantile(0.025))
]
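To see what this filter does on a small scale, here is a minimal sketch using made-up data. The series `views` (the integers 1 to 100) is an assumption chosen so the cut-offs are easy to follow; it is not the forum dataset.

```python
import pandas as pd

# Hypothetical page-view counts 1..100, not the real forum data
views = pd.Series(range(1, 101), name="value")

# Cut-offs at the 2.5th and 97.5th percentiles (pandas interpolates linearly by default)
low = views.quantile(0.025)
high = views.quantile(0.975)

# Keep only the rows strictly between the two cut-offs
kept = views[(views > low) & (views < high)]

print(low, high)
print(kept.min(), kept.max(), len(kept))
```

Because the comparisons are strict, a value exactly equal to a cut-off would be dropped as well.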
draw_line_plot Function
Create a draw_line_plot function that uses Matplotlib to draw a line chart. The title should be Daily freeCodeCamp Forum Page Views 5/2016-12/2019. The label on the x axis should be Date and the label on the y axis should be Page Views.
def draw_line_plot():
    # Draw line plot
    df.plot.line(color = 'red', figsize = (20,5))
    plt.xlabel("Date")
    plt.ylabel("Page Views")
    plt.title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")

draw_line_plot()
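As a variation on the function above, the same chart can be built on an explicit figure and axes and then returned, which makes it easier to save or inspect afterwards. This is only a sketch: `draw_line_plot_sketch` and `sample` are hypothetical names, the five rows are the first values shown earlier, and the `Agg` backend is selected so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical five-day stand-in for the cleaned dataset
sample = pd.DataFrame(
    {"value": [1201, 2329, 1716, 10539, 6933]},
    index=pd.date_range("2016-05-09", periods=5, freq="D"),
)

def draw_line_plot_sketch(data):
    # Create the figure and axes explicitly instead of relying on pyplot state
    fig, ax = plt.subplots(figsize=(20, 5))
    ax.plot(data.index, data["value"], color="red")
    ax.set_xlabel("Date")
    ax.set_ylabel("Page Views")
    ax.set_title("Daily freeCodeCamp Forum Page Views 5/2016-12/2019")
    return fig

fig = draw_line_plot_sketch(sample)
```

Returning `fig` means the caller can decide what to do with it, for example calling `fig.savefig(...)` with a file name of their choice.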
draw_bar_plot Function
Create a draw_bar_plot function that draws a bar chart. It should show average daily page views for each month grouped by year. The legend should show month labels and have a title of Months. On the chart, the label on the x axis should be Years and the label on the y axis should be Average Page Views.
def draw_bar_plot():
    # Copy and modify data for monthly bar plot
    df_bar = df.copy()
    df_bar.reset_index(inplace=True)
    df_bar['Years'] = pd.DatetimeIndex(df_bar['date']).year
    df_bar['Months'] = pd.DatetimeIndex(df_bar['date']).month_name()
    df_bar = df_bar.groupby(['Years', 'Months'])['value'].mean().reset_index()

    # Draw bar plot
    plt.figure(figsize=(10, 6))
    plot = sns.barplot(data = df_bar, x = 'Years', y = 'value', hue = 'Months',
                       hue_order = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
                                    'August', 'September', 'October', 'November', 'December'))
    plt.xlabel('Years')
    plt.ylabel('Average Page Views')
    plt.title('Average Daily Page Views by Month and Year')
    sns.move_legend(plot, 'upper left')

draw_bar_plot()
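The grouping step in the function above can be checked on a handful of rows. The sketch below uses invented values (`sample` is hypothetical) and pivots the monthly means into a year-by-month table, which is one way to confirm the averages before plotting them.

```python
import pandas as pd

# Hypothetical page views across two months in two years
sample = pd.DataFrame(
    {"value": [100, 200, 300, 500, 700, 900]},
    index=pd.to_datetime(
        ["2016-05-09", "2016-05-10", "2016-06-01",
         "2017-05-01", "2017-06-01", "2017-06-02"]
    ),
)

# Average the daily values for every (year, month) pair
monthly = (
    sample.groupby([sample.index.year, sample.index.month_name()])["value"]
    .mean()
    .rename_axis(["Years", "Months"])
)

# Pivot months into columns; select them explicitly to get calendar order
table = monthly.unstack("Months")[["May", "June"]]
print(table)
```

Selecting the columns by name is needed because the grouped month names come back sorted alphabetically, not in calendar order. This is the same reason the bar plot passes an explicit `hue_order`.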
draw_box_plot Function
Create a draw_box_plot function that uses Seaborn to draw two adjacent box plots. These box plots should show how the values are distributed within a given year or month and how they compare over time. The title of the first chart should be Year-wise Box Plot (Trend) and the title of the second chart should be Month-wise Box Plot (Seasonality). Make sure the month labels on the bottom start at Jan and the x and y axes are labeled correctly. The boilerplate includes commands to prepare the data.
def draw_box_plot():
    # Prepare data for box plots (this part is done!)
    df_box = df.copy()
    df_box.reset_index(inplace=True)
    df_box['date'] = pd.to_datetime(df_box['date'])
    df_box['year'] = [d.year for d in df_box['date']]
    df_box['month'] = [d.strftime('%b') for d in df_box['date']]

    # Draw box plots (using Seaborn)
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))

    # Left panel: distribution of daily page views per year
    sns.boxplot(data = df_box, x = 'year', y = 'value', ax = axes[0])
    axes[0].set_xlabel('Year')
    axes[0].set_ylabel('Page Views')
    axes[0].set_title('Year-wise Box Plot (Trend)')

    # Right panel: distribution per month, forced into calendar order
    sns.boxplot(data = df_box, x = 'month', y = 'value',
                order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
                         'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], ax = axes[1])
    axes[1].set_xlabel('Month')
    axes[1].set_ylabel('Page Views')
    axes[1].set_title('Month-wise Box Plot (Seasonality)')

draw_box_plot()
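The month-wise box plot depends on the abbreviated month labels appearing in calendar order. One way to make that order explicit in the data itself, rather than only in the plot call, is an ordered categorical. The sketch below is an illustration with three invented dates; it assumes an English locale, since `strftime("%b")` is locale-dependent.

```python
import pandas as pd

month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical dates, deliberately not in calendar-month order
dates = pd.to_datetime(["2016-12-01", "2017-01-15", "2017-03-20"])

# Abbreviated month names as an ordered categorical (assumes an English locale)
months = pd.Categorical(dates.strftime("%b"), categories=month_order, ordered=True)

# Sorting now follows the calendar rather than the alphabet
print(list(months.sort_values()))
```

Plain strings would sort alphabetically (Dec before Jan); the ordered categorical keeps calendar order whenever the column is sorted or grouped.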
Through this project, I practiced several plotting techniques with Matplotlib and Seaborn. The line chart, bar chart, and box plots make the patterns in the page-view data easier to see than the raw numbers alone.