# A Guide For Time Series Visualization With Python 3

### Introduction

Time-series analysis belongs to a branch of statistics that involves the study of ordered, often temporal data. When applied appropriately, time-series analysis can reveal unexpected trends, extract useful statistics, and even forecast trends into the future. For these reasons, it is applied across many fields including economics, weather forecasting, and capacity planning, to name a few.

In this tutorial, we will introduce some common techniques used in time-series analysis and walk through the iterative steps required to manipulate and visualize time-series data.

## Prerequisites

This guide will cover how to do time-series analysis on either a local desktop or a remote server. Working with large datasets can be memory intensive, so in either case, the computer will need at least **2GB of memory** to perform some of the calculations in this guide.

For this tutorial, we'll be using **Jupyter Notebook** to work with the data. If you do not have it already, you should follow our tutorial to install and set up Jupyter Notebook for Python 3.

## Step 1 Installing Packages

We will leverage the `pandas` library, which offers a lot of flexibility when manipulating data, and the `statsmodels` library, which allows us to perform statistical computing in Python. Used together, these two libraries extend Python to offer greater functionality and significantly increase our analytical toolkit.

Like with other Python packages, we can install `pandas` and `statsmodels` with `pip`. First, let's move into our local programming environment or server-based programming environment:

- cd environments

- . my_env/bin/activate

From here, let's create a new directory for our project. We will call it `timeseries` and then move into the directory. If you give the project a different name, be sure to substitute your name for `timeseries` throughout the guide:

- mkdir timeseries
- cd timeseries

We can now install `pandas`, `statsmodels`, and the data plotting package `matplotlib`. Their dependencies will also be installed:

- pip install pandas statsmodels matplotlib

At this point, we're now set up to start working with `pandas` and `statsmodels`.
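To confirm that the installation worked, one option is to print the installed versions from a Python session inside the activated environment. This is only a quick sanity check, and the version numbers you see will depend on your machine.

```python
# Quick sanity check: import each package and report its version.
import matplotlib
import pandas as pd
import statsmodels

versions = {
    "pandas": pd.__version__,
    "statsmodels": statsmodels.__version__,
    "matplotlib": matplotlib.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If any of the imports fails, the corresponding package was most likely installed outside the active environment.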

## Step 2 Loading Time-series Data

To begin working with our data, we will start up Jupyter Notebook:

- jupyter notebook

To create a new notebook file, select **New** > **Python 3** from the top right pull-down menu.

This will open a notebook which allows us to load the required libraries (notice the standard shorthands used to reference `pandas`, `matplotlib` and `statsmodels`). At the top of our notebook, we should write the following:

```
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
```

After each code block in this tutorial, you should type `ALT + ENTER` to run the code and move into a new code block within your notebook.

Conveniently, `statsmodels` comes with built-in datasets, so we can load a time-series dataset straight into memory.

We'll be working with a dataset called "Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A.," which collected CO2 samples from March 1958 to December 2001. We can bring in this data as follows:

```
data = sm.datasets.co2.load_pandas()
co2 = data.data
```

Let's check what the first 5 lines of our time-series data look like:

```
print(co2.head(5))
```

```
Output
              co2
1958-03-29  316.1
1958-04-05  317.3
1958-04-12  317.6
1958-04-19  317.5
1958-04-26  316.4
```

With our packages imported and the CO2 dataset ready to go, we can move on to indexing our data.

## Step 3 Indexing with Time-series Data

You may have noticed that the dates have been set as the index of our `pandas` DataFrame. When working with time-series data in Python we should ensure that dates are used as an index, so make sure to always check for that, which we can do by running the following:

```
co2.index
```

```
Output
DatetimeIndex(['1958-03-29', '1958-04-05', '1958-04-12', '1958-04-19',
'1958-04-26', '1958-05-03', '1958-05-10', '1958-05-17',
'1958-05-24', '1958-05-31',
...
'2001-10-27', '2001-11-03', '2001-11-10', '2001-11-17',
'2001-11-24', '2001-12-01', '2001-12-08', '2001-12-15',
'2001-12-22', '2001-12-29'],
dtype='datetime64[ns]', length=2284, freq='W-SAT')
```

The `dtype='datetime64[ns]'` field confirms that our index is made of date stamp objects, while `length=2284` and `freq='W-SAT'` tell us that we have 2,284 weekly date stamps falling on Saturdays.
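If your own data stores dates in a regular column rather than in the index, a sketch along the following lines may help. The small DataFrame, its column names (`date`, `value`), and its numbers are made up for illustration and are not part of the CO2 dataset.

```python
import pandas as pd

# Hypothetical raw data: dates stored as strings in an ordinary column.
raw = pd.DataFrame({
    "date": ["2001-12-01", "2001-12-08", "2001-12-15", "2001-12-22"],
    "value": [370.3, 370.8, 371.2, 371.3],
})

# Parse the strings into timestamps and use them as the index.
raw["date"] = pd.to_datetime(raw["date"])
indexed = raw.set_index("date").sort_index()

# Confirm that the index is now a DatetimeIndex.
print(type(indexed.index).__name__)
print(indexed.index.dtype)
```

Sorting the index is not strictly required, but date-based slicing behaves most predictably on a sorted index.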

Weekly data can be tricky to work with, so let's use the monthly averages of our time-series instead. This can be obtained by using the convenient `resample` function, which allows us to group the time-series into buckets (1 month), apply a function on each group (mean), and combine the result (one row per group).

```
y = co2['co2'].resample('MS').mean()
```

Here, the term `MS` means that we group the data in buckets by month and ensures that we are using the start of each month as the timestamp:

```
y.head(5)
```

```
Output
1958-03-01 316.100
1958-04-01 317.200
1958-05-01 317.120
1958-06-01 315.800
1958-07-01 315.625
Freq: MS, Name: co2, dtype: float64
```
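To see the group, apply, and combine behaviour of `resample` on a small scale, here is a sketch using a made-up weekly series (the values are invented, not taken from the CO2 dataset). Each monthly result is the mean of the weekly readings whose dates fall in that month.

```python
import pandas as pd

# Hypothetical weekly readings on Saturdays, starting 2001-11-03.
idx = pd.date_range("2001-11-03", periods=8, freq="W-SAT")
weekly = pd.Series([10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0], index=idx)

# Group by calendar month ('MS' labels each group with the month start)
# and take the mean of each group.
monthly = weekly.resample("MS").mean()
print(monthly)

# The same result computed by hand for November.
november = weekly[weekly.index.month == 11]
print(november.mean())
```

Here the four November readings average to 13.0 and the four December readings to 23.0, so the resampled series has one row per month.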

An interesting feature of `pandas` is its ability to handle date stamp indices, which allow us to quickly slice our data. For example, we can slice our dataset to only retrieve data points from the year `1990` onward:

```
y['1990':]
```

```
Output
1990-01-01 353.650
1990-02-01 354.650
...
2001-11-01 369.375
2001-12-01 371.020
Freq: MS, Name: co2, dtype: float64
```

Or, we can slice our dataset to only retrieve data points between October `1995` and October `1996`:

```
y['1995-10-01':'1996-10-01']
```

```
Output
1995-10-01 357.850
1995-11-01 359.475
1995-12-01 360.700
1996-01-01 362.025
1996-02-01 363.175
1996-03-01 364.060
1996-04-01 364.700
1996-05-01 365.325
1996-06-01 364.880
1996-07-01 363.475
1996-08-01 361.320
1996-09-01 359.400
1996-10-01 359.625
Freq: MS, Name: co2, dtype: float64
```
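The same slicing works on any series with a `DatetimeIndex`. The sketch below builds a small made-up monthly series so that the results are easy to verify by eye. Note that, unlike ordinary Python slices, label-based date slices include both endpoints.

```python
import pandas as pd

# Hypothetical monthly series: 24 months starting January 1995,
# with values 0, 1, 2, ... so each value identifies its position.
idx = pd.date_range("1995-01-01", periods=24, freq="MS")
s = pd.Series(range(24), index=idx)

# Everything from 1996 onward (1996 itself is included).
from_1996 = s.loc["1996":]

# A closed range: both October 1995 and October 1996 are included.
window = s.loc["1995-10-01":"1996-10-01"]

# A partial date string selects a whole period, here all of 1995.
only_1995 = s.loc["1995"]

print(len(from_1996), len(window), len(only_1995))
```

The closed range covers 13 months rather than 12, which matches the 13 rows in the October 1995 to October 1996 output above.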

With our data properly indexed for working with temporal data, we can move on to handling values that may be missing.

## Step 4 Handling Missing Values in Time-series Data

Real-world data tends to be messy, and it is not uncommon for time-series data to contain missing values. The simplest way to check for those is either by directly plotting the data or by using the command below, which reports the number of missing values in the output:

```
y.isnull().sum()
```

```
Output
5
```

This output tells us that there are 5 months with missing values in our time series.

Generally, we should "fill in" missing values if they are not too numerous so that we don't have gaps in the data. We can do this in `pandas` using the `fillna()` command. For simplicity, we can fill in each missing value with the next non-null value in our time series (a backward fill), although it is important to note that a rolling mean would sometimes be preferable.

```
y = y.fillna(y.bfill())
```

With missing values filled in, we can once again check to see whether any null values exist to make sure that our operation worked:

```
y.isnull().sum()
```

```
Output
0
```

After performing these operations, we see that we have successfully filled in all missing values in our time series.
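For comparison, here is a sketch of a few alternative fill strategies on a tiny made-up series. The rolling-mean variant is only one possible reading of the "rolling mean" idea mentioned earlier, not necessarily the approach this guide has in mind: it replaces each gap with the mean of the non-null values in a centred 3-point window.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series with one missing value.
idx = pd.date_range("2000-01-01", periods=5, freq="MS")
s = pd.Series([1.0, 2.0, np.nan, 6.0, 7.0], index=idx)

backward = s.bfill()          # use the next valid value          -> 6.0
forward = s.ffill()           # use the previous valid value      -> 2.0
linear = s.interpolate()      # straight line between neighbours  -> 4.0

# Centred 3-point rolling mean that ignores NaNs inside the window,
# used only to fill the gaps.
rolling = s.rolling(window=3, center=True, min_periods=1).mean()
smoothed = s.fillna(rolling)  # mean of 2.0 and 6.0               -> 4.0

print(pd.DataFrame({"bfill": backward, "ffill": forward,
                    "linear": linear, "rolling": smoothed}))
```

Which strategy is appropriate depends on the data: a backward or forward fill keeps observed values only, while interpolation and rolling means introduce values that were never measured.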

## Step 5 Visualizing Time-series Data

When working with time-series data, a lot can be revealed through visualizing it. A few things to look out for are:

- **seasonality**: *does the data display a clear periodic pattern?*
- **trend**: *does the data follow a consistent upwards or downwards slope?*
- **noise**: *are there any outlier points or missing values that are not consistent with the rest of the data?*

We can use the `pandas` wrapper around the `matplotlib` API to display a plot of our dataset:

```
y.plot(figsize=(15, 6))
plt.show()
```

Some distinguishable patterns appear when we plot the data. The time-series has an obvious seasonality pattern, as well as an overall increasing trend. We can also visualize our data using a method called time-series decomposition. As its name suggests, time-series decomposition allows us to decompose our time series into three distinct components: trend, seasonality, and noise.
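One simple way to make the trend easier to see by eye is to overlay a 12-month rolling mean on the raw series. The sketch below uses a synthetic series (a linear trend plus a sine-shaped yearly cycle, invented for illustration) as a stand-in so that it runs on its own; in the notebook you could apply the same calls to `y` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the monthly series: linear trend + yearly cycle.
idx = pd.date_range("1990-01-01", periods=120, freq="MS")
months = np.arange(120)
series = pd.Series(350 + 0.1 * months + 3 * np.sin(2 * np.pi * months / 12),
                   index=idx, name="synthetic")

# Averaging over 12 consecutive months cancels the yearly cycle,
# leaving a smooth estimate of the trend.
rolling_mean = series.rolling(window=12).mean()

ax = series.plot(figsize=(15, 6), label="monthly values")
rolling_mean.plot(ax=ax, label="12-month rolling mean")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.legend()
plt.show()
```

The first 11 values of the rolling mean are empty because a full 12-month window is not yet available at those points.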

Fortunately, `statsmodels` provides the convenient `seasonal_decompose` function to perform seasonal decomposition out of the box. It implements a classical decomposition based on moving averages. If you are interested in learning more about a related, more robust approach, see the paper "STL: A Seasonal-Trend Decomposition Procedure Based on Loess."

The script below shows how to perform time-series seasonal decomposition in Python. By default, `seasonal_decompose` returns a figure of relatively small size, so the first two lines of this code chunk ensure that the output figure is large enough for us to visualize.

```
from matplotlib import rcParams
rcParams['figure.figsize'] = 11, 9
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()
```

Using time-series decomposition makes it easier to quickly identify a changing mean or variation in the data. The plot above clearly shows the upwards trend of our data, along with its yearly seasonality. These can be used to understand the *structure* of our time-series. The intuition behind time-series decomposition is important, as many forecasting methods build upon this concept of structured decomposition to produce forecasts.

## Conclusion

If you've followed along with this guide, you now have experience visualizing and manipulating time-series data in Python.

To further improve your skill set, you can load in another dataset and repeat all the steps in this tutorial. For example, you may wish to read a CSV file using the `pandas` library or use the `sunspots` dataset that comes pre-loaded with the `statsmodels` library: `data = sm.datasets.sunspots.load_pandas().data`.
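For the CSV route, a minimal sketch might look like the following. The file contents, column names (`date`, `value`), and numbers are invented for illustration. An in-memory string stands in for a file on disk, and with a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """date,value
2020-01-05,10.0
2020-01-12,11.0
2020-01-19,
2020-01-26,13.0
2020-02-02,14.0
"""

# Parse the 'date' column as timestamps and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Repeat the tutorial's steps: count missing values, then take monthly means.
print(df["value"].isnull().sum())
monthly = df["value"].resample("MS").mean()
print(monthly)
```

Because `mean()` skips missing values by default, the January average here is computed from the three readings that are present.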