Tuesday, May 12

Plotting time-series DataFrames in pandas

Pandas provides a convenience method for plotting DataFrames: DataFrame.plot. There is also a quick guide here.

Unfortunately, when it comes to time series data, I don't always find the convenience method convenient. I often have a sparse DataFrame with lots of NaNs, which are not ignored by the convenience method. Additionally, I don't like the way that matplotlib places the lines hard against the left and right-hand sides of the canvas. I like a little bit of space at each end of the chart. Finally, I like playing with the tick marks and tick labels to get the right density of information on the x-axis.

Rather than use the inconvenient convenience method, I regularly find myself writing a short function to produce the plot layout I find a little more aesthetically pleasing. An example chart (from Mark the Ballot) and the associated python code follows.


import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, YearLocator, MonthLocator
plt.style.use('ggplot')

def plot(df, filename, heading=None):

    fig, ax = plt.subplots(figsize=(8, 4))

    min_date = None
    max_date = None
    for col_name in df.columns.values:

        # plot the column
        col = df[col_name]
        col = col[col.notnull()] # drop NAs
        dates = [zzz.to_timestamp().date() for zzz in col.index]
        ax.plot_date(x=dates, y=col, fmt='-', label=col_name,
            tz=None, xdate=True, ydate=False, linewidth=1.5)

        # establish the date range for the data
        if min_date:
            min_date = min(min_date, min(dates))
        else:
            min_date = min(dates)
        if max_date:
            max_date = max(max_date, max(dates))
        else:
            max_date = max(dates)

    # give a bit of space at each end of the plot - aesthetics
    span = max_date - min_date
    extra = int(span.days * 0.03) * datetime.timedelta(days=1)
    ax.set_xlim([min_date - extra, max_date + extra])

    # format the x tick marks
    ax.xaxis.set_major_formatter(DateFormatter('%Y'))
    ax.xaxis.set_minor_formatter(DateFormatter('\n%b'))
    ax.xaxis.set_major_locator(YearLocator())
    ax.xaxis.set_minor_locator(MonthLocator(bymonthday=1, interval=2))

    # grid, legend and yLabel
    ax.grid(True)
    ax.legend(loc='best', prop={'size':'x-small'})
    ax.set_ylabel('Percent')

    # heading
    if heading:
        fig.suptitle(heading, fontsize=12)
    fig.tight_layout(pad=1.5)

    # footnote
    fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', 
        va='bottom', fontsize=8, color='#999999')

    # save to file
    fig.savefig(filename, dpi=125)

No comments:

Post a Comment