Introduction

This page shows how to use Bokeh to recreate the example plot from the introductory blog post on Fundamentals of data visualization.

There are two ways to display visualizations created with Bokeh. By default, Bokeh displays plots in a web browser, but you can instead display them inline in a Jupyter notebook by importing and calling output_notebook() before showing the plots.
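For the browser route, a minimal sketch (assuming Bokeh 3.x) is to write the plot to a standalone HTML file. The snippet below builds a tiny figure with made-up data and writes it with save() rather than show(), so no browser window opens; the file name plot.html and the temporary directory are illustrative choices, not part of the original notebook.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# A tiny illustrative figure (sample values, not the NCDC data)
p = figure(title="Demo", height=250, width=400)
p.line(x=[1, 2, 3], y=[4, 6, 5], line_width=2)

# Write a standalone HTML file that loads BokehJS from the CDN.
# Outside a notebook, show(p) roughly does this and then opens the file.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "plot.html")
save(p, filename=path, resources=CDN, title="Demo")
```

In a notebook you would skip the file entirely: call output_notebook() once, then show(p) renders the plot in the cell output.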

In [1]:
from bokeh.io import output_notebook

output_notebook()

Line plot

Data preparation

In [2]:
# import the relevant libraries
import pandas as pd
In [3]:
file = "../data/csv_files/ncdc_normals.csv.zip"
df = pd.read_csv(file)

# Map the station IDs of the four locations to readable names;
# rows from every other station get NaN and are dropped
df["location"] = df["station_id"].map(
    {
        "USW00014819": "Chicago",
        "USW00093107": "San Diego",
        "USW00012918": "Houston",
        "USC00042319": "Death Valley",
    }
)
df = df.dropna()

# Select only the relevant columns
df = df[["month", "location", "temperature"]]

# Average the temperature for each month and location, then pivot the
# locations into columns so there is one row per month (groupby sorts by month)
df = df.groupby(["month", "location"])["temperature"].mean().unstack()
df = df.reset_index()


# Convert the month number (1-12) to its abbreviated name (Jan, Feb, ...)
def get_month_name(month):
    return pd.to_datetime(str(month), format="%m").strftime("%b")


df["month"] = df["month"].apply(get_month_name)

df
Out[3]:
location  month    Chicago  Death Valley    Houston  San Diego
0           Jan  24.848387     53.451613  54.041935  55.845161
1           Feb  28.906897     59.944828  57.341379  56.517241
2           Mar  38.848387     68.448387  63.338710  57.351613
3           Apr  50.450000     76.293333  69.856667  59.893333
4           May  60.900000     86.606452  77.051613  63.200000
5           Jun  71.006667     95.546667  81.996667  66.543333
6           Jul  75.845161    102.241935  83.796774  71.048387
7           Aug  74.148387    100.193548  84.145161  72.603226
8           Sep  66.403333     91.053333  80.060000  70.746667
9           Oct  54.251613     77.148387  72.148387  65.912903
10          Nov  41.550000     62.603333  63.150000  60.003333
11          Dec  29.000000     51.748387  55.596774  55.200000
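The reshaping step in the data-preparation cell (average per month and location, then pivot the locations into columns) is easiest to see on a toy long-format table. The values below are made up for illustration and are not taken from the NCDC file; the names long_df and wide_df are hypothetical.

```python
import pandas as pd

# Toy long-format data: one row per (month, location) reading
long_df = pd.DataFrame(
    {
        "month": [1, 1, 1, 2, 2, 2],
        "location": ["Chicago", "Chicago", "Houston", "Chicago", "Houston", "Houston"],
        "temperature": [20.0, 30.0, 50.0, 28.0, 56.0, 58.0],
    }
)

# Mean temperature per (month, location); unstack() moves the inner index
# level ("location") into columns, giving one row per month
wide_df = (
    long_df.groupby(["month", "location"])["temperature"]
    .mean()
    .unstack()
    .reset_index()
)

# Map month numbers to abbreviated names, as get_month_name does
wide_df["month"] = wide_df["month"].apply(
    lambda m: pd.to_datetime(str(m), format="%m").strftime("%b")
)

print(wide_df)
```

The two Chicago readings for January collapse into their mean, and each location becomes its own column, which is exactly the wide layout that line() later reads column by column.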

Plotting

First, create a figure object with the figure() function; it serves as the canvas that holds the plot. figure() accepts many optional keyword arguments, including the following:

  • title: Specifies the title of the figure.

  • height, width: Set the height and width of the figure in pixels.

  • sizing_mode: Specifies how the plot resizes within its layout. For example, "stretch_width" makes the plot fill the available width while keeping its fixed height. For more information, see the sizing mode section of the Bokeh documentation.

  • toolbar_location: Specifies where the toolbar is placed relative to the figure ("above", "below", "left", or "right"). Set it to None to hide the toolbar.

  • x_range: Sets the range of the x-axis. For a categorical axis, such as one labeled with month names, pass a FactorRange that lists the categories in the order they should appear.

  • x_axis_label, y_axis_label: Set the labels for the x-axis and y-axis, respectively.
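As a sketch of these keyword arguments in isolation, assuming Bokeh 3.x, the snippet below builds an empty figure with a categorical x-axis. Instead of constructing a FactorRange explicitly, it passes x_range as a plain list of strings, which figure() converts to a FactorRange; the three-month list is illustrative.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

months = ["Jan", "Feb", "Mar"]  # illustrative categories

p = figure(
    title="Demo figure",
    height=300,
    sizing_mode="stretch_width",
    toolbar_location=None,  # hide the toolbar entirely
    x_range=months,  # a list of strings is wrapped in a FactorRange
    x_axis_label="month",
    y_axis_label="temperature (F)",
)

# Inspect what figure() built from the keyword arguments
print(isinstance(p.x_range, FactorRange), list(p.x_range.factors))
print(p.title.text, p.xaxis[0].axis_label, p.yaxis[0].axis_label)
```

Passing FactorRange(factors=...) explicitly, as the notebook does, is equivalent; it is mainly useful when you want to set other FactorRange properties such as padding.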

Next, call the figure's line() glyph method to draw the line plot. line() accepts the following parameters:

  • x, y (required): The x and y coordinates of the data points. Each can be an array-like sequence of values or the name of a column in the data source.

  • source (optional): The data source that holds the x and y columns. It can be a ColumnDataSource or a pandas DataFrame, which Bokeh converts to a ColumnDataSource automatically.

  • legend_label (optional): The label to display in the legend for the line.

  • color (optional): The color of the line. It can be specified as a string (e.g., "red", "#FF0000") or as a column name from the data source that contains color values.

  • line_width (optional): The width of the line in pixels.
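To make the x/y and source options concrete, the sketch below (assuming Bokeh 3.x and pandas) draws two lines on one figure. The first passes plain lists for x and y; the second passes column names together with a small DataFrame as source. The data values and the name toy are illustrative only.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

months = ["Jan", "Feb", "Mar"]
p = figure(x_range=months, height=300, toolbar_location=None)

# 1) Array-like x and y: Bokeh builds a ColumnDataSource behind the scenes
p.line(
    x=months,
    y=[25.0, 29.0, 39.0],
    legend_label="Chicago",
    color="#CC00CC",
    line_width=2.5,
)

# 2) Column names plus a DataFrame: the DataFrame is converted to a
#    ColumnDataSource, and x/y refer to its columns by name
toy = pd.DataFrame({"month": months, "Houston": [54.0, 57.0, 63.0]})
p.line(
    x="month",
    y="Houston",
    source=toy,
    legend_label="Houston",
    color="#3399FF",
    line_width=2.5,
)

r1, r2 = p.renderers
print(isinstance(r2.data_source, ColumnDataSource), r2.data_source.column_names)
```

Either form produces a glyph renderer backed by a ColumnDataSource; the column-name form is convenient when several lines read from the same table.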

By calling line() multiple times, you can add several lines to a single figure, as shown below:

In [4]:
from bokeh.plotting import figure, show
from bokeh.models import FactorRange
In [5]:
# plot a line graph

# create figure object
p = figure(
    title="Figure 2.3 Daily temperature normals",
    height=400,
    sizing_mode="stretch_width",
    toolbar_location=None,
    x_range=FactorRange(factors=df.month),
    x_axis_label="month",
    y_axis_label="temperature (F)",
)

# call line glyph
p.line(
    x="month",
    y="Death Valley",
    legend_label="Death Valley",
    color="#FF9933",
    line_width=2.5,
    source=df,
)

p.line(
    x="month",
    y="Houston",
    legend_label="Houston",
    color="#3399FF",
    line_width=2.5,
    source=df,
)

p.line(
    x="month",
    y="San Diego",
    legend_label="San Diego",
    color="#00994C",
    line_width=2.5,
    source=df,
)

p.line(
    x="month",
    y="Chicago",
    legend_label="Chicago",
    color="#CC00CC",
    line_width=2.5,
    source=df,
)

p.legend.location = "bottom"

show(p)
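The four line() calls above differ only in the column name and the color, so they can also be generated in a loop over a location-to-color mapping. The sketch below is not the original notebook's code: to stay self-contained it uses a hypothetical sample_df holding only the Jan and Feb rows of the table above (in the notebook you would use df), and it wraps the data in one shared ColumnDataSource named source so that the four lines reference a single copy of the data instead of four.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Stand-in for the prepared df: the Jan and Feb rows of the table above
sample_df = pd.DataFrame(
    {
        "month": ["Jan", "Feb"],
        "Chicago": [24.848387, 28.906897],
        "Death Valley": [53.451613, 59.944828],
        "Houston": [54.041935, 57.341379],
        "San Diego": [55.845161, 56.517241],
    }
)
source = ColumnDataSource(sample_df)  # one source shared by every line

# Location -> line color, same pairs as the explicit calls above
colors = {
    "Death Valley": "#FF9933",
    "Houston": "#3399FF",
    "San Diego": "#00994C",
    "Chicago": "#CC00CC",
}

p = figure(
    title="Figure 2.3 Daily temperature normals",
    height=400,
    sizing_mode="stretch_width",
    toolbar_location=None,
    x_range=list(sample_df["month"]),
    x_axis_label="month",
    y_axis_label="temperature (F)",
)

# One line() call per location, reading y from that location's column
for location, color in colors.items():
    p.line(
        x="month",
        y=location,
        legend_label=location,
        color=color,
        line_width=2.5,
        source=source,
    )

p.legend.location = "bottom"
# show(p)  # uncomment to render the figure
```

Adding a fifth city then only needs one more entry in colors, provided the DataFrame has a matching column.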