This page demonstrates how to recreate, with Bokeh, the example plot used in the introductory blog post on Fundamentals of data visualization.
There are two ways to display visualizations created with Bokeh. By default, Bokeh opens plots in a web browser. Alternatively, you can display plots inline in a Jupyter notebook by importing and calling `output_notebook()` before showing them.
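Below is a minimal sketch of both options. `output_notebook()`, `output_file()` and `save()` are functions in `bokeh.io`. The `RUNNING_IN_JUPYTER` flag, the file name `temperature_demo.html` and the three-point plot are hypothetical placeholders. The snippet writes the figure to disk with `save()` instead of calling `show()`, so it also runs outside a notebook and without opening a browser.

```python
from bokeh.io import output_file, output_notebook, save
from bokeh.plotting import figure

RUNNING_IN_JUPYTER = False  # hypothetical flag; set to True inside a notebook

if RUNNING_IN_JUPYTER:
    # From now on, show(p) renders plots inline below the notebook cell
    output_notebook()
else:
    # Send output to a standalone HTML file instead (placeholder file name)
    output_file("temperature_demo.html", title="Temperature demo")

# A tiny placeholder plot
p = figure(title="Demo", height=250, width=400)
p.line([1, 2, 3], [3, 1, 2], line_width=2)

# show(p) would display the plot (inline, or in a new browser tab);
# save(p) only writes the HTML file and returns its absolute path
path = save(p) if not RUNNING_IN_JUPYTER else None
```

In a notebook you would set the flag to `True` and call `show(p)` at the end of the cell.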
```python
# Import the relevant libraries
import pandas as pd

file = "../data/csv_files/ncdc_normals.csv.zip"
df = pd.read_csv(file)

# Map the station IDs of the four locations to readable names;
# rows from all other stations get NaN and are dropped below
df["location"] = df["station_id"].map(
    {
        "USW00014819": "Chicago",
        "USW00093107": "San Diego",
        "USW00012918": "Houston",
        "USC00042319": "Death Valley",
    }
)
df = df.dropna()

# Select only the relevant columns
df = df[["month", "location", "temperature"]]

# Average the temperature per month and location (groupby sorts by month),
# then pivot so that each location becomes its own column
df = df.groupby(["month", "location"])["temperature"].mean().unstack()
df = df.reset_index()

# Convert the month number (1-12) to its abbreviated name (Jan-Dec)
df["month"] = pd.to_datetime(df["month"].astype(str), format="%m").dt.strftime("%b")
df
```
location | month | Chicago | Death Valley | Houston | San Diego |
---|---|---|---|---|---|
0 | Jan | 24.848387 | 53.451613 | 54.041935 | 55.845161 |
1 | Feb | 28.906897 | 59.944828 | 57.341379 | 56.517241 |
2 | Mar | 38.848387 | 68.448387 | 63.338710 | 57.351613 |
3 | Apr | 50.450000 | 76.293333 | 69.856667 | 59.893333 |
4 | May | 60.900000 | 86.606452 | 77.051613 | 63.200000 |
5 | Jun | 71.006667 | 95.546667 | 81.996667 | 66.543333 |
6 | Jul | 75.845161 | 102.241935 | 83.796774 | 71.048387 |
7 | Aug | 74.148387 | 100.193548 | 84.145161 | 72.603226 |
8 | Sep | 66.403333 | 91.053333 | 80.060000 | 70.746667 |
9 | Oct | 54.251613 | 77.148387 | 72.148387 | 65.912903 |
10 | Nov | 41.550000 | 62.603333 | 63.150000 | 60.003333 |
11 | Dec | 29.000000 | 51.748387 | 55.596774 | 55.200000 |
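The preparation steps above can be tried on a toy dataset to see what each one does. The readings below are made up, and `OTHER_STATION` is a hypothetical station ID that is not in the mapping. The pipeline itself mirrors the steps above: map, drop, select, average, unstack and rename the months.

```python
import pandas as pd

# Hypothetical readings in long format: one row per station and month,
# including a duplicate Chicago January reading and an unrelated station
raw = pd.DataFrame(
    {
        "station_id": [
            "USW00014819", "USW00014819", "USW00014819",
            "USW00012918", "USW00012918", "OTHER_STATION",
        ],
        "month": [1, 1, 2, 1, 2, 1],
        "temperature": [24.0, 26.0, 29.0, 54.0, 57.0, 70.0],
    }
)

# Map station IDs to names; unmapped stations become NaN and are dropped
raw["location"] = raw["station_id"].map(
    {"USW00014819": "Chicago", "USW00012918": "Houston"}
)
tidy = raw.dropna()[["month", "location", "temperature"]]

# Average per (month, location), then pivot locations into columns
wide = (
    tidy.groupby(["month", "location"])["temperature"]
    .mean()
    .unstack()
    .reset_index()
)

# Turn month numbers into abbreviated month names
wide["month"] = pd.to_datetime(wide["month"].astype(str), format="%m").dt.strftime("%b")
print(wide)
```

The two Chicago January readings are averaged into a single value, and the unrelated station disappears. The `location` label in the header of the table above is not a data column. It is the name of the column index that `unstack()` creates.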
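First, you create a figure object using the `figure()` function, which serves as the basis for the plot. Among others, it accepts the following optional keyword arguments:
`title`
: Specifies the title of the figure.
`height`, `width`
: Set the height and width of the figure in pixels.
`sizing_mode`
: Specifies how the plot resizes within its container or layout. For example, `"stretch_width"` stretches the figure to fill the available width while keeping its height fixed. For more information, see the sizing mode section of the Bokeh documentation.
`toolbar_location`
: Specifies where the toolbar is placed relative to the figure (`"above"`, `"below"`, `"left"`, or `"right"`). Set it to `None` to hide the toolbar.
`x_range`
: Sets the range of the x-axis. For categorical data such as month names, pass a `FactorRange` (or a plain list of category names) listing the factors in the order they should appear.
`x_axis_label`, `y_axis_label`
: Set the labels of the x-axis and y-axis, respectively.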
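To check these arguments, the sketch below builds a small figure with placeholder values and then inspects the resulting object. It assumes a recent Bokeh release (3.x), where `width` and `height` replace the older `plot_width` and `plot_height`. It also shows that a plain list of category names passed as `x_range` is converted into a `FactorRange`.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

p = figure(
    title="Placeholder title",
    height=300,                      # pixels
    width=500,                       # pixels
    toolbar_location=None,           # hide the toolbar entirely
    x_range=["Jan", "Feb", "Mar"],   # a list of strings becomes a categorical FactorRange
    x_axis_label="month",
    y_axis_label="temperature (F)",
)

print(type(p.x_range).__name__, list(p.x_range.factors))
```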
Then, you add a `line()` glyph to the `figure` object to draw the line plot. The `line()` method accepts the following parameters:
`x`, `y` (required)
: The x and y coordinates of the data points. Each can be given as an array-like sequence of values or as the name of a column in `source`.
`source` (optional)
: The data source holding the columns named by `x` and `y`. It can be a `ColumnDataSource` or an object that Bokeh converts into one, such as a pandas DataFrame.
`legend_label` (optional)
: The label to display in the legend for this line.
`color` (optional)
: The color of the line, given as a named color (e.g., `"red"`) or a hex string (e.g., `"#FF0000"`). Because `line()` draws a single connected path, one color applies to the whole line.
`line_width` (optional)
: The width of the line in pixels.
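The sketch below uses made-up numbers to contrast the two ways of supplying `x` and `y`. The first passes raw sequences. The second passes column names that are looked up in a `source`, here a pandas DataFrame that Bokeh wraps in a `ColumnDataSource` automatically. Each call returns a `GlyphRenderer`, and the prints only confirm what was created.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

p = figure(height=300, width=500)

# 1) x and y as plain sequences: no source needed
r_seq = p.line([1, 2, 3], [3, 1, 2], color="red", line_width=2)

# 2) x and y as column names looked up in the source (made-up data)
frame = pd.DataFrame({"day": [1, 2, 3], "temp": [10.5, 12.0, 11.2]})
r_cols = p.line(
    x="day",
    y="temp",
    source=frame,        # converted to a ColumnDataSource by Bokeh
    legend_label="temp",
    color="#FF0000",     # a single color for the whole line
    line_width=2.5,
)

print(len(p.renderers), type(r_cols.data_source).__name__)
```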
By calling `line()` multiple times, you can add multiple lines to a single figure, as shown below:
```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

# Wrap the DataFrame once so that all four lines share a single data source
source = ColumnDataSource(df)

# Create the figure object; the categorical x-axis lists the months in order
p = figure(
    title="Figure 2.3 Daily temperature normals",
    height=400,
    sizing_mode="stretch_width",
    toolbar_location=None,
    x_range=FactorRange(factors=df["month"].tolist()),
    x_axis_label="month",
    y_axis_label="temperature (F)",
)

# Call the line glyph once per location, each with its own color and legend entry
colors = {
    "Death Valley": "#FF9933",
    "Houston": "#3399FF",
    "San Diego": "#00994C",
    "Chicago": "#CC00CC",
}
for location, color in colors.items():
    p.line(
        x="month",
        y=location,
        legend_label=location,
        color=color,
        line_width=2.5,
        source=source,
    )

p.legend.location = "bottom"
show(p)
```