Creating beautiful graphs in Python with Plotly
After I managed to pull data from Twitter’s API using Python, I got thinking about what my next step would be. Then it hit me – I could draw data from external sources and find a nice library to visualise it in some pretty graphs. Plotly is that library.
My thinking was that this could be used in a SIEM-like way to make sense of the huge amounts of data spat out by cyber security solutions. To test it out, though, I decided to fight back against those writing Chelsea off after a bit of a rocky period by comparing their Premier League performance this year to the beginning of the title-winning 2016/17 season.
Setting everything up
As with most Python projects, we have to begin with a little bit of setup. Plotly’s quite simple in this regard, though, and can be imported with a single line – or two words, to be exact.
Giving Plotly our data
Next up, we need to prepare our data for Plotly. This involves setting up variables for the X and Y axes for each series – in this case, the 2016/17 and 2017/18 Premier League seasons.
lastseason_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
lastseason_y = [3, 6, 9, 10, 10, 10, 13, 16, 19, 22, 25, 28, 31]
thisseason_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
thisseason_y = [0, 3, 6, 9, 10, 13, 13, 13, 16, 19, 22, 25, 26]
The X axis values are matchdays and the Y axis values are Chelsea’s cumulative points total after each matchday. If I were using Plotly to graph a more dynamic data set (figures fetched from an API, for example), this step would be a bit more complex.
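A more dynamic source would probably hand over match results rather than ready-made totals. As a minimal sketch, assuming results arrive as a string of W/D/L codes (a hypothetical input format, not how the data above was obtained), the running totals and matchday numbers can be derived instead of typed in. The two result strings are simply read back from the point totals above, so the output should match those lists.

```python
from itertools import accumulate

# Points awarded per Premier League result
POINTS = {'W': 3, 'D': 1, 'L': 0}

def cumulative_points(results):
    """Turn a sequence of 'W'/'D'/'L' results into a running points total."""
    return list(accumulate(POINTS[r] for r in results))

# Hypothetical input format: one character per match, in order
lastseason_results = 'WWWDLLWWWWWWW'
thisseason_results = 'LWWWDWLLWWWWD'

lastseason_y = cumulative_points(lastseason_results)
thisseason_y = cumulative_points(thisseason_results)

# Matchday numbers are just 1..n, one per result
lastseason_x = list(range(1, len(lastseason_results) + 1))
thisseason_x = list(range(1, len(thisseason_results) + 1))

print(lastseason_y)  # [3, 6, 9, 10, 10, 10, 13, 16, 19, 22, 25, 28, 31]
print(thisseason_y)  # [0, 3, 6, 9, 10, 13, 13, 13, 16, 19, 22, 25, 26]
```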
Plotting our data on the graph
Now we need to tell Plotly how exactly we want this data to appear on the graph. I’ll define two lines to be plotted – one representing last season and one for this season.
lastseason_line = plotly.graph_objs.Scatter(
    x=lastseason_x,
    y=lastseason_y,
    mode='lines+markers',  # line through the points, with a marker at each one
    name='2016/17'         # label shown in the legend
)
thisseason_line = plotly.graph_objs.Scatter(
    x=thisseason_x,
    y=thisseason_y,
    mode='lines+markers',
    name='2017/18'
)
As you can see, I’ve assigned the X and Y axis values I’ve just entered to their corresponding lines and given each series a name. The “lines+markers” mode draws each series as a line with a marker at every data point, which is great for comparing the two seasons match by match.
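Because the two Scatter calls differ only in their data and name, they could also be generated in a loop. The sketch below is an alternative rather than the approach used here: make_traces is a hypothetical helper, and it builds each trace as a plain dict with type='scatter', which Plotly accepts in a figure’s data list in place of a graph_objs.Scatter object.

```python
# Hypothetical helper: one trace per season from a mapping of
# season name -> cumulative points, built as plain dicts
def make_traces(seasons, mode='lines+markers'):
    traces = []
    for name, points in seasons.items():
        traces.append(dict(
            type='scatter',                     # same chart type as graph_objs.Scatter
            x=list(range(1, len(points) + 1)),  # matchdays 1..n
            y=points,
            mode=mode,                          # e.g. 'lines', 'markers' or 'lines+markers'
            name=name,                          # label shown in the legend
        ))
    return traces

data = make_traces({
    '2016/17': [3, 6, 9, 10, 10, 10, 13, 16, 19, 22, 25, 28, 31],
    '2017/18': [0, 3, 6, 9, 10, 13, 13, 13, 16, 19, 22, 25, 26],
})
```

Adding a third season then means adding one entry to the mapping rather than copying another block of code.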
Formatting and exporting the graph
Now to bring everything together and make it look pretty. As you can see, I’ve created a variable for my data series and a layout variable containing a handful of formatting options that pretty much do what they say (“dtick” sets the spacing between ticks on an axis, so a value of 1 gives every matchday its own label). Handily, Plotly accepts RGB or HTML hex colour codes for the colour settings.
data = [lastseason_line, thisseason_line]
layout = dict(title='Chelsea Premier League performance – 2016/17 vs. 2017/18',
              xaxis=dict(title='Matchday',
                         dtick=1),          # one tick per matchday
              yaxis=dict(title='Points'),
              font=dict(family='Arial',
                        color='#ddd'))      # hex colour codes need the leading '#'
fig = dict(data=data, layout=layout)
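Since the colour settings accept either notation, '#ddd' in the layout is the same light grey as rgb(221, 221, 221). The small helper below is hypothetical and not part of Plotly, which takes either form directly; it just shows how a shorthand hex code expands, which can help when matching colours between tools.

```python
def hex_to_rgb(hex_colour):
    """Convert '#rgb' or '#rrggbb' to an 'rgb(r, g, b)' string."""
    digits = hex_colour.lstrip('#')
    if len(digits) == 3:
        # Shorthand form: each digit is doubled, so 'ddd' -> 'dddddd'
        digits = ''.join(d * 2 for d in digits)
    if len(digits) != 6:
        raise ValueError(f'not a hex colour: {hex_colour!r}')
    # Each pair of hex digits is one 0-255 channel value
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f'rgb({r}, {g}, {b})'

print(hex_to_rgb('#ddd'))     # rgb(221, 221, 221)
print(hex_to_rgb('#1f77b4'))  # rgb(31, 119, 180)
```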
I then combine the data series and the layout into a single figure, a dictionary called fig, which I feed into Plotly’s offline graph generation function, plotly.offline.plot(), telling it to save the result to a file called cfc_points.htm.
As you can see, the end result is a neat interactive graph that looks much flashier than anything anybody ever produced in Microsoft Excel. It also shows that Chelsea’s points tally is only five points lower than at this stage of the last Premier League season (26 vs. 31 after 13 matchdays), which isn’t bad considering we’re also in the Champions League this year!
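The size of the gap can be checked from the two lists rather than by eye. This short sketch subtracts this season’s total from last season’s at each matchday, so a positive number means 2017/18 is behind and a negative one means it is ahead.

```python
lastseason_y = [3, 6, 9, 10, 10, 10, 13, 16, 19, 22, 25, 28, 31]
thisseason_y = [0, 3, 6, 9, 10, 13, 13, 13, 16, 19, 22, 25, 26]

# Points behind (positive) or ahead of (negative) last season, per matchday
gaps = [last - this for last, this in zip(lastseason_y, thisseason_y)]

for matchday, gap in enumerate(gaps, start=1):
    print(f'Matchday {matchday:2}: {gap:+d}')
```

The largest deficit is the five points at matchday 13, and matchday 6 is the only point where this season was ahead (by three).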
This was a fairly trivial example, but at some point I’d love to grab live data from somewhere and use Plotly to generate near-live graphs that make that data easier to understand. Perhaps I’ll have a chance over my Christmas break…
A note: I’m only just delving into the world of Python, and these posts are as much to get things straight in my own head as they are to show them to others. If anything looks wrong, or there’s a more efficient way of doing something, please let me know!
Photo © @cfcunofficial (Chelsea Debs) (CC BY-SA 2.0). Cropped.