Fetching and filtering tweets in Python with Tweepy

2017-11-24 Programming

If you hadn’t already heard, I’ve been experimenting with some Python recently, trying to build on the foundation that I picked up at the SANS Cyber Retraining Academy. While I’d already had some success playing around with isolated little scripts, I decided it was time to start pulling data from external sources – starting with Twitter.

As an exercise to familiarise myself with these kinds of functions, I attempted to build a simple app that would pull the most recent tweets about the London Underground’s Northern line and give an idea of how it’s running at any particular time. Here’s what I cobbled together using Tweepy and a few handy online tutorials.

Setting everything up

Before we get started, we’ve got some setup to do. First, we’ll import both Tweepy itself and HTMLParser (we’ll see what that’s used for a bit later on).

import tweepy
import HTMLParser  # Python 2 module; Python 3 renamed it html.parser

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

To communicate with Twitter, Tweepy needs a few credentials. To get them, you’ll need to set up a Twitter app through the social network’s developer website. The consumer key and secret, along with the access token and secret, are all available from the app’s developer dashboard. Keep them private: don’t hard-code them into anything you share or that will run beyond your own computer.
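One way to keep the keys out of the script itself is to read them from environment variables. The sketch below is written for Python 3 and uses only the standard library; the four variable names (TWITTER_CONSUMER_KEY and so on) and the load_twitter_credentials helper are hypothetical choices, not anything Tweepy requires.

```python
import os

# Hypothetical environment variable names; choose whatever suits your setup
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(environ=os.environ):
    """Return the four Twitter keys as a tuple, read from environment variables.

    Raises KeyError naming every missing variable, so the script fails loudly
    instead of sending empty keys to Twitter.
    """
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in CREDENTIAL_VARS)
```

You would then unpack the returned tuple and pass the values to tweepy.OAuthHandler and auth.set_access_token in place of the placeholder strings. Accepting the mapping as a parameter also makes the helper easy to check without touching your real environment.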

Fetching the tweets

Next, it’s time to pull the @NorthernLine tweets from Twitter’s API. Luckily, Tweepy makes it easy to grab a user’s tweets in just a single line of code.

# Gets the ten most recent tweets from @NorthernLine
nl_tweets = api.user_timeline(screen_name="northernline", count=10)
tweetno = 1

If it’s not already obvious, the first line gets the tweets. To use posts from another account, simply replace “northernline” with their username. The count parameter tells Tweepy how many tweets we want. Although my target is to display three tweets, some may be replies or retweets that I want to filter out, so I’m fetching ten.

Next, I set up the tweetno variable. It keeps track of how many tweets we’ve successfully printed, so I can limit the number displayed on-screen to three.
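Twitter’s user_timeline endpoint can also do some of this filtering itself through its exclude_replies and include_rts parameters, which Tweepy passes through as keyword arguments. Twitter applies count before removing replies and retweets, so you can still get back fewer tweets than you asked for, which is another reason to over-fetch. A minimal Python 3 sketch, where fetch_recent_tweets is a hypothetical helper and api is assumed to be the authenticated tweepy.API object created earlier:

```python
def fetch_recent_tweets(api, screen_name, count=10):
    """Ask Twitter for up to `count` recent tweets, minus replies and retweets.

    `api` is assumed to be an authenticated tweepy.API object. Because Twitter
    counts tweets before filtering, the result may hold fewer than `count`.
    """
    return api.user_timeline(
        screen_name=screen_name,
        count=count,
        exclude_replies=True,  # ask Twitter to drop replies
        include_rts=False,     # ask Twitter to drop native retweets
    )
```

Usage would look like nl_tweets = fetch_recent_tweets(api, "northernline"). Passing api in as a parameter means a fake object can stand in for it when testing without network access.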

Filtering the tweets

Now we want to parse the text, filter out replies, and only print the first three relevant tweets. To do that, we’ll start by iterating through each tweet in our data.

for tweet in nl_tweets:

    # Gets each tweet's text and converts HTML entities back into symbols
    currenttweet = HTMLParser.HTMLParser().unescape(tweet.text)

    # Skips replies (tweets beginning with @) and native retweets
    if currenttweet.startswith('@') or hasattr(tweet, 'retweeted_status'):
        continue

    # Breaks the loop once three tweets have been printed
    elif tweetno > 3:
        break

For each tweet, HTMLParser converts HTML character entities (such as &amp;) back into the symbols they stand for. Unescaping Tweepy’s tweet.text in this way means our quotes, ampersands and other symbols display correctly in the output.
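The HTMLParser module only exists under that name in Python 2. On Python 3, the standard library’s html.unescape function (available since Python 3.4) does the same job, and the old HTMLParser().unescape method was removed in Python 3.9. A small sketch, where clean_tweet_text is a hypothetical helper and the sample text is made up:

```python
import html


def clean_tweet_text(text):
    """Convert HTML entities such as &amp; and &gt; back into plain characters."""
    return html.unescape(text)


# Made-up sample text containing HTML entities
print(clean_tweet_text("Good service &amp; no delays &gt; Morden"))
# prints: Good service & no delays > Morden
```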

Next, I filter out replies by checking whether the tweet begins with an @ symbol. If it does, we assume the tweet is a reply, and continue skips it and moves on to the next tweet in the loop.
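Checking for a leading @ catches most replies, but the tweet objects Tweepy returns also carry Twitter’s own fields: in_reply_to_status_id is None unless the tweet is a reply, and native retweets have a retweeted_status attribute. The Python 3 sketch below combines those checks. The helper names are hypothetical, and SimpleNamespace objects stand in for real Tweepy statuses so it runs without network access.

```python
from types import SimpleNamespace


def is_reply(tweet):
    """True if Twitter marks the tweet as a reply, or its text starts with @."""
    # in_reply_to_status_id is None for tweets that aren't replies
    if getattr(tweet, "in_reply_to_status_id", None) is not None:
        return True
    return tweet.text.startswith("@")


def is_retweet(tweet):
    """True for native retweets, which carry a retweeted_status attribute."""
    return hasattr(tweet, "retweeted_status")


def is_original(tweet):
    """True for tweets that are neither replies nor retweets."""
    return not (is_reply(tweet) or is_retweet(tweet))


# Stand-in tweets (hypothetical data, not real @NorthernLine posts)
tweets = [
    SimpleNamespace(text="Good service", in_reply_to_status_id=None),
    SimpleNamespace(text="@rider Sorry about that", in_reply_to_status_id=101),
    SimpleNamespace(text="Update: now running", in_reply_to_status_id=102),
    SimpleNamespace(text="RT @someone: Plan ahead", in_reply_to_status_id=None,
                    retweeted_status=object()),
]
print([t.text for t in tweets if is_original(t)])
# prints: ['Good service']
```

The third stand-in shows the case the @ check alone would miss: a reply (for example, the next tweet in a thread) whose text doesn’t start with a mention.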

Finally, I check how many tweets we’ve already printed by looking at the tweetno variable. If it’s greater than three, we’ve printed our three tweets, so break stops the iteration.
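The tweetno counter and break work fine, but Python’s itertools.islice can handle the counting instead: a generator expression filters the tweets lazily, and islice stops once it has taken the first three matches. A Python 3 sketch working on plain strings, where first_n_non_replies and the sample texts are hypothetical:

```python
from itertools import islice


def first_n_non_replies(texts, n=3):
    """Return up to n texts that don't start with @, keeping their original order."""
    # The generator only examines texts until islice has taken n of them
    non_replies = (text for text in texts if not text.startswith("@"))
    return list(islice(non_replies, n))


sample = [
    "@rider Sorry to hear that",
    "Good service",
    "@commuter Thanks for letting us know",
    "Minor delays",
    "Severe delays",
    "Good service again",
]
print(first_n_non_replies(sample))
# prints: ['Good service', 'Minor delays', 'Severe delays']
```

If fewer than n texts survive the filter, the function simply returns a shorter list, much like the original loop printing fewer than three tweets.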

Printing the filtered tweets

Now that we’re left with the tweets we actually want (original tweets, not retweets or replies, with their proper symbols), we can print them in a neat little list.

    # Prints tweets that meet the criteria
    else:
        print(tweet.created_at)
        print(currenttweet)
        print('')
        tweetno += 1

My else block prints the tweet’s date and time with Tweepy’s tweet.created_at, prints the contents of my currenttweet variable, and prints a blank line to separate it from the next tweet. Finally, we advance the tweetno counter by one.
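Because tweet.created_at is a Python datetime object, you can format the timestamp however you like with strftime-style codes. The Python 3 sketch below builds the same three-line block as the else branch (timestamp, text, then a blank line); format_tweet is a hypothetical helper, and the timestamp and text are made up. Bear in mind that Twitter’s timestamps are in UTC, which is an hour behind London time during British Summer Time.

```python
from datetime import datetime


def format_tweet(created_at, text):
    """Return timestamp and text on separate lines, followed by a newline.

    print() adds one more newline, which produces the blank separator line.
    """
    return "{:%Y-%m-%d %H:%M}\n{}\n".format(created_at, text)


# Hypothetical timestamp and text standing in for a real tweet
print(format_tweet(datetime(2017, 11, 24, 8, 5), "Good service"))
```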

Output

If everything works correctly, this will give us a neat little list of the three most recent tweets from @NorthernLine – perfect for checking for any transport issues before I leave my flat!

A note: I’m only just delving into the world of Python, and these posts are as much to get things straight in my own head as they are to show them to others. If anything looks wrong, or there’s a more efficient way of doing something, please let me know!



MattCASmith