How passwords work – a simple demonstration in Python
We all use passwords every day, but how exactly do they work? It would be easy to assume that the services we use all hold huge databases with our usernames and passwords side by side, but the reality is much more interesting – and, of course, much more secure.
It’s easy to see that storing passwords in plain text like that would be a security nightmare, albeit one that we’ve unfortunately seen in real life. The alternative? One-way hash functions: mathematical algorithms that turn each password into a fixed-length value, called a hash, that can’t feasibly be reversed. Here’s how password systems use them to keep our credentials secure…
Disclaimer: The basic Python code in this post is intended to demonstrate the concepts behind the way passwords are stored and verified, but it is not secure in itself and should not be used for any kind of real password processing or storage. I’ll cover some of the ways in which the code is flawed and how it could be improved later in the post.
Setting things up
As with any Python project, the first step is to import any libraries we’ll need to call on later in the code. In this case, we’ll be importing two very useful modules.
The hashlib module lets us generate hashes, which is how passwords are (or at least should be) stored. Storing passwords in plain text is a terrible security practice, but it does happen, and it means every password is compromised the moment an attacker gains access to the database.
Instead, we run passwords through complex, one-way mathematical formulas to generate hashes. Even if an attacker accesses the database, they’d have to try thousands – if not millions – of possible passwords before finding one that matches a single hash.
The getpass module is much simpler. Its getpass() function works like the built-in input() function (raw_input() in Python 2) but hides what the user types, which prevents shoulder surfing when they’re entering sensitive data.
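The setup itself is just two import lines. The short sketch below adds a small helper (md5_hex, a hypothetical name used only for illustration) to show the two properties of hashlib.md5 that the rest of the post relies on: the same input always produces the same hash, and a one-character change produces a completely different one.

```python
import getpass  # imported to mirror the setup step; this snippet doesn't call it
import hashlib  # provides hash functions such as md5() and sha256()


def md5_hex(text):
    """Return the MD5 hash of a string as a 32-character hex string."""
    # Hash functions operate on bytes, so the string is encoded first.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5_hex("password123"))
    print(md5_hex("password123"))  # identical to the line above
    print(md5_hex("password124"))  # completely different, despite one changed character
```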
Setting a password
Before we can check the user’s password, we need to ask them to set one. After they enter it, we’ll generate a hash and store that in a variable.
The function declares the global variable pass_hash (so that it can assign to it), asks the user for a password and generates an MD5 hash of the string. Only the hash is stored; the plain-text password is discarded rather than kept, so it isn’t lying around waiting for someone to find it.
Checking the password
Next, we need a function that checks the password by generating a hash from the user’s input and comparing it to the stored hash of their password.
When the user enters their password, the generated hash is stored in the variable pass_attempt_hash and the entered password is discarded. If the attempt hash matches the stored hash, the login is successful; if it doesn’t, the login fails.
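A matching sketch of the check, assuming the names check_password, pass_hash and pass_attempt_hash, with the same hypothetical read_password parameter and a hard-coded stored hash so the snippet runs on its own. Two changes from a bare == comparison are additions of mine: the function returns True or False, and it compares with hmac.compare_digest from the standard library, which avoids short-circuiting on the first differing character so response times reveal less about the stored hash.

```python
import getpass
import hashlib
import hmac

# Stored hash, as set_password would have left it. Hard-coded here for the
# hypothetical password "hunter2" so that this snippet is self-contained.
pass_hash = hashlib.md5("hunter2".encode("utf-8")).hexdigest()


def check_password(read_password=getpass.getpass):
    """Hash a login attempt and compare it with the stored hash."""
    attempt = read_password("Enter your password: ")
    pass_attempt_hash = hashlib.md5(attempt.encode("utf-8")).hexdigest()
    # Only the two hashes are compared; the attempt itself is not kept.
    if hmac.compare_digest(pass_attempt_hash, pass_hash):
        print("Login successful")
        return True
    print("Incorrect password")
    return False
```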
Here’s what the user sees. To make things easier to follow, I’ve removed the getpass functionality and added lines that show the hashes that are being generated to make it clear what is happening behind the scenes at each stage.
A hash is generated when the user sets their password, and a further hash is created for each password entry and checked against it. If a match is detected, the login is successful.
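The screenshot isn’t reproduced here, but a non-interactive version of the same walkthrough might look like the following. The passwords are hard-coded example strings standing in for what the user would type, and demo_session is a hypothetical name; it prints the stored hash and each attempt hash, and returns whether each attempt matched.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def demo_session(new_password, attempts):
    """Set a password, then check each attempt, printing the hashes involved."""
    pass_hash = md5_hex(new_password)
    print(f"Stored hash:  {pass_hash}")
    results = []
    for attempt in attempts:
        pass_attempt_hash = md5_hex(attempt)
        matched = pass_attempt_hash == pass_hash
        print(f"Attempt hash: {pass_attempt_hash} -> {'match' if matched else 'no match'}")
        results.append(matched)
    return results


if __name__ == "__main__":
    demo_session("hunter2", ["hunter1", "Hunter2", "hunter2"])
```

Only the last attempt matches: even changing the case of a single letter produces an unrelated hash.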
Problems with this code
As I mentioned in the disclaimer at the start of this post, this is a simple demonstration of how passwords work, and there are several reasons why this code is not secure and should never be used in any real project (besides the lack of a username prompt). Some of these are…
1. MD5 is not a secure hashing algorithm
While MD5 has its uses (checking downloaded files for accidental corruption, for example), it is not suitable for storing passwords. It is designed to be fast, which lets an attacker test enormous numbers of guesses per second, and practical ways to deliberately create collisions (two different inputs that generate the same hash) are known. And although the hashing process can’t be reversed directly, large, searchable databases of common passwords and their MD5 hashes are available online for any potential attacker to use.
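To illustrate why those lookup databases matter, here is a toy version of one. The word list is a tiny hypothetical stand-in for the millions of entries real tables contain. Because unsalted MD5 always maps the same password to the same hash, an attacker only has to hash a list of common passwords once; after that, "reversing" any matching hash is a dictionary lookup.

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical list of common passwords; real lists are vastly larger.
COMMON_PASSWORDS = ["123456", "password", "password123", "qwerty", "letmein"]

# Precompute hash -> password once; the table can be reused against any leak.
LOOKUP_TABLE = {md5_hex(pw): pw for pw in COMMON_PASSWORDS}


def crack(leaked_hash):
    """Return the password behind an unsalted MD5 hash, or None if it isn't in the table."""
    return LOOKUP_TABLE.get(leaked_hash)


if __name__ == "__main__":
    print(crack("5f4dcc3b5aa765d61d8327deb882cf99"))  # MD5 of "password"; prints: password
```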
2. The system lacks standard password handling safeguards
A user could enter absolutely anything as their password for this system, whether that’s “password”, “password123”, or even a single character, so minimum length and complexity rules should be enforced. Attackers could also throw as many guesses as they like at this code, as quickly as they can manage, in a brute-force attack. Login attempts should be limited, with lockouts triggered when too many incorrect attempts are detected.
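A minimal sketch of both safeguards follows. The thresholds (MIN_LENGTH = 8, MAX_ATTEMPTS = 3), the specific rule (at least one letter and one digit) and the helper names are illustrative assumptions, not recommendations. Login attempts are passed in as a list so the logic can be tested; in a real system the failure count would be tracked per account on the server and persist between sessions.

```python
import hashlib
import hmac
import string

MIN_LENGTH = 8    # illustrative minimum length
MAX_ATTEMPTS = 3  # illustrative number of failures before lockout


def is_acceptable(password):
    """Check a new password against simple length and complexity rules."""
    return (
        len(password) >= MIN_LENGTH
        and any(c in string.ascii_letters for c in password)
        and any(c in string.digits for c in password)
    )


def login(stored_hash, attempts):
    """Check attempts in order, locking out after MAX_ATTEMPTS failures.

    Returns "success", "locked", or "failed" (attempts ran out first).
    """
    failures = 0
    for attempt in attempts:
        attempt_hash = hashlib.md5(attempt.encode("utf-8")).hexdigest()
        if hmac.compare_digest(attempt_hash, stored_hash):
            return "success"
        failures += 1
        if failures >= MAX_ATTEMPTS:
            return "locked"  # further attempts, even correct ones, are refused
    return "failed"
```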
3. Hashing alone is not enough
Hashing algorithms are necessary, but they are not enough on their own. If two users have identical passwords, their hashes will also be identical, so an attacker who cracks one gains access to both accounts. To combat this, a random string called a salt is added to each user’s password before it is hashed. Salts are unique to each user, so each user’s hash is different even if their passwords are the same. The salt is stored alongside the hash (it doesn’t need to be secret) so the system can recompute the hash at login, and because every password has its own salt, precomputed lookup databases like the ones mentioned above stop working.
In the screenshot above, I’ve manually added salts to the end of two identical passwords. Note that the resulting hashes are different, so an attacker looking at the hashes alone can’t tell that the two passwords are identical, and each one has to be cracked separately.
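The same effect can be produced programmatically. This sketch makes two swaps relative to the rest of the post: the salt comes from os.urandom (16 random bytes) instead of being added by hand, and the hash is hashlib.pbkdf2_hmac with SHA-256, a deliberately slow, salted key-derivation function from the standard library, rather than MD5. The iteration count and the helper names hash_password and verify_password are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor: slower for attackers, and for you


def hash_password(password, salt=None):
    """Return (salt, hash) for a password, generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)  # unique per user; stored alongside the hash, not secret
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_digest):
    """Re-hash a login attempt with the user's stored salt and compare."""
    _, attempt_digest = hash_password(password, salt)
    return hmac.compare_digest(attempt_digest, stored_digest)


if __name__ == "__main__":
    # Two users choose the same password...
    salt_a, hash_a = hash_password("password123")
    salt_b, hash_b = hash_password("password123")
    # ...but their stored hashes differ because their salts differ.
    print(hash_a.hex())
    print(hash_b.hex())
```

Each user record would then store the salt and the digest together, and verify_password is called with both at login.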
A note: I’m only just delving into the world of Python, and these posts are as much to get things straight in my own head as they are to show them to others. If anything looks wrong, or there’s a more efficient way of doing something, please let me know!
Photo from Pixabay (CC0). Cropped.