Python tools for Windows forensics: Deleted files in the Recycle Bin

2018-12-15  Cyber Security,   Programming

In a previous post I began building a Python tool that gathers Windows forensic artefacts and parses them into a timeline. In that post I wrote a function that gathers Windows Prefetch application data – this time, let’s take a look at the Recycle Bin.

What is the Recycle Bin and how does it help our investigation?

As you probably know, the Recycle Bin is where Windows puts files that the user has deleted. Information on deleted files is stored in C:\$Recycle.Bin\ inside a folder named after the user’s security identifier (SID), and held in two different types of file.

The first of these begins with $R followed by a short random string and contains the contents of the deleted file, meaning it can tell us the file size. The second starts with $I followed by the same string and contains metadata including the original file name, path, and deletion date and time.

Together, the information stored in both of these files gives us a decent amount of data on what the user deleted, when they deleted it, and where they deleted it from.

Some changes to the Python Windows forensics tool

Before we start digging into the Recycle Bin data, I should point out that I’ve made some changes to the structure of the forensics tool (which I’ve catchily named MCAS Windows Forensic Gatherer, or MCASWFG). Basically, I’ve split it up into functions.

The Prefetch module is one function, this Recycle Bin module will be another, and any future modules will be added as further functions. A main function gets the key information (username, Windows drive letter) from the analyst and then calls each of the modules.

As well as making the application easier to manage, this means I can use try statements to run each module individually and return an on-screen error if they fail, rather than causing the whole program to stop and preventing any subsequent modules from running.
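
As a rough illustration of that pattern, here is a minimal sketch of a runner that wraps each module in its own try statement. It is written for Python 3 (the snippets in this post are Python 2), and run_modules plus the three stub functions are hypothetical names of mine, not part of MCASWFG.

```python
def run_modules(modules):
    """Run each (name, function) pair, reporting failures without stopping."""
    results = {}
    for name, module in modules:
        try:
            module()
            results[name] = "ok"
        except Exception as error:
            # Report the failure on screen and carry on with the next module.
            print("%s module failed: %s" % (name, error))
            results[name] = "failed"
    return results


def prefetch_stub():
    # Hypothetical stand-in for the Prefetch module.
    pass


def recycle_bin_stub():
    # Hypothetical stand-in that simulates a failing module.
    raise ValueError("No SID provided.")


def later_stub():
    # Hypothetical stand-in for a module that runs after the failure.
    pass


results = run_modules([
    ("Prefetch", prefetch_stub),
    ("Recycle Bin", recycle_bin_stub),
    ("Later module", later_stub),
])
```

The point of the design is that the third module still runs even though the second one raised.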

Fetching the user’s SID

First off, we’ve got a bit of pre-work to do. As the Recycle Bin files sit in a directory named after the user’s SID, we have to fetch the SID itself before anything else. I’m doing this in a separate function to keep the code nice and tidy and to make it easier to work with and modify in future.

    global username
    global user_sid
    wmic_query = "wmic useraccount where name=\"" + username + "\" get sid"
    user_sid = subprocess.check_output(wmic_query, shell=True)
    user_sid = user_sid[4:].strip()

My first attempt at getting the user’s SID runs WMIC at the command line, using the username variable populated by the analyst at the beginning of the main function. The output starts with the column header “SID” and is padded with whitespace, so I slice off the header and strip the padding away.

    if user_sid == "":
        print "Unable to retrieve user SID."
        user_sid = raw_input("Enter SID manually, or press return: ")
        if user_sid == "":
            raise Exception
    print "User SID is %s." % user_sid
    print ""

If this fails for any reason, I give the analyst a chance to enter the SID manually. If they choose not to enter one, the function raises an exception for the main function to handle. If the SID is gathered successfully, it is printed to the screen to keep the analyst informed.
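
Slicing off a fixed number of characters works as long as the output always has the same shape. A slightly more defensive option is to look for the line that actually starts with “S-1-”. This is a minimal Python 3 sketch of that idea; parse_wmic_sid is a hypothetical helper, and the sample output and SID are made up for illustration.

```python
def parse_wmic_sid(wmic_output):
    """Return the first SID found in WMIC's text output, or "" if none."""
    for line in wmic_output.splitlines():
        line = line.strip()
        # Skip the "SID" column header and blank lines; SIDs start with "S-1-".
        if line.startswith("S-1-"):
            return line
    return ""


# Made-up output in the shape described above: a header, padding, then the SID.
sample_output = (
    "SID                                            \r\r\n"
    "S-1-5-21-1111111111-2222222222-3333333333-1001  \r\r\n"
    "\r\r\n"
)
sid = parse_wmic_sid(sample_output)
```

In Python 3, subprocess.check_output returns bytes, so the output would need decoding before being passed to a helper like this.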

Setting things up

Now onto the Recycle Bin function, where we first bring in the global windows_drive and user_sid variables and use them to construct the path of the Recycle Bin directory we’ll be querying.

    global windows_drive
    global user_sid
    if user_sid == "":
        print "No SID provided."
        raise Exception
    recycled_directory = windows_drive + ":\\$Recycle.Bin\\" + user_sid + "\\"
    print "Recycle Bin directory is %s." % recycled_directory

If no SID has been provided, the Recycle Bin module fails and raises an exception. If one is present, the directory is calculated and printed to the screen.
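
If you would rather not juggle backslashes by hand, the standard library’s ntpath module applies Windows path rules on any operating system. A minimal Python 3 sketch, assuming the drive letter is passed without a colon; recycle_bin_directory is a hypothetical helper name.

```python
import ntpath


def recycle_bin_directory(windows_drive, user_sid):
    """Build the per-user Recycle Bin path for a drive letter and SID."""
    if not user_sid:
        raise ValueError("No SID provided.")
    # The backslash after the colon matters: "C:" on its own would give a
    # drive-relative path such as "C:$Recycle.Bin" rather than "C:\$Recycle.Bin".
    return ntpath.join(windows_drive + ":\\", "$Recycle.Bin", user_sid)


directory = recycle_bin_directory("C", "S-1-5-21-1-2-3-1001")
```

Unlike the concatenated version, this result has no trailing backslash, so later joins should also go through ntpath.join (or os.path.join on Windows).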

Iterating through Recycle Bin files and finding $I files

Now we know where we’re looking, we can use the os.listdir function to list the files that are stored in the user’s SID folder within C:\$Recycle.Bin\ and see what we find.

    recycled_files = os.listdir(recycled_directory)
    for deleted_file in recycled_files:
        if deleted_file.startswith("$I"):
            full_path = recycled_directory + deleted_file

We’re currently only interested in $I files, so we iterate through the files in the directory and check whether each filename starts with $I. Checking the full prefix rather than a single character avoids matching unrelated files and won’t fail on very short names. If it matches, we know the full path to that file is the Recycle Bin SID directory plus the filename. This is saved to full_path.
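
The filtering step can be pulled out into a small function, which makes it easy to check against a list of names without touching a real Recycle Bin. This is a Python 3 sketch; find_i_files is a hypothetical helper and the filenames are made up.

```python
def find_i_files(filenames):
    """Return only the names that look like Recycle Bin $I metadata files."""
    return [name for name in filenames if name.startswith("$I")]


# Made-up directory listing: one $I/$R pair plus the folder's desktop.ini.
listing = ["$IABC123.txt", "$RABC123.txt", "desktop.ini"]
i_files = find_i_files(listing)
```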

Parsing the file’s original directory

Time to find the file’s original directory – i.e. where it was deleted from. To do this we need to open the $I file we located in the previous section and parse the contents.

            # $I files are binary, so open in binary mode to avoid newline translation on Windows.
            deleted_file_content = open(full_path, "rb")
            deleted_file_path = deleted_file_content.read()
            deleted_file_content.close()

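Unfortunately, it’s not as simple as reading the file. If we check an $I file’s contents, we can see that the path comes after a block of binary header data and appears to have a space between every letter. The “spaces” are actually null bytes: the path is stored as UTF-16 text, which uses two bytes for each ordinary character.

[Image: the contents of an $I file (recylcebin_ifiles.png)]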

To fix this, we’ll remove the first 28 characters from the beginning of each file and then iterate through the letters in the remaining string to remove the spaces.

            deleted_file_path = deleted_file_path[28:]
            string_length = len(deleted_file_path)
            deleted_file_path_parsed = ""
            x = 0
            while x < string_length:
                deleted_file_path_parsed += deleted_file_path[x]
                x += 2

This is achieved by calculating the length of the string with len. We then move through the letters two at a time, taking the characters either side of the spaces to store the deleted file’s original directory in the deleted_file_path_parsed variable.
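
An alternative to skipping every other character is to decode the path as UTF-16, which also copes with non-ASCII characters in file names. The Python 3 sketch below assumes the layout commonly documented for Windows 10 $I files: three 8-byte little-endian fields (format version, original size, deletion time as a FILETIME), a 4-byte character count, then the path. That matches the 28-byte offset used above, but older Windows versions lay the file out differently, so treat it as an assumption. parse_i_file is a hypothetical helper and the sample bytes are constructed by hand.

```python
import struct
from datetime import datetime, timedelta


def parse_i_file(data):
    """Parse the raw bytes of a $I file, assuming the Windows 10 layout."""
    # Three little-endian 64-bit values: version, original size, deletion FILETIME.
    version, size, filetime = struct.unpack_from("<qqq", data, 0)
    if version != 2:
        raise ValueError("Only the version 2 layout is handled in this sketch.")
    # A 32-bit count of UTF-16 characters in the path, including the terminator.
    (char_count,) = struct.unpack_from("<I", data, 24)
    raw_path = data[28:28 + char_count * 2]
    path = raw_path.decode("utf-16-le").rstrip("\x00")
    # A FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC).
    deleted_utc = datetime(1601, 1, 1) + timedelta(microseconds=filetime // 10)
    return size, deleted_utc, path


# Hand-built sample: a 1024-byte file deleted at 2018-12-15 00:00:00 UTC.
sample_path = "C:\\Users\\analyst\\Desktop\\r\u00e9sum\u00e9.docx"
sample = struct.pack("<qqqI", 2, 1024, 131893056000000000, len(sample_path) + 1)
sample += (sample_path + "\x00").encode("utf-16-le")
size, deleted_utc, path = parse_i_file(sample)
```

Decoding this way also removes the trailing terminator, so no extra character needs trimming from the path afterwards.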

Finding the original filename and deletion time

Let’s extract the remaining information from the $I file – specifically the name of the original file and the date and time at which it was deleted by the user.

            filename = deleted_file_path_parsed.rsplit('\\', 1)[-1]

The filename is simple enough to find – we already have the full path to the file, so it’s just a matter of returning everything after the final backslash.
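
For comparison, the standard library can do the same split. A short Python 3 sketch, using ntpath so that Windows path rules apply even when the timeline is processed on another operating system; original_filename is a hypothetical helper and the path is made up.

```python
import ntpath


def original_filename(original_path):
    """Return everything after the final path separator."""
    # ntpath applies Windows path rules regardless of the host platform.
    return ntpath.basename(original_path)


name = original_filename("C:\\Users\\analyst\\Desktop\\notes.txt")
```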

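Next up we’ll fetch the deletion time. Windows creates the $I file at the moment the original file is deleted, so the $I file’s creation time gives us the deletion time. For the sake of the completeness of our forensic timeline, I’m going to use os.path.getctime, os.path.getmtime, and os.path.getatime to find the $I file’s creation time, modified time, and access time respectively.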

            creation_time = os.path.getctime(full_path)
            creation_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(creation_time))

            modified_time = os.path.getmtime(full_path)
            modified_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modified_time))

            access_time = os.path.getatime(full_path)
            access_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(access_time))

As you can see, the process is nearly identical for all three timestamps. We fetch the time value using the file’s full path and then use a combination of time.strftime and time.localtime to convert it to a timestamp in a format consistent with the rest of our timeline.
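
Since the three conversions only differ in which os.path function they call, they can be folded into one helper. This is a minimal Python 3 sketch; format_timestamp and file_times are hypothetical names. Note that os.path.getctime returns the creation time on Windows but the last metadata change on Unix-like systems, so the first value is only meaningful as a creation time when the code runs on Windows.

```python
import os
import time

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_timestamp(epoch_seconds):
    """Convert seconds since the epoch to a local-time timeline string."""
    return time.strftime(TIMESTAMP_FORMAT, time.localtime(epoch_seconds))


def file_times(path):
    """Return (creation, modified, access) times for a path as strings."""
    getters = (os.path.getctime, os.path.getmtime, os.path.getatime)
    return tuple(format_timestamp(getter(path)) for getter in getters)


# Works on any existing path; the current directory is used here as a demo.
creation_time, modified_time, access_time = file_times(os.getcwd())
```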

Getting the original file size from the $R file

All that remains is to find the file size of the deleted file – an artefact that could be valuable in determining the state that a particular file was in when it was deleted.

            r_filename = "$R" + deleted_file[2:]
            file_size = str(os.path.getsize(recycled_directory + r_filename))

We remove the “$I” from the start of the filename and add “$R”, then use os.path.getsize to retrieve the file size and convert it to a string so it can be added to the CSV timeline.
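
The name swap and size lookup can be sketched as two small functions. This Python 3 version returns None when the matching $R item cannot be found, on the assumption that a $I file may occasionally be left without its partner; matching_r_name and deleted_item_size are hypothetical helpers.

```python
import os


def matching_r_name(i_filename):
    """Map a $I metadata filename to the name of its $R content file."""
    if not i_filename.startswith("$I"):
        raise ValueError("Expected a $I filename.")
    return "$R" + i_filename[2:]


def deleted_item_size(recycled_directory, i_filename):
    """Return the size in bytes of the matching $R item, or None if absent."""
    r_path = os.path.join(recycled_directory, matching_r_name(i_filename))
    # Fail soft rather than letting a missing $R item stop the whole module.
    if not os.path.exists(r_path):
        return None
    return os.path.getsize(r_path)


r_name = matching_r_name("$IABC123.txt")
```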

Writing the results to the CSV timeline

The only thing left is to combine all of the artefacts we’ve gathered and add them as a new line in our CSV timeline, which will be sorted into chronological order by the main function.

            # [:-1] drops the trailing terminator character left over from parsing the path.
            deleted_file_line = ",".join([creation_time, "Deleted file/folder", filename[:-1], deleted_file_path_parsed[:-1], username, file_size, creation_time, access_time, modified_time, "Recycle Bin", deleted_file]) + "\n"
            timeline_csv.write(deleted_file_line)
    print "Recycle Bin data gathered."
    print ""

Once the line has been added to the CSV file, we return a simple message to let the user know that the execution of this module was successful and the artefacts were gathered.
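
One thing to watch with hand-built CSV lines is that a deleted file with a comma in its name or path will spill into extra columns. The standard csv module quotes such fields automatically. A minimal Python 3 sketch, where write_timeline_row is a hypothetical helper and the row values are made up:

```python
import csv
import io


def write_timeline_row(csv_file, fields):
    """Write one timeline row, quoting any field that contains a comma."""
    csv.writer(csv_file, lineterminator="\n").writerow(fields)


# Made-up row whose file name contains a comma.
row = [
    "2018-12-15 10:00:00",
    "Deleted file/folder",
    "report, final.docx",
    "C:\\Users\\analyst\\Desktop\\report, final.docx",
]
buffer = io.StringIO()
write_timeline_row(buffer, row)
```

Reading the line back with csv.reader should give the same four fields, which is a quick way to confirm the quoting did its job.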

The output

If we filter our forensic timeline CSV file to show only entries fetched from the Recycle Bin, we can now see that all the relevant data gained from the $I and $R files – deleted file name, original path, file size, deletion time, and username – has been added correctly.

The MCAS Windows Forensic Gatherer is now capable of gathering Prefetch application data and information on deleted files in the Recycle Bin. Next month, I’ll be taking a look at how to extract logon and logoff information from the Windows Security event log.

