
Writing a FUSE filesystem in Python

We ran into a problem last week. Our web application produces a lot of documents that have to be accessed frequently for a couple of months after they’re created. After less than a year, however, these documents are almost never accessed, yet we need to keep them available for the web application and for tons of other legacy apps that might need them.

Now, these documents take up a lot of space on our expensive but super fast storage system (let’s call it the primary storage system, or PSS, from now on), and we would like to move them to a cheaper but much slower storage system (which we’ll call the secondary storage system, or SSS) once we believe they will no longer be accessed.

Our idea was to move the older files to the SSS and to modify all the software that accesses the storage so that it looks at the PSS first and, if nothing is found there, at the SSS. This approach, however, meant that we would have to modify all the client software we had…

“there are no problems, only opportunities” — cit I.R.

So, wouldn’t it be great if we could create a virtual filesystem to map both the PSS and the SSS into a single directory?

And that’s what we’re gonna do today.

From the client software perspective, everything will remain unchanged, but under the hood all our read and write operations will be forwarded to the correct storage system.

Please note: I’m not saying that this is the best solution ever for this specific problem. There are probably better solutions to address this problem but… we have to talk about Python, don’t we?

What we’ll need

To start this project we just need to satisfy a couple of prerequisites:

  • Python
  • A good OS

I assume that you already have Python (if not… what are you doing here?). As for the OS, keep in mind that this article is based on FUSE.

According to Wikipedia, FUSE is

a software interface for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a “bridge” to the actual kernel interfaces.

FUSE is available for Linux, FreeBSD, OpenBSD, NetBSD (as puffs), OpenSolaris, Minix 3, Android and macOS.

So, if you use macOS you need to download and install FUSE. If you use Linux, keep in mind that FUSE was merged into the mainline Linux kernel in version 2.6.14, released on October 27th, 2005, so every recent version of Linux already has it.

If you use Windows… well… I mean… I’m sorry buddy, but you didn’t satisfy the second prerequisite…

The fusepy module

First of all, to communicate with the FUSE module from Python you will need to install the fusepy module. This module is just a simple interface to FUSE and MacFUSE. Nothing more than this, so go on and install it by using pip:

pip install fusepy

Let’s start

There’s a great starting point for building our filesystem: Stavros Korokithakis’ code. What Stavros made is available on his GitHub repo, and I will reproduce it here:

Take a minute to analyze Stavros’ code. It implements a “passthrough filesystem” that simply mounts a directory onto a mount point. For each operation requested on the mount point, it performs the equivalent Python operation on the real file in the mounted directory and returns the result.
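To make the passthrough idea concrete, here is a trimmed-down sketch in the same spirit. It is not Stavros’ file: the class name MiniPassthrough is hypothetical, only a handful of operations are shown, and the fallback to a plain base class is an assumption made so the methods can be tried even where fusepy or libfuse is missing.

```python
import os

try:
    from fuse import Operations  # base class provided by fusepy
except (ImportError, OSError):
    # Assumption for this sketch only: without fusepy/libfuse, fall back to a
    # plain base class so the methods below can still be called directly.
    Operations = object


class MiniPassthrough(Operations):
    """Hypothetical, trimmed-down passthrough: every call hits the real file."""

    def __init__(self, root):
        self.root = root

    def _full_path(self, partial):
        # Map a mount-point-relative path ("/a/b") onto the mounted directory.
        return os.path.join(self.root, partial.lstrip("/"))

    def getattr(self, path, fh=None):
        st = os.lstat(self._full_path(path))
        keys = ("st_atime", "st_ctime", "st_gid", "st_mode",
                "st_mtime", "st_nlink", "st_size", "st_uid")
        return {key: getattr(st, key) for key in keys}

    def readdir(self, path, fh):
        yield "."
        yield ".."
        yield from os.listdir(self._full_path(path))

    def open(self, path, flags):
        return os.open(self._full_path(path), flags)

    def read(self, path, length, offset, fh):
        os.lseek(fh, offset, os.SEEK_SET)
        return os.read(fh, length)

    def release(self, path, fh):
        return os.close(fh)
```

Every method follows the same pattern: translate the path with _full_path, then call the matching os function on the real file. A real filesystem needs many more operations (write, create, unlink and so on), so use Stavros’ full file when you actually want to mount something.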

So, to try Stavros’ code, just save it as Passthrough.py and run

python Passthrough.py [directoryToBeMounted] [directoryToBeUsedAsMountpoint]

That’s it! Now your brand new filesystem is mounted on the directory you specified in the [directoryToBeUsedAsMountpoint] parameter, and all the operations you perform on this mount point will be silently passed on to the directory you specified in the [directoryToBeMounted] parameter.

Really cool, even if a little bit useless so far… 🙂

So, how can we implement our filesystem as said before? Thanks to Stavros, our job is quite simple. We just need to create a class that inherits from Stavros’ base class and overrides some methods.

The first method we have to override is the _full_path method. The original code uses it to take a path relative to the mount point and translate it into the real path inside the mounted directory. In our filesystem, this will be the most difficult piece of code, because we need to add some logic to decide whether the requested path belongs to the PSS or to the SSS. However, even this “most difficult piece of code” is quite trivial.

We just need to check whether the requested path exists in at least one of the storage systems. If it does, we return the real path. If not, we assume that the path has been requested for a write operation on a file that does not exist yet, so we check whether the path’s parent directory exists in one of the storage systems and return the corresponding path.

A look at the code will make things clearer:
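A minimal sketch of that lookup, written as a standalone function so it can be tried without FUSE. The name resolve_path and its parameters are hypothetical; in a fusepy subclass the same logic would sit inside the overridden _full_path method. Falling back to the primary path when no parent directory exists anywhere is an assumption of this sketch.

```python
import os


def resolve_path(partial, primary_root, fallback_root):
    """Return the real path for a mount-point-relative path (hypothetical helper)."""
    partial = partial.lstrip("/")
    primary = os.path.join(primary_root, partial)
    fallback = os.path.join(fallback_root, partial)

    # 1. An existing path wins, with the primary storage checked first.
    if os.path.lexists(primary):
        return primary
    if os.path.lexists(fallback):
        return fallback

    # 2. Nothing exists yet: probably a create/write. Pick the storage that
    #    already holds the parent directory, again preferring the primary.
    if os.path.isdir(os.path.dirname(primary)):
        return primary
    if os.path.isdir(os.path.dirname(fallback)):
        return fallback

    # 3. No parent anywhere (assumption): hand back the primary path and let
    #    the following OS call raise the appropriate error.
    return primary
```

Checking the primary storage first means that a file present on both systems is always served from the fast one.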

With this done, we’re almost finished. On a Linux system we also have to override the “getattr” function so that it returns the ‘st_blocks’ attribute as well (it turned out that without this attribute the “du” command doesn’t work as expected).

So, we just need to override this method and return the extra attribute:
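A minimal sketch of the attribute dictionary, assuming a Unix-like system where os.lstat exposes st_blocks. The helper name stat_as_dict and the STAT_KEYS tuple are hypothetical; inside the filesystem class this would be the body of the overridden getattr, applied to the already resolved real path.

```python
import os

# The usual stat fields plus st_blocks, which counts allocated 512-byte
# blocks and is what du relies on to report disk usage.
STAT_KEYS = ("st_atime", "st_ctime", "st_gid", "st_mode", "st_mtime",
             "st_nlink", "st_size", "st_uid", "st_blocks")


def stat_as_dict(real_path):
    """Return the stat fields of real_path as a plain dict (hypothetical helper)."""
    st = os.lstat(real_path)  # lstat: do not follow symlinks
    return {key: getattr(st, key) for key in STAT_KEYS}
```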

And then we need to override the “readdir” function, the generator function that is called when someone runs “ls” on our mount point. In our case, the “ls” command has to list the contents of both our primary storage system and our secondary storage system.
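A minimal sketch of such a merged listing, again as a standalone generator with hypothetical names (merged_listing, primary_root, fallback_root). Listing a name only once when it exists on both storage systems is an assumption of this sketch.

```python
import os


def merged_listing(partial, primary_root, fallback_root):
    """Yield the entries of a directory as seen across both storage systems."""
    partial = partial.lstrip("/")
    seen = {".", ".."}
    yield "."
    yield ".."
    for root in (primary_root, fallback_root):
        directory = os.path.join(root, partial)
        if not os.path.isdir(directory):
            continue  # this directory lives on the other storage only
        for name in os.listdir(directory):
            if name not in seen:  # assumption: show duplicates only once
                seen.add(name)
                yield name
```

In the filesystem class, the overridden readdir would run this same loop over the two roots.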

We’re almost finished. We just need to override the “main” method because we need an extra parameter: the original code takes one directory to be mounted and one directory to be used as a mount point, whereas our filesystem takes two directories to be mounted into the mount point.
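A minimal sketch of a three-argument entry point. Using argparse is just one possible choice here, and the function and argument names (parse_args, primary, fallback, mountpoint) are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two directories to mount plus the mount point."""
    parser = argparse.ArgumentParser(
        description="Mount two directories as a single filesystem")
    parser.add_argument("primary", help="fast storage, looked up first")
    parser.add_argument("fallback", help="slow storage, used when primary misses")
    parser.add_argument("mountpoint", help="directory where both are exposed")
    return parser.parse_args(argv)
```

Calling parse_args() with no arguments reads sys.argv, so the positional order matches the order used on the command line.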

So here is the full code of our new file system “dfs” (the “Dave File System” 😀 )

That’s it, now if we issue the command …

python dfs.py /home/dave/Desktop/PrimaryFS/ /home/dave/Desktop/FallbackFS/ /home/dave/Desktop/myMountpoint/

… we get a mount point (/home/dave/Desktop/myMountpoint/) that lists the contents of both /home/dave/Desktop/PrimaryFS/ and /home/dave/Desktop/FallbackFS/ and that works as expected.

Yes, it was THAT easy!

A couple of notes

It is worth noting that:

  1. When we instantiate the FUSE object with foreground=False, the filesystem runs in the background.
  2. The **{'allow_other': True} argument is really important if you need to share the mount point over the network with Samba (omitting it prevents you from sharing this directory).
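The two options above could be wired together roughly as follows. This is only a sketch: fuse_kwargs and mount are hypothetical helper names, while FUSE, nothreads, foreground and allow_other come from fusepy and FUSE itself. The import is deferred so that the option logic can be tried on a machine without FUSE.

```python
def fuse_kwargs(background=False, share=False):
    """Build the keyword arguments passed to fusepy's FUSE(...) constructor."""
    kwargs = {"nothreads": True, "foreground": not background}
    if share:
        # Lets users other than the one who mounted the filesystem
        # (for example the Samba daemon) access the mount point.
        kwargs["allow_other"] = True
    return kwargs


def mount(operations, mountpoint, background=False, share=False):
    """Mount an Operations instance (hypothetical convenience wrapper)."""
    # Imported lazily: this needs fusepy and a working FUSE installation.
    from fuse import FUSE
    FUSE(operations, mountpoint, **fuse_kwargs(background, share))
```

On Linux, a non-root user typically also needs user_allow_other enabled in /etc/fuse.conf before allow_other is accepted.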

That’s all folks, now stop reading and start to develop your first filesystem with Python! 🙂

D.

Developer and editor of "the Python corner". Apple user, Python and Swift addicted. NFL, Rugby and Chess lover. Constantly hungry and foolish.
