Writing a FUSE filesystem in Python

We ran into a problem last week. Our web application produces a lot of documents that have to be accessed frequently for a couple of months after they’re created. After less than a year, however, these documents are almost never accessed, yet we need to keep them available for the web application and for tons of other legacy apps that might need them.

Now, these documents take up a lot of space on our expensive but super fast storage system (let’s call it the primary storage system, or PSS, from now on), and we would like to move them to the cheaper, lower-quality and quite slow storage system (which we’re going to call the secondary storage system, or SSS) once we believe they will not be accessed anymore.

Our idea was to move the older files to the SSS and to modify all the software that needs to access the storage so that it looks at the PSS first and, if nothing is found there, looks at the SSS. This approach, however, meant that we would have to modify all the client software we had…

“there are no problems, only opportunities” — cit I.R.

So, wouldn’t it be great if we could create a virtual filesystem to map both the PSS and the SSS into a single directory?

And that’s what we’re gonna do today.

From the client software perspective, everything will remain unchanged, but under the hood all our read and write operations will be forwarded to the correct storage system.

Please note: I’m not saying that this is the best solution ever for this specific problem. There are probably better solutions to address this problem but… we have to talk about Python, don’t we?

What we’ll need

To start this project we just need to satisfy a couple of prerequisites:

  • Python
  • A good OS

I assume that you already have Python (if not… what are you doing here?). As for the OS, keep in mind that this article is based on FUSE.

According to Wikipedia, FUSE is

a software interface for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a “bridge” to the actual kernel interfaces.

FUSE is available for Linux, FreeBSD, OpenBSD, NetBSD (as puffs), OpenSolaris, Minix 3, Android and macOS.

So, if you use macOS you need to download and install FUSE. If you use Linux, keep in mind that FUSE was merged into the mainline Linux kernel in version 2.6.14, originally released on October 27th, 2005, so every recent version of Linux already has it.

If you use Windows… well… I mean… I’m sorry buddy, but you didn’t satisfy the second prerequisite…

The fusepy module

First of all, to communicate with the FUSE module from Python you will need to install the fusepy module. This module is just a simple interface to FUSE and MacFUSE, nothing more, so go ahead and install it using pip:

pip install fusepy

Let’s start

There’s a great starting point for building our filesystem: Stavros Korokithakis’ code. What Stavros made is available on his GitHub repo, and I will reproduce it here:

https://gist.github.com/mastro35/0c87b3b96278ef1bd0a6401ff552195e

Take a minute to analyze Stavros’ code. It implements a “passthrough filesystem” that simply mounts a directory onto a mountpoint. For each operation requested on the mountpoint, it performs the corresponding Python operation on the real file in the mounted directory.
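To make the idea concrete, here is a minimal sketch of the translation a passthrough filesystem performs. It is not Stavros’ code: the class name PassthroughSketch and its listdir method are my own illustrative assumptions, and fusepy is left out entirely so that the snippet runs on its own.

```python
import os


class PassthroughSketch:
    """Illustrative only: maps mountpoint-relative paths onto a real directory."""

    def __init__(self, root):
        self.root = root  # the directory being mounted

    def _full_path(self, partial):
        # FUSE hands us paths like "/docs/a.txt"; drop the leading slash
        # so that os.path.join does not discard the root.
        return os.path.join(self.root, partial.lstrip("/"))

    def listdir(self, path):
        # Every operation follows the same pattern: translate the path,
        # then delegate to the ordinary os call on the real directory.
        return sorted(os.listdir(self._full_path(path)))
```

In a real fusepy filesystem, methods like this one would live on a class that inherits from fuse.Operations and would be called by FUSE rather than by you.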

So, to try this code just save this file as Passthrough.py and run

python Passthrough.py [directoryToBeMounted] [directoryToBeUsedAsMountpoint]

That’s it! Now your brand new filesystem is mounted on what you specified in the [directoryToBeUsedAsMountpoint] parameter, and all the operations you perform on this mountpoint will be silently passed to what you specified in the [directoryToBeMounted] parameter.

Really cool, even if a little bit useless so far… 🙂

So, how can we implement the filesystem we described earlier? Thanks to Stavros, our job is quite simple. We just need to create a class that inherits from Stavros’ base class and overrides some methods.

The first method we have to override is the _full_path method. This method is used in the original code to take the mountpoint-relative path and translate it to the real mounted path. In our filesystem this will be the most difficult piece of code, because we will need to add some logic to decide whether the requested path belongs to the PSS or to the SSS. However, even this “most difficult piece of code” is quite trivial.

We just need to verify whether the requested path exists in at least one storage system. If it does, we return the real path; if not, we assume that the path has been requested for a write operation on a file that does not exist yet. In that case we check whether the directory name of the path exists in one of the storage systems and return the corresponding path.

A look at the code will make things clearer:

https://gist.github.com/mastro35/6ae5e47f29334f89c0ebfbb088184767
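The lookup described above can be sketched roughly as follows. Take it as an illustration under assumptions, not as the gist’s exact code: the function name resolve_path and its two root parameters are hypothetical, and I’m assuming that the primary storage wins when a path exists in both places and that it is also the default when nothing matches.

```python
import os


def resolve_path(partial, primary_root, fallback_root):
    """Return the real path for a mountpoint-relative path (illustrative)."""
    relative = partial.lstrip("/")
    candidates = [os.path.join(root, relative)
                  for root in (primary_root, fallback_root)]

    # 1. The path already exists somewhere: the primary storage is checked first.
    for candidate in candidates:
        if os.path.lexists(candidate):
            return candidate

    # 2. Probably a file about to be created: pick the storage system
    #    that already holds its parent directory.
    for candidate in candidates:
        if os.path.isdir(os.path.dirname(candidate)):
            return candidate

    # 3. Nothing matches (assumption): default to the primary storage and
    #    let the following os call raise the usual error.
    return candidates[0]
```

With this shape, a new file created at the top level of the mountpoint lands on the PSS, while a new file created inside a directory that only exists on the SSS stays on the SSS.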

With that done, we have almost finished. If we’re using a Linux system we also have to override the getattr function so that it also returns the ‘st_blocks’ attribute (it turned out that without this attribute the “du” command doesn’t work as expected).

So, we just need to override this method and return the extra attribute:

https://gist.github.com/mastro35/8f55e87811c05ccdc5a860058d950354
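As a rough sketch of what “return the extra attribute” means: the helper below copies a set of fields from os.lstat() into a dict and adds st_blocks to them. The name stat_as_dict and the exact list of keys are assumptions of mine for illustration; in the real filesystem this logic would sit inside the overridden getattr method.

```python
import os

# Fields copied from os.lstat(); the list is an assumption for this sketch,
# with st_blocks added at the end. st_blocks is only available on Unix-like
# systems.
STAT_KEYS = ("st_atime", "st_ctime", "st_gid", "st_mode", "st_mtime",
             "st_nlink", "st_size", "st_uid", "st_blocks")


def stat_as_dict(full_path):
    """Return the stat fields of a real file as a plain dict."""
    st = os.lstat(full_path)  # lstat: do not follow symlinks
    return {key: getattr(st, key) for key in STAT_KEYS}
```

Tools like “du” work from the number of allocated blocks rather than from st_size, which would explain why they misbehave when the attribute is missing.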

And then we need to override the readdir function, the generator function that is called when someone runs “ls” on our mountpoint. In our case, the “ls” command has to list the contents of both our primary storage system and our secondary storage system.

https://gist.github.com/mastro35/fc1937dfb13653c6d6a8848864384abf
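Here is a minimal sketch of such a merged listing, written as a standalone generator so it can be tried without mounting anything. The function name merged_readdir and its parameters are hypothetical, and skipping names already seen in the primary storage is my own assumption about how duplicates should be handled.

```python
import os


def merged_readdir(partial, primary_root, fallback_root):
    """Yield the entries of a directory as seen through both storage systems."""
    relative = partial.lstrip("/")
    seen = {".", ".."}
    yield "."
    yield ".."
    for root in (primary_root, fallback_root):
        directory = os.path.join(root, relative)
        if not os.path.isdir(directory):
            continue  # this directory may exist on only one storage system
        for name in os.listdir(directory):
            if name not in seen:  # assumption: list duplicated names once
                seen.add(name)
                yield name
```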

We’ve almost finished. We just need to override the “main” function because we need an extra parameter: in the original code we had one directory to be mounted and one directory to be used as a mountpoint, while in our filesystem we have to specify two directories to be mounted into the mountpoint.
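A possible shape for that entry point is sketched below. It uses argparse, which is my own choice for the illustration, and the names parse_args and operations_factory are hypothetical (operations_factory stands for whatever class implements the filesystem). The fusepy import is done inside main so that the argument handling can be tried even on a machine without FUSE installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments of the two-storage filesystem."""
    parser = argparse.ArgumentParser(
        description="Mount two directories under a single mountpoint.")
    parser.add_argument("primary", help="directory on the fast storage (PSS)")
    parser.add_argument("fallback", help="directory on the slow storage (SSS)")
    parser.add_argument("mountpoint", help="directory to use as the mountpoint")
    return parser.parse_args(argv)


def main(operations_factory, argv=None):
    # Imported lazily: fusepy needs the FUSE library to be present at import time.
    from fuse import FUSE

    args = parse_args(argv)
    # operations_factory is hypothetical: it receives both roots and returns
    # the object that implements the filesystem operations.
    operations = operations_factory(args.primary, args.fallback)
    FUSE(operations, args.mountpoint, nothreads=True, foreground=True)
```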

So here’s the full code of our new file system “dfs” (the “Dave File System” 😀 )

https://gist.github.com/mastro35/9ae0e4f4bbe6bda0c540986cb9f7c47c

That’s it, now if we issue the command …

python dfs.py /home/dave/Desktop/PrimaryFS/ /home/dave/Desktop/FallbackFS/ /home/dave/Desktop/myMountpoint/

… we get a mountpoint (/home/dave/Desktop/myMountpoint/) that lists the contents of both /home/dave/Desktop/PrimaryFS/ and /home/dave/Desktop/FallbackFS/ and works as expected.

Yes, it was THAT easy!

A couple of notes

It’s worth noting that:

  1. When we instantiate the FUSE object with foreground=False, the filesystem runs in the background.
  2. The **{'allow_other': True} argument is really important if you need to share the mountpoint over the network with Samba (omitting it prevents you from sharing this directory).

That’s all folks, now stop reading and start developing your first filesystem with Python! 🙂

D.

Developer and editor of "the Python corner". Apple user, Python and Swift addicted. NFL, Rugby and Chess lover. Constantly hungry and foolish.
