HOWTO AWS: mount S3 buckets from a Linux EC2 instance with s3fs

Published: 2013-07-31
As we all know, the preferred way to access AWS S3 is to use the S3 API. However, you should know that it's also possible to mount an S3 bucket and access it like a normal filesystem.
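For reference, here's what the usual API route looks like from the command line, using s3cmd (not required for the rest of this how-to; 'myBucket' and 'someFile' are placeholders, and s3cmd needs to be installed and configured with your keys first):

$ s3cmd ls s3://myBucket
$ s3cmd get s3://myBucket/someFile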

As always with Open Source, you have a number of different options: the one I'll show you today is a user-land (FUSE) filesystem called s3fs. The project is quite active and provides actual support to its users :)

This how-to assumes that you already have an AWS account with EC2 instances and S3 buckets. If not, go to the AWS website and get started :)

As usual, I'll be using a vanilla Ubuntu 12.04 instance. Let's 'ssh' into it and install what we need.

First, let's grab the s3fs sources (at the time of writing, the latest version is 1.71):

$ wget http://s3fs.googlecode.com/files/s3fs-1.71.tar.gz

Then, let's add all the development packages required to build it. I'm starting from a fresh instance, so this is really needed. If your VM is already populated with all these packages, you can skip this step.

$ sudo apt-get update
$ sudo apt-get install make gcc g++ pkg-config libfuse-dev libcurl4-openssl-dev libxml2-dev

Now, let's build s3fs:

$ tar xvfz s3fs-1.71.tar.gz
$ cd s3fs-1.71
$ ./configure
$ make
$ sudo make install
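
Just to double-check that the build landed where we expect (with the default configure prefix, the binary goes to /usr/local/bin; the --version flag should print the release number):

$ which s3fs
/usr/local/bin/s3fs
$ s3fs --version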

The next step creates a file storing the AWS credentials (insert your own!) that s3fs will use to authenticate its requests to S3. The format is simply ACCESS_KEY_ID:SECRET_ACCESS_KEY:

$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ~/.passwd-s3fs
$ chmod 400 ~/.passwd-s3fs
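
If several users on the instance need to mount buckets, s3fs can also read a system-wide credentials file instead: as far as I know it expects /etc/passwd-s3fs, in the same KEY:SECRET format, with permissions no more open than 640:

$ sudo sh -c 'echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > /etc/passwd-s3fs'
$ sudo chmod 640 /etc/passwd-s3fs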

The last preparation step is to create a mount point with the right ownership, as well as a cache directory:

$ sudo mkdir /mnt/s3
$ sudo chown ubuntu:ubuntu /mnt/s3
$ mkdir ~/cache

We are now ready to mount our bucket (again, use your own bucket name):

$ id
uid=1000(ubuntu) gid=1000(ubuntu) 
$ s3fs -o uid=1000,gid=1000,use_cache=/home/ubuntu/cache myBucket /mnt/s3
$ mount
output removed for brevity
s3fs on /mnt/s3 type fuse.s3fs (rw,nosuid,nodev,user=ubuntu)

This worked. Let's copy a file and see what kind of performance we can get:

$ time cp /mnt/s3/6MegabyteFile .
real 0m0.572s
user 0m0.000s
sys 0m0.012s

About 10 Megabytes per second :-/ S3 is not famous for its speed and this is more proof of it. In this example, the EC2 instance and the S3 bucket are located in the same region (eu-west). I'm not sure I want to find out what happens when they're not!
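For the curious, that figure is just the file size (6,196,315 bytes, as the cache listing below shows) divided by the elapsed time:

$ awk 'BEGIN { printf "%.1f MB/s\n", 6196315 / 0.572 / 1048576 }'
10.3 MB/s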

All the more reason to check that the cache works, then:

$ ls -l /home/ubuntu/cache/myBucket
-rw-r--r-- 1 ubuntu ubuntu 6196315 Jul 31 17:39 6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.027s
user 0m0.000s
sys 0m0.020s

Yes, it does. Let's give it another shot:

$ rm /home/ubuntu/cache/myBucket/6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.787s
user 0m0.000s
sys 0m0.012s

Argh, even slower than the first time! With the cache entry gone, s3fs has to go back to S3 for the whole file. Now, let's unmount the bucket like any other filesystem:

$ sudo umount /mnt/s3/
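
If you'd rather have the bucket mounted automatically at boot, an /etc/fstab entry should do the trick. Here's a sketch based on the 's3fs#bucket' fstab syntax documented by the project for this generation of s3fs; double-check it against your own release. Note that a boot-time mount runs as root, so the keys need to be in /etc/passwd-s3fs (see above) rather than in your home directory:

s3fs#myBucket /mnt/s3 fuse allow_other,use_cache=/home/ubuntu/cache,uid=1000,gid=1000 0 0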

Being able to access your buckets locally is a great feature for testing, debugging and maybe some simple production use cases. I've always been a sucker for filesystems ("Everything Is A File", remember?) and this one is definitely cool. Try it out!

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.