HOWTO AWS: mount S3 buckets from a Linux EC2 instance with s3fs

Published: 2013-07-31
As we all know, the preferred way to access AWS S3 is to use the S3 API. However, you should know that it's also possible to mount an S3 bucket and access it like a normal filesystem.
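For reference, here's what the usual API route looks like from the command line, using s3cmd (not required for the rest of this how-to; 'myBucket' and 'someFile' are placeholders, and s3cmd needs to be installed and configured with your keys first):

$ s3cmd ls s3://myBucket
$ s3cmd get s3://myBucket/someFile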

As always with Open Source, you have a number of different options: the one I'll show you today is a user-land (FUSE) filesystem called s3fs. The project is quite active and provides actual support to its users :)

This how-to assumes that you already have an AWS account with EC2 instances and S3 buckets. If not, go to the AWS website and get started :)

As usual, I'll be using a vanilla Ubuntu 12.04 instance. Let's 'ssh' into it and install what we need.

First, let's grab the s3fs sources (at the time of writing, the latest version is 1.71):

$ wget http://s3fs.googlecode.com/files/s3fs-1.71.tar.gz

Then, let's add all the development packages required to build it. I'm starting from a fresh instance, so this is really needed. If your VM is already populated with all these packages, you can skip this step.

$ sudo apt-get update
$ sudo apt-get install make gcc g++ pkg-config libfuse-dev libcurl4-openssl-dev libxml2-dev

Now, let's build s3fs:

$ tar xvfz s3fs-1.71.tar.gz
$ cd s3fs-1.71
$ ./configure
$ make
$ sudo make install
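
Just to double-check that the build landed where we expect (with the default configure prefix, the binary goes to /usr/local/bin; the --version flag should print the release number):

$ which s3fs
/usr/local/bin/s3fs
$ s3fs --version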

The next step creates a file storing the AWS credentials (insert your own!) that s3fs will use to authenticate its requests to S3. The format is simply ACCESS_KEY_ID:SECRET_ACCESS_KEY:

$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ~/.passwd-s3fs
$ chmod 400 ~/.passwd-s3fs
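
If several users on the instance need to mount buckets, s3fs can also read a system-wide credentials file instead: as far as I know it expects /etc/passwd-s3fs, in the same KEY:SECRET format, with permissions no more open than 640:

$ sudo sh -c 'echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > /etc/passwd-s3fs'
$ sudo chmod 640 /etc/passwd-s3fs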

The last preparation step is to create a mount point with the right ownership, as well as a cache directory:

$ sudo mkdir /mnt/s3
$ sudo chown ubuntu:ubuntu /mnt/s3
$ mkdir ~/cache

We are now ready to mount our bucket (again, use your own bucket name):

$ id
uid=1000(ubuntu) gid=1000(ubuntu) 
$ s3fs -o uid=1000,gid=1000,use_cache=/home/ubuntu/cache myBucket /mnt/s3
$ mount
output removed for brevity
s3fs on /mnt/s3 type fuse.s3fs (rw,nosuid,nodev,user=ubuntu)

This worked. Let's copy a file and see what kind of performance we can get:

$ time cp /mnt/s3/6MegabyteFile .
real 0m0.572s
user 0m0.000s
sys 0m0.012s

About 10 Megabytes per second :-/ S3 is not famous for its speed and this is more proof of it. In this example, the EC2 instance and the S3 bucket are located in the same region (eu-west). I'm not sure I want to find out what happens when they're not!
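For the curious, that figure is just the file size (6,196,315 bytes, as the cache listing below shows) divided by the elapsed time:

$ awk 'BEGIN { printf "%.1f MB/s\n", 6196315 / 0.572 / 1048576 }'
10.3 MB/s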

All the more reason to check that the cache works, then:

$ ls -l /home/ubuntu/cache/myBucket
-rw-r--r-- 1 ubuntu ubuntu 6196315 Jul 31 17:39 6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.027s
user 0m0.000s
sys 0m0.020s

Yes, it does. Let's give it another shot:

$ rm /home/ubuntu/cache/myBucket/6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.787s
user 0m0.000s
sys 0m0.012s

Argh, even slower than the first time! With the cache entry gone, s3fs has to go back to S3 for the whole file. Now, let's unmount the bucket like any other filesystem:

$ sudo umount /mnt/s3/
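
If you'd rather have the bucket mounted automatically at boot, an /etc/fstab entry should do the trick. Here's a sketch based on the 's3fs#bucket' fstab syntax documented by the project for this generation of s3fs; double-check it against your own release. Note that a boot-time mount runs as root, so the keys need to be in /etc/passwd-s3fs (see above) rather than in your home directory:

s3fs#myBucket /mnt/s3 fuse allow_other,use_cache=/home/ubuntu/cache,uid=1000,gid=1000 0 0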

Being able to access your buckets locally is a great feature for testing, debugging and maybe some simple production use cases. I've always been a sucker for filesystems ("Everything Is A File", remember?) and this one is definitely cool. Try it out!

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.