As we all know, the preferred way to access AWS S3 is to use the
S3 API. However, you should know that it's also possible to mount an S3 bucket and access it like a normal filesystem.
As always with Open Source, you have a number of different options: the one I'll show you today is a user-land filesystem (aka FUSE) called
s3fs. The project is quite active and provides real support to its users :)
This how-to assumes that you already have an AWS account with EC2 instances and S3 buckets. If not, go to the AWS website and
get started :)
As usual, I'll be using a vanilla Ubuntu 12.04 instance. Let's 'ssh' into it and install what we need.
First, let's grab the
s3fs sources (at the time of writing, the latest version is 1.71):
$ wget http://s3fs.googlecode.com/files/s3fs-1.71.tar.gz
Then, let's add all the development packages required to build it. I'm starting from a fresh instance, so this is really needed. If your VM is already populated with all these packages, you can skip this step.
$ sudo apt-get update
$ sudo apt-get install make gcc g++ pkg-config libfuse-dev libcurl4-openssl-dev libxml2-dev
Now, let's build s3fs:
$ tar xvfz s3fs-1.71.tar.gz
$ cd s3fs-1.71
$ ./configure
$ make
$ sudo make install
The next step creates a file storing the AWS keys (insert your own!) that s3fs uses to authenticate its requests to S3:
$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ~/.passwd-s3fs
$ chmod 400 ~/.passwd-s3fs
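If you want to double-check that file before mounting, here's a quick sanity check. The key values are placeholders, and it writes to /tmp so as not to touch your real ~/.passwd-s3fs; s3fs expects a single ACCESS_KEY_ID:SECRET_ACCESS_KEY line and refuses credential files that other users can read:

```shell
# Placeholder credentials, written to /tmp for demonstration purposes only.
rm -f /tmp/passwd-s3fs
echo "AKIAEXAMPLEKEY:exampleSecretKey123" > /tmp/passwd-s3fs
chmod 400 /tmp/passwd-s3fs

# Check the format: exactly one line containing a colon...
grep -c ':' /tmp/passwd-s3fs    # prints 1
# ...and permissions restricted to the owner.
stat -c '%a' /tmp/passwd-s3fs   # prints 400
```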
The last step is to create a mount point with the right ownership, as well as a cache directory:
$ sudo mkdir /mnt/s3
$ sudo chown ubuntu:ubuntu /mnt/s3
$ mkdir ~/cache
We are now ready to mount our bucket (again, use your own bucket name):
$ id
uid=1000(ubuntu) gid=1000(ubuntu)
$ s3fs -o uid=1000,gid=1000,use_cache=/home/ubuntu/cache myBucket /mnt/s3
$ mount
output removed for brevity
s3fs on /mnt/s3 type fuse.s3fs (rw,nosuid,nodev,user=ubuntu)
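If you'd rather have the bucket mounted automatically at boot, s3fs can also be driven from /etc/fstab. Here's a sketch (not something we set up above) assuming the same bucket name, mount point and cache directory:

```
# /etc/fstab -- hypothetical entry reusing the names from this post.
# _netdev delays mounting until the network is up; allow_other lets other
# users see the files (it also requires user_allow_other in /etc/fuse.conf).
s3fs#myBucket /mnt/s3 fuse _netdev,allow_other,uid=1000,gid=1000,use_cache=/home/ubuntu/cache 0 0
```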
This worked. Let's copy a file and see what kind of performance we can get:
$ time cp /mnt/s3/6MegabyteFile .
real 0m0.572s
user 0m0.000s
sys 0m0.012s
About 10 Megabytes per second :-/ S3 is not famous for its speed, and this is further proof. In this example, the EC2 instance and the S3 bucket are located in the same region (eu-west). I'm not sure I want to find out what happens when they're not!
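For reference, that figure comes straight from the numbers: the file weighs 6,196,315 bytes (as the cache listing further down shows) and copied in 0.572 seconds of wall time:

```shell
# Back-of-the-envelope throughput: bytes / seconds / 10^6.
awk 'BEGIN { printf "%.1f MB/s\n", 6196315 / 0.572 / 1000000 }'
# prints: 10.8 MB/s
```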
All the more reason to check that the cache works, then:
$ ls -l /home/ubuntu/cache/myBucket
-rw-r--r-- 1 ubuntu ubuntu 6196315 Jul 31 17:39 6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.027s
user 0m0.000s
sys 0m0.020s
Yes, it does. Now let's evict the cached copy and try again:
$ rm /home/ubuntu/cache/myBucket/6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.787s
user 0m0.000s
sys 0m0.012s
Argh, even slower! Now, let's unmount the bucket like any other filesystem:
$ sudo umount /mnt/s3/
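If you want to repeat these cache-hit vs cache-miss measurements more systematically, a tiny timing helper does the job. The paths are placeholders, and it's demonstrated here on a local file, but it works the same against /mnt/s3:

```shell
# Time a single copy with millisecond resolution.
time_copy() {
    src=$1
    dst=$2
    start=$(date +%s%N)             # nanoseconds since the epoch
    cp -f "$src" "$dst"
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms"
}

# Hypothetical local example; substitute /mnt/s3/6MegabyteFile to benchmark s3fs.
echo hello > /tmp/src.txt
time_copy /tmp/src.txt /tmp/dst.txt
```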
Being able to access your buckets locally is a great feature for testing, debugging and maybe some simple production use cases. I've always been a sucker for filesystems ("Everything Is A File", remember?) and this one is definitely cool. Try it out!