AWS has three storage options: EBS (block storage), EFS (file storage) and S3 (object storage). I don't know the exact differences and don't know when you should use one over the other. Please read their comparison and use Google for more information on your specific use case.
I use EBS as the storage for the operating systems. I don't use EFS, but I can imagine it being perfect for sharing files between instances or containers, for example two WordPress instances that share the same settings files. Or maybe even my application files, which I now put on a second EBS volume (not sure if that works though).
S3 is the option we use for bulk storage of data that goes out into the world: objects like images, documents and videos that are sent to your users or site visitors.
Choosing to use S3 is not the end though. S3 is subdivided into multiple storage classes, from the standard S3 class through the cheaper Infrequent Access classes to the very cheap archive class, Glacier.
As explained in part 3 about the architecture, we will be using three classes:
- S3 Standard for user and website files. And for recent backups.
- S3 Standard-IA for backups that are older than 6 months or 2 application versions.
- S3 Glacier for backups older than 1 year.
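The age-based part of this plan can be written down as an S3 lifecycle configuration. Below is a minimal sketch that only builds the configuration as a Python dictionary. The prefix `backups/` and the rule ID are made-up names, 180 days is my stand-in for "6 months", and the "2 application versions" condition is not covered because lifecycle rules work on age. As far as I know this is the shape that, for example, boto3's `put_bucket_lifecycle_configuration` expects, but treat it as a starting point, not as our actual setup.

```python
import json


def backup_lifecycle_rules(prefix="backups/", ia_after_days=180, glacier_after_days=365):
    """Build a lifecycle configuration that moves older backups to cheaper classes.

    The prefix and the day counts are assumptions for illustration.
    """
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "Rules": [
            {
                "ID": "age-out-backups",       # hypothetical rule name
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(backup_lifecycle_rules(), indent=2))
```

Objects outside the prefix, like the user and website files, are not touched by this rule and stay in S3 Standard.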
To understand S3 and object storage let's consider an image, for example the jodiBooks logo. An object is the collection of a globally unique key, object data and metadata.
The object data is the actual file you store in S3: the bits that make up the file. Metadata is the description of the file: type, tags, size, creation date, version and access rights.
Every object needs to have a globally unique key. This is a string containing the bucket, folder and filename + extension.
The last two parts are the key for the object. This does not need to be globally unique. The bucket will do that part. In this example we put the file in a subfolder of the bucket, so the key is not
A bucket is a globally unique, DNS-friendly name for a collection of objects. I compare it with files (objects) in a folder (bucket). It's not completely correct, but good enough. We set access rules per bucket rather than per individual object. E.g. our image bucket is publicly available, while the user data bucket is only accessible by our application.
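To make the split between bucket and key concrete, here is a small sketch that takes a path of the form bucket/folder/file and separates the bucket name from the object key. The function name and the `images/` folder in the usage line are made up for this example.

```python
def split_object_path(path):
    """Split 'bucket/folder/file.ext' into (bucket, key).

    The bucket is the first path segment; everything after it is the key.
    """
    bucket, _, key = path.strip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected 'bucket/key', got {path!r}")
    return bucket, key


if __name__ == "__main__":
    # 'images/' is a hypothetical folder, used only to show a nested key.
    print(split_object_path("jodibooks-public-cdn/images/logo.png"))
```

The key only has to be unique inside its bucket; the bucket name takes care of the global uniqueness.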
We want to put the logo in our emails, so for the recipients to download and see it, it has to be the former: publicly accessible. We called our public bucket jodibooks-public-cdn. It's all in the name...
When creating a bucket you have to choose the region. The region closest to our customers is eu-central-1. Choosing a region close to your users, customers or visitors minimizes latency. When you have customers in multiple regions, consider using CloudFront. This service will cache files close to your customers.
When creating the link to your objects you don't have to specify the region. You could do it if, for example, you have a different logo in Asia, Europe and the Americas, but I'd think you would use other techniques to achieve that.
If you enable versioning, S3 will also add a version ID to the object metadata. Every version is in effect a new object. If you didn't enable versioning, or always want to retrieve the latest, you can ignore this. S3 will send the latest (or only) version by default.
Now that we have all the object data, we can make a hyperlink that lets you retrieve the file or object data directly. Unfortunately there are multiple ways to format this link. We just have to choose whichever is easiest for us.
In our logo example this is the link using the last format:
https://jodibooks-public-cdn.s3.eu-central-1.amazonaws.com/logo.png. (Since writing this how-to we have switched to hosting the files through CloudFront, so the direct S3 link no longer works.)
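As a sketch of how such links are put together, the helper below builds an object URL from the bucket, key and (optionally) region. It covers the virtual-hosted style used in the link above and the older path style; these are the formats I know of and may not match every format AWS accepts. The function name and its parameters are my own.

```python
from urllib.parse import quote


def object_url(bucket, key, region=None, path_style=False):
    """Build an https URL for an S3 object.

    Virtual-hosted style puts the bucket in the host name,
    path style puts it in front of the key.
    """
    safe_key = quote(key, safe="/")  # keep folder slashes, escape the rest
    host = f"s3.{region}.amazonaws.com" if region else "s3.amazonaws.com"
    if path_style:
        return f"https://{host}/{bucket}/{safe_key}"
    return f"https://{bucket}.{host}/{safe_key}"


if __name__ == "__main__":
    print(object_url("jodibooks-public-cdn", "logo.png", region="eu-central-1"))
    # → https://jodibooks-public-cdn.s3.eu-central-1.amazonaws.com/logo.png
```

Leaving out `region` gives the shorter link without the region part.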
Obviously, security is an important part of data storage anywhere. We don't want unauthorized access to our users' private data, and nobody needs to see our backups. The S3 service has multiple features that allow us to control this.
If you want a publicly available bucket, for example for the images on your homepage and blog, you can set a bucket to allow public access. The default is to block all public access.
If you disable some "blocks", thus enabling public access, AWS will show you warnings in several of its "advisory" services (Trusted Advisor, Access Analyzer).
Public bucket warning in Trusted Advisor
Allowing access to a bucket can be done through Access Control Lists. This is really easy to do through the console. The downside is that you don't have many settings. Granting public read access is doable (see picture below), but if you want to restrict access to certain users or roles, it's not possible.
That's where bucket policies come in. Through a bucket policy you can deny or allow certain users, roles or services to access (read, write, delete, list, etc.) the bucket or its subfolders.
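A bucket policy is a JSON document. The sketch below builds two of them as Python dictionaries: one that lets everyone read objects, and one that only allows a single IAM role to read, write and delete. The function names, the statement IDs, the bucket name `jodibooks-user-data` and the role ARN in the usage lines are placeholders I made up, not our real configuration.

```python
import json


def public_read_policy(bucket):
    """Allow anyone to download (GetObject) every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def role_only_policy(bucket, role_arn, prefix=""):
    """Allow one IAM role to read, write and delete objects under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAppRole",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(public_read_policy("jodibooks-public-cdn"), indent=2))
    # Hypothetical bucket name and role ARN, for illustration only.
    print(json.dumps(role_only_policy(
        "jodibooks-user-data",
        "arn:aws:iam::123456789012:role/example-app-role",
    ), indent=2))
```

The JSON that comes out can be pasted into the bucket policy editor in the console.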
Nothing will be 100% secure, so if someone does get a copy of your data, it had better be encrypted. We send our data to the bucket over https, so it is encrypted during transfer. Amazon also offers a service to encrypt it when it is stored. If someone makes a copy or steals some physical servers, they can't access the actual data.
If you want even more protection you can set your own keys through KMS or encrypt all your data yourself (client-side).
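The two server-side options differ only in one small block of configuration. Here is a sketch that builds the default-encryption settings as a dictionary, either with the S3-managed AES-256 keys or with a KMS key. The function name is mine and the key ID in the usage line is a made-up placeholder. To my knowledge this is the shape of the bucket encryption configuration in the S3 API.

```python
def default_encryption(kms_key_id=None):
    """Build a default-encryption configuration for a bucket.

    Without a key ID: S3-managed keys (AES-256).
    With a key ID: encryption with that KMS key.
    """
    if kms_key_id is None:
        rule = {"SSEAlgorithm": "AES256"}
    else:
        rule = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


if __name__ == "__main__":
    print(default_encryption())
    # Hypothetical KMS key ID, for illustration only.
    print(default_encryption("1234abcd-12ab-34cd-56ef-1234567890ab"))
```

Client-side encryption does not show up here at all, because in that case the data is already encrypted before it reaches S3.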
Although versioning is not security per se, it does help. With versioning enabled, S3 will not overwrite an object, but store a new version when it is modified. A delete only adds a delete marker; the earlier versions stay in place. If you make a mistake, you can go back.
Once enabled, you cannot turn versioning off again, only suspend it. To get rid of it completely you have to delete and recreate the bucket, or delete versions manually.
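To get a feeling for how versioning behaves, here is a toy in-memory model. It is not the S3 API, only an illustration of the idea that every write and every delete adds something on top of the history instead of replacing it. The class name and the `v1`, `v2` style version IDs are invented; real S3 version IDs are opaque strings.

```python
import itertools


class ToyVersionedBucket:
    """In-memory toy, NOT the S3 API: every write adds a version."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, data)
        self._ids = itertools.count(1)

    def put(self, key, data):
        """Store data as a new version and return its version ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key):
        """Stack a delete marker (data=None) on top; older versions stay."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if an ID is given."""
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)    # missing, or latest is a delete marker
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


if __name__ == "__main__":
    bucket = ToyVersionedBucket()
    first = bucket.put("logo.png", b"old logo")
    bucket.put("logo.png", b"new logo")
    bucket.delete("logo.png")
    print(bucket.get("logo.png", first))  # the old version is still there
```

After the delete, asking for the object without a version ID fails, but both earlier versions can still be retrieved by their ID.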
Now that we've covered the S3 basics, let's create a public bucket and then turn it private.
Go to the S3 console. It will show a list of all your buckets. Obviously none if you haven't created one. Click Create bucket.
Enter a Bucket name. This should be globally unique and DNS-compliant. What that means can be found in the AWS docs (see section Rules for Bucket Naming in the link below). Choose a Region close to your users and click Next.
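To give an idea of what DNS-compliant means in practice, here is a rough check of a bucket name. It only covers the rules I'm sure of (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no two dots in a row; not shaped like an IP address). The AWS docs list more rules, so a name that passes this sketch can still be rejected, and whether the name is still free can only be found out by trying to create the bucket.

```python
import re

# 3-63 chars, lowercase letters/digits/dots/hyphens, alphanumeric at both ends.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Four dot-separated groups of digits, e.g. 192.168.1.1
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name):
    """Return True if the name passes a subset of the S3 naming rules."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IP_RE.fullmatch(name):
        return False
    return True


if __name__ == "__main__":
    for candidate in ["jodibooks-public-cdn", "JodiBooks", "192.168.1.1"]:
        print(candidate, looks_like_valid_bucket_name(candidate))
```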
We want to enable Versioning and enable Default encryption using AES-256. Click Next.
We are making a public bucket, so we uncheck Block all public access and acknowledge that we understand the implications.
In the next screen we can review our settings and Create the bucket.
Our bucket shows up in the list and we can click the bucket to open it.
Now we go to the Permissions tab, click on Access Control List, select group Everyone under the Public access header and select List objects. Once we've pressed Save everyone can download (Get) objects from this bucket.
We can make a new bucket, but let's just turn the one we just made into a private bucket. You can see the bucket is currently public by the yellow tag that's visible everywhere. This also shows where the bucket is made public, in our case through the Access Control List.
Open the bucket and go to the Permissions tab. The Block public access section is selected by default. Click Edit and check Block all public access. Press Save.
Type confirm in the text field as requested and click Confirm. You don't want to do this accidentally when users need access to the bucket.
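Behind the single Block all public access checkbox sit four separate settings. The sketch below builds them as a dictionary and checks whether a given configuration blocks everything. The setting names are, as far as I know, the ones used by the S3 public access block configuration; the two function names are my own.

```python
def public_access_block(block_all=True):
    """Build the four public-access settings, all on or all off."""
    return {
        "BlockPublicAcls": block_all,        # reject new public ACLs
        "IgnorePublicAcls": block_all,       # ignore existing public ACLs
        "BlockPublicPolicy": block_all,      # reject new public bucket policies
        "RestrictPublicBuckets": block_all,  # restrict buckets with public policies
    }


def is_fully_blocked(config):
    """Return True only if all four settings are present and switched on."""
    names = ("BlockPublicAcls", "IgnorePublicAcls",
             "BlockPublicPolicy", "RestrictPublicBuckets")
    return all(config.get(name) is True for name in names)


if __name__ == "__main__":
    private = public_access_block()
    print(is_fully_blocked(private))
```

Switching only some of the four off is what makes a bucket partly public, which is when the warnings start to appear.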
The only way to get into the bucket now is through a user or role with sufficient credentials. As we configured ourselves as admins in part 4, we have full access. Our EC2 instances, however, have no S3 permissions through their roles and thus cannot access any bucket. We'll configure that in parts 14 and 15.
For our applications we need 5 buckets. Let's create them using the steps above, all in the region
With that we have configured our AWS environment. We will need to add and edit some small things, like specific policies and roles, but we'll get to that when configuring the software that needs it.
In the next part, we'll be setting up the tools and applications needed to run our ASP.NET apps and our WordPress blog. We'll also make sure the apps are actually running.