How to do static site caching in CloudFront

How-to written and screenshots taken on 2021 January 30 · 5 min read

Introduction

One big advantage of static websites is that they are, well, static: the files don't change between users. The site may render differently on the client, depending on the user's input, but the downloaded files are the same for every user. This allows for efficient hosting when you use the right caching strategy: multiple users download the same file, but the file only has to be requested from the host once.

When hosting a static website in CloudFront, you probably use a Lambda@Edge function to map requests (paths) to S3 origin locations (origin request; a minimal sketch of such a function follows the list below) and another Lambda@Edge function to add or modify headers in the response (origin response). We can use the latter to add the Cache-Control header to the response. This header tells the browser how long it may cache the file. CloudFront also reads this header and uses it to determine its own caching time. So we get a double benefit:

  1. We get fewer viewer requests, improving page load times and lowering our bandwidth (and thus costs).
  2. We need fewer origin requests, improving page load times and lowering the number of S3 GET requests, again lowering costs.
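
For reference, this is roughly what such an origin-request function looks like. It is a minimal sketch for illustration, not the exact function we run at jodiBooks: it only rewrites "pretty" paths to the index.html objects that actually exist in the S3 bucket.

// index.js
// Attached to: ORIGIN REQUEST (minimal sketch, for illustration only)
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;

  // Map "pretty" paths to the index.html that actually lives in S3,
  // e.g. '/blog/' -> '/blog/index.html' and '/blog' -> '/blog/index.html'
  if (request.uri.endsWith('/')) {
    request.uri = `${request.uri}index.html`;
  } else if (!request.uri.includes('.')) {
    request.uri = `${request.uri}/index.html`;
  }

  callback(null, request);
};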

List static files and folders

At jodiBooks we use three static site generators or frameworks: Gatsby.js, Next.js and Docusaurus v2. All of them serialize the static files (images, scripts, stylesheets, downloads) or put them in a serialized folder. Unfortunately, each framework uses its own method of grouping the static files, and it is quite difficult (or impossible) to find the exact specifics in their docs, so you have to figure this out yourself. I'll save you some trouble with the table below, which lists the cacheable resources I identified.

File types | Docusaurus | Gatsby.js | Next.js
--- | --- | --- | ---
Scripts (.js, .js.map) | *.js (in main folder only) | *.js, *.js.map (everywhere) | /_next/static
Stylesheets (.css) | *.css (in main folder only) | *.css (everywhere) | /_next/static
Images (.png, .jpg, .webp, .gif) | /assets/images | /static | /_next/static
Downloads/assets (.pdf, .zip) | /assets/files | /assets (when manually versioned) | /_next/static
Excluded folders (non-serialized) | all other folders (/css, /js specifically) | all other folders | all other folders
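
If you want to check which files in your own build output these rules would catch, a quick local check helps. The script below is a rough helper I'm sketching here (it is not part of the CloudFront setup itself); it assumes the build output is in ./build, so adjust that path for your framework's output folder.

// check-cacheable.js
// Rough local helper: lists which files in a build output would get a long cache time
// under the rules from the table above. Assumes the build output is in './build'.
const fs = require('fs');
const path = require('path');

const cacheableFolders = ['/assets/', '/static/', '/_next/static/'];
const cacheableExtensions = ['.css', '.js', '.js.map'];
const excludedFolders = ['/css/', '/js/'];

const walk = (dir) =>
  fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    return entry.isDirectory() ? walk(full) : [full];
  });

walk('./build').forEach((file) => {
  // Convert the file path to the URI CloudFront would see
  const uri = '/' + path.relative('./build', file).split(path.sep).join('/');
  const inCacheableFolder = cacheableFolders.some((folder) => uri.includes(folder));
  const hasCacheableExtension =
    cacheableExtensions.some((ext) => uri.endsWith(ext)) &&
    !excludedFolders.some((folder) => uri.includes(folder));
  console.log(inCacheableFolder || hasCacheableExtension ? 'CACHE   ' : 'NO-CACHE', uri);
});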

Create Node Lambda functions

We want to cache as many of the serialized files as possible, and as each framework has a slightly different strategy, we have to either create three separate Lambda functions or write a more complicated one with some conditional logic.

Depending on your framework, you can take one of the JavaScript functions from this GitHub gist and connect it to the origin response of your CloudFront distribution. Also included are Jest tests that check whether the findings in the table above are indeed honored.
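
To give an idea of what such a per-framework function looks like, here is a Next.js-only variant. This is a sketch along the lines of the gist, not copied from it: it relies on the fact that everything under /_next/static/ is content-hashed by Next.js and can therefore be cached practically forever.

// index.js
// Attached to: ORIGIN RESPONSE (Next.js-only sketch)
exports.handler = (event, context, callback) => {
  const { request, response } = event.Records[0].cf;
  const headers = response.headers;

  // Only touch successful responses that don't already have a Cache-Control header
  if (response.status === '200' && !headers['cache-control']) {
    headers['cache-control'] = [{
      key: 'Cache-Control',
      value: request.uri.includes('/_next/static/')
        ? 'public, max-age=31536000, immutable'   // hashed assets: cache for a year
        : 'public, max-age=0, must-revalidate',   // everything else: always revalidate
    }];
  }

  callback(null, response);
};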

Combine and conquer them all

If you feel adventurous, or don't want multiple functions that basically perform the same task, you can also combine them into one. I made one assumption here (as it applied to our use case): all the downloads (assets) in the Gatsby.js site are manually versioned.

The Lambda function

// index.js
// Attached to: ORIGIN RESPONSE
const cacheableFolders = [
  '/assets/', // assumes you manually version all files in this folder in Gatsby.js
  '/static/',
  '/_next/static/',
];
// Docusaurus: main folder only!
const cacheableExtensions = [
  '.css',
  '.js',
  '.js.map',
];
// Docusaurus: excluded folders!
const excludedFolders = [
  '/css/',
  '/js/',
];

const headerCacheControl = 'Cache-Control';
const headerCacheControlLC = headerCacheControl.toLowerCase();
const defaultTimeToLive = 60 * 60 * 24 * 365; // 365 days in seconds

const noCacheResponseHeader = {
  key: headerCacheControl,
  value: 'public, max-age=0, must-revalidate',
};
const cacheResponseHeader = {
  key: headerCacheControl,
  value: `public, max-age=${defaultTimeToLive}, immutable`,
};

exports.handler = (event, context, callback) => {
  // Extract the request and response from the CloudFront Origin Response event
  const { request, response } = event.Records[0].cf;
  const { headers } = response;

  if (response.status === '200') {
    // Only add a Cache-Control header if the origin didn't set one already
    if (!headers[headerCacheControlLC]) {
      if (cacheableFolders.some(folder => request.uri.includes(folder))) {
        // Cache the file if it is within one of 'cacheableFolders'
        headers[headerCacheControlLC] = [cacheResponseHeader];
      } else if (cacheableExtensions.some(extension => request.uri.endsWith(extension))) {
        // Or if the file type is one of 'cacheableExtensions'
        // (use 'endsWith()' to exclude '.json'; 'includes()' would also match '.json')
        if (excludedFolders.some(folder => request.uri.includes(folder))) {
          // If the request is to an excluded folder, do not cache
          headers[headerCacheControlLC] = [noCacheResponseHeader];
        } else {
          // Otherwise (not in an excluded folder) do cache the requested file
          headers[headerCacheControlLC] = [cacheResponseHeader];
        }
      } else {
        // Otherwise (not in a cacheable folder and not a cacheable file type) do not cache
        headers[headerCacheControlLC] = [noCacheResponseHeader];
      }
    }
  }

  // Return the response to the CloudFront Origin Response event
  callback(null, response);
};
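
To check that the combined function behaves as the table says, you can run it through a couple of Jest tests. The test below is a minimal sketch, not copied from the gist; it assumes the handler above is exported from ./index.js and covers two rows of the table: a hashed Next.js asset gets the long cache time, while a stylesheet in the excluded /css/ folder does not.

// index.test.js
// Minimal Jest sketch: assumes the handler above lives in './index.js'
const { handler } = require('./index');

const buildEvent = (uri) => ({
  Records: [{ cf: { request: { uri }, response: { status: '200', headers: {} } } }],
});

test('caches files under /_next/static/', (done) => {
  handler(buildEvent('/_next/static/chunks/main-abc123.js'), {}, (err, response) => {
    expect(response.headers['cache-control'][0].value).toContain('immutable');
    done();
  });
});

test('does not cache stylesheets in the excluded /css/ folder', (done) => {
  handler(buildEvent('/css/styles.css'), {}, (err, response) => {
    expect(response.headers['cache-control'][0].value).toContain('max-age=0');
    done();
  });
});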