Engineering

Image Processing in Gatsby

By Derek Foster

Learn how Gatsby's image processing works under the hood, from compressing images and setting cache-control headers to enabling responsive images.

According to The HTTP Archive, 90% of the pages on the web contain about 7MB of data, with 5.2MB being unoptimized images. That means no matter what you do to your site, if your images aren't optimized then your site is falling behind in terms of performance alone. Add in accessibility concerns and responsive design, and you have a recipe for disaster.

Luckily, Gatsby handles most image optimization problems for us. Out of the box, Gatsby compresses your images, converts them to modern formats (e.g. WebP), and adds cache-control headers to them. With Gatsby's newest image plugin, you can also set up responsive images for your entire site. This blog post does not discuss the optimizations themselves (look out for a future blog post on that), but rather how Gatsby does all that 'magic' for you and how you can get the most out of Gatsby's image processing.

Ready to dive in?

The Basics

First and foremost, make sure you understand image formats.

Image formats

The old school, raster image formats are PNG and JPEG (or JPG). These image formats are best suited for actual photographs and pictures because the pixels in these images are explicitly defined.

We also have vector images, such as SVG. These image formats are for everything else that isn't a photograph. Icons and designed graphics come to mind here. SVGs should make up most of your site's images and be used whenever possible. They have huge benefits over raster image formats, such as:

  • Smaller file sizes. SVGs define shapes, lines, etc. and let the browser fill those in. Since the images don't need to explicitly define each pixel, file sizes are much smaller.
  • Inherently responsive. Again because of not explicitly defining pixels, SVGs will look nice and clean on any screen size. You can dynamically resize them too!
  • Can be written inline. SVGs don't actually have to be assets. You can write them inline in your HTML (or React).

Lastly, we have the new-school raster image formats, WebP and AVIF. These follow the same usage guidelines as PNG and JPEG: only use them for photographs and pictures. The difference is that these formats compress better at comparable quality.

Importing assets in Gatsby

Gatsby leverages a widely-used module bundler called webpack. Through webpack, you as a developer can import assets directly in your JavaScript files, like the example below:

import React from 'react'
// path can be absolute (like below) or relative to the current file
import heroAsset from '~/dev/super-sophisticated-site/hero-asset.png'

export default class MyComponent extends React.Component {
	render () {
		return (
			<img src={heroAsset} alt="Super Sophisticated's Hero Asset" />
		)
	}
}

Webpack is what enables line 3, where we import an image directly into JS

This import syntax, enabled by webpack, automatically minifies and bundles your assets, pushes any errors to build time instead of run time, and sets up caching on your assets.

Also important for image optimization: these imports are where Gatsby converts an image to a data URI if the image is smaller than 10,000 bytes. In that case it's more cost-effective to embed the data directly in the HTML than to fetch a separate file. The fewer network requests, the better!
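To make the inlining rule concrete, here is a minimal sketch of the decision in plain Node.js. It is not webpack's or Gatsby's actual code: toImageSrc, INLINE_LIMIT_BYTES, and the /static/ output path are hypothetical names used only for illustration.

```javascript
// Illustrative only: mimics the "inline small images" rule described above.
// The threshold is the one quoted in this post; everything else is hypothetical.
const INLINE_LIMIT_BYTES = 10000

// contents: Buffer holding the image bytes
// mimeType: e.g. 'image/png'
// fileName: name the file would get if it were emitted as a separate asset
function toImageSrc (contents, mimeType, fileName) {
  if (contents.length < INLINE_LIMIT_BYTES) {
    // Small image: embed the bytes directly, so no extra network request is needed.
    return `data:${mimeType};base64,${contents.toString('base64')}`
  }
  // Larger image: reference it as a separate file instead.
  return `/static/${fileName}`
}

const tinyIcon = Buffer.alloc(512) // stand-in for a small PNG
const bigPhoto = Buffer.alloc(250000) // stand-in for a large PNG

console.log(toImageSrc(tinyIcon, 'image/png', 'icon.png').slice(0, 22)) // data:image/png;base64,
console.log(toImageSrc(bigPhoto, 'image/png', 'photo.png')) // /static/photo.png
```

A sketch of the size-based choice between a data URI and a separate file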

The static folder

The last piece of the basics of images in Gatsby is the static folder. It sits at the top level of your project and serves as an escape hatch from everything I talked about in the previous section on importing assets.

We do not use the static folder here at Anvil, and neither should you. All the optimizations above improve your site's performance, usability, and maintainability drastically, so only use this folder if you really need to. Gatsby has a good list on when you should use the static folder, but in general it's used for when you want to keep the name of an asset the same at run time or if you want to serve a custom script.

Under the hood

With the basics out of the way, let's get into what powers Gatsby's system to perform actions like data sourcing, dynamic image processing, and generating responsive images.

Data layer

At the core of Gatsby is the data layer. This is built into Gatsby and provides a uniform interface to access any data you need for your site.

Most static sites are article-based. But how do you host your content? Do you opt for a CMS like WordPress, Sanity, or Contentful? Or what if you want something simple, like keeping your content in markdown alongside your code? There are many places for your content to live, and the questions above are only for static content. Many more questions arise when user-specific, dynamic content is involved, likely requiring a database.

Designing the right solution for your content and site is complex, but the data layer Gatsby provides is a way to simplify accessing your data regardless of its source. Gatsby also has ways to transform your data nodes after sourcing them into the data layer, so by the time you access the data on your site, it is in exactly the format you want: optimized for production.

Gatsby's data layer

The diagram above uses 'source plugins' to gather data. We'll take a deeper look at plugins later on, but for now let's start with an example of sourcing data using the gatsby-source-filesystem plugin.

Let's say we are starting a blog website that will feature technical writers. Since the blog authors will be technical, we aren't going to need a CMS; we are comfortable with the authors writing in markdown and potentially even having access to the entire codebase. With this in mind, all we need is markdown articles served right from the same repository as our code. This will be our directory structure:

super-sophisticated-technical-website
	| - src
	    | - components
	    | - pages
	| - markdown-articles
	    | - blog-posts
	| - gatsby-config.js
	| - gatsby-node.js
	| - package.json

Super Sophisticated™'s directory structure

With this directory structure, the configuration for our technical blog will be as follows:

const path = require('path')

// gatsby-config.js is loaded as a CommonJS module, so we export with module.exports
module.exports = {
  siteMetadata: {
    siteName: 'Super Sophisticated',
    title: 'The most sophisticated business',
    description:
      'Reduce costs and unlock growth by transitioning from paper and PDF-based processes to simple and flexible online workflows.',
    keywords: 'website,sophisticated,super,gatsby',
  },
  plugins: [
    {
      resolve: 'gatsby-source-filesystem',
      options: {
        // markdown-articles sits at the project root (see the directory structure above)
        path: path.resolve('markdown-articles/blog-posts'),
        name: 'blog-posts',
      },
    },
  ],
}

The plugins section uses gatsby-source-filesystem to load our markdown blog posts into Gatsby's data layer.

While not the topic of this blog post, the data in your gatsby-config.js's siteMetadata field gets automatically loaded into the data layer. So siteName, title, description, and keywords are all accessible throughout your site - this is useful for setting HTML meta tags for all pages.
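As a rough illustration of the meta tag use case, the sketch below turns a siteMetadata object into plain descriptors for meta tags. buildMetaTags is a hypothetical helper, not a Gatsby API, and the Open Graph properties are just one possible choice; in a real site the object would come from a GraphQL query against the data layer.

```javascript
// Hypothetical helper (not part of Gatsby): maps siteMetadata fields
// to plain objects describing <meta> tags.
function buildMetaTags (siteMetadata) {
  return [
    { name: 'description', content: siteMetadata.description },
    { name: 'keywords', content: siteMetadata.keywords },
    { property: 'og:site_name', content: siteMetadata.siteName },
    { property: 'og:title', content: siteMetadata.title },
  ]
}

// Same shape as the siteMetadata field in gatsby-config.js
// (the description is shortened here for readability).
const siteMetadata = {
  siteName: 'Super Sophisticated',
  title: 'The most sophisticated business',
  description: 'An example description',
  keywords: 'website,sophisticated,super,gatsby',
}

const metaTags = buildMetaTags(siteMetadata)
console.log(metaTags.length) // 4
console.log(metaTags[0]) // { name: 'description', content: 'An example description' }
```

A sketch of deriving meta tag data from siteMetadata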

Accessing your data

GraphQL is how you access your data from Gatsby's data layer. There are two ways to query for data:

  • Page queries
  • "Building block" components' static query

If you are unfamiliar with GraphQL, I highly recommend you take the time to learn it. GraphQL lets you request exactly the fields you need in a single query, which is often more efficient than calling several REST endpoints, and it maps to any data you need within your data model. If you're coming from REST and need a place to start, we have a blog post on consuming GraphQL APIs just for you. GraphQL is typically served over plain HTTP, just like REST, after all :)

Back to Gatsby and its GraphQL API. To start accessing your data, you will need to use the named import graphql from gatsby for both page queries and static queries. To start, let's look at a page query for a blog post template:

import React from 'react'
import { graphql } from 'gatsby'
import RehypeReact from 'rehype-react'

const MyGatsbyPage = ({ data }) => {

	/* AST renderer we use at Anvil: https://github.com/rehypejs/rehype-react */
	const renderAst = new RehypeReact({
		createElement: React.createElement,
	}).Compiler

	// The result is keyed by the root fields of the query: site and markdownRemark
	const post = data.markdownRemark

	return (
		<div>
			<h1>{post.frontmatter.title}</h1>
			<h2>{post.frontmatter.date}</h2>
			<h3>By {post.frontmatter.author}</h3>
			<p>{post.frontmatter.summary}</p>
			<div id="blog-content">
				{renderAst(post.htmlAst)}
			</div>
		</div>
	)
}

export default MyGatsbyPage

export const pageQuery = graphql`
	query BlogPostByPath($slug: String!) {
		site {
			siteMetadata {
				siteName
				title
				description
				keywords
			}
		}
		markdownRemark(frontmatter: { path: { eq: $slug } }) {
			htmlAst
			frontmatter {
				date
				title
				summary
				author
				image {
					publicURL
					childImageSharp {
						gatsbyImageData
					}
				}
			}
		}
	}
`

Accessing data via a Gatsby page query, using GraphQL

Page queries, like the one above, are exported in the page component. Gatsby recognizes the exported const pageQuery and at build time will execute your query. The resulting data from the query is then passed into your page component (in our case, MyGatsbyPage) as a prop called data. You can now access any data from your query directly in your component!

The above example is kept simple to show how page queries work, so for this section just focus on the export const pageQuery part and how the data prop gives us data directly from the data layer. The other parts (markdownRemark, frontmatter, htmlAst, and childImageSharp) are from Gatsby plugins to make blog posts from markdown work and to enable responsive images. We'll talk about plugins in a bit!

As a note, the syntax I used in getting the variable data is called destructuring. The equivalent example without destructuring would be:

const MyGatsbyPage = (props) => {

	const data = props.data

	/* AST renderer we use at Anvil: https://github.com/rehypejs/rehype-react */
	const renderAst = new RehypeReact({
		createElement: React.createElement,
	}).Compiler

	// The result is keyed by the root fields of the query: site and markdownRemark
	const post = data.markdownRemark

	return (
		<div>
			<h1>{post.frontmatter.title}</h1>
			<h2>{post.frontmatter.date}</h2>
			<h3>By {post.frontmatter.author}</h3>
			<p>{post.frontmatter.summary}</p>
			<div id="blog-content">
				{renderAst(post.htmlAst)}
			</div>
		</div>
	)
}

Accessing the data prop without destructuring
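One thing the page query leaves implicit is where the $slug variable comes from. In Gatsby it is supplied through the context of each page created in gatsby-node.js. Below is a minimal sketch using Gatsby's createPages Node API and the createPage action. The template path src/templates/blog-post.js is a hypothetical location, and the sketch assumes every markdown file has a path field in its frontmatter, as the page query implies.

```javascript
const path = require('path')

// Sketch of a gatsby-node.js createPages implementation.
// Gatsby calls this at build time with a graphql runner and a set of actions.
async function createPages ({ graphql, actions }) {
  const { createPage } = actions

  // Ask the data layer for the path of every markdown post.
  const result = await graphql(`
    query {
      allMarkdownRemark {
        nodes {
          frontmatter {
            path
          }
        }
      }
    }
  `)

  if (result.errors) {
    throw new Error('Querying markdown posts failed')
  }

  // Hypothetical template location: the page component shown earlier.
  const blogPostTemplate = path.resolve('src/templates/blog-post.js')

  result.data.allMarkdownRemark.nodes.forEach((node) => {
    createPage({
      path: node.frontmatter.path,
      component: blogPostTemplate,
      // Everything in context is available to the page query as a variable,
      // so this is where $slug gets its value.
      context: { slug: node.frontmatter.path },
    })
  })
}

exports.createPages = createPages
```

A sketch of creating one page per markdown post and passing $slug through the page context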

The other way to access data is in what Gatsby calls 'building block' components. These components aren't anything special; they just aren't page components, which makes them components you reuse anywhere and everywhere possible. Since they aren't page components, Gatsby has provided the StaticQuery component (old school way, made for class components) and the useStaticQuery hook (new school way, made for function components). The useStaticQuery hook is much cleaner and simpler to use, so let's do an example with it.

import React from 'react'
import { useStaticQuery, graphql } from 'gatsby'

const Header = () => {
  const data = useStaticQuery(graphql`
    query {
      site {
        siteMetadata {
          title
          description
        }
      }
    }
  `)

  return (
    <header>
      <h1>{data.site.siteMetadata.title}</h1>
      <h3>{data.site.siteMetadata.description}</h3>
    </header>
  )
}

export default Header

Accessing data in a non-page component, using the useStaticQuery hook

Using this hook isn't much different from the page query. The only difference is that the data is retrieved as part of the component itself.

Now that you know the difference between the two types of queries, which should you use? Page queries are easier to spot. But 'building block' components are a bit trickier. When you create a new component that needs data and is not just presentational, ask yourself where the data is coming from.

Page queries actually cover most cases because you can pass down the queried data to your reusable components. The only time you should use a static query is if the component should always use your site's data independently of any page. A good example is your site's title or any site-wide configuration you set. Using static queries this way helps isolate data querying to the spot it's actually needed (in that building block component) and modularize your site (consumers of the component don't need to know data querying is happening).

Take the Header component above for example; the header of our site will never change - it will always have the title and description of our site. It's page independent! As such, we use useStaticQuery to get the data, and now all consumers of the Header component don't care about its internals; developers just use the component and it works.

Plugins

In the data layer section we configured the gatsby-source-filesystem plugin. But what are Gatsby plugins even for? There are two core reasons for the plugin system:

  • Modularity. Even though our example sources data from the filesystem, many other Gatsby-powered sites don't need that functionality. They either source from a CMS, a database, or don't source data at all.
  • Unified interface. Gatsby is an open source web framework made of many other technologies, like React, webpack, and Babel. Customizing and optimizing your site yourself is a long process and often causes cognitive overload; plugins provide enhancements to the underlying technology for us and provide an easier interface to customize each plugin via gatsby-config.js.

Gatsby provides official plugins, like gatsby-source-filesystem, and also has community plugins written by open source developers in the Gatsby community. The rest of this blog post will be covering the plugins you need for optimal image processing.

One note on plugins: you'll notice I highlighted the data layer's sourcing and transforming of data in the data layer section. That is because those terms are directly related to certain plugins in the Gatsby ecosystem: plugins used to source data into the data layer are named gatsby-source-<whatever_source_here> and plugins used to transform data while in the data layer are named gatsby-transformer-<whatever_file_format_here>.

Examples of source plugins:

  • gatsby-source-filesystem

Examples of transformer plugins:

  • gatsby-transformer-sharp
  • gatsby-transformer-remark

Sharp

Now that we know about Gatsby's data layer and plugin system, let's add image processing to our Super Sophisticated™ site. Sharp is a Node.js image processing library that Gatsby uses for its plugins. It resizes and compresses images, as well as converts them to web-optimal formats (WebP and AVIF).

Since we are sourcing data from the filesystem, we don't need to worry about getting our images into the data layer. We do need to add Sharp image processing, so let's add and configure two new plugins:

module.exports = {
  // ...siteMetadata and any other settings
  plugins: [
    {
      resolve: 'gatsby-plugin-sharp',
      options: {
        defaults: {
          placeholder: 'blurred',
        },
      },
    },
    'gatsby-transformer-sharp',
    // any other plugins
  ],
}

Adding the Sharp plugins to our gatsby-config.js

gatsby-plugin-sharp adds the low-level Sharp capabilities to our system. You can think of it like adding the sharp NPM package to our project, but we haven't actually used it yet. This plugin is where you will configure the actual image processing options to your liking; you'll notice that I configured the default placeholder to be blurred. I'll touch on that in the next section more when we talk about responsive images.

The gatsby-transformer-sharp plugin is the Gatsby transformer to allow the images to be usable in Gatsby. After the data is sourced into the data layer, this plugin will leverage Sharp image processing to provide you with optimized images. This is where we actually use the capabilities from gatsby-plugin-sharp (making gatsby-plugin-sharp a pseudo peer dependency in NPM terms).

Gatsby Image Plugin

The last piece of the Gatsby image processing puzzle is the relatively new gatsby-plugin-image. Once installed and configured, the plugin leverages the two Sharp plugins we installed in the previous section and gives us responsive images. Responsive images solve two problems with image loading on the web: the art direction problem and the resolution switching problem.
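To give a feel for resolution switching, browsers can choose between several sizes of the same image when an img tag carries a srcset attribute. The sketch below builds such a srcset string by hand. buildSrcSet and the file naming scheme are hypothetical, and this is not how gatsby-plugin-image is implemented; the plugin generates the equivalent markup for you.

```javascript
// Hypothetical helper: builds a srcset value from a list of pre-generated widths.
// Assumes files are named like /images/hero-400w.webp (an illustrative convention).
function buildSrcSet (basePath, extension, widths) {
  return widths
    .map((width) => `${basePath}-${width}w.${extension} ${width}w`)
    .join(', ')
}

const srcSet = buildSrcSet('/images/hero', 'webp', [400, 800, 1600])

console.log(srcSet)
// /images/hero-400w.webp 400w, /images/hero-800w.webp 800w, /images/hero-1600w.webp 1600w
```

A sketch of the srcset value behind resolution switching. Each entry pairs a file with its width, so the browser can pick a suitably sized file for the current screen.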

If you're curious about these topics, look out for the future blog post on web optimized images. In the meantime, let's see how Gatsby makes including responsive images as easy as importing and using a React component.

StaticImage

gatsby-plugin-image gives us two core components to use in our site: StaticImage and GatsbyImage.

The StaticImage component has two required props. Both are standard attributes on HTML img tags: src and alt. StaticImage is for images that never change and are known before build time. This is strictly enforced by the component, as you can only pass an absolute path, a relative path, or a local variable that resolves to a path to the src prop. You cannot pass props from other components into src. Actually, you can't pass props from other components into any prop on this component at all! This is a static image, so absolutely no dynamic data is allowed.

A good example of an image to be used for StaticImage is your hero asset, or any design assets throughout your site. They are never going to change (unless you redesign your site), so referencing them directly by path for StaticImage makes sense over importing them as JavaScript modules.

import React from 'react'
import { StaticImage } from 'gatsby-plugin-image'

export default function MyStaticImage () {
	return (
		<StaticImage
			src="../static/img/hero90em.png"
			alt="Hero Giant"
		/>
	)
}

StaticImage usage. Both src and alt are required

GatsbyImage

Tying together everything we've covered in this blog post gives us the GatsbyImage component. This is the component you will use for dynamic images that are determined at build time. The order of operations for the images used by this component is as follows:

  1. Sourced into Gatsby's data layer (by gatsby-source-filesystem, in our example)
  2. Transformed and optimized (by gatsby-transformer-sharp and gatsby-plugin-sharp)
  3. Accessed via a GraphQL query (either as a page query or a static query)
  4. Converted to responsive images (by gatsby-plugin-image and this component)
  5. Used in your JSX & generated HTML

Because GatsbyImages use dynamic data from the data layer, we are able to pass in any props we want. This is desirable for setting up a blog because we can create a template for a blog post and then pass in the appropriate content to the template at build time and create as many blog posts as we want.

There are several helper functions to use as well, but the main one you will use is getImage. Since the returned data from the data layer is from gatsby-transformer-sharp, it will be in a childImageSharp object and in Sharp's format. The getImage helper takes that object and returns the data in the shape GatsbyImage expects.

Let's see this in action, using an image file named dynamic-test.png:

import React from 'react'
import { useStaticQuery, graphql } from 'gatsby'
import { GatsbyImage, getImage } from 'gatsby-plugin-image'

export default function MyGatsbyImage () {
  const data = useStaticQuery(graphql`
    query ImageQuery {
      file(
        # "extension" has no leading dot ("png"); the "ext" field would be ".png"
        extension: { eq: "png" }
        name: { eq: "dynamic-test" }
      ) {
        childImageSharp {
          gatsbyImageData
        }
      }
    }
  `)

  // The result is keyed by the queried root field (file); getImage accepts the File node
  const image = getImage(data.file)

  return <GatsbyImage image={image} alt="Dynamic Test Image" />
}

We need to query for data to use GatsbyImage. Using the helper getImage, we are able to pass the correct data to the component after our GraphQL query against the data layer.
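To make the helper feel less magical, here is a simplified approximation of the kind of unwrapping it performs. This is not the library's source code: resolveImageData is a hypothetical name, and the sample image data object is made up for the example.

```javascript
// Simplified approximation of a getImage-style helper (NOT gatsby-plugin-image's code).
// It accepts a File node, an ImageSharp node, or the image data itself,
// and returns the object that would be passed to the image prop.
function resolveImageData (node) {
  if (!node) return undefined
  // Already the image data object (it carries an images key).
  if (node.images) return node
  // An ImageSharp-like node that holds gatsbyImageData directly.
  if (node.gatsbyImageData) return node.gatsbyImageData
  // A File node whose childImageSharp holds gatsbyImageData.
  return node.childImageSharp ? node.childImageSharp.gatsbyImageData : undefined
}

// Made-up image data, shaped loosely like a query result.
const imageData = { layout: 'constrained', width: 600, height: 400, images: {} }
const fileNode = { childImageSharp: { gatsbyImageData: imageData } }

console.log(resolveImageData(fileNode) === imageData) // true
console.log(resolveImageData(fileNode.childImageSharp) === imageData) // true
console.log(resolveImageData(imageData) === imageData) // true
console.log(resolveImageData(null)) // undefined
```

A sketch of unwrapping query results into the data GatsbyImage needs. Whatever level of the query result you pass in, you get the same image data back.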

Shared props & options

StaticImage and GatsbyImage have different props for setting the image (src for StaticImage, and image for GatsbyImage), but most of their props are shared. The props for these components are mainly about styling the images, especially as they are loaded.

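Easily confused with props, there are also options for these components. All the options are shared between the two components, but there are two differences in how they get applied. First, StaticImage takes its options as props on the component, while GatsbyImage takes them as arguments to the gatsbyImageData field in your GraphQL query. Second, StaticImage option values such as layout and placeholder are strings, while the GraphQL resolver expects upper-case enum values. Remember the core difference between the two components (one is completely static, one is dynamic) and those differences will make a lot of sense.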
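As a sketch of how the same options look in each place, the snippet below converts StaticImage-style options (string values passed as props) into the argument list you would write on gatsbyImageData in a GraphQL query (enum values). toResolverArgs is a hypothetical helper written for this post, not something Gatsby provides.

```javascript
// Options whose values are GraphQL enums when used in a query.
const ENUM_OPTIONS = new Set(['layout', 'placeholder'])

// 'dominantColor' -> 'DOMINANT_COLOR', 'blurred' -> 'BLURRED'
function toEnumValue (value) {
  return value.replace(/([a-z])([A-Z])/g, '$1_$2').toUpperCase()
}

// Hypothetical helper: formats StaticImage-style options as resolver arguments.
function toResolverArgs (options) {
  return Object.entries(options)
    .map(([name, value]) => {
      const formatted = ENUM_OPTIONS.has(name)
        ? toEnumValue(String(value))
        : JSON.stringify(value)
      return `${name}: ${formatted}`
    })
    .join(', ')
}

// On StaticImage these would be props:
//   <StaticImage src="..." alt="..." layout="constrained" placeholder="blurred" width={600} />
const staticImageOptions = { layout: 'constrained', placeholder: 'blurred', width: 600 }

// For GatsbyImage they become arguments in the query:
//   gatsbyImageData(layout: CONSTRAINED, placeholder: BLURRED, width: 600)
console.log(toResolverArgs(staticImageOptions))
// layout: CONSTRAINED, placeholder: BLURRED, width: 600
```

A sketch comparing option syntax for StaticImage props and gatsbyImageData arguments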

One option I want to highlight is the placeholder option. This option controls the initial placeholder for the image when the page is first loaded, until the image fully loads. The user should see this for only a second at most (hopefully), and then the full image will replace the placeholder.

In the Sharp section, I explicitly set this option to blurred. The default for this option is dominantColor, which I think is the worst value besides none. dominantColor analyzes the image to figure out which color is most prevalent, and then the entire image dimensions are filled with this color. To me, seeing a giant block of black, red, green, or whatever color is determined is very noticeable. It's even more noticeable when the block disappears and your image is put in - it's jarring, to say the least. This is why I'm a fan of blurred. Instead, a low-resolution, blurred out version of the real image is used. When the real image is loaded, it feels like a much more natural transition.

Summary

By now, you should have a great understanding of Gatsby's image processing, and more importantly, the data layer and plugin system that underpins the entire Gatsby ecosystem. Even if you don't understand the optimizations applied to your images, you are able to get the most out of them through your knowledge of Gatsby and this incredible framework. If you have any comments, questions, or image optimization tips you'd like to share, we'd love to hear from you at developers@useanvil.com.

Lastly, if you'd like to see the gatsby-config.js all together for our Super Sophisticated™ example, check it out below. It includes plugins we did not explicitly go over, but are needed to build our technical blog website.

Let's continue to build the web fast for all ⚡️

const path = require('path')

// gatsby-config.js is loaded as a CommonJS module, so we export with module.exports
module.exports = {
  siteMetadata: {
    siteName: 'Super Sophisticated',
    title: 'The most sophisticated business',
    description:
      'Reduce costs and unlock growth by transitioning from paper and PDF-based processes to simple and flexible online workflows.',
    keywords: 'website,sophisticated,super,gatsby',
  },
  plugins: [
    'gatsby-plugin-resolve-src',
    {
      resolve: 'gatsby-source-filesystem',
      options: {
        // markdown-articles sits at the project root (see the directory structure earlier)
        path: path.resolve('markdown-articles/blog-posts'),
        name: 'blog-posts',
      },
    },
    {
      resolve: 'gatsby-plugin-sharp',
      options: {
        defaults: {
          placeholder: 'blurred',
        },
      },
    },
    'gatsby-transformer-sharp',
    'gatsby-plugin-image',
    {
      resolve: 'gatsby-transformer-remark',
      options: {
        plugins: [
          'gatsby-remark-autolink-headers',
          'gatsby-remark-images',
          'gatsby-remark-copy-linked-files',
          'gatsby-remark-prismjs',
        ],
      },
    },
  ],
}

Final gatsby-config.js for our technical blog site