WEDNESDAY 28TH OF OCTOBER 2020
Back in May, we shared that SpeedCurve supports Google's Core Web Vitals in both our synthetic monitoring and LUX real user monitoring tools. Two of the Web Vitals – Largest Contentful Paint (LCP) and First Input Delay (FID) – were actually available in SpeedCurve for quite a while prior to the announcement. The newcomer to the scene was Cumulative Layout Shift (CLS), and, not surprisingly, it's the metric that's gotten the most questions.
A few of the questions I've been asked (or asked myself) about Cumulative Layout Shift:
Six months in, I've had a chance to gather and look at a lot of data, talk with customers, and learn from our friends in the performance community. Here's what I've learned so far.
Cumulative Layout Shift measures how visually stable a page is. It's a formula-based metric that, put very (very) simply, takes into account how much visual content shifts within the viewport, combined with the distance that those visual elements shifted. You can dig deeper into the mechanics of how it's calculated, but the human-friendly definition is that CLS helps you understand how likely a page is to deliver a janky, unpleasant experience to viewers.
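To make the "how much shifts, combined with how far" idea concrete, here is a minimal sketch of a single layout shift score as Google documents it: an impact fraction multiplied by a distance fraction. The `Rect` type and function names are illustrative assumptions, not part of any browser API, and the sketch handles just one shifting element with no scrolling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def area(self) -> float:
        return self.width * self.height


def clip(rect: Rect, viewport: Rect) -> Rect:
    """Return the part of rect that is visible inside the viewport."""
    left = max(rect.left, viewport.left)
    top = max(rect.top, viewport.top)
    right = min(rect.right, viewport.right)
    bottom = min(rect.bottom, viewport.bottom)
    return Rect(left, top, max(right - left, 0.0), max(bottom - top, 0.0))


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles (0 if they do not overlap)."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return max(width, 0.0) * max(height, 0.0)


def layout_shift_score(before: Rect, after: Rect, viewport: Rect) -> float:
    """Score one shift of one element (simplifying assumption: one element)."""
    seen_before = clip(before, viewport)
    seen_after = clip(after, viewport)
    # Impact fraction: union of the visible before/after areas over the viewport area.
    union = seen_before.area + seen_after.area - overlap_area(seen_before, seen_after)
    impact_fraction = union / viewport.area
    # Distance fraction: largest move on either axis over the viewport's largest dimension.
    moved = max(abs(after.left - before.left), abs(after.top - before.top))
    distance_fraction = moved / max(viewport.width, viewport.height)
    return impact_fraction * distance_fraction


# An element filling the top half of a 400x800 viewport is pushed down by 200px.
viewport = Rect(0, 0, 400, 800)
score = layout_shift_score(Rect(0, 0, 400, 400), Rect(0, 200, 400, 400), viewport)
print(score)  # 0.1875
```

A page's CLS is then the accumulation of these per-shift scores, which is why one large banner that pushes content down can blow past the recommended threshold on its own.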
According to Google, pages should maintain a CLS score of 0.1 or less at the 75th percentile for mobile and desktop devices. A score greater than 0.25 is considered poor. (It's important to note that CLS is an unbounded measure: scores greater than 1 are unusual, but possible.)
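As a rough illustration of how those thresholds could be applied to a set of page views, the sketch below takes the 75th percentile of some CLS values and labels it. The function names are made up for this example, the nearest-rank percentile method is an assumption (your monitoring tool may interpolate differently), and treating a score of exactly 0.1 as "good" is also an assumption about the boundary.

```python
import math
from typing import Sequence

GOOD_MAX = 0.1    # recommended CLS at the 75th percentile
POOR_MIN = 0.25   # scores above this are considered poor


def percentile_75(values: Sequence[float]) -> float:
    """75th percentile using the nearest-rank method (an assumption)."""
    if not values:
        raise ValueError("need at least one CLS value")
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate_cls(values: Sequence[float]) -> str:
    """Label a set of CLS measurements by their 75th percentile."""
    p75 = percentile_75(values)
    if p75 <= GOOD_MAX:
        return "good"
    if p75 > POOR_MIN:
        return "poor"
    return "needs improvement"


# Hypothetical CLS values from four page views.
print(rate_cls([0.02, 0.05, 0.08, 0.30]))  # good
```

Note that the single 0.30 page view does not drag the rating down here: at the 75th percentile, up to a quarter of page views can be worse than the reported value.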
One of the things I like about Cumulative Layout Shift is that it can be measured in both synthetic and RUM. This lets us explore it in a bunch of complementary ways. One of the first things I want to do is look at synthetic data, which lets us generate filmstrip views of pages and see where layout shifts are actually happening.
SpeedCurve's Industry Benchmarks dashboard – which tracks the performance of top sites in retail, media, travel, and other industries – is a good place to start. Here are the current US media benchmarks on a fast desktop connection, ranked by CLS score.
As you can see right away in these filmstrips, a better or worse Cumulative Layout Shift doesn't correlate with faster initial rendering. That's fine, as we don't expect it to. The Washington Post (the bottom filmstrip) has the poorest CLS score, but it's also the first to start rendering. So is this a good user experience, or a poor one? This seems like an invitation to go in for a closer look, so let's do that.
Drilling down into a detailed test page for The Washington Post, the CLS score is 0.8417 – much worse than the 0.1 recommended by Google. As the layout shift visuals below demonstrate, the two biggest culprits appear to be:
Looking at these frames, it's easy to see the issues that generated this page's high CLS score. It's a bit trickier to figure out how much these layout shifts actually hurt user-perceived performance – and ultimately the business.
This is where it's helpful to look at real user data. If you're already capturing metrics like bounce rate or conversion rate, you can correlate Cumulative Layout Shift scores against them and see if you can spot any trends. (Learn more about correlation charts here.)
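If you export your own RUM data, a very simple version of this kind of correlation can be sketched as follows. This is not how SpeedCurve builds its correlation charts; it is a minimal example that assumes each session is a hypothetical `(cls, bounced)` pair and groups sessions into Google's three CLS zones before computing the bounce rate for each.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Upper edges of the "good" and "needs improvement" zones.
ZONE_EDGES = (0.1, 0.25)
ZONE_NAMES = ("good", "needs improvement", "poor")


def bounce_rate_by_zone(sessions: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Map each CLS zone to the share of its sessions that bounced.

    `sessions` is an iterable of (cls, bounced) pairs, a hypothetical
    export format. Zones with no sessions are left out of the result.
    """
    totals: Dict[str, int] = defaultdict(int)
    bounces: Dict[str, int] = defaultdict(int)
    for cls, bounced in sessions:
        # bisect_left puts 0.1 in "good" and 0.25 in "needs improvement".
        zone = ZONE_NAMES[bisect_left(ZONE_EDGES, cls)]
        totals[zone] += 1
        if bounced:
            bounces[zone] += 1
    return {zone: bounces[zone] / totals[zone] for zone in ZONE_NAMES if zone in totals}


# Made-up sessions, purely to show the shape of the output.
sessions = [
    (0.02, False), (0.05, False), (0.08, True),
    (0.15, True), (0.20, False),
    (0.40, True), (0.90, True),
]
for zone, rate in bounce_rate_by_zone(sessions).items():
    print(f"{zone}: {rate:.0%} bounce rate")
```

Three zones is a coarse cut. With enough traffic, narrower buckets would show the kind of drop-offs and plateaus that a correlation chart reveals.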
In this next set of charts, I looked at a month's worth of anonymized real user monitoring (RUM) data for four different retail sites. The results were interesting.
For this site, conversion behaviour followed Google's CLS guidelines very closely. You can see that the conversion rate is highest in the "good" zone, and it drops off significantly right before hitting 0.1. It plateaus in the "needs improvement" zone, and then takes another sharp dip in the "poor" zone after 0.25. For this site, CLS arguably correlates with buyer behaviour. This site owner might want to look into page issues that could be causing jank and driving away customers.
Similar to the chart above, the chart below indicates that user behaviour correlates predictably with poorer CLS scores. For this site, as CLS worsens, so does bounce rate:
If you stopped after looking at the first two correlation charts, you might be convinced that CLS absolutely is a predictor of user experience and behaviour. The chart below, however, shows only a mild worsening of bounce rate as the CLS score degrades:
And finally, this chart shows that, as the CLS score gets worse, bounce rate actually gets better:
The reason for sharing these charts is not to cast doubt on the validity and usefulness of CLS as a metric. The point is to illustrate that – like all your other metrics – your CLS scores need to be validated within the context of your own site.
Speaking of validation, this is a good opportunity to talk about validating how you measure CLS via different tools. (Hat tip to my awesome colleague Cliff for sharing his insights with me in this next section!)
While the method for calculating CLS is consistent across tools, there are a couple of gotchas to be aware of when comparing mixed data sets.
1. CLS is an accumulation of layout shifts that occur during a page's life cycle. However, measurement of the life cycle may differ across technologies.
The differences in the duration of the page life cycle can account for some of the larger discrepancies we've seen in CLS.
*CrUX captures data from Chrome users who have opted in to syncing their browsing history, have not set up a Sync passphrase, and have usage statistics reporting enabled.
2. Synthetic monitoring is well known for creating a 'lab' environment, where there are few, if any, changes between measurements. Browser version, device, viewport size, network throttling, and CPU throttling all remain constant. This allows you to more accurately baseline the target application.
In the real world, however, you have a complex distribution of various devices and environmental conditions. This can lead to a bit of variability when accumulating layout shifts. If a viewport is much smaller, some shifts may occur below the fold. As network conditions vary, you may see differences in CLS due to slow/fast loading of fonts, and so on.
The point is, measurements in the wild will differ from a baseline synthetic measurement (or a Lighthouse test run).
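The first gotcha above, the length of the measured page life cycle, is easy to see with a small sketch. It assumes a hypothetical list of `(time_ms, score)` layout shifts for one page view and sums only the shifts that fall inside a given measurement window. The window lengths are invented for illustration and do not describe any particular tool.

```python
from typing import Iterable, Optional, Tuple


def measured_cls(
    shifts: Iterable[Tuple[float, float]],
    window_ms: Optional[float] = None,
) -> float:
    """Sum layout shift scores that occur within the measurement window.

    `shifts` holds (time_ms, score) pairs relative to navigation start.
    `window_ms=None` means "keep measuring until the page goes away".
    """
    return sum(
        score
        for time_ms, score in shifts
        if window_ms is None or time_ms <= window_ms
    )


# Hypothetical page view: two early shifts and one late shift
# (for example, an ad slot that fills in well after load).
shifts = [(800, 0.05), (2_500, 0.10), (12_000, 0.30)]

short_window = measured_cls(shifts, window_ms=5_000)  # stops measuring at 5s
full_lifetime = measured_cls(shifts)                  # measures the whole visit

print(round(short_window, 2), round(full_lifetime, 2))  # 0.15 0.45
```

The same page view lands in the "needs improvement" zone under one window and the "poor" zone under the other, without anything about the page itself changing.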
While your CLS score might not always reflect how likely users are to bounce from your site, it's still a useful metric for investigating performance issues. One of the greatest benefits of Cumulative Layout Shift is that it makes us think outside of the usual time-based metrics, and instead it gets us thinking about the other subtle ways that unoptimized page elements can degrade the user experience.
As you start to track CLS on your own site, keep in mind that your results may vary depending on how your pages are built, which measurement tools you use, and whether you're looking at RUM or synthetic data. If you use both synthetic and RUM monitoring:
I'd love to hear people's thoughts on measuring and analyzing CLS in the real world!