HTML5 Rocks

Profiling Long Paint Times with DevTools' Continuous Painting Mode

By Paul Irish

Continuous painting mode for paint profiling is now available in Chrome Canary. This article explains how you identify a problem in page painting time and how you can use this new tool to detect bottlenecks in painting performance.

Investigating painting time on your page

So you've noticed that your page doesn't scroll smoothly. This is how you would start tackling the problem. As our example, we'll use the demo page Things We Left On The Moon by Dan Cederholm.

You open the Web Inspector, start a Timeline recording and scroll your page up and down. Then you look at the vertical timelines that show you what happened in each frame.

If you see that most time is spent painting (big green bars above 60fps), you need to take a closer look at why this is happening. To investigate your paints, use the Show paint rectangles setting of the Web Inspector (cog icon in the bottom right corner of the Web Inspector). This will show you the regions where Chrome paints.

There are different reasons for Chrome to repaint areas of the page:

  • DOM nodes get changed in JavaScript, which causes Chrome to recalculate the layout of the page.
  • Animations are playing that get updated in a frame-based cycle.
  • User interaction, like hovering, causes style changes on certain elements.
  • Any other operation that causes the page layout to change.

As a developer you need to be aware of the repaints happening on your page. Looking at the paint rectangles is a great way of doing that. In the example screenshot above you can see that the whole screen is covered in a big paint rectangle. This means the whole screen is repainted as you scroll, which is not good. In this specific case this is caused by the CSS style background-attachment:fixed which causes the background image of the page to stay at the same position while the content of the page moves on top of it as you scroll.

If you identify that the repaints cover a big area and/or take a long time, you have two options:

  1. You can try to change the page layout to reduce the amount of painting. If possible Chrome paints the visible page only once and adds parts that have not been visible as you scroll down. However, there are cases when Chrome needs to repaint certain areas. For example the CSS rule position:fixed, which is often used for navigation elements that stay in the same position, can cause these repaints.

  2. If you want to keep your page layout, you can try to reduce the painting cost of the areas that get repainted. Not every CSS style has the same painting cost: some have little impact, others a lot. Figuring out the painting costs of certain styles can be a lot of work. You can do this by toggling styles in the Elements panel and looking at the difference in the Timeline recording, which means switching between panels and doing lots of recordings. This is where continuous painting mode comes into play.

Continuous painting mode

Continuous painting mode is a tool that helps you identify which elements are costly on the page. It puts the page into an always repainting state, showing a counter of how much painting work is happening. Then, you can hide elements and mutate styles, watching the counter, in order to figure out what is slow.

Setup

In order to use continuous painting mode you need to use Chrome Canary.

On Linux systems (and some Macs) you need to make sure that Chrome runs in compositing mode. This can be permanently enabled using the GPU compositing on all pages setting in about:flags.

How To Begin

Continuous painting mode can be enabled via the checkbox Enable continuous page repainting in the Web Inspector's settings (cog icon in the bottom right corner of the Web Inspector).

The small display in the top right corner shows you the measured paint times in milliseconds. More specifically it shows:

  • The last measured paint time on the left.
  • The minimum and maximum of the current graph on the right.
  • A bar chart displaying the history of the last 80 frames on the bottom (the line in the chart indicates 16ms as a reference point).

The paint time measurements are dependent on screen resolution, window size and the hardware Chrome is running on. Be aware that these things are likely to be different for your users.
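
To make the readout described above concrete, here is a minimal sketch of the same bookkeeping in JavaScript: keep a fixed window of recent paint times and report the latest, the minimum and the maximum against a reference budget. `createPaintHistory` and its field names are hypothetical and only illustrate the idea; this is not DevTools code.

```javascript
// Hypothetical helper: mimics the readout described above, not DevTools internals.
function createPaintHistory(capacity, budgetMs) {
  var samples = [];
  return {
    // Record one paint time in milliseconds, dropping the oldest sample when full.
    add: function (ms) {
      samples.push(ms);
      if (samples.length > capacity) samples.shift();
    },
    // Summarize the current window: last value, min, max, and how many
    // samples exceeded the reference budget.
    summary: function () {
      if (samples.length === 0) return null;
      return {
        last: samples[samples.length - 1],
        min: Math.min.apply(null, samples),
        max: Math.max.apply(null, samples),
        overBudget: samples.filter(function (ms) { return ms > budgetMs; }).length
      };
    }
  };
}

// 80 frames of history and a 16ms reference line, as in the display.
var paintHistory = createPaintHistory(80, 16);
[4.2, 18.5, 7.1].forEach(paintHistory.add);
console.log(paintHistory.summary());
```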

Workflow

This is how you can use continuous painting mode to track down elements and styles that add a lot of painting cost:

  1. Open the Web Inspector's settings and check Enable continuous page repainting.
  2. Go to the Elements panel and traverse the DOM tree with the arrow keys or by picking elements on the page.
  3. Use the H keyboard shortcut, a newly introduced helper, to toggle visibility on an element.
  4. Look at the paint time graph and try to spot an element that adds a lot of painting time.
  5. Go through the CSS styles of that element, toggling them on and off while looking at the graph, to find the style that causes the slow down.
  6. Change this style and do another Timeline recording to check if this made your page perform better.

The animation below shows toggling styles and its effect on paint time:

continuouspaint screencast

This example demonstrates how turning off either one of the CSS styles box-shadow or border-radius reduces the painting time by a big amount. Using both box-shadow and border-radius on an element leads to very expensive painting operations, because Chrome can't optimize for this. So if you have an element that gets a lot of repaints, like in the example, you should avoid this combination.

Notes

Continuous painting mode repaints the whole visible page. This is usually not the case when browsing a web page. Scrolling usually only paints the parts that haven't been visible before. And for other changes on the page, only the smallest possible area is repainted. So check with another Timeline recording if your style improvements actually had an impact on the paint times of your page.

When using continuous painting mode you might discover that, for example, the CSS styles border-radius and box-shadow add a lot of painting time. This doesn't mean you should stop using those features in general; they are awesome and we are happy they are finally here. But it's important to know when and where to use them. Avoid using them in areas with lots of repaints and avoid overusing them in general.

Learn more about painting and related topics on jankfree.com

Live Demo

Click below for a demo where Paul Irish uses continuous painting to identify a uniquely expensive paint operation.

Stick your landings! position: sticky lands in WebKit

By Eric Bidelman

position: sticky is a new way to position elements and is conceptually similar to position: fixed. The difference is that an element with position: sticky behaves like position: relative within its parent, until a given offset threshold is met in the viewport.

Use cases

Paraphrasing from Edward O’Connor's original proposal of this feature:

Many web sites have elements that alternate between being in-flow and having position: fixed, depending on the user's scroll position. This is often done for elements in a sidebar that the page author wants to be always visible as the user scrolls, but which slot into a space on the page when scrolled to the top. Good examples are news.google.com (the "Top Stories" sidebar) and yelp.com (search results map).

Introducing sticky positioning

LAUNCH DEMO

By simply adding position: sticky (vendor prefixed), we can tell an element to be position: relative until the user scrolls the item (or its parent) to be 15px from the top:

.sticky {
  position: -webkit-sticky;
  position: -moz-sticky;
  position: -ms-sticky;
  position: -o-sticky;
  position: sticky; /* unprefixed form, listed last for browsers that support it */
  top: 15px;
}

At top: 15px, the element becomes fixed.

To illustrate this feature in a practical setting, I've put together a DEMO which sticks blog titles as you scroll.

Old approach: scroll events

Until now, to achieve the sticky effect, sites set up scroll event listeners in JS. We actually use this technique as well on html5rocks tutorials. On screens smaller than 1200px, our table of contents sidebar changes to position: fixed after a certain amount of scrolling.

Here's the (now old) way to have a header that sticks to the top of the viewport when the user scrolls down, and falls back into place when the user scrolls up:

<style>
.sticky {
  position: fixed;
  top: 0;
}
.header {
  width: 100%;
  background: #F6D565;
  padding: 25px 0;
}
</style>

<div class="header"></div>

<script>
var header = document.querySelector('.header');
var origOffsetY = header.offsetTop;

function onScroll(e) {
  window.scrollY >= origOffsetY ? header.classList.add('sticky') :
                                  header.classList.remove('sticky');
}

document.addEventListener('scroll', onScroll);
</script>

Try it: http://jsbin.com/omanut/2/

This is easy enough, but this model quickly breaks down if you want to do this for a bunch of DOM nodes, say, every <h1> title of a blog as the user scrolls.
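
To see why this gets harder with many nodes, consider the bookkeeping a script would need: on every scroll it has to work out which of several headings is currently the one to pin. A rough sketch of that step follows; `pickStickyIndex` is a hypothetical helper, and the offsets are assumed to be the headings' document positions in pixels, sorted in ascending order.

```javascript
// Hypothetical helper: given each heading's offsetTop (sorted ascending) and
// the current scroll position, return the index of the heading to pin,
// or -1 when the page has not yet scrolled past the first heading.
function pickStickyIndex(offsets, scrollY) {
  var lo = 0, hi = offsets.length - 1, found = -1;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    if (offsets[mid] <= scrollY) {
      found = mid;      // this heading has been scrolled past; look further down
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

console.log(pickStickyIndex([100, 600, 1400], 650));
```

In a browser, the scroll handler would call this with window.scrollY and then move the sticky class to the chosen heading only, which is extra state the page has to keep in sync by hand.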

Why JS is not ideal

In general, scroll handlers are never a good idea. Folks tend to do too much work and wonder why their UI is janky.

Something else to consider is that more and more browsers are implementing hardware accelerated scrolling to improve performance. The problem with this is that once JS scroll handlers are in play, browsers may fall back into a slower (software) mode. Now we're no longer running on the GPU. Instead, we're back on the CPU. The result? Users perceive more jank when scrolling your page.
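
If a scroll handler can't be avoided, one common mitigation is to do as little as possible in the handler itself and defer the real work to at most once per frame. This is a minimal sketch, not something the browser does for you: `oncePerFrame` and `updateHeader` are hypothetical names, and the scheduler is assumed to behave like requestAnimationFrame.

```javascript
// Hypothetical helper: wraps `work` so that many calls between two frames
// result in a single invocation. `schedule` is expected to behave like
// requestAnimationFrame (it calls its callback once, later).
function oncePerFrame(schedule, work) {
  var pending = false;
  return function () {
    if (pending) return;          // work is already queued for the next frame
    pending = true;
    schedule(function () {
      pending = false;
      work();
    });
  };
}

// Assumed browser usage (updateHeader is a placeholder for your own function):
// var raf = window.requestAnimationFrame.bind(window);
// document.addEventListener('scroll', oncePerFrame(raf, updateHeader));
```

This reduces the work done per scroll event, but it does not remove the handler, so the concern about falling off the accelerated scrolling path still applies.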

Thus, it makes a lot of sense to have such a feature be declarative in CSS.

Support

Unfortunately, there isn't a spec for this one. It was proposed on www-style back in June and just landed in WebKit. That means there's no good documentation to point to. One thing to note however, according to this bug, if both left and right are specified, left wins. Likewise, if top and bottom are used at the same time, top wins.

Support right now is Chrome 23.0.1247.0+ (current Canary) and WebKit nightly.

When milliseconds are not enough: performance.now()

By Paul Irish

The High Resolution Timer was added by the WebPerf Working Group to allow measurement in the Web Platform that's more precise than what we've had with +new Date and the newer Date.now().

So just to compare, here are the sorts of values you'd get back:

   Date.now()         //  1337376068250
   performance.now()  //  20303.427000007
   

You'll notice the two values above differ by many orders of magnitude. performance.now() is a measurement of floating point milliseconds since that particular page started to load (the performance.timing.navigationStart timestamp, to be specific). You could argue that it could have been the number of milliseconds since the unix epoch, but rarely does a web app need to know the distance between now and 1970. This number stays relative to the page because you'll be comparing two or more measurements against each other.

Monotonic time

Another added benefit here is that you can rely on the time being monotonic. Let's let WebKit engineer Tony Gentilcore explain this one:

Perhaps less often considered is that Date, based on system time, isn't ideal for real user monitoring either. Most systems run a daemon which regularly synchronizes the time. It is common for the clock to be tweaked a few milliseconds every 15-20 minutes. At that rate about 1% of 10 second intervals measured would be inaccurate.

Use Cases

There are a few situations where you'd use this high resolution timer instead of grabbing a basic timestamp:

  • benchmarking
  • game or animation runloop code
  • calculating framerate with precision
  • cueing actions or audio to occur at specific points in an animation or other time-based sequence
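
As an illustration of the benchmarking case, here is a small sketch that times a function with the best clock available. `nowMs` and `benchmark` are hypothetical helpers; the prefixed webkitNow() branch and the Date fallback are there for environments without performance.now().

```javascript
// Returns a millisecond timestamp from the best clock available.
// Falls back to Date when no high resolution timer exists.
function nowMs() {
  if (typeof performance !== 'undefined' && performance) {
    if (typeof performance.now === 'function') return performance.now();
    if (typeof performance.webkitNow === 'function') return performance.webkitNow();
  }
  return new Date().getTime();
}

// Hypothetical helper: run `fn` `iterations` times and report the cost.
function benchmark(fn, iterations) {
  var start = nowMs();
  for (var i = 0; i < iterations; i++) fn();
  var elapsed = nowMs() - start;
  return { totalMs: elapsed, meanMs: elapsed / iterations };
}

var result = benchmark(function () { Math.sqrt(12345); }, 100000);
console.log(result.totalMs.toFixed(3) + 'ms total');
```

Because only the difference between two readings is used, it doesn't matter that the high resolution values are relative to page load rather than to 1970.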

Availability

The high resolution timer is currently available in Chrome (Stable) as window.performance.webkitNow(), and this value is generally equal to the new argument value passed into the requestAnimationFrame callback. Pretty soon, WebKit will drop its prefix and this will be available through performance.now(). The WebPerfWG in particular, led by Jatinder Mann of Microsoft, has been very successful in unprefixing their features quite quickly.

In summary, performance.now() is...

  • a double with microseconds in the fractional part
  • relative to the navigationStart of the page rather than to the UNIX epoch
  • not skewed when the system time changes
  • available in Chrome stable, Firefox 15+, and IE10.

How to convert ArrayBuffer to and from String

By Renato Mangini

ArrayBuffers are used to transport raw data and several new APIs rely on them, including WebSockets, Web Intents, XMLHttpRequest version 2 and WebWorkers. However, because they recently landed in the JavaScript world, sometimes they are misinterpreted or misused.

Semantically, an ArrayBuffer is simply an array of bytes viewed through a specific mask. This mask, an instance of ArrayBufferView, defines how bytes are aligned to match the expected structure of the content. For example, if you know that the bytes in an ArrayBuffer represent an array of 16-bit unsigned integers, you just wrap the ArrayBuffer in a Uint16Array view and you can manipulate its elements using the brackets syntax as if the Uint16Array was an integer array:

       // suppose buf contains the bytes [0x02, 0x01, 0x03, 0x07]
       // notice the multibyte values respect the hardware endianness, which is little-endian on x86
       var bufView = new Uint16Array(buf);
       if (bufView[0]===258) {   // 258 === 0x0102
         console.log("ok");
       }
       bufView[0] = 255;    // buf now contains the bytes [0xFF, 0x00, 0x03, 0x07]
       bufView[0] = 0xff05; // buf now contains the bytes [0x05, 0xFF, 0x03, 0x07]
       bufView[1] = 0x0210; // buf now contains the bytes [0x05, 0xFF, 0x10, 0x02]
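
Because typed array views follow the platform's byte order, code that needs a fixed byte order regardless of hardware can read through a DataView instead. A small sketch using the same four bytes as the example above:

```javascript
// Same bytes as in the example above: [0x02, 0x01, 0x03, 0x07]
var buf = new Uint8Array([0x02, 0x01, 0x03, 0x07]).buffer;
var view = new DataView(buf);

// The second argument to getUint16 selects little-endian (true) or big-endian (false).
var little = view.getUint16(0, true);   // bytes interpreted as 0x0102
var big = view.getUint16(0, false);     // bytes interpreted as 0x0201

console.log(little, big);
```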
   

One common practical question about ArrayBuffer is how to convert a String to an ArrayBuffer and vice-versa. Since an ArrayBuffer is, in fact, a byte array, this conversion requires that both ends agree on how to represent the characters in the String as bytes. You probably have seen this "agreement" before: it is the String's character encoding (and the usual "agreement terms" are, for example, Unicode UTF-16 and iso8859-1). Thus, supposing you and the other party have agreed on the UTF-16 encoding, the conversion code could be something like:

     function ab2str(buf) {
       return String.fromCharCode.apply(null, new Uint16Array(buf));
     }
     function str2ab(str) {
       var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
       var bufView = new Uint16Array(buf);
       for (var i=0, strLen=str.length; i<strLen; i++) {
         bufView[i] = str.charCodeAt(i);
       }
       return buf;
     }
   

Note the use of Uint16Array. This is an ArrayBuffer view that aligns bytes of the ArrayBuffers as 16-bit elements. It doesn't handle the character encoding itself, which is handled as Unicode by String.fromCharCode and str.charCodeAt.
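
A quick round trip shows the two helpers working together. The sketch below repeats them so it runs on its own, and adds a chunked variant, `ab2strChunked` (a hypothetical name), because passing a very large array to apply() can exceed an engine's argument limit. The chunk size used here is an arbitrary assumption.

```javascript
function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint16Array(buf));
}

function str2ab(str) {
  var buf = new ArrayBuffer(str.length * 2); // 2 bytes for each char
  var bufView = new Uint16Array(buf);
  for (var i = 0, strLen = str.length; i < strLen; i++) {
    bufView[i] = str.charCodeAt(i);
  }
  return buf;
}

// Hypothetical variant for large buffers: decode in slices so that apply()
// never receives more than `chunkSize` arguments at once.
function ab2strChunked(buf, chunkSize) {
  var view = new Uint16Array(buf);
  var parts = [];
  for (var i = 0; i < view.length; i += chunkSize) {
    parts.push(String.fromCharCode.apply(null, view.subarray(i, i + chunkSize)));
  }
  return parts.join('');
}

var encoded = str2ab('price \u20AC5');
console.log(encoded.byteLength, ab2str(encoded), ab2strChunked(encoded, 8192));
```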

Note: A robust implementation of the String to ArrayBuffer conversion capable of handling more encodings is provided by the stringencoding library. But, for simple usage where you control both sides of the communication pipe, the code above is probably enough. A standardized API specification for String encoding is being drafted by the WHATWG working group.

A popular StackOverflow question about this has a highly voted answer with a somewhat convoluted solution to the conversion: create a FileReader to act as a converter and feed a Blob containing the String into it. Although this method works, it has poor readability and I suspect it is slow. Since unfounded suspicions have driven many mistakes in the history of humanity, let's take a more scientific approach here. I have jsperf'ed the two methods and the result confirms my suspicion:

In Chrome 20, it is almost 27 times faster to use the direct ArrayBuffer manipulation code in this article than it is to use the FileReader/Blob method.

requestAnimationFrame API: now with sub-millisecond precision

By Paul Irish

If you've been using requestAnimationFrame you've enjoyed seeing your paints synchronized to the refresh rate of the screen, resulting in the most high-fidelity animations possible. Plus, you're saving your users CPU fan noise and battery-power when they switch to another tab.

There is about to be a change to part of the API, however. The Timestamp that is passed into your callback function is changing from a typical Date.now()-like timestamp to a high-resolution measurement of floating point milliseconds since the page was opened. If you use this value, you will need to update your code, based on the explanation below.

Just to be clear, here is what I'm talking about:

   // assuming requestAnimationFrame method has been normalized for all vendor prefixes..
   requestAnimationFrame(function(timestamp){
       // the value of timestamp is changing
   });
   

If you're using the common requestAnimFrame shim provided here, then you're not using the timestamp value. You're off the hook. :)

Why

Why? Well rAF helps you get the ultimate 60 fps that is ideal, and 60 fps translates to 16.7ms per frame. But measuring with integer milliseconds means we have a precision of 1/16 for everything we want to observe and target.

As you can see above, the blue bar represents the maximum amount of time you have to do all your work before you paint a new frame (at 60fps). You're probably doing more than 16 things, but with integer milliseconds you only have the ability to schedule and measure in those very chunky increments. That's not good enough.
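
The arithmetic behind those numbers, as a short sketch (`frameBudgetMs` is a hypothetical helper):

```javascript
// Time available per frame, in milliseconds, at a given refresh rate.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

var budget = frameBudgetMs(60);
// With whole-millisecond timestamps the smallest measurable step is 1ms,
// so a frame can only be divided into about this many distinguishable steps.
var integerSteps = Math.floor(budget);

console.log(budget.toFixed(1) + 'ms per frame, ' + integerSteps + ' integer steps');
```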

The High Resolution Timer solves this by providing a far more precise figure:

   Date.now()         //  1337376068250
   performance.now()  //  20303.427000007
   

The high resolution timer is currently available in Chrome as window.performance.webkitNow(), and this value is generally equal to the new argument value passed into the rAF callback. Once the spec progresses through standards further, the method will drop the prefix and be available through performance.now().

You'll also notice the two values above differ by many orders of magnitude. performance.now() is a measurement of floating point milliseconds since that particular page started to load (the performance.timing.navigationStart timestamp, to be specific).

In use

The key issue that crops up is animation libraries that use this design pattern:

   function MyAnimation(duration) {
      this.startTime = Date.now();
      this.duration = duration;
      requestAnimFrame(this.tick.bind(this));
   }
   MyAnimation.prototype.tick = function(time) {
      var now = Date.now();
      if (time > now) {
         this.dispatchEvent("ended");
         return;
      }
      ...
      requestAnimFrame(this.tick.bind(this));
   }
   

An edit to fix this is pretty easy: change startTime and now to use window.performance.now().

   this.startTime = window.performance.now ?
                    (performance.now() + performance.timing.navigationStart) : 
                    Date.now();
   

This is a fairly naive implementation: it doesn't use a prefixed now() method and also assumes Date.now() support, which isn't in IE8.
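
A slightly less naive version could look like the sketch below. It checks for the prefixed webkitNow() as well and falls back to new Date().getTime(), which works without Date.now(). `preciseEpochNow` is a hypothetical name, and the `perf` parameter exists only so the fallbacks can be exercised; by default the global performance object is used.

```javascript
// Hypothetical helper: returns milliseconds since the Unix epoch with the
// best precision available.
function preciseEpochNow(perf) {
  if (perf === undefined && typeof performance !== 'undefined') perf = performance;
  var nowFn = perf && (perf.now || perf.webkitNow);
  var navStart = perf && perf.timing && perf.timing.navigationStart;
  if (typeof nowFn === 'function' && typeof navStart === 'number') {
    // High resolution time is relative to navigationStart, so add it back.
    return nowFn.call(perf) + navStart;
  }
  // Works without Date.now(), which older browsers such as IE8 lack.
  return new Date().getTime();
}

console.log(preciseEpochNow());
```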

Feature detection

If you're not using the pattern above and just want to identify which sort of callback value you're getting you can use this technique:

   requestAnimationFrame(function(timestamp){
       if (timestamp < 1e12){
           // .. high resolution timer
       } else {
           // integer milliseconds since unix epoch
       }

       // ...
   });

Checking if (timestamp < 1e12) is a quick duck test to see how big of a number we're dealing with. Technically it could produce a false positive, but only if a webpage is open continuously for 30 years. But we're not able to test if it's a floating point number (rather than floored to an integer). Ask for enough high resolution timers and you're bound to get integer values at some point.
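
The same duck test can be used to normalize whichever value you receive to one time base. A minimal sketch, assuming you pass in performance.timing.navigationStart yourself; `toEpochMs` is a hypothetical helper:

```javascript
// Hypothetical helper: convert either kind of rAF timestamp to milliseconds
// since the Unix epoch. Values below 1e12 are assumed to be high resolution
// times measured from navigationStart.
function toEpochMs(timestamp, navigationStart) {
  return timestamp < 1e12 ? timestamp + navigationStart : timestamp;
}

// Assumed browser usage:
// requestAnimationFrame(function (timestamp) {
//   var epochMs = toEpochMs(timestamp, performance.timing.navigationStart);
// });
console.log(toEpochMs(20303.427, 1337376000000));
```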


We plan on pushing this change out in Chrome 21, so if you're already taking advantage of this callback parameter, be sure to update your code!

Big boost to DOM performance - WebKit's innerHTML is 240% faster

By Sam Dutton

We're very happy to see that some common DOM operations have just skyrocketed in speed. The changes were at the WebKit level, boosting performance for both Safari (JavaScriptCore) and Chrome (V8).

Chrome Engineer Kentaro Hara made seven code optimisations within WebKit; below are the results, which show just how much faster JavaScript DOM access has become:

DOM performance boosts summary

Below, Kentaro Hara gives details on some of the patches he made. The links are to WebKit bugs with test cases, so you can try out the tests for yourself. The changes were made between WebKit r109829 and r111133: Chrome 17 does not include them; Chrome 19 does.

Improve performance of div.innerHTML and div.outerHTML by 2.4x (V8, JavaScriptCore)

Previous behavior in WebKit:

  1. Create a string for each tag.
  2. Append a created string to Vector<string>, parsing the DOM tree.
  3. After the parsing, allocate a string whose size is the sum of all strings in the Vector<string>.
  4. Concatenate all strings in Vector<string>, and return it as innerHTML.

New behavior in WebKit:

  1. Allocate one string, say S.
  2. Concatenate a string for each tag to S, incrementally parsing the DOM tree.
  3. Return S as innerHTML.

In a nutshell, instead of creating a lot of strings and then concatenating them, the patch creates one string and then simply appends strings incrementally.
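
The actual change is in WebKit's C++ and isn't reproduced here. Purely as an analogy, the two strategies can be sketched in JavaScript over a toy tree. The node shape and the function names are invented for illustration, and the relative speed of these two JavaScript versions says nothing about the WebKit numbers.

```javascript
// A toy stand-in for a DOM tree: { tag: 'div', children: [...] } or a text string.

// Old strategy: create one string per piece, store them all, join at the end.
function serializeCollectThenJoin(node) {
  var pieces = [];
  (function walk(n) {
    if (typeof n === 'string') { pieces.push(n); return; }
    pieces.push('<' + n.tag + '>');
    (n.children || []).forEach(function (child) { walk(child); });
    pieces.push('</' + n.tag + '>');
  })(node);
  return pieces.join('');
}

// New strategy: keep a single result and append to it while walking the tree.
function serializeAppendIncrementally(node) {
  var out = '';
  (function walk(n) {
    if (typeof n === 'string') { out += n; return; }
    out += '<' + n.tag + '>';
    (n.children || []).forEach(function (child) { walk(child); });
    out += '</' + n.tag + '>';
  })(node);
  return out;
}

var tree = { tag: 'div', children: [{ tag: 'p', children: ['hi'] }, { tag: 'span' }] };
console.log(serializeAppendIncrementally(tree));
```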

Improve performance of div.innerText and div.outerText in Chromium/Mac by 4x (V8/Mac)

The patch just changed the initial buffer size for creating innerText. Changing the initial buffer size from 2^16 to 2^15 improved Chromium/Mac performance by 4x. This difference depends on the underlying malloc system.

Improve performance of CSS property accesses in JavaScriptCore by 35%

(Note: This is a change for Safari, not for Chrome.)

A CSS property string (e.g. .fontWeight, .backgroundColor) is converted to an integer ID in WebKit. This conversion is heavy. The patch caches the conversion results in a map (i.e. a property string => an integer ID), so that the conversion won't be conducted multiple times.
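
The caching idea itself is ordinary memoization. A hypothetical JavaScript sketch of it follows; the real patch is C++ inside WebKit, and `makeCachedLookup` and the sample property table are invented for illustration.

```javascript
// Hypothetical sketch: wrap an expensive "property string => integer ID"
// conversion so that each distinct string is converted only once.
function makeCachedLookup(slowLookup) {
  var cache = Object.create(null);
  return function (name) {
    if (!(name in cache)) cache[name] = slowLookup(name);
    return cache[name];
  };
}

// Stand-in for the heavy conversion: position in a made-up property table.
var propertyTable = ['fontWeight', 'backgroundColor', 'display'];
var conversions = 0;
var propertyId = makeCachedLookup(function (name) {
  conversions++;
  return propertyTable.indexOf(name);
});

propertyId('backgroundColor');
propertyId('backgroundColor');
console.log(conversions);
```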

How do the tests work?

They measure the time of property accesses. In case of innerHTML (the performance test in bugs.webkit.org/show_bug.cgi?id=81214), the test just measures the time to run the following code:

for (var i = 0; i < 1000000; i++)
    document.body.innerHTML;

The performance test uses a large body copied from the HTML spec.

Similarly, the CSS property-accesses test measures the time of the following code:

var spanStyle = span.style;
for (var i = 0; i < 1000000; i++) {
    spanStyle.invalidFontWeight;
    spanStyle.invalidColor;
    spanStyle.invalidBackgroundColor;
    spanStyle.invalidDisplay;
}

The good news is that Kentaro Hara believes more performance improvements will be possible for other important DOM attributes and methods.

Bring it on!

Kudos to Haraken and the rest of the team.

Round-up of Web Browser Internals Resources

By Paul Irish

In many cases, we treat web browsers as a black box. But as we gain a better understanding of how they work, we not only recognize where to make smart optimizations but also we push them farther.

The links below capture most of the resources that explain the inner workings of web browsers.

Thanks Codrops for the fanciness. Thank you Anthony Ricaud for the resources.

If you know of other browser internals posts to capture, link them in the comments!

Optimizing JavaScript

By Seth Ladd

JavaScript is relatively fast, but it can always go faster. Read more about how to optimize your JavaScript for performance.

How to write low garbage real-time JavaScript from Scirra, the HTML5 game making tool.

Optimizing for V8 – Introduction is written by Florian Loitsch, engineer on Dart’s JavaScript generation.

Optimizing for V8 – Inlining, Deoptimizations is part 2 in Florian’s series.

From Console to Chrome – HTML5 and JavaScript for game developers from Lilli Thompson, Chrome Games engineer.

Transferable Objects: Lightning Fast!

By Eric Bidelman

Chrome 13 introduced sending ArrayBuffers to/from a Web Worker using an algorithm called structured cloning. This allowed the postMessage() API to accept messages that were not just strings, but complex types like File, Blob, ArrayBuffer, and JSON objects. Structured cloning is also supported in later versions of Firefox.

Faster is better

Structured cloning is great, but it's still a copy operation. The overhead of passing a 32MB ArrayBuffer to a Worker can be hundreds of milliseconds. New versions of Chrome contain a huge performance improvement for message passing, called Transferable Objects.

With transferable objects, data is transferred from one context to another. It is zero-copy, which vastly improves the performance of sending data to a Worker. Think of it as pass-by-reference if you're from the C/C++ world. However, unlike pass-by-reference, the 'version' from the calling context is no longer available once transferred to the new context. For example, when transferring an ArrayBuffer from your main app to a Worker, the original ArrayBuffer is cleared and no longer usable. Its contents are (quite literally) transferred to the Worker context.

To play with transferables, there's a new version of postMessage() in Chrome/V8 that supports transferable objects:

worker.webkitPostMessage(arrayBuffer, [arrayBuffer]);
window.webkitPostMessage(arrayBuffer, targetOrigin, [arrayBuffer]);

For the worker case, the first argument is the ArrayBuffer message. The second argument is a list of items that should be transferred.
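
The difference between copying and transferring can be observed from the sending side. The sketch below uses a MessageChannel port and the standard, unprefixed postMessage(message, transferList) rather than the prefixed webkitPostMessage() shown above. That choice is an assumption about the environment you run it in, and `compareCopyAndTransfer` is a hypothetical name.

```javascript
// Posts one buffer as a copy and one as a transfer, then reports what the
// sender is left with. Assumes a global MessageChannel whose ports accept
// postMessage(message, transferList).
function compareCopyAndTransfer(size) {
  var channel = new MessageChannel();

  var copied = new ArrayBuffer(size);
  channel.port1.postMessage(copied);                      // structured clone: sender keeps its data

  var transferred = new ArrayBuffer(size);
  channel.port1.postMessage(transferred, [transferred]);  // transfer: sender's buffer is emptied

  channel.port1.close();
  channel.port2.close();

  return {
    copiedLength: copied.byteLength,
    transferredLength: transferred.byteLength
  };
}

console.log(compareCopyAndTransfer(1024));
```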

Benchmark demo

To see the performance gains of transferables, I've put together a demo.

The demo sends a 32MB ArrayBuffer to a worker and back using webkitPostMessage(). If your browser doesn't support transferables, the sample falls back to structured cloning. Averaging 5 runs in different browsers, here's what I got:

On a MacBook Pro/10.6.8/2.53 GHz/Intel Core 2 Duo, Firefox was the fastest using structured cloning. On average, it took 302ms to send the 32MB ArrayBuffer to a worker and post it back to the main thread (RTT, round trip time). Comparing that with transferables, the same test took 6.6ms. That is a huge perf boost!

Having these kinds of speeds allows massive WebGL textures/meshes to be seamlessly passed between a Worker and main app.

Feature detecting

Feature detecting is a bit tricky with this one. My recommendation is to send a small ArrayBuffer to your worker. If the buffer is transferred and not copied, its .byteLength will go to 0:

worker.postMessage = worker.webkitPostMessage || worker.postMessage;

var ab = new ArrayBuffer(1);
worker.postMessage(ab, [ab]);
if (ab.byteLength) {
  alert('Transferables are not supported in your browser!');
} else {
  // Transferables are supported.
}

Support: Currently Chrome 17+

Updated (2011-12-13): Code snippet to show webkitPostMessage() signature is different for window and worker.

Use mediump precision in WebGL when possible

By Ilmari Heikkinen

Heads-up from our friends at Opera, who have been testing WebGL on actual OpenGL ES 2.0 hardware: many demos and applications use highp precision in fragment shaders when it’s not really warranted.

Highp in fragment shaders is an optional part of the OpenGL ES 2.0 spec, so not all hardware supports it (and even when it does, there may be a performance hit). Using mediump will usually be good enough and it will ensure that your applications will work on mobile devices as well.

In practice, if your fragment shader previously started with

precision highp float;

Changing it to the following should do the trick:

precision mediump float; // or lowp
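
If a particular shader really does need highp, the choice can be made at runtime instead of being hard-coded. A minimal sketch, assuming `gl` is a WebGLRenderingContext; `fragmentPrecisionLine` is a hypothetical helper built on the standard getShaderPrecisionFormat() call:

```javascript
// Hypothetical helper: returns the precision line to prepend to a fragment
// shader. Defaults to mediump and uses highp only when it is both requested
// and reported as supported by the hardware.
function fragmentPrecisionLine(gl, needsHighp) {
  if (needsHighp) {
    var format = gl.getShaderPrecisionFormat(gl.FRAGMENT_SHADER, gl.HIGH_FLOAT);
    // A precision of 0 means this format is not supported.
    if (format && format.precision > 0) return 'precision highp float;';
  }
  return 'precision mediump float;';
}

// Assumed browser usage (shaderBody is a placeholder for your own source):
// var source = fragmentPrecisionLine(gl, false) + '\n' + shaderBody;
```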