HTML5 Rocks

HTML5 Rocks

Easier ArrayBuffer <-> String conversion with the Encoding API

By Jeff Posnick at

Over two years ago, Renato Mangini described a method for converting between raw ArrayBuffers and the corresponding string representation of that data. At the end of the post, Renato mentioned that an official standardized API to handle the conversion was in the process of being drafted. The specification has now matured, and both Firefox and Google Chrome have added native support for the TextDecoder and TextEncoder interfaces.

As demonstrated by this live sample, excerpted below, the Encoding API makes it simple to translate between raw bytes and native JavaScript strings, regardless of which of the many standard encodings you need to work with.

<pre id="results"></pre>

<script>
  if ('TextDecoder' in window) {
    // The local files to be fetched, mapped to the encoding that they're using.
    var filesToEncoding = {
      'utf8.bin': 'utf-8',
      'utf16le.bin': 'utf-16le',
      'macintosh.bin': 'macintosh'
    };

    Object.keys(filesToEncoding).forEach(function(file) {
      fetchAndDecode(file, filesToEncoding[file]);
    });
  } else {
    document.querySelector('#results').textContent = 'Your browser does not support the Encoding API.'
  }

  // Use XHR to fetch `file` and interpret its contents as being encoded with `encoding`.
  function fetchAndDecode(file, encoding) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', file);
    // Using 'arraybuffer' as the responseType ensures that the raw data is returned,
    // rather than letting XMLHttpRequest decode the data first.
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
      if (this.status == 200) {
        // The decode() method takes a DataView as a parameter, which is a wrapper on top of the ArrayBuffer.
        var dataView = new DataView(this.response);
        // The TextDecoder interface is documented at http://encoding.spec.whatwg.org/#interface-textdecoder
        var decoder = new TextDecoder(encoding);
        var decodedString = decoder.decode(dataView);
        // Add the decoded file's text to the <pre> element on the page.
        document.querySelector('#results').textContent += decodedString + '\n';
      } else {
        console.error('Error while requesting', file, this);
      }
    };
    xhr.send();
  }
</script>

The sample above uses feature detection to determine whether the required TextDecoder interface is available in the current browser, and displays an error message if it’s not. In a real application, you would normally want to fall back on an alternative implementation if native support isn’t available. Fortunately, the text-encoding library that Renato mentioned in his original article is still a good choice. The library uses the native methods on browsers that support them, and offers polyfills for the Encoding API on browsers that haven’t yet added support.

Update, September 2014: Modified the sample to illustrate checking whether the Encoding API is available in the current browser.

How to convert ArrayBuffer to and from String

By Renato Mangini at

ArrayBuffers are used to transport raw data and several new APIs rely on them, including WebSockets, Web Intents, XMLHttpRequest version 2 and WebWorkers. However, because they recently landed in the JavaScript world, sometimes they are misinterpreted or misused.

Semantically, an ArrayBuffer is simply an array of bytes viewed through a specific mask. This mask, an instance of ArrayBufferView, defines how bytes are aligned to match the expected structure of the content. For example, if you know that the bytes in an ArrayBuffer represent an array of 16-bit unsigned integers, you just wrap the ArrayBuffer in a Uint16Array view and you can manipulate its elements using the brackets syntax as if the Uint16Array was an integer array:

       // suppose buf contains the bytes [0x02, 0x01, 0x03, 0x07]
       // notice the multibyte values respect the hardware endianess, which is little-endian in x86
       var bufView = new Uint16Array(buf);
       if (bufView[0]===258) {   // 258 === 0x0102
         console.log("ok");
       }
       bufView[0] = 255;    // buf now contains the bytes [0xFF, 0x00, 0x03, 0x07]
       bufView[0] = 0xff05; // buf now contains the bytes [0x05, 0xFF, 0x03, 0x07]
       bufView[1] = 0x0210; // buf now contains the bytes [0x05, 0xFF, 0x10, 0x02]
   

One common practical question about ArrayBuffer is how to convert a String to an ArrayBuffer and vice-versa. Since an ArrayBuffer is, in fact, a byte array, this conversion requires that both ends agree on how to represent the characters in the String as bytes. You probably have seen this "agreement" before: it is the String's character encoding (and the usual "agreement terms" are, for example, Unicode UTF-16 and iso8859-1). Thus, supposing you and the other party have agreed on the UTF-16 encoding, the conversion code could be something like:

     function ab2str(buf) {
       return String.fromCharCode.apply(null, new Uint16Array(buf));
     }
    function str2ab(str) {
       var buf = new ArrayBuffer(str.length*2); // 2 bytes for each char
       var bufView = new Uint16Array(buf);
       for (var i=0, strLen=str.length; i<strLen; i++) {
         bufView[i] = str.charCodeAt(i);
       }
       return buf;
     }
   

Note the use of Uint16Array. This is an ArrayBuffer view that aligns bytes of the ArrayBuffers as 16-bit elements. It doesn't handle the character encoding itself, which is handled as Unicode by String.fromCharCode and str.charCodeAt.

Note: A robust implementation of the String to ArrayBuffer conversion capable of handling more encodings is provided by the stringencoding library. But, for simple usage where you control both sides of the communication pipe, the code above is probably enough. A standardized API specification for String encoding is being drafted by the WHATWG working group.

A popular StackOverflow question about this has a highly voted answer with a somewhat convoluted solution to the conversion: create a FileReader to act as a converter and feed a Blob containing the String into it. Although this method works, it has poor readability and I suspect it is slow. Since unfounded suspicions have driven many mistakes in the history of humanity, let's take a more scientific approach here. I have jsperf'ed the two methods and the result confirms my suspicion:

In Chrome 20, it is almost 27 times faster to use the direct ArrayBuffer manipulation code on this article than it is to use the FileReader/Blob method.

Update, August 2014: The Encoding API specification has matured, and a number of browsers now support it natively. The information in this article still applies for browsers that don’t yet support the Encoding API, but the recommended approach is to use the official API wherever possible. See Easier ArrayBuffer <-> String conversion with the Encoding API for more details.