HTML5 Rocks

Web apps that talk - Introduction to the Speech Synthesis API

By Eric Bidelman

The Web Speech API adds voice recognition (speech to text) and speech synthesis (text to speech) to JavaScript. This post briefly covers the latter, as the API recently landed in Chrome 33 (mobile and desktop). If you're interested in speech recognition, Glen Shires had a great writeup a while back on the voice recognition feature, "Voice Driven Web Apps: Introduction to the Web Speech API".

Basics

The most basic use of the synthesis API is to pass speechSynthesis.speak() an utterance:

var msg = new SpeechSynthesisUtterance('Hello World');
window.speechSynthesis.speak(msg);

Try it!

However, you can also alter parameters to affect the volume, speech rate, pitch, voice, and language:

var msg = new SpeechSynthesisUtterance();
var voices = window.speechSynthesis.getVoices();
msg.voice = voices[10]; // Note: some voices don't support altering params
msg.voiceURI = 'native';
msg.volume = 1; // 0 to 1
msg.rate = 1; // 0.1 to 10
msg.pitch = 2; //0 to 2
msg.text = 'Hello World';
msg.lang = 'en-US';

msg.onend = function(e) {
  console.log('Finished in ' + e.elapsedTime + ' seconds.');
};

speechSynthesis.speak(msg);

Setting a voice

The API also allows you to get a list of the voices the engine supports:

speechSynthesis.getVoices().forEach(function(voice) {
  console.log(voice.name, voice.default ? '(default)' :'');
});

Then, to use a different voice, set .voice on the utterance object:

var msg = new SpeechSynthesisUtterance('I see dead people!');
msg.voice = speechSynthesis.getVoices().filter(function(voice) { return voice.name == 'Whisper'; })[0];
speechSynthesis.speak(msg);
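
One gotcha: in Chrome, getVoices() can return an empty array until the voice list has loaded asynchronously. If you run into that, waiting for the voiceschanged event is one way to handle it; here's a small sketch (the pickVoiceAndSpeak name is just for illustration):

function pickVoiceAndSpeak() {
  speechSynthesis.onvoiceschanged = null; // run once; the event can fire repeatedly
  var msg = new SpeechSynthesisUtterance('I see dead people!');
  // Fall back to the default voice if 'Whisper' isn't available.
  msg.voice = speechSynthesis.getVoices().filter(function(voice) {
    return voice.name == 'Whisper';
  })[0] || null;
  speechSynthesis.speak(msg);
}

if (speechSynthesis.getVoices().length) {
  pickVoiceAndSpeak();
} else {
  speechSynthesis.onvoiceschanged = pickVoiceAndSpeak;
}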

Demo

In my Google I/O 2013 talk, "More Awesome Web: features you've always wanted" (www.moreawesomeweb.com), I showed a Google Now/Siri-like demo of using the Web Speech API's SpeechRecognition service with the Google Translate API to auto-translate microphone input into another language:

DEMO: http://www.moreawesomeweb.com/demos/speech_translate.html

Unfortunately, it used an undocumented (and unofficial) API to perform the speech synthesis. Well, now we have the full Web Speech API to speak back the translation! I've updated the demo to use the synthesis API.
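
At its core, speaking back the translation boils down to handing the translated string to an utterance and setting its lang to the target language. A minimal sketch (the speakTranslation name and the hard-coded Spanish example are just for illustration; in the demo the text comes from the Translate API response):

function speakTranslation(translatedText, targetLang) {
  var msg = new SpeechSynthesisUtterance(translatedText);
  msg.lang = targetLang; // e.g. 'es-ES', so the engine picks a matching voice
  speechSynthesis.speak(msg);
}

speakTranslation('Hola mundo', 'es-ES');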

Browser Support

Chrome 33 has full support for the Web Speech API, while Safari for iOS7 has partial support.

Feature detection

Since browsers may support each portion of the Web Speech API separately (as is the case with Chromium), you may want to feature-detect each capability on its own:

if ('speechSynthesis' in window) {
 // Synthesis support. Make your web apps talk!
}

if ('SpeechRecognition' in window) {
  // Speech recognition support. Talk to your apps!
}
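
Keep in mind that Chrome's recognition implementation is still prefixed as webkitSpeechRecognition (as the next article's demo shows), so a more forgiving check might look like this; the local SpeechRecognition alias is just a convenience:

var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (SpeechRecognition) {
  // Speech recognition support, prefixed or unprefixed. Talk to your apps!
  var recognition = new SpeechRecognition();
}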

What's the CSS :scope pseudo-class for?

By Eric Bidelman

:scope is defined in CSS Selectors 4 as:

A pseudo-class which represents any element that is in the contextual reference element set. This is a (potentially empty) explicitly-specified set of elements, such as that specified by querySelector(), or the parent element of a <style scoped> element, which is used to "scope" a selector so that it only matches within a subtree.

An example of using this guy is within a <style scoped> (more info):

<style>
  li {
    color: blue;
  }
</style>

<ul>
  <style scoped>
    li {
      color: red;
    } 
    :scope {
      border: 1px solid red;
    }
  </style>
  <li>abc</li>
  <li>def</li>
  <li>efg</li>
</ul>

<ul>
  <li>hij</li>
  <li>klm</li>
  <li>nop</li>
</ul>

Note: <style scoped> can be enabled in Chrome using the "Enable experimental WebKit features" flag in about:flags.

This colors the li elements in the first ul red and, because of the :scope rule, puts a border around the ul. That's because in the context of this <style scoped>, the ul matches :scope. It's the local context. If we were to add a :scope rule in the outer <style>, it would match the entire document, essentially equivalent to :root.

Contextual elements

You're probably aware of the Element version of querySelector() and querySelectorAll(). Instead of querying the entire document, you can restrict the result set to a contextual element:

<ul>
 <li id="scope"><a>abc</a></li>
 <li>def</li>
 <li><a>efg</a></li>
</ul>
<script>
  document.querySelectorAll('ul a').length; // 2

  var scope = document.querySelector('#scope');
  scope.querySelectorAll('a').length; // 1
</script>

When these are called, the browser returns a NodeList that's filtered to only include the set of nodes that (a) match the selector and (b) are also descendants of the context element. So in the second example, the browser finds all a elements, then filters out the ones not in the scope element. This works, but it can lead to some bizarre behavior if you're not careful. Read on.

When querySelector goes wrong

There's a really important point in the Selectors spec that people often overlook. Even when querySelector[All]() is invoked on an element, selectors still evaluate in the context of the entire document. This means unanticipated things can happen:

scope.querySelectorAll('ul a').length; // 1
scope.querySelectorAll('body ul a').length; // 1

WTF! In the first example, ul is my element, yet I'm still able to use it in the selector and match nodes. In the second, body isn't even a descendant of my element, but "body ul a" still matches. Both of these are confusing and not what you'd expect.

It's worth making the comparison to jQuery here, which takes the right approach and does what you'd expect:

$(scope).find('ul a').length // 0
$(scope).find('body ul a').length // 0

...enter :scope to solve these semantic shenanigans.

Fixing querySelector with :scope

WebKit recently landed support for using the :scope pseudo-class in querySelector[All](). You can test it in Chrome Canary 27.

You can use it to restrict selectors to a context element. Let's see an example. In the following, :scope is used to "scope" the selector to the scope element's subtree. That's right, I said scope three times!

scope.querySelectorAll(':scope ul a').length; // 0
scope.querySelectorAll(':scope body ul a').length; // 0
scope.querySelectorAll(':scope a').length; // 1

Using :scope makes the semantics of the querySelector() methods a little more predictable and in line with what libraries like jQuery are already doing.
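
Since support is still rolling out, you may want to feature-detect :scope before relying on it. Unsupported pseudo-classes make querySelector() throw a SyntaxError, so a rough test could be:

function supportsScopeSelector() {
  try {
    // Throws a SyntaxError in browsers that don't recognize :scope.
    document.querySelector(':scope body');
    return true;
  } catch (e) {
    return false;
  }
}

console.log('supports :scope?', supportsScopeSelector());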

Performance win?

Not yet :(

I was curious if using :scope in qS/qSA gives a performance boost. So...like a good engineer I threw together a test. My rationale: less surface area for the browser to do selector matching means speedier lookups.

In my experiment, WebKit currently takes ~1.5-2x longer than not using :scope. Drats! When crbug.com/222028 gets fixed, using it should theoretically give you a slight performance boost over not using it.

Voice Driven Web Apps: Introduction to the Web Speech API

By Glen Shires

The new JavaScript Web Speech API makes it easy to add speech recognition to your web pages. This API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later. Here's an example with the recognized text appearing almost immediately while speaking.

DEMO / SOURCE

Let’s take a look under the hood. First we check to see if the browser supports the Web Speech API by checking if the webkitSpeechRecognition object exists. If not, we suggest the user upgrade their browser. (Since the API is still experimental, it's currently vendor prefixed.) Lastly, we create the webkitSpeechRecognition object which provides the speech interface, and set some of its attributes and event handlers.

if (!('webkitSpeechRecognition' in window)) {
  upgrade();
} else {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;

  recognition.onstart = function() { ... }
  recognition.onresult = function(event) { ... }
  recognition.onerror = function(event) { ... }
  recognition.onend = function() { ... }
  ...

The default value for continuous is false, meaning that when the user stops talking, speech recognition will end. This mode is great for simple text like short input fields. In this demo, we set it to true, so that recognition will continue even if the user pauses while speaking.

The default value for interimResults is false, meaning that the only results returned by the recognizer are final and will not change. The demo sets it to true so we get early, interim results that may change. Watch the demo carefully: the grey text is interim and does sometimes change, whereas the black text consists of responses from the recognizer that are marked final and will not change.

To get started, the user clicks on the microphone button, which triggers this code:

function startButton(event) {
  ...
  final_transcript = '';
  recognition.lang = select_dialect.value;
  recognition.start();

We set the spoken language for the speech recognizer "lang" to the BCP-47 value that the user has selected via the selection drop-down list, for example “en-US” for English-United States. If this is not set, it defaults to the lang of the HTML document root element and hierarchy. Chrome speech recognition supports numerous languages (see the “langs” table in the demo source), as well as some right-to-left languages that are not included in this demo, such as he-IL and ar-EG.

After setting the language, we call recognition.start() to activate the speech recognizer. Once it begins capturing audio, it calls the onstart event handler, and then for each new set of results, it calls the onresult event handler.

  recognition.onresult = function(event) {
    var interim_transcript = '';

    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        final_transcript += event.results[i][0].transcript;
      } else {
        interim_transcript += event.results[i][0].transcript;
      }
    }
    final_transcript = capitalize(final_transcript);
    final_span.innerHTML = linebreak(final_transcript);
    interim_span.innerHTML = linebreak(interim_transcript);
  };
}

This handler concatenates all the results received so far into two strings: final_transcript and interim_transcript. The resulting strings may include "\n", such as when the user speaks “new paragraph”, so we use the linebreak function to convert these to HTML tags <br> or <p>. Finally it sets these strings as the innerHTML of their corresponding <span> elements: final_span which is styled with black text, and interim_span which is styled with gray text.
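
The linebreak and capitalize helpers live in the demo source; a rough sketch of what they might look like (the regular expressions here are simplified, not copied from the demo):

var two_line = /\n\n/g;
var one_line = /\n/g;

// Turn "\n\n" into a paragraph break and a single "\n" into a line break.
function linebreak(s) {
  return s.replace(two_line, '<p></p>').replace(one_line, '<br>');
}

var first_char = /\S/;

// Upper-case the first non-whitespace character of the transcript.
function capitalize(s) {
  return s.replace(first_char, function(m) { return m.toUpperCase(); });
}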

interim_transcript is a local variable, and is completely rebuilt each time this event is called because it’s possible that all interim results have changed since the last onresult event. We could do the same for final_transcript simply by starting the for loop at 0. However, because final text never changes, we’ve made the code here a bit more efficient by making final_transcript a global, so that this event can start the for loop at event.resultIndex and only append any new final text.

That’s it! The rest of the code is there just to make everything look pretty. It maintains state, shows the user some informative messages, and swaps the GIF image on the microphone button between the static microphone, the mic-slash image, and mic-animate with the pulsating red dot.

The mic-slash image is shown when recognition.start() is called, and then replaced with mic-animate when onstart fires. Typically this happens so quickly that the slash is not noticeable, but the first time speech recognition is used, Chrome needs to ask the user for permission to use the microphone, in which case onstart only fires when and if the user allows permission. Pages hosted on HTTPS do not need to ask repeatedly for permission, whereas HTTP hosted pages do.

So make your web pages come alive by enabling them to listen to your users!

We’d love to hear your feedback...

Content Security Policy 1.0 is officially awesome.

By Mike West

It's official! The W3C has advanced the Content Security Policy 1.0 specification from Working Draft to Candidate Recommendation, and issued a call for implementations. Cross-site scripting attacks are one step closer to being (mostly) a thing of the past.

Chrome Canary and WebKit nightlies now support the unprefixed Content-Security-Policy header, and will be using the prefixed X-WebKit-CSP header to begin experimenting with some new behavior that's being specified as part of Content Security Policy 1.1. Instead of writing:

X-WebKit-CSP: script-src 'self'; object-src 'none'

You'll write:

Content-Security-Policy: script-src 'self'; object-src 'none'

We expect other browser vendors to follow suit within the next few revisions, so it's a great idea to start sending the canonical header today.
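
For example, if you happen to be serving your app with Node.js, a minimal sketch of sending the canonical header (alongside the prefixed one for older WebKit builds) might look like this; the policy string itself is the same one used above:

var http = require('http');

http.createServer(function(req, res) {
  var policy = "script-src 'self'; object-src 'none'";
  // Canonical header for browsers that understand CSP 1.0 unprefixed.
  res.setHeader('Content-Security-Policy', policy);
  // Prefixed header for older WebKit-based builds.
  res.setHeader('X-WebKit-CSP', policy);
  res.end('<h1>Hello, CSP!</h1>');
}).listen(8080);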

Content Securawhat?

Content Security Policy! It helps you reduce the risk of cross-site scripting and other content injection attacks in your applications. It's a huge step forward in terms of the protection you can offer your users, and we highly recommend taking a hard look at implementing it. You can get all the details in the ever so cleverly named "An Introduction to Content Security Policy".

Live Web Audio Input Enabled!

By Chris Wilson

I'm really excited by a new feature that went into yesterday's Chrome Canary build (23.0.1270.0) - the ability to get low-latency access to live audio from a microphone or other audio input on OS X! (This has not yet been enabled on Windows - but don't worry, we're working on it!)

[UPDATE Oct 8, 2012: live audio input is now enabled for Windows, as long as the input and output device are using the same sample rate!]

To enable this, you need to go into chrome://flags/ and enable the "Web Audio Input" item near the bottom, and relaunch the browser; now you're ready to roll! Note: If you're using a microphone, you may need to use headphones for any output in order to avoid feedback. If you are using a different audio source, such as a guitar or external audio feed, or there's no audio output from the demo, this may not be a problem. You can test out live audio input by checking out the spectrum of your input using the live input visualizer.

For those Web Audio coders among you, here's how to request the audio input stream, and get a node to connect to any processing graph you like!

// success callback when requesting audio input stream
function gotStream(stream) {
    window.AudioContext = window.AudioContext || window.webkitAudioContext;
    var audioContext = new AudioContext();

    // Create an AudioNode from the stream.
    var mediaStreamSource = audioContext.createMediaStreamSource( stream );

    // Connect it to the destination to hear yourself (or any other node for processing!)
    mediaStreamSource.connect( audioContext.destination );
}

navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia;
// Pass an error callback too, in case the user denies access or no input is available.
navigator.getUserMedia( {audio:true}, gotStream, function(e) {
    console.log('getUserMedia error:', e);
});

There are many rich possibilities for low-latency audio input, particularly in the musical space. You can see a quick example of how to make use of this in a simple pitch detector I threw together - try plugging in a guitar, or even just whistling into the microphone.

And, as promised, I've added live audio as an input source to the Vocoder I wrote for Google IO - just select "live input" under modulator. You may need to adjust the Modulator Gain and the Synth Level. There's a slight lag due to processing (not due to input latency). Now that I have live audio input, it's time for another round of tweaking!

Finally, you may want to take a look at the collection of my web audio demos - by the time you read this, I may have some more live audio demos up!

Stick your landings! position: sticky lands in WebKit

By Eric Bidelman

position: sticky is a new way to position elements and is conceptually similar to position: fixed. The difference is that an element with position: sticky behaves like position: relative within its parent, until a given offset threshold is met in the viewport.

Use cases

Paraphrasing from Edward O’Connor's original proposal of this feature:

Many web sites have elements that alternate between being in-flow and having position: fixed, depending on the user's scroll position. This is often done for elements in a sidebar that the page author wants to be always visible as the user scrolls, but which slot into a space on the page when scrolled to the top. Good examples are news.google.com (the "Top Stories" sidebar) and yelp.com (search results map).

Introducing sticky positioning

LAUNCH DEMO

By simply adding position: sticky (vendor prefixed), we can tell an element to be position: relative until the user scrolls the item (or its parent) to be 15px from the top:

.sticky {
  position: -webkit-sticky;
  position: -moz-sticky;
  position: -ms-sticky;
  position: -o-sticky;
  top: 15px;
}

At top: 15px, the element becomes fixed.

To illustrate this feature in a practical setting, I've put together a DEMO which sticks blog titles as you scroll.

Old approach: scroll events

Until now, to achieve the sticky effect, sites set up scroll event listeners in JS. We actually use this technique as well on html5rocks tutorials. On screens smaller than 1200px, our table of contents sidebar changes to position: fixed after a certain amount of scrolling.

Here's the (now old) way to have a header that sticks to the top of the viewport when the user scrolls down, and falls back into place when the user scrolls up:

<style>
.sticky {
  position: fixed;
  top: 0;
}
.header {
  width: 100%;
  background: #F6D565;
  padding: 25px 0;
}
</style>

<div class="header"></div>

<script>
var header = document.querySelector('.header');
var origOffsetY = header.offsetTop;

function onScroll(e) {
  window.scrollY >= origOffsetY ? header.classList.add('sticky') :
                                  header.classList.remove('sticky');
}

document.addEventListener('scroll', onScroll);
</script>

Try it: http://jsbin.com/omanut/2/

This is easy enough, but this model quickly breaks down if you want to do this for a bunch of DOM nodes, say, every <h1> title of a blog as the user scrolls.

Why JS is not ideal

In general, scroll handlers are never a good idea. Folks tend to do too much work and wonder why their UI is janky.

Something else to consider is that more and more browsers are implementing hardware accelerated scrolling to improve performance. The problem is that when JS scroll handlers are in play, browsers may fall back into a slower (software) mode. Now we're no longer running on the GPU. Instead, we're back on the CPU. The result? Users perceive more jank when scrolling your page.

Thus, it makes a lot of sense to have such a feature be declarative in CSS.
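
For example, stickying every blog title from the scenario above collapses to a couple of declarations; this is a sketch using the prefixed properties available today, and the .post-title class name is assumed:

.post-title {
  /* Each title sticks to the top of the viewport while its post scrolls past. */
  position: -webkit-sticky;
  position: -moz-sticky;
  position: -ms-sticky;
  position: -o-sticky;
  top: 0;
}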

Support

Unfortunately, there isn't a spec for this one. It was proposed on www-style back in June and just landed in WebKit. That means there's no good documentation to point to. One thing to note, however: according to this bug, if both left and right are specified, left wins. Likewise, if top and bottom are used at the same time, top wins.

Support right now is Chrome 23.0.1247.0+ (current Canary) and WebKit nightly.

Arrived! xhr.send(ArrayBufferViews)

By Eric Bidelman

And here you thought we were done improving XHR!

For a while now XHR2's overloaded send() method has supported sending an ArrayBuffer (a raw byte array).

Chrome 22 (current Canary) deprecates this feature by replacing it with sending ArrayBufferViews instead. JS Typed Arrays are just special ArrayBufferViews, so all this really means is that you can now send a typed array directly across the wire without touching its underlying buffer. This change is in lock step with the recent updates to the XMLHttpRequest2 spec.

So for example, instead of sending an ArrayBuffer:

var xhr = new XMLHttpRequest();
xhr.open('POST', '/server', true);
xhr.onload = function(e) { ... };

var uInt8Array = new Uint8Array([1, 2, 3]);

xhr.send(uInt8Array.buffer);

Just send the typed array itself:

xhr.send(uInt8Array);
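
And since any ArrayBufferView works, you can also send just a slice of a larger buffer without copying any bytes. A small sketch (the offsets here are arbitrary):

var xhr = new XMLHttpRequest();
xhr.open('POST', '/server', true);

var buffer = new ArrayBuffer(1024);
// A view over bytes 100..355 of the underlying buffer; no data is copied.
var slice = new Uint8Array(buffer, 100, 256);

xhr.send(slice);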

Eventually, sending ArrayBuffers will be removed, but for the time being you'll get console warnings when trying to send a buffer.

As always, you can keep up with these types of changes by following chromestatus.com.

Chrome for Android: Accelerating the Mobile Web

By Boris Smus

You've probably already heard that Chrome for Android Beta launched today. This new browser is based on the Chromium open source project, and brings with it many of the latest HTML5 features that Chrome developers have come to know and love. For an overview of the new hotness, see the launch announcement on blog.chromium.org and a more detailed overview on code.google.com. I'll quickly go through the stuff I personally find most interesting:

UI Improvements

Chrome for Android makes it easy for developers to create modern mobile web user interfaces using fixed positioning, and overflow: scroll for individually scrollable elements. In addition, native-like scroll behavior is enabled by default. Chrome for Android supports the old flexbox model, though be aware that the original flexbox model is deprecated in favor of a new one. There's also support for DateTime pickers, and early support for <input type="range">.

Fast graphics

Chrome for Android also supports hardware accelerated canvas, and performs quite well. There's also support for requestAnimationFrame, which is important for mobile, letting the browser decide when to render, giving it a chance to manage battery life more efficiently in GPU intensive applications. Chrome for Android introduces a slew of other notable HTML5 features including File System API, IndexedDB, Web Workers and Web Sockets.

Remote debugging

Hands down, my personal favorite feature of Chrome for Android is remote debugging through the Chrome Developer Tools. Remote debugging makes it very easy for web developers to debug their application as it's running live on their mobile device, without having to resort to clever hacks such as Weinre. Here's a quick screencast showing this feature in action:

For more information about remote debugging, see this remote debugging article.

Try Chrome for Android Beta for yourself by downloading it from the Android Market. If you've written a mobile web app that uses a feature Chrome for Android doesn't yet support, keep in mind that this is a beta release: check whether it's already a known issue, and star it if it is. Otherwise, please log a bug via new.mcrbug.com.

I'm stoked about the positive impact Chrome for Android will make on the mobile web developer community, and looking forward to seeing the great things we can build together! If you have additional questions, see if they are already answered in this FAQ. Otherwise, if you have a Chrome-specific mobile web development question, please post it on Stack Overflow, tagged with the google-chrome and android tags.

Pointer Lock API Brings FPS Games to the Browser

By Ilmari Heikkinen

The Pointer Lock API recently landed in Chrome Canary and the Dev channel, all rejoice! Wait, what? You haven't heard of the Pointer Lock API? Well, in a nutshell, the Pointer Lock API makes it possible to write proper first-person shooters for the web.

The Chrome implementation lets a full-screen webpage ask your permission to capture the mouse pointer so that you can't move it outside the page. This lets web developers write 3D games and applications without having to worry about the mouse cursor moving outside of the page. When the pointer is locked, the pointer move events have movementX and movementY attributes defined that tell how much the mouse moved since the last move event. As usual with bleeding edge APIs, these attributes are vendor-prefixed, so you need to use webkitMovementX and suchlike.
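
In a first-person shooter, you'd typically accumulate those deltas into camera angles on every move event. Here's a minimal sketch (the sensitivity value and the yaw/pitch names are illustrative, not from any particular engine):

var yaw = 0, pitch = 0;
var sensitivity = 0.002; // radians per pixel of mouse movement (arbitrary)

document.addEventListener('mousemove', function(e) {
  // In a real game you'd first check that the pointer is actually locked.
  var movementX = e.movementX || e.webkitMovementX || 0;
  var movementY = e.movementY || e.webkitMovementY || 0;
  yaw   -= movementX * sensitivity;
  pitch -= movementY * sensitivity;
  // Clamp pitch so the camera can't flip over.
  pitch = Math.max(-Math.PI / 2, Math.min(Math.PI / 2, pitch));
}, false);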

To enable the Pointer Lock API in current Chrome builds, the easiest way is to go to about:flags and turn on the "Enable Pointer Lock"-flag. You can also turn it on by starting Chrome using the --enable-pointer-lock command line flag.

There are already a couple of cool demos out taking advantage of this feature. Check out the Quake 3 WebGL demo by Brandon Jones to see how the Pointer Lock API makes WebGL FPS games a viable prospect. Another cool demo is the WebGL Street Viewer.

To get started with the Pointer Lock API, here's a small snippet cribbed from MDN:

<button onclick="document.body.webkitRequestFullScreen();">No, you lock it up!</button>
<script>
navigator.pointer = navigator.pointer || navigator.webkitPointer;

var onError = function() {
  console.log("Mouse lock was not successful.");
};

document.addEventListener('webkitfullscreenchange', function(e) {
  if (document.webkitIsFullScreen) {
    navigator.pointer.lock(document.body, function() {
      // Locked and ready to play.
    }, onError);
  }
}, false);

document.body.addEventListener('webkitpointerlocklost', function(e) {
  console.log('Pointer lock lost!');
}, false);

document.body.addEventListener('mousemove', function(e) {
  if (navigator.pointer.isLocked) { // got a locked pointer
    var movementX = e.movementX || e.webkitMovementX;
    var movementY = e.movementY || e.webkitMovementY;
  }
}, false);
</script>

You can see a fuller example at html5-demos.com. For more information, have a look at:

HTML5 <audio> and the Web Audio API are BFFs!

By Eric Bidelman

DEMO

As part of the MediaStream Integration with WebRTC, the Web Audio API recently landed an undercover gem known as createMediaElementSource(). Basically, it allows you to hook up an HTML5 <audio> element as the input source to the API. In layman's terms...you can visualize HTML5 audio, do realtime sound mutations, filtering, etc!

Normally, the Web Audio API works by loading a song via XHR2, file input, whatever...and you're off. Instead, this hook allows you to combine HTML5 <audio> with the visualization, filter, and processing power of the Web Audio API.

Integrating with <audio> is ideal for streaming fairly long audio assets. Say your file is 2 hours long. You don't want to decode that entire thing! It's also interesting if you want to build a high-level media player API (and UI) for play/pause/seek, but wish to apply some additional processing/analysis.

Here's what it looks like:

// Create an <audio> element dynamically.
var audio = new Audio();
audio.src = 'myfile.mp3';
audio.controls = true;
audio.autoplay = true;
document.body.appendChild(audio);

var context = new webkitAudioContext();
var analyser = context.createAnalyser();

// Wait for window.onload to fire. See crbug.com/112368
window.addEventListener('load', function(e) {
  // Our <audio> element will be the audio source.
  var source = context.createMediaElementSource(audio);
  source.connect(analyser);
  analyser.connect(context.destination);

  // ...call requestAnimationFrame() and render the analyser's output to canvas.
}, false);

As noted in the code, there's a bug that requires the source setup to happen after window.onload.
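
To flesh out the requestAnimationFrame() comment in the snippet above, here's a minimal sketch of a render loop that draws the analyser output as bars; it assumes a <canvas id="canvas"> element is on the page, and the bar styling is arbitrary:

var rAF = (window.requestAnimationFrame || window.webkitRequestAnimationFrame).bind(window);
var canvas = document.getElementById('canvas');
var drawContext = canvas.getContext('2d');
var freqData = new Uint8Array(analyser.frequencyBinCount);

function render() {
  rAF(render);
  analyser.getByteFrequencyData(freqData); // snapshot of the current spectrum

  drawContext.clearRect(0, 0, canvas.width, canvas.height);
  var barWidth = canvas.width / freqData.length;
  for (var i = 0; i < freqData.length; i++) {
    var barHeight = (freqData[i] / 255) * canvas.height;
    drawContext.fillRect(i * barWidth, canvas.height - barHeight, barWidth, barHeight);
  }
}
render();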

The next logical step is to fix crbug.com/112367. Once that puppy is ready, you'll be able to wire up WebRTC (the navigator.getUserMedia() API in particular) to pipe audio input (e.g. mic, mixer, guitar) to an <audio> tag, then visualize it using the Web Audio API. Mega boom!