HTML5 Rocks

Web Audio live audio input - now on Android!

By Chris Wilson

One of the consistent questions I keep fielding over on Stack Overflow is “why doesn’t audio input work?” – to which the answer kept turning out to be “because you’re testing on Android, and we don’t have it hooked up yet.”

Well, I’m happy to announce that the Beta version of Chrome for Android (v31.0.1650+) has support for audio input in Web Audio! Check out my Audio Recorder demo on an Android device with Chrome Beta. We’re still testing it out with all devices; I’ve personally tested on a Nexus 4, a Galaxy S4 and a Nexus 7, but if you run into issues on other devices, please file them.

When I saw the support get checked in, I flipped back through some of the audio input demos I’ve done in the past to find a good one to show it off. I quickly found that my Audio Recorder demo worked fine, but it wasn’t really designed for a good user experience on mobile devices.

So, I quickly used the skills I’m teaching in the upcoming Mobile Web Development course to whip it into shape – viewport, media queries, and flexbox to the rescue! Be sure to preregister for the course if you’re interested in taking your web development skills to the mobile world, too!

Voice Driven Web Apps: Introduction to the Web Speech API

By Glen Shires

The new JavaScript Web Speech API makes it easy to add speech recognition to your web pages. This API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later. Here's an example with the recognized text appearing almost immediately while speaking.

DEMO / SOURCE

Let’s take a look under the hood. First we check whether the browser supports the Web Speech API by testing for the webkitSpeechRecognition object. If it doesn’t exist, we suggest that the user upgrade their browser. (Since the API is still experimental, it's currently vendor-prefixed.) Otherwise, we create the webkitSpeechRecognition object, which provides the speech interface, and set some of its attributes and event handlers.

if (!('webkitSpeechRecognition' in window)) {
  upgrade();
} else {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;

  recognition.onstart = function() { ... }
  recognition.onresult = function(event) { ... }
  recognition.onerror = function(event) { ... }
  recognition.onend = function() { ... }
  ...

The default value for continuous is false, meaning that speech recognition ends as soon as the user stops talking. That mode is great for simple input such as short text fields. In this demo, we set it to true so that recognition continues even if the user pauses while speaking.

The default value for interimResults is false, meaning that the only results the recognizer returns are final and will not change. The demo sets it to true so we get early, interim results that may change. Watch the demo carefully: the grey text is interim and does sometimes change, whereas the black text consists of responses the recognizer has marked final, which will not change.

To get started, the user clicks on the microphone button, which triggers this code:

function startButton(event) {
  ...
  final_transcript = '';
  recognition.lang = select_dialect.value;
  recognition.start();

We set the speech recognizer's language (lang) to the BCP-47 value that the user has selected via the drop-down list, for example “en-US” for English (United States). If this is not set, it defaults to the lang of the HTML document's root element and hierarchy. Chrome speech recognition supports numerous languages (see the “langs” table in the demo source), as well as some right-to-left languages that are not included in this demo, such as he-IL and ar-EG.
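As a rough illustration of how that drop-down could be built (this is a sketch, not the demo's actual code; its real "langs" table carries many more entries and regional variants):

// Illustrative only: each row is [display name, BCP-47 code].
var langs = [
  ['English (United States)', 'en-US'],
  ['English (United Kingdom)', 'en-GB'],
  ['Deutsch', 'de-DE'],
  ['日本語', 'ja-JP']
];

// select_dialect is the <select> element referenced above.
for (var i = 0; i < langs.length; i++) {
  select_dialect.options.add(new Option(langs[i][0], langs[i][1]));
}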

After setting the language, we call recognition.start() to activate the speech recognizer. Once it begins capturing audio, it calls the onstart event handler, and then for each new set of results, it calls the onresult event handler.

  recognition.onresult = function(event) {
    var interim_transcript = '';

    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        final_transcript += event.results[i][0].transcript;
      } else {
        interim_transcript += event.results[i][0].transcript;
      }
    }
    final_transcript = capitalize(final_transcript);
    final_span.innerHTML = linebreak(final_transcript);
    interim_span.innerHTML = linebreak(interim_transcript);
  };
}

This handler concatenates all the results received so far into two strings: final_transcript and interim_transcript. The resulting strings may include "\n", such as when the user speaks “new paragraph”, so we use the linebreak function to convert these to the HTML tags <br> or <p>. Finally, it sets these strings as the innerHTML of their corresponding <span> elements: final_span, which is styled with black text, and interim_span, which is styled with gray text.
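For reference, here is one minimal way those two helpers could be written (a sketch; the demo's own implementations are in its source):

// Turn double newlines into paragraph breaks and single newlines into <br>.
var two_line = /\n\n/g;
var one_line = /\n/g;
function linebreak(s) {
  return s.replace(two_line, '<p></p>').replace(one_line, '<br>');
}

// Capitalize the first non-whitespace character of the transcript.
var first_char = /\S/;
function capitalize(s) {
  return s.replace(first_char, function(m) { return m.toUpperCase(); });
}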

interim_transcript is a local variable, and is completely rebuilt each time this event is called because it’s possible that all interim results have changed since the last onresult event. We could do the same for final_transcript simply by starting the for loop at 0. However, because final text never changes, we’ve made the code here a bit more efficient by making final_transcript a global, so that this event can start the for loop at event.resultIndex and only append any new final text.

That’s it! The rest of the code is there just to make everything look pretty. It maintains state, shows the user some informative messages, and swaps the GIF image on the microphone button between the static microphone, the mic-slash image, and mic-animate with the pulsating red dot.

The mic-slash image is shown when recognition.start() is called, and then replaced with mic-animate when onstart fires. Typically this happens so quickly that the slash is not noticeable, but the first time speech recognition is used, Chrome needs to ask the user for permission to use the microphone, in which case onstart only fires when and if the user allows permission. Pages hosted on HTTPS do not need to ask repeatedly for permission, whereas HTTP hosted pages do.
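A sketch of how that image swap might be wired up (the recognizing flag, the start_img element, and the .gif file names are placeholders based on the description above, not the demo's exact code):

var recognizing = false;

recognition.onstart = function() {
  recognizing = true;
  start_img.src = 'mic-animate.gif';   // pulsating red dot: we're listening
};

recognition.onend = function() {
  recognizing = false;
  start_img.src = 'mic.gif';           // back to the static microphone
};

// In startButton, right after recognition.start() is called:
//   start_img.src = 'mic-slash.gif';  // shown until onstart fires, i.e. until permission is granted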

So make your web pages come alive by enabling them to listen to your users!

We’d love to hear your feedback...

Live Web Audio Input Enabled!

By Chris Wilson

I'm really excited by a new feature that went into yesterday's Chrome Canary build (23.0.1270.0) - the ability to get low-latency access to live audio from a microphone or other audio input on OS X! (This has not yet been enabled on Windows - but don't worry, we're working on it!)

[UPDATE Oct 8, 2012: live audio input is now enabled for Windows, as long as the input and output device are using the same sample rate!]

To enable this, you need to go into chrome://flags/ and enable the "Web Audio Input" item near the bottom, and relaunch the browser; now you're ready to roll! Note: If you're using a microphone, you may need to use headphones for any output in order to avoid feedback. If you are using a different audio source, such as a guitar or external audio feed, or there's no audio output from the demo, this may not be a problem. You can test out live audio input by checking out the spectrum of your input using the live input visualizer.

For those Web Audio coders among you, here's how to request the audio input stream, and get a node to connect to any processing graph you like!

// success callback when requesting audio input stream
function gotStream(stream) {
    window.AudioContext = window.AudioContext || window.webkitAudioContext;
    var audioContext = new AudioContext();

    // Create an AudioNode from the stream.
    var mediaStreamSource = audioContext.createMediaStreamSource( stream );

    // Connect it to the destination to hear yourself (or any other node for processing!)
    mediaStreamSource.connect( audioContext.destination );
}

navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia;
// The third argument is an error callback, in case the user denies access or no input is available.
navigator.getUserMedia( {audio:true}, gotStream, function(e) {
    console.log('getUserMedia error: ', e);
});

There are many rich possibilities for low-latency audio input, particularly in the musical space. You can see a quick example of how to make use of this in a simple pitch detector I threw together - try plugging in a guitar, or even just whistling into the microphone.
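If you want to poke at the audio yourself, a minimal starting point (assuming the audioContext and mediaStreamSource from the snippet above) is to route the input through an AnalyserNode and read time-domain samples; the real pitch detector does considerably more than this sketch:

// Route the live input into an AnalyserNode (in addition to, or instead of, the destination).
var analyser = audioContext.createAnalyser();
analyser.fftSize = 2048;
mediaStreamSource.connect( analyser );

var buf = new Uint8Array( analyser.fftSize );
function update() {
    // Raw waveform samples, 0-255 centered on 128.
    analyser.getByteTimeDomainData( buf );
    // ...run a pitch estimate (e.g. autocorrelation) on buf here...
    window.requestAnimationFrame( update );
}
update();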

And, as promised, I've added live audio as an input source to the Vocoder I wrote for Google IO - just select "live input" under modulator. You may need to adjust the Modulator Gain and the Synth Level. There's a slight lag due to processing (not due to input latency). Now that I have live audio input, it's time for another round of tweaking!

Finally, you may want to take a look at the collection of my web audio demos - by the time you read this, I may have some more live audio demos up!