Exploring Text-to-Speech with the Web Speech API and a CodePen Example

Text-to-speech (TTS) is a really helpful technology that reads written text out loud. It's great for folks who have difficulty reading or seeing what's on a screen, and it's also used to give spoken feedback or assistance. In this article, we'll take a look at how to use TTS with JavaScript in the browser.

Check if your browser supports all Text-to-Speech features

Here is a Web Tool that lets you enter your own text and have it read aloud.

Here are the things you can do with this tool:

Check if your current browser supports the Web Speech API's Text-to-Speech Component. If it's not supported, the tool will display a message.
Enter any text that you want to be spoken.
Select a voice from the available voices.
Adjust the speech speed and pitch as per your preference.
Start the speech, then pause, resume, or cancel it.

You can find the source code of this tool on GitHub or edit it on CodePen.

Here is a simple version of this code. You can also open it on CodePen.

<textarea id="txt" rows="4" style="width: 98%;">Write your text here, and click the "Speak" button.</textarea>
<button onclick="speak()">Speak</button>

<script>
  function speak() {
    const utter = new SpeechSynthesisUtterance();
    utter.rate = 1.0; // speed: 0.1-10
    utter.volume = 1.0; // 0-1
    utter.pitch = 0.5; // 0 (low) - 2 (high)
    utter.voice = speechSynthesis.getVoices()[0];
    utter.text = document.querySelector('#txt').value;
    speechSynthesis.speak(utter);
  }
</script>

Take a look at the post if you want to find out how to check if Text-to-Speech is supported in JavaScript or get a list of browsers that support it.
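
As a rough sketch of what such a support check can look like, the function below tests for the two globals used throughout this article. The function name and the scope parameter are made up for illustration, and the linked post may use a different approach.

```javascript
// Returns true when the given global object exposes the speech synthesis API.
// `scope` is a hypothetical parameter so the check can also run outside a browser.
function isTextToSpeechSupported(scope = globalThis) {
  return 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;
}

if (isTextToSpeechSupported()) {
  console.log('Text-to-Speech is supported.');
} else {
  console.log('Text-to-Speech is not supported in this environment.');
}
```

Passing the scope in explicitly keeps the check easy to try with a stand-in object, which is handy when the code runs somewhere without a window.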

How to use Text-to-Speech

The simplest way to use Text-to-Speech in the browser with JavaScript is to call the speechSynthesis.speak() method with a SpeechSynthesisUtterance instance that holds the text to be spoken. We can also set options to specify the speed, voice, pitch, and other features.

Here is the simple JavaScript code:

function speak() {
  const utter = new SpeechSynthesisUtterance();
  utter.text = 'This text will be spoken.';
  speechSynthesis.speak(utter);
}

What is SpeechSynthesis?

SpeechSynthesis reads text aloud in the browser. It is one of the two components of the Web Speech API: SpeechSynthesis (Text-to-Speech) and SpeechRecognition (Asynchronous Speech Recognition).

We can access the SpeechSynthesis instance with window.speechSynthesis or just speechSynthesis.

How to make text spoken using SpeechSynthesis

We can use the speak() method of the SpeechSynthesis instance to make text spoken. We pass a SpeechSynthesisUtterance instance as an argument to the speak() method. The SpeechSynthesisUtterance instance has a text property that contains the text we want to be spoken. We can also pass the text as a constructor argument of SpeechSynthesisUtterance.

Here is an example:

speechSynthesis.speak(
  new SpeechSynthesisUtterance('This text will be spoken.')
);

How to change the voice

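Here is an example of how to change the voice:

const utter = new SpeechSynthesisUtterance('This text will be spoken.');
const voices = speechSynthesis.getVoices(); // can be empty until the browser has loaded the voices
utter.voice = voices[0];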

Check out the article to learn more about changing the voice and solving the empty array issue of getVoices().
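
One common workaround for the empty array is to wait for the voiceschanged event before reading the list. The sketch below assumes that approach, which may differ from the one in the linked article. The helper name whenVoicesReady is hypothetical, and some older browsers may not fire the event at all.

```javascript
// Calls `callback` with the voice list once at least one voice is available.
// `synth` is expected to behave like window.speechSynthesis.
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices);
    return;
  }
  // Some browsers load voices asynchronously and fire "voiceschanged" when ready.
  synth.addEventListener(
    'voiceschanged',
    () => callback(synth.getVoices()),
    { once: true }
  );
}

// Browser-only usage, guarded so the snippet does nothing elsewhere.
if (typeof speechSynthesis !== 'undefined') {
  whenVoicesReady(speechSynthesis, (voices) => {
    console.log(voices.map((voice) => `${voice.name} (${voice.lang})`));
  });
}
```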

How to pause and resume

We can use the pause() and resume() methods of the SpeechSynthesis to pause or resume a Text-to-Speech (TTS) process in JavaScript.

The pause() method pauses the ongoing speech and keeps track of the position it reached. When we call the resume() method, the speech continues from where it left off.

Here is an example of how to use the pause() and resume() methods on button click:

pauseButton.onclick = () => {
  speechSynthesis.pause();
};

resumeButton.onclick = () => {
  speechSynthesis.resume();
};

Note that not all browsers support the pause() method, and some browsers might have different behavior for the pause() method.
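
If a single button should handle both actions, one possible sketch is to look at the paused and speaking properties of SpeechSynthesis and decide what to do. The togglePause name and its return values are made up for this example.

```javascript
// Pauses ongoing speech or resumes paused speech.
// Returns the action taken: 'paused', 'resumed', or 'idle' when nothing is being spoken.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle';
}

// Possible browser usage (toggleButton is a hypothetical element):
// toggleButton.onclick = () => console.log(togglePause(speechSynthesis));
```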

Stop or cancel ongoing speech

To halt any ongoing speech, we can use the cancel() method of the SpeechSynthesis object. This method removes all utterances from the queue and, if one is currently being spoken, stops it immediately.

cancelButton.onclick = () => {
  speechSynthesis.cancel();
};
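
Because speak() adds each utterance to a queue, a new text normally waits until the earlier ones have finished. A minimal sketch of replacing whatever is playing instead, using a hypothetical helper named speakReplacing, could look like this. Timing right after cancel() may vary between browsers.

```javascript
// Drops anything queued or currently spoken, then speaks the new utterance.
// `synth` is expected to behave like window.speechSynthesis.
function speakReplacing(synth, utter) {
  synth.cancel(); // clear the queue and stop the current speech
  synth.speak(utter); // the new utterance is now the only one in the queue
}

// Possible browser usage:
// speakReplacing(speechSynthesis, new SpeechSynthesisUtterance('Start over.'));
```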

Change the speed of speech

We can use the rate property of the SpeechSynthesisUtterance object to modify the speed at which text is spoken. This property accepts values between 0.1 and 10, with higher values indicating faster speaking speeds. The default rate is 1.0, which means that if no value is specified for rate, the text will be spoken at a normal pace.

In the following example, the speech rate is set to twice the normal speed.

function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.rate = 2.0;
  speechSynthesis.speak(utter);
}

Reading text with different volume levels

You can adjust the volume of speech using the volume property of SpeechSynthesisUtterance. This property accepts values between 0.0 and 1.0, with 0.0 being the quietest and 1.0 being the loudest. If the volume property is not specified, the default value of 1.0 will be used.

In the following example, the volume of the speech is set to half the default value:

function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.volume = 0.5;
  speechSynthesis.speak(utter);
}

How to set the language

We can specify the language for the text to be spoken using the lang property of SpeechSynthesisUtterance. This property accepts BCP 47 language tags. If the lang property is not set, the HTML document or user-agent language will be used.

function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.lang = 'en-US';
  speechSynthesis.speak(utter);
}
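
Setting lang works best when a matching voice is installed. As a sketch, the hypothetical helper below filters a voice list by language tag, preferring an exact match and falling back to the primary language. Replacing an underscore with a hyphen is a defensive assumption for platforms that report tags such as en_US.

```javascript
// Returns the voices matching `lang`: exact tag matches first,
// otherwise voices that share the primary language (e.g. "en").
function voicesForLanguage(voices, lang) {
  const normalize = (tag) => tag.replace('_', '-').toLowerCase();
  const wanted = normalize(lang);
  const primary = wanted.split('-')[0];

  const exact = voices.filter((voice) => normalize(voice.lang) === wanted);
  if (exact.length > 0) {
    return exact;
  }
  return voices.filter((voice) => normalize(voice.lang).split('-')[0] === primary);
}

// Possible browser usage:
// const [voice] = voicesForLanguage(speechSynthesis.getVoices(), 'en-US');
// if (voice) utter.voice = voice;
```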

Using high or low pitch

Using the pitch property of SpeechSynthesisUtterance we can set the pitch of the Text-to-Speech. This property accepts values between 0 (low) and 2 (high), with higher values resulting in a higher-pitched voice. If the pitch property is not set, the default pitch of the platform or voice will be used.

In the following example, the text is spoken with a lower pitch than the default pitch of the platform:

function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.pitch = 0.5;
  speechSynthesis.speak(utter);
}
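
When rate, volume, and pitch come from user input such as sliders, it can help to keep them inside the ranges described above before assigning them. The sketch below does that with two hypothetical helpers, clampNumber and applySpeechOptions. Options that are left out are not touched, so the browser defaults still apply.

```javascript
// Keeps `value` inside the inclusive range [min, max].
function clampNumber(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Copies the given options onto the utterance, clamped to the documented ranges.
function applySpeechOptions(utter, options = {}) {
  if (options.rate !== undefined) {
    utter.rate = clampNumber(options.rate, 0.1, 10);
  }
  if (options.volume !== undefined) {
    utter.volume = clampNumber(options.volume, 0, 1);
  }
  if (options.pitch !== undefined) {
    utter.pitch = clampNumber(options.pitch, 0, 2);
  }
  return utter;
}

// Possible browser usage:
// const utter = new SpeechSynthesisUtterance('This text will be spoken.');
// speechSynthesis.speak(applySpeechOptions(utter, { rate: 1.5, pitch: 0.8 }));
```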

How to know if the given text has started or finished being spoken

The start event of SpeechSynthesisUtterance gets fired when it begins to speak the given text. We can use the onstart event handler property as well.

Similar to the start event, the end event of SpeechSynthesisUtterance gets fired when it finishes speaking the given text. Alternatively, we can use the onend event handler property.

const utter = new SpeechSynthesisUtterance('This text will be spoken.');

utter.addEventListener("start", (event) => {
  console.log('Start', event.utterance.text);
});

utter.addEventListener("end", (event) => {
  console.log('End', event.utterance.text);
});

We can also use onstart and onend event handlers:

utter.onstart = (event) => {
  console.log('Start', event.utterance.text);
};

utter.onend = (event) => {
  console.log('End', event.utterance.text);
};
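
These handlers can drive a simple status display. The sketch below also listens for the error event, which is part of SpeechSynthesisUtterance but not covered above. The trackStatus name and the status strings are made up for this example.

```javascript
// Reports status changes of an utterance through `onChange`.
// Note: this overwrites any onstart, onend, and onerror handlers already set.
function trackStatus(utter, onChange) {
  utter.onstart = () => onChange('speaking');
  utter.onend = () => onChange('finished');
  utter.onerror = (event) => onChange(`error: ${event.error}`);
  return utter;
}

// Possible browser usage:
// const utter = new SpeechSynthesisUtterance('This text will be spoken.');
// speechSynthesis.speak(trackStatus(utter, (status) => console.log(status)));
```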

Select the word being read aloud

We can select the word being read with the help of the charIndex property of the boundary event of SpeechSynthesisUtterance, which fires when a word is about to be spoken. This event is an instance of SpeechSynthesisEvent and has a property utterance where we can get the current text and a charIndex property which is the index position of the word. With the help of this index position, we can easily select the word.

Here is a simple example:

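const textarea = document.querySelector('#txt');
const utter = new SpeechSynthesisUtterance(textarea.value);
utter.onboundary = (event) => {
  const text = event.utterance.text;
  // The last word has no trailing space, so fall back to the end of the text.
  const spaceIndex = text.indexOf(' ', event.charIndex);
  const end = spaceIndex === -1 ? text.length : spaceIndex;
  textarea.focus();
  textarea.setSelectionRange(event.charIndex, end);
};
speechSynthesis.speak(utter);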
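
The end of the word can also be computed in a small helper that is easier to try in isolation. This sketch uses the charLength property of the event when the browser provides it and otherwise scans to the next whitespace character. The wordRangeAt name is hypothetical, and support for charLength varies between browsers.

```javascript
// Returns [start, end] for the word beginning at `charIndex`.
// Uses `charLength` when it is a positive integer, otherwise scans to the next whitespace.
function wordRangeAt(text, charIndex, charLength) {
  if (Number.isInteger(charLength) && charLength > 0) {
    return [charIndex, Math.min(text.length, charIndex + charLength)];
  }
  const offset = text.slice(charIndex).search(/\s/);
  const end = offset === -1 ? text.length : charIndex + offset;
  return [charIndex, end];
}

// Possible browser usage inside a boundary handler:
// const [start, end] = wordRangeAt(event.utterance.text, event.charIndex, event.charLength);
// textarea.setSelectionRange(start, end);
```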

Using Speech Synthesis Markup Language (SSML)

While the documentation says that SpeechSynthesisUtterance.text supports SSML, it can be challenging to find a browser that fully supports it. Speech Synthesis Markup Language (SSML) is an XML-based markup language that allows for more customization such as emphasis, multiple voices, inserting audio, pausing, inserting marks, and more.

In the following example, the mark event fires (and the onmark handler runs) when the speech reaches the <mark name="mark1"/> SSML tag, provided the browser supports SSML:

const ssml = `<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                  http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">
  Take a <emphasis level="strong"> deep </emphasis> breath <break/>
  then continue. <mark name="mark1"/>
  Press 1 or wait for the tone. <break time="5s"/>
  I didn't hear you! <break strength="weak"/> Please repeat.
</speak>`;

const utter = new SpeechSynthesisUtterance();
utter.text = ssml;
utter.onmark = (event) => {
  console.log(`A mark was reached: ${event.name}`);
};
speechSynthesis.speak(utter);
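
If a browser reads the markup literally instead of interpreting it, one possible fallback is to strip the tags and speak the remaining plain text. The helper below is a rough sketch with a made-up name. It relies on a simple regular expression rather than a real XML parser, so it does not decode entities such as &amp; and would break on attribute values that contain a > character.

```javascript
// Removes XML/SSML tags and collapses the leftover whitespace.
// This is a plain-text fallback, not an SSML interpreter.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<[^>]+>/g, ' ') // drop the XML declaration and every tag
    .replace(/\s+/g, ' ') // collapse line breaks and repeated spaces
    .trim();
}

// Possible browser usage:
// speechSynthesis.speak(new SpeechSynthesisUtterance(ssmlToPlainText(ssml)));
```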

CodePen examples

Here is the Text-to-Speech code that you can edit and immediately check on CodePen. By default, you can enter text, select a voice, change the speed, and start the read-out. Additionally, you can pause, resume, and cancel the speech.

It is also available on GitHub.

See the Pen Text-To-Speech by Altynbek Usenbekov (@usenbekov) on CodePen.

Basic example

Here is a basic version of the Text-to-Speech code on CodePen. By default, you can enter the text and start the speech with the first available voice.

See the Pen Text-To-Speech: Simple example by Altynbek Usenbekov (@usenbekov) on CodePen.
