Text-to-speech (TTS) is a technology that reads written text aloud. It helps people who have difficulty reading or seeing what is on a screen, and it is also used to give spoken feedback or assistance. In this article, we'll look at how to use TTS with JavaScript in the browser.
Check if your browser supports all Text-to-Speech features
Here is a Web Tool that lets you enter your own text and have it read aloud.
Here is what you can do with this tool:
- Check whether your current browser supports the Text-to-Speech component of the Web Speech API. If it is not supported, the tool displays a message.
- Enter any text that you want to be spoken.
- Select a voice from the available voices.
- Adjust the speech speed and pitch to your preference.
- Start the speech, and pause, resume, or cancel it.
You can find the source code of this tool on GitHub or edit it on CodePen.
Here is a simple version of this code. You can also open it on CodePen.
<textarea id="txt" rows="4" style="width: 98%;">Write your text here, and click the "Speak" button.</textarea>
<button onclick="speak()">Speak</button>
<script>
  function speak() {
    const utter = new SpeechSynthesisUtterance();
    utter.rate = 1.0; // speed: 0.1-10
    utter.volume = 1.0; // 0-1
    utter.pitch = 0.5; // 0 (low) - 2 (high)
    // getVoices() may return an empty array before the voices have loaded,
    // which leaves the voice unset
    utter.voice = speechSynthesis.getVoices()[0];
    utter.text = document.querySelector('#txt').value;
    speechSynthesis.speak(utter);
  }
</script>
Take a look at the post if you want to find out how to check if Text-to-Speech is supported in JavaScript or get a list of browsers that support it.
How to use Text-to-Speech
The simplest way to use Text-to-Speech in the browser with JavaScript is to call the speechSynthesis.speak() method with a SpeechSynthesisUtterance instance that holds the text to be spoken. We can also set options to control the speed, voice, pitch, and other features.
Here is the simple JavaScript code:
function speak() {
  const utter = new SpeechSynthesisUtterance();
  utter.text = 'This text will be spoken.';
  speechSynthesis.speak(utter);
}
What is SpeechSynthesis?
SpeechSynthesis is used to read text aloud in the browser. It is one of the two components of the Web Speech API: SpeechSynthesis (Text-to-Speech) and SpeechRecognition (Asynchronous Speech Recognition).
We can access the SpeechSynthesis instance with window.speechSynthesis or just speechSynthesis.
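In a browser without Text-to-Speech support, speechSynthesis is simply not defined, so it can help to look it up defensively before building any UI. The sketch below is not from the original post: the helper name getSpeechSynthesis is hypothetical, and its optional scope parameter only exists so the check can be tried outside a browser.

```javascript
// Hypothetical helper: returns the SpeechSynthesis instance, or null when the
// Text-to-Speech part of the Web Speech API is not available.
function getSpeechSynthesis(scope = globalThis) {
  // Both the controller and the utterance constructor are needed to speak.
  if (scope && 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope) {
    return scope.speechSynthesis;
  }
  return null;
}
```

In a supporting browser, getSpeechSynthesis() returns window.speechSynthesis; when it returns null, the page can show a message instead of the Speak button.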
How to make text spoken using SpeechSynthesis
To make text spoken, we call the speak() method of the SpeechSynthesis instance and pass it a SpeechSynthesisUtterance instance as an argument. The text property of the SpeechSynthesisUtterance instance contains the text that we want to be spoken. We can also pass the text directly to the SpeechSynthesisUtterance constructor.
Here is an example:
speechSynthesis.speak(
  new SpeechSynthesisUtterance('This text will be spoken.')
);
How to change the voice
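Here is an example of how to change the voice:
const utter = new SpeechSynthesisUtterance('This text will be spoken.');
// getVoices() can return an empty array until the voice list has loaded
const voices = speechSynthesis.getVoices();
utter.voice = voices[0];
speechSynthesis.speak(utter);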
Check out the article to learn more about changing the voice and solving the empty array issue of getVoices().
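In some browsers, getVoices() returns an empty array until the voice list has loaded, and the voiceschanged event of speechSynthesis fires once it is available. A minimal sketch, assuming a hypothetical helper named loadVoices, wraps this in a Promise. The timeout is also an assumption, added in case the event never fires.

```javascript
// Hypothetical helper: resolves with the voice list, waiting for the
// "voiceschanged" event when getVoices() is still empty.
function loadVoices(synth, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices); // voices are already available
      return;
    }
    const onChange = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', onChange);
      resolve(synth.getVoices());
    };
    // If the event never arrives, resolve with whatever is available
    // after timeoutMs (possibly still an empty array).
    const timer = setTimeout(() => {
      synth.removeEventListener('voiceschanged', onChange);
      resolve(synth.getVoices());
    }, timeoutMs);
    synth.addEventListener('voiceschanged', onChange);
  });
}
```

In the browser it could be used as loadVoices(speechSynthesis).then((voices) => { utter.voice = voices[0]; speechSynthesis.speak(utter); }).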
How to pause and resume
We can use the pause() and resume() methods of SpeechSynthesis to pause or resume a Text-to-Speech (TTS) process in JavaScript.
The pause() method pauses the ongoing speech and keeps track of where it stopped. When we call the resume() method, the speech continues from where it left off.
Here is an example of how to use the pause() and resume() methods on button click:
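// Assumes buttons with the hypothetical ids "pause" and "resume"
const pauseButton = document.querySelector('#pause');
const resumeButton = document.querySelector('#resume');
pauseButton.onclick = () => {
  // Pause the current utterance; its position is kept
  speechSynthesis.pause();
};
resumeButton.onclick = () => {
  // Continue from where the speech was paused
  speechSynthesis.resume();
};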
Note that pause() is not supported in all browsers, and its behavior may differ between the browsers that do support it.
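A single Pause/Resume button can be built on the paused and speaking properties of SpeechSynthesis. The function name togglePause and its return values are hypothetical, and since pause() behaves differently across browsers, treat this as a sketch rather than a guaranteed behavior.

```javascript
// Hypothetical helper: pauses ongoing speech, or resumes it if it is paused.
// Returns the action taken so the caller can update the button label.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return 'resumed';
  }
  if (synth.speaking) {
    synth.pause();
    return 'paused';
  }
  return 'idle'; // nothing is being spoken, so there is nothing to pause
}
```

A button handler could then set its label to "Resume" when togglePause(speechSynthesis) returns 'paused' and back to "Pause" otherwise.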
Stop or cancel ongoing speech
To stop any ongoing speech, we can use the cancel() method of the SpeechSynthesis object. This method removes all queued utterances and immediately stops the one that is currently being spoken.
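// Assumes a button with the hypothetical id "cancel"
const cancelButton = document.querySelector('#cancel');
cancelButton.onclick = () => {
  // Stop the current utterance and clear the queue
  speechSynthesis.cancel();
};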
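To decide whether there is anything left to cancel (or to pause or resume), we can read the speaking, pending and paused properties of SpeechSynthesis. The hypothetical speechStatus helper below combines them into one value, which a page might use, for example, to disable the Cancel button while the status is 'idle'. The status names are our own choice.

```javascript
// Hypothetical helper: summarizes the SpeechSynthesis state so a UI can
// enable or disable its Speak, Pause, Resume and Cancel buttons.
function speechStatus(synth) {
  if (synth.speaking && synth.paused) return 'paused';
  if (synth.speaking) return 'speaking';
  if (synth.pending) return 'queued'; // utterances waiting, none started yet
  return 'idle';
}
```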
Change the speed of speech
We can use the rate property of the SpeechSynthesisUtterance object to change the speed at which text is spoken. This property accepts values between 0.1 and 10, with higher values meaning faster speech. The default rate is 1.0, so if no value is specified for rate, the text is spoken at normal speed.
In the following example, the speech rate is set to twice the normal speed.
function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.rate = 2.0;
  speechSynthesis.speak(utter);
}
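When the speed comes from a slider or text field, its value is a string and may be outside the accepted range. The sketch below uses a hypothetical parseRate helper to convert it and keep it between 0.1 and 10, falling back to the default of 1.0 for non-numeric input. Clamping here is our assumption about how to handle bad input.

```javascript
// Hypothetical helper: turns user input (e.g. an <input type="range"> value)
// into a valid SpeechSynthesisUtterance.rate.
function parseRate(value, min = 0.1, max = 10, fallback = 1.0) {
  const rate = Number.parseFloat(value);
  if (Number.isNaN(rate)) return fallback; // empty or non-numeric input
  return Math.min(max, Math.max(min, rate));
}
```

With a hypothetical rateInput element, this would be used as utter.rate = parseRate(rateInput.value).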
Reading text with different volume levels
You can adjust the volume of speech using the volume property of SpeechSynthesisUtterance. This property accepts values between 0.0 and 1.0, with 0.0 being the quietest and 1.0 the loudest. If the volume property is not specified, the default value of 1.0 is used.
In the following example, the volume of the speech is set to half the default value:
function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.volume = 0.5;
  speechSynthesis.speak(utter);
}
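Volume controls in a UI are often shown as a percentage from 0 to 100, while the volume property expects 0.0 to 1.0. A small sketch of the conversion, using a hypothetical percentToVolume helper that clamps out-of-range values and falls back to the default of 1.0:

```javascript
// Hypothetical helper: maps a 0-100 volume slider to the 0.0-1.0 range
// expected by SpeechSynthesisUtterance.volume.
function percentToVolume(percent) {
  const value = Number(percent) / 100;
  if (!Number.isFinite(value)) return 1.0; // default volume for bad input
  return Math.min(1, Math.max(0, value));
}
```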
How to set the language
We can specify the language of the text to be spoken using the lang property of SpeechSynthesisUtterance. This property accepts BCP 47 language tags, such as en-US. If the lang property is not set, the language of the HTML document or of the user agent is used.
function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.lang = 'en-US';
  speechSynthesis.speak(utter);
}
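Each voice returned by getVoices() also has a lang property, so one way to pair a voice with the utterance language is to look for a voice whose tag matches. The findVoiceForLang helper below is a hypothetical sketch: it prefers an exact match (such as en-US) and otherwise falls back to any voice with the same primary language (such as en-GB for en). Normalizing "_" to "-" is an assumption, in case a platform reports tags like en_US.

```javascript
// Hypothetical helper: picks a voice for a BCP 47 tag such as "en-US".
// Tries an exact match first, then any voice with the same primary language.
function findVoiceForLang(voices, lang) {
  // Compare tags case-insensitively and treat "_" like "-".
  const normalize = (tag) => String(tag).replace(/_/g, '-').toLowerCase();
  const wanted = normalize(lang);
  const primary = wanted.split('-')[0];
  return (
    voices.find((voice) => normalize(voice.lang) === wanted) ||
    voices.find((voice) => normalize(voice.lang).split('-')[0] === primary) ||
    null
  );
}
```

In the browser this could be used as utter.voice = findVoiceForLang(speechSynthesis.getVoices(), utter.lang).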
Using high or low pitch
Using the pitch property of SpeechSynthesisUtterance, we can set the pitch of the Text-to-Speech voice. This property accepts values between 0 (low) and 2 (high), with higher values resulting in a higher-pitched voice. If the pitch property is not set, the default pitch of the platform or voice is used.
In the following example, the text is spoken with a lower pitch than the default pitch of the platform:
function speak() {
  const utter = new SpeechSynthesisUtterance('This text will be spoken.');
  utter.pitch = 0.5;
  speechSynthesis.speak(utter);
}
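Rate, volume, and pitch each have their own valid range, so a small helper can apply several settings at once and keep each within its range. The sketch below assumes a hypothetical options object and helper name; any option that is left out is not touched, so the utterance keeps its default for it.

```javascript
// Valid ranges as described above for SpeechSynthesisUtterance.
const RANGES = {
  rate: [0.1, 10],
  volume: [0, 1],
  pitch: [0, 2],
};

// Hypothetical helper: copies the given numeric options onto an utterance,
// clamping each one to its range. Missing options keep their defaults.
function applyUtteranceOptions(utter, options = {}) {
  for (const [name, [min, max]] of Object.entries(RANGES)) {
    const value = options[name];
    if (typeof value === 'number' && !Number.isNaN(value)) {
      utter[name] = Math.min(max, Math.max(min, value));
    }
  }
  if (options.lang) utter.lang = options.lang;
  return utter;
}
```

For example: speechSynthesis.speak(applyUtteranceOptions(new SpeechSynthesisUtterance('This text will be spoken.'), { pitch: 0.5 })).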
How to know if the given text has started or finished being spoken
The start event of SpeechSynthesisUtterance fires when the browser begins to speak the given text. We can use the onstart event handler property as well.
Similarly, the end event of SpeechSynthesisUtterance fires when the browser finishes speaking the given text. Alternatively, we can use the onend event handler property.
const utter = new SpeechSynthesisUtterance('This text will be spoken.');
utter.addEventListener('start', (event) => {
  console.log('Start', event.utterance.text);
});
utter.addEventListener('end', (event) => {
  console.log('End', event.utterance.text);
});
speechSynthesis.speak(utter);
We can also use the onstart and onend event handler properties:
utter.onstart = (event) => {
  console.log('Start', event.utterance.text);
};
utter.onend = (event) => {
  console.log('End', event.utterance.text);
};
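Because end is an ordinary event, it can be wrapped in a Promise so that code can wait for an utterance to finish, for example to re-enable a Speak button afterwards. The speakAsync name is hypothetical. The sketch also listens for the error event of SpeechSynthesisUtterance, which fires when the utterance cannot be spoken, and it takes the SpeechSynthesis instance as a parameter.

```javascript
// Hypothetical helper: speaks an utterance and resolves when it has finished.
// Rejects if the utterance fires an "error" event (for example, when canceled).
// Note: this replaces any onend/onerror handlers already set on the utterance.
function speakAsync(synth, utter) {
  return new Promise((resolve, reject) => {
    utter.onend = (event) => resolve(event);
    utter.onerror = (event) => reject(new Error(event.error || 'speech error'));
    synth.speak(utter);
  });
}
```

Inside an async function in the browser, this could look like: await speakAsync(speechSynthesis, utter); followed by re-enabling the button.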
Select the word being read aloud
We can select the word being read with the help of the boundary event of SpeechSynthesisUtterance, which fires when the speech reaches a word or sentence boundary. This event is an instance of SpeechSynthesisEvent, with an utterance property that gives us the text being spoken and a charIndex property that holds the index of the character where the current word starts. With the help of this index, we can easily select the word.
Here is a simple example:
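// Assumes the #txt textarea from the example at the top of the article
const textarea = document.querySelector('#txt');
const utter = new SpeechSynthesisUtterance(textarea.value);
utter.onboundary = (event) => {
  // The word ends at the next space, or at the end of the text for the last word
  const nextSpace = event.utterance.text.indexOf(' ', event.charIndex);
  const end = nextSpace === -1 ? event.utterance.text.length : nextSpace;
  textarea.focus();
  textarea.setSelectionRange(event.charIndex, end);
};
speechSynthesis.speak(utter);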
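Searching only for a space misses words that end at a tab or line break, which is common in a textarea. The hypothetical wordRangeAt helper below is a sketch that stops at any whitespace instead. It also accepts an optional length, assuming the browser reports a positive charLength on the boundary event; when that value is 0 or missing, the whitespace search is used.

```javascript
// Hypothetical helper: returns [start, end] of the word that begins at
// charIndex, suitable for textarea.setSelectionRange(start, end).
function wordRangeAt(text, charIndex, charLength = 0) {
  // Use the length reported by the browser when it is available.
  if (charLength > 0) {
    return [charIndex, Math.min(text.length, charIndex + charLength)];
  }
  // Otherwise stop at the next whitespace (space, tab or newline),
  // or at the end of the text for the last word.
  const offset = text.slice(charIndex).search(/\s/);
  const end = offset === -1 ? text.length : charIndex + offset;
  return [charIndex, end];
}
```

Inside the boundary handler, this could replace the indexOf call: const [start, end] = wordRangeAt(event.utterance.text, event.charIndex, event.charLength); textarea.setSelectionRange(start, end);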
Using Speech Synthesis Markup Language (SSML)
While the documentation says that SpeechSynthesisUtterance.text supports SSML, it can be hard to find a browser that fully supports it. Speech Synthesis Markup Language (SSML) is an XML-based markup language that allows more customization, such as emphasis, multiple voices, inserted audio, pauses, marks, and more.
In the following example, the onmark event handler is called when the speech reaches the <mark name="mark1"/> SSML tag, if SSML is supported:
const ssml = `<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
xml:lang="en-US">
Take a <emphasis level="strong"> deep </emphasis> breath <break/>
then continue. <mark name="mark1"/>
Press 1 or wait for the tone. <break time="5s"/>
I didn't hear you! <break strength="weak"/> Please repeat.
</speak>`;
const utter = new SpeechSynthesisUtterance();
utter.text = ssml;
utter.onmark = (event) => {
  console.log(`A mark was reached: ${event.name}`);
};
speechSynthesis.speak(utter);
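Because SSML support varies, one defensive option (our own assumption, not something the Web Speech API provides) is to keep a plain-text version of the same content. The hypothetical ssmlToPlainText helper below removes the XML declaration and all tags with regular expressions and collapses whitespace. That is enough for simple markup like the example above, but it is not a full XML parser, and it discards breaks, emphasis, and marks.

```javascript
// Hypothetical helper: strips SSML markup, keeping only the text to be spoken.
// Suitable for simple documents; it is not a complete XML parser.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<\?xml[\s\S]*?\?>/g, '') // XML declaration
    .replace(/<[^>]+>/g, ' ') // every other tag, e.g. <break/> or <speak ...>
    .replace(/\s+/g, ' ') // collapse whitespace and line breaks
    .trim();
}
```

A page that prefers not to rely on SSML could then set utter.text = ssmlToPlainText(ssml) instead.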
CodePen examples
Here is the Text-to-Speech code that you can edit and immediately check on CodePen. It lets you enter text, select a voice, change the speed, and start the read-out. You can also pause, resume, and cancel the speech.
It is also available on GitHub.
See the Pen Text-To-Speech by Altynbek Usenbekov (@usenbekov) on CodePen.
Basic example
Here is a basic version of the Text-to-Speech code on CodePen. It lets you enter text and have it spoken with the first available voice.
See the Pen Text-To-Speech: Simple example by Altynbek Usenbekov (@usenbekov) on CodePen.