Lab III. Speech synthesis

In this lab session you will practice styling TTS output using the Speech Synthesis Markup Language (SSML). It is assumed that you have read the relevant literature on the subject before attempting the assignments.

For reference:

Speech Synthesis Markup Language (SSML) Version 1.0, W3C Recommendation 7 September 2004, http://www.w3.org/TR/speech-synthesis/
Platform-specific documentation of each SSML implementation

Available TTS platforms:

Google TTS: Docs, you can test it in the Google Actions Console
Amazon Polly: Docs, Test (you need to have an account)
IBM Watson: Docs
Azure Text-to-Speech: Docs, Speech Studio (audio content creation)

Part A: SSML warm-up exercise

The objective of this assignment is to “style” a dialogue system’s utterances. Here is what you are going to work with:

I have your calendar open. 
<break strength="medium"/> 
For what date? 
<break strength="medium"/> 
What time would you like to start? 
<break strength="medium"/> 
How much time do you want to block out? 
<break strength="medium"/> 
What shall we call this? 
<break strength="medium"/> 
Ok. [create and style the appointment summary!]

Your job is to create the last utterance and to make all of the system’s utterances better articulated by inserting SSML tags into the dialogue.
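As an illustration of the kind of markup involved (not a solution), the sketch below wraps a few of the utterances in SSML and uses Python's standard xml.etree.ElementTree module only to check that the markup is well-formed before it is pasted into a TTS console. The choice of tags and all attribute values are assumptions made for the example, and support for individual tags differs between platforms, so check the documentation of the platform you use.

```python
import xml.etree.ElementTree as ET

# Illustrative SSML only: the tags and attribute values are arbitrary
# assumptions, and not every platform supports every tag.
ssml = """<speak>
  I have your calendar open.
  <break strength="medium"/>
  For <emphasis level="moderate">what</emphasis> date?
  <break time="500ms"/>
  <prosody rate="slow">What time would you like to start?</prosody>
</speak>"""


def check_well_formed(text):
    """Return the root tag name if text parses as XML, else raise ParseError."""
    return ET.fromstring(text).tag


root_tag = check_well_formed(ssml)
print(root_tag)
```

A parse error here usually points to an unclosed tag or an unquoted attribute value. Passing the check only means the XML is well-formed, not that a given platform accepts every tag.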

Part B: The sound of dialogue

Modify the speech synthesis output of your “Appointment” application from Lab II so that everything is read correctly.

Create a text document in your repository where you describe the words which are mispronounced by the system (especially when reading DuckDuckGo descriptions).
Improve their pronunciation with a custom lexicon in Microsoft Speech Studio.
The lexicon must be referenced through a public link (!), because it has to be accessible to Azure TTS. In Speech Studio you normally create text files which can be used to test custom lexicons. If you switch to the SSML representation of a text file, you will find a public link to the lexicon file. In my case, it starts with https://cvoiceprodneu.blob.core.windows.net/acc-public-files/.
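For orientation, the sketch below shows roughly what a custom lexicon in the W3C Pronunciation Lexicon Specification (PLS) format and an SSML document referencing it can look like. The lexicon entry, the voice name en-US-JennyNeural and the URI https://example.com/my-lexicon.xml are placeholders assumed for the example. Use the words your own system mispronounces and the public link that Speech Studio generates for you. The Python part only parses both documents to confirm they are well-formed and to list what they contain.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical lexicon entry: replace it with the words your system mispronounces.
lexicon = f"""<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>DuckDuckGo</grapheme>
    <alias>duck duck go</alias>
  </lexeme>
</lexicon>"""

# The URI is a placeholder and the voice name is an assumption.
ssml = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <lexicon uri="https://example.com/my-lexicon.xml"/>
    DuckDuckGo is a search engine.
  </voice>
</speak>"""


def graphemes(pls_text):
    """List the written forms that the lexicon defines pronunciations for."""
    root = ET.fromstring(pls_text)
    return [g.text for g in root.iter(f"{{{PLS_NS}}}grapheme")]


def lexicon_uris(ssml_text):
    """List the lexicon URIs that the SSML document refers to."""
    root = ET.fromstring(ssml_text)
    return [el.get("uri") for el in root.iter(f"{{{SSML_NS}}}lexicon")]


words = graphemes(lexicon)
uris = lexicon_uris(ssml)
print(words, uris)
```

The alias element is used here instead of a phoneme transcription so that the example does not depend on a particular IPA string. A phoneme element is the alternative when respelling is not enough.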

Part C: Speech Synthesis Poetry Slam

A poetry slam is a competition at which poets read or recite original work (or, more rarely, that of others). These performances are then judged on a numeric scale by previously selected members of the audience. (Wikipedia)

Your task in this assignment is to use SSML to get an artificial poet to recite your favourite poem (just a couple of verses) at a speed and in “a style” similar to the way it is read by an actor (or by the poet themselves).

You can refer to some poetry performance found on YouTube or elsewhere.
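One possible starting point, sketched under assumptions, is to generate a plain SSML skeleton from the lines of the poem and then tune each line by hand against the reference performance. The helper name verse_to_ssml, the default rate and pause values, and the placeholder lines are all hypothetical. The sketch only wraps each line in a prosody element followed by a break.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def verse_to_ssml(lines, rate="slow", pause="700ms"):
    """Wrap each line in <prosody> and follow it with a <break>.

    The rate and pause defaults are arbitrary starting values (assumptions)
    meant to be adjusted per line by ear.
    """
    parts = []
    for line in lines:
        # escape() protects characters such as & and < that would break the XML.
        parts.append(
            f'<prosody rate="{rate}">{escape(line)}</prosody>'
            f'<break time="{pause}"/>'
        )
    return "<speak>" + "".join(parts) + "</speak>"


# Placeholder lines: substitute the verses of your chosen poem.
lines = ["First line of the poem", "Second line & the rest"]
ssml_text = verse_to_ssml(lines)

# Parsing confirms that the generated markup is well-formed.
root = ET.fromstring(ssml_text)
print(ssml_text)
```

A uniform rate and pause will rarely match a human reading, so treat the output as scaffolding for manual per-line adjustments rather than as a finished performance.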

Sources for inspiration:

California Dreaming (386DX art project).
Without Me, which was made by Robert Rhys Thomas in 2019 for this course.
Bad Guy, which was made by Fang Yuan in 2020 for this course.

Submission

In your submission, mention which platform you used and provide:

text files with your SSML code (for parts A, B and C)
audio file for Part C
a reference to the performance you imitated for Part C

These files can be placed in your GitHub repository.

Part D: Peer assignment 2

TBA
