Lab III. Speech synthesis

In this lab session you will practice styling TTS output using Speech Synthesis Markup Language (SSML). It is assumed that you have read the relevant literature on the subject before attempting to solve the assignments.

For reference:

Available TTS platforms:

  • Google TTS: Docs, you can test it in the Google Actions Console
  • Amazon Polly: Docs, Test (you need to have an account)
  • IBM Watson: Docs
  • Azure Text-to-Speech: Docs, Speech Studio (audio content creation)

Part A: SSML warm-up exercise

The objective of this assignment is to “style” a dialogue system’s utterances. Here is what you are going to work with:

I have your calendar open. 
<break strength="medium"/> 
For what date? 
<break strength="medium"/> 
What time would you like to start? 
<break strength="medium"/> 
How much time do you want to block out? 
<break strength="medium"/> 
What shall we call this? 
<break strength="medium"/> 
Ok. [create and style the appointment summary!]

Your job is to create the last utterance and to make all of the system’s utterances better articulated by inserting SSML tags into the dialogue.
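
Before pasting markup into a TTS console, it can help to check that it is well-formed XML. The sketch below is one possible way to do that in Python. The helper name `check_ssml` and the sample sentence are illustrative and not part of the assignment, and which tags and attribute values are actually honoured varies by platform, so check the docs of the one you use.

```python
import xml.etree.ElementTree as ET


def check_ssml(fragment):
    """Wrap an SSML fragment in <speak> and return the tag names it uses.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring(f"<speak>{fragment}</speak>")
    # iter() walks the whole tree in document order, starting with the root.
    return [element.tag for element in root.iter()]


# Illustrative sentence only; it is not taken from the assignment dialogue.
example = (
    'Your meeting is on <say-as interpret-as="date" format="dm">10/3</say-as>'
    '<break time="300ms"/>'
    '<prosody rate="slow">at half past nine</prosody>.'
)

print(check_ssml(example))  # ['speak', 'say-as', 'break', 'prosody']
```

A missing closing tag raises `ParseError` with the line and column of the problem. This only checks XML well-formedness; it does not tell you whether a given platform supports a tag.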

Part B: The sound of dialogue

Modify the content produced by speech synthesis for your “Appointment” application from Lab II, so that everything is read correctly.

  • Create a text document in your repository where you describe the words which are mispronounced by the system (especially when reading DuckDuckGo descriptions).
  • Improve their pronunciations with a custom lexicon in Microsoft Speech Studio.
  • The lexicon must be referenced through a public link (!) because it has to be accessible to Azure TTS. In Speech Studio you normally create text files which can be used to test custom lexicons. If you switch to the SSML representation of a text file, you will find a public link to the lexicon file. In my case, it starts with https://cvoiceprodneu.blob.core.windows.net/acc-public-files/.
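
As a rough illustration of what sits behind that public link: Azure custom lexicons follow the W3C Pronunciation Lexicon Specification (PLS), and the SSML refers to the file with a `<lexicon uri="..."/>` element inside `<voice>`. The Python sketch below generates both pieces. The helper names, the sample entries, the voice name and the `https://example.com/lexicon.xml` URL are assumptions made for illustration; use the link that Speech Studio shows for your own lexicon.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_lexicon(aliases, lang="en-US"):
    """Return a PLS lexicon that maps each written form to a spoken alias."""
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        f'         alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>\n"
    )


def ssml_with_lexicon(text, lexicon_url, voice="en-US-JennyNeural", lang="en-US"):
    """Return Azure-style SSML that loads the lexicon before speaking the text."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">\n'
        f'  <voice name="{voice}">\n'
        f"    <lexicon uri={quoteattr(lexicon_url)}/>\n"
        f"    {escape(text)}\n"
        "  </voice>\n"
        "</speak>"
    )


# Hypothetical entries; replace them with the words your system mispronounces.
print(build_lexicon({"BTW": "by the way"}))
# Hypothetical URL; Azure TTS can only load a lexicon from a public link.
print(ssml_with_lexicon("BTW, your calendar is open.", "https://example.com/lexicon.xml"))
```

An `<alias>` spells the word out in ordinary text; PLS also allows a `<phoneme>` element with an IPA transcription when respelling is not precise enough.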

Part C: Speech Synthesis Poetry Slam

A poetry slam is a competition at which poets read or recite original work (or, more rarely, that of others). These performances are then judged on a numeric scale by previously selected members of the audience. (Wikipedia)

Your task in this assignment is to use SSML to get an artificial poet to recite your favourite poem (just a couple of verses) at a speed and in “a style” similar to how it is read by an actor (or by the poet themselves).

You can refer to a poetry performance found on YouTube or elsewhere.
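
One possible starting point, sketched in Python: put the whole verse inside a single `<prosody>` element and add a `<break>` after each line, then refine individual lines by hand until the pacing matches your reference performance. The helper name `recite` and the default rate, pitch and pause values are assumptions chosen for illustration, not recommended settings.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def recite(lines, rate="slow", pitch="low", pause="600ms"):
    """Wrap lines of verse in <prosody>, with a <break> after each line."""
    body = "\n".join(
        f'    {escape(line)}<break time="{pause}"/>' for line in lines
    )
    return (
        "<speak>\n"
        f'  <prosody rate="{rate}" pitch="{pitch}">\n'
        f"{body}\n"
        "  </prosody>\n"
        "</speak>"
    )


# Opening lines of William Blake's "The Tyger", used here only as sample text.
verse = ["Tyger Tyger, burning bright,", "In the forests of the night;"]
ssml = recite(verse)
ET.fromstring(ssml)  # raises ParseError if the generated markup is malformed
print(ssml)
```

A uniform rate and pause rarely sounds like a human reading, so treat the generated markup as a draft to edit, not as the final submission.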

Sources for inspiration:

Submission

In your submission, mention which platform you used and provide:

  1. text files with your SSML code (for parts A, B and C)
  2. audio file for Part C
  3. reference for the performance for Part C

These files can be placed in your GitHub repository.

Part D: Peer assignment 2

TBA