This application is a Starter Kit (SK) that is designed to get you up and running quickly with a common industry pattern, and to provide information about best practices around Watson services. The Audio Analysis application was created to highlight the combination of the Speech to Text (STT) and AlchemyLanguage services as an Audio Analysis tool. This application can serve as the basis for your own applications that follow that pattern.
Demo.
Note: This sample application only works on desktop computer systems, and then only in the Firefox and Chrome web browsers.
- How this app works
- Getting Started
- About the Audio Analysis pattern
- User interface in this sample application
- Troubleshooting
The Audio Analysis application extracts concepts from YouTube videos.
To begin, select or specify a YouTube video. As the video streams, the Speech to Text service transcribes its audio track. The transcribed text is then passed to the AlchemyLanguage service, which extracts concepts from the transcription, each with an associated score.
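The flow described above can be sketched roughly as follows. This is a minimal illustration, not the starter kit's actual code: `createTranscriptAnalyzer`, `addFinalText`, and the `minNewChars` threshold are hypothetical names, and `extractConcepts` stands in for whatever function calls the AlchemyLanguage service.

```javascript
// Hypothetical sketch: accumulate finalized transcript text and re-run
// concept extraction only after enough new text has arrived.
function createTranscriptAnalyzer(extractConcepts, minNewChars = 200) {
  let transcript = '';
  let analyzedLength = 0;

  return {
    // Call with each finalized chunk of transcribed text.
    // Returns the extractor's result, or null if too little new text arrived.
    addFinalText(text) {
      transcript = (transcript + ' ' + text).trim();
      if (transcript.length - analyzedLength >= minNewChars) {
        analyzedLength = transcript.length;
        return extractConcepts(transcript);
      }
      return null;
    },
    getTranscript() {
      return transcript;
    },
  };
}
```

Sending the whole transcript on each pass, rather than only the newest chunk, gives the concept extractor more context to work with, at the cost of re-analyzing text it has already seen.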
- Clone the repository onto your computer:
git clone https://github.com/watson-developer-cloud/audio-analysis.git
- Sign up in Bluemix or use an existing account.
- If it is not already installed on your system, download and install the Cloud Foundry CLI tool.
- Edit the manifest.yml file in the folder that contains your code and replace audio-analysis-starter-kit with a unique name for your application. The name that you specify determines the application's URL, such as application-name.mybluemix.net. The relevant portion of the manifest.yml file looks like the following:
applications:
- services:
  - speech-to-text-service
  - alchemy-language-service
  name: application-name
  command: npm start
  path: .
  memory: 512M
- Connect to Bluemix:
cf api https://api.ng.bluemix.net
cf login
- Create and retrieve service keys to access the AlchemyLanguage service:
cf create-service alchemy_api free alchemy-language-service
cf create-service-key alchemy-language-service myKey
cf service-key alchemy-language-service myKey
- Create and retrieve service keys to access the Speech to Text service:
cf create-service speech_to_text standard speech-to-text-service
cf create-service-key speech-to-text-service myKey
cf service-key speech-to-text-service myKey
- Create a .env file in the root directory of your clone of the project repository by copying the sample .env.example file using the following command:
cp .env.example .env
Update the .env file with the credentials you retrieved in the two preceding steps (the AlchemyLanguage and Speech to Text service keys).
The .env file will look something like the following:
ALCHEMY_LANGUAGE_API_KEY=
SPEECH_TO_TEXT_USERNAME=
SPEECH_TO_TEXT_PASSWORD=
- Install the dependencies your application needs:
npm install
- Start the application locally:
npm start
- Open a browser and go to: http://localhost:3000/
- Push the application to Bluemix:
cf push
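An incomplete .env file is an easy mistake to make in the steps above, so it can help to check for the three credentials before the application starts. The sketch below is illustrative only: `missingCredentials` is a hypothetical helper, not part of the starter kit, and it assumes the values from .env have already been loaded into `process.env` by whatever mechanism the application uses.

```javascript
// Variable names are taken from the .env example in the steps above.
const REQUIRED_VARS = [
  'ALCHEMY_LANGUAGE_API_KEY',
  'SPEECH_TO_TEXT_USERNAME',
  'SPEECH_TO_TEXT_PASSWORD',
];

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace.
function missingCredentials(env) {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

// Assumes process.env already contains the values from .env.
const missing = missingCredentials(process.env);
if (missing.length > 0) {
  console.warn('Missing credentials: ' + missing.join(', '));
}
```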
After completing the steps above, you are ready to test your application. Start a browser and enter the URL of your application.
<your application name>.mybluemix.net
See the User interface in this sample application section for information about modifying the existing user interface to support other video sources.
First, make sure you read the Reference Information to understand the services that are involved in this pattern.
When a good-quality audio signal contains terms found in the current source of concepts in AlchemyLanguage, the combination of Speech to Text and AlchemyLanguage can be used to analyze the audio source in order to build summaries and indices and to recommend additional related content. Though the Speech to Text service supports several languages, the AlchemyLanguage service currently supports only English.
The Audio Analysis app uses the Node.js Speech to Text JavaScript SDK, which is a client-side library for audio transcriptions from the Speech to Text service. It also uses the concepts feature from AlchemyLanguage to extract concepts.
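Once concepts come back, an application typically keeps only the strongest ones. The sketch below assumes a response object with a `concepts` array whose entries carry a `text` label and a `relevance` score encoded as a string; that shape is an assumption for illustration and should be checked against the AlchemyLanguage API documentation. `topConcepts` and its default thresholds are hypothetical.

```javascript
// Hypothetical helper: keep the highest-scoring concepts from a response.
// Assumed response shape: { concepts: [{ text: '...', relevance: '0.93' }, ...] }
function topConcepts(response, minRelevance = 0.5, limit = 5) {
  const concepts = (response && response.concepts) || [];
  return concepts
    // Convert the score to a number so it can be compared and sorted.
    .map((c) => ({ text: c.text, relevance: Number(c.relevance) }))
    .filter((c) => Number.isFinite(c.relevance) && c.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, limit);
}
```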
- You need to analyze or index content contained within speech.
- You want to make content recommendations based on speech.
- The quality of the audio source determines the quality of the transcript, which affects the quality of extracted concepts and recommendations.
- The quality and confidence of the extracted concepts increase with the amount of transcribed text.
The following links provide more information about the AlchemyLanguage and Speech to Text services, including tutorials on using those services:
- API documentation: Get an in-depth understanding of the AlchemyLanguage service
- API explorer: Try out the REST API
- API documentation: Get an in-depth understanding of the Speech to Text service
- API reference: SDK code examples and reference
- API Explorer: Try out the API
The user interface that this sample application provides is intended as an example, and is not proposed as the user interface for your applications. However, if you want to use this user interface, you will want to modify the following files:
- src/views/index.ejs: Lists the YouTube videos and footer values that are shown on the demo application's landing page. These items are defined using string values that are set in the CSS for the application.
- src/views/videoplay.js: Maps YouTube video URLs to API calls and initiates streaming. You will want to expand or modify this if you want to use another video source or player.
- src/index.js: Supports multiple types of YouTube URLs. You will want to expand or modify this if you want to use another video source or player.
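To give a sense of what supporting multiple types of YouTube URLs can involve, here is a minimal sketch that normalizes a few common URL forms to a video ID. It is not the code in src/index.js: `extractYouTubeId` is a hypothetical function, and the set of URL forms it handles (watch, short youtu.be, and embed links) is an assumption.

```javascript
// Hypothetical helper: return the video ID from a YouTube URL, or null.
function extractYouTubeId(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return null; // not a parseable absolute URL
  }

  const host = url.hostname.replace(/^www\./, '');
  const segments = url.pathname.split('/').filter(Boolean);

  // Short links: https://youtu.be/<id>
  if (host === 'youtu.be') {
    return segments[0] || null;
  }

  if (host === 'youtube.com' || host === 'm.youtube.com') {
    // Watch links: https://www.youtube.com/watch?v=<id>
    if (url.pathname === '/watch') {
      return url.searchParams.get('v');
    }
    // Embed links: https://www.youtube.com/embed/<id>
    if (segments[0] === 'embed' && segments[1]) {
      return segments[1];
    }
  }
  return null;
}
```

Supporting another video source would mean adding a similar branch for that source's URL forms, or replacing this function entirely.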
When troubleshooting your Bluemix app, the most useful source of information is the execution logs. To see them, run:
$ cf logs <application-name> --recent
Find more open source projects on the IBM GitHub Page
This sample code is licensed under the Apache 2.0 license. Full license text is available in LICENSE.
See CONTRIBUTING.