TLSphinx is a Swift wrapper around Pocketsphinx, a portable library based on CMU Sphinx, that allows an application to perform speech recognition without the audio ever leaving the device.

This repository has two main parts. The first is a synthesized version of the pocketsphinx and sphinxbase repositories with a module map to access the library as a Clang module. This module is accessed under the name `Sphinx` and has two submodules, `Pocket` and `Base`, in reference to pocketsphinx and sphinxbase. The second part is `TLSphinx`, a Swift framework that uses the `Sphinx` Clang module and exposes a Swift-like API that talks to pocketsphinx.
Note: I wrote a blog post about TLSphinx over at the Tryolabs Blog. Check it out for a short history of why I wrote this.
The framework provides three classes:

- `Config` describes the configuration needed to recognize speech.
- `Decoder` is the main class, with the API to perform the decoding.
- `Hypotesis` is the result of a decode attempt. It has `text` and `score` properties.
Represents the `cmd_ln_t` opaque structure in Sphinx. The default constructor takes an array of tuples of the form (param name, param value), where "param name" is the name of one of the parameters recognized by Sphinx. In this example we are passing the acoustic model, the language model and the dictionary. For a complete list of recognized parameters check the Sphinx docs.

The class has a public property to turn the debug info printed out by Sphinx on and off:
public var showDebugInfo: Bool
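As a minimal sketch of how this property might be used (the model paths here are placeholders, not real files), debug output could be silenced right after creating the configuration:

```swift
import TLSphinx

// Placeholder paths - replace with the real model files shipped in your bundle.
let hmm  = "path/to/acoustic-model"
let lm   = "path/to/language-model.lm"
let dict = "path/to/dictionary.dic"

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    // Silence the diagnostic output Sphinx prints while decoding.
    config.showDebugInfo = false
}
```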
Represents the `ps_decoder_t` opaque struct in Sphinx. The default constructor takes a `Config` object as a parameter.

This class has the functions to perform the decoding, either from a file or from the mic. The result is returned in an optional `Hypotesis` object, following the naming convention of the Pocketsphinx API. The functions are:
To decode speech from a file:
public func decodeSpeechAtPath (filePath: String, complete: (Hypotesis?) -> ())
The audio file at `filePath` must have the following characteristics:
- single-channel (monaural)
- little-endian
- unheadered
- 16-bit signed
- PCM
- sampled at 16000 Hz
To control the size of the buffer used to read the file, the `Decoder` class has a public property:
public var bufferSize: Int
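For instance, assuming `decoder` is a `Decoder` built from a valid `Config` and `audioFile` points to a suitable audio file, the buffer size could be adjusted before decoding (the value below is purely illustrative):

```swift
// `decoder` and `audioFile` are assumed to exist already.
decoder.bufferSize = 4096  // bytes read per iteration; illustrative value

decoder.decodeSpeechAtPath(audioFile) { hyp in
    // Handle the optional Hypotesis here.
}
```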
To decode a live audio stream from the mic:
public func startDecodingSpeech (utteranceComplete: (Hypotesis?) -> ())
public func stopDecodingSpeech ()
You can use the same `Decoder` instance many times.
This struct represents the result of a decode attempt. It has a `text` property with the best-scored text and a `score` property with the score value. The struct implements `Printable`, so you can print it with `println(hypotesis_value)`.
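As a sketch, with `hyp` standing in for a `Hypotesis` value returned by one of the `Decoder` callbacks, both ways of consuming the result look like this:

```swift
// `hyp` is a Hypotesis obtained from a Decoder callback.
println(hyp)                                // uses the Printable conformance
println("\(hyp.text) scored \(hyp.score)")  // or read the properties directly
```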
As an example, let's see how to decode the speech in an audio file. To do so you first need to create a `Config` object and pass it to the `Decoder` constructor. With the decoder you can perform automatic speech recognition from an audio file like this:
```swift
import TLSphinx

let hmm  = ... // Path to the acoustic model
let lm   = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config: config) {

        let audioFile = ... // Path to an audio file

        decoder.decodeSpeechAtPath(audioFile) {
            if let hyp: Hypotesis = $0 {
                // Print the decoded text and score
                println("Text: \(hyp.text) - Score: \(hyp.score)")
            } else {
                // Can't decode any speech because of an error
            }
        }
    } else {
        // Handle Decoder() failure
    }
} else {
    // Handle Config() failure
}
```
The decoding is performed by the `decodeSpeechAtPath` function in the background. Once the process finishes, the `complete` closure is called on the main thread.
```swift
import TLSphinx

let hmm  = ... // Path to the acoustic model
let lm   = ... // Path to the language model
let dict = ... // Path to the language dictionary

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
    if let decoder = Decoder(config: config) {

        decoder.startDecodingSpeech {
            if let hyp: Hypotesis = $0 {
                println(hyp)
            } else {
                // Can't decode any speech because of an error
            }
        }

        // At some point in the future, stop listening to the mic
        decoder.stopDecodingSpeech()
    } else {
        // Handle Decoder() failure
    }
} else {
    // Handle Config() failure
}
```
The cleanest way to integrate `TLSphinx` is to use Carthage or a similar method to get the framework bundle. This lets you integrate the framework and the `Sphinx` module without magic.

In your `Cartfile` add a reference to the latest version of `TLSphinx`:

github "Tryolabs/TLSphinx" ~> tag_pointing_to_the_last_version

Then run `carthage update`; this should fetch and build the latest version of `TLSphinx`. Once it's done, drag the TLSphinx.framework bundle to the Xcode Linked Frameworks and Libraries section. You must tell Xcode where to find the `Sphinx` module, which is located in the Carthage checkout. To do so:
- add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include` to Header Search Paths, marked as recursive
- add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/lib` to Library Search Paths, marked as recursive
- in Swift Compiler - Search Paths, add `$(SRCROOT)/Carthage/Checkouts/TLSphinx/Sphinx/include` to Import Paths
Download the project from this repository and drag the TLSphinx project into your Xcode project. If you hit errors about missing headers and/or libraries for Sphinx, add `Sphinx/include` to your Header Search Paths and `Sphinx/lib` to your Library Search Paths, marking both as recursive.
BrunoBerisso, [email protected]
TLSphinx is available under the MIT license. See the LICENSE file for more info.