Ergodic Hidden Markov Models for Visual-Only Isolated Digit Recognition
|Title||Ergodic Hidden Markov Models for Visual-Only Isolated Digit Recognition|
|Year of Publication||2007|
|Authors||Terry, L. H.|
|Academic Department||Department of Electrical Engineering and Computer Science|
|Number of Pages||90|
Accurate, robust, automatic speech recognition (ASR) systems represent the cornerstone of advanced human-computer interaction. An inherent asymmetry exists in current human-computer interfaces (HCIs): the human communicates through physical interaction via a keyboard and/or mouse, while the computer communicates through visual and auditory channels. This asymmetry prevents human-computer interaction in the familiar, natural style of human-human interaction. With systems able to mimic human-human interaction, technology becomes more natural, free-flowing, and seamlessly integrated into modern life. This broad-reaching goal serves as the motivation for this work.
Upon examination of modern speech recognition systems, one finds that no system actually models the speech production phenomenon. These state-of-the-art systems instead model the symbols produced by speech, in the form of words or sub-word units. With this realization in mind, this work proposes a system architecture that directly models the speech production phenomenon. As a direct consequence, this approach separates articulation modeling from articulation interpretation.
This work formulates this framework, dubbed the "Articulation Model" framework, and then compares it against the state-of-the-art speech recognizer on the task of recognizing isolated digits from video cues alone.