July 9, 2007 by James A. Larson, speechtechmag.com
There are many ways a user can respond to the prompt "What would you like to drink?" While some of us might want a triple martini or an intergalactic gargle blaster, let's suppose the user only wants a Coke. The developer specifies a grammar containing the words and phrases "Coke," "Coca-Cola," and "that fizzy brown drink." The speech recognition system compares the user's utterance with each word and phrase in the grammar and chooses the one that matches most closely.
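In SRGS's XML form, such a grammar might look like the following sketch (the rule name "drink" and the file layout are illustrative, not prescribed):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="drink">
  <!-- Each <item> is one word or phrase the recognizer may match -->
  <rule id="drink" scope="public">
    <one-of>
      <item>coke</item>
      <item>coca-cola</item>
      <item>that fizzy brown drink</item>
    </one-of>
  </rule>
</grammar>
```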
How does the speech application know that "Coke," "Coca-Cola," and "that fizzy brown drink" all refer to the same drink? One approach is to have the speech application look up these words in a translation table. A better approach is to embed the translation of each word within the grammar itself, so that when the user says either "Coke" or "that fizzy brown drink," the speech recognition engine translates the words to "Coca-Cola."
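With SISR, that embedded translation is written as a `<tag>` element on each alternative, assigning the canonical value to the rule's result. A sketch (it assumes `tag-format="semantics/1.0"` on the enclosing `<grammar>` element):

```xml
<rule id="drink" scope="public">
  <one-of>
    <!-- Whichever phrase is spoken, the rule returns "coca-cola" -->
    <item>coke <tag>out = "coca-cola";</tag></item>
    <item>coca-cola <tag>out = "coca-cola";</tag></item>
    <item>that fizzy brown drink <tag>out = "coca-cola";</tag></item>
  </one-of>
</rule>
```

The application then receives the single semantic value "coca-cola" regardless of which phrase was recognized.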
Just as the World Wide Web Consortium (W3C) made the Speech Recognition Grammar Specification (SRGS) the standard for defining the grammars used by a speech engine, the W3C has specified Semantic Interpretation for Speech Recognition (SISR) as the standard by which developers interpret the words recognized by the speech engine.
SISR uses the ECMAScript Compact Profile, a strict subset of ECMAScript designed to meet the needs of resource-constrained environments. Special attention has been paid to omitting ECMAScript features that require large amounts of system memory and processing power, so the profile is well suited to lightweight environments. As a result, ECMAScript fits snugly within the grammar rules for extracting semantic information from the words recognized by the speech engine.
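Because each tag holds ordinary ECMAScript, a rule can also build a structured result from the rules it references. In this sketch, a hypothetical "order" rule combines assumed "size" and "drink" subrules into one object handed back to the application:

```xml
<rule id="order" scope="public">
  <ruleref uri="#size"/>
  <ruleref uri="#drink"/>
  <tag>
    out = new Object();       // result object returned to the application
    out.size  = rules.size;   // semantic value of the "size" subrule
    out.drink = rules.drink;  // canonical drink name from the "drink" subrule
  </tag>
</rule>
```

An utterance such as "large coke" would then yield a single object like `{size: "large", drink: "coca-cola"}` rather than raw recognized words.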