
Conceptual Framework for Carol's Song

The problem:

Amyotrophic lateral sclerosis (ALS) patients typically lose their ability to speak while retaining the cognitive functions to be productive members of society. Their needs in the area of speech communication aids are unique for two reasons:

The loss of speech is typically accompanied by a loss of other motor functions, thus greatly reducing the effectiveness of traditional input devices such as keyboards.

The speed at which the disease progresses makes long learning curves impractical. To be useful, a communication aid must be very easy to use and transfer seamlessly from one input device to another.

Current communication aids for the speech impaired have three main components: 

  1. An input device for the speech impaired person to enter their communication.
  2. A translation unit that converts the input to text.
  3. A Text To Speech (TTS) device that converts the text to synthesized speech.

While improvements are possible in all three areas, it is the opinion of this group that the greatest gains are possible in the second area.

In the simplest case, the translation unit is nothing more than a conduit, collecting a string of characters from the input device and then passing the completed string to the TTS device. Such units are woefully inadequate for ALS patients because the input rate is too slow for real-time dialog.

More sophisticated units employ prediction algorithms which offer options for completing words and sentences based on what has already been entered. Most of these algorithms are based on matching the text entered to the beginning of strings held in a dictionary. Such algorithms have the advantage of being intuitively obvious and fast to execute, but they do not take full advantage of the redundancy inherent in written text. For small words, the number of keystrokes is not significantly reduced. For long words, the number of options does not become manageable until many characters have been entered.
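For illustration, the short Python sketch below shows the conventional prefix-matching approach (the word list and function name are purely illustrative). With only one letter typed the candidate list is unmanageably long, and it narrows only after most of the word has already been entered:

    def complete_prefix(prefix, dictionary, max_options=4):
        # Conventional prediction: return dictionary words that begin with
        # the letters typed so far.
        prefix = prefix.upper()
        return [w for w in dictionary if w.startswith(prefix)][:max_options]

    words = ["RAIN", "RAISE", "RAN", "REST", "RESTROOM", "ROOM", "RUN"]
    print(complete_prefix("R", words))    # ['RAIN', 'RAISE', 'RAN', 'REST']
    print(complete_prefix("RES", words))  # ['REST', 'RESTROOM']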

Predicting sentences by matching the first few words falls prey to the same information-theoretic problems. The object of a sentence is often its last word, and it is difficult to predict the sentence until the main parts of speech are known. A translation unit that requires words to be entered in the order of the final sentence forces the user to type words that could otherwise have been predicted.

In an attempt to obviate these difficulties, some translation units also allow the use of stored abbreviations for words and complete sentences. While these are effective, the total number of abbreviations is necessarily small since a large number would require extensive memorization and extra keystrokes.

Another area that is poorly addressed by most translation units is the correction of typographical errors. This is particularly important for users who also suffer from motor skill impairments.

What's needed is a translation unit that allows the user to abbreviate freely and offers potential words and sentences based on the relevance of the information rather than its order. The translation unit also needs to recognize incongruous information (typos) and attempt to match the remaining information. In essence, the translator needs to employ common sense.

Proposal: Bayesian Adaptive Prediction Algorithm

Since it is impossible to know exactly what message is intended until the entire message is delivered, a prediction algorithm must determine which possibilities are most likely intended and then present them as options to be selected.

The probability that a user intends to send a given message, M, is represented as:

P(M)

The set of probabilities for all possible messages is called the Prior Distribution (or simply, Prior) because we have not yet applied any observations to the calculation of the probabilities. Once a string is entered, we have more information and can update the Prior. The probability of a message, M, being intended given that a user has entered a string, S, is expressed as:

P(M|S)

The set of these probabilities over the entire set of possible messages is called the Posterior Distribution (or Post), because it is a distribution modified by the presence of a known condition. This probability is extraordinarily difficult to compute directly. Fortunately, Bayes' Theorem gives us a mechanism to compute it indirectly:

P(M|S) = P(M) P(S|M) / P(S)

Two pieces of this equation are known, or at least can be reasonably estimated. P(M) is just our Prior, while P(S|M) is the probability that a user would choose the string S to represent M (as opposed to some other string).

Both these terms will vary from user to user, thus the need for an adaptive algorithm. The initial distributions will be estimated and then the program will track the actual usage and update the probabilities accordingly.

The final piece, P(S), is the probability of seeing a particular string. While this is the most heavily user dependent and therefore the most difficult piece to estimate, it is also completely unnecessary. Since the value of S is fixed, P(S) is merely a constant that normalizes the distribution so all the probabilities sum to 1. The goal here is to determine the most likely candidates, not assess their actual probability, so the P(S) term can be left out of the computation.
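As a concrete illustration, the Python sketch below ranks a handful of candidate messages by the unnormalized posterior P(M) * P(S|M). The probability values are invented for the example; in the real system they would come from the adaptive estimates described above.

    # P(M): the Prior -- how often each message is intended (illustrative values).
    prior = {"ROOM": 0.030, "RESTROOM": 0.020, "REMAIN": 0.005}

    # P(S|M): chance the user would type "RM" to mean each message (illustrative).
    likelihood = {("RM", "ROOM"): 0.25, ("RM", "RESTROOM"): 0.40, ("RM", "REMAIN"): 0.20}

    def rank(string, messages):
        # Score each message by P(M) * P(S|M); the constant P(S) is omitted
        # because it does not change the ordering.
        def score(m):
            return prior.get(m, 0.0) * likelihood.get((string, m), 0.0)
        return sorted(messages, key=score, reverse=True)

    print(rank("RM", ["ROOM", "RESTROOM", "REMAIN"]))
    # ['RESTROOM', 'ROOM', 'REMAIN']

Note that RESTROOM outranks ROOM even though ROOM has the higher Prior, because the user is more likely to abbreviate RESTROOM as "RM".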

The benefits of using a Posterior Distribution to rank candidate results are manifest. Very little training is required, as the user is free to abbreviate as they see fit. Over time, the program adapts to their usage rather than the other way around. However, to implement such a system, two significant challenges need to be addressed:

  1. The set of possible messages and the set of possible strings are both infinite. Exhaustive evaluation of even a restricted list of each would require computing resources that simply do not exist, much less are available to a speech-impaired user.
  2. Completely unanticipated abbreviations will never be recognized (and thus not adapted to) since if P(S|M) = 0 then the Post will never rank M as a viable candidate for string S.

The strategies for dealing with these two challenges are outlined below.

The problem of infinite domain:

The set of messages can be reduced to a finite number by restricting the vocabulary and the length of sentences. However, even a tiny vocabulary of 1000 words and a limit of 10 words per sentence yields approximately 10^30 possible messages. Further restricting the message domain to grammatically correct statements still results in a set that is far beyond the storage capacity of any device for the foreseeable future.
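The estimate is easy to verify; the short Python calculation below counts every sequence of up to ten words drawn from a 1,000-word vocabulary:

    vocabulary_size = 1_000
    max_sentence_length = 10

    # Number of possible messages of 1 to 10 words (word order matters,
    # repetition allowed); the 10-word sentences dominate the total.
    total = sum(vocabulary_size ** n for n in range(1, max_sentence_length + 1))
    print(f"{total:.2e}")   # ~1.00e+30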

Translating directly from the input string to the final message is apparently too great a leap. If translation is done word by word, much of the power of the algorithm will be lost.

An intermediate form is to use a tiered translation. The domain of P(M) will include both individual words and the most frequent messages. Since individual words appear with a much higher frequency than complete sentences, the prediction algorithm will initially create a Post that sorts only words to the top. As words are recognized and integrated back into the string, a grammatical context is generated and seeded with the known parts of the message. This is used to generate new candidate messages and include them in the next Prior and Post distributions. Eventually, complete messages will find their way to the top of the list.

An example of how this might work follows (an <Fn> input selects the nth candidate from the previous list):

Input   Message String            Candidates
R       R                         1. ARE   2. RUN   3. RAIN   4. ROOM
M       RM                        1. ROOM   2. RESTROOM   3. REMAIN   4. ROAM
<F2>    RESTROOM                  1. I HAVE TO GO TO THE RESTROOM.
                                  2. WHERE IS THE RESTROOM?
                                  3. DO YOU HAVE A RESTROOM?
                                  4. I NEED A HANDICAPPED RESTROOM.
C       RESTROOM C                1. CAR   2. CAN   3. SEE   4. CAT
L       RESTROOM CL               1. CLEAN   2. CLIMB   3. CLOSE   4. CLASS
G       RESTROOM CLG              1. CLEANING   2. CLING   3. CLOG
                                  4. THE RESTROOM NEEDS CLEANING.
<F1>    RESTROOM CLEANING         1. THE RESTROOM NEEDS CLEANING.
                                  2. I ENJOY CLEANING A RESTROOM.
                                  3. CLEANING A RESTROOM IS SCARY.
                                  4. MY JOB IS CLEANING A RESTROOM.
M       RESTROOM CLEANING M       1. ME   2. MY   3. EMPTY   4. MAN
<F4>    RESTROOM CLEANING MAN     1. A MAN IS CLEANING THE RESTROOM.
                                  2. WHERE IS THE RESTROOM CLEANING MAN?
                                  3. A MAN IS CLEANING IN THE RESTROOM.
                                  4. THE RESTROOM IS CLEANING A MAN.

The complexity of the candidate generation increases as the input string gets longer. As more letters are added to a word in progress, the candidate vocabulary is increased. As more words are added to the string, more complicated grammatical structures are added to the message generator. This allows for relatively quick identification of simple and often repeated messages while still assisting with the creation of more complicated messages.
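The Python sketch below illustrates the tiered candidate generation in simplified form. The template list and helper names are assumptions made for the example; in the actual design the grammatical context would be generated from the recognized parts of the message rather than drawn from a fixed list.

    WORD_VOCAB = ["RESTROOM", "CLEANING", "CLEAN", "ROOM", "MAN"]

    # Toy grammatical templates; each placeholder is filled from recognized words.
    TEMPLATES = [
        "I HAVE TO GO TO THE {0}.",
        "WHERE IS THE {0}?",
        "THE {0} NEEDS {1}.",
    ]

    def generate_candidates(known_words):
        # Tier 1: individual words are always candidates.
        candidates = list(WORD_VOCAB)
        # Tier 2: once words are recognized, seed sentence templates with them.
        for t in TEMPLATES:
            if t.count("{") <= len(known_words):
                candidates.append(t.format(*known_words))
        return candidates

    print(generate_candidates(["RESTROOM"]))
    print(generate_candidates(["RESTROOM", "CLEANING"]))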

At a certain level of complexity, the generator will cease to create grammatical structures for the message. At that point, the prediction algorithm reverts to simple word prediction and the occasional expansion of a phrase.

The problem of unpredictability:

Word abbreviations are highly predictable, since the abbreviation is almost always a subset of the letters in the word. Even transpositions and typos are easily accounted for by simply assigning a probability penalty to each form of correction.
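One possible way to score word abbreviations is sketched below: the typed string is treated as a (possibly noisy) subsequence of the candidate word, and each skipped or unmatched letter applies a multiplicative penalty. The penalty values are illustrative assumptions, not tuned parameters; the resulting likelihoods would then be combined with the Prior, as described earlier, to rank the candidates.

    SKIP_PENALTY = 0.7   # a letter of the word left out of the abbreviation
    TYPO_PENALTY = 0.1   # a typed letter that matches nothing in the word

    def abbreviation_likelihood(typed, word):
        # Rough estimate of P(S|M) for a single word: walk through the typed
        # letters, matching each to the next occurrence in the word.
        score, pos = 1.0, 0
        for ch in typed.upper():
            found = word.find(ch, pos)
            if found < 0:
                score *= TYPO_PENALTY                   # incongruous letter
            else:
                score *= SKIP_PENALTY ** (found - pos)  # letters jumped over
                pos = found + 1
        return score * SKIP_PENALTY ** (len(word) - pos)  # letters never typed

    for w in ("ROOM", "REMAIN", "RESTROOM"):
        print(w, round(abbreviation_likelihood("RM", w), 4))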

Much more difficult is predicting the abbreviations of phrases or complete messages. Since all but the most common phrases and messages are generated outside of the vocabulary list, the algorithm needs a grammatical context to create the phrase and assign a Prior. If the user picks an obscure (albeit obvious to them) abbreviation for the phrase and no other message information exists, no match will be found. The user will have to erase the input string and try again. The program will never learn to associate the original string with the message.

To facilitate the linking of such input/message pairs, the program will offer a retry command (triggered by a special input character) distinct from the normal correction keys. Upon receiving a retry, the current token (contiguous string of characters) is erased from the input buffer, but the token is saved. The user continues to build the message in the normal manner.

For every selected candidate between the retry and the acceptance of the final message, a provisional pair (original token, selected candidate) is stored. The next time the original token is encountered, the candidates from the provisional pairs are displayed. The one that is selected is added to the vocabulary list and associated with that token.
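A minimal sketch of this bookkeeping follows. The class and method names are illustrative; the real program would hook these steps into its input handling.

    class RetryLearner:
        def __init__(self):
            self.provisional = {}    # original token -> candidate expansions
            self.vocabulary = {}     # confirmed token -> expansion
            self.pending_token = None

        def on_retry(self, token):
            # The retry key: the token is erased from the buffer but remembered.
            self.pending_token = token
            self.provisional.setdefault(token, [])

        def on_candidate_selected(self, candidate):
            # Every selection between the retry and final acceptance is paired
            # provisionally with the saved token.
            if self.pending_token is not None:
                self.provisional[self.pending_token].append(candidate)

        def on_message_accepted(self):
            self.pending_token = None

        def suggestions_for(self, token):
            # Next time the token is typed, offer the provisional candidates.
            return self.provisional.get(token, [])

        def confirm(self, token, candidate):
            # The user picked one of them: add it to the vocabulary list.
            self.vocabulary[token] = candidate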

To further facilitate the adaptation, anytime a candidate or full message is selected, the message string at that point is added to the vocabulary list and associated with all the partial sets of keystrokes that preceded the acceptance. Infrequently used phrases and messages will be sorted to the bottom of the vocabulary list and not affect performance. Frequently used phrases and messages will become part of the active vocabulary and will thus be offered as candidates sooner than grammatically generated messages.
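This adaptation step might look like the sketch below, where every partial keystroke string seen before acceptance is associated with the accepted message, and a usage count keeps rarely used entries at the bottom of the list. The data structure is an assumption made for illustration.

    from collections import defaultdict

    # keystroke string -> {message: times the pair has been observed}
    vocabulary = defaultdict(lambda: defaultdict(int))

    def learn_acceptance(partial_inputs, accepted_message):
        # Associate the accepted message with every partial input that led to it.
        for partial in partial_inputs:
            vocabulary[partial][accepted_message] += 1

    learn_acceptance(["R", "RM", "RESTROOM C", "RESTROOM CL", "RESTROOM CLG"],
                     "THE RESTROOM NEEDS CLEANING.")
    print(dict(vocabulary["RESTROOM CLG"]))
    # {'THE RESTROOM NEEDS CLEANING.': 1}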
