[mythtv] Voice control of MythTV

Sun Mar 23 20:39:31 EST 2003

This is my first posting to this list, but I have been following the archives and the development of MythTV closely for a while now.  I am writing to inform the developers of a project I have been working on as part of my senior design for a Computer Engineering degree.  The project revolves around enabling voice control of various home devices, and we are specifically building a voice-controlled set-top box as part of our design.  I'd like to let people know about the project to see if there is any interest, and also to suggest some development ideas for MythTV.

There are three main components to our system: a voice recognition server, a set-top box, and a voice remote control.  The set-top box is simply our custom, lightweight Linux distribution with software for multimedia capabilities.  This is obviously where MythTV is useful.  The voice remote control is a Palm Tungsten T, which we are using because of its integrated microphone and Bluetooth networking capabilities.  Software running on the Palm transfers any audio it receives to the set-top box that it is controlling, over Bluetooth.  The set-top box then talks to our voice recognition server and waits for a control code that signals what action needs to be taken.  A regular microphone hooked up to the sound card can also be used, but it doesn't have as much of a "wow factor."
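As a rough illustration of the set-top box side of this flow, here is a minimal Python sketch.  Everything in it is hypothetical: the class name, the `recognize` callable (a stand-in for the round trip to the voice recognition server), and the control code strings are assumptions made for the example, not our actual interfaces.

```python
from typing import Callable, Dict, Optional


class SetTopBox:
    """Hypothetical set-top box loop: audio in, control code back, action run."""

    def __init__(self, recognize: Callable[[bytes], Optional[str]]):
        # `recognize` stands in for the network call to the recognition
        # server; it returns a control code string, or None if nothing matched.
        self.recognize = recognize
        self.handlers: Dict[str, Callable[[], None]] = {}

    def on(self, code: str, handler: Callable[[], None]) -> None:
        """Register the action to run when `code` comes back from the server."""
        self.handlers[code] = handler

    def handle_audio(self, audio: bytes) -> bool:
        """Send audio for recognition and run the matching action, if any."""
        code = self.recognize(audio)
        handler = self.handlers.get(code) if code is not None else None
        if handler is None:
            return False  # unrecognized utterance or unknown code
        handler()
        return True
```

Audio arriving from the Palm or from a local microphone would both end up in `handle_audio`, so the rest of the box does not need to care where the audio came from.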

The major part of our project centers on the voice recognition server itself.  A server approach was chosen because most home devices, including set-top boxes, have minimal processing power, and voice recognition may be too demanding to embed directly in these devices.  This also complements MythTV's design, since it is split up into a frontend and a backend.  One voice recognition server could run on a single backend system, enabling voice control of many frontends.

The voice recognition server works on what we call transactions.  Each transaction specifies one grammar, and any number of optional phrases.  The grammar defines words which are always considered possible utterances the user might say.  For instance "play," "pause," "record," etc. are always something a user could say when watching timeshifted television.

Phrases are loaded dynamically to complement the main grammar, where it makes sense to do so.  Consider MythMusic, and the list of audio tracks it displays.  Those tracks could be loaded dynamically, enabling the user to say "Play Radiohead, Karma Police."  Capabilities could also be added to let the user say the name of a show and have the TV guide filtered down to just the airings of that show, without a clunky onscreen keyboard, for instance.  Capabilities such as these are what I would like to discuss.
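To make the transaction idea concrete, here is a small Python sketch of one grammar plus dynamically loaded phrases.  The `Transaction` class, the `normalize` helper, and the control code strings are all invented for illustration; this is not the server's real interface, and a real recognizer matches audio rather than text.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


@dataclass
class Transaction:
    """Hypothetical transaction: one fixed grammar, any number of phrases."""

    grammar: Dict[str, str]  # utterance -> control code, always active
    phrases: Dict[str, str] = field(default_factory=dict)  # loaded dynamically

    def __post_init__(self) -> None:
        # Store grammar utterances in normalized form so lookups are consistent.
        self.grammar = {normalize(u): c for u, c in self.grammar.items()}

    def add_phrase(self, utterance: str, control_code: str) -> None:
        """Add a dynamic phrase, e.g. one per track shown by MythMusic."""
        self.phrases[normalize(utterance)] = control_code

    def lookup(self, utterance: str) -> Optional[str]:
        """Return the control code for an utterance, or None if unknown."""
        key = normalize(utterance)
        if key in self.phrases:
            return self.phrases[key]
        return self.grammar.get(key)
```

For example, a music screen might start a transaction with a grammar of "play" and "pause" and then call `add_phrase("Play Radiohead, Karma Police", ...)` once for each track in the list.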

MythTV is built on the assumption of a standard, everyday remote control.  It is fairly easy to voice control these standard commands, such as "play" and "stop," but the really interesting features come when you can dynamically select an item to play from anywhere on screen, or perhaps even items that are not listed on screen, such as when you have a long audio playlist.

Admittedly, I have not yet looked at the MythTV source code in much depth, as I have been concentrating on the server and voice remote.  However, I am beginning to, and I'd like to get feedback from the experienced developers on what may be the best way to implement an advanced control architecture such as this.  Control codes returned by the server are strings.  For the main grammar, they are defined in the grammar itself; for dynamic grammars, the control code to return is specified when they are sent to the server.  I know MythTV uses MySQL to store information, so maybe it would be feasible to have an extra field for the control code, or to build a code out of the current fields.  The right TV Show/MP3/Movie/whatever could then be played based on what control code is returned from the server.  Once this system is in place, various interesting user interfaces could be built around it.  Voice could be used for interfaces not possible with a standard remote, or simply as an alternative to a standard remote.
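Here is one way the "build a code out of the current fields" option might look, sketched in Python.  This is only a guess at a workable scheme: the `tracks` table and its columns are invented for the example and are not MythTV's actual schema, the `kind:id` code format is an assumption, and sqlite3 is used in place of MySQL just so the sketch runs on its own.

```python
import sqlite3
from typing import Iterator, Optional, Tuple


def demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up `tracks` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks (id INTEGER PRIMARY KEY, artist TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO tracks (id, artist, title) VALUES (?, ?, ?)",
        [(1, "Radiohead", "Karma Police"), (2, "Radiohead", "No Surprises")],
    )
    return conn


def make_code(kind: str, row_id: int) -> str:
    """Build a control code string from existing fields, e.g. 'music:1'."""
    return f"{kind}:{row_id}"


def dynamic_phrases(conn: sqlite3.Connection) -> Iterator[Tuple[str, str]]:
    """Yield (utterance, control code) pairs, one per track in the table."""
    rows = conn.execute("SELECT id, artist, title FROM tracks ORDER BY id")
    for row_id, artist, title in rows:
        yield f"play {artist} {title}", make_code("music", row_id)


def resolve(conn: sqlite3.Connection, code: str) -> Optional[Tuple[str, str]]:
    """Map a control code returned by the server back to the track to play."""
    kind, _, row_id = code.partition(":")
    if kind != "music" or not row_id.isdigit():
        return None  # not a code this sketch knows how to handle
    return conn.execute(
        "SELECT artist, title FROM tracks WHERE id = ?", (int(row_id),)
    ).fetchone()
```

Building the code from the row's existing key would avoid adding a column; a dedicated control code field would be the other option mentioned above.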

Our group would like to open source as much of our project as possible.  Any changes that we make to Myth obviously would be.  The Palm Voice Remote software would be as well, and hopefully other projects that need wireless voice input from greater ranges would find it useful.  The server itself links against Nuance libraries for voice recognition capabilities.  The code could be open sourced, but it isn't of much use without the libraries.  These libraries used to be available for free ($) for development purposes from the Nuance Developer Network.  That has been discontinued, however.  Luckily, I obtained copies of all their software when the program was still active.  I am currently looking into options with regard to this situation.

Anyway, I'd really like to hear from the developers if there is any interest in this type of project.  Just from running this for personal use and testing, I can say that it is incredibly cool, even with just the normal commands.  Dynamic grammar functionality is potentially an area where open source software could say "we beat the commercial guys to it."  If you have any questions, feel free to email me or post them to the list.  I will respond as soon as possible.

Thanks for the great software you guys have created,
Jared Hanson
