Tech Talk: The Age of the Speak Easy

January 14, 2011
Jonathan Bertman, MD

MDNG Hospital Medicine, December 2010, Volume 4, Issue 6

Speech recognition technology continues to improve, with many useful applications in our medical practices and in our daily lives.

Ever since Ali Baba commanded his cave to “Open Sesame,” speech recognition (SR) has been a dream of mankind. Now, here we are in the 21st century and instead of caves, we’re talking to computers, automated information operators, and our cars. The problem is, much of the time we’re not just talking to these machines, we’re yelling at them, because the technology still doesn’t work as well as The Jetsons or Arthur C. Clarke promised us it would. But the improvements are coming quickly, and it won’t be long before speech recognition is an integral part of all our lives, including those of us in the medical profession.

In fact, health care providers are already among the foremost users of speech recognition technology. So, let’s take a short step back and examine how SR for medical transcription works. For starters, the technology can be implemented in either the front-end or back-end of the documentation process. In front-end SR, the provider dictates into a speech-recognition engine, and the recognized words are displayed almost immediately after being spoken. This is sometimes loosely referred to as “natural language processing.” The dictator (or “speaker” if you’d rather not conjure up images of the Führer railing into his microphone) is then responsible for editing and signing off on the document. It never goes through a medical transcriptionist/editor. For years, many of us have dictated our patient notes into a personal computer using software like Dragon NaturallySpeaking rather than paying a transcription service.

Back-end SR, or deferred SR, occurs when the provider dictates into a digital dictation system, and the voice file is routed, along with the recognized draft document, to a transcriptionist, who then edits the draft and finalizes the report. At the moment, deferred SR is being more widely used in the industry.

Because searches, queries, and form filling are often faster to perform by voice than keyboard, many EHR systems can and will be more effective when deployed in conjunction with a speech-recognition engine. But there are different methods for getting the information from the SR engine into the EHR. Discrete reportable transcription (DRT) is the process of converting narrative dictation into text documents with discrete data elements that can be easily imported into the appropriate placeholders inside an EHR. Some back-end SR technologies support data structuring for EHRs, but they have a harder time than DRT does moving structured information into EHRs.
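For readers who like to see the idea in concrete terms, here is a toy sketch of what “discrete data elements” could look like. It is not how any DRT product actually works; the section headings, field names, and the split_into_fields function are all made up for illustration, and a real system would cope with far messier dictation. It only shows narrative text being split into named fields that an EHR could, in principle, drop into matching placeholders.

```python
import re

# Hypothetical mapping from spoken section headings to EHR field names.
# These names are assumptions for illustration, not from any real product.
SECTION_FIELDS = {
    "chief complaint": "chief_complaint",
    "history of present illness": "hpi",
    "assessment": "assessment",
    "plan": "plan",
}

# Matches any known heading followed by a colon, ignoring case.
HEADING_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(h) for h in SECTION_FIELDS) + r")\s*:",
    re.IGNORECASE,
)


def split_into_fields(dictation):
    """Split narrative dictation into discrete fields keyed by section heading."""
    fields = {}
    matches = list(HEADING_PATTERN.finditer(dictation))
    for i, match in enumerate(matches):
        # A section's text runs from the end of its heading
        # to the start of the next heading (or the end of the note).
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(dictation)
        key = SECTION_FIELDS[match.group(1).lower()]
        fields[key] = dictation[start:end].strip()
    return fields


note = (
    "Chief complaint: cough for three days. "
    "Assessment: likely viral bronchitis. "
    "Plan: fluids, rest, recheck in one week."
)
record = split_into_fields(note)
```

In this sketch, record ends up with separate chief_complaint, assessment, and plan entries, each holding only the text dictated under that heading. That separation is what makes the data “discrete” rather than one long narrative paragraph.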

Beyond medicine, speech recognition technology is popping up all over the place. For example, my Motorola Droid smartphone lets me “dial by name” or look up information online by simply speaking my search query. This is awesome at the dinner table, when one of my sons asks me something I don’t know. Try entering “What year did women get to vote?” into a smartphone and you’ll recognize how much time and energy SR technology can save when it works. (It wasn’t until 1920.) Most of these systems employ what’s known as direct voice input (DVI) technology, in which the user issues instructions to the machines via voice commands. DVI can be either “user-dependent,” which requires a personal voice template to be created before working properly, or “user-independent,” which will work with the voice of any user.

In addition to making simple conveniences even simpler, speech recognition technology can be used to provide people with certain disabilities access to computers, phones, and other electronic appliances they wouldn’t have otherwise. Blind people can use the technology to dial phone numbers they are unable to look up, while the hard of hearing can be served by the use of speech-to-text conversion.

There is no doubt that the health care industry has as much to gain as any by the recent advances being made in speech recognition, and it’s high time we started realizing that. Once again, though, it depends on us being willing to embrace the new technology rather than fear it. Because don’t we want to spend less time filling out forms and more time with our patients? After all, that’s why most of us got into this business in the first place, and what most of us in the end still do best.

Open Sesame indeed.

Dr. Bertman is Physician Editor-in-Chief of MDNG: Primary Care/Cardiology Edition. He is a Clinical Assistant Professor of Family Medicine at Brown University and president of AmazingCharts.com, a leading developer of EHR software. He is also the founder and president of AfraidToAsk.com, a consumer website focusing on personal medical topics. He is in private practice in Hope Valley, RI.