With uses ranging from automated telephone processing to controlling household equipment, voice recognition is now at the forefront of modern technology solutions. Commonly known as “speech to text”, Speech Recognition (SR) technology converts the audible sounds produced when talking into written form.
Speech recognition is extremely popular in the medical, legal and media sectors. Companies such as Nuance, whose software range includes Dragon Professional, are making significant progress in turning this once unreliable technology into a real “must have” business tool.
Front-End speech recognition
Typically, the user speaks directly into a USB recording device such as an Olympus DirectRec DR-2200 or Philips SpeechMike Pro, and transcription takes place in real time in the user’s chosen software application. In front-end recognition, the dictator (the person speaking) can amend the transcribed output and is often responsible for final quality control.
Back-End speech recognition
Back-end recognition is the delayed processing of a recorded audio file, and is often used in conjunction with portable digital dictation units such as the Olympus DS-3500 and DS-7000. The audio file, frequently sent to a third party for transcription, is passed through the voice recognition software to produce a transcribed document.
Voice Profiles and Training
Basic systems can recognize the majority of spoken words “out of the box”. More advanced systems, such as Dragon from Nuance, require “training”, and this user-tailored training results in much higher transcription accuracy. Training often consists of the user reciting a set text, enabling the software to “learn” how the person speaks.
Most professional solutions can update the user’s “profile” over time using adaptive technology. At the top end, the more intelligent systems are able to learn directly from edited documents by pairing the words originally recognized with the revised version of the output text.
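The idea of pairing recognized words with their edited replacements can be illustrated with a short sketch. This is not how Dragon or any other product implements adaptive learning; it is a minimal illustration that assumes the recognized text and the edited text are both available as plain strings, and the function name `extract_corrections` is hypothetical. It uses Python’s standard `difflib` module to align the two word sequences and report the spans that were changed.

```python
from difflib import SequenceMatcher


def extract_corrections(recognized: str, edited: str) -> list[tuple[str, str]]:
    """Return (recognized, corrected) phrase pairs where the editor changed words.

    Hypothetical helper for illustration only; a real product would also
    use the audio itself, not just the two text versions.
    """
    rec_words = recognized.split()
    edit_words = edited.split()

    # Align the two word sequences; autojunk is disabled so that frequent
    # words are never silently ignored during alignment.
    matcher = SequenceMatcher(a=rec_words, b=edit_words, autojunk=False)

    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a span of recognized words that the editor
        # swapped for a different span of words.
        if tag == "replace":
            corrections.append(
                (" ".join(rec_words[i1:i2]), " ".join(edit_words[j1:j2]))
            )
    return corrections


if __name__ == "__main__":
    pairs = extract_corrections(
        "please file the affidavid today",
        "please file the affidavit today",
    )
    print(pairs)
```

Each pair could then, in principle, be fed back into a user profile so that the same misrecognition is less likely next time. Pure insertions and deletions are deliberately ignored here to keep the sketch short.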
Accuracy can often be affected by factors such as pitch, lack of enunciation, external interference, dialect and accent. It is therefore important to consider the following when deciding on a final solution.
- Quality of Microphone – does the microphone contain features such as noise cancelling?
- Profile Training – does the solution provide a way to train the software and, if so, does it also offer adaptive learning after the initial training?
- User co-operation – are the users prepared to commit the time both to train the software and to adapt to a more “speech recognition” friendly way of dictating?
Advanced features of some professional solutions, such as Dragon Professional, include the ability to control common applications such as Microsoft’s Office suite using voice commands such as “open” and “save”. Combined with the use of “canned text”, inserted with a simple spoken command, the possibilities for integration are extensive.
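As a rough illustration of how voice commands and canned text might sit alongside ordinary dictation, the sketch below routes each recognized utterance to one of three outcomes. It does not use or represent the Dragon API; the trigger phrases, the canned text and the function name `handle_utterance` are all assumptions made for the example.

```python
# Hypothetical trigger phrases mapped to blocks of pre-written ("canned") text.
CANNED_TEXT = {
    "insert signature": "Kind regards,\nThe Transcription Team",
    "insert disclaimer": "This document was produced using speech recognition.",
}

# Hypothetical application commands, based on the "open" and "save" examples.
APP_COMMANDS = {"open", "save"}


def handle_utterance(utterance: str) -> tuple[str, str]:
    """Classify a recognized utterance as canned text, a command or dictation."""
    phrase = utterance.strip().lower()

    if phrase in CANNED_TEXT:
        # A trigger phrase is replaced by its stored block of text.
        return ("insert", CANNED_TEXT[phrase])
    if phrase in APP_COMMANDS:
        # A command would be passed to the application rather than typed.
        return ("command", phrase)
    # Anything else is treated as normal dictation and typed as spoken.
    return ("dictation", utterance)


if __name__ == "__main__":
    for spoken in ["Save", "insert signature", "The meeting is at noon"]:
        print(handle_utterance(spoken))
```

Exact-match lookup keeps the example simple; a real system would need to distinguish a command from the same word spoken as part of a sentence, typically by using pauses or a dedicated command mode.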