Wednesday, 6 August 2014

Voice control combined with text-to-speech

I attended a demo of the well-known assistive technology 'BrowseAloud' today. I have known about this service for a long time, and there are several similar services available which all work in very similar ways. The idea is to offer assistive services on a website so that the device can read the site's text out loud, translate the site into other languages (and read the translation out loud too), convert sections of text into an MP3 file, and provide a few visual aids such as enlarged text and colourised text for high contrast.

The model works by selling the rights to the service to a website owner, who installs it via a JavaScript code snippet on every page of the site. The cost to organisations varies, but it can be between £600 and £1,000 per year. Organisations often treat this cost as a 'tick-box' exercise to ensure that accessibility is covered.

My concern is that none of this helps a person who has a disability very much. They cannot dictate which of the websites they browse will have assistive tech installed and, if so, which one. It is likely that most will not have BrowseAloud or a similar service, so for these sites they are going to have to cope without. On sites that do have some assistive technology installed, the user will have to use whatever is there. This means having to switch to whichever one is on offer. Each website offers its assistive technology in a different way - some have a link at the top of every page, others at the bottom of every page. Some will offer some kind of icon and others will require you to locate the 'accessibility' link. Once located, some will offer a link to load up a JavaScript toolbar, whilst older tools have to be downloaded as software and installed on the computer. The last option is OK if the user only ever uses a single device, but on several devices, or in a public space like a library or at work, it is likely that downloading and installing software will be banned.

So to sum up, a disabled user who uses the web to browse many sites is faced with a minefield of issues, even if assistive products are offered.

This got me thinking about mobile devices and their options for voice control and text-to-speech.

Siri for iOS devices and Google Now for Android devices both offer some extensive voice control. They also have some excellent tools to convert text into speech. The trouble is that there isn't any combination of the two technologies to offer a complete solution.

What is the point of using speech control to search for a web page if the page you find can't be read out loud to you? It is like asking a person a question but requiring them to write the answer down on a piece of paper for you to read rather than simply listening to the answer.  Surely what is required is to transform the most commonly used applications - READ TEXTS, READ MAIL, and READ WEBPAGE.  Some recent updates on both Siri and Google Now seem to have added the ability to deal with texts and email but why have they missed the other most commonly used app on any mobile device - the web browser?

What I'm suggesting is to develop a voice control which will operate the web browser AND instruct that the page should be read out. This would mean there is no need for assistive tools to be bolted onto individual websites, because every website would be treated the same way.

The best way to test such a service is to blindfold the user. Right from start-up, there should be a voice command which will search for content on Google AND announce the results. If the result is a web page link, there needs to be a voice command to OPEN that link, which will load the page in the web browser. Once loaded, a voice command is required to READ PAGE. The reading of the page needs to include hyperlinks and allow the user to follow those links with a voice command. In this way, a user could browse multiple websites using only spoken commands combined with text-to-speech spoken content. Headphones could be used to listen to the content. If translation tools like Google Translate were to be incorporated, the user could use voice commands to TRANSLATE PAGE and then READ PAGE in the selected language.
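To make the idea a little more concrete, here is a rough Python sketch of two pieces of that flow: turning a recognised phrase into a command, and turning a page's HTML into a spoken script with numbered links the user can follow. It only illustrates the concept under my own assumptions. The names `parse_command`, `read_page` and `ReadablePage`, and the "Link 1:" announcement format, are made up for this example, and the actual speech recognition and text-to-speech steps are left out entirely.

```python
from html.parser import HTMLParser


class ReadablePage(HTMLParser):
    """Collects the text to be spoken and numbers each hyperlink (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []   # chunks of text to speak, in page order
        self.links = []   # href of each link; spoken number = index + 1
        self._skip = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
                # Announce the link number so it can be followed by voice.
                self.parts.append(f"Link {len(self.links)}:")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip:
            self.parts.append(text)


def read_page(html):
    """Return (text to speak, list of link targets) for a page's HTML."""
    parser = ReadablePage()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.links


def parse_command(utterance):
    """Map an already-recognised phrase to an (action, argument) pair."""
    words = utterance.strip().lower().split()
    if not words:
        return ("unknown", "")
    if words[0] == "search" and len(words) > 1:
        return ("search", " ".join(words[1:]))
    if words[:2] == ["read", "page"]:
        return ("read_page", "")
    if words[:2] == ["translate", "page"]:
        return ("translate_page", " ".join(words[2:]))
    if words[0] in ("open", "follow") and words[-1].isdigit():
        return ("open_link", words[-1])
    return ("unknown", "")
```

With something like this in place, saying "open link 2" after READ PAGE would look up the second entry in the list returned by `read_page` and load it, and the same READ PAGE command would then work on the new page, whichever website it belongs to.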

This isn't only going to revolutionise the way blind and partially sighted people access and navigate the web. It could easily be used by everyone. Imagine a driver who wants to listen to a Wikipedia page about the company they are visiting. Or perhaps a baker in their kitchen who has their hands covered in flour but still needs to listen to the next instruction in the recipe they are following. Or a motor mechanic whose hands are covered in grease, with their head stuck inside an engine, who still needs to hear the next instruction from the webpage they found on their iPad propped out of reach on the bonnet.

Imagine any science fiction story or film you ever saw. Chances are that they all used voice control to request answers from the 'computer', which 'spoke the answers back'. This is no longer science fiction - there is both voice control and text-to-speech software which can mimic all that we've seen in the movies. I just feel that both Siri and Google Now are trying too hard to be clever when they've missed out on a golden opportunity to transform a very basic function of all computing technology - to search for web content and hear the results without typing or touching the device.