2008-01-13

Bill Gates says: mouse is out, touch screen and natural language interface are in

Microsoft Chairman Bill Gates recently said that touch screens will dominate PC development, while answering questions from BBC online readers. You can listen to him talk about future technology, the Xbox, Microsoft's dominance, Windows Vista, his view of the competition, open source and how he uses his own computer. Answering the BBC’s questions, he said, for example, that one day we will be able to talk not only to our computers but also to our phones, which are becoming increasingly software-centric.

Last week Bill Gates unofficially opened the International Consumer Electronics Show (CES) in Las Vegas, the world’s largest consumer electronics trade show, and also shared his view of the future of software. In his opinion, the “second digital decade” will focus more on connecting people and will be increasingly “user-centric”. While the first digital decade was defined by the keyboard and the mouse, the new decade will be defined by “natural user interfaces” such as touch screens and speech control, Gates predicts. How can that be useful? For example, we will dictate an email to our computer, and it will turn our spoken words into written text. Paradise for lazy guys, huh?

Bill Gates has been saying for years that one day soon we will use handwriting, voice and touch to control our computers. Three months ago he gave CNET News.com an interview about speech recognition. Here are some quotes:

Ina Fried, CNET News.com: With speech recognition, one of the ideas is that there are some applications where it can pay off, even if it is not getting 100 percent recognition. Is finding some of those areas one of the keys to speech recognition being mainstream?
Bill Gates: That's right. Remember, the stuff we're doing with unified communications, speech recognition is not actually a very key element of what goes on. There are some aspects of it. For example, when you're doing audio conferencing in our world, we can tell you who's speaking. And that's very frustrating today in traditional audio conferencing that you don't know who's come and gone, and somebody can speak up and you don't know who that is.

Or with RoundTable (Microsoft's 360-degree video conferencing camera), we use video and audio clues to tell who's speaking and bringing the focus on that. And you always have the full room view at the bottom, but you have that zoomed-in view as well. And so, you know, if it gets it slightly wrong, you can look at the full-room view and see exactly what's going on. And just like if the cameraman was focusing on something different you were interested in, well, the wide view takes care of that.

When you want to search something (in a meeting) if a word sounds like one of three things, for the search case, you can just index all three. And the fact that you might get some false positives, that is, when you do a search, you might get some part of the speech where a similar sounding word was being used, it's not that big a deal. You'll just look at it, skip past it. And so not being perfect is not a huge problem.
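Gates’s point about indexing every plausible word, rather than only the recognizer’s single best guess, can be illustrated with a tiny inverted index. This is a minimal sketch, not Microsoft’s implementation: the recognizer output below is invented, and the names `segments`, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical recognizer output: for each time offset (in seconds) in a
# meeting recording, the alternative words the recognizer considered.
# These values are invented for illustration only.
segments = [
    (12.0, ["wreck", "recognize", "reckon"]),
    (47.5, ["speech", "peach", "speak"]),
    (90.2, ["recognize", "recognise"]),
]

def build_index(segments):
    """Map every candidate word to the time offsets where it might occur."""
    index = defaultdict(list)
    for offset, candidates in segments:
        # Index all alternatives, not just the top guess.
        for word in set(candidates):
            index[word.lower()].append(offset)
    return index

def search(index, query):
    """Return matching offsets; some may be false positives the user skips past."""
    return sorted(index.get(query.lower(), []))

index = build_index(segments)
print(search(index, "recognize"))  # [12.0, 90.2]
print(search(index, "peach"))      # [47.5], a false positive if "speech" was said
```

The trade-off is the one Gates describes: recall goes up because the right word is indexed even when it was not the top guess, and the cost is an occasional irrelevant hit that a person can simply skip.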

Ina Fried: And I imagine that's going to be a huge change in video search, for example. Today when we have video searches, you are basically searching keywords of the Internet page that surrounds the video, the description, that sort of thing. When we start using voice recognition to search within the videos, we'll have a much more powerful experience, right?
Bill Gates: Yeah, that will help a lot. Microsoft Research has some amazing demos around that. In terms of broadcast videos, of course, there's the requirement that there be the text annotation. So if you have that, you actually have the speech-to-text that has been done for the deaf listener, anybody who wants the captioning-type capability. So there's a lot of video out there where if you ingest it in the right way, that's available. For the bottoms-up video, or just a meeting you have in the business, then you're relying on the speech recognition software to make it easy to navigate.

Ina Fried: What are some of the areas where you see voice going that people aren't necessarily thinking about today?
Bill Gates: To me, voice is in the broad realm of natural interface. And natural interface is (the notion of) screens everywhere - screen in your desk, screen in your tables, screen on your walls, no more white boards, touching, which is like Surface, where you can manipulate things. It's a pen so you can have ink wherever you want. You know, pull up an article, write a little note on it and get it sent off to a friend.

The speech recognition comes into it - all these things about natural interface are coming to the fore, and they are probably the thing that's most underestimated right now about the digital revolution. (...)

Ina Fried: You talked about different natural language interfaces. You know, with multitouch, it seems to have really captured people's imaginations, both with what you guys have shown with Surface, certainly with the iPhone. Voice seems to be a little slower in terms of speech recognition as a mainstream computer interface.
Bill Gates: Well, that's fair. Voice recognition is a harder thing. There are certainly tons of people, and I mean millions, who for some reason, the keyboard's not attractive to them. Either they have repetitive stress injury, or they're in a work environment where they're doing something else with their hands, where they've taken the time to learn the software and adapt to the software and gone through the training process there. And they love it. They can't believe other people don't use it.

For the rest of us, the keyboard has worked so well that we are even getting the keyboard into phones. I think voice search on the phone is one of those applications that would really drive it forward. (...)

Ina Fried: You guys built a pretty significant voice recognition engine into Vista. It hardly gets talked about. Are you surprised that some of the things you did in Vista aren't getting more attention?
Bill Gates: Well, when you sell a product to hundreds of millions of users, there are features that millions of users love that you can call an obscure feature because, percentage wise, it's not very many. (...) We're hard at work on the next version of Windows. We're going to take this speech stuff even further.

Ina Fried: What about in the developing world? I imagine natural language input, you know, particularly for people who've never used a computer, has some really interesting applications.
Bill Gates: I wouldn't go too far on that (...) but, yeah, it should work for different languages. It's particularly interesting for Japanese and Chinese where the keyboard is not as natural as it is for languages with modest-sized alphabets. And so we do see ink and voice catching on there.

There was a demo recently where there was a challenge about typists compared with voice recognition, and the voice recognition won out by quite a bit. And so there's a lot that can be done pioneering off of the demand that will come out of those markets.

Ina Fried: You've talked a fair amount about taking on just a few projects when you step away from full-time work. Is natural language input and voice one of those areas you think you'll be spending time on?
Bill Gates: Yeah. I'd say, broadly, the whole natural interface thing. Big screens, touch, ink, speech, that's something that I think, along with cloud computing, is the next big change in how we think about software and how it becomes more basic.

Although he plans to shift to part-time work at Microsoft, Gates has said he will keep a few key projects under his purview and suggested the natural language interface push is one he'll probably keep working on.
