Start-ups seek to move from speech recognition to true language understanding

Talking to a computer can feel liberating — as anyone who received an Amazon Alexa or Google Home device for Christmas can attest — but only until you ask the wrong question and the machine plays dumb.

Advances in speech recognition, which lie behind the “natural” feel of voice assistants like Alexa, number among the biggest breakthroughs in AI over the past five years. Like image recognition, which has followed a similar trajectory, this has given computers human-like powers of perception. It may also have opened the way to profound changes in working life, from automating call centres to replacing radiologists.

In some ways, “the effect is very exciting,” says Dave Ferrucci, an artificial intelligence expert who led the development of IBM’s Watson AI system.

But while the ability to talk to machines has brought “cool interfaces”, he says, there is a serious problem: “There’s no significant language understanding there.”

Systems that master speech can quickly run up against fundamental problems: they often lose the thread of a conversation, misunderstand the complex meaning coded in language, or fail to respond with meaningful information. Building computers that truly understand language — a task some philosophers equate with what it means to be human — will take “many decades”, says Mr Ferrucci.

That has not stopped many experts from predicting that language is about to become the next big frontier in AI.

Andrew Ng, one of the pioneers of the deep learning systems that lie at the heart of today’s most advanced AI, predicts that language understanding is on the cusp of the kind of improvements that have revolutionised speech and image recognition so far.

His prognostication does not stem from any particular technical breakthroughs, he says, but rather from the “social” force that drives researchers: a field that was an academic backwater has become a focus of serious ambition and competition.

In the meantime, Primer, a start-up in San Francisco that recently raised a first round of venture capital to take on language understanding, is one of a group of companies focused on nearer-term goals.

They are trying to tap into existing capabilities — the availability of large amounts of data, lower cost computing and refinements in machine learning algorithms — to process language for practical applications.

“Any company that needs significant breakthroughs is really just a science experiment,” says Sean Gourley, Primer’s chief executive.

Even though big advances in language understanding have so far proved elusive, these start-ups are betting on systems that can search through large bodies of text and extract useful information.

Their capabilities stand to far exceed the power of human analysts, says Daniel Nadler, chief executive of Kensho, a New York start-up that was founded four years ago with the intention of using computers to replace equity and bond analysts on Wall Street.

“The big change is understanding context, and linking the input to data in more and more complex and sophisticated ways,” he says.

That means, for instance, knowing more about the person who is asking a question in order to zero in on their likely meaning. It also involves combining data from multiple sources to provide useful answers. And, like many of today’s AI systems, Kensho relies on reinforcement learning — where humans create a feedback loop by scoring the results the system produces — to steadily improve.
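The scoring loop described here can be sketched in miniature. The toy below is purely illustrative — not Kensho's implementation, and the answer-style names are invented: human ratings accumulate into running scores that steer which kind of answer the system serves next.

```python
import random

class FeedbackLoop:
    """Toy feedback loop: human scores steer which answer styles win out."""

    def __init__(self, variants):
        self.scores = {v: 0.0 for v in variants}  # running average rating
        self.counts = {v: 0 for v in variants}

    def pick(self, explore=0.1):
        # Mostly serve the best-rated variant, occasionally explore others.
        if random.random() < explore:
            return random.choice(list(self.scores))
        return max(self.scores, key=self.scores.get)

    def feedback(self, variant, score):
        # Fold a human rating (0 to 1) into that variant's running average.
        self.counts[variant] += 1
        self.scores[variant] += (score - self.scores[variant]) / self.counts[variant]

# Simulated raters consistently prefer answers that include context.
loop = FeedbackLoop(["bare_answer", "answer_with_context"])
for _ in range(100):
    for variant in loop.scores:
        loop.feedback(variant, 1.0 if variant == "answer_with_context" else 0.2)
```

Over time the loop converges on whatever the raters reward — which is also why the quality of the human feedback matters as much as the algorithm.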

The large amount of data and processing power now available to start-ups makes it possible for them to bring new techniques to bear in cracking the problem of language.

Some of them are using neural networks: systems modelled on a theory of how the connections between neurons in the brain process information. Given sufficient training with enough data, such systems can be surprisingly effective at finding patterns in complex data — even if they cannot explain the reasons for their results.
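The core idea — connection weights adjusted until the network reproduces a pattern in its training data — fits in a few lines. The sketch below trains a single artificial neuron on a trivial pattern (logical OR); real systems stack millions of such units, but the weight-update principle is the same.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data: a simple pattern (logical OR) the neuron must discover.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [0.0, 0.0]   # connection weights
b = 0.0          # bias
lr = 0.5         # learning rate

for _ in range(5000):
    for (x1, x2), target in data:
        out = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = out - target
        # Gradient descent: nudge each weight against its share of the error.
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def predict(x1, x2):
    return round(sigmoid(w[0] * x1 + w[1] * x2 + b))
```

Note that the trained weights encode the pattern without any human-readable explanation — the opacity the article alludes to.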

Primer, for instance, uses the technique to analyse text documents and produce short written summaries. Like all machine learning systems, it relies on a large data set to train its algorithm — in this case, 300,000 pairs of articles and human-generated summaries produced by the Daily Mail and CNN.
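Primer's neural summariser is far beyond a few lines of code, but the shape of the task — long text in, short summary out — can be shown with a deliberately naive extractive stand-in (a sketch, not Primer's method): score each sentence by how many of the document's frequent words it contains, and keep the top few.

```python
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Naive extractive summary: keep sentences rich in frequent words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    stop = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}
    freq = Counter(w for w in words if w not in stop)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Preserve the original order of the chosen sentences.
    return " ".join(s for s in sentences if s in ranked)

filing = ("Language understanding remains hard. Computers can transcribe "
          "speech. Transcribing speech is not understanding language. "
          "The weather was pleasant.")
```

A neural system trained on 300,000 article-summary pairs instead learns to generate new sentences, but the input and output look much the same.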

Techniques like these will help computers tackle tasks that cut across many different types of work. More people’s working lives involve dealing with the written word than visual images, points out Primer’s Mr Gourley.

His company is targeting fields like finance and intelligence that rely on large numbers of human analysts to glean information expressed in different languages. Customers include Walmart, whose analysts use Primer’s work to distil information about many different commodities and products.

Another customer, investment firm GIC, says it has been experimenting with using the system to analyse “key statements issued by central banks”, as well as hunting for changes in the wording companies use in their periodic corporate filings and earnings calls to find developments that might get missed in the deluge of information.
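At its simplest, spotting changed wording between two versions of a document is a diff problem. The sketch below uses Python’s standard difflib — a stand-in for illustration, not the tooling GIC or Primer actually use, and the example sentences are invented — to surface phrases dropped from or added to a passage.

```python
import difflib

def wording_changes(old_text, new_text):
    """Return phrases dropped from or added to a text between versions."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            changes.append(("dropped", " ".join(old_words[i1:i2])))
        if op in ("replace", "insert"):
            changes.append(("added", " ".join(new_words[j1:j2])))
    return changes

# Hypothetical wording shift between two quarterly statements.
old = "We expect demand to remain strong across all regions."
new = "We expect demand to soften in some regions."
```

A production system would add language-aware normalisation and filtering, but the principle — flag what changed, ignore what didn’t — is the same.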

Although such machine learning has practical benefits, Mr Ferrucci argues that the technique, which relies on intensive statistical analysis of words, can never fully crack the problem of language. That is because meaning does not reside in the words themselves, he says, but in the heads of the people who use them; when two people have a shared view of the world, “it takes very few words to create an illusion of understanding”.

Experts suggest that means it will take a combination of AI techniques, rather than a single “silver bullet”, to master all the different cognitive traits that go into understanding language.

“What’s needed has less to do with research breakthroughs and more to do with integrating different systems into a single model,” says Kris Hammond, a professor of computer science at Northwestern University and co-founder of Narrative Science, a company that generates automatic written reports.

Mr Ferrucci, who left IBM four years ago for hedge fund Bridgewater, has also since founded Elemental Cognition, a start-up trying to build such a system.

He envisages a machine that interacts with people far more deeply than today’s AI, constantly asking questions and refining its understanding of the world, much as a child does. He predicts that it will take five years to come up with an architectural framework for such a learning system — about the time Mr Ng expects it will take before language understanding makes significant headway.

If they are right, then the next stage of the AI revolution will be far more profound than the advances made in the last half decade. But even before that, more rudimentary but effective language processing systems could become a fact of life.