The future has the voice of a scatterbrained secretary. Google can imitate a phone call

"Um, I'd like to book something for this May, you know..." The voice on the phone sounds hesitant. The hairdresser on the other end has no idea that she is talking not to a human but to a robotic voice that uses machine learning. Sundar Pichai, the head of Google, showed what he thinks the future of digital assistants looks like.

Artificial intelligence and machine learning come first: that is how the talk by Sundar Pichai, the head of Google, at the Google I/O developer conference in Mountain View, California, could be summed up. Of course, Google already uses elements of both in a number of services, for example in text and photo recognition.


Google Assistant lives, for example, in the Google Home smart speaker

But when it comes to artificial intelligence, the focus will be on the digital assistant. It is the most visible demonstration of what artificial intelligence can do. A scriptwriter cannot anticipate every question and command, so the machine must, at least within certain limits, have a basic understanding of its own.

A surprisingly convincing imitation of a telephone operator

And now let's get to our favorite moment of the whole keynote (a recording is on YouTube): the moment when Pichai explains how the assistant helps people by taking over boring or monotonous chores. Booking an appointment at the hairdresser, for example.

"Even in the United States, 60 percent of small businesses do not have an online reservation system," Pichai said. "We think artificial intelligence can help with this problem."

He then played the surprised audience a real phone call, recorded during an experiment with the assistant. The request given to Google Assistant was: book me a haircut on Tuesday between ten and twelve.

Example of a call between the Google assistant and a hair salon:

Let us add that the call earned surprised applause from the audience. The computer's voice does not sound machine-like in the slightest. It contains pauses, hesitant gaps, breaths, colloquial phrases and a vague, drawn-out "m-hm". We are convinced that the person on the other end of the line would have great difficulty telling that they were talking not to another human but to a computer, especially given the prompt and meaningful answers.

Of course, a phone call is not always this easy. In the real world of telephone conversations (even ordinary ones between people) there are mistakes, mishearings, misunderstandings and other obstacles. How does the new service, called Google Duplex, deal with them?

Reservations only. The assistant learns from mistakes

Creating a conversational program that is indistinguishable from a human is extremely complex (see the Turing test). For a start, Google Duplex has therefore set itself a very specific goal: its conversations are limited to predefined tasks that people handle over the phone.

Specifically, it books appointments, because all the conceivable scenarios fall within a known range: they have a free slot, they do not have a free slot, they have a free slot but at a different time, they are closed that day, no reservation is needed that day, and so on.

Restaurant reservation example (source: Google Duplex):

This example demonstrates Duplex's ability to deal with a misunderstanding: it responds by repeating the request in other words and confirming that the person understood what kind of request it was. Duplex was able to react to the change of topic and in the end correctly understood that it did not have to make a reservation at all, because the restaurant was not full at that time and did not require a reservation for three people.

A natural and pleasant conversation is seldom one hundred percent efficient. That is why the developers of Duplex built in various courtesies, interjections and conversational bad habits that run through the whole exchange. Asked "For how many people?", the digital assistant (male or female, you can try out the different voices yourself) answers not with a bare "4" but with "For four people, please."

Likewise, Duplex is programmed to repeat what it has understood, so that the other side can be sure. For example, in English "OK for four" can mean either a time (four o'clock) or a table for four people. Which reading is wrong depends on the context of the conversation.
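The ambiguity of "OK for four" can be illustrated with a minimal sketch. It assumes the caller simply remembers which question it asked last; the function name, the question labels and the word table are hypothetical and have nothing to do with Google's actual implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical table of number words, just enough for the illustration.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
NUMBER_PATTERN = re.compile(r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\b")


def interpret_number(reply: str, last_question: str) -> Optional[Tuple[str, int]]:
    """Read the first number in the reply as a time or a party size,
    depending on the question that was asked just before (assumed labels)."""
    match = NUMBER_PATTERN.search(reply.lower())
    if match is None:
        return None
    token = match.group(1)
    value = int(token) if token.isdigit() else NUMBER_WORDS[token]
    if last_question in ("time", "party_size"):
        return (last_question, value)
    return None  # the context does not resolve the ambiguity


print(interpret_number("OK for four", "time"))
print(interpret_number("OK for four", "party_size"))
```

The same reply yields a time in one context and a party size in the other, which is why repeating the understood value back to the other side is a cheap safeguard.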

Duplex has a bigger problem with talkative people who use long, convoluted sentences instead of short ones. Asked "When are you open?", for example, Duplex got this answer in one test: "Well, so, from Tuesday to Thursday we are open from eleven to two, and then we reopen from four to nine, and then on Friday, Saturday and Sunday, or rather on Friday and Saturday, we are open from eleven to nine, and on Sundays from one to nine." In such a case the artificial intelligence must work out the rest itself, without asking further questions, by weighing the templates of the different scenarios that apply to the situation.


Google Duplex response generation scheme: speech recognition (ASR) converts the voice to text. Duplex analyzes the text together with the context of the conversation, the calendar, the user's preferences and so on, and then converts the resulting answer from text to speech (TTS).

Google Duplex chooses among these scenarios using a recurrent neural network (RNN) built on the TensorFlow Extended platform.
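The flow described in the scheme (ASR, analysis with context, TTS) can be sketched roughly as follows. This is only an illustration under strong assumptions: the speech recognizer and synthesizer are replaced by plain-text stand-ins, and simple keyword rules take the place of the recurrent neural network. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallContext:
    """Hypothetical conversation state: what the user wants, what was said."""
    party_size: int
    requested_time: str
    history: List[str] = field(default_factory=list)


def recognize_speech(audio: str) -> str:
    # Stand-in for ASR: the "audio" is already a transcript here.
    return audio.strip().lower()


def choose_response(text: str, ctx: CallContext) -> str:
    # Stand-in for the RNN: keyword rules pick one of a few predefined scenarios.
    ctx.history.append(text)
    if "how many" in text:
        return f"For {ctx.party_size} people, please."
    if "what time" in text:
        return f"At {ctx.requested_time}, please."
    if "closed" in text:
        return "I see, thank you. Goodbye."
    return "Sorry, could you repeat that?"


def synthesize_speech(text: str) -> str:
    # Stand-in for TTS: tags the text instead of producing audio.
    return f"<speech>{text}</speech>"


def handle_turn(audio: str, ctx: CallContext) -> str:
    """One conversational turn: ASR, then scenario selection, then TTS."""
    return synthesize_speech(choose_response(recognize_speech(audio), ctx))


ctx = CallContext(party_size=4, requested_time="7 pm")
print(handle_turn("For how many people?", ctx))
```

The fallback answer ("Sorry, could you repeat that?") stands for the case where no predefined scenario matches, which is where a real system would need far more than keyword rules.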

Uh, ummm, sure thing!

Google uses a combination of Tacotron and WaveNet synthesis to generate the voice. The programmers have devised a way to insert words such as "uh", "hm" and "aha" into the resulting speech. Like people, the system uses them to play for time while it is still evaluating the input from the other side. The call therefore sounds natural and fluent: the longest delay is a tenth of a second, which is a huge step forward compared to a conversation with Alexa, Siri or Google Assistant.
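The idea of playing for time with a filler word can be sketched as follows. This is a hypothetical illustration, not Google's method: the threshold value and the function name are assumptions, and the filler words are the ones quoted above.

```python
import random
from typing import Optional

FILLERS = ["uh", "hm", "aha"]    # filler words quoted in the article
ASSUMED_THRESHOLD_SECONDS = 0.1  # assumed cut-off, chosen only for illustration


def add_filler(response: str, processing_seconds: float,
               rng: Optional[random.Random] = None) -> str:
    """Prefix the response with a filler if computing it took too long."""
    if processing_seconds <= ASSUMED_THRESHOLD_SECONDS:
        return response  # fast enough, answer directly
    chooser = rng or random
    return f"{chooser.choice(FILLERS)}... {response}"


print(add_filler("For four people, please.", 0.05))
print(add_filler("For four people, please.", 0.5, random.Random(0)))
```

In a real system the filler would have to be spoken while the answer is still being computed; the sketch only shows the decision of when to insert one.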

Example of booking a table in a restaurant (note the hesitant filler words):

In addition to booking appointments, Google wants to use this conversational service to gather additional information. The opening hours of many businesses on Google Maps, for example, have a problem: they usually do not reflect public holidays. People cannot know whether they can rely on the data, and they do not want to call the store. Google is deploying Duplex to call businesses automatically before holidays: "Monday is a holiday, will you be open?" Based on the answer, Google fills in the data in Maps, which according to Google saves time for both users and businesses.

The future, or a transitional phase?

It will take about five months before the service is usable. The fact that we saw no live demonstration, only recordings of selected samples, suggests that the developers cannot yet be 100% sure of their work. Google will continue testing the service over the summer. But even now the results, admittedly hand-picked, are unexpectedly realistic.

It will be interesting to see how this service changes communication. After all, hairdressers and restaurants can use a similar digital assistant too. Will the machines then exchange information about orders and reservations in human speech? Or is that just a temporary, inefficient stopgap, and in the future one robotic assistant will communicate with another using clearly and precisely defined requests?

Will the question "Excuse me, am I speaking with a human or a computer?" perhaps become a common part of telephone conversations in the future?

Update: We have added an explanation and a link.

Poll: Did you recognize that the voice in the Google Duplex phone calls was computer-generated?

The poll is closed (status as of April 16, 2018).

I would not have known that it was synthesis
The voice was surprisingly realistic, but in places it is clear that it is TTS
Um, I don't think this is synthesis, it is surely a human
Yes, I knew at once that it was synthesis
After a while it was clear to me that it was a computer's voice