Making machines understand, or at least convincingly imitate, human language was one of the original dreams of artificial intelligence, which is why the Turing test once served as the standard for judging whether a machine was intelligent.
Dialogue and translation belong to natural language processing (NLP), one of the many branches of AI. NLP aims to solve the problem of communication between humans and machines; it is one of the founding problems of AI, and it still faces many difficulties.
Take dialogue systems as an example: every major company on the market has launched its own intelligent voice assistant, but few of them have fully shaken off the reputation of being "artificially stupid."
Even so, everyone on this track keeps running hard. Even Siri, long confined to the phone, has been pushed into its own smart speaker.
"Although the current situation is not too optimistic, but keep running, will always see results." Stick with it for another 5-10 years and natural language processing will see significant growth.
The first layer is the basic technology: word segmentation, part-of-speech tagging, and syntactic and semantic analysis.
The second layer is the core technology: the representation and modeling of words, phrases, sentences, and documents. This covers machine translation, question answering, information retrieval, information extraction, chat and dialogue, knowledge engineering, language generation, and recommender systems.
The third layer is "NLP+": modeled on the idea of "AI+" or "Internet+", it means pushing natural language processing technology deep into applications and vertical fields. Well-known examples include search engines, intelligent customer service, business intelligence, and voice assistants, with further applications in verticals such as law, medicine, and education.
At the "NLP+" layer, there are plenty of voice assistants on the market, large and small. Microsoft alone has produced two: Cortana (Xiaona) and Xiaoice. Although both are voice assistants, the two differ in focus.
In fact, whether it is Xiaoice's chit-chat or Cortana's focus on task execution, the dialogue engine behind them consists of essentially three layers of technology.
The first layer is general chit-chat: the system needs conversational skills, general chat data, and topic-specific chat data, and it also needs a user profile so that it can cater to the user's preferences.
The second layer is information service and question answering: it needs search and QA capabilities, an FAQ table that is collected, organized, and searchable, and the ability to pull answers out of knowledge graphs, documents, and charts. These components are collectively known as an Info Bot.
The third layer is task-oriented dialogue, for example ordering coffee, ordering flowers, or buying a train ticket. The tasks are fixed, the states are fixed, and the state transitions are clear, so each task can be handled by its own bot. A scheduling system recognizes the user's intent and dispatches the corresponding bot to execute the task. The underlying technology includes understanding the user's intent, dialogue management, domain knowledge, and dialogue mapping.
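The scheduling idea in the third layer can be sketched in a few lines: classify the utterance into an intent, then dispatch to a per-task bot. The keyword rules and bot names below are illustrative stand-ins, not any vendor's actual engine; real systems use trained intent classifiers rather than keyword matching.

```python
def detect_intent(utterance):
    """Map an utterance to a task intent with naive keyword rules."""
    rules = {
        "order_coffee": ["coffee", "latte", "espresso"],
        "order_flowers": ["flower", "bouquet", "roses"],
        "buy_ticket": ["train", "ticket"],
    }
    text = utterance.lower()
    for intent, keywords in rules.items():
        if any(k in text for k in keywords):
            return intent
    return "chitchat"  # nothing matched: fall through to the chat layer

# One bot per fixed task; the scheduler dispatches by recognized intent.
BOTS = {
    "order_coffee": lambda u: "Coffee bot: what size would you like?",
    "order_flowers": lambda u: "Flower bot: which bouquet should I order?",
    "buy_ticket": lambda u: "Ticket bot: where are you travelling to?",
    "chitchat": lambda u: "Chat bot: tell me more!",
}

def dispatch(utterance):
    """Route the utterance to the bot responsible for its intent."""
    return BOTS[detect_intent(utterance)](utterance)
```

Because each task has fixed states and transitions, each bot behind the dispatcher can be a simple state machine, which is what makes this layer tractable today.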
Beyond building Cortana and Xiaoice, Microsoft has released the underlying technology so that developers can build their own bots. Even if a developer's system cannot understand natural language on its own, this can be achieved through a tool called the Bot Framework.
With the Bot Framework, any developer can build a bot of their own with just a few lines of code. For example, someone who wants a pizza-delivery bot can fill the framework with the relevant knowledge and data and get a simple bot running. Even small business owners with no development skills can, through a few simple steps, create a small bot that attracts customers.
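The "fill in the knowledge and data" idea boils down to slot filling: the bot asks for each missing piece of information until the order is complete. The sketch below captures that pattern in plain Python; the slot names and prompts are invented for illustration and are not Bot Framework APIs.

```python
# The slots a hypothetical pizza-delivery bot needs before it can confirm.
SLOTS = ["size", "topping", "address"]
PROMPTS = {
    "size": "What size pizza would you like?",
    "topping": "Which topping?",
    "address": "Where should we deliver it?",
}

def next_step(order):
    """Return the prompt for the first unfilled slot,
    or a confirmation message once every slot is filled."""
    for slot in SLOTS:
        if slot not in order:
            return PROMPTS[slot]
    return "Confirmed: a {size} {topping} pizza to {address}.".format(**order)
```

A small business owner effectively just supplies the `SLOTS` and `PROMPTS` tables; the framework supplies the language understanding that maps user replies into the `order` dictionary.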
Many of the key technologies behind these assistants are exposed through this open platform. Microsoft offers a service called LUIS (Language Understanding Intelligent Service), which provides intent understanding, entity recognition, dialogue management, and so on.
For example, the phrase "Read me the headlines" is recognized as the intent read aloud, with today's headlines as the content. Another example: "Pause for 5 minutes" is recognized as the intent pause, and for how long? There is a parameter: 5 minutes. With LUIS, the intent and the key information can be extracted and handed to the bot to act on.
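A toy recognizer can mimic the shape of this kind of result, an intent plus a dictionary of entities, using regular expressions. The intent names, patterns, and output format below are illustrative assumptions, not the actual LUIS schema; LUIS itself learns these from example utterances rather than hand-written rules.

```python
import re

# Hand-written patterns standing in for a trained language-understanding
# model. Named groups become the extracted "entities".
PATTERNS = [
    (re.compile(r"read me the (?P<content>.+)", re.I), "ReadAloud"),
    (re.compile(r"pause for (?P<minutes>\d+) minutes?", re.I), "Pause"),
]

def recognize(utterance):
    """Return the first matching intent and its entities, or a None intent."""
    for pattern, intent in PATTERNS:
        m = pattern.match(utterance.strip())
        if m:
            return {"intent": intent, "entities": m.groupdict()}
    return {"intent": "None", "entities": {}}
```

The point of the structured output is that the downstream bot never touches raw text: it only sees "Pause" and the parameter "5".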
These exchanges, which cost a human no thought at all, are on another level of difficulty for machines.
Dr. Ming Zhou believes that there are four levels of AI, from bottom to top: Computational Intelligence, Perceptual Intelligence, Cognitive Intelligence, and Creative Intelligence.
Computational intelligence has already reached a very high level; just consider what the world's top Go players have said about AlphaGo.
Next is perceptual intelligence, mainly hearing, vision, and touch, that is, what we usually call speech technology and image technology. Speech technology is the more widely used of the two, for example letting Siri understand what you say; image recognition is used mainly for face recognition, and companies that follow technology trends often switch their access control systems to it.
Cognitive intelligence is the focus of today's discussion, covering language, knowledge, and reasoning. Why does language matter? Siri cannot merely recognize what you say; it needs to respond based on what you say, and that is when it needs to understand you.
Creative intelligence is the highest form: AI that has imagination.
In computation and in speech and image recognition, machines can already achieve high accuracy; the main gap now lies in cognitive intelligence. In the past, cognitive intelligence mostly meant natural language processing that understood sentences and documents at a shallow level, supporting search engines and dialogue systems with basic functions such as simple conversation and translation.
For the future development of language intelligence, Dr. Ming Zhou sees several directions:
First, driven by the three forces of big data, deep learning, and cloud computing, spoken-language machine translation will become fully widespread.
Second, natural language conversation, including chat, Q&A, and dialogue, will reach a practical level.
Third, the combination of intelligent customer service and human customer service will greatly improve service efficiency.
Fourth, machines will automatically compose couplets, poems, press releases, songs, and so on.
Fifth, in conversation scenarios such as voice assistants, the Internet of Things, smart hardware, and smart homes, almost anything involving human-computer interaction can apply these techniques.
Finally, natural language technology will be widely used in vertical scenarios such as legal counseling, medical diagnosis and counseling, and investment and financing.
Of course, natural language processing still faces many difficulties. One of the most critical is how to fully exploit unlabeled data through unsupervised learning. Today's methods rely on labeled data and have no way to take advantage of the unlabeled kind, yet in many scenarios labeled data is scarce and manual annotation is extremely costly.
So how can this unlabeled data be used? One approach is to augment the overall learning process with an unsupervised, or semi-supervised, learning process.
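One simple semi-supervised scheme is self-training: fit a model on the labeled set, pseudo-label the unlabeled points the model is most confident about, absorb them into the training set, and repeat. The sketch below uses a nearest-centroid classifier on 2-D points, with distance to a centroid standing in for confidence; the classifier, the threshold, and the data are all illustrative assumptions.

```python
def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def centroid_fit(points, labels):
    """Compute one centroid per class from labeled 2-D points."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}

def predict(centroids, p):
    """Return (label, distance) of the nearest centroid to point p."""
    best = min(centroids, key=lambda c: dist(p, centroids[c]))
    return best, dist(p, centroids[best])

def self_train(labeled, labels, unlabeled, threshold=1.0, rounds=5):
    """Iteratively pseudo-label confident unlabeled points and refit."""
    labeled, labels, unlabeled = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        centroids = centroid_fit(labeled, labels)
        confident = []
        for p in unlabeled:
            c, d = predict(centroids, p)
            if d < threshold:  # "confidence" = closeness to a centroid
                confident.append((p, c))
        if not confident:
            break  # nothing left that the model is sure about
        for p, c in confident:  # absorb the pseudo-labeled points
            labeled.append(p)
            labels.append(c)
            unlabeled.remove(p)
    return centroid_fit(labeled, labels)
```

The catch, and the reason this remains an open problem in NLP, is that confident pseudo-labels can still be wrong, and absorbed errors compound across rounds.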
Give NLP some more time, and the voice assistant might be able to convince you that it's actually AI.