Basic Concepts About Big Data and Machine Intelligence
Big data and artificial intelligence have been hot topics of hype and discussion, but what exactly is big data? When is a machine considered to have intelligence? Does a large amount of data automatically count as big data? To answer these questions, we first need a clear grasp of the basic concepts behind machine intelligence and how it is achieved.

So-called machine intelligence usually means that a machine (most often a computer) can do things that previously only people could do. How do we determine that a machine has intelligence? In 1950, Alan Turing proposed a test, the Turing Test: hide a machine and a person and let both answer questions; if the questioner cannot tell whether an answer comes from the machine or from the person, the machine is considered to have intelligence.

Along the lines of the Turing Test, computer scientists believe a computer can be considered intelligent if it can do the following:

1. Speech recognition: as if the machine can understand spoken language the way a person does.

2. Machine translation: as if it can read a text in one language and render it in another.

3. Automatic summarization or writing of text: as if it knows how to get to the point and assemble meaningful paragraphs and articles, something only people were thought able to do.

4. Beating human champions at chess: in fact it is not surprising that computers outperform humans at games with closed rules such as chess, because a computer can quickly calculate and evaluate the best moves and is not affected by emotions or other circumstances. So personally I do not think this alone shows that a computer is intelligent.

5. Automatically answering questions: as if it can understand a question and give an answer based on that understanding.

For a long time, scientists trying to give machines intelligence put most of their effort into making machines think the way people do. This is the so-called machine intelligence 1.0, the "birds flying" school (the traditional approach to machine intelligence): make a machine intelligent by making it think like a human. The results, however, were not promising, and after some twenty years of development this approach ran into a serious bottleneck.

It was not until around 1970 that Frederick Jelinek attacked the problem with ideas from communication theory: build a mathematical model, and keep training that model through machine learning. This pioneered the data-driven approach to solving the problem of intelligence. Jelinek's statistical speech recognition system raised the recognition rate from roughly 70% with traditional methods to roughly 90%, moving speech recognition from laboratory research toward practical application.

So how do the traditional method and Jelinek's method each achieve speech recognition?

The traditional approach organizes syntax and semantics into rules; when a sentence is input, the computer recognizes the speech by matching it against those rules. This is like learning English: to understand a sentence you need to know the pronunciation and meaning of the words and the grammar that ties them together.
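To make the contrast with the data-driven method concrete, here is a minimal, hypothetical sketch of the rule-based idea: hand-written word categories and grammar rules are matched against an input sentence. The lexicon, rules, and sentences are invented for illustration; real systems of that era were far more elaborate.

```python
# A minimal, hypothetical sketch of the rule-based idea: hand-written
# word categories and grammar rules are matched against an input sentence.
# Everything here is invented for illustration.

RULES = [
    # (pattern of word categories, interpreted meaning)
    (["GREETING"], "The speaker is greeting someone."),
    (["PRONOUN", "VERB", "NOUN"], "Subject-verb-object statement."),
]

LEXICON = {
    "hello": "GREETING",
    "i": "PRONOUN",
    "you": "PRONOUN",
    "like": "VERB",
    "eat": "VERB",
    "apples": "NOUN",
    "music": "NOUN",
}

def interpret(sentence):
    """Tag each word with a category, then look for a rule that matches."""
    categories = [LEXICON.get(w, "UNKNOWN") for w in sentence.lower().split()]
    for pattern, meaning in RULES:
        if categories == pattern:
            return meaning
    return "No rule matches; the sentence cannot be interpreted."

print(interpret("I like apples"))   # Subject-verb-object statement.
print(interpret("hello"))           # The speaker is greeting someone.
print(interpret("apples like I"))   # No rule matches ...
```

The brittleness is visible immediately: any sentence not covered by a hand-written rule simply cannot be interpreted.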

Jelinek's approach is different: use a Markov model to describe the source and the channel, leave many parameters in the model, use data to train the parameters to their optimal values, and finally obtain the best result. (What exactly are the parameters? How are they trained? How is the output produced after training? These involve a great deal of detail and are not covered here.)
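The statistical idea can be sketched in a few lines. The toy example below, with made-up probabilities, illustrates the noisy-channel formulation behind this kind of system: choose the word sequence W that maximizes P(acoustics | W) · P(W). It shows only the flavor of the approach; real systems train hidden Markov models with huge numbers of parameters on large corpora.

```python
# Toy illustration of the statistical (noisy-channel) idea: choose the word
# sequence W that maximizes P(acoustics | W) * P(W).  All probabilities below
# are made up for the example; real systems learn them from large corpora.

# "Language model": how likely each candidate sentence is on its own,
# estimated in practice from text data.
language_model = {
    "recognize speech": 0.6,
    "wreck a nice beach": 0.4,
}

# "Acoustic model": how likely the observed audio is, given each sentence.
# In practice this comes from training on transcribed speech.
acoustic_model = {
    "recognize speech": 0.3,
    "wreck a nice beach": 0.35,
}

def decode(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda w: acoustic_model[w] * language_model[w])

best = decode(language_model.keys())
print(best)  # "recognize speech": 0.6*0.3 = 0.18 beats 0.4*0.35 = 0.14
```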

As you can see, the data-driven approach completely abandons the traditional idea of imitating how a person thinks and relies entirely on a model and on training that model (the process of training the model is what machine learning is).

As the above shows, the data-driven approach to intelligence relies on machine learning, which in turn depends on having data available to learn from.
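The dependence on data volume can be made concrete with a toy sketch: estimating even a single model parameter gets more reliable as the number of training samples grows. The numbers below are invented; real models have vastly more parameters and need correspondingly more data.

```python
# A small sketch of why data-driven learning depends on the amount of data:
# estimating a simple parameter (the probability that a coin lands heads)
# gets closer to the truth as the sample grows.  Values are invented.
import random

random.seed(0)
true_p = 0.7                      # the value the model should learn

for n in (10, 100, 10_000):
    samples = [random.random() < true_p for _ in range(n)]
    estimate = sum(samples) / n   # maximum-likelihood estimate from the data
    print(f"{n:>6} samples -> estimate {estimate:.3f} (true value {true_p})")
```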

Although Jelinek pioneered a new way of achieving intelligence, in many fields machine intelligence did not improve much, because the accumulated data was not sufficient to support training. Machine translation, for example, did not see continued gains in accuracy until the rise of the Internet in the 1990s, when the Internet accumulated a large amount of translation data for training, making it possible to keep refining the models through machine learning.

Big data drives the development of machine intelligence because of its multi-dimensional and complete nature. Multi-dimensional, complete data lets the computer learn about essentially every situation, so that it can later handle whatever scenario it encounters. In machine translation, for example, big data contains all the ways sentences have been translated, which allows the computer to learn all the possible translation cases; when a translation is needed, it only has to match against what it has already seen.
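As a rough illustration of "just match the results", here is a toy sketch in which translations are simply looked up from accumulated (source, translation) pairs, choosing the most frequently observed rendering. The phrase pairs are invented; real statistical translation systems estimate far richer statistics from web-scale parallel data.

```python
# Toy sketch of translation driven by accumulated parallel data: for each
# source phrase, pick the target phrase seen most often in the training pairs.
# The pairs below are invented for the example.
from collections import Counter, defaultdict

# (source phrase, observed human translation) pairs, as if mined from data.
parallel_pairs = [
    ("good morning", "buenos dias"),
    ("good morning", "buenos dias"),
    ("good morning", "buen dia"),
    ("thank you", "gracias"),
    ("thank you", "gracias"),
]

# Build a phrase table: for every source phrase, count its translations.
phrase_table = defaultdict(Counter)
for src, tgt in parallel_pairs:
    phrase_table[src][tgt] += 1

def translate(phrase):
    """Return the most frequently observed translation, if any."""
    if phrase not in phrase_table:
        return "<no data for this phrase>"
    return phrase_table[phrase].most_common(1)[0][0]

print(translate("good morning"))  # buenos dias (seen twice vs. once)
print(translate("hello there"))   # <no data for this phrase>
```

The last line also shows the limitation discussed later: phrases never seen in the data cannot be handled at all.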

When it comes to big data, we all know its 3V characteristics: vast (volume), variety, and velocity.

Let us first look at the last two characteristics:

1. Variety: variety means the data covers different aspects of a thing. For example, if the data describes a person, variety means it can cover everything from their appearance to their daily life to their state of mind. Having different aspects means the data can be abstracted into different dimensions, and those dimensions can then be combined freely, yielding results that no single perspective could provide (a short sketch of this appears after these three characteristics).

2. Velocity, or rather completeness: completeness means the data covers the full range of possibilities. Unlike statistics, which can only infer the whole from samples, big data is itself the complete set.

With the first two characteristics in mind, the "vast" characteristic is easy to understand: data that covers every dimension and includes every possibility is, of course, enormous in volume.

These three characteristics of big data are all indispensable; lacking any one of them, big data cannot exert its power, nor can it drive the realization of machine intelligence.
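Here is a small sketch of the multi-dimensional point from item 1 above: the same records can be grouped along any single dimension or any combination of dimensions, giving views that no single perspective provides. The records and dimensions are invented for illustration.

```python
# Small sketch of the "multi-dimensional" point: the same records can be
# grouped along any dimension or combination of dimensions.  Records invented.
from collections import Counter
from itertools import combinations

purchases = [
    {"city": "Beijing",  "age_group": "20s", "category": "books"},
    {"city": "Beijing",  "age_group": "30s", "category": "travel"},
    {"city": "Shanghai", "age_group": "20s", "category": "books"},
    {"city": "Shanghai", "age_group": "20s", "category": "travel"},
]

dimensions = ["city", "age_group", "category"]

# Count records for every combination of one or two dimensions.
for size in (1, 2):
    for dims in combinations(dimensions, size):
        counts = Counter(tuple(rec[d] for d in dims) for rec in purchases)
        print(dims, dict(counts))
```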

So why has big data emerged only now? Because the technical conditions for generating, storing, transmitting, and processing data have matured:

1. The generation of data: (1) the growing digitization of the world has moved a great deal of data into electronic form (for example, from paper-based to computer-based office work), and the ever more numerous and complex information systems built on top of it generate data around the clock; (2) sensor technology has been widely applied and popularized, including RFID chips on goods, traffic sensors, wearable devices, and so on; (3) previously non-digital content is being digitized, such as turning paper books into e-books; (4) with Web 2.0, everyone generates data every day: posts to their social feeds, articles, comments, and so on.

2. Data storage: with more and more channels generating data, the volume keeps growing. Moore's Law, which has guided the development of the semiconductor industry, has kept increasing storage capacity while prices keep falling, so this much data can be stored at low cost.

3. Data reading: storing a large amount of data is useless if the computer's input/output speed cannot keep up. Larger, cheaper SSDs make it feasible to actually use this much data.

4. Data transmission: once data is produced by its sources (such as sensors), it has to be transmitted to storage (such as servers). The development of fourth-generation LTE and WiFi means transmission is no longer a problem.

5. Data processing: analyzing and using such a large amount of data requires processors with high processing power. Although processor performance follows Moore's Law, roughly doubling every 18 months, the speed at which data is generated far outpaces the growth in processor performance, so big data cannot be processed with a single processor. Parallel computing solves this problem (though parallel computing is itself limited by switches, network speed, and other conditions; Google and other companies made great progress on these problems around 2002, and cloud computing began to emerge). A minimal sketch of the parallel idea follows the next paragraph.

The advancement of data generation, storage, and processing technologies makes it possible to use big data; once the conditions were ripe, big data naturally emerged and developed.
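The parallel-processing idea mentioned in point 5 can be sketched as a tiny divide-and-merge job: split the input into chunks, process each chunk in a separate process, and merge the partial results. This is only a flavor of the approach (a word count over invented lines), not any company's actual infrastructure.

```python
# Minimal sketch of the divide-and-merge idea behind parallel processing of
# large data: split the input into chunks, count words in each chunk in a
# separate process, then merge the partial counts.
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: count word occurrences in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

if __name__ == "__main__":
    lines = ["big data needs parallel computing",
             "parallel computing needs many machines",
             "big data big models"] * 1000

    # Split the lines into 4 roughly equal chunks.
    chunks = [lines[i::4] for i in range(4)]

    with Pool(processes=4) as pool:
        partial = pool.map(count_words, chunks)   # map step, in parallel

    total = sum(partial, Counter())               # reduce step: merge results
    print(total.most_common(3))
```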

Does having big data mean machine intelligence can be realized without further problems? Obviously, realizing machine intelligence requires having complete data and being able to process it. Although storage and processing technologies keep advancing, in practical applications there are still major limitations that are hard to overcome:

1. Collecting big data: the key question is how to obtain a complete, multi-dimensional, full set of data, and in particular how to obtain data for uncommon scenarios.

2. Data storage: the volume of data grows faster than storage capacity, and it is not obvious what structure to store it in so that it is easy to read and use (how do you abstract so many dimensions? how do you retrieve them?).

3. Data sharing: the completeness of big data makes it difficult for any single company to collect all the data, so data collected by different companies has to be pooled and used together (for example, an e-commerce company has purchase data and a travel company has travel data, but no single company collects both). Different companies store and use data in different ways; when the data is to be aggregated, how is the format unified so that it can be shared and jointly used?

4. Parallel computing: some special cases cannot be parallelized, so the final result of the whole computation has to wait for those cases to be handled; different machines compute at different speeds, and the whole job is determined by the slowest result (a small numeric sketch follows this list). Parallel computing is therefore not simply a matter of adding more servers; the data's storage structure and the whole algorithmic process also need to be optimized.

5. Data mining: cluttered, extremely large volumes of data cannot be used directly; they must first be cleaned and formatted, and once the volume reaches a certain scale this step is no longer easy. Especially when the noise level is high, the quality of cleaning directly affects how well the data can be applied. The sheer volume of data and the complexity of the learning models also make machine learning a lengthy process and place even higher demands on parallel computing.
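The straggler problem from item 4 can be shown with a tiny calculation (the timings are invented): a parallel job finishes only when its slowest worker finishes, so one slow machine dominates the total time no matter how many machines are added.

```python
# Tiny numeric sketch of the "straggler" point: a parallel job finishes only
# when its slowest worker finishes.  Timings are invented for illustration.
worker_times = [10, 11, 9, 10, 42]              # seconds each worker takes for its chunk

ideal = sum(worker_times) / len(worker_times)   # if the load were perfectly even
actual = max(worker_times)                      # the job waits for the slowest worker

print(f"ideal parallel time: {ideal:.1f}s, actual parallel time: {actual}s")
# ideal parallel time: 16.4s, actual parallel time: 42s
```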

So the next time you hear about AI and big data, you will be able to judge whether it is real intelligence or fake intelligence, and real big data or fake big data.