Yun Yun Big Data Company
Big data can speak: simple machine learning problems.

What is learning from data? Scientists learn from data, as do businesses, governments, and charities. In fact, whether in the private, public, or charitable sector, there is hardly a field that does not deploy data-driven models to explore and exploit relationships in data.

We are awash in data. Amazon processes an enormous volume of sales and deliveries every day, vast numbers of genes are sequenced almost simultaneously, and countless pictures are stored on the web. Within a few months, Britain's National Health Service digitized 60 million health records. We use data every day, and many people also use data in their paid work. An analyst at a marketing company must decide which factors to include in an audience-selection model. Researchers at the local health department measure the incidence of seasonal influenza. Meteorologists run climate models to estimate the probability of precipitation, the change in temperature, and the percentage of cloud cover.

The public sector and many companies need to turn massive amounts of information into actionable public and business decisions. Learning from data provides a set of practical techniques and tools for developing powerful inductive models that extract useful insights from data. Induction, simply put, means that conclusions come from empirical data rather than from theory-first principles.

The main goal of this article is to help you turn large amounts of data into usable knowledge. To this end, we will use theory to reshape how you think about data science challenges. However, this article is not a textbook devoted to lemmas, proofs, and abstract theoretical detail. It is for readers who want a sound, proven framework for building useful predictive and analytical models, so as to improve operations and increase profits for the organizations they work for and the customers they serve. At the same time, understand that data science, like any profession dealing with empirical data, is not suited to those who lack curiosity or technical ability.

In this article, you will learn the key differences between inductive and deductive reasoning, identify the three elements of a learning problem, and find a clear framework for using inductive models.

1.1 The basics of inductive and deductive reasoning

Figure 1.1 shows the key difference between induction and deduction, centered on hypothesis testing. Both approaches begin with observing an interesting phenomenon, but the inductive approach is concerned with choosing the best predictive model, while the deductive approach focuses on exploring theory, mainly using data to test hypotheses derived from a theory. Whether a hypothesis is accepted or rejected is judged by the "weight of evidence" in the empirical data.

Figure 1.1 Induction and deduction

1.1.1 Have you ever encountered this?

I remember in my theoretical economics class, the professor sternly warned us: "You can't trust the data." Perhaps this experience was not unique to my class. A famous econometrics professor once explained[1]: "There is a general view in economics that if the current empirical evidence is not credible, or economic phenomena cannot be predicted, it is mainly because the economy is too complicated and the data it generates are too chaotic to support a statistical model." Maybe you have had a similar experience.

However, when I left the classroom and stepped into the real world of empirical analysis, I found that data-driven induction yielded significant conclusions, as long as I had enough data and the right tools.

Note: In every conceivable field (commerce, industry, and government), successful data-driven inductive models already exist or are being built. Data-driven models are increasingly used for decision-making: smartphones that recognize your voice, robots that perform surgery, and systems that detect nuclear explosions.

1.1.2 Unleashing the power of induction

Whether you work in medical diagnosis, handwriting recognition, marketing, financial prediction, bioinformatics, economics, or any other professional field requiring empirical analysis, you will often face situations in which the underlying first principles have not been discovered, or the system under study is too complex for a sufficiently detailed mathematical description to yield useful results. I have found data-driven induction useful in all of these situations, and I expect you will agree.

Note: Outside of science, deductive analysis arguably reached its peak in economics, where most of the focus (even today) revolves around testing and evaluating the economic validity of deductive theories. In fact, economists' desire to verify theory objectively gave birth to a new statistical sub-discipline: econometrics.

1.1.3 The yin and yang of inference

Although induction and deduction are quite different, they can actually complement each other. It is not unusual for a researcher to plan a project that contains both inductive and deductive elements.

If you have worked in empirical modeling for any length of time, you have likely encountered this situation: you plan an inductive or deductive project, only to find, as time goes by, that another approach better frames your research question. Remember that the choice between induction and deduction depends in part on your data-analysis goals.

Note: The relative decline of deductive reasoning can be partly explained by the great success of data-driven models. Italian scholars Matteo Pardo and Giorgio Sberveglieri observed correctly more than a decade ago[6]: "At present, there has been a paradigm shift from classical modeling based on first principles to developing models from data." Interestingly, the shortage of data modelers is a worldwide problem.

1.2 The three elements of a learning problem

Our discussion begins with the foundation of the learning problem. In a supervised classification problem, for example, the data we are given are real-valued attribute-response pairs (x, y). Three elements constitute the basic learning problem.
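To make the attribute-response pairs concrete, here is a minimal sketch in Python. The data values and the nearest-neighbour rule are invented for illustration; they are not from the text.

```python
# A minimal sketch of supervised classification data: each example is an
# attribute-response pair (x, y), where x is a real-valued attribute vector
# and y is the class label we want to predict. The values are invented.
data = [
    ((5.1, 3.5), "small"),   # (x, y) pair: attributes, then response
    ((6.7, 3.0), "large"),
    ((5.0, 3.4), "small"),
    ((6.9, 3.1), "large"),
]

def nearest_neighbour(x, training_pairs):
    """Predict y for a new x by copying the label of the closest training x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(training_pairs, key=lambda pair: sq_dist(pair[0], x))
    return label

print(nearest_neighbour((5.2, 3.4), data))  # → small
```

The learner never sees the rule that generated the labels; it only sees the (x, y) pairs, which is exactly what makes this a learning problem rather than a deduction from first principles.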

1.3 The goal of learning from data

Note:

Inductive bias is a key factor in data science practice. As Jonathan Baxter of the London School of Economics explained: "In machine learning, what matters most is the learning machine's prior bias over hypothesis spaces: the space should be small enough to ensure good generalization (predictive ability) from a reasonably sized training set, yet large enough to contain a good solution to the learning problem."
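Baxter's trade-off can be seen in a toy sketch. The data and both "hypothesis spaces" below are invented for illustration: a lookup table (a space large enough to memorise anything) fits the training data perfectly but says nothing about unseen inputs, while a small, biased space of threshold rules still generalises.

```python
# Illustrates the inductive-bias trade-off with invented data.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]  # (x, y) pairs

# Very large hypothesis space: memorise the training set verbatim.
table = dict(train)
def memoriser(x):
    return table.get(x)          # returns None for any x it has not seen

# Small, biased hypothesis space: threshold rules of the form "x > t".
def fit_threshold(pairs):
    # choose the threshold with the fewest training errors (coarse search
    # over the observed x values)
    candidates = [p[0] for p in pairs]
    best = min(candidates,
               key=lambda t: sum(int(x > t) != y for x, y in pairs))
    return lambda x: int(x > best)

rule = fit_threshold(train)

print(memoriser(2.5))   # None: the memoriser cannot generalise
print(rule(2.5))        # the threshold rule still makes a prediction
```

The memoriser's hypothesis space contains a perfect fit to any training set, yet has no predictive ability off the training points; the threshold family is small enough to generalise while still containing a good solution for this problem.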

1.3.1 Defining the selection criteria

1.3.2 Selecting the learning task

Now that our learning framework is in place, we can turn our attention to the actual tasks data scientists perform. Fortunately, it turns out that learning from data can be neatly divided into three basic types of work:

(1) Classification, or estimation of a decision boundary. For example, sorting eggs by size and color on an assembly line.

(2) Regression, or estimation of an unknown continuous function. For example, predicting the average box-office takings generated by local music festivals.

(3) Probability density estimation. For example, estimating the density of pike in Irish coastal rivers.

This article will focus on classification throughout, because it is the task data scientists face most often. However, the lessons we learn apply to all three types of task.