What data mining methods are used to predict future behavior based on previous data?
Data mining is a set of methods and techniques for discovering latent patterns and extracting useful knowledge from large amounts of data. Because it is closely tied to databases, it is also called knowledge discovery in databases (KDD): advanced intelligent computing techniques are applied to large volumes of data so that computers can discover latent, useful patterns (also called knowledge), with or without guidance.

Broadly speaking, any process of mining information from a database can be called data mining. From this perspective, data mining amounts to BI (business intelligence). In the narrower technical sense, the source data is first cleaned and transformed into a data set suitable for mining; knowledge is then extracted from this fixed data set; and finally the resulting knowledge patterns are used for further analysis and decision-making. In this narrow sense, data mining is the process of extracting knowledge from a data set in a specific form. In practice, one or more mining algorithms are chosen for the specific data and the specific problem in order to discover the rules hidden in the data, and these rules are often used for prediction and decision support.

The main functions of data mining

1. Classification: Groups are defined according to the attributes and characteristics of the analyzed objects, and each object is assigned to a group. For example, a bank divides its customers into categories based on historical data, and can then use these categories to assess new customers applying for loans and offer them a suitable loan scheme.

2. Clustering: The inherent regularities of the analyzed objects are identified, and the objects are divided into several categories according to these regularities. For example, applicants are divided into high-risk, medium-risk and low-risk applicants.

3. Discovery of association rules and sequence patterns: An association is a relationship in which the occurrence of one thing is accompanied by the occurrence of others. For example, people who buy beer every day may also buy cigarettes, and the strength of that link can be described by the support and confidence of the association. A sequence pattern differs from an association in that it is a relationship over time. For example, if banks adjust interest rates today, the stock market changes tomorrow.

4. Forecasting: The development pattern of the analyzed object is identified and used to forecast future trends. For example: judging how the economy will develop in the future.

5. Deviation detection: A small number of extreme, special cases among the analyzed objects are described and their underlying causes revealed. For example, among the very large number of transactions a bank processes, 500 are fraudulent. In order to operate steadily, the bank should find the underlying factors behind these 500 cases and reduce the risk of future operations.
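As a rough illustration of item 5, the sketch below flags unusually large transactions using a modified z-score built on the median. This is only one simple technique for spotting deviations, chosen here for illustration; the function name, threshold and transaction amounts are all hypothetical, and a real fraud study would also have to explain why the flagged cases occur.

```python
from statistics import median

def flag_deviations(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical transaction amounts; the last two are unusually large.
amounts = [120, 95, 130, 110, 105, 98, 125, 115, 9800, 7600]
suspicious = flag_deviations(amounts)
print(suspicious)  # [8, 9]
```

The median is used instead of the mean because a few extreme values would otherwise distort the very baseline they are compared against.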

Note that these functions do not exist in isolation; they are interrelated and work together in a data mining project.
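To make the support and confidence measures from item 3 concrete, here is a minimal sketch that computes both for the beer and cigarettes example. The shopping baskets are invented for illustration, and the helper names are assumptions rather than part of any particular library.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)

def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0  # the rule never applies in this data
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / n_antecedent

# Hypothetical shopping baskets.
baskets = [
    {"beer", "cigarettes"},
    {"beer", "cigarettes", "bread"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"beer", "cigarettes", "milk"},
]
print(support(baskets, {"beer", "cigarettes"}))       # 0.6
print(confidence(baskets, {"beer"}, {"cigarettes"}))  # 0.75
```

Here beer and cigarettes appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets containing beer also contain cigarettes (confidence 0.75).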

Methods and tools of data mining

As a new technology for data processing, data mining has several distinctive features. First, it deals with huge amounts of data, which is also why data mining arose in the first place. Second, the data may be incomplete, noisy and random, with complex structure and high dimensionality. Finally, data mining sits at the intersection of many disciplines and draws on techniques from statistics, computer science, mathematics and other fields. The following are common and widely used algorithms and models:

(1) Traditional statistical methods: ① Sampling techniques: when faced with a large amount of data, analyzing all of it is neither possible nor necessary, so reasonable samples should be drawn under the guidance of sampling theory. ② Multivariate statistical analysis: factor analysis, cluster analysis, etc. ③ Statistical forecasting methods, such as regression analysis and time series analysis.
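As a small illustration of the statistical forecasting methods in ③, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The monthly loan counts are hypothetical, and a straight-line trend is an assumption made only to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical number of loans issued in months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
loans = [102, 108, 115, 119, 127, 131]

intercept, slope = fit_line(months, loans)
forecast = intercept + slope * 7  # extrapolate to month 7
print(round(forecast, 1))
```

With these numbers the fitted slope is about 5.9 loans per month, so the forecast for month 7 comes out near 137.6.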

(2) Visualization technology: charts and similar means, such as histograms, are used to express the features of the data intuitively; many methods from descriptive statistics are used here. One of the difficult problems visualization faces is the display of high-dimensional data.
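A histogram of the kind mentioned above can be sketched in a few lines: values are counted into equal-width bins and each count is drawn as a bar. The customer ages below are hypothetical, and the text bars stand in for a real chart.

```python
def histogram(values, bins, low, high):
    """Count values into `bins` equal-width intervals over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # A value exactly at the upper edge goes into the last bin.
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1
    return counts

# Hypothetical customer ages.
ages = [23, 25, 31, 34, 35, 38, 41, 42, 44, 47, 52, 58, 63]
low, high, bins = 20, 70, 5
counts = histogram(ages, bins, low, high)

width = (high - low) // bins
for i, c in enumerate(counts):
    start = low + i * width
    print(f"{start}-{start + width}: {'#' * c}")
```

For this data the bin counts are 2, 4, 4, 2 and 1, so the printed bars show most customers falling between 30 and 50.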

Professional ability requirements

Basic ability requirements

Data miners need to meet the following basic requirements in order to complete their tasks in data mining projects.

I. Professional skills

A master's degree or above, with a major in data mining, statistics or databases; proficiency in relational database technology and experience in database system development.

Familiarity with commonly used data mining algorithms.

A theoretical grounding in mathematical statistics and familiarity with commonly used statistical tools and software.

II. Industry knowledge

Have relevant industry knowledge, or be able to become familiar with it quickly.

III. Spirit of cooperation

Have good team spirit and be able to work closely with other project members on your own initiative.

IV. Customer relationship ability

Have good customer communication skills, be able to explain clearly the key points and difficulties of a data mining project, and be good at correcting customers' misunderstandings and excessive expectations of data mining.

Have good knowledge transfer skills, so that model maintainers can understand and master the data mining methodology and modeling practice as soon as possible.

Advanced ability requirements

Data miners who also meet the following conditions can improve the implementation efficiency of data mining projects and shorten the project cycle.

Experience in data warehouse project implementation, familiar with data warehouse technology and methods.

Proficient in SQL language, including complex queries and performance tuning.

Familiar with ETL development tools and technologies.

Proficient in Microsoft Office software, including the various statistical charting features in Excel and PowerPoint.

Be good at combining the mining results with the customer's operation and management, and provide valuable and feasible operation schemes for customers according to the results of data mining.

Application and employment fields

At present, data mining applications are concentrated mainly in telecommunications (customer analysis), retail (sales forecasting), agriculture (industry data forecasting), web logs (web page customization), banking (customer fraud), electric power (customer calls), biology (genes), astronomy (star classification), the chemical industry, medicine and so on. Typical problems it can currently solve include market analysis activities such as database marketing, customer segmentation and classification, summary analysis and cross-selling, as well as customer churn analysis, customer credit scoring and fraud detection, all of which have been applied successfully in many fields. If you visit the well-known Amazon online bookstore, you will find that when you select a book, many related recommendations appear under "customers who bought this book also bought"; this is data mining technology at work.

The object of data mining is the data accumulated in a particular professional field; the mining process is an iterative process of human-computer interaction; and the results of mining are applied back to that field. The whole process of data mining is therefore inseparable from professional knowledge of the application field. "Business first, technology second" is the hallmark of data mining. Learning data mining therefore does not mean giving up your existing professional knowledge and experience. On the contrary, a background in another industry is a big advantage. If you have work experience in sales, finance, machinery, manufacturing, call centers or similar areas, you can raise your professional level by learning data mining and move from a transactional role to an analytical role without changing your original field. From its appearance in the late 1980s to its wide application in the late 1990s, business intelligence (BI) with data mining at its core has become the new favorite of IT and other industries.

Data acquisition and analysis specialist

Job description: The main responsibility of the data acquisition and analysis specialist is to collect data on the company's operations and then mine it for regularities that can guide the company's strategic direction. This position is often overlooked, but it is quite important. Because database technology first appeared in the computer field, and computer databases offer mass storage, fast search and semi-automatic analysis, the data acquisition and analysis specialist first appeared in the computer industry and later spread to other industries as computer applications became widespread. The position is generally offered to people who understand database applications and have some statistical analysis ability. Statistics professionals with computer expertise, or computer professionals who have studied data mining, are qualified for this job, but it is best to also have some understanding of the market situation in their industry.

Job-hunting advice: Because many companies pursue short-term interests and neglect long-term strategy, many domestic companies do not currently pay enough attention to this position. However, large companies and foreign companies attach great importance to it, and demand for it will grow over time. In addition, data acquisition and analysis specialists can easily gain industry experience and grasp the key conditions of their industry, such as market conditions, customer habits and channel distribution. So if you want to start a business in a particular industry, starting out as a data acquisition and analysis specialist is a good choice.

Market/data analyst

1. Market data analysis is an indispensable link in modern marketing: direct marketing, the field that employs the most market/data analysts, has been the main means by which companies promote products since the 1990s. According to statistics from the Canadian Marketing Association, direct marketing created 470,000 jobs in 1999, and from 1999 to 2000 the number of jobs increased by 30,000. Why does direct marketing need so many analysts? As business competition intensifies, companies want the maximum sales return from their advertisements and want more users to respond to them, so they must do a great deal of market analysis before placing advertisements. For example, based on their own products and on the family income, educational background and consumption trends of customers in the target market, they analyze which areas' families or residents are most likely to respond to the company's advertisements, buy its products or become customers, so that the advertisements can be aimed only at these specific customer groups. Such targeted screening not only saves money but also improves sales returns. All of this analysis rests on databases and is carried out through data processing, mining and modeling, which is why the work of market analysts is essential.

2. Strong industry adaptability: Almost all industries will use data, so as a data/market analyst, you can not only be employed in the traditional IT industry in China, but also serve in government, banking, retail, medicine, manufacturing, transportation and other fields.
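The targeted screening described in item 1 can be sketched as ranking customer segments by their historical response rate and advertising only to the best ones. The segment names and campaign records below are hypothetical, and a real analysis would also weigh income, education and consumption data as the text describes.

```python
from collections import defaultdict

def response_rates(records):
    """Map each segment to its share of contacts that responded."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in records:
        sent[segment] += 1
        responded[segment] += int(did_respond)
    return {s: responded[s] / sent[s] for s in sent}

def top_segments(records, k):
    """Return the k segments with the highest response rates."""
    rates = response_rates(records)
    return sorted(rates, key=rates.get, reverse=True)[:k]

# Hypothetical past campaign: (segment, whether the customer responded).
records = [
    ("north", False), ("north", False), ("north", True), ("north", False),
    ("south", True), ("south", True), ("south", False), ("south", True),
    ("east", True), ("east", False), ("east", True), ("east", False),
]
print(response_rates(records))
print(top_segments(records, 2))  # ['south', 'east']
```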

Present situation and prospect

Data mining is a new discipline that arose to meet the information society's need to extract information from massive databases. It lies at the intersection of statistics, machine learning, databases, pattern recognition, artificial intelligence and other disciplines. Key universities in China have set up data mining courses or research topics; the best known are the Institute of Computing Technology of the Chinese Academy of Sciences, Fudan University and Tsinghua University. In addition, government agencies and large enterprises have begun to pay attention to this field.

According to IDC's survey and analysis of 62 enterprises in Europe and North America that had adopted business intelligence technology, the average three-year return on investment of these enterprises was 401%, and for 25% of them the return on investment exceeded 600%. The survey results also show that for an enterprise to succeed in a complex environment, its top managers must be able to control an extremely complex business structure, which is very difficult without detailed facts and data to support them. Therefore, as data mining technology continues to improve and mature, it will be adopted by more users and give more managers access to business intelligence.

According to IDC's forecast, the market size of BI industry in 2004 is estimated to be $654.38+0.4 billion. Now, with China's accession to the WTO, China will gradually open up in many fields, such as finance and insurance, which means that many enterprises will face great competitive pressure from large multinational companies. The level of business intelligence adoption among enterprises in developed countries far exceeds that in China. In 1999, Palo Alto Management Group surveyed the adoption of business intelligence technology by 375 large and medium-sized enterprises in Europe, North America and Japan. The results show that the application level of business intelligence technology has reached or approached 70% in the financial field and 50% in the marketing field, and that in the following three years the application level of this technology in all application fields will increase by about 50%.

Nowadays, many enterprises regard data as a valuable asset and use business intelligence to discover the information hidden in it, obtaining huge returns as a result. At present, there is no official market statistical analysis report on the data mining industry itself in China, but data mining is already being studied in various industries domestically. According to the predictions of foreign experts, in the next 5-10 years, with the growing accumulation of data and the wide application of computers, data mining will become an industry in China.

As we all know, competition in the IT job market is very fierce, and data mining, the core technology of data processing, has received unprecedented attention. Data mining and business intelligence sit at the top of the pyramid of an enterprise's overall IT and business structure. At present, the training system for data mining talent in China is not yet mature, and the supply of people proficient in data mining and business intelligence is extremely small. On the other hand, enterprises, government agencies and scientific research institutions have a huge potential demand for such talent, so there is a large gap between supply and demand. If you can combine data mining technology with your existing professional knowledge, you will open up a new world in your career.

Professional salary

At present, demand in China for data warehouse and data mining talent, as with most IT positions, exists at both the low end and the high end, and is especially strong for high-end data warehouse and data mining talent. High-end data warehouse and data mining talent need to be familiar with many industries, have at least 3 years of experience in large-scale DWH and BI, read and write English fluently, and be able to drive projects forward. Such people can earn more than 200,000 a year.

Professional accreditation

1. Application industries and career prospects of SAS certification

SAS Global Professional Certification is an internationally recognized, authoritative certification in the fields of data mining and business intelligence. As the IT environment and its applications mature in China, there will be great room for industry development in these two fields. Obtaining the SAS global professional certification lays a good foundation for accumulating experience in data mining and analysis methodology and helps you open up a new world of career development.

2. Validity period of SAS certification

At present, the five levels of SAS certification have no specific validity period, but a certificate that was obtained too long ago or for too old a version will lose value.

3. Relationship between the five levels of certification

The five levels of certification are progressive: you can take the certification examination for the next level only after passing the examination subjects for the level before it.

4. SAS global certification examination format

The exam is computer-based, lasts 2 hours and consists of 70 objective questions.

Related links

With the overall rapid development of China's logistics industry, the construction of logistics informatization has also made some progress. No matter in IT hardware market, software market or information service market, the logistics industry has a certain investment scale, with a total investment of 2-3 billion yuan in the past two years. The government's active support for the development of modern logistics industry and the intensification of competition in the logistics market have effectively promoted the steady development of logistics information construction.

The latest report of Analysys International, China Logistics Industry Informatization Annual Comprehensive Report 2006, points out that the logistics industry in China is changing from the traditional mode to the modern mode, that the modern logistics mode will guide the information demand of the logistics industry, and that the basic driving force of this change comes from market demand. The data in the report shows that from 2006 to 2010, the IT investment scale of traditional logistics enterprises will exceed 10 billion yuan, and that of third-party logistics enterprises will exceed 2 billion yuan.

At present, industrial application software systems put forward higher application requirements for the hardware of terminal equipment at the operational level, but the integration of software and hardware is generally not ideal, so enterprises will put forward higher requirements for the integration of software and hardware equipment.

The research and development of software systems in the logistics industry will give more consideration to operations research and data mining technology, and professional service providers will be better placed to solve the research and development problems involved.

The theoretical basis of logistics science comes from operations research, which attaches great importance to finding correlations in complex data processing (based on the cost and service-level system), so data mining technology matters all the more for the related software systems.