Broadly speaking, any process of mining information from a database can be called data mining; in this broad sense, data mining is essentially BI (business intelligence). In technical usage, however, data mining (Data Mining) refers more specifically to the following process: source data is cleaned and transformed into a data set suitable for mining; knowledge is then extracted from this fixed-form data set; and finally that knowledge is expressed as a model that supports further analysis and decision-making. From this narrower point of view, data mining can be defined as the process of extracting knowledge from a data set of a specific form. In practice, data mining selects one or more mining algorithms suited to a particular data set and a particular problem in order to uncover the patterns hidden in the data, and these patterns are often used for prediction and decision support.
Main Functions of Data Mining
1. Classification: based on the attributes and characteristics of the objects being analyzed, define distinct classes that describe them. For example, a bank that has already divided its existing customers into categories based on historical data can use those categories to classify customers applying for new loans and offer each of them an appropriate loan program.
2. Clustering: discover the structure inherent in the objects under analysis and use it to divide the objects into groups. For example, loan applicants can be grouped into high-risk, medium-risk, and low-risk applicants.
3. Discovery of association rules and sequential patterns: an association is a relationship in which one event tends to occur when another event occurs. For example, the strength of the observation that people who buy beer every day are also likely to buy cigarettes can be measured by the support and confidence of the association. Unlike an association, a sequence is a relationship over time: for example, the bank adjusts interest rates today, and the stock market changes tomorrow.
4. Prediction: learn the patterns that govern how the object of analysis develops, and use them to forecast future trends. For example, forecasting future economic development.
5. Deviation detection: describe the small number of extreme or anomalous cases among the analyzed objects and reveal their underlying causes. For example, if 500 of a bank's 1 million transactions are fraudulent, the bank needs to find the factors behind those 500 cases in order to operate soundly and reduce the risk of future operations.
Note that these functions are not independent of each other; in a data mining project they are interconnected and work together.
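The support and confidence mentioned in item 3 can be computed directly. As a minimal sketch, assuming each transaction is represented as a Python set of item names (the baskets below are invented for illustration, not real data), support is the fraction of transactions that contain all the items of a rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent.

```python
from typing import List, Set


def support(transactions: List[Set[str]], items: Set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "cigarettes", "chips"},
    {"beer", "cigarettes"},
    {"beer", "bread"},
    {"milk", "bread"},
]
print(support(baskets, {"beer", "cigarettes"}))       # 0.5
print(confidence(baskets, {"beer"}, {"cigarettes"}))  # 0.6666666666666666
```

Here 2 of the 4 baskets contain both beer and cigarettes (support 0.5), and 2 of the 3 beer baskets also contain cigarettes (confidence of about 0.67). Mining algorithms such as Apriori search for every rule whose support and confidence exceed user-chosen minimums; this sketch only scores one given rule.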
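The deviation detection described in item 5 can be illustrated with one of the simplest outlier tests, the z-score: flag any value that lies more than a chosen number of standard deviations from the mean. This is a teaching sketch, not how banks actually detect fraud (the text does not describe their methods); the transaction amounts and the threshold of 2.5 are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def flag_deviations(values: List[float], threshold: float = 2.5) -> List[int]:
    """Indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; the last one is an obvious outlier.
amounts = [100, 120, 95, 110, 105, 98, 102, 115, 99, 5000]
print(flag_deviations(amounts))  # [9]
```

With the population standard deviation, no single value among n values can have a z-score above sqrt(n - 1), which is 3 for these 10 amounts; that is why the threshold here is 2.5 rather than the often-quoted 3. An extreme value also inflates the mean and standard deviation it is measured against, so median-based variants are often preferred on real data.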
Methods and Tools of Data Mining
As an emerging technology for processing data, data mining has several distinctive features. First, it deals with huge volumes of data, which is the very reason data mining came into being. Second, the data may be incomplete, noisy, and random, with complex structures and high dimensionality. Finally, data mining lies at the intersection of many disciplines, drawing on techniques from statistics, computer science, mathematics, and other fields. The following are the most common and widely used algorithms and models:
(1) Traditional statistical methods: ① Sampling techniques: faced with a huge amount of data, it is neither possible nor necessary to analyze all of it, so a sound sample must be drawn under the guidance of sampling theory. ② Multivariate statistical analysis: factor analysis, cluster analysis, and so on. ③ Statistical prediction methods, such as regression analysis and time series analysis.
(2) Visualization techniques: charts such as histograms present the data in an intuitive form and rely heavily on descriptive statistics. A major difficulty for visualization techniques is displaying high-dimensional data.
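The sampling and regression methods listed under (1) can be combined in a few lines. The sketch below, using only Python's standard library, draws a simple random sample with a fixed seed and fits an ordinary least-squares line to it. The data set is hypothetical and deliberately noise-free (y = 1 + 2x) so the result is easy to check; with real, noisy data the coefficients estimated from a sample would only approximate those of the full data set.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def simple_random_sample(rows: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Draw k rows without replacement; a fixed seed makes the draw repeatable."""
    return random.Random(seed).sample(list(rows), k)


def fit_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical full data set: 10,000 noise-free points on y = 1 + 2x.
population = [(float(x), 1.0 + 2.0 * x) for x in range(10_000)]
sample = simple_random_sample(population, k=200, seed=42)
intercept, slope = fit_line(sample)
print(round(intercept, 6), round(slope, 6))  # 1.0 2.0
```

Analyzing 200 sampled points instead of all 10,000 recovers the same line here, which is the point of sampling: a well-drawn sample can stand in for data too large to process in full.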
Professional Competency Requirements
Basic Competency Requirements
Data miners need to meet the following basic requirements to accomplish the tasks involved in a data mining project.
I. Professional Skills
A master's degree or above in data mining, statistics, or a database-related field; proficiency in relational database technology; experience in database system development
Proficiency in commonly used data mining algorithms
A theoretical foundation in mathematical statistics and familiarity with commonly used statistical software
II. Industry Knowledge
Knowledge of the relevant industry, or the ability to learn it quickly
III. Teamwork
A strong team spirit and the initiative to work closely with other project members
IV. Customer Relationship Skills
Good customer communication skills: able to clearly explain the key points and difficulties of data mining projects, and good at correcting customers' misunderstandings of data mining and excessive expectations of it
Good knowledge transfer skills: able to help model maintenance staff quickly understand and master data mining methodology and model implementation
Advanced Competency Requirements
Data mining staff who meet the following conditions can implement data mining projects more efficiently and shorten project cycles.
Experience in data warehouse project implementation and familiarity with data warehouse technologies and methodologies
Proficiency in SQL, including complex queries and performance tuning
Proficiency in ETL development tools and techniques
Proficiency in Microsoft Office, including the various statistical charting techniques available in Excel and PowerPoint
Skill at relating mining results to the customer's business management, and at providing customers with valuable, actionable operational solutions based on those results
Applications and Employment Areas
Current data mining applications focus mainly on telecommunications (customer analytics), retail (sales forecasting), agriculture (industry data forecasting), web logs (web page customization), banking (customer fraud), electric power (customer calls), biology (genetics), astronomy (star classification), the chemical industry, medicine, and other fields. Typical problems it can currently solve include database marketing, customer segmentation and classification, profile analysis, cross-selling, and other market analysis tasks, as well as customer churn analysis, credit scoring, and fraud detection; it has been applied successfully in many areas. If you visit the well-known Amazon online bookstore, you will notice that when you select a book, a list of related recommendations appears under "Customers who bought this book also bought"; data mining technology is at work behind this feature.
The object of data mining is the data accumulated in a specialized field; the mining process is an interactive collaboration between people and computers that is repeated many times; and the results of mining must be applied back to that field. The whole process of data mining is therefore inseparable from knowledge of the application domain: "business first, technique second" is a defining characteristic of data mining. Learning data mining thus does not mean discarding your existing professional knowledge and experience. On the contrary, a background in another industry is a great advantage in data mining. People with experience in sales, finance, machinery, manufacturing, call centers, and similar work can, by studying data mining, raise their career level without leaving their original profession, moving from a transaction-oriented role to an analytical one. From the late 1980s to the late 1990s, business intelligence (BI) with data mining at its core emerged and spread widely, becoming a new favorite in IT and other industries.
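The text does not describe how Amazon actually computes its recommendations. As a minimal sketch of the underlying idea, the simplest "customers who bought this also bought" list can be produced by counting how often other items appear in the same orders as the selected item; the book titles and orders below are hypothetical.

```python
from collections import Counter
from typing import List, Set


def also_bought(orders: List[Set[str]], item: str, top_n: int = 3) -> List[str]:
    """Items most often bought in the same order as `item`, most frequent first."""
    counts: Counter = Counter()
    for order in orders:
        if item in order:
            counts.update(order - {item})  # count each co-purchased item once per order
    # Sort by descending count, breaking ties by name so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical order history.
orders = [
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book B"},
    {"Book A", "Book D"},
    {"Book B", "Book C"},
]
print(also_bought(orders, "Book A"))  # ['Book B', 'Book C', 'Book D']
```

Real recommendation systems typically go further, for instance by normalizing the counts so that best-selling items do not appear in every list, but raw co-occurrence counts like these are the starting point.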
Data Collection and Analysis Specialist
Job description: the main responsibility of a Data Collection and Analysis Specialist is to collect data from the company's operations and extract regular patterns from it to guide the company's strategic direction. The position is often overlooked but is quite important. Because database technology first appeared in the computer field, and computer databases offer massive storage, fast search, and semi-automated analysis, this role first appeared in the computer industry and later spread to other industries as computer applications became widespread. The position generally suits people who understand database applications and have some statistical analysis skills. Statistics professionals with computer expertise, or computer professionals who have studied data mining, are qualified for the job, though some understanding of the market situation in the relevant industry is preferable.
Job search advice: many companies pursue short-term gains and neglect long-term strategy, so many domestic enterprises do not yet attach enough importance to this position. Large companies and foreign-invested enterprises value it more highly, however, and interest in the position is likely to grow over time. In addition, a data collection and analysis specialist gains industry experience easily: in the course of analysis, they come to understand key conditions such as the industry's market situation, customer habits, and channel distribution. If you want to start a business in a particular industry, beginning as a data collection and analysis specialist is therefore a good choice.
Marketing/Data Analyst
1. Market data analysis is an essential and critical part of modern marketing science. The industry that employs the most Marketing/Data Analysts is Direct Marketing (marketing directly to customers). Since the 1990s, Direct Marketing has become an increasingly important way for companies to sell their products. According to the Canadian Marketing Association, Direct Marketing created 470,000 jobs in 1999 alone, and another 30,000 jobs were added from 1999 to 2000. Why does Direct Marketing need so many analysts? As business competition intensifies, companies want to maximize the sales return on their advertising, which means getting more people to respond to their ads, so they need to do extensive market analysis before placing them. For example, a company might combine information about its products with the household income, educational background, and spending tendencies of customers in its target market to determine which areas' households or residents are most likely to respond to its sales ads, buy its products, or become customers, and then target its ads only at those customer groups. Targeting advertisements to these areas saves money and increases the return on sales. All of this analysis is based on databases and relies on data processing, mining, and modeling, which is why the work of market analysts is essential.
2. Industry adaptability: almost every industry uses data, so a data/market analyst can find work not only in China's traditional IT industry but also in government, banking, retail, pharmaceuticals, manufacturing, and transportation.
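The targeting analysis described in item 1 combines product information with household income, educational background, and spending tendencies through data processing, mining, and modeling. The sketch below is a deliberately simplified stand-in under an assumed data layout: it uses only a hypothetical history of past mailings, recorded as (region, responded) pairs, computes each region's historical response rate, and keeps the regions at or above a chosen cutoff.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Mailing = Tuple[str, bool]  # hypothetical record: (region, whether the household responded)


def response_rates(history: List[Mailing]) -> Dict[str, float]:
    """Share of past mailings in each region that received a response."""
    sent: Dict[str, int] = defaultdict(int)
    responded: Dict[str, int] = defaultdict(int)
    for region, did_respond in history:
        sent[region] += 1
        if did_respond:
            responded[region] += 1
    return {region: responded[region] / sent[region] for region in sent}


def pick_targets(history: List[Mailing], min_rate: float) -> List[str]:
    """Regions whose historical response rate is at least `min_rate`, best first."""
    rates = response_rates(history)
    chosen = [region for region, rate in rates.items() if rate >= min_rate]
    return sorted(chosen, key=lambda region: (-rates[region], region))


# Hypothetical results of an earlier mail campaign.
history = [
    ("North", True), ("North", False), ("North", True), ("North", True),
    ("South", False), ("South", False), ("South", True), ("South", False),
    ("East", True), ("East", False),
]
print(pick_targets(history, min_rate=0.5))  # ['North', 'East']
```

A real campaign model would score households on the demographic variables mentioned above rather than on region alone; the cutoff of 0.5 is an arbitrary choice for illustration.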
Status and Prospects
Data mining is a new discipline that has emerged to meet the information society's need to extract information from massive databases. It lies at the intersection of statistics, machine learning, databases, pattern recognition, artificial intelligence, and other disciplines. Data mining courses or research topics have been established at all of China's key institutions; the best known include the Institute of Computing Technology of the Chinese Academy of Sciences, Fudan University, and Tsinghua University. Government organizations and large enterprises have also begun to pay attention to the field.
According to an IDC survey of 62 European and North American companies that had adopted business intelligence technology, the average three-year return on investment for these companies was 401%, and 25% of them achieved an ROI of more than 600%. The results also show that for an enterprise to succeed in a complex environment, senior management must be able to control an extremely complex business structure, which is very difficult to do without detailed facts and figures. Therefore, as data mining technology continues to improve and mature, it is bound to be adopted by more users, giving more managers access to business intelligence.
IDC (International Data Corporation) predicted that the BI market would reach about 14 billion U.S. dollars in 2004. With China's accession to the WTO, many sectors in China, such as finance and insurance, will gradually open up to the outside world, which means many enterprises will face enormous competitive pressure from large multinational companies. Enterprises in developed countries already use business intelligence far more extensively than enterprises in China. In 1999, the U.S. firm Palo Alto Management Group, Inc. surveyed the adoption of business intelligence technology by 375 large and medium-sized enterprises in Europe, North America, and Japan. The results showed that adoption had reached or approached 70% in the financial sector and had reached 50% in marketing, and that over the following three years adoption in all application areas was expected to grow by about 50%.
Many enterprises now regard data as a valuable asset and use business intelligence to discover the information hidden in it, obtaining substantial returns. In China there is, for the time being, no official market analysis report on the data mining industry itself, but there is a certain amount of data mining research across various industries. Foreign experts predict that within the next 5 to 10 years, as data continues to accumulate and computers become ever more widely used, data mining will develop into an industry in China.
As is well known, competition in the IT job market is already fierce, and data mining, a core technology of data processing, has received unprecedented attention. Data mining and business intelligence sit at the top of the pyramid of an enterprise's IT and business architecture. At present, China's system for training data mining professionals is immature, and the talent market offers very few people proficient in data mining and business intelligence; meanwhile, the potential demand for such talent from enterprises, government agencies, and research institutions is very large, so the gap between supply and demand is enormous. If you can combine data mining technology with your existing professional knowledge, you will surely open up new horizons in your career!
Salary
As with most IT positions, domestic demand for data warehousing and data mining staff is saturated at the low end and short at the high end; in second-tier markets, which are less mature, high-end data warehousing and data mining talent is particularly scarce. High-end data warehousing and data mining talent needs to be familiar with multiple industries, have at least 3 years of experience with large-scale data warehouse (DWH) and BI projects, read and write English fluently, and be able to drive projects forward; the annual salary for such talent can exceed 200,000 yuan.
Professional Certification
1. Industry Applications and Career Prospects of SAS Certification
SAS Global Professional Certification is internationally recognized as an authoritative certification in data mining and business intelligence. As China's IT environment and applications mature, both fields will have a great deal of room for growth. Obtaining SAS Global Professional Certification lays a solid foundation for building rich experience in data mining and analysis methodology and helps open up new possibilities for your career development.
2. Validity Period of SAS Certification
Currently, the five levels of SAS certification have no specific validity period, but a certification that is very old or tied to an outdated software version loses value.
3. Relationship Between the Five Certification Levels
The five certification levels are progressive: you can take the exams for the next level only after passing the exams for the previous level.
4. SAS Global Certification Exam
The exam is computer-based, lasts 2 hours, and consists of a total of 70 objective questions.
Related Links
With the rapid overall development of China's logistics industry, logistics information technology has also made some progress. The logistics industry has invested on a certain scale in the IT hardware market, the software market, and the information services market, with total investment in each of the past two years between 2 and 3 billion yuan. Active government support for a modern logistics industry, intensifying competition in the logistics market, and other factors have strongly driven the steady development of logistics information technology.
The latest report from Econometrics International, "China's Logistics Industry Information Technology Annual Comprehensive Report 2006", points out that China's logistics industry is undergoing an overall shift from the traditional model to a modern model, that the modern logistics model will shape the industry's information technology needs, and that the fundamental driving force behind this shift is market demand. The report's data show that from 2006 to 2010, cumulative IT investment by traditional logistics enterprises will exceed 10 billion yuan, while cumulative IT investment by third-party logistics enterprises will exceed 2 billion yuan.
Because current industry application software places higher demands on terminal hardware at the operational level, while software-hardware integration is generally unsatisfactory and compatibility between the two is limited, enterprises will demand better integration of software and hardware equipment.
Research and development of logistics software systems will give more consideration to operations research and data mining technology, and specialized service providers will be better placed to help solve these R&D problems.
The theoretical basis of logistics science comes from operations research, which places great emphasis on complex data processing to find correlations (based on the cost versus service-level system), so data mining technology carries particular weight in the relevant software systems.