How to use relevance for marketing in the age of big data
On the other side, an email address is also required to register a Twitter account. In most cases, the same email address means the airline member and the microblog member are the same person. The airline ran a screening and merged out 100,000 matched users. A third-party data company then stepped in; its main task was to study how these 100,000 airline members behave on social media: what they "say", which topics they like to join, retweet, and comment on, and which kinds of business accounts they follow. The point of studying this is that the airline wants to know what kind of social media campaigns (and what incentives attached to them) will attract these 100,000 members to participate and spread the campaigns onward, in effect becoming media themselves.

Strictly speaking, this case is not big data, because the data set is not massive enough. But it rests on the same principle as big data marketing: the search for correlation. Correlation is not causation. You cannot conclude that because someone often flies a given airline, they will like a given activity (nor the reverse). But there is a correlation, in a general sense, between the two variables. The reasoning is like the relationship between wearing red socks and playing the stock market: there may be some correlation coefficient, but it is never causation. Turn correlation into causation and the result is almost indistinguishable from superstition. The problem is that many people do equate the two, which leads to badly misleading conclusions. For example, when the 100,000 airline users are found to be particularly fond of certain kinds of activities, that conclusion is not generalizable. When another 50,000 airline Twitter users are added, you would be hard pressed to pin the same conclusion on them, because there is no causation here. Confirming causality requires a long process of observation and reasoning that eliminates the so-called "hidden variables"; it is not a matter of running some data analysis. Correlation is a prerequisite for causation, but it is not the same thing.

This is where big data comes in. Big data pursues massive data, but massive to what point? To the full sample. A full sample is obviously different from a sample. In the past, full samples were operationally out of reach, so sampling was necessary. The scientific way to sample is to randomize, which is far easier said than done. Truly random sampling is expensive (recruiting one user through a social network to fill in a questionnaire and then having that user recruit more is not random at all), and it has a flaw you cannot get around: with a questionnaire, it is very hard to guarantee that what respondents say matches what they actually think or do in practice. Big data does not sample; it takes the whole population. And it does not ask users questions; it captures their actual "behavior". Whether a user claims to be interested in an activity, or actually participates in one: the latter is clearly more telling.
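To make the two steps above concrete, matching members across systems by email and then looking for correlation rather than causation, here is a minimal sketch in Python. Everything in it (the field names, the email addresses, the numbers) is invented for illustration; it is not the airline's or the data company's actual pipeline.

```python
# A minimal sketch: match members across two systems by their shared email
# address, then measure how strongly two behaviors co-vary.
# All field names (email, flights_per_year, campaign_retweets) are invented.
import pandas as pd

# Hypothetical exports from the airline CRM and the microblog platform.
airline = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "flights_per_year": [12, 3, 25, 7],
})
weibo = pd.DataFrame({
    "email": ["a@x.com", "c@x.com", "d@x.com", "e@x.com"],
    "campaign_retweets": [4, 9, 1, 6],
})

# Inner join on email: the same address is *assumed* to mean the same
# person, which is exactly the screening-and-merging step described above.
matched = airline.merge(weibo, on="email", how="inner")
print(f"{len(matched)} members matched across both systems")

# Pearson correlation coefficient: a number in [-1, 1] describing linear
# co-movement. A high value here is still only correlation, never causation.
r = matched["flights_per_year"].corr(matched["campaign_retweets"])
print(f"correlation between flying and retweeting: {r:.2f}")
```

Note that the join says nothing about *why* frequent flyers retweet more or less; it only surfaces the co-movement that a campaign planner might act on.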
Most importantly, the core difference between big data analysis and sampling analysis is that the former is dynamic and the latter is static. As noted above, random sampling is costly, so it is hard to repeat every day; in practice it is hard to run a random sample on a specific question even once a month or once a quarter. The conclusions formed by a random sample are therefore static: they can only indicate some correlation at the moment the research was done. When new users (new samples) arrive, it is hard to say whether past correlations still hold, unless, of course, you have found a true causal relationship that excludes every hidden variable. If you cut costs by sampling non-randomly, the conclusions generalize even less (the academic term is external validity, which non-random samples lack). When new users join, the conclusions of non-random sampling basically cannot be applied at all. Big data analysis, by contrast, is dynamic: every second may produce a new conclusion.

Take the most familiar example, the "Customers who bought this item also bought" module on an Amazon page. The items in this module are live: as new purchases come in, the items shown may change. The module can itself drive further purchases: there is a good chance a user sees a product recommended there and buys it, perhaps having had no intention of buying it and not even knowing it existed. But for big data the reason does not matter. What the module is trying to do, at least in e-commerce, is raise the average order value. Studying the causal link between buying book A and buying book B is a matter for academics, not for merchants.

Back to the airline case. The 100,000 people who hold both an airline membership and a Twitter account are not a random sample, so they do not represent the airline's millions of members overall. But our goal is not to find a causal relationship between flying the airline and joining an online campaign; we just want to raise the probability of participation and see more people retweet a campaign. For that purpose, 100,000 Twitter users are enough. Run the data at a given point in time and you can generally see some correlations; design a campaign around them and target it so those 100,000 microblog users see it, and the participation and retweet rates should come out somewhat higher than planning done haphazardly with no data behind it. The same investment of manpower yields a relatively better result: that is the benefit of data analysis.

Three months later, when a new campaign needs planning, note that the data has to be run again. The sample may no longer be 100,000: it may have grown to 150,000, or, with bad luck, 20,000 of the microblog accounts may have gone inactive, leaving only 80,000. New external variables may also have appeared, such as a new product that many people find attractive. Using the last round's data to guide the new plan is like a blind man riding a blind horse toward a deep pool at midnight. Different points in time, and campaigns with different goals, all require running the data again, and that may be the trouble with big data analysis.
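As a rough illustration of how such a recommendation module stays dynamic, here is a toy co-occurrence counter in Python. The orders and item names are made up, and real recommenders add weighting, recency decay, filtering, and scale; the point is only that every new order can immediately change what gets recommended.

```python
# A toy version of "Customers who bought this item also bought": count how
# often items appear in the same order, then recommend the most frequent
# companions of a given item.
from collections import Counter, defaultdict
from itertools import combinations

orders = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "mug"},
    {"book_a", "mug"},
    {"book_b", "poster"},
]

# co_counts[x][y] = number of orders containing both x and y.
co_counts: defaultdict[str, Counter] = defaultdict(Counter)
for order in orders:
    for x, y in combinations(sorted(order), 2):
        co_counts[x][y] += 1
        co_counts[y][x] += 1

def also_bought(item: str, k: int = 2) -> list[str]:
    """Top-k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]

print(also_bought("book_a"))  # e.g. ['book_b', 'mug']
```

Because the counts update as each new order arrives, the recommendations are dynamic in exactly the sense described above: every new purchase can change them, and no one needs to ask why book A and book B travel together.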
However, computation is what computers are good at. Spending an hour or two designing a few formulas or models, compared with mounting a full random-sampling study every time, is many times more convenient and well worth trying. Something a little more ambitious gets at the true meaning of "big data". At the beginning of this year, the internet world saw the tie-up between Alibaba and Sina Weibo. In business-logic terms, one is China's largest consumer platform and the other is China's largest platform for fragmented, short-form talk; pooling the two data sets should surface far more correlations. There is a famous saying in advertising circles: I know half of my advertising is wasted, but I don't know which half. Some marketers claim they can stop you from wasting that half. Don't believe them. For advertising, going from 50% wasted to 49% wasted is already well worth the investment. Big data marketing, built on correlation rather than cause and effect, cannot make advertisers stop wasting ad spend from now on; it can only do what it can do: waste a little less.