Perhaps a better analogy for big data is a spirited champion racehorse: with the right training and a talented jockey, a well-bred racehorse can set stakes records, but without the training and the jockey, the mighty animal wouldn't even make it to the starting gate.
To make sure your organization's big data initiatives stay on track, you need to dispel these 10 common misconceptions.
1. Big data is 'lots of data'
Big data, at its core, describes how structured or unstructured data can be combined with social media analytics, Internet of Things data, and other external sources to tell a "bigger story." That story may be a macro description of an organization's operations or a big-picture view that traditional analytics cannot capture. From an intelligence-gathering perspective, the sheer volume of data involved is beside the point.
2. Big data must be very clean
In the world of business analytics, there's no such thing as data that's "too clean." Conversely, in the world of IT, there's no such thing as "garbage in, gold out." So how clean is your data? One way to find out is to run your analytics application against it, which will identify the weaknesses in the data set. Once those weaknesses are addressed, run the analytics again to confirm the "cleaned" areas.
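As a minimal sketch of that iterate-and-recheck loop, the profile below uses pandas to flag common weak spots (missing values, duplicate rows, constant columns); the file name, column names, and cleaning steps are illustrative assumptions, not anything from the article:

```python
# A minimal data-quality profile, assuming a pandas DataFrame.
# File name, columns, and fixes below are hypothetical.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Report per-column weaknesses: missing values and constant columns."""
    report = pd.DataFrame({
        "missing_pct": df.isna().mean() * 100,  # share of missing values
        "n_unique": df.nunique(),               # distinct values per column
    })
    report["constant"] = report["n_unique"] <= 1
    return report.sort_values("missing_pct", ascending=False)

df = pd.read_csv("sales.csv")  # hypothetical input file
print("duplicate rows:", df.duplicated().sum())
print(profile(df))

# Address the flagged weaknesses, then run the same profile again
# to confirm which areas actually got cleaner.
cleaned = df.drop_duplicates().dropna(subset=["customer_id"])  # illustrative fixes
print(profile(cleaned))
```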
3. All human analysts will be replaced by machine algorithms
Data scientists' recommendations aren't always acted upon by front-line business managers. In a TechRepublic article, industry executive Arijit Sengupta pointed out that these recommendations are often harder to implement than science projects. However, overreliance on machine learning algorithms can be just as challenging: machine algorithms tell you what to do, but they don't explain why you're doing it, Sengupta said. That makes it difficult to integrate data analytics with the rest of a company's strategic planning.
Predictive algorithms range from relatively simple linear algorithms to more complex tree-based algorithms and, finally, extremely sophisticated neural networks.
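To make that spectrum concrete, the sketch below fits one model from each family on the same synthetic task using scikit-learn; the data set and parameter choices are my own illustration, not from the article:

```python
# One model from each family named above, fitted on the same synthetic
# classification task: linear -> tree-based -> neural network.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear (logistic regression)": LogisticRegression(max_iter=1000),
    "tree-based (random forest)": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural network (MLP)": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```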
4. Data lakes are a must
According to Jim Adler, a data scientist at the Toyota Research Institute, the giant repository that some IT managers envision using to store large amounts of structured and unstructured data simply doesn't exist. Enterprise organizations don't indiscriminately dump all their data into one undifferentiated pool. The data is "carefully planned" and stored in separate departmental databases to encourage "focused expertise," Adler said. This is the only way to achieve the transparency and accountability that compliance and other governance requirements demand.
5. Algorithms are infallible prognosticators
Not so long ago, there was a lot of hype around Google's Flu Trends project, which claimed to predict where flu epidemics would occur faster and more accurately than the Centers for Disease Control and other health information services. As Michele Nijhuis of The New Yorker wrote in a June 3, 2017, article, it was assumed that searches for flu-related terms would accurately predict areas where outbreaks were imminent. In fact, simply plotting local temperatures turned out to be a much more accurate predictor.
Google's flu-prediction algorithm fell into a common big data trap: it surfaced meaningless correlations, such as linking a high school basketball game to a flu outbreak because both occur in the winter. When data mining runs over a massive data set, it is more likely to find relationships that are statistically significant than relationships that are practically significant. One example is linking the divorce rate in Maine to U.S. per capita margarine consumption: although meaningless, there is indeed a "statistically significant" relationship between the two numbers.
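A quick way to see why statistical significance isn't practical significance is to mine pure noise for correlations. The sketch below is my own illustration, not from the article: it generates unrelated random series and still finds plenty of pairs that clear the conventional p < 0.05 bar:

```python
# Mining many unrelated random series for correlations: with enough
# comparisons, some pairs look "statistically significant" by pure chance.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n_series, n_points = 100, 20
series = rng.normal(size=(n_series, n_points))  # pure noise, no real relationships

hits = []
for i in range(n_series):
    for j in range(i + 1, n_series):
        r, p = pearsonr(series[i], series[j])
        if p < 0.05:  # the conventional significance cutoff
            hits.append((i, j, r))

# Roughly 5% of the 4,950 pairs will pass the cutoff despite being noise.
print(f"{len(hits)} of {n_series * (n_series - 1) // 2} pairs look 'significant'")
print("strongest spurious correlation:", max(hits, key=lambda h: abs(h[2])))
```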
6. You can't run big data applications on virtualized infrastructure
When "big data" first came to the forefront about 10 years ago, it was synonymous with Apache hadoop. As VMware's Justin Murray wrote in a May 12, 2017, article, the term big data now encompasses a range of technologies, from NoSQL (MongoDB, Apache Cassandra) to Apache Spark.
Previously, critics questioned Hadoop's performance on virtual machines, but Murray points out that Hadoop performs comparably on VMs and on physical machines, while making more efficient use of cluster resources. Murray also debunks the misconception that storage area networks (SANs) are required for basic VM features. In fact, vendors often recommend direct-attached storage, which offers better performance at lower cost.
7. Machine learning is synonymous with artificial intelligence
The gap between an algorithm that recognizes patterns in large amounts of data and one that can draw logical conclusions from those patterns is more like a chasm. As Vineet Jain wrote in a May 26, 2017, ITProPortal article, machine learning uses statistical interpretation to generate predictive models. This is the technology behind algorithms that predict what a person is likely to buy based on past purchases, or what music they might like based on their listening history (a toy version appears after the next paragraph).
While these algorithms are impressive, they fall far short of what artificial intelligence aims to do: replicate the human decision-making process. Statistically based predictions lack human reasoning, judgment, and imagination. In this sense, machine learning might be considered a necessary precursor to true AI. Even the most sophisticated AI systems to date, such as IBM Watson, fail to provide the insights into big data that human data scientists provide.
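To ground the distinction, here is a toy version of the pattern-matching side; the basket data and function names are hypothetical, my own sketch rather than anything from the article. It predicts a likely next purchase purely from co-occurrence counts in past baskets, with no reasoning about why the items go together:

```python
# A toy "what will they buy next" predictor: pure co-occurrence counting.
# It finds patterns in past baskets but has no model of *why* items co-occur.
from collections import Counter
from itertools import combinations

past_baskets = [  # hypothetical purchase histories
    {"coffee", "filters", "mug"},
    {"coffee", "filters"},
    {"tea", "mug", "honey"},
    {"coffee", "mug"},
]

co_counts: Counter = Counter()
for basket in past_baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1

def recommend(item: str) -> str | None:
    """Return the item most often bought alongside `item`."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return scores.most_common(1)[0][0] if scores else None

print(recommend("coffee"))  # e.g. "filters" or "mug": a correlation, not a judgment
```

The counter finds a correlation ("people who buy coffee also buy filters") but holds no concept of coffee, customers, or intent; that absence of reasoning is the chasm described above.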
8. Most big data projects achieve at least half of their goals
IT managers know that no data analytics project is 100 percent successful. When those projects involve big data, the success rate plummets, as recent survey results from NewVantage Partners show. Over the past five years, 95 percent of business leaders said their company had undertaken a big data project, but only 48.4 percent of those projects achieved "measurable results."
NewVantage Partners' Big Data Executive Survey shows that fewer than half of big data programs achieve their goals, with "cultural" change being the most difficult to achieve.
In fact, according to Gartner research released in October 2016, big data projects rarely make it past the pilot stage. Gartner's survey found that only 15 percent of big data implementations had been deployed to production, essentially unchanged from the 14 percent reported in the previous year's survey.
9. Big data growth will reduce the need for data engineers
If the goal of your company's big data initiative is to minimize the need for data engineers and data scientists, you may be in for an unpleasant surprise. The 2017 Robert Half Technology Salary Guide notes that average annual salaries for data engineers jumped to between $130,000 and $196,000, while data scientists' salaries now average between $116,000 and $163,000, and business intelligence analysts' salaries average between $118,000 and $138,750.
10. Employees and front-line managers will embrace big data with open arms
A survey by NewVantage Partners found that 85.5 percent of companies are committed to creating a "data-driven culture." However, the overall success rate of new data initiatives is only 37.1 percent. The three most commonly cited barriers were insufficient organizational alignment (42.6 percent), lack of adoption and understanding by middle management (41 percent), and business resistance or lack of understanding (41 percent).
The future may belong to big data, but reaping its benefits will require a great deal of hard work on the stubbornly human side of the equation.