Gartner's Merv Adrian offers this definition of Big Data:
"Big Data makes it impossible for commonly used hardware and software tools to capture, manage, and process data in a tolerable amount of time for the user." [1] The McKinsey Global Institute gave a similar definition in May 2011: "Big data is a data set that exceeds the ability of typical database software tools to capture, store, manage, and analyze." [2] As these definitions show, the biggest challenge of Big Data is how to process and analyze the data within a limited time and extract useful information from it.
2. Data Processing
The best-known tool for big data processing is Hadoop, but it is not a real-time system. To address this limitation, engineers went on to develop Storm and Kafka. Apache Storm is an open source distributed real-time computation system. First developed by Nathan Marz [3], it was open sourced after being acquired by Twitter and has been a top-level Apache project since September 2014. Storm is widely used by commercial websites, including Twitter, Yelp, Groupon, Baidu, and Taobao. It is applied in a wide range of scenarios, such as real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Storm is very fast: a single node can process up to a million tuples per second. It also offers high scalability, fault tolerance, and guaranteed data processing. Figure 1 shows a simple architecture of Storm.