What are the big data application models and security risk analysis
Currently, the rate at which data is generated in many fields is accelerating, and the volume of data to be processed is expanding dramatically. These huge data resources hold potential value and need to be analyzed and utilized effectively. Beyond sheer volume, today's data has also become diverse in type, including structured, semi-structured, and unstructured data. Data of this quantity and variety poses great challenges to traditional analysis tools. Data analysis is no longer the simple generation of statistical reports but in-depth analysis using complex analytical models, and traditional techniques such as relational database technology cannot meet these requirements. In terms of scalability, systems that scale up (vertically, by adding or replacing the memory, CPUs, hard disks, and other components of a single node) have hit a bottleneck; only systems that scale out (horizontally, by adding computing nodes connected into large clusters for distributed parallel computing and management) can meet the demands of big data analysis [1]. Traditional tools have therefore run into scalability obstacles, and reliable data storage and analysis techniques must be sought to analyze and utilize these huge resources. Building the Hadoop computing framework on cloud computing platforms has become the main means of processing big data at present. However, because of the characteristics of cloud computing and Hadoop applications and their own weak security mechanisms, this inevitably introduces security risks.

1. Big Data Application Model

Cloud computing (Cloud Computing) is an Internet-based model of computing. It evolved from parallel computing (Parallel Computing), distributed computing (Distributed Computing), and grid computing (Grid Computing), and integrates technologies such as network storage, virtualization, and load balancing [2]. Tasks that would otherwise be performed by personal computers and private data centers are transferred to large computing centers with professional storage and computing capabilities, so that software, hardware, and other computing resources can be fully shared. Enterprises and individuals no longer need to spend heavily on purchasing infrastructure, let alone on installing, configuring, and maintaining hardware and software; the cloud service provider (CSP, Cloud Service Provider) supplies the corresponding services, and customers simply pay for the leased computing resources on a timed or metered basis. Cloud service providers possess big data storage capacity and computing resources and are regarded as the best choice for outsourced information services [3]. The application of big data is therefore often combined with cloud computing.

Hadoop is currently the best-known implementation of big data technology. It is an open-source implementation of MapReduce [4] and GFS (Google File System) from Google's cloud computing stack. Hadoop provides a computational framework whose core technologies are HDFS (Hadoop Distributed File System) and MapReduce. HDFS provides a high-throughput distributed file system, while MapReduce is a distributed processing model for large data sets. Together they give Hadoop a reliable shared storage and analysis system for big data [5-6].
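To make the MapReduce processing model described above concrete, the following minimal Python sketch (not Hadoop code; the function names and sample input are our own illustrative choices) shows the three phases of the classic word-count example: map emits key-value pairs, shuffle groups them by key (a step the Hadoop framework performs automatically), and reduce aggregates each group.

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word, 1)

# Shuffle phase: group all emitted values by key
# (in Hadoop this grouping is done by the framework between map and reduce).
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: aggregate the grouped values, here by summing the counts.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs big tools", "data tools"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

In a real Hadoop job only the map and reduce functions are written by the user; HDFS supplies the input splits and the framework handles shuffling, fault tolerance, and distribution across the cluster.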

Although some organizations build their own clusters to run Hadoop, many others choose to run Hadoop, or offer it as a service, in clouds built on leased hardware. Examples include Cloudera, which offers Hadoop on public or private clouds, and Amazon's Elastic MapReduce cloud service [7]. Combining cloud computing with Hadoop to process big data has thus become a trend.

2. Big Data Security Risk Analysis

With the widening scope of big data applications, the need for data security is becoming more and more urgent.

Since cloud computing is characterized by outsourcing data to cloud service providers, this service model transfers ownership of the data to the CSP, and the user loses direct control over the physical resources [8]. Big data stored in the cloud usually exists in clear text, and the CSP has underlying control over it, so a malicious CSP may steal a user's data without the user's knowledge. The cloud computing platform may also be attacked, causing its security mechanisms to fail or be illegally controlled so that unauthorized parties can read the data. Both possibilities threaten the security of big data.

Hadoop was not designed with security in mind. Starting with Hadoop 1.0.0 and Cloudera CDH3, Hadoop added Kerberos authentication and ACL-based access control [9]. Even with these additions, the security mechanism remains weak. The Kerberos authentication mechanism is applied only between clients (Clients), the Key Distribution Center (KDC), and servers (Servers); it provides machine-level authentication only and does not authenticate the Hadoop application platform itself [10]. The ACL-based access control policy, in turn, must be configured through attributes in hadoop-policy.xml after ACLs are enabled; these nine attributes restrict the access of users and group members to resources in Hadoop and to communication between nodes, but the mechanism relies entirely on the administrator's configuration [11]. Such a traditional access control list can be tampered with on the server side without being easily detected. Moreover, the granularity of the ACL-based policy is too coarse to protect users' private fields in a fine-grained way during the MapReduce process, and because the access control list must be changed frequently for different users and different applications, it is cumbersome and hard to maintain. Hadoop's own security mechanisms are therefore imperfect.
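As an illustration of the configuration the text refers to, two of the nine service-level ACL properties in hadoop-policy.xml might look as follows (the property names follow Hadoop 1.x conventions; the user and group names are hypothetical examples):

```xml
<configuration>
  <!-- Which users/groups may connect to HDFS as clients.
       Value format: comma-separated users, a space, comma-separated groups. -->
  <property>
    <name>security.client.protocol.acl</name>
    <value>alice,bob datascience</value>
  </property>
  <!-- Which users/groups may act as DataNodes communicating with the NameNode. -->
  <property>
    <name>security.datanode.protocol.acl</name>
    <value>hdfs hadoop</value>
  </property>
</configuration>
```

A value of `*` grants access to everyone. Because this file is plain XML living on the server, anyone with filesystem access can alter it, which is exactly the server-side tampering risk described above.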

2.1 Security Risks of CSPs and Users in Different Application Modes

There are multiple application modes for Hadoop in cloud computing. When Hadoop is built in a private cloud, the enterprise deploys Hadoop itself, the platform is used by employees of the enterprise's various departments, and outsiders cannot access or use these resources. In this case the CSP that creates and manages Hadoop is the same entity at both the IaaS and PaaS levels. When Hadoop is deployed on a public cloud platform, there are two levels of CSP: the IaaS-level CSP provides the infrastructure, while the PaaS-level CSP is responsible for building and managing Hadoop. In this case the two levels of CSP are often different entities.