ZAB is the distributed consistency protocol used by Zookeeper; its full English name is Zookeeper Atomic Broadcast, so ZAB is also known as the Zookeeper Atomic Broadcast protocol. To address distributed consistency, Zookeeper does not use Paxos but its own ZAB protocol. Based on ZAB, Zookeeper implements a master/standby system architecture to keep data consistent between the master and standby replicas in the cluster. The ZAB protocol consists of two basic modes: message broadcasting and leader activation (also called crash recovery). The following describes the implementation of these two basic modes in detail.
Message broadcasting is the method Zookeeper uses to ensure the consistency of write transactions. In a Zookeeper cluster, nodes take on one of the following three roles:
Leader: the core role of the Zookeeper cluster, elected with the participation of the Followers at cluster startup or during crash recovery. The Leader provides both read and write services to clients and is the only role that processes transaction requests.
Follower: another core role of the Zookeeper cluster. Followers participate in the election at cluster startup or during crash recovery; a server that is not elected takes this role. A Follower provides read services to clients, that is, it handles non-transactional requests; it cannot process transactional requests itself, and any transaction request it receives is forwarded to the Leader.
Observer: an observer role that does not participate in elections. It provides read services to clients and handles non-transactional requests, forwarding any incoming transaction request to the Leader. The purpose of Observers is to scale the system and improve read performance (see the configuration sketch after this list).
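As a concrete illustration, a small cluster with one Observer could be declared in each server's zoo.cfg roughly as follows. The hostnames are placeholders, but the server.N syntax, the :observer suffix, and the peerType setting are standard Zookeeper configuration:

    # zoo.cfg on every server: three voting members plus one observer
    # (hostnames are placeholders)
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888
    server.4=zk4.example.com:2888:3888:observer

    # additionally, in zoo.cfg on the observer (server 4) itself:
    peerType=observer

Because the Observer never votes, it adds read capacity without enlarging the quorum needed for writes.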
Below, the ZAB message broadcasting process is briefly introduced through a few diagrams.
Zookeeper's message broadcasting process is similar to 2PC (two-phase commit), but unlike 2PC, ZAB only needs more than half of the Followers to return an Ack before the Leader can commit, which greatly reduces synchronization blocking and improves availability. A sketch of this majority rule follows.
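To make the majority rule concrete, here is a minimal, self-contained Java sketch of counting Acks for a single proposal. The class and method names (QuorumTracker, ackAndCheckQuorum) are invented for illustration and are not Zookeeper's real API:

    import java.util.HashSet;
    import java.util.Set;

    // Minimal sketch of ZAB-style majority voting on one proposal.
    public class QuorumTracker {
        private final int clusterSize;                    // number of voting servers
        private final Set<Long> acks = new HashSet<>();   // ids of servers that have Acked

        public QuorumTracker(int clusterSize) {
            this.clusterSize = clusterSize;
        }

        // Record an Ack; returns true once more than half of the
        // servers have Acked, i.e., the Leader may send COMMIT.
        public boolean ackAndCheckQuorum(long serverId) {
            acks.add(serverId);
            return acks.size() > clusterSize / 2;
        }

        public static void main(String[] args) {
            QuorumTracker proposal = new QuorumTracker(5);
            System.out.println(proposal.ackAndCheckQuorum(1)); // false (1 of 5)
            System.out.println(proposal.ackAndCheckQuorum(2)); // false (2 of 5)
            System.out.println(proposal.ackAndCheckQuorum(3)); // true  (3 of 5: majority)
        }
    }

In the actual protocol the Leader's own Ack counts toward the quorum as well, so with five servers two Follower Acks already complete it; the sketch above simply treats all Acks uniformly.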
During the startup and operation of a Zookeeper cluster, if the Leader crashes, the network is disconnected, the service is stopped or restarted, or a new server joins the cluster, ZAB makes the cluster quickly enter crash recovery mode and elect a new Leader node; during this period the cluster does not provide any services to the outside world. Once a new Leader has been elected and more than half of the Follower nodes have synchronized their state with it, the ZAB protocol switches the Zookeeper cluster from crash recovery mode back to message broadcast mode. The purpose of crash recovery is to ensure that the cluster quickly elects a new Leader and completes state synchronization with the Followers, so that it can enter message broadcast mode and provide services again as soon as possible.
The main task of Zookeeper crash recovery is Leader election, and there are two scenarios for it: electing a Leader when the Zookeeper servers start up, and electing a Leader after the current Leader crashes while the cluster is running. Before describing the Leader election process in detail, we need to introduce a few parameters:

myid: the server ID configured for each node in the cluster, a unique integer; the larger it is, the more weight the node carries in an election.
zxid: the transaction ID, which reflects how up to date a node's data is; the larger the zxid, the more recent the data.
epoch: the logical clock, i.e., the number of the Leader's term; it is incremented every time a new Leader is elected.
In addition, during the election process the current state of each node transitions between the following states:

LOOKING: the node cannot find a Leader and is taking part in an election.
FOLLOWING: the node has acknowledged an elected Leader and acts as a Follower.
LEADING: the node has been elected and acts as the Leader.
OBSERVING: the node is an Observer and does not participate in elections.
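For reference, these four states map one-to-one onto an enum in the Zookeeper codebase (QuorumPeer.ServerState); the stand-alone copy below is just for illustration:

    // Mirrors Zookeeper's org.apache.zookeeper.server.quorum.QuorumPeer.ServerState;
    // the names match the real enum, this declaration is illustrative only.
    public enum ServerState {
        LOOKING,    // no Leader known; the node is electing
        FOLLOWING,  // a Leader has been acknowledged; acting as Follower
        LEADING,    // this node has been elected Leader
        OBSERVING   // Observer; never participates in elections
    }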
Suppose there is a cluster of five Zookeeper servers, Server1, Server2, Server3, Server4, and Server5, with myids 1, 2, 3, 4, and 5 respectively. Since zxid and epoch are both 0 at startup, the key factor in the Leader election becomes myid.
When the Zookeeper cluster is first started, zxid and epoch play no part in the Leader election. But if the Leader crashes after the cluster has been running for a while, epoch and zxid become more important than myid in the election; the order of importance is epoch > zxid > myid. A Follower enters Leader election when it loses communication with the Leader, at which point it contacts the other nodes in the cluster, and two situations can arise: either the Leader is in fact still running and the Follower merely lost its connection, in which case the Follower simply reconnects and synchronizes with the Leader again, or the Leader has really crashed, in which case the remaining nodes start a new round of election.
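This ordering can be sketched as a small comparison predicate, loosely modeled on the totalOrderPredicate method of Zookeeper's FastLeaderElection; the Vote record and the method name supersedes here are simplified stand-ins, not the real API:

    // Decide whether vote 'a' beats vote 'b' during Leader election:
    // compare epoch first, then zxid, then myid, as described above.
    public class VoteComparison {
        record Vote(long epoch, long zxid, long myid) {}

        static boolean supersedes(Vote a, Vote b) {
            if (a.epoch() != b.epoch()) return a.epoch() > b.epoch();
            if (a.zxid()  != b.zxid())  return a.zxid()  > b.zxid();
            return a.myid() > b.myid();
        }

        public static void main(String[] args) {
            Vote v1 = new Vote(2, 100, 1); // newer epoch, smaller myid
            Vote v2 = new Vote(1, 200, 5); // older epoch, larger zxid and myid
            System.out.println(supersedes(v1, v2)); // true: epoch dominates
        }
    }

A node that receives a vote superseding its own adopts that vote and rebroadcasts it; once more than half of the nodes agree on one vote, its owner becomes the Leader.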
This post-crash Leader election mechanism is easy to understand: if the Leader fails, the cluster first prefers the node from the most recent Leader term (largest epoch) as the new Leader; next it prefers the node with the most recent committed transactions (largest zxid); and only as a final tie-breaker does it fall back to the largest machine number (myid) to decide the vote.