zookeeper

The client's Watcher

The key is a String — the znode's path — and the watchers fall into three main categories:

dataWatches: watchers on a node's data

existWatches: watchers on whether a node exists

childWatches: watchers on changes to a node's set of children (e.g. 0 -> 1, 1 -> 2), triggered whenever the number of children changes
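The three watch categories above can be pictured as three maps keyed by path, with ZK's one-shot firing semantics. This is a minimal in-process sketch for illustration only, not the real ZooKeeper client code; the class and method names are made up.

```python
# Sketch of a client-side watch registry mirroring the three maps
# (dataWatches, existWatches, childWatches) keyed by znode path.
from collections import defaultdict

class WatchRegistry:
    def __init__(self):
        # path -> set of watcher callbacks, one map per watch category
        self.data_watches = defaultdict(set)
        self.exist_watches = defaultdict(set)
        self.child_watches = defaultdict(set)

    def add(self, category, path, watcher):
        getattr(self, category)[path].add(watcher)

    def trigger(self, category, path, event):
        # ZK watches are one-shot: fire each watcher once, then drop it
        fired = getattr(self, category).pop(path, set())
        for w in fired:
            w(event)
        return len(fired)

reg = WatchRegistry()
events = []
reg.add("data_watches", "/app/config", lambda e: events.append(e))
assert reg.trigger("data_watches", "/app/config", "NodeDataChanged") == 1
# one-shot semantics: a second trigger fires nothing
assert reg.trigger("data_watches", "/app/config", "NodeDataChanged") == 0
```

A real client must re-register the watch after each notification if it wants to keep watching.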

A couple of questions about ZK's watcher mechanism:

ZK's communication protocol

There are two categories of messages: requests and responses. The protocol is an application-layer protocol on top of TCP/IP: it specifies the format of the application-layer messages and the meaning of each field. Both the client and the server follow this convention; frameworks other than ZK cannot parse it.

The request and response message formats are widely documented online, so I won't repeat them here.

So, back to the question: what is distributed coordination? Consider an RPC scenario like this:

In the diagram above, half of the clients play the service (provider) role and half play the consumer role, and ZK coordinates between them: a consumer-role client depends on a service-role client to provide the service. That is one kind of distributed coordination (coordinating half of the clients to depend on the other half). We can extend the idea: in a distributed configuration center such as Disconf, one client plays the administrator (configurator) role. That process writes some configuration data to a node in ZK, and all the other clients then receive the latest configuration at the same time. That is another kind of coordination (the other clients depend on the configured value to run their business logic).

ZooKeeper is a typical distributed data consistency solution. Distributed applications can use ZooKeeper to implement features such as publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.

At first glance this definition is abstract and hard to grasp; an example makes it much easier. Suppose you have a resource — for simplicity, a local file or a local movie. To make the resource reliable, you copy it to three machines, giving it three URLs on the network: machine1/xxx/xxx/xxx/xxx.pdf, machine2/xxx/xxx/xxx/xxx.pdf, machine3/xxx/xxx/xxx/xxx.pdf. A requester should certainly not hard-code those three addresses locally — that makes scaling difficult. What is needed is a unified naming service that encapsulates the three real addresses of the resource and exposes a single, stable address to the outside. Such a naming service could be implemented over HTTP, or with ZK (via the ZAB protocol).
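The core of such a naming service can be sketched as a registry that maps one logical name to the real replica addresses. This is a toy in-process model, not ZK itself; the class, logical name, and addresses are all made up for illustration.

```python
# Sketch of a unified naming service: one logical name maps to several
# real replica addresses, so callers never hard-code machine addresses.
class NamingService:
    def __init__(self):
        self._registry = {}  # logical name -> list of real addresses

    def register(self, name, address):
        self._registry.setdefault(name, []).append(address)

    def resolve(self, name):
        # The caller sees one stable name, regardless of replica changes
        return list(self._registry.get(name, []))

ns = NamingService()
for host in ("machine1", "machine2", "machine3"):
    ns.register("/res/xxx.pdf", f"{host}/xxx/xxx/xxx.pdf")
assert len(ns.resolve("/res/xxx.pdf")) == 3
```

In ZK the registry would be a znode tree, and clients would watch it for replica changes instead of polling.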

With the above understood, it is easy to generalize the definition of "resource". The resource above can become a service, say xxx.xxx.xxx.UserService: there can be many providers of the service, but to encapsulate the real providers we expose only one external address, xxx.xxx.xxx.UserService. Sound familiar? That is exactly how Dubbo's ZK registry works.

The definition itself needs little explanation. For a concrete case, look at Dubbo's use of ZK: a provider publishes or modifies the real address of a service, consumers receive a real-time notification of the change, and then pull the latest data from ZK. That is ZK's publish/subscribe, implemented with the Listener (observer) pattern. It has many application scenarios: RPC service discovery, configuration distribution from a config center, and so on.
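The notify-then-pull flow can be sketched with one-shot watchers, the way ZK clients actually behave: the notification carries no data, so the subscriber re-reads the node and re-registers its watch. This is an in-process simulation; the class and config strings are invented for illustration.

```python
# Sketch of ZK-style publish/subscribe for config distribution.
class PubSubNode:
    def __init__(self):
        self.data = None
        self._watchers = []

    def subscribe(self, callback):
        self._watchers.append(callback)

    def publish(self, data):
        self.data = data
        watchers, self._watchers = self._watchers, []  # one-shot firing
        for cb in watchers:
            cb("NodeDataChanged")

node = PubSubNode()
seen = []
def on_change(event):
    # on notification: pull the latest data, then re-subscribe
    seen.append(node.data)
    node.subscribe(on_change)

node.subscribe(on_change)
node.publish("timeout=30")
node.publish("timeout=60")
assert seen == ["timeout=30", "timeout=60"]
```

This is exactly the "consumer receives a change notification, then pulls the latest data from ZK" pattern described above.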

So-called cluster management cares about two things: whether machines leave or join, and Master election.

Detecting machines joining and leaving is just ZK publish/subscribe again, so there is nothing more to say about it.

The Master election here is not ZK's internal leader election (the two are very different). The Master election here is the election of a business caller — a process in the client role.

Business-side Master election is actually very simple, and does not need a reliable election algorithm the way ZK's internal election does (Paxos includes a proposal election, and there is more than one election algorithm). ZK itself provides an API that reliably creates a unique node, guaranteeing that among concurrent calls exactly one succeeds. So business Master election can be reduced to multiple business processes concurrently calling that API: whoever creates the unique node successfully is the Master, and the other processes are slaves. Doesn't that look a lot like a distributed lock implementation? It is only missing the notification when the lock is released.
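The reduction to "create a unique node" can be sketched with an atomic create-if-absent primitive standing in for ZK's create() call, which guarantees exactly one concurrent creator succeeds. This is an in-memory simulation, not a real ZK client; the class and path are made up.

```python
# Sketch of business-side Master election on top of create-unique-node.
import threading

class FakeZk:
    def __init__(self):
        self._nodes = set()
        self._lock = threading.Lock()

    def create(self, path):
        # atomic "create if absent": returns True for exactly one caller
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes.add(path)
            return True

zk = FakeZk()
roles = {}

def campaign(client_id):
    roles[client_id] = "master" if zk.create("/election/master") else "slave"

threads = [threading.Thread(target=campaign, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert list(roles.values()).count("master") == 1
assert list(roles.values()).count("slave") == 4
```

In real deployments the election node would be ephemeral, so the Master's session expiring automatically frees the slot for a new election.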

There is not much to say about soft load balancing; it is a variant of registration and discovery, or publish and subscribe. You fetch the list of all providers of a uniformly named service, then choose one of them with a load-balancing algorithm; the strategy can be random, round-robin, consistent hashing, and so on. Dubbo implements load balancing in the consumer's call logic, not in the get-list interface — ZK does not do load balancing; applications built on ZK do.
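Consumer-side selection over a fetched provider list can be sketched in a few lines; the provider addresses are invented placeholders, and these two strategies are just the random and round-robin examples named above.

```python
# Sketch of consumer-side soft load balancing over a provider list.
import random
from itertools import cycle

providers = ["10.0.0.1:20880", "10.0.0.2:20880", "10.0.0.3:20880"]

def pick_random(plist):
    return random.choice(plist)

round_robin = cycle(providers)
def pick_round_robin(_plist):
    return next(round_robin)

# random stays inside the list; round-robin cycles deterministically
assert pick_random(providers) in providers
assert [pick_round_robin(providers) for _ in range(4)] == [
    "10.0.0.1:20880", "10.0.0.2:20880", "10.0.0.3:20880", "10.0.0.1:20880"]
```

The key point is the division of labor: ZK only serves the list (and notifies on changes); the selection happens entirely in the consumer.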

Data publish/subscribe works exactly as described above, so no further details.

Much like the Master election described above: a group of business-caller clients call the API that creates a unique node, and whoever creates it successfully holds the lock. Because ZK's underlying implementation is strictly ordered, whichever request ZK receives first inevitably gets the lock. Note that this is not necessarily the client that sent its request first: requests travel over the network, and there is no way to determine which client issued its request earliest.

But succeeding at creating a unique node is not yet a full lock. A lock naturally has two more features: reentrancy, and releasing the lock to wake up other waiting clients. ZK cannot provide reentrancy by itself; the business side must implement it. Releasing the lock wakes up waiters through a Listener (watcher) notification.
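The acquire/watch/release cycle can be sketched as follows — an in-process stand-in for ZK where acquiring means creating the unique node, a failed acquirer registers a watcher, and release deletes the node and notifies waiters so they can retry. The class and client names are made up for illustration.

```python
# Sketch of a ZK-style distributed lock with release notification.
class FakeLockNode:
    def __init__(self):
        self.owner = None
        self._waiters = []

    def try_acquire(self, client, on_release=None):
        if self.owner is None:        # "create unique node" succeeds
            self.owner = client
            return True
        if on_release:
            self._waiters.append(on_release)  # watch for release
        return False

    def release(self, client):
        assert self.owner == client, "only the owner may release"
        self.owner = None             # "delete node"
        waiters, self._waiters = self._waiters, []
        for cb in waiters:
            cb()                      # notified clients race to re-acquire

lock = FakeLockNode()
log = []
assert lock.try_acquire("A")
# B fails, and registers a watcher that retries on release
lock.try_acquire("B", on_release=lambda: log.append(lock.try_acquire("B")))
lock.release("A")
assert log == [True] and lock.owner == "B"
```

Note the herd effect in this naive form: every waiter wakes and races on release. The usual refinement is sequential ephemeral nodes, where each waiter watches only its predecessor.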

Looking back over all the scenarios above, they are essentially calls to two core ZK interfaces: creating nodes and listening for node changes.

ZooKeeper stores data in memory, which gives it high throughput and low latency (but memory limits how much it can store — one more reason to keep the data in each znode small). Adding machines to a ZK cluster does not increase its capacity, because every machine in the cluster keeps a full copy of all the data; the machine with the least memory determines the memory capacity of the entire cluster.

ZooKeeper is high performance, especially in applications with more reads than writes, because a write must synchronize state across all servers. (A read-dominated workload is the typical scenario for a coordination service.)

At bottom, ZooKeeper really provides only two functions:

managing node data (creating, deleting, and updating nodes)

listening for node changes and notifying clients (watchers)