Salvatore Sanfilippo, the author of Redis, once compared these two memory-based data storage systems:
1. Server-side data manipulation: compared to Memcached, Redis offers more data structures and a richer set of operations on them. With Memcached, you have to fetch the data to the client, make similar changes there, and set it back, which greatly increases the number of network I/O round trips and the volume of data transferred. In Redis, these complex operations are usually as efficient as a plain GET/SET. So if you need the cache to support more complex structures and operations, Redis is a good choice.
2. Memory usage efficiency: for simple key-value storage, Memcached's memory utilization is higher; but if Redis stores key-value data in a hash structure, its memory utilization will exceed Memcached's thanks to its compact combined encoding.
3. Performance: since Redis uses only a single core while Memcached can use multiple cores, Redis on average has higher per-core performance than Memcached when storing small data. For data over 100 KB, Memcached's performance is higher than Redis's. Although Redis has recently been optimized for storing large values, it still lags slightly behind Memcached there.
The material collected below explains why these conclusions hold:
1. Different data type support
Unlike Memcached, which only supports simple key-value records, Redis supports a much richer set of data types. The five most commonly used are String, Hash, List, Set, and Sorted Set. Internally, Redis uses a redisObject object to represent every key and value. redisObject's most important fields are shown in the figure:
type indicates the data type of a value object. encoding is how that type is stored inside Redis: for example, type=string means the value is stored as an ordinary string, and the corresponding encoding can be raw or int. If it is int, Redis actually stores and manipulates the string as a number, provided, of course, that the string itself can be parsed as one, such as "123" or "456". The vm field only has memory actually allocated for it when Redis's virtual memory feature is turned on, which it is not by default.
1) String
Common commands: set/get/decr/incr/mget, etc.;
Application scenario: String is the most commonly used data type, ordinary key/value storage can be categorized as such;
Implementation: a String is stored inside Redis as a character string referenced by a redisObject. When operations such as incr or decr are encountered, it is converted to a numeric type for the calculation, and the redisObject's encoding field becomes int.
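The encoding switch described above can be sketched in a few lines of Python. This is an illustration of the idea, not Redis source; the class and field names are made up for the example.

```python
# Sketch (not Redis internals): a value object that starts as a raw string
# and switches to an int encoding when INCR forces numeric interpretation.
class ValueObject:
    def __init__(self, raw):
        self.encoding = "raw"   # like redisObject's encoding field
        self.value = raw

    def incr(self):
        # INCR requires the string to parse as an integer; otherwise it errors
        n = int(self.value) + 1      # raises ValueError for "abc"
        self.encoding = "int"
        self.value = str(n)
        return n

obj = ValueObject("123")
obj.incr()   # value becomes "124", encoding becomes "int"
```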
2) Hash
Common commands: hget/hset/hgetall, etc.
Application scenario: we want to store a user information object data, which includes user ID, user name, age and birthday, through the user ID we want to get the user's name or age or birthday;
Implementation: a Redis Hash is internally a Value stored as a HashMap, with an interface for direct access to that Map's members. As shown in the figure, Key is the user ID and value is a Map whose keys are the attribute names and whose values are the attribute values (Redis calls the inner Map's key a field). Data can thus be modified and read directly via key (user ID) + field (attribute name). The inner HashMap currently has two implementations: when it has relatively few members, Redis saves memory by storing it compactly in something like a one-dimensional array rather than a real HashMap, in which case the encoding of the value's redisObject is zipmap; when the number of members grows, it is automatically converted to a real HashMap, and the encoding becomes ht.
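The zipmap-to-ht promotion can be sketched as below. The threshold of 4 is arbitrary for the example (real Redis uses configurable limits), and the class is illustrative, not Redis source.

```python
# Sketch of the small-hash optimization: few fields are kept as a compact
# flat list of (field, value) pairs ("zipmap"); past a threshold the
# structure is promoted to a real hash table ("ht").
class SmallHash:
    THRESHOLD = 4   # illustrative; Redis's limits are configurable

    def __init__(self):
        self.encoding = "zipmap"
        self.store = []              # compact: list of (field, value)

    def hset(self, field, value):
        if self.encoding == "zipmap":
            for i, (f, _) in enumerate(self.store):
                if f == field:       # linear scan is fine for tiny hashes
                    self.store[i] = (field, value)
                    return
            self.store.append((field, value))
            if len(self.store) > self.THRESHOLD:
                self.store = dict(self.store)    # promote to real hash table
                self.encoding = "ht"
        else:
            self.store[field] = value

    def hget(self, field):
        if self.encoding == "zipmap":
            return next((v for f, v in self.store if f == field), None)
        return self.store.get(field)
```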
3) List
Common commands: lpush/rpush/lpop/rpop/lrange, etc.;
Application scenarios: Redis lists have many applications and the list is one of the most important data structures in Redis; for example, Twitter's following list and follower list can be implemented with the Redis list structure;
Implementation: the Redis list is implemented as a doubly linked list, which supports reverse lookup and traversal and is more convenient to operate on, at the cost of some extra memory overhead. Many parts of Redis's internals, including the send buffer queue, use this data structure.
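A minimal doubly linked list showing why both ends are O(1) might look like the sketch below. Real Redis lists add further compaction (ziplist/quicklist encodings); this only demonstrates the two-way links.

```python
# Sketch of LPUSH/RPUSH/LPOP/RPOP on a doubly linked list: each node links
# both ways, so pushing/popping at either end and reverse traversal are cheap.
class Node:
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class DList:
    def __init__(self):
        self.head = self.tail = None

    def lpush(self, value):
        node = Node(value)
        node.next = self.head
        if self.head: self.head.prev = node
        self.head = node
        if self.tail is None: self.tail = node

    def rpush(self, value):
        node = Node(value)
        node.prev = self.tail
        if self.tail: self.tail.next = node
        self.tail = node
        if self.head is None: self.head = node

    def lpop(self):
        node = self.head
        if node is None: return None
        self.head = node.next
        if self.head: self.head.prev = None
        else: self.tail = None
        return node.value

    def rpop(self):
        node = self.tail
        if node is None: return None
        self.tail = node.prev
        if self.tail: self.tail.next = None
        else: self.head = None
        return node.value
```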
4) Set
Common commands: sadd/spop/smembers/sunion, etc.
Application scenarios: a Redis set provides functionality much like a list's, except that a set de-duplicates automatically. When you need to store a list of data without repeated entries, a set is a good choice. A set also provides an important interface for testing whether a member belongs to the collection, something a list cannot offer;
Implementation: internally, a set is a HashMap whose values are always null; de-duplication is done quickly by computing the hash, which is also why a set can test membership efficiently.
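That "hash table with null values" idea can be sketched directly; the class below is illustrative, not Redis source.

```python
# Sketch of a set built on a hash table whose values are always null:
# add, membership test, and removal all reduce to O(1) hash operations.
class HashSet:
    def __init__(self):
        self.table = {}                  # member -> None; values never used

    def sadd(self, member):
        added = member not in self.table
        self.table[member] = None        # the value is always "null"
        return added                     # True only if newly inserted

    def sismember(self, member):
        return member in self.table      # the O(1) test a list cannot provide
```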
5) Sorted Set
Common commands: zadd/zrange/zrem/zcard, etc.
Application scenarios: a Redis sorted set is used in scenarios similar to a set's. The difference is that a set is not automatically ordered, whereas a sorted set lets the user supply an extra priority parameter (score) for each member and keeps the members ordered on insertion, i.e., automatically sorted. When you need an ordered, duplicate-free collection, choose the sorted set data structure; for example, Twitter's public timeline can be stored with the publication time as the score, so that fetching it returns entries automatically sorted by time.
Implementation: internally, a Redis sorted set uses a HashMap and a skip list (SkipList) to keep the data both stored and ordered. The HashMap holds the mapping from member to score, while the skip list holds all the members sorted by the scores stored in the HashMap. The skip list structure yields fairly high search efficiency and is relatively simple to implement.
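The two-structure design can be sketched as below. For brevity, a bisect-maintained sorted list stands in for the skip list (same ordered-by-score role, different complexity profile); the class is illustrative, not Redis source.

```python
# Sketch of the dual structure: a dict gives O(1) member -> score lookup,
# while an ordered structure keeps (score, member) pairs sorted for ZRANGE.
import bisect

class SortedSet:
    def __init__(self):
        self.scores = {}     # member -> score (the "HashMap")
        self.ordered = []    # [(score, member)] kept sorted (skip-list stand-in)

    def zadd(self, score, member):
        if member in self.scores:                    # re-score: drop old entry
            self.ordered.remove((self.scores[member], member))
        self.scores[member] = score
        bisect.insort(self.ordered, (score, member))

    def zrange(self, start, stop):
        # inclusive stop index, like Redis ZRANGE
        return [m for _, m in self.ordered[start:stop + 1]]
```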
2. Different memory management mechanisms
In Redis, not all data is kept in memory at all times; this is one of the biggest differences from Memcached. When physical memory runs out, Redis can swap values that have not been used for a long time out to disk, while keeping all the key information cached in memory. When Redis finds that memory usage exceeds a certain threshold, it triggers a swap operation: using the formula swappability = age*log(size_in_memory), Redis calculates which keys' values should be swapped to disk, then persists those values to disk and clears them from memory. This feature lets Redis hold more data than fits in the machine's own memory; the memory must still be able to hold all the keys, since keys are never subject to swapping.

At the same time, when Redis swaps in-memory data to disk, the main thread serving requests and the child thread performing the swap share this part of memory, so if you update data that is being swapped, Redis blocks the operation until the child thread finishes the swap before the change can be made. When reading data, if the value for the requested key is not in memory, Redis must load the data from the swap file before returning it to the requester, and here there is an I/O thread pool issue: by default, Redis blocks, i.e., it finishes loading all the needed swap-file data before responding. This strategy is appropriate when the number of clients is small and operations are batched, but it clearly cannot cope with the high concurrency of a large web application. Redis therefore allows us to set the size of the I/O thread pool and handle read requests that must load data from the swap file concurrently, reducing the blocking time.
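The swappability ranking above can be sketched directly from the formula. The helper names are made up for the example; only the formula comes from the text.

```python
# Sketch of the swap-candidate ranking: swappability = age * log(size),
# so old, large values are swapped to disk first.
import math

def swappability(age_seconds, size_in_memory):
    # clamp size at 2 so log() of a tiny object can't go <= 0
    return age_seconds * math.log(max(size_in_memory, 2))

def pick_swap_candidates(entries, n):
    # entries: iterable of (key, age_seconds, size_bytes)
    ranked = sorted(entries, key=lambda e: swappability(e[1], e[2]), reverse=True)
    return [e[0] for e in ranked[:n]]
```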
For memory-based database systems like Redis and Memcached, memory management efficiency is a key factor in system performance. The malloc/free functions of traditional C are the most common way to allocate and free memory, but this approach has significant drawbacks: first, mismatched malloc and free calls easily lead developers into memory leaks; second, frequent calls produce large amounts of memory fragmentation that cannot be reclaimed and reused, lowering memory utilization; and lastly, as system-level calls, their overhead is much larger than that of ordinary function calls. Therefore, to manage memory more efficiently, high-performance systems do not use malloc/free directly. Redis and Memcached both use memory management mechanisms of their own design, though their implementations differ significantly; both mechanisms are introduced below.
Memcached uses the Slab Allocation mechanism to manage memory by default, and the main idea is to split the allocated memory into blocks of a specific length according to a predefined size to store key-value data records of the corresponding length, in order to completely solve the problem of memory fragmentation.
The Slab Allocation mechanism is designed only for storing external data; that is, all key-value data is stored in the Slab Allocation system, while Memcached's other memory requests go through ordinary malloc/free, because their number and frequency mean they will not affect the performance of the system as a whole. The principle of Slab Allocation is fairly simple. As shown in the figure, it first requests a large chunk of memory from the operating system, splits it into Chunks of various sizes, and groups Chunks of the same size into Slab Classes, where a Chunk is the smallest unit used to store key-value data. The size of each Slab Class can be controlled by setting the Growth Factor when Memcached starts. Assuming the Growth Factor in the figure is 1.25, if the first group's Chunk size is 88 bytes, the second group's will be 112 bytes, and so on.
When Memcached receives data from a client, it first selects the most suitable Slab Class according to the size of the received data, and then finds a Chunk that can be used for storing data by querying the list of free Chunks in the Slab Class that Memcached keeps. When a database record expires or is discarded, the Chunk occupied by the record can be reclaimed and re-added to the free list. From the above process, we can see that Memcached's memory management system is efficient and does not cause memory fragmentation, but its biggest drawback is that it leads to space wastage. Because each chunk is allocated a specific length of memory space, variable-length data cannot fully utilize this space. For example, if you cache 100 bytes of data into a 128-byte Chunk, the remaining 28 bytes are wasted.
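The size-class ladder and the wasted-space effect can be sketched as below. The 8-byte alignment rounding is how Memcached derives its classes; the function names and the choice of five classes are illustrative.

```python
# Sketch of slab sizing and selection: chunk sizes grow by the Growth
# Factor (rounded up to 8-byte alignment), and an item is stored in the
# smallest chunk that fits, wasting the remainder of that chunk.
def chunk_sizes(first=88, factor=1.25, classes=5):
    sizes, size = [], first
    for _ in range(classes):
        sizes.append(size)
        size = ((int(size * factor) + 7) // 8) * 8   # next class, 8-byte aligned
    return sizes

def pick_class(item_size, sizes):
    for s in sizes:
        if s >= item_size:
            return s, s - item_size   # (chunk size used, bytes wasted)
    return None, None                 # item too large for any slab class

sizes = chunk_sizes()   # [88, 112, 144, ...] with Growth Factor 1.25
pick_class(100, sizes)  # a 100-byte item lands in a 112-byte chunk, wasting 12
```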
Redis's memory management is implemented in the source files zmalloc.h and zmalloc.c. To simplify memory management, when Redis allocates a block of memory it stores the block's size in the block's header. As shown in the figure, real_ptr is the pointer returned when Redis calls malloc. Redis stores the block size, size, in the header (size itself occupies a known amount of memory, the length of a size_t), and then returns ret_ptr. When the memory needs to be freed, ret_ptr is passed to the memory manager, which can easily compute real_ptr from it and then pass real_ptr to free.
Redis records all of its memory allocations by defining an array of length ZMALLOC_MAX_ALLOC_STAT. Each element of the array holds the number of memory blocks the program has allocated of the size given by that element's index; in the source code this array is zmalloc_allocations, so zmalloc_allocations[16] is the number of 16-byte memory blocks that have been allocated. zmalloc.c also has a static variable, used_memory, which records the total size of currently allocated memory. In summary, Redis uses wrapped malloc/free, which is much simpler than Memcached's approach to memory management.
3. Different data persistence support
Although Redis is a memory-based storage system, it supports persisting its in-memory data and provides two major persistence strategies: RDB snapshots and AOF logs. Memcached does not support any data persistence.
1) RDB snapshots
Redis supports a persistence mechanism that saves a snapshot of the current data as a data file: the RDB snapshot. But how can a continuously written database produce a snapshot? Redis uses fork's copy-on-write mechanism: when generating a snapshot, the current process forks a child process, which then loops over all the data and writes it to an RDB file. We can configure when RDB snapshots are generated through Redis's save directive, for example a snapshot every 10 minutes, or a snapshot after 1000 writes, and multiple rules can be combined. These rules are defined in Redis's configuration file, and can also be set at runtime with Redis's CONFIG SET command, without restarting Redis.
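The way multiple save rules combine can be sketched as below; the function name is made up, but the rule shape matches redis.conf's `save <seconds> <changes>` lines, and a snapshot fires when any rule is satisfied.

```python
# Sketch of combined save rules: snapshot when ANY (seconds, min_changes)
# rule is satisfied, e.g. `save 600 1` plus `save 60 10000` in redis.conf.
def should_snapshot(rules, seconds_since_last_save, changes_since_last_save):
    # rules: list of (seconds, min_changes) pairs
    return any(seconds_since_last_save >= secs and changes_since_last_save >= n
               for secs, n in rules)

rules = [(600, 1), (60, 10000)]
should_snapshot(rules, 700, 3)      # True: 600s elapsed with at least 1 change
should_snapshot(rules, 30, 20000)   # False: no rule's time window reached yet
```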
Redis's RDB files never become corrupted, because their writes are performed in a new process: when a new RDB file is generated, the forked child first writes the data to a temporary file and then renames the temporary file to the RDB file via the atomic rename system call, so that at any moment of failure the Redis RDB file remains usable. The RDB file is also one link in the internal implementation of Redis master-slave synchronization. RDB does have a shortcoming: once the database goes down, the data saved in the RDB file is not fully up to date, and everything written between the last RDB generation and the Redis outage is lost. Under certain business conditions this is tolerable.
2) AOF log
The full name of the AOF log is append-only file; it is an append-write log file. Unlike an ordinary database binlog, an AOF file is recognizable plain text whose contents are standard Redis commands, one after another. Only commands that modify data are appended to the AOF file. Since every modifying command generates a log entry, the AOF file keeps growing, so Redis provides a feature called AOF rewrite. It regenerates the AOF file so that each record's operation appears only once in the new file, unlike the old file, which may record several operations on the same value. The generation process is similar to RDB: a process is forked, it traverses the data directly, and writes it to a new temporary AOF file. While the new file is being written, all write-operation logs are still written to the old AOF file and also recorded in a memory buffer. When the rewrite completes, all the logs in the buffer are written to the temporary file at once, and the atomic rename call then replaces the old AOF file with the new one.
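The rewrite idea, collapsing repeated writes to one command per live key, can be sketched as below. Only SET/DEL are modelled, and the command tuples are an illustration, not Redis's actual AOF format.

```python
# Sketch of AOF rewrite: replay the old log into the current state, then
# emit one command per live key, so repeated writes collapse to one entry.
def rewrite_aof(old_log):
    state = {}
    for cmd, *args in old_log:          # replay; only SET/DEL modelled here
        if cmd == "SET":
            state[args[0]] = args[1]
        elif cmd == "DEL":
            state.pop(args[0], None)
    return [("SET", k, v) for k, v in state.items()]

log = [("SET", "a", "1"), ("SET", "a", "2"), ("SET", "b", "9"), ("DEL", "b")]
rewrite_aof(log)   # four entries shrink to [("SET", "a", "2")]
```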
AOF is a file-write operation whose purpose is to put the operation log on disk, so it runs into the same write-flow issues described above. After Redis calls write on the AOF, when fsync is called to flush it to disk is controlled by the appendfsync option; the three settings below provide progressively stronger safety.
appendfsync no: when appendfsync is set to no, Redis does not actively call fsync to synchronize the AOF log contents to disk; it all depends on the operating system's scheduling. For most Linux systems, an fsync is performed every 30 seconds to write buffered data to disk.
appendfsync everysec: when appendfsync is set to everysec, Redis by default makes one fsync call per second to write the buffered data to disk. But when an fsync call takes longer than one second, Redis adopts a delayed-fsync strategy and waits another second; that is, the fsync is performed after two seconds, and it is performed no matter how long it takes. Since the file descriptor is blocked during the fsync, the current write operation blocks as well. So the conclusion is that in most cases Redis fsyncs every second, and in the worst case every two seconds. This operation, known as a group commit in most database systems, combines the data of multiple write operations and writes the log to disk in one go.
appendfsync always: when appendfsync is set to always, every write operation triggers one fsync call. The data is safest this way, but since fsync runs every time, performance suffers.
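The three policies above can be sketched as a small append-only log writer. The class and attribute names are illustrative, not Redis internals; only the policy semantics come from the text.

```python
# Sketch of the three appendfsync policies for an append-only log file:
#   "always"   - fsync after every write (safest, slowest)
#   "everysec" - at most one fsync per second (group-commit-like)
#   "no"       - leave flushing entirely to the operating system
import os, time

class AofWriter:
    def __init__(self, path, policy="everysec"):
        self.f = open(path, "ab")
        self.policy = policy
        self.last_fsync = time.monotonic()

    def append(self, command_line):
        self.f.write(command_line.encode() + b"\n")
        self.f.flush()                               # push to the OS page cache
        if self.policy == "always":
            os.fsync(self.f.fileno())
        elif self.policy == "everysec":
            now = time.monotonic()
            if now - self.last_fsync >= 1.0:
                os.fsync(self.f.fileno())
                self.last_fsync = now
        # "no": never fsync ourselves
```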
For general business needs, RDB persistence is recommended, because RDB's overhead is much lower than AOF logging's; for applications that cannot tolerate data loss, AOF logging is recommended.
4. Different cluster management
Memcached is a fully in-memory data buffering system; Redis supports data persistence, but full in-memory operation is, after all, the essence of its high performance. For a memory-based storage system, the machine's physical memory caps the amount of data the system can hold. To handle more data than fits in a single machine's physical memory, a distributed cluster must be built to extend storage capacity.
Memcached itself does not support distribution, so distributed storage with Memcached can only be implemented on the client side through a distributed algorithm such as consistent hashing. The figure below shows the architecture of Memcached's distributed storage. When a client sends data to a Memcached cluster, it first computes the target node for that piece of data using the built-in distributed algorithm, then sends the data directly to that node for storage. Likewise, when a client queries data, it computes the node holding the data and sends the query request directly to that node to obtain the data.
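A minimal consistent-hash ring of the kind such clients use might look like the sketch below (MD5 as the ring hash and 100 virtual nodes per server are arbitrary choices for the example, not any particular client library's defaults).

```python
# Sketch of client-side consistent hashing: servers are placed on a ring by
# hash; each key maps to the first server clockwise from its own hash, so
# adding or removing one server only remaps a fraction of the keys.
import bisect, hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.ring = []                               # sorted [(point, node)]
        for node in nodes:
            for i in range(replicas):                # virtual nodes smooth load
                point = self._hash("%s#%d" % (node, i))
                bisect.insort(self.ring, (point, node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        point = self._hash(key)
        i = bisect.bisect(self.ring, (point, ""))
        return self.ring[i % len(self.ring)][1]      # wrap around the ring
```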
Compared with Memcached's purely client-side distributed storage, Redis prefers to build distributed storage on the server side. The latest versions of Redis already support it: Redis Cluster is an advanced version of Redis that is distributed, tolerates single-point failures, has no central node, and scales linearly. The figure below shows Redis Cluster's distributed storage architecture, in which nodes communicate with one another via a binary protocol and with clients via an ASCII protocol. In its data placement strategy, Redis Cluster divides the whole key space into a fixed number of hash slots (4096 in the early design described here; released versions use 16384), and each node can hold one or more hash slots, so the slot count also caps the number of nodes the cluster can have. The distributed algorithm Redis Cluster uses is also very simple: crc16(key) % HASH_SLOTS_NUMBER.
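The slot computation can be sketched directly. Redis Cluster's CRC16 is the XMODEM variant (polynomial 0x1021, initial value 0); 16384 slots is the released slot count, and the function names here are illustrative.

```python
# Sketch of Redis Cluster's key -> slot mapping: CRC16 (XMODEM variant,
# polynomial 0x1021) of the key, modulo the number of hash slots.
HASH_SLOTS_NUMBER = 16384   # slot count in released Redis Cluster

def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = (crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    return crc16(key.encode()) % HASH_SLOTS_NUMBER
```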
To keep data available under single-point failures, Redis Cluster introduces Master and Slave nodes. In a Redis Cluster, each Master node has two corresponding Slave nodes for redundancy, so the failure of any two nodes in the whole cluster will not make data unavailable. When a Master node drops out, the cluster automatically elects a Slave node to become the new Master.