A disk array is a fast, large-capacity external storage subsystem built from a number of magnetic or optical disk drives that are combined according to certain rules, such as striping, declustering and interleaving. Under the control and management of an array controller, it provides fast, parallel or interleaved access and strong fault tolerance. From the user's point of view, although a disk array may consist of several, dozens or even hundreds of disks, it can still be treated as a single disk whose capacity can reach hundreds or thousands of gigabytes, which is why the technology is so popular for multimedia systems.
The full name of the technology is Redundant Array of Inexpensive Disks, or RAID. It is a disk redundancy technology introduced in 1988 by Professor David Patterson and others at the University of California, Berkeley. Since then, disk array technology has developed rapidly and gradually matured. The following eight levels are now generally recognized.
1. RAID0 (RAID 0)
RAID0, also known as data striping, distributes data across multiple disks without any fault-tolerance measures. Its capacity and data transfer rate are N times those of a single drive, where N is the total number of drives in the array, so the I/O transfer rate is high. However, its mean time to failure (MTTF) is only 1/N of that of a single drive, so the level-0 array has the worst reliability of all the levels.
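The striping and MTTF relationships described above can be sketched as follows. This is a minimal illustration rather than any controller's actual layout: the function names `locate_block` and `array_mttf`, the round-robin placement of one block per drive, and the sample figures are all assumptions made for the example.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes simple round-robin striping, one block per stripe unit.
    """
    if num_disks < 1 or logical_block < 0:
        raise ValueError("num_disks must be >= 1 and logical_block >= 0")
    return logical_block % num_disks, logical_block // num_disks


def array_mttf(single_disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of an N-disk array with no redundancy: 1/N of a single drive."""
    if num_disks < 1:
        raise ValueError("num_disks must be >= 1")
    return single_disk_mttf_hours / num_disks


if __name__ == "__main__":
    # Hypothetical figures: 4 drives, each rated at 100,000 hours MTTF.
    for block in range(6):
        print(block, locate_block(block, 4))
    print(array_mttf(100_000, 4))
```

Consecutive logical blocks land on different drives, which is where the N-fold transfer rate comes from; the same spreading is why losing any one drive loses the whole array.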
2. RAID1 (RAID 1)
RAID1, also known as disk mirroring, uses mirroring for fault tolerance to improve reliability. Each working disk has a mirror disk; every write must go to the mirror disk at the same time, while reads are served from the working disk only. When a working disk fails, the array switches to the mirror disk immediately and reads from it, and the system then restores the correct data to the working disk. Data can therefore be reconstructed, but the working disks and mirror disks must be kept in one-to-one correspondence. This type of array is highly reliable, but its effective capacity is only half of the total capacity. RAID1 is therefore often used where error rates must be kept extremely low, such as in finance and accounting.
3. RAID2 (2-level disk array)
RAID2, also known as bit interleaving, uses Hamming codes for error checking across the disks, which removes the need for a CRC (Cyclic Redundancy Check) after each sector. A Hamming code is an (n, k) linear block code, where n is the codeword length, k is the number of data bits and r is the number of check bits, so that $n = 2^r - 1$ and $r = n - k$.
Bit-interleaved access is therefore the arrangement best suited to Hamming-code checking. This kind of array suits reading and writing large blocks of data, but the overhead of the redundant information is still too large, which has prevented its widespread use.
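The Hamming-code relations above can be turned into a small calculation of the redundancy overhead. The sketch below assumes one bit of each codeword per drive, so that check bits correspond to check disks; the function names `hamming_parameters` and `check_disks_needed` are hypothetical and chosen only for this illustration.

```python
def hamming_parameters(r: int) -> tuple:
    """Return (n, k) for a Hamming code with r check bits.

    Uses the relations n = 2**r - 1 and k = n - r.
    """
    if r < 2:
        raise ValueError("a Hamming code needs at least 2 check bits")
    n = 2 ** r - 1
    return n, n - r


def check_disks_needed(data_disks: int) -> int:
    """Smallest r whose Hamming code carries at least `data_disks` data bits.

    Assumes one codeword bit per drive (an assumption of this sketch).
    """
    if data_disks < 1:
        raise ValueError("data_disks must be >= 1")
    r = 2
    while hamming_parameters(r)[1] < data_disks:
        r += 1
    return r


if __name__ == "__main__":
    print(hamming_parameters(3))   # the classic (7, 4) code
    print(check_disks_needed(10))  # check drives for 10 data drives
```

Under these assumptions, 10 data drives need 4 check drives, which illustrates why the redundancy overhead is considered too large.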
4. RAID3 (3-level disk array)
RAID3 is a parallel-transfer array that tolerates the failure of a single disk. Its distinguishing feature is that the check disks are reduced to one (RAID2 needs several check disks, and RAID1 needs one check disk per data disk). Data is spread across the disks bit by bit or byte by byte, with each record scattered over the same sector number on every drive in the group. Its advantage is that the bandwidth of the whole array is fully used, which shortens bulk data transfers; its disadvantage is that every read or write involves the whole group, so only one I/O can be completed at a time.
5. RAID4 (4-level disk array)
RAID4 is an array in which the disks of a group can be read and written independently. It also has only one check disk.
The difference between RAID4 and RAID3 is that RAID3 interleaves by bit or byte, while RAID4 interleaves by block (sector), so a single disk can be operated on independently. Unlike RAID3, where even a small I/O operation involves the whole group, a small I/O operation in RAID4 needs to involve only two drives in the group (one data disk and the check disk). This improves the I/O rate for small amounts of data.
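Why a small write needs only the data disk and the check disk can be shown with the usual read-modify-write parity update. The text does not spell out the check computation, so this sketch assumes XOR parity, and the helper names `xor_bytes` and `small_write_parity` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity after overwriting one data block.

    Only the old data block and the old parity block are read;
    the other data disks of the group are not touched.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


if __name__ == "__main__":
    # Hypothetical group: three data blocks plus one parity block.
    blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])

    new_block = b"\xaa\x55"
    updated = small_write_parity(blocks[1], new_block, parity)

    # Same result as recomputing parity over the whole group.
    full = xor_bytes(xor_bytes(blocks[0], new_block), blocks[2])
    print(updated == full)
```

The shortcut works because XOR-ing the old data out of the parity and the new data in leaves the contribution of every other disk unchanged.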
6. RAID5 (5-level disk array)
RAID5 is an independent-access array with rotating parity. Unlike RAID1, 2, 3 and 4, it has no fixed check disk; instead, the redundant parity information is distributed evenly over all the disks of the array according to a placement rule, so the same drive holds both data and parity. This removes the contention for the check disk, so RAID5 allows multiple concurrent writes within the same group. RAID5 is therefore suitable both for large data transfers and for all kinds of transaction processing. It is a fast, high-capacity, fault-tolerant array.
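One possible placement rule for rotating parity is sketched below. The text only says that parity is distributed "according to some rules", so the particular rotation used here (parity moves one drive to the left on each successive stripe) and the names `parity_disk` and `stripe_layout` are assumptions for illustration.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the drive holding parity for the given stripe.

    Assumed rule: start on the last drive and rotate left by one per stripe.
    """
    if num_disks < 2 or stripe < 0:
        raise ValueError("need num_disks >= 2 and stripe >= 0")
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list:
    """Return 'P' for the parity drive and 'D' for data drives in one stripe."""
    p = parity_disk(stripe, num_disks)
    return ["P" if d == p else "D" for d in range(num_disks)]


if __name__ == "__main__":
    # Hypothetical 4-drive array: print the first 8 stripes.
    for s in range(8):
        print(s, " ".join(stripe_layout(s, 4)))
```

Because consecutive stripes keep their parity on different drives, two writes to different stripes can update parity at the same time, which a single dedicated check disk would not allow.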
7. RAID6 (6-level disk array)
RAID6 is an independent-access disk array with two-dimensional parity. Its redundant checking and error-correction information is distributed evenly over all the disks, while the data is still interleaved across the disks in blocks of variable size. This type of array tolerates the failure of two disks.
8. RAID7 (7-level disk array)
RAID7 builds on RAID6 and adds cache technology, which considerably improves the transfer rate and response time. The cache is a high-speed buffer memory: data is written to the cache before it is written to the disk array. The cache block size is generally the same as the array's data block size, so that one cache block corresponds to one disk block. During a write, the data is written to two independent caches, so that it is not lost even if one cache fails. Writes are acknowledged at the cache level and transferred to the disk array afterwards. When data is flushed from the cache to the array, data on the same track is written in a single operation, which avoids writing many blocks separately and improves speed. On reads, the host likewise reads from the cache rather than from the disks, which reduces the number of disk reads and makes fuller use of the disk bandwidth.
This combination of cache and disk array technology makes up for the shortcomings of disk arrays, such as slow response to block write requests. It gives users an efficient, fast, high-capacity, highly reliable, flexible and convenient storage system that meets the needs of current technology, especially multimedia systems.
Analysis of key disk array technologies
Storage has always attracted wide attention in computing, and server storage is a focus of the industry. Mention server storage, and people almost immediately think of SCSI (Small Computer System Interface). Inexpensive IDE hard disks have improved greatly in performance, capacity and other key specifications, and can meet or even exceed the needs of earlier server storage equipment. However, with the popularity and rapid growth of the Internet, network servers have become ever larger, and the Internet places harsh demands not only on the servers themselves but also on server storage technology. This endless market demand drives the rapid development of server storage. The disk array is one of the more mature server storage technologies and one of the more common high-capacity peripherals on the market.
At the high end, traditional storage models can no longer meet the growing storage needs of specialized applications, whether in scale, security or performance. New technologies and approaches such as Storage Area Networks (SANs) keep emerging, along with new storage architectures and solutions, and server storage is expanding from Direct Attached Storage (DAS) to networked storage such as Network Attached Storage (NAS). At the middle and low end, continuing hardware advances and strong market demand keep raising the speed, performance and capacity of local, direct-attached disk array storage to new levels. Moreover, driven by users' demands for data security, access speed and large capacity, disk array storage has gradually moved from a technology promotion period, which emphasizes technological innovation and system optimization and is dominated by technical solutions, into a product popularization period, which emphasizes industry standards and market scale and is dominated by mature products.
The development of disk arrays has always been closely tied to the development of SCSI. Some vendors introduced proprietary technologies, such as IBM's SSA (Serial Storage Architecture), but because their compatibility and upgradability were unsatisfactory, their market impact has been far smaller than that of SCSI. Good compatibility and strong market demand have made SCSI develop very quickly: from the original SCSI-1 at 5MB/s to today's Ultra 160 SCSI with an LVD interface at 160MB/s, and an Ultra 320 SCSI interface at 320MB/s is expected to appear in 2001 (see Table 1). Judging by the current market, Ultra 3 SCSI and RAID (Redundant Array of Inexpensive Disks) will remain the mainstream technologies for disk array storage.
SCSI technology
SCSI was originally a storage interface designed for minicomputers (as opposed to microcomputers). Version 1 of the SCSI protocol defined the SCSI-1 bus type, interface, cable specifications and other technical standards, and provided a transfer speed of only 5MB/s. As technology advanced, Version 2 of the protocol brought a major revision; SCSI storage devices following SCSI-2, with a 16-bit data width and higher clock frequencies, emerged and became mainstream products, and SCSI took firm hold of the server storage market. The SCSI-3 protocol adds command sets that serve the protocols of special devices, so that SCSI suits not only traditional parallel devices but also the communication needs of newer serial devices, through the Fibre Channel Protocol (FCP), the Serial Storage Protocol (SSP), the Serial Bus Protocol and so on. Gradually the notion of the "minicomputer" faded while "high-performance computer" and "server" took hold in people's minds, and SCSI became the hardware criterion by which users tell a "server" from a PC.
Users are usually concerned with the hardware side of the SCSI bus, where different SCSI operating modes mean different maximum transfer speeds, for example 40MB/s for Ultra Wide SCSI and 160MB/s for Ultra 3 SCSI. But the maximum transfer speed is not the average access speed a device reaches in normal operation, nor does it mean that the access speeds of different SCSI modes differ by the same multiple as their nominal rates. The actual access speed depends closely on the model and technical parameters of the SCSI controller and the SCSI hard disks, the length of the cable, its resistance to interference and other factors. To improve the efficiency of the SCSI bus, attention must be paid to the configuration of the SCSI devices and to the specification and quality of the cable. In fact, the actual access speed obtained in Ultra 3 mode is less than twice that obtained in Ultra Wide mode.
Generally speaking, choosing high-speed SCSI hard disks, suitably increasing the number of hard disks on a SCSI channel, and optimizing the way applications access disk data can significantly raise the actual transfer speed of the SCSI bus. It is worth stressing that, under otherwise identical conditions, different disk access patterns can change the actual transfer speed of the SCSI bus by a factor of tens. Optimizing the application is therefore essential to fast storage access, a point some users often overlook. With random access to six SCSI hard disks in 4KB blocks, the actual access speed of the SCSI bus is 2.74MB/s, an efficiency of only 1.7% of the bus bandwidth; with everything else unchanged, sequential reads and writes in 256KB blocks give an actual access speed of 141.2MB/s, an efficiency of 88% of the bus bandwidth.
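The efficiency figures above follow from dividing the measured speed by the bus bandwidth. The short check below assumes a 160MB/s bus (Ultra 160 / Ultra 3 SCSI), which the text does not state explicitly for this measurement but which is consistent with the quoted percentages; the function name `bus_efficiency` is hypothetical.

```python
def bus_efficiency(actual_mb_s: float, bandwidth_mb_s: float) -> float:
    """Fraction of the nominal bus bandwidth actually achieved."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return actual_mb_s / bandwidth_mb_s


# Assumption: the measurements were taken on a 160MB/s bus.
ASSUMED_BANDWIDTH_MB_S = 160.0

random_4k = bus_efficiency(2.74, ASSUMED_BANDWIDTH_MB_S)         # 4KB random access
sequential_256k = bus_efficiency(141.2, ASSUMED_BANDWIDTH_MB_S)  # 256KB sequential access
speedup = 141.2 / 2.74                                           # ratio between the two patterns

if __name__ == "__main__":
    print(f"4KB random:       {random_4k:.1%}")
    print(f"256KB sequential: {sequential_256k:.0%}")
    print(f"ratio:            {speedup:.1f}x")
```

Under that assumption the two cases come out at about 1.7% and 88%, and the ratio between them is a little over 50, which matches the "factor of tens" difference between access patterns.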
As transfer speeds rise, signal attenuation and interference during transmission become more and more prominent. A terminator reduces signal reflection and improves signal quality to some extent. At the same time, LVD (Low-Voltage Differential) signaling is used more and more widely. LVD mode, the counterpart of SE (Single-Ended) mode, resists transmission interference well and extends the distance over which the signal can travel. The Ultra 2 SCSI and Ultra 3 SCSI modes also improve signal quality by using special twisted-pair SCSI cables.
In the context of disk arrays, a high-capacity hard disk does not mean that an individual disk has a large capacity; it means that individual disks are combined by RAID, according to a RAID level, into one larger logical disk. RAID is therefore the more critical technology in a disk array, and the capabilities of the resulting "big hard disk" differ according to the RAID level chosen.
RAID is a very mature technology, but because it is relatively expensive, inconvenient to configure, and short of specialized technical staff, it is not yet widely applied. According to statistics, 75% of the world's server systems are currently not configured with RAID. As the demands on server storage for data security, scalability and the like keep rising, the RAID market has great potential. RAID is an industry standard, yet vendors do not all define the RAID levels in the same way. At present only four RAID level definitions are widely recognized in the industry: RAID 0, RAID 1, RAID 0+1 and RAID 5.
RAID 0 stripes the storage space without data redundancy. It is a low-cost RAID level with extremely high read/write performance and high storage utilization, suitable for special applications with very demanding speed requirements, such as video/audio signal storage and temporary file dumps. Because it has no data redundancy, however, its security is greatly reduced: damage to any one hard disk in the array causes catastrophic data loss. For general applications it is therefore unwise to configure more than four hard disks in RAID 0.
RAID 1 fully mirrors the data on two hard disks. It offers good security, simple technology, easy management and good read/write performance. But it cannot be expanded (its capacity is that of a single hard disk) and it wastes data space; strictly speaking, it should not be called an "array".
RAID 0+1 combines the features of RAID 0 and RAID 1: independent disks are configured as RAID 0, and two complete RAID 0 sets mirror each other. It has excellent read/write performance and high security, but the array is expensive to build and its space utilization is low, so it cannot be called a cost-effective solution.
RAID 5 is the most widely used RAID level today. Each hard disk is striped, parity is computed (by XOR) over the corresponding stripe units, and the parity data is distributed evenly over all the hard disks. A RAID 5 array built from n hard disks has the capacity of n-1 hard disks, a very high storage utilization (see Figure 6). The data lost on any one hard disk can be derived from the parity data together with the data on the remaining disks. Its biggest difference from RAID 3 is that the parity data is distributed evenly over the hard disks. RAID 5 offers data security, fast reads and writes and high space utilization, and is widely used; its shortcoming is that the performance of the whole system drops sharply after one hard disk fails.
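How a lost block is derived from XOR parity, and where the n-1 capacity comes from, can be shown in a few lines. This is a minimal sketch on in-memory byte strings, not a real array implementation; the names `xor_blocks`, `reconstruct` and `usable_capacity_gb` and the sample sizes are assumptions for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of any number of equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving_blocks: list) -> bytes:
    """Rebuild the missing block of a stripe from all surviving blocks.

    The survivors are the remaining data blocks plus the parity block.
    """
    return xor_blocks(surviving_blocks)


def usable_capacity_gb(num_disks: int, disk_gb: float) -> float:
    """Usable capacity with single parity: n - 1 disks' worth."""
    if num_disks < 3:
        raise ValueError("this sketch assumes at least 3 disks")
    return (num_disks - 1) * disk_gb


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]  # one stripe, 3 data blocks
    parity = xor_blocks(data)

    # Pretend the drive holding data[1] has failed.
    rebuilt = reconstruct([data[0], data[2], parity])
    print(rebuilt == data[1])
    print(usable_capacity_gb(5, 36))  # hypothetical 5 x 36GB array
```

The rebuild has to read every surviving drive for each lost block, which is one way to see why performance drops sharply while a RAID 5 array runs with a failed disk.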
For RAID 1, RAID 0+1 and RAID 5 arrays, hot-swap (also called hot-replace) technology makes online data recovery possible: when any hard disk in the array is damaged, the faulty disk can be replaced, the system repaired and the data restored without shutting down or stopping application services. This is of great significance for building HA (High Availability) systems.
Vendors also keep introducing new RAID levels and standards, for example RAID that mirrors starting at the RAID controller for higher security, or RAID that gives every member hard disk its own CPU and cache for faster reads and writes, but none of these is popular. Building RAID from IDE hard disks is a new technical direction that has had considerable market impact; its outstanding advantage is that such arrays are very cheap to build. At present IDE RAID supports three levels, RAID 0, RAID 1 and RAID 0+1, and at most four IDE hard disks. Because of the limited scalability of IDE devices and their lack of hot-swap support, IDE RAID is not yet widely used.
In short, development is an eternal theme, and server storage technology is no exception. On the one hand, some giant manufacturers try to introduce new concepts or standards to lead the direction of server and storage technology, the most representative being the IA-64 architecture and the storage concepts promoted by Intel. On the other hand, specialist storage vendors build on existing technologies and industry standards, promoting the rapid renewal and development of SCSI, RAID, Fibre Channel and other existing storage technologies and of the solutions based on them. In a market economy, the only criterion for judging a technology's development is recognition by the market. The market calls for good technology, and a new technology must help move the market forward before it is widely accepted. As the high-performance computer market develops, new storage technologies with a high performance-to-price ratio, high reliability and high security will keep emerging.
There are many disk array products on the market, and users should choose among them according to their own needs. A few disk array products are listed here as options for users who need one. Table 2 lists the main technical specifications of several disk arrays.
--------------------------------------------------------------------------------
Trivia: Disk array reliability and availability
Reliability is measured by the probability that a hard disk fails under given conditions. Availability refers to the proportion of time for which a hard disk can be used for a given purpose. Disk arrays can improve the reliability of a hard disk system. Table 3 compares the reliability of a RAID hard disk subsystem with that of a single hard disk subsystem.
In terms of system availability, a single hard disk has better availability than a disk array without data redundancy, while a redundant disk array has much better availability than a single hard disk. There are three reasons: a redundant array keeps working normally when a single hard disk fails; recovery after a single hard disk failure is much faster than restoring data from tape; and when a disk in a redundant array fails, the data in the array is exactly as it was at the moment of failure, so the replacement hard disk is rebuilt with the same data the failed disk held. To obtain full fault tolerance, however, the other components of the computer's hard disk subsystem must be redundant as well.