I. Function
1 High-speed disk access (speed): RAID combines ordinary hard disks into a disk array. When the host writes data, the RAID controller splits the data into multiple blocks and writes them to the disks of the array in parallel. When the host reads data, the RAID controller reads the blocks scattered across the disks in parallel, reassembles them, and returns the result to the host. Because reads and writes proceed in parallel, the access speed of the storage system increases.
2 Expansion
3 Data redundancy
II. Classification
RAID is divided into levels 0 through 6, commonly written as RAID0, RAID1, RAID2, RAID3, RAID4, RAID5, and RAID6.
RAID0: RAID0 is not a true RAID structure, because it has no data redundancy. RAID0 splits data into consecutive pieces and reads and writes them on multiple disks in parallel, so it has a high data transfer rate. However, RAID0 improves performance without providing any data reliability: if one disk fails, all the data in the array is affected. RAID0 should therefore not be used for critical applications that require high data availability.
RAID1: RAID1 achieves data redundancy through mirroring, keeping identical copies of the data on a pair of separate disks. RAID1 improves read performance by reading from the mirror when the primary disk is busy. RAID1 is the most costly type of disk array, since only half of the raw capacity is usable, but it offers the highest data availability. When a disk fails, the system can automatically switch to the mirror without having to reconstruct the lost data.
RAID2: Conceptually, RAID2 is similar to RAID3: both stripe data across different disks in units of bits or bytes. However, RAID2 uses Hamming code error correction (ECC) to provide error checking and recovery. This code requires several disks to hold the check and recovery information, which makes RAID2 more complex to implement. As a result, it is rarely used in commercial environments.
RAID3: Unlike RAID2, RAID3 uses a single disk to store parity information. If one data disk fails, its data can be regenerated from the parity disk and the remaining data disks. If the parity disk fails, the data is not affected. RAID3 provides good transfer rates for large amounts of contiguous data, but for random data the parity disk can become a bottleneck for write operations.
RAID4: Like RAID2 and RAID3, RAID4 and RAID5 also stripe data across different disks, but the stripe unit is a block or a record. RAID4 uses a single disk as the parity disk, so every write operation must access the parity disk, which makes it a bottleneck for writes. RAID4 is rarely used in commercial applications.
RAID5: RAID5 does not have a single dedicated parity disk; instead, it interleaves data and parity information across all disks. In RAID5, the read/write heads of the member disks can operate simultaneously, providing higher data throughput. RAID5 is better suited to small blocks of data and random reads and writes. An important difference between RAID3 and RAID5 is that RAID3 involves every disk in the array in each data transfer, whereas in RAID5 most small transfers involve only one disk, so several transfers can proceed in parallel. RAID5 has a "write penalty": each small write requires four actual disk operations, two reads (the old data and the old parity) and two writes (the new data and the new parity).
RAID6: Compared with RAID5, RAID6 adds a second, independent block of parity information. The two independent parity schemes use different algorithms and give very high data reliability: even if two disks fail at the same time, the data is not lost. However, more disk space must be allocated to parity, and the write penalty is larger than in RAID5. RAID6 write performance is poor, and this, together with the complexity of implementation, means RAID6 is seldom used.
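The four-operation RAID5 small write described above can be sketched as follows. This is a minimal illustration for a single stripe, assuming parity is the byte-wise XOR of the data blocks; the Disk class and the small_write function are hypothetical names used only to count I/Os, not a real controller API.

```python
from functools import reduce


class Disk:
    """Hypothetical in-memory disk holding one block and counting I/Os."""

    def __init__(self, block: bytes):
        self.block = block
        self.ios = 0

    def read(self) -> bytes:
        self.ios += 1
        return self.block

    def write(self, block: bytes) -> None:
        self.ios += 1
        self.block = block


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(data_disks: list[Disk], parity_disk: Disk, index: int, new_data: bytes) -> None:
    """Read-modify-write: update one data block and the stripe parity."""
    old_data = data_disks[index].read()      # I/O 1: read old data
    old_parity = parity_disk.read()          # I/O 2: read old parity
    # XOR out the old data and XOR in the new data; other data disks are untouched.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    data_disks[index].write(new_data)        # I/O 3: write new data
    parity_disk.write(new_parity)            # I/O 4: write new parity


disks = [Disk(b"\x01\x02"), Disk(b"\x10\x20"), Disk(b"\xaa\xbb")]
parity = Disk(xor_blocks(*(d.block for d in disks)))
small_write(disks, parity, 1, b"\xff\x00")
total_ios = sum(d.ios for d in disks) + parity.ios
```

In a real RAID5 array the parity block moves from disk to disk between stripes; the sketch fixes it on one disk only to keep the I/O count visible.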
III. Details
RAID0: built for speed and capacity expansion
In RAID0 mode, data is divided into a number of chunks that are written alternately across multiple hard disks. In general, the number of pieces the data is split into corresponds to the number of hard disks in the array. For example, if RAID0 uses three hard disks, the data is divided into three parts that are written to the three disks in turn. In effect, RAID0 lets the system treat the three disks as one larger disk. Because no parity is computed in the process, RAID0 has the fastest read and write speeds of all RAID modes.
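The round-robin chunk placement described above can be sketched like this. It is a minimal model under stated assumptions: a fixed, deliberately tiny chunk size (CHUNK = 4 bytes, hypothetical; real arrays use much larger chunks) and in-memory byte arrays standing in for disks.

```python
# Hypothetical chunk size, kept tiny for illustration.
CHUNK = 4


def locate(chunk_no: int, n_disks: int) -> tuple[int, int]:
    """Map a logical chunk number to (disk index, chunk slot on that disk)."""
    return chunk_no % n_disks, chunk_no // n_disks


def stripe_write(data: bytes, n_disks: int, chunk: int = CHUNK) -> list[bytearray]:
    """Deal chunks out to the disks in turn (no parity is computed)."""
    disks = [bytearray() for _ in range(n_disks)]
    for start in range(0, len(data), chunk):
        disk, _ = locate(start // chunk, n_disks)
        disks[disk] += data[start:start + chunk]
    return disks


def stripe_read(disks: list[bytearray], total_len: int, chunk: int = CHUNK) -> bytes:
    """Collect the chunks from each disk and reassemble them in logical order."""
    out = bytearray()
    for c in range((total_len + chunk - 1) // chunk):
        disk, slot = locate(c, len(disks))
        out += disks[disk][slot * chunk:(slot + 1) * chunk]
    return bytes(out)


data = bytes(range(26))
disks = stripe_write(data, 3)
```

The loops here run sequentially; a real controller would issue the per-disk transfers at the same time, which is where the speedup comes from.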
RAID0 provides no data protection at all: if any one hard disk in a RAID0 array fails, all the data is damaged and cannot be recovered. Because of this poor safety, many users avoid RAID0. On the other hand, RAID0 is the fastest of all RAID modes. With two hard disks in RAID0, data can in theory be read twice as fast as from a single disk; with six disks, the theoretical rate is six times that of a single disk. Mixing different hard disks in RAID0 causes two problems. First, the effective capacity of the array is the capacity of the smallest disk multiplied by the number of disks, because RAID0 spreads data evenly across all disks and cannot continue storing once the smallest disk is full. Second, if the disks run at different speeds, the overall speed is the speed of the slowest disk multiplied by the number of disks, because each part of a transfer must finish on every disk before the next step can begin, so the faster disks sit idle waiting for the slowest one to finish reading or writing. Users of RAID0 are therefore advised to choose disks with the same capacity and speed, preferably of the same brand.
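The two rules for mismatched RAID0 disks can be checked with a small model. It assumes each disk receives an equal share of every transfer and that the transfer completes only when the slowest disk finishes; controller overhead is ignored, and the function names are hypothetical.

```python
def raid0_capacity_gb(sizes_gb: list[int]) -> int:
    """Every disk receives an equal share, so the smallest disk fills first."""
    return min(sizes_gb) * len(sizes_gb)


def raid0_throughput_mb_s(speeds_mb_s: list[float], total_mb: float = 1000.0) -> float:
    """Effective rate when each disk gets an equal share of the transfer."""
    share = total_mb / len(speeds_mb_s)            # data sent to each disk
    finish = max(share / s for s in speeds_mb_s)   # wait for the slowest disk
    return total_mb / finish


capacity = raid0_capacity_gb([20, 30, 40])          # 60 GB usable, not 90 GB
mixed = raid0_throughput_mb_s([100, 100, 50])       # limited to 3 x 50 MB/s
matched = raid0_throughput_mb_s([100, 100])         # 2 x 100 MB/s
```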
So RAID0 is not strictly a "Redundant Array of Independent Disks". RAID0 is generally used for applications that need fast data processing but do not require a high level of data security. It is simple and does not need a complex or expensive controller. RAID0 requires at least two hard disks, and with two identical disks the resulting capacity is the sum of the two.
RAID0 random read performance: very good
RAID0 random write performance: very good
RAID0 sustained read performance: very good
RAID0 sustained write performance: very good
RAID0 benefits: fastest read and write performance, which is even better if each drive has its own controller.
RAID0 Cons: All data is lost if any one drive fails, and most of the controllers are software-based, so performance is not great.
RAID1
In RAID1 mode, the hard disks mirror each other: when data is written, both disks store the same data at the same time, so even if one disk fails, the system keeps working with the other. RAID1 reads faster than a single hard disk, because when one disk is busy the RAID controller can read the same data from the other, but write performance does not improve and may drop slightly. When one disk fails, new data is written to the disk that still works, and once the failed disk is replaced, the RAID controller automatically copies the data to the new disk. The most important feature of RAID1 is its high level of redundancy, but since most implementations are software-based, it adds load on the processor. This RAID mode is ideal for users who demand the utmost in data security.
In RAID1 mode, the hard disks should ideally be identical; otherwise disk space is wasted. Because RAID1 writes the same information to every disk, its effective capacity is the capacity of the smallest disk in the array. For example, if a 20GB disk and a 30GB disk are used in RAID1, the effective capacity is 20GB, and the remaining 10GB on the 30GB disk is wasted. Likewise, if the two drives run at different speeds, on writes the faster drive must wait for the slower one to finish before moving on.
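The RAID1 behavior described in the last two paragraphs (duplicate writes, reads served by the less busy disk, rebuild by copying after replacement, capacity of the smallest member) can be modeled roughly as follows. Mirror is a hypothetical toy class; the per-disk queue length is a simulated stand-in for "busy".

```python
class Mirror:
    """Hypothetical two-disk RAID1 model; each disk is a dict of block number -> data."""

    def __init__(self, sizes_gb: list[int]):
        self.capacity_gb = min(sizes_gb)   # usable space = smallest member
        self.disks = [{}, {}]
        self.failed = [False, False]
        self.queue = [0, 0]                # simulated pending requests per disk

    def write(self, blk: int, data: bytes) -> None:
        # Every healthy disk stores the same data.
        for i in range(2):
            if not self.failed[i]:
                self.disks[i][blk] = data

    def read(self, blk: int) -> tuple[int, bytes]:
        # Serve the read from the healthy disk with the shorter queue.
        healthy = [i for i in range(2) if not self.failed[i]]
        i = min(healthy, key=lambda k: self.queue[k])
        return i, self.disks[i][blk]

    def fail(self, i: int) -> None:
        self.failed[i] = True
        self.disks[i] = {}

    def replace_and_rebuild(self, i: int) -> None:
        # Copy everything from the surviving mirror onto the new disk.
        self.disks[i] = dict(self.disks[1 - i])
        self.failed[i] = False


m = Mirror([20, 30])
m.write(0, b"a")
m.queue = [3, 0]            # disk 0 is busy, so reads go to disk 1
```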
Random read performance of RAID1: good
Random write performance of RAID1: good
Continuous read performance of RAID1: average
Continuous write performance of RAID1: good
Advantages of RAID1: high data reliability, easy to implement, and simple design.
Disadvantages of RAID1: slower than RAID0, especially for writes, and only half of the total disk capacity is usable.
RAID0+1
This RAID mode combines RAID0 and RAID1 and requires at least four hard disks. The disks are paired into two RAID0 arrays; these two RAID0 arrays then act as two larger, faster disks, which are mirrored as a RAID1 array. Such a system provides both higher disk performance and higher data security. The drawbacks are equally obvious: higher cost and a more complex setup. RAID0+1 is second only to RAID5 in terms of fault tolerance, and it is generally used in file servers and similar systems.
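Because RAID0+1 mirrors two RAID0 sets, it keeps working as long as at least one RAID0 set is fully intact. The sketch below enumerates two-disk failures for a four-disk array; the layout (disks 0 and 1 in one set, 2 and 3 in the other) and the function names are assumptions for illustration.

```python
from itertools import combinations

# Assumed layout: disks 0 and 1 form one RAID0 set, disks 2 and 3 the other,
# and the two RAID0 sets mirror each other.
HALVES = [(0, 1), (2, 3)]


def raid01_survives(failed, halves=HALVES) -> bool:
    """The array survives while at least one RAID0 half has no failed disk."""
    return any(not (set(half) & set(failed)) for half in halves)


def raid01_capacity_gb(sizes_gb: list[int]) -> int:
    """Half the disks hold mirror copies; striping limits each disk to the smallest size."""
    return min(sizes_gb) * len(sizes_gb) // 2


pairs = list(combinations(range(4), 2))                  # all 6 two-disk failures
survivable = [p for p in pairs if raid01_survives(p)]    # only failures within one half
```

Any single-disk failure is survivable; of the six possible two-disk failures, only the two that fall inside the same RAID0 half leave the other half intact.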
RAID0+1 random read performance: very good
RAID0+1 random write performance: good
RAID0+1 sustained read performance: very good
RAID0+1 sustained write performance: good
RAID0+1 advantages: higher read/write performance than a single hard disk, and much better data security.
Disadvantages of RAID0+1: Higher cost, requires at least 4 hard disks.
RAID2
RAID2 mode is also quite complex. The disks that store data are combined in RAID0 fashion, plus dedicated disks that store Hamming-code ECC check data; to make the check data itself safer, the check disks are mirrored in RAID1 mode, so there are at least two of them. Even if the data stored on one disk is damaged, the RAID controller can use the Hamming code to restore the data onto a new disk. RAID2 is generally aimed at large-volume data processing and supercomputer applications and is not suitable for ordinary users. Its performance is not high, because check codes must be generated whenever data is stored. For various reasons this disk array model has not gone into practical commercial use, and its high price certainly makes it unacceptable to the average user.
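The text says only that RAID2 uses a Hamming code. As an assumed illustration, the sketch below uses the classic Hamming(7,4) code: four data bits are spread over seven bit positions (picture each position on its own disk), and the syndrome identifies which single position is wrong. The function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 code bits; parity sits at positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, 1-based position of the corrected bit or 0 if none)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3      # the syndrome spells out the bad position
    if pos:
        c[pos - 1] ^= 1             # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                      # simulate one bad "disk" at position 5
```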
Random read performance of RAID2: average
Random write performance of RAID2: poor, mainly because of all the operations have to go through the ECC algorithm
Continuous read performance of RAID2: very good
Continuous write performance of RAID2: average
Advantages of RAID2: high data security. Data can be recovered as long as the disks holding the check data do not fail.
Disadvantages of RAID2: expensive, requires dedicated hard disks to store the check data, not very efficient, and not used in commercial products.
RAID3
Like RAID2, RAID3 divides data into pieces and stores them on multiple hard disks in sequence; RAID3 splits the data by bits across the disks. The advantage of RAID3 is its high-speed read/write capability, although write performance is affected by the need to generate parity during writes, and a dedicated hard disk is required to store the parity. When one of the data disks fails, the system still works, but performance suffers; if another disk fails before the bad one is replaced, all data in the array is lost and cannot be recovered. This array mode requires all the hard drives to spin in synchronization, which is difficult to achieve in practice. RAID3 requires at least three hard drives, one of which stores the parity, and the parity is computed with an exclusive-OR (XOR) operation.
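The XOR parity used by RAID3 (and RAID4) can be sketched in a few lines: the parity byte at each position is the XOR of the bytes at that position on every data disk, so XOR-ing the surviving disks with the parity recreates the missing one. Function names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Parity byte k is the XOR of byte k of every block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(blocks: list, parity: bytes) -> bytes:
    """Recreate the one block given as None by XOR-ing the survivors with the parity."""
    survivors = [b for b in blocks if b is not None]
    return xor_parity(survivors + [parity])


data = [b"RAID", b"disk", b"xor!"]
p = xor_parity(data)
recovered = rebuild_missing([data[0], None, data[2]], p)   # the middle disk failed
```

If the parity disk itself fails, nothing needs recovering: the parity is simply recomputed from the data disks, which matches the earlier remark that a parity-disk failure does not affect the data.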
Implemented with a software controller, this RAID mode has a significant performance impact because of the complexity of the parity computation. However, it needs only three hard disks, compared with four for RAID0+1, so the cost is lower. Overall, this type of array is better suited to applications such as video processing and editing.
Random read performance of RAID3: good
Random write performance of RAID3: poor
Sustained read performance of RAID3: very good
Sustained write performance of RAID3: fair
Advantages of RAID3: well suited to video editing and other applications that read large amounts of data.
RAID3's disadvantages: it is very difficult to synchronize the RPM of every drive (most current hard drives don't support this), and a complex controller is required.
RAID4
RAID4 is almost identical to RAID3: data is divided into smaller pieces and stored sequentially on multiple hard drives, with the parity stored on a separate parity disk. The difference is the stripe unit: RAID3 divides data into bits, while RAID4 divides it into blocks. RAID4 therefore reads as fast as RAID3, although write performance is affected by the need to generate parity during each write and store it on the parity disk.
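The effect of the stripe unit can be seen by asking which data disks a single read touches. In this sketch the unit sizes are assumptions (a 1-byte interleave standing in for RAID3's fine-grained striping, 4 KiB blocks for RAID4), as is the four-data-disk array; disks_touched is a hypothetical helper.

```python
def disks_touched(offset: int, length: int, n_data: int, unit: int) -> set[int]:
    """Data disks touched by a read of `length` bytes at `offset`, striped in `unit`-byte pieces."""
    first = offset // unit
    last = (offset + length - 1) // unit
    return {u % n_data for u in range(first, last + 1)}


RAID3_UNIT = 1      # assumed: byte-level interleave
RAID4_UNIT = 4096   # assumed: 4 KiB blocks

fine = disks_touched(0, 512, 4, RAID3_UNIT)     # every data disk takes part
coarse = disks_touched(0, 512, 4, RAID4_UNIT)   # a single data disk serves it
```

With fine-grained striping every small read occupies all disks at once, which is why RAID3 needs synchronized spindles; with block striping a small read stays on one disk, so the drives can run independently.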
The great thing about this mode is that it doesn't require the drives to be synchronized in RPM, which makes the controller simpler. However, it has the worst write performance of all the RAID modes. As with RAID3, no data is lost when one hard disk fails, but if a second disk fails before the failed one is replaced, all data is lost. Recovering data from a failed disk is also much less efficient than in other RAID modes.
This RAID mode also requires at least three hard disks. The parity is computed with an exclusive-OR (XOR) operation. It is suitable for general applications, including video processing. It is also not too expensive to build, since only one hard disk is needed as the parity disk.
Random read performance of RAID4: Very good
Random write performance of RAID4: Fair, mainly because of the need to write checksums to the parity disk
Continuous read performance of RAID4: Good
Continuous write performance of RAID4: Fair
Benefits of RAID4: in addition to the advantages of RAID3, it does not require synchronized drive speeds.
Disadvantages of RAID4: Very poor write performance and high controller requirements.
RAID5
RAID5 uses at least three hard disks and combines the acceleration of RAID0 with the data-protection function of RAID1. With three disks in the array, RAID5 splits the data into fragments of a user-defined stripe size and stores them on two of the disks. The third disk does not receive data fragments; instead it receives check data computed by a parity algorithm from the data on the other two disks, and this check data can be used to recover the data on either of those disks if it fails. Moreover, the role of each disk is not fixed: in one stripe, disks 1 and 2 may store the data fragments, while in the next stripe disks 2 and 3 may do so. In other words, the roles rotate from stripe to stripe, but in every stripe two disks store data fragments and the remaining disk stores the parity information.
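The rotation of roles can be written as a simple mapping from stripe number to parity disk and data disks. The rotation below, with parity starting on the last disk and moving one disk to the left per stripe, resembles the "left-symmetric" layout used by Linux software RAID; it is one common pattern assumed for illustration, and controllers may use others.

```python
def raid5_layout(stripe: int, n_disks: int) -> tuple[int, list[int]]:
    """Return (parity disk, data disks in order) for a stripe (assumed rotation)."""
    parity_disk = (n_disks - 1 - stripe) % n_disks
    # Data fills the disks that follow the parity disk, wrapping around.
    data_disks = [(parity_disk + 1 + k) % n_disks for k in range(n_disks - 1)]
    return parity_disk, data_disks


layout = [raid5_layout(s, 3) for s in range(4)]
# stripe 0: parity on disk 2, data on 0 and 1
# stripe 1: parity on disk 1, data on 2 and 0
# stripe 2: parity on disk 0, data on 1 and 2
# stripe 3: the pattern repeats
```

Spreading parity this way is what removes the single parity-disk bottleneck of RAID3 and RAID4.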
This parity information is usually calculated by the RAID controller, which often uses a dedicated chip to compute it and to decide which disk it should be written to.
RAID5 thus provides both the high-speed reads of RAID0 and the data-recovery capability of RAID1. In the three-disk case described above, two disks carry data in each stripe, so RAID5 can reach roughly twice the speed of a single disk, as a two-disk RAID0 does, while also protecting data like RAID1: when one of the disks fails, installing a new disk allows the data to be restored.
Of the RAID modes introduced so far, RAID5 requires the most complex controller design, and it can be used in most areas, such as multi-user and multi-tasking environments. Many current Web servers and other Internet servers use this form of disk array; for example, the recently introduced Quantum Snap server uses an external RAID5 disk array. With three disks, parity takes up about 33% (one third) of the total disk space, so a RAID5 array with 120GB of raw capacity provides about 80GB of usable space. However, this array mode is generally not supported by the RAID controllers integrated on ordinary motherboards; for example, the Abit KR7A-RAID motherboard supports only RAID0, RAID1, and RAID0+1. And because parity is used, write performance is affected to some extent, so many disk array vendors add a write cache to the array to improve write performance.
RAID5 is not without drawbacks. When the data on one disk in the array changes, the parity for that stripe must be recalculated, which can require accessing all three disks. As with the other modes, a RAID5 array is best built from disks with the same capacity and speed. The effective capacity of RAID5 is the capacity of the smallest disk multiplied by the number of disks minus one; one disk's worth of capacity is subtracted because it holds the parity information.
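The capacity rule above, and the 120GB to 80GB example, can be checked directly. The sketch assumes capacities in whole GB; the function names are hypothetical.

```python
def raid5_capacity_gb(sizes_gb: list[int]) -> int:
    """(number of disks - 1) x smallest disk; one disk's worth holds parity."""
    if len(sizes_gb) < 3:
        raise ValueError("RAID5 needs at least three disks")
    return min(sizes_gb) * (len(sizes_gb) - 1)


def parity_fraction(n_disks: int) -> float:
    """Share of the raw space taken by parity when all disks are equal."""
    return 1 / n_disks


three = raid5_capacity_gb([40, 40, 40])     # 120 GB raw, 80 GB usable
mixed = raid5_capacity_gb([40, 40, 60])     # the extra 20 GB is wasted
four = raid5_capacity_gb([40, 40, 40, 40])  # overhead drops to 1/4
```

The "about 33%" figure therefore applies to a three-disk array; with more disks, parity takes a smaller share of the space.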
Random read performance of RAID5: very good (when using large blocks)
Random write performance of RAID5: average, but better than RAID3 or RAID4
Sustained read performance of RAID5: good (when using small blocks)
Sustained write performance of RAID5: average
Advantages of RAID5: no need for a dedicated parity disk, fast reads, and it relieves the parity-disk write bottleneck of RAID3 and RAID4.
Disadvantages of RAID5: Write performance is still not as good as it could be.
RAID6
RAID6 is a newer member of the RAID family that extends RAID5. As in RAID5, data and parity are divided into blocks and distributed across all the disks in the array, but RAID6 adds a second, independently computed parity block to each stripe, so a RAID6 array can survive the failure of two disks at the same time, which is essential for applications with high data security requirements. A RAID6 array requires at least four hard disks. However, RAID6 does not fix the poor write performance of RAID5; a write cache can only partly compensate for this shortcoming and does not solve the underlying problem. Because the data block (stripe) size of RAID5 and RAID6 can be tuned to the application, actual performance also depends on this setting.
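The text does not say how RAID6's second parity block is computed. A widely used approach, assumed here, is "P+Q" parity: P is the ordinary XOR parity of RAID5, and Q weights each disk by a power of a generator in the finite field GF(2^8) (a Reed-Solomon style syndrome). The sketch computes P and Q for one stripe and rebuilds any two lost data disks; the field polynomial 0x11D with generator 2 is a common choice but still an assumption, and all function names are hypothetical.

```python
# Assumed field: GF(2^8) with polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), generator g = 2.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    return EXP[255 - LOG[a]]


def g_pow(e: int) -> int:
    """g**e; negative exponents wrap modulo 255."""
    return EXP[e % 255]


def compute_pq(blocks: list[bytes]) -> tuple[bytes, bytes]:
    """P is plain XOR parity; Q weights data disk i by g**i."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coef = g_pow(i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(blocks: list, p: bytes, q: bytes, x: int, y: int) -> tuple[bytes, bytes]:
    """Rebuild data disks x < y (passed as None) from P, Q and the surviving disks."""
    size = len(p)
    zeroed = [bytes(size) if b is None else b for b in blocks]
    pxy, qxy = compute_pq(zeroed)                # parity of the survivors only
    denom = gf_inv(g_pow(y - x) ^ 1)
    a = gf_mul(g_pow(y - x), denom)
    b = gf_mul(g_pow(-x), denom)
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        pd = p[k] ^ pxy[k]                       # equals Dx ^ Dy
        qd = q[k] ^ qxy[k]                       # equals g^x*Dx ^ g^y*Dy
        dx[k] = gf_mul(a, pd) ^ gf_mul(b, qd)
        dy[k] = pd ^ dx[k]
    return bytes(dx), bytes(dy)


data = [b"RAID", b"six!", b"P+Q ", b"demo"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]         # disks 1 and 3 fail together
restored = recover_two(damaged, p, q, 1, 3)
```

With only XOR parity, two missing disks leave one equation for two unknowns; the differently weighted Q block supplies the second equation, which is why two independent parity algorithms are needed.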
In practice, RAID6 is not as widely used as the other RAID modes. It requires a more complex and expensive RAID controller design, so it is generally not integrated into the motherboard.
Random read performance of RAID6: Very good (when using large blocks)
Random write performance of RAID6: Poor, because every write must update two separate parity blocks
Sustained read performance of RAID6: Good (when using small blocks)
Sustained write performance of RAID6: Fair (when using small blocks)
RAID6 benefits: fast read performance, higher fault tolerance.
Disadvantages of RAID6: very slow write speeds, RAID controllers are more complex and costly to design.
Hot-swap and hot-redundancy
RAID systems generally provide hot-swap and hot-redundancy capabilities. Hot-swapping allows a failed drive to be replaced without shutting down or powering off the system; the new drive is recognized dynamically and configured and added correctly, all without rebooting the computer. The benefits are indisputable: maintenance becomes very simple, and for many applications, such as Web servers, users cannot accept server downtime, which would cause immeasurable losses. Many HP and Dell server products and RAID disk arrays support hot-swapping.
Hot redundancy (a hot spare) is generally used where hot-swapping is not practical. In this design, an extra hard disk is installed in the computer before any failure occurs; when a disk fails, this spare automatically takes the place of the failed disk. In such a system, however, the damaged disk cannot be removed until the system is shut down. Hot redundancy is not as convenient as hot-swapping, but it is better than nothing.
Summary
In fact, there are many types of disk arrays, and those introduced here are only some of the basic modes. To achieve sufficient performance and stability in practice, several RAID modes can be combined; of course, this places higher demands on the RAID controller and raises the cost of the disk array system.
Server RAID is generally SCSI-based, so such RAID systems cost even more. For personal use, RAID is still somewhat out of reach: even with a motherboard that has an integrated RAID controller, you need at least two hard disks (generally of the same capacity, brand, and speed), which is a lot of money for an individual user. Of course, if you have special needs, such as setting up a workstation or Web server without spending too much money, IDE RAID is still a good choice. One caveat: typical onboard IDE RAID consumes a lot of processor time, and IDE RAID does not match SCSI hard disks in some applications.