What are the distributed file storage systems
Besides GPFS, the current mainstream distributed file systems include PVFS, Lustre, PanFS, and GoogleFS. Each is introduced below:

1, PVFS (Parallel Virtual File System) is an open source project created at Clemson University to run on Linux clusters. PVFS currently has the following shortcomings:

(1) a single management node: only one management node manages the metadata, so once the cluster grows beyond a certain size, that node is likely to become overloaded and turn into the system bottleneck;

(2) no fault-tolerance mechanism for data storage: when an I/O node fails, the data on it becomes unavailable;

(3) static configuration: PVFS can only be configured before startup; once the system is running, the configuration cannot be changed.

2, Lustre is a distributed file system based on object storage. The project was launched in 1999 at Carnegie Mellon University and is also open source. It has only two metadata management nodes, so, as with PVFS, the management nodes become a bottleneck once the system reaches a certain size.

3, PanFS (Panasas File System) is the distributed file system that Panasas uses to manage its own cluster storage systems.

4, GoogleFS (Google File System) is a distributed file system designed by Google to meet the company's internal data processing needs.

5, Compared with the other file systems, GPFS has three main advantages:

(1) distributed lock management and a large-block strategy let it support larger clusters: the file system's token manager establishes fine-grained locks on blocks, inodes, attributes, and directory entries, and the first client to obtain a lock is responsible for maintaining the consistency of the corresponding shared object, which reduces the burden on the metadata servers;

(2) having multiple metadata servers and distributed metadata makes the management of metadata no longer a bottleneck in the system;

(3) the token manager uses the byte as the smallest unit of locking, which means that two requests never conflict unless they access the same byte of the same file.
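
The byte-granularity rule in point (3) can be illustrated with a small Python sketch. This is not the GPFS token protocol or any GPFS API: the names ByteRange and ByteRangeLockTable are hypothetical, everything runs in a single process, and only exclusive (write-style) locks are modeled. It only shows the conflict test itself: two requests conflict when they target the same file and their byte ranges share at least one byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Half-open byte interval [start, end) within one file (hypothetical type)."""
    start: int
    end: int

    def overlaps(self, other: "ByteRange") -> bool:
        # Two half-open intervals share at least one byte exactly when
        # each one starts before the other one ends.
        return self.start < other.end and other.start < self.end


class ByteRangeLockTable:
    """Toy in-memory lock table; an assumption for illustration, not GPFS code."""

    def __init__(self) -> None:
        # Maps a file path to the list of (client, range) locks held on it.
        self._held: dict[str, list[tuple[str, ByteRange]]] = {}

    def try_acquire(self, client: str, path: str, start: int, length: int) -> bool:
        """Grant the lock unless another client already holds an overlapping byte."""
        wanted = ByteRange(start, start + length)
        holders = self._held.setdefault(path, [])
        for owner, held in holders:
            if owner != client and held.overlaps(wanted):
                return False  # same file and at least one shared byte: conflict
        holders.append((client, wanted))
        return True

    def release(self, client: str, path: str) -> None:
        """Drop every lock the client holds on the given file."""
        remaining = [(o, r) for o, r in self._held.get(path, []) if o != client]
        self._held[path] = remaining


table = ByteRangeLockTable()
first = table.try_acquire("client-a", "/data/file1", 0, 100)      # bytes 0..99
adjacent = table.try_acquire("client-b", "/data/file1", 100, 50)  # bytes 100..149
clash = table.try_acquire("client-b", "/data/file1", 99, 1)       # byte 99 only
other_file = table.try_acquire("client-b", "/data/file2", 0, 100)
```

In this sketch, client-b's request for bytes 100 to 149 is granted because it touches no byte held by client-a, while its request for byte 99 is refused. The same range on a different file is granted, since locks on different files never interact.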