The PIR database is divided into four sections according to the nature of the data and the level of annotation: PIR1, PIR2, PIR3, and PIR4. Sequences in PIR1 have been validated and carry the most detailed annotations; PIR2 contains preliminary sequences that have not yet been fully verified and may include redundant entries; PIR3 contains sequences that have not yet been examined or annotated; and PIR4 contains sequences obtained from other sources that have been neither validated nor annotated.
In addition to PIR, another important protein sequence database is SwissProt, which was created at the University of Geneva in 1986 and is currently maintained jointly by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI). Besides developing and maintaining the SwissProt database, SIB runs the Expert Protein Analysis System (ExPASy) Web server, a major international center for proteomics and protein molecular modeling research that provides users with a large number of protein information resources. The Bioinformatics Center of Peking University hosts a mirror of ExPASy. PIR and SwissProt are two of the earliest and most widely used protein databases.
As the genome programs for various model organisms have progressed, DNA sequences, especially EST sequences, have entered the nucleic acid sequence databases in large numbers. The protein sequence database TrEMBL, whose name stands for "Translation of EMBL", was created in 1996 [Bairoch, 2000] from the cDNA sequences in EMBL. The database uses the SwissProt format and contains translations of all coding sequences in the EMBL database. TrEMBL is divided into two parts, SP-TrEMBL and REM-TrEMBL. Entries in SP-TrEMBL will eventually be merged into the SwissProt database.
REM-TrEMBL, on the other hand, holds the remaining sequences, including immunoglobulins, T-cell receptors, small peptides with fewer than 8 amino acid residues, synthetic sequences, proprietary sequences, and so on. Similar to TrEMBL, GenPept is a protein sequence database translated from GenBank. Because both TrEMBL and GenPept are generated by a computer program that translates nucleic acid sequences, both databases have a high error rate and contain considerable redundancy. Another commonly used protein sequence database is NRL-3D, a database of the primary sequences of proteins with known three-dimensional structures [Namboodiri, 1990]. Its sequences are extracted from the three-dimensional structure database PDB.
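To make the idea of computer-translated entries concrete, the following is a minimal Python sketch of translating a coding sequence into a protein sequence with the standard genetic code. It is an illustration only and not the actual TrEMBL or GenPept pipeline; the function name `translate_cds` and the choice to mark unrecognized codons with `X` are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed for codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna: str) -> str:
    """Translate a coding sequence (hypothetical helper, for illustration).

    Reads the sequence codon by codon from position 0, stops at the first
    stop codon, ignores an incomplete trailing codon, and writes 'X' for
    codons that contain ambiguous or unknown bases.
    """
    seq = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCAAATGA"))
```

A translation of this kind reproduces whatever is wrong with its input: a sequencing error, a wrong reading frame, or a mispredicted coding region in the nucleotide entry carries straight through to the protein entry, which is one reason automatically translated databases need later curation.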