First of all, determine your goal: does "ten million" mean ten million queries per second, or ten million records in the database? The former is an extremely complex problem that cannot be solved with MySQL alone; I assume the question means the latter, which is fairly simple. The premise is to define what "high efficiency" means with two metrics:
1. How many queries per second (throughput)
2. How long each query takes (latency)
Once those are defined, consider the following optimization factors:
1. Storage type: an SSD improves random read/write performance over an ordinary spinning disk quite a lot, generally by 2 to 3 orders of magnitude, though the real gain also depends on the indexes and the data block size, so it is more complicated in practice.
2. RAID level: choosing RAID 0 or RAID 10 can roughly double the speed.
3. Network bandwidth: a high-bandwidth network reduces transfer latency; 10 Gb fiber can theoretically improve throughput by an order of magnitude over a 1 Gb link, which is especially effective for queries that return large result sets.
4. Reasonable indexing: add indexes on the fields used in query conditions.
5. Use a large, wide table to reduce multi-table join queries as much as possible; trade space for time.
6. Use a master-slave cluster; query concurrency is roughly proportional to the number of servers.
7. Use a cache such as memcached; the gain is especially obvious for static data.
8. Choose database field types sensibly: prefer fixed-length types such as INT, CHAR, and DECIMAL over variable-length ones such as VARCHAR and TEXT (see the sketch after this list).
9. Give the database server plenty of memory.
10. Check whether the bottleneck is the CPU; if the queries are complex, move to a higher-spec server.
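To make item 8 concrete, here is a minimal sketch (the table and column names are hypothetical) of favoring fixed-length types:

```sql
-- Hypothetical table: fixed-length INT / CHAR / DECIMAL keep rows
-- a predictable size, which speeds row lookups and updates.
CREATE TABLE account (
    id       INT UNSIGNED  NOT NULL,
    country  CHAR(2)       NOT NULL,   -- fixed-length, not VARCHAR
    balance  DECIMAL(12,2) NOT NULL,   -- fixed-length, not TEXT/FLOAT
    PRIMARY KEY (id)
);
```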
The overall principle: use memory instead of touching the disk wherever possible to improve I/O speed; improve the network and CPU configuration to reduce query time; and improve network speed, memory, and the number of hosts as much as possible to improve concurrency.
Let us first look at the non-high-concurrency case.
Add indexes on fields that are queried frequently.
Indexing considerations:
1. It is best not to index columns with long character content.
2. According to the official documentation, a single table should have no more than 16 indexes, and an index key should not exceed 256 bytes in length.
Adding indexes indiscriminately only adds to the burden of data maintenance. A minimal example of adding an index is sketched below.
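As a sketch (the `orders` table and its columns are hypothetical), indexing a frequently filtered column might look like:

```sql
-- Index the column used in WHERE clauses.
CREATE INDEX idx_orders_user_id ON orders (user_id);

-- For a long VARCHAR column, a prefix index keeps the key short
-- while still covering most lookups.
CREATE INDEX idx_orders_remark ON orders (remark(16));
```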
On top of indexing, partitioning can be introduced.
Partitioning considerations:
1. Common partitioning types are RANGE, LIST, HASH, and KEY; RANGE partitioning is the most commonly used.
2. When first setting up a partition, we tend to overlook a prerequisite, so adding it fails with an error.
The prerequisite is: if the table has a primary key and the partitioning key is not the primary key itself, the partitioning column must still be part of the primary key. MySQL requires every column in the partitioning expression to be included in every unique key of the table.
Once partitioning is introduced, the database automatically determines which partition each written row goes to; see the sketch below.
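A minimal RANGE-partitioning sketch for a hypothetical `orders` table; note that the partitioning column `created_at` is folded into the primary key, per the prerequisite above:

```sql
CREATE TABLE orders (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    user_id    BIGINT NOT NULL,
    created_at DATE   NOT NULL,
    -- The partitioning column must appear in every unique key,
    -- so it is included in the primary key.
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```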
For higher concurrency, in addition to the above, we have to consider splitting databases and tables (sharding) or adopting a one-master-multiple-slaves architecture; a minimal replica setup is sketched below.
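For the one-master-multiple-slaves route, a replica is pointed at the primary roughly like this (MySQL 8.0.23+ syntax; the host and credentials are hypothetical, and GTID-based replication is assumed to be enabled on both servers):

```sql
-- Run on each replica; hypothetical host and credentials.
CHANGE REPLICATION SOURCE TO
    SOURCE_HOST = 'primary.db.internal',
    SOURCE_USER = 'repl',
    SOURCE_PASSWORD = 'repl_password',
    SOURCE_AUTO_POSITION = 1;   -- requires GTID mode
START REPLICA;
```

Reads can then be spread across the replicas while writes go to the primary, which is why query concurrency scales roughly with the number of servers.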
Looking ahead, I believe this class of problem will be solved with NewSQL databases such as TiDB. Then we will no longer need to worry about partitioning the data ourselves: the data layer can scale out almost without limit, and hot data can be redistributed dynamically.