MySQL paging query over millions of rows takes 4 seconds: advice on how to optimize
Many applications tend to display only the newest or most popular records, but in order for the old records to still be accessible, a paged navigation bar is needed. However, how to better implement paging through MySQL is always a headache. While there is no one-size-fits-all solution, understanding the underpinnings of the database can help optimize paging queries to a greater or lesser extent.

Let's start with a common but poorly performing query.

SELECT *
FROM city
ORDER BY id DESC
LIMIT 0, 15;

This query runs in 0.00sec. So what's wrong with it? Actually, nothing is wrong with the statement or its parameters: it uses the primary key of the table below and reads only 15 rows.

CREATE TABLE city (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  city varchar(128) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

The real problem is when the offset is large, like this:

SELECT *
FROM city
ORDER BY id DESC
LIMIT 100000, 15;

The above query takes 0.22sec when the table holds 2M rows. Looking at the execution plan via EXPLAIN shows that MySQL examined 100015 rows but ultimately needed only 15. Large paging offsets increase the amount of data that must be scanned: MySQL loads many rows into memory that are never used. Even assuming most users only visit the first few pages, a small number of requests with large offsets can hurt the whole system. Facebook recognized this, but rather than optimizing the database to handle more requests per second, Facebook focused on reducing the variance of request response times.
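To verify this on your own data, you can run EXPLAIN on the offset query; the plan's rows column should report a figure close to 100015 examined rows rather than 15 (exact numbers depend on your table):

EXPLAIN SELECT *
FROM city
ORDER BY id DESC
LIMIT 100000, 15;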

Another piece of information that also matters for paging requests is the total number of records. We can easily get it with the following query.

SELECT COUNT(*)
FROM city;

However, the above SQL takes 9.28sec with InnoDB as the storage engine. A misguided optimization is to use SQL_CALC_FOUND_ROWS, which computes the number of matching rows while the paged query runs, so that afterwards you only need to execute SELECT FOUND_ROWS(); to get the total. In most cases, however, fewer query statements do not mean better performance. Unfortunately, this style of paging query is used by many mainstream frameworks, so let's look at how it performs.

SELECT SQL_CALC_FOUND_ROWS *
FROM city
ORDER BY id DESC
LIMIT 100000, 15;

This statement took 20.02sec, which is twice as long as the previous one. It turns out that using SQL_CALC_FOUND_ROWS for paging is a very bad idea.
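For reference, the companion statement mentioned above, which retrieves the total computed by SQL_CALC_FOUND_ROWS, is simply:

SELECT FOUND_ROWS();

It must run on the same connection, immediately after the paged query.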

Now let's look at how to optimize. The rest of the article is divided into two parts: the first covers how to get the total number of records, and the second how to fetch the actual records.

Efficiently counting rows

If the engine is MyISAM, you can simply execute COUNT(*) to get the row count, since MyISAM keeps the exact count in the table's metadata; the same goes for heap (MEMORY) tables. If the engine is InnoDB, the situation is more complicated, because InnoDB does not store an exact row count in its metadata.
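As an aside, this metadata can be inspected with SHOW TABLE STATUS; the Rows column is exact for MyISAM but only an estimate for InnoDB:

SHOW TABLE STATUS LIKE 'city';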

We can cache the number of rows and have a daemon refresh it periodically, or recompute it on certain user actions; whenever the cache is invalidated, the following statement rebuilds it:

SELECT COUNT(*)
FROM city
USE INDEX(PRIMARY);
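A minimal sketch of such a cache, assuming a hypothetical row_count_cache table (none of these names appear in the original article):

CREATE TABLE row_count_cache (
  table_name VARCHAR(64) NOT NULL PRIMARY KEY,
  row_count INT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- Refresh the cached count from a daemon or after an invalidating user action
REPLACE INTO row_count_cache (table_name, row_count)
SELECT 'city', COUNT(*)
FROM city USE INDEX(PRIMARY);

Paging code then reads the total from row_count_cache instead of counting on every request.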

Fetching records

Now on to the most important part of this post: fetching the records that the pagination will display. As mentioned above, large offsets hurt performance, so we will rewrite the query. For demonstration purposes, let's create a new table "news", sorted by recency (newest releases at the top), and implement high-performance paging over it. For simplicity, we'll assume the Id of the most recently published news item is also the largest.

CREATE TABLE news (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  title VARCHAR(128) NOT NULL
) ENGINE=InnoDB;

A more efficient approach is based on the Id of the last news item shown to the user. The query for the next page is as follows; it needs to be passed the last Id displayed on the current page.

SELECT *
FROM news
WHERE id < $last_id
ORDER BY id DESC
LIMIT $perpage;

The statement for querying the previous page is similar, except that it is passed the first Id of the current page and sorts in ascending order.

SELECT *
FROM news
WHERE id > $first_id
ORDER BY id ASC
LIMIT $perpage;
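Because this query returns the rows oldest-first, they must be flipped before display, either in application code or, as a sketch, by wrapping the query:

SELECT *
FROM (
  SELECT *
  FROM news
  WHERE id > $first_id
  ORDER BY id ASC
  LIMIT $perpage
) T
ORDER BY id DESC;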

The above queries are suitable for simple paging that shows only "Previous" and "Next" buttons rather than specific page numbers, as in the footer of a blog. Implementing real page-number navigation is harder; here is another approach.

SELECT id
FROM (
  -- Number rows newest-first; cnt = 0 marks the first row of each page
  SELECT id, ((@cnt := @cnt + 1) + $perpage - 1) % $perpage cnt
  FROM news
  JOIN (SELECT @cnt := 0) T
  WHERE id < $last_id
  ORDER BY id DESC
  LIMIT $perpage * $buttons
) C
WHERE cnt = 0;

The above statement computes, for each paging button, a fixed offset Id. There is another advantage to this approach: suppose a new article is published on the site; every article then shifts back one position, so if a user changes pages at that moment, they would see one article twice. Fixing the offset Id of each button avoids this problem. Mark Callaghan has published a similar blog post that uses a composite index and two position variables, but the basic idea is the same.
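As a usage sketch (the $button_id placeholder is hypothetical and stands for one of the Ids returned above, each being the first Id of its page), the page behind a button can then be fetched with the same keyset pattern:

SELECT *
FROM news
WHERE id <= $button_id
ORDER BY id DESC
LIMIT $perpage;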

If the records in the table are rarely deleted or modified, it is also possible to store each record's page number in the table and create an appropriate index on that column (a sketch of the schema change follows). With this approach, whenever a new record is added, the following statements must be run to regenerate the page numbers.
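The original does not show the column definition; a minimal sketch, assuming a nullable page column on news plus an index (names are illustrative), might be:

ALTER TABLE news ADD COLUMN page INT UNSIGNED NULL, ADD INDEX idx_page (page);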

SET @p := 0;
UPDATE news SET page = CEIL((@p := @p + 1) / $perpage) ORDER BY id DESC;

Of course, it is also possible to add a new table dedicated to paging, maintained by a background program.
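The pagination table itself is not shown in the original; a minimal sketch that fits the queries below (the index name is hypothetical) might be:

CREATE TABLE pagination (
  -- rows for new ids must be inserted before the UPDATE below can assign their pages
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  page INT UNSIGNED NOT NULL,
  KEY idx_page (page)
) ENGINE=InnoDB;

The background program can then refresh the mapping with: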

SET @p := 0;
UPDATE pagination T
JOIN (
  SELECT id, CEIL((@p := @p + 1) / $perpage) page
  FROM news
  ORDER BY id
) C
ON C.id = T.id
SET T.page = C.page;

Now it's easy to get the elements of any page:

SELECT *
FROM news A
JOIN pagination B ON A.id = B.id
WHERE B.page = $page;

There is another paging technique, similar to the previous one, that is more useful when the data set is relatively small and no index is available, such as when paging through search results. On an average server with 2M rows, executing the following takes about 2sec; most of that cost goes into creating a temporary table that stores all the Ids.

CREATE TEMPORARY TABLE _tmp (KEY SORT(random))
SELECT id, FLOOR(RAND() * 0x8000000) random
FROM city;

ALTER TABLE _tmp ADD OFFSET INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, DROP INDEX SORT, ORDER BY random;

The next step is to perform a paging query as below.

SELECT *
FROM _tmp
WHERE OFFSET >= $offset
ORDER BY OFFSET
LIMIT $perpage;

Simply put, the key to optimizing paging is to avoid scanning too many records when the data set is large.