1. Reasonable use of indexes
An index is an important database data structure whose fundamental purpose is to improve query efficiency. Most database products today use index structures based on ISAM, which was first proposed by IBM. Indexes should be used appropriately, following these principles:
● Create indexes on columns that are frequently used in joins but are not designated as foreign keys; fields that are joined only infrequently are left for the optimizer to index automatically.
● Create indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).
● Create indexes on columns with many distinct values that are frequently used in conditional expressions, and do not create indexes on columns with few distinct values. For example, the "Gender" column of an employee table has only two values, "Male" and "Female", so there is no need to index it. Such an index would not improve query efficiency and would seriously slow down updates.
● If multiple columns need to be sorted, a compound index can be created on those columns (see the sketch after this list).
● Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers, an index can become corrupted or inefficient after frequent operations. If a query that uses an index slows down for no obvious reason, try using the tbcheck tool to check the integrity of the index and repair it if necessary. In addition, after a table has received a large volume of updates, dropping and rebuilding its indexes can improve query speed.
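As a minimal sketch of the principles above (the table, column, and index names here are illustrative assumptions, borrowing the orders table used later in this article):
-- index a column that is frequently used in joins but is not a foreign key
CREATE INDEX idx_orders_customer_num ON orders (customer_num)
-- compound index on columns that are frequently sorted together
CREATE INDEX idx_orders_cust_order ON orders (customer_num, order_num)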
2. Avoid or simplify sorting
Repeated sorting of large tables should be simplified or avoided. The optimizer skips the sorting step when it can use an index to produce output in the required order. It cannot do so in cases such as the following:
● The index does not include one or more columns to be sorted;
● The order of the columns in the group by or order by clauses is not the same as the order of the index;
● The columns to be sorted are from different tables.
To avoid unnecessary sorting, build indexes correctly and merge database tables wisely (although this may sometimes hurt table normalization, it is worthwhile relative to the efficiency gained). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
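As a hedged illustration (assuming a compound index on orders (customer_num, order_num), as in the sketch under section 1): when the ORDER BY columns match the index column order, the optimizer can return rows in index order and skip the sort; when the order differs, a separate sort step is still required:
-- sort can be satisfied directly from the index
SELECT * FROM orders ORDER BY customer_num, order_num
-- column order differs from the index, so a separate sort is needed
SELECT * FROM orders ORDER BY order_num, customer_num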
3. Eliminate sequential access to large table rows
In nested queries, sequential access to tables can be fatal to query efficiency. For example, with a sequential access strategy, a query nested three levels deep that scans 1,000 rows at each level examines one billion rows in total. The main way to avoid this is to index the join columns. For example, consider two tables: a student table (student number, name, age, ...) and a course selection table (student number, course number, grade). To join the two tables, an index should be created on the join field "student number".
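A minimal sketch of this example (the English table and column names are assumed translations of the ones above):
-- index the join column on the course selection table
CREATE INDEX idx_sc_student_num ON course_selection (student_num)
SELECT s.name, sc.course_num, sc.grade
FROM student s, course_selection sc
WHERE s.student_num = sc.student_num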
A union can also be used to avoid sequential access. Some forms of the WHERE clause force the optimizer to use sequential access even though there are indexes on all the columns being checked. The following query forces a sequential scan of the orders table:
SELECT * FROM orders WHERE (customer_num=104 AND order_num>1001) OR order_num=1008
Although there are indexes on customer_num and order_num, the optimizer scans the entire table via the sequential access path for the statement above. Because the statement retrieves a set of disjoint rows, it should be changed to the following:
SELECT * FROM orders WHERE customer_num=104 AND order_num>1001
UNION
SELECT * FROM orders WHERE order_num=1008
This allows the query to be processed using the index path.
4. Avoiding correlated subqueries
If a column appears in both the main query and the subquery in the WHERE clause, the subquery is likely to be re-evaluated every time the column's value in the main query changes. The more levels of query nesting, the lower the efficiency, so subqueries should be avoided whenever possible. If a subquery is unavoidable, filter out as many rows as possible within the subquery.
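For example, a correlated subquery of the following form can often be rewritten as a join, which most optimizers evaluate more efficiently. This is a sketch reusing the cust and orders tables from elsewhere in this article; it assumes customer_num identifies a customer in both tables:
-- correlated form: the subquery may be re-evaluated for each row of cust
SELECT name FROM cust
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.customer_num = cust.customer_num)
-- equivalent join form (assuming customer_num is unique in cust)
SELECT DISTINCT cust.customer_num, cust.name
FROM cust, orders
WHERE cust.customer_num = orders.customer_num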
5. Avoiding difficult regular expressions
The MATCHES and LIKE keywords support wildcard matching, technically known as regular expressions. This kind of matching is particularly time-consuming. For example:
SELECT * FROM customer WHERE zipcode LIKE "98___"
Even with an index on the zipcode field, a sequential scan is still used in this case. If the statement is changed to
SELECT * FROM customer WHERE zipcode > "98000"
the index will be used to execute the query, which obviously speeds it up considerably.
Also, avoid non-leading substrings. For example, the statement
SELECT * FROM customer WHERE zipcode[2,3] > "80"
uses a non-leading substring in the WHERE clause, so this statement will not use an index either.
6. Using temporary tables to speed up queries
Sorting a subset of a table and creating a temporary table can sometimes speed up queries. It helps avoid multiple sorting operations, and it simplifies the optimizer's work in other ways. For example:
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name
If this query is to be executed more than once, find all the customers with outstanding balances once, place them in a temporary table, and sort them by customer name:
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance
Then query the temporary table as follows:
SELECT * FROM cust_with_balance
WHERE postcode> "98000"
The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O; the query workload is therefore significantly reduced.
Note: Temporary tables do not reflect changes made to the main table after they are created. Where data in the main table is modified frequently, be careful not to lose data.
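The INTO TEMP clause above is Informix syntax. On databases that lack it, a roughly equivalent sketch uses CREATE TEMPORARY TABLE (the exact syntax, and whether the ORDER BY is honored in the stored order, vary by product):
CREATE TEMPORARY TABLE cust_with_balance AS
SELECT cust.name, cust.postcode, rcvbles.balance
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name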
7. Replacing non-sequential accesses with sorting
Non-sequential disk access is the slowest operation, showing up as back-and-forth movement of the disk access arm. SQL hides this, making it very easy to write queries that require access to a large number of non-sequential pages.
In some cases, using the database's sorting ability in place of non-sequential access can improve the query.
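As a hedged sketch of the idea (the key values are illustrative, taken from the orders example earlier): instead of issuing many single-row lookups from the application, each of which may touch a non-sequential page, a single set-oriented query lets the database fetch the rows in one pass and return them sorted:
-- many separate point lookups, each a potentially non-sequential page access
-- SELECT * FROM orders WHERE order_num = 1001
-- SELECT * FROM orders WHERE order_num = 1008
-- one set-oriented query: the rows are fetched together and returned in sorted order
SELECT * FROM orders WHERE order_num IN (1001, 1008) ORDER BY order_num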