MySQL Big Data Speed

Recently, the projects I work on have had to handle millions of rows, and the efficiency of ordinary SQL queries has plummeted. When the where clause contains many conditions, the query speed is simply unbearable. In the past, when the amount of data was small, the quality of a query statement had no obvious impact on execution time, so many details were ignored.

In testing, a query against a table containing more than 4 million records took as long as 40 seconds. No user will tolerate such a high query delay, so improving the efficiency of SQL statements is very important. The following are several query optimization methods widely circulated on the Internet:

First of all, when the amount of data is large, try to avoid full table scans and consider creating indexes on the columns used in where and order by, which can greatly speed up data retrieval. However, in some cases indexes do not work:

1. Try to avoid using the != or <> operators in the where clause; otherwise the engine will give up using the index and scan the whole table.

2. Try to avoid testing a field for null in the where clause; otherwise the engine will give up using the index and scan the whole table. For example: select id from t where num is null. You can set a default value of 0 on num, make sure the num column contains no null values, and then query like this: select id from t where num=0

3. Try to avoid joining conditions with or in the where clause; otherwise the engine will give up using the index and scan the whole table. For example: select id from t where num=10 or num=20. You can query like this instead: select id from t where num=10 union all select id from t where num=20

4. The following query will also lead to a full table scan:

select id from t where name like '%abc%'

To improve efficiency, consider full-text search.

5. Use in and not in with caution as well; otherwise they can lead to a full table scan. For example: select id from t where num in(1,2,3). For a continuous range of values, use between instead of in: select id from t where num between 1 and 3

6. Using parameters in the where clause can also lead to a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan until run time; it must choose at compile time. When the access plan is built at compile time, the value of the variable is still unknown, so it cannot be used as input for index selection. For example, the following statement will scan the whole table: select id from t where num=@num. You can force the query to use an index instead (the @num variable and the with(index(...)) hint are SQL Server syntax): select id from t with(index(index_name)) where num=@num

7. Try to avoid expression operations on fields in the where clause, which will cause the engine to give up using the index and scan the whole table. For example: select id from t where num/2=100. It should be changed to: select id from t where num=100*2

8. Avoid applying functions to fields in the where clause as much as possible, which will cause the engine to give up using the index and scan the whole table. For example: select id from t where substring(name,1,3)='abc' (ids whose name starts with abc) and select id from t where datediff(createdate,'2005-11-30')=0 (ids generated on '2005-11-30'). They should be changed to: select id from t where name like 'abc%' and select id from t where createdate>='2005-11-30' and createdate<'2005-12-01'

9. Do not apply functions, arithmetic, or other expressions to the left side of "=" in the where clause; otherwise the system may not be able to use the index correctly.

10. When an indexed field is used as a condition and the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. The order of the fields in the condition should also match the index order as closely as possible.

11. Do not write meaningless queries. For example, to generate an empty table structure: select col1,col2 into #t from t where 1=0. This kind of code returns no result set but still consumes system resources. It should be changed to: create table #t(...)

12. It is often a good choice to replace in with exists. For example: select num from a where num in(select num from b). Replace it with the following statement: select num from a where exists(select 1 from b where num=a.num)
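
The effect of tips 7 to 9 can be checked by asking the engine for its query plan. The following is a minimal sketch rather than a MySQL benchmark: it uses Python's built-in sqlite3 module as a stand-in engine, and the helper plan(), the index name idx_t_num, and the sample rows are assumptions made for illustration. It uses num * 2 = 400 instead of num/2 = 100 to sidestep integer-division differences between engines. In MySQL the equivalent check is to prefix the query with EXPLAIN, and the plan output there looks different.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (num, name) VALUES (?, ?)",
    [(i, "n%d" % i) for i in range(1000)],
)

# No index on num yet: the engine has to scan.
no_index = plan(conn, "SELECT id FROM t WHERE num = 200")

# Hypothetical index name, chosen for this sketch.
conn.execute("CREATE INDEX idx_t_num ON t (num)")

# Bare column compared to a constant: the index can be searched.
with_index = plan(conn, "SELECT id FROM t WHERE num = 200")

# Expression on the column: the index cannot be searched by value.
with_expr = plan(conn, "SELECT id FROM t WHERE num * 2 = 400")

# Both forms return the same rows; only the access path differs.
rows_plain = conn.execute("SELECT id FROM t WHERE num = 200").fetchall()
rows_expr = conn.execute("SELECT id FROM t WHERE num * 2 = 400").fetchall()

for label, detail in (("no index", no_index),
                      ("indexed column", with_index),
                      ("expression on column", with_expr)):
    print(label, "->", detail[0])
```

In SQLite's plan output, a line starting with SCAN means every row (or every index entry) is visited, while SEARCH means the index is used to jump to the matching rows.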

Points to note when indexing:

1. Not every index is effective for a query. SQL optimizes queries according to the data in the table, and when an indexed column contains many duplicate values, the query may not use the index. For example, if a table has a sex field whose values are split almost evenly between male and female, an index on sex does nothing for query efficiency.

2. More indexes are not always better. An index can improve the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because the index may have to be rebuilt during an insert or update. How to build indexes therefore needs to be considered carefully, case by case. The number of indexes on a table should preferably not exceed 6. If there are more, consider whether the indexes on rarely used columns are really necessary.

3. Updating clustered index data columns should be avoided as much as possible, because the order of clustered index data columns is the physical storage order of table records. Once the column values change, the order of the whole table records will be adjusted, which will consume considerable resources. If the application system needs to update the clustered index data columns frequently, it is necessary to consider whether the index should be built as a clustered index.
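
As a rough way to judge whether a column is worth indexing (note 1 above), one can compare the number of distinct values with the number of rows. The following is a minimal sketch under stated assumptions: the users table, its columns, and the helper selectivity() are hypothetical, and Python's built-in sqlite3 module is used only so the example is runnable. The same COUNT(DISTINCT ...) query can be run in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (gender, email) VALUES (?, ?)",
    [("M" if i % 2 == 0 else "F", "user%d@example.com" % i) for i in range(1000)],
)


def selectivity(conn, table, column):
    """Return distinct values divided by total rows for one column.

    Table and column names are interpolated directly into the SQL,
    so only pass trusted identifiers.
    """
    distinct, total = conn.execute(
        "SELECT COUNT(DISTINCT %s), COUNT(*) FROM %s" % (column, table)
    ).fetchone()
    return distinct / total if total else 0.0


gender_sel = selectivity(conn, "users", "gender")  # two values over 1000 rows
email_sel = selectivity(conn, "users", "email")    # every value is unique

print("gender selectivity:", gender_sel)
print("email selectivity:", email_sel)
```

A ratio close to 1 suggests an index can narrow a lookup to a few rows, while a ratio close to 0, as with the gender column here, suggests the index would still leave about half the table to read.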

Other points to note:

1. Use numeric fields where possible. If a field contains only numerical information, try not to design it as a character field, which will reduce the performance of queries and joins and increase storage overhead. This is because the engine compares a string character by character when processing queries and joins, while a numeric type needs only one comparison.

2. Do not use select * from t anywhere. Replace "*" with a specific field list, and do not return any fields that are not used.

3. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, please note that the index is very limited (only the primary key index).

4. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

5. Temporary tables are not off limits. Using them appropriately can make some routines more effective, for example when a data set in a large table or a commonly used table has to be referenced repeatedly. For one-off events, however, it is better to use derived tables.

6. When creating a temporary table, if a large amount of data is inserted at one time, you can use select into instead of create table to avoid generating a large amount of log and to improve speed. If the amount of data is not large, then to ease the load on system table resources you should create table first and then insert.

7. If temporary tables are used, all of them must be explicitly deleted at the end of the stored procedure: truncate table first, and then drop table, to avoid locking the system tables for a long time.

8. Try to avoid using cursors, because cursors are inefficient. If a cursor operates on more than 10,000 rows, consider rewriting it.

9. Before using the cursor-based method or temporary table method, you should first find a set-based solution to solve the problem, and the set-based method is usually more effective.

10. Like temporary tables, cursors are not off limits. Using a FAST_FORWARD cursor on a small data set is usually better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. A routine that includes "totals" in the result set is usually faster than doing the same with a cursor. If development time permits, try both the cursor-based method and the set-based method and see which works better.

11. Set SET NOCOUNT ON at the beginning of every stored procedure and trigger, and SET NOCOUNT OFF at the end. This removes the need to send a DONE_IN_PROC message to the client after each statement of the stored procedure or trigger is executed.

12. Try to avoid returning a large amount of data to the client. If the amount of data is too large, it is necessary to consider whether the corresponding requirements are reasonable.

13. Try to avoid large transactions, in order to improve system concurrency.
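
Tips 8 and 9 above recommend a set-based statement over row-by-row processing. The following is a minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in engine. The orders table, the 10% discount rule, and the function names are hypothetical and exist only for illustration. The loop plays the role of a cursor, issuing one UPDATE per row, while the set-based version does the same work in one statement.

```python
import sqlite3


def make_db():
    """Create a small in-memory table with three sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, discounted INTEGER)"
    )
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(50,), (150,), (300,)])
    return conn


def row_by_row(conn):
    """Cursor-style processing: read every row, then update each one separately."""
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        new_value = amount * 9 // 10 if amount >= 100 else amount
        conn.execute(
            "UPDATE orders SET discounted = ? WHERE id = ?", (new_value, order_id)
        )


def set_based(conn):
    """Set-based processing: one UPDATE covers all rows."""
    conn.execute(
        "UPDATE orders SET discounted = "
        "CASE WHEN amount >= 100 THEN amount * 9 / 10 ELSE amount END"
    )


def results(conn):
    """Return the discounted values ordered by id."""
    return [row[0] for row in conn.execute("SELECT discounted FROM orders ORDER BY id")]


loop_conn = make_db()
row_by_row(loop_conn)

set_conn = make_db()
set_based(set_conn)

print("same result:", results(loop_conn) == results(set_conn))
```

Both versions leave the table in the same state. The set-based version sends one statement instead of one per row, which is where the saving comes from as the row count grows.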

Reference address: blogs.com/luxf/archive/2012/02/08/2343345.html.

/luyee 20 10/ article/details /8309806
