Common mistakes Java programmers make when writing SQL!

Java programmers program with a mix of object-oriented thinking and general imperative methods. Whether the two combine well depends entirely on the programmer's:

Skill (anyone can easily learn imperative programming)
Patterns (some people even use the term "pattern-patterns", i.e. patterns that can be applied anywhere and classified under a certain type of pattern)
Mindset (writing a good object-oriented program is much harder than writing an imperative one; you have to put some effort into it)

But things are different when a Java programmer writes SQL. SQL is a declarative language, not an object-oriented or imperative one. Writing a query in SQL is easy; writing the equivalent logic in Java is not, because the programmer has to keep thinking not only about the programming paradigm but also about the algorithm.

Here are 10 common mistakes (in no particular order) that Java programmers make when writing SQL.

1. Forgetting about NULL

Misunderstanding NULL is probably the biggest mistake a Java programmer can make when writing SQL. Perhaps that's because (though not the only reason) NULL is also called UNKNOWN; it would be a bit easier to understand if it were only called UNKNOWN. Another reason is that JDBC maps SQL NULL to Java null when fetching data or binding variables. This leads to the misconception that NULL = NULL (SQL) behaves like null == null (Java).

The biggest misunderstanding of NULL arises when it is used in row value expressions. Another occurs when NULL is passed to NOT IN anti-joins.

Solution:

Train yourself. Every time you write SQL, think explicitly about NULL: is this predicate correct with respect to NULL? Does NULL affect the result?
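Both traps are easy to demonstrate. A minimal sketch using SQLite through Python's sqlite3 module (the table and values are invented for illustration); the same three-valued logic applies in any SQL database:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# NULL = NULL evaluates to UNKNOWN, not TRUE, so the WHERE clause
# filters the row out -- unlike Java, where null == null is true.
null_eq = cur.execute("SELECT 1 WHERE NULL = NULL").fetchall()
print(null_eq)  # []

# The NOT IN trap: a single NULL in the list makes the predicate
# UNKNOWN for every row, so the anti-join returns nothing at all.
cur.execute("CREATE TABLE t (id INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
not_in = cur.execute("SELECT id FROM t WHERE id NOT IN (2, NULL)").fetchall()
print(not_in)  # [] -- not [(1,), (3,)] as one might expect
```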

2. Working with data in Java memory

Few Java developers understand SQL well. The occasional JOIN, the odd UNION, fine. But window functions? Grouping sets? Many Java developers load SQL data into memory, convert it into some collection type, and then run tedious loops over those collections (at least before Java 8's collection improvements) to do the math.

But some SQL databases support advanced (and standard) OLAP features that perform better and are easier to write. One (non-standard) example is Oracle's awesome MODEL clause. Just let the database do the processing, and bring only the results into Java memory. After all, some very smart people have spent years optimizing these expensive products. So by moving OLAP into the database, you gain two things:

Convenience. Getting it right is probably easier in SQL than in Java.
Performance. The database will probably process it faster than your algorithm. More importantly, you won't have to ship millions of records over the wire.

Solution:

Every time you implement a data-centric algorithm in Java, ask yourself: is there a way the database could do that work for me instead?
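As a contrast between the two styles, here is a sketch (SQLite via Python's sqlite3; the table and values are invented) where a plain GROUP BY replaces the load-everything-and-loop approach:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)])

# In-memory style: fetch every row, then loop over the collection.
in_memory = {}
for customer, amount in cur.execute("SELECT customer, amount FROM orders").fetchall():
    in_memory[customer] = in_memory.get(customer, 0) + amount

# Database style: only the condensed result crosses the wire.
in_db = dict(cur.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

print(in_db)  # {'alice': 12.5, 'bob': 5.0}
```

With three rows the difference is invisible; with millions of rows, the second query transfers one row per customer instead of the whole table.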

3. Use UNION instead of UNION ALL

Compared to UNION, UNION ALL requires an extra keyword. It would probably have been better if the SQL standard had defined:

UNION (allowing duplicates)
UNION DISTINCT (removing duplicates)

Removing duplicate rows is not only rarely needed (and sometimes plain wrong), it can also be quite slow for large result sets with many columns, because the two subselects need to be sorted and each tuple compared with its subsequent tuples.

Note that even though the SQL standard specifies INTERSECT ALL and EXCEPT ALL, hardly any database implements these less useful set operators.

Solution:

Every time you write UNION, consider whether you actually meant UNION ALL.
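The difference is easy to see in isolation (sketched with SQLite via Python's sqlite3):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# UNION removes duplicates -- deduplication work the database must pay for.
union = cur.execute("SELECT 1 UNION SELECT 1").fetchall()
print(union)      # [(1,)]

# UNION ALL just concatenates the two results, with no deduplication.
union_all = cur.execute("SELECT 1 UNION ALL SELECT 1").fetchall()
print(union_all)  # [(1,), (1,)]
```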

4. Using JDBC to page through large result sets

Most databases support paging clauses such as LIMIT .. OFFSET, TOP .. START AT, or OFFSET .. FETCH. Even in databases without these clauses, you can still filter on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008, etc.), which is much faster than paging in memory. This matters especially when dealing with large amounts of data.

Solution:

Just use these clauses. If your database doesn't support them, a tool (e.g. jOOQ) can emulate them.
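In SQLite, for example (sketched through Python's sqlite3; table invented), database-side paging is a single clause:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE t (id INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# Page 3 with a page size of 10: the database skips and limits,
# so only ten rows are ever transferred to the application.
page = cur.execute(
    "SELECT id FROM t ORDER BY id LIMIT 10 OFFSET 20").fetchall()
print(page)  # [(20,), (21,), ..., (29,)]
```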

5. Joining data in Java memory

Since the early days of SQL, some developers have had an uneasy feeling about JOIN statements. It stems from an inherent fear that JOINs are slow. If a cost-based optimizer chooses a nested loop join, it may indeed be true that entire tables are loaded into database memory before the joined table source is created. But that rarely happens. With the right predicates, constraints and indexes, merge joins and hash joins are quite fast. It's all about the right metadata (I can't quote Tom Kyte often enough here). Yet quite a few Java developers still join two tables by querying them separately and somehow combining the results into a map in memory.

Solution:

If you run queries against several tables in several steps, think about whether you can express them in a single statement.
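A sketch of the difference (SQLite via Python's sqlite3; the tables are invented): instead of two queries glued together with a map in application code, one JOIN does the work:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE book "
            "(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
cur.executemany("INSERT INTO author VALUES (?, ?)",
                [(1, "Orwell"), (2, "Huxley")])
cur.executemany("INSERT INTO book VALUES (?, ?, ?)",
                [(1, 1, "1984"), (2, 2, "Brave New World")])

# One statement instead of two queries joined by hand in memory.
rows = cur.execute("""
    SELECT a.name, b.title
    FROM author a
    JOIN book b ON b.author_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [('Orwell', '1984'), ('Huxley', 'Brave New World')]
```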

6. Use DISTINCT or UNION to eliminate duplicate entries in a temporary Cartesian product collection

With complex joins, one can lose track of all the relationships that play a role in a SQL statement. In particular, when a multi-column foreign key relationship is involved, it's easy to forget to add the relevant predicates to the JOIN .. ON clause. This results in duplicate records, perhaps only in special cases. Some developers then reach for DISTINCT to eliminate the duplicates. This is wrong in three ways:

It (perhaps) masks the symptom but doesn't fix the problem, and it may fail to mask the symptom in edge cases.
It is slow for huge result sets with many columns, since DISTINCT has to sort or hash the result to eliminate duplicates.
It is slow for huge Cartesian products, which still load lots of data into memory.

Solution:

As a rule of thumb, if you get unwanted duplicate records, review your JOIN predicates. There is probably a hard-to-spot Cartesian product hiding somewhere.
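The trap is easy to reproduce with a two-column key (SQLite via sqlite3; invented tables). Joining on only half of the key produces a Cartesian blow-up that DISTINCT could not even mask here, since every combined row is distinct:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE t1 (k1 INTEGER, k2 INTEGER, v TEXT)")
cur.execute("CREATE TABLE t2 (k1 INTEGER, k2 INTEGER, w TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [(1, 1, "a"), (1, 2, "b")])
cur.executemany("INSERT INTO t2 VALUES (?, ?, ?)", [(1, 1, "x"), (1, 2, "y")])

# Forgot half of the composite key: every t1 row matches every t2 row.
bad = cur.execute(
    "SELECT t1.v, t2.w FROM t1 JOIN t2 ON t1.k1 = t2.k1").fetchall()
print(len(bad))  # 4 -- a hidden Cartesian product

# Complete JOIN predicate: the duplicates disappear at the source.
good = cur.execute("""
    SELECT t1.v, t2.w FROM t1 JOIN t2
    ON t1.k1 = t2.k1 AND t1.k2 = t2.k2
    ORDER BY t1.v
""").fetchall()
print(good)      # [('a', 'x'), ('b', 'y')]
```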

7. Not using MERGE statements

This isn't really a mistake, but perhaps a lack of knowledge of, or confidence in, the powerful MERGE statement. Some databases understand other forms of upsert statements, such as MySQL's ON DUPLICATE KEY UPDATE clause, but MERGE is so powerful and important that some databases, such as SQL Server, heavily extend the SQL standard's version of it.

Solution:

If you chain together INSERT and UPDATE, or SELECT .. FOR UPDATE followed by INSERT or UPDATE, think again. Apart from being risky in the face of race conditions, you can often express the same thing with a simpler MERGE statement.
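SQLite has no MERGE, but its upsert syntax (INSERT .. ON CONFLICT .. DO UPDATE, SQLite 3.24+) illustrates the same idea of a single atomic statement (sketch; table invented):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
cur.execute("INSERT INTO stock VALUES ('widget', 1)")

# One atomic statement instead of SELECT .. then INSERT-or-UPDATE,
# so there is no window for a race condition between the two steps.
cur.execute("""
    INSERT INTO stock (sku, qty) VALUES ('widget', 5)
    ON CONFLICT (sku) DO UPDATE SET qty = qty + excluded.qty
""")
row = cur.execute("SELECT sku, qty FROM stock").fetchone()
print(row)  # ('widget', 6)
```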

8. Using aggregate functions instead of window functions

Before window functions were introduced, aggregating data in SQL meant combining GROUP BY clauses with aggregate functions. This works well in many situations, and when aggregated data needs to be enriched with regular data, the grouped query can be pushed into a joined subquery.

But window functions were defined in SQL:2003 and are implemented by many major databases. Window functions can aggregate over a result set without grouping it. In fact, each window function has its own, independent PARTITION BY clause, which makes them a great tool for reporting.

Using window functions:

Makes SQL more readable (with fewer dedicated GROUP BY subqueries)
Improves performance, as the RDBMS can usually optimize window functions more easily

Solution:

Whenever you use GROUP BY in a subquery, think twice about whether a window function would do.
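A sketch (SQLite 3.25+ via sqlite3; invented table): the window function keeps every detail row while still producing the aggregate, with no self-join or subquery:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 10), ("east", 20), ("west", 5)])

# Each row keeps its detail, plus its partition's total -- something
# GROUP BY alone could only provide via a joined subquery.
rows = cur.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
""").fetchall()
print(rows)  # [('east', 10, 30), ('east', 20, 30), ('west', 5, 5)]
```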

9. Sorting indirectly in memory

SQL's ORDER BY clause supports many kinds of expressions, including CASE statements, which are very useful for indirect sorting. You should probably never sort data in Java memory just because you think:

SQL sorting is too slow
SQL sorting can't do this

Solution:

If you sort any SQL data in memory, think twice about whether it couldn't be sorted in the database instead. This goes nicely with paging in the database.
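Indirect sorting in the database is one CASE expression away (SQLite via sqlite3; invented table):

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE tickets (id INTEGER, priority TEXT)")
cur.executemany("INSERT INTO tickets VALUES (?, ?)",
                [(1, "low"), (2, "high"), (3, "medium")])

# ORDER BY a CASE expression: the database sorts by a ranking that
# exists nowhere in the stored data itself.
ordered = cur.execute("""
    SELECT id FROM tickets
    ORDER BY CASE priority
        WHEN 'high'   THEN 1
        WHEN 'medium' THEN 2
        ELSE 3
    END
""").fetchall()
print(ordered)  # [(2,), (3,), (1,)]
```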

10. Inserting a large number of records one by one

JDBC understands batching, and you shouldn't forget it. Don't INSERT thousands of records one by one, creating a new PreparedStatement every time. If all your records go into the same table, create one batch INSERT with a single SQL statement and multiple sets of bind values. Depending on your database and its settings, you may need to commit after a certain number of inserts to keep the UNDO log small.

Solution:

Always insert large amounts of data using batch processing.
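Python's sqlite3 equivalent of a JDBC batch is executemany (sketch; table invented): one statement, many bind value sets, the analogue of addBatch()/executeBatch().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# One prepared statement, executed with 1000 sets of bind values,
# instead of preparing and executing 1000 separate INSERTs.
rows = [(i, "user%d" % i) for i in range(1000)]
cur.executemany("INSERT INTO users VALUES (?, ?)", rows)
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1000
```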

From: OSChina

Translated by: oschina.net/translate/10-common-mistakes-java-developers-make-when-writing-sql