Current location - Loan Platform Complete Network - Big data management - How to solve the Oracle database duplicate data in the method steps
How to solve the Oracle database duplicate data in the method steps

In the usual development, we often encounter repeated data in the data table, so how to solve it? Here are two cases of data de-duplication methods, one, completely duplicate data de-duplication; two, part of the field data duplication de-duplication.

First, completely duplicate data de-duplication method

For completely duplicate data de-duplication in the table, you can use the following SQL statements.

Code

CREATETABLE "#temp" AS (SELECTDISTINCT * FROM table name);--Create a temporary table and insert the DISTINCT de-duplicated data into the temporary table

truncateTABLE table name;--Empty the original table data

INSERTINTO table name (SELECT * FROM "#temp");--Insert temporary table data into the original table

DROPTABLE "#temp";--DELETE the temporary table

The specific idea is to first create a temporary table, and then DISTINCT after the insertion of the table data into this temporary table; Then empty the original table; then insert the data from the temporary table into the original table; and finally delete the temporary table.

Second, part of the data de-duplication method

First find duplicate data

select field1,field2,count(*) from table name groupby field1,field2 havingcount(*) > 1

Change the & gt; sign to the = sign and you can query the data without duplicates.

To delete these duplicates, use the following statement:

deletefrom table name a where field1,field2 in

(select field1,field2,count(*) from table name groupby field1,field2 havingcount(*) > 1)

The above statement is very simple - it deletes the queried data. However, this deletion is executed very inefficiently, and for large amounts of data, it can jam the database.

Based on the above, you can first insert the duplicate data from the query into a temporary table, and then delete it, so that you don't have to perform another query when you perform the deletion. As follows:

CREATETABLE Temporary Table AS

(select field1,field2,count(*) from table name groupby field1,field2 havingcount(*) > 1)

The following deletion can be carried out as follows:

deletefrom table name a where field1,field2 in (select field1,field2 from temporary table);

Creating a temporary table and then deleting it is much more efficient than doing it directly with a single statement.

The above statement will remove all the duplicates, in oracle, there is a hidden auto-rowid, which gives each record a unique rowid, if we want to keep the latest record, we can use this field to keep the duplicate data in the rowid of the largest record.

Here is an example of querying for duplicates:

select a.rowid,a.* from table name a

where a.rowid ! =

(

selectmax(b.rowid) from table name b

where a.field1 = b.field1 and

a.field2 = b.field2

)

The statement in the parentheses above queries for the row with the largest rowid in the duplicate data. And the outside is querying out the duplicates other than the largest rowid.

From this, if we want to delete the duplicates and keep only the newest one, we can write it like this:

deletefrom table name a

where a.rowid ! =

(

selectmax(b.rowid) from table name b

where a.field1 = b.field1 and

a.field2 = b.field2

)

In the same vein, the above code is inefficiently executed, after all, so we can consider creating a temporary table. Insert the fields, rowid, that need to be judged duplicates into the temporary table and then compare them when deleting.

createtable temporary table as

select a.field1,a.field2,MAX(a.ROWID) dataid from formal table a GROUPBY a.field1,a.field2;

deletefrom tablename a

where a. rowid ! =

(

select b.dataid from temporary table b

where a.field1 = b.field1 and

a.field2 = b.field2

);

commit;