Tuesday, December 13, 2005

Deleting duplicate rows from a table

This seems to a favourite question during interviews - How to delete duplicate rows in a table?
Well, we can have many strategies:

- Capture one instance of the unique rows using a SELECT DISTINCT .., and dump the results into a temp table. Delete all of the rows from the original table and then insert the rows from the temp table back into the original table.

- If there is an identity column, the we can also write a query to get all the rows that have duplicate entries :
SELECT Field1, Field2, Count(ID)
FROM Foo1
GROUP BY Foo1.Field1, Foo1.Field2
HAVING Count(Foo1.ID) > 1
-----------------------------
Then loop thru the resultset and get a cursor for the query
"SELECT Field1, Field2, ID
FROM Foo1
WHERE Field1 = @FIELD1 and Field2 = @FIELD2".
Use the cursor to delete all the duplicate rows returned except one.

The following links discuss the above techniques and some more:

sql-server-performance
databasejournal
aspfaqs