How to Remove Duplicate Rows from a SQL Server Table?

When designing objects in SQL Server, we must follow certain best practices. For example, a table should have primary keys, identity columns, clustered and unclustered indexes, data integrity and performance constraints. SQL Server table should not contain duplicate rows according to best practices in database design. Sometimes, however, we need to deal with databases where these rules are not followed or where exceptions are possible when these rules are intentionally bypassed. Even though we are following the best practices, we may face issues like duplicate rows.

For example, we could also get this type of data while importing intermediate tables, and we would like to delete redundant rows before actually adding them into the production tables. Moreover, we should not leave the prospect for duplicating rows because duplicate information allows multiple handling of requests, incorrect reporting results and more. However, if we already have duplicate rows in the column, we need to follow specific methods to clean up the duplicate data. Let’s look at some ways in this article to remove data duplication.

The table containing duplicate rows

How to Remove Duplicate Rows from a SQL Server Table?

There are a number of ways in SQL Server to handle duplicate records in a table based on particular circumstances such as:

Removing duplicates rows from a unique index SQL Server table

You can use the index to classify the duplicate data in unique index tables then delete the duplicate records. First, we need to create a database named “test_database”, then create a table “ Employee” with a unique index by using the code given below.

USE master
GO
CREATE DATABASE test_database
GO
USE [test_database]
GO
CREATE TABLE Employee
(
[ID] INT NOT NULL IDENTITY(1,1),
[Dep_ID] INT,
[Name] varchar(200),
[email] varchar (250) NULL,
[city] varchar(250) NULL,
[address] varchar(500) NULL
CONSTRAINT Primary_Key_ID PRIMARY KEY(ID) 
)

The output will be as below.

Creating the table “Employee”

Now insert data into the table. We will insert duplicate rows also. The “Dep_ID” 003,005 and 006 are duplicate rows with similar data in all fields except the identity column with a unique key index. Execute the code given below.

USE [test_database]
GO
INSERT INTO Employee(Dep_ID,Name,email,city,address) VALUES
(001, 'Aaaronboy Gutierrez', 'aronboy.gutierrez@gmail.com','HILLSBORO','5840 Ne Cornell Rd Hillsboro Or 97124'),
(002, 'Aabdi Maghsoudi', 'abdi_maghsoudi@gmail.com','BRENTWOOD','987400 Nebraska Medical Center Omaha Ne 681987400'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(004, 'Aabish Mughal', 'abish_mughal@gmail.com','OMAHA','2975 Crouse Lane Burlington Nc 272150000'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(007, 'Pilar Ackaerman', 'pilar.ackaerman@gmail.com','ATLANTA','5813 Eastern Ave Hyattsville Md 207822201');

SELECT * FROM Employee

The output will be as follows.

Inserting data into the table named “Employee” and fetching data from the same table.

Now find the no of rows in the table by executing the following code. The count(*) function will count no of rows.

SELECT Dep_ID,Name,email,city,address, COUNT(*) AS duplicate_rows_count FROM Employee
GROUP BY Dep_ID,Name,email,city,address

The output will be as below. Row no (3, 4), (6, 7), (8, 9) highlighted in the red box are duplicate ones.

This figure highlights duplicate rows having row_no greater than 1

Our task is to enforce uniqueness by removing duplicates for the duplicate columns. It is a little easier to remove duplicate values from the table with a unique index than to remove the rows from a table without it. Given below are two methods to achieve this. The first method gives you duplicate rows from the table using the “row_number()” function, whereas the second method uses the “NOT IN” function. Thes two methods have their own cost which will be discussed later.

Method 1: Selecting duplicate records using “ROW_NUMBER ()” function

select * from (SELECT
Dep_ID,Name,email,city,address,
ROW_NUMBER() OVER (
       PARTITION BY
           Dep_ID,Name,email,city,address
       ORDER BY
           Dep_ID,Name,email,city,address
        ) row_no
       FROM test_database.dbo.Employee) x
       where row_no>1

Method 2: Selecting duplicate records using “NOT IN ()” function

SELECT * FROM test_database.dbo.Employee
WHERE ID NOT IN (SELECT MAX(ID)
FROM test_database.dbo.Employee
GROUP BY Dep_ID,Name,email,city,address)

Execute the above code and you will see the following output. Both methods give the same result, but they have different costs.

Selecting duplicate rows from the table named “Employee” using method 1 and 2 respectively

Now we will delete the above selected duplicate rows using “CTE” by using the following code. The following code is selecting duplicate rows to be deleted using the “ROW_NUMBER ()” function.

Method 1: Deleting duplicate records using “ROW_NUMBER ()” function

WITH cte_delete AS (
SELECT
Dep_ID,Name,email,city,address,
ROW_NUMBER() OVER (
PARTITION BY
    Dep_ID,Name,email,city,address
ORDER BY
    Dep_ID,Name,email,city,address
) row_no
FROM
 test_database.dbo.Employee
)

DELETE FROM cte_delete WHERE row_no > 1;

The output will be as below.

Deleting duplicate records from indexed table using “ROW_NUMBER ()” function

Method 2: Deleting duplicate records using “NOT IN ()” function

Now in order to test another method, we need to truncate the table which will remove all the rows from the table. Then insert command will add values to the table. Execute the following code now.

USE [test_database]
GO
truncate table test_database.dbo.Employee

INSERT INTO Employee(Dep_ID,Name,email,city,address) VALUES
(001, 'Aaaronboy Gutierrez', 'aronboy.gutierrez@gmail.com','HILLSBORO','5840 Ne Cornell Rd Hillsboro Or 97124'),
(002, 'Aabdi Maghsoudi', 'abdi_maghsoudi@gmail.com','BRENTWOOD','987400 Nebraska Medical Center Omaha Ne 681987400'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(004, 'Aabish Mughal', 'abish_mughal@gmail.com','OMAHA','2975 Crouse Lane Burlington Nc 272150000'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(007, 'Pilar Ackaerman', 'pilar.ackaerman@gmail.com','ATLANTA','5813 Eastern Ave Hyattsville Md 207822201');

SELECT * FROM Employee

The output will be as given below.

Inserting data into the table named “Employee” and fetching data from the same table.

Execute the code given below to delete all the duplicate rows from the table “Employee”.

Delete FROM test_database.dbo.Employee
WHERE ID NOT IN (SELECT MAX(ID)
FROM test_database.dbo.Employee
GROUP BY Dep_ID,Name,email,city,address)

The output will be as follows.

Delete all the duplicate rows from the indexed table named “Employee

Execution Plan and Query Cost for deleting duplicate rows from the indexed table:

Now we have to check which method will be cost-effective and taking fewer resources. Select the code and click on the execution plan. The following screen will appear showing all executing plans along with cost percentage.

We can see that method 1 “deleting duplicate records using “ROW_NUMBER ()” function” have 33% cost and method 2 “deleting duplicate records using NOT IN () function” has 67% cost. So the method one is most cost-effective as compared to method two.

Method 1 has 33% cost and method 2 has a 67% cost, revealing that method 1 is more cost-effective.

Removing duplicates from a SQL Server table without a unique index:

It’s a bit more difficult to remove duplicate rows or tables without a unique index. In this scenario, using a common table expression (CTE) and the ROW NUMBER() function helps us in removing the duplicate records. To remove duplicates from the table without a unique index we need to generate unique row identifiers.

Execute the following code in order to create the table without a unique index.

USE [test_database]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Employee_with_out_index](
[Dep_ID] [int] NULL,
[Name] [varchar](200) NULL,
[email] [varchar](250) NULL,
[city] [varchar](250) NULL,
[address] [varchar](500) NULL,
)
GO

The output will be as follows.

Creating the table named “Employee_with_out_index” without a unique index

Now insert records into the created table named “Employee_with_out_index” by executing the following code.

USE [test_database]
GO
INSERT INTO Employee_with_out_index(Dep_ID,Name,email,city,address) VALUES
(001, 'Aaaronboy Gutierrez', 'aronboy.gutierrez@gmail.com','HILLSBORO','5840 Ne Cornell Rd Hillsboro Or 97124'),
(002, 'Aabdi Maghsoudi', 'abdi_maghsoudi@gmail.com','BRENTWOOD','987400 Nebraska Medical Center Omaha Ne 681987400'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(004, 'Aabish Mughal', 'abish_mughal@gmail.com','OMAHA','2975 Crouse Lane Burlington Nc 272150000'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(007, 'Pilar Ackaerman', 'pilar.ackaerman@gmail.com','ATLANTA','5813 Eastern Ave Hyattsville Md 207822201');

SELECT * FROM Employee_with_out_index

The output will be as follows.

Inserting data into the table with an out index named “Employee_with_out_index”

Method 1: Deleting duplicate rows from a table using the “ROW_NUMBER ()” function and JOINS.

Execute the following code which is using ROW_NUMBER () function and JOIN to remove duplicate rows from the table without index. IT first creates a unique identity to assigns row_no to all the rows and keep only one-row removing duplicate ones.

WITH temp_tablr_with_row_ids AS
(
SELECT ROW_NUMBER() OVER (ORDER BY Dep_ID,Name,email,city,address) AS row_no,
Dep_ID,Name,email,city,address
FROM test_database.dbo.Employee_with_out_index
)

DELETE a FROM temp_tablr_with_row_ids a
WHERE row_no < (SELECT MAX(row_no) FROM temp_tablr_with_row_ids i WHERE  a.Dep_ID=i.Dep_ID and
a.Name=i.Name and a.email=i.email and a.city=i.city and a.address=i.address
GROUP BY  Dep_ID,Name,email,city,address)

The output will be as follows.

Deleting duplicate rows from a table without index using “ROW_NUMBER ()” function and JOINS

Method 2: Deleting duplicate rows from a table using the “ROW_NUMBER ()” function and PARTITION BY.

Now, in this method, we are using ROW_NUMBER function along with partition by clause in order to assign row_no to all the rows and then delete duplicates ones. First of all, we need to truncate the same table we have created earlier so that all the data gets deleted from the table. Then, insert records into the table including the duplicates records. The third query will delete duplicate rows from the table named “Employee_with_out_index”.

truncate table Employee_with_out_index
INSERT INTO Employee_with_out_index(Dep_ID,Name,email,city,address) VALUES
(001, 'Aaaronboy Gutierrez', 'aronboy.gutierrez@gmail.com','HILLSBORO','5840 Ne Cornell Rd Hillsboro Or 97124'),
(002, 'Aabdi Maghsoudi', 'abdi_maghsoudi@gmail.com','BRENTWOOD','987400 Nebraska Medical Center Omaha Ne 681987400'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(003, 'Aabharana, Sahni', 'abharana.sahni@gmail.com','HYATTSVILLE','2 Barlo Circle Suite A Dillsburg Pa 170191'),
(004, 'Aabish Mughal', 'abish_mughal@gmail.com','OMAHA','2975 Crouse Lane Burlington Nc 272150000'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(005, 'Aabram Howell', 'aronboy.gutierrez@gmail.com','DILLSBURG','868 York Ave Atlanta Ga 303102750'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(006, 'Humbaerto Acevedo', 'humbaerto.acevedo@gmail.com','SAINT PAUL','895 E 7th St Saint Paul Mn 551063852'),
(007, 'Pilar Ackaerman', 'pilar.ackaerman@gmail.com','ATLANTA','5813 Eastern Ave Hyattsville Md 207822201');

Selecting duplicate records into the temp table

; WITH temp_tablr_with_row_ids AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY Dep_ID,Name,email,city,address
ORDER BY  Dep_ID,Name,email,city,address) AS row_no, Dep_ID,Name,email,city,address
FROM Employee_with_out_index
)

Deleting duplicate records from the temp table

DELETE a FROM temp_tablr_with_row_ids a WHERE row_no > 1

The output will be as follows.

Truncating, inserting, deleting duplicate rows from a table without index and selecting resulting records.

Furthermore, we need to know about query execution costs in order to understand which one is an optimized solution. So you need to select all the relevant queries and click on the execution plan. The image below shows the execution plan for the queries along with execution cost. Delete queries are highlighted in the red box. The first query which is using “ROW_NUMBER ()” and JOIN clause has 56% execution cost, whereas the second query is using “ROW_NUMBER ()” and “PARTITION BY” has 31% cost. So the second method is a more optimized one and we should follow an optimized solution.

The first query which is using “ROW_NUMBER ()” and JOIN clause has 56% execution cost, whereas the second query is using “ROW_NUMBER ()” and “PARTITION BY” has 31% cost. So the second method is more optimized one
ABOUT THE AUTHOR

Muhammad Ussama


Sam is a First Line Support Engineer in Network Rail. He loves to play around & tinker with smartphones. He has a strong background and extensive experience in troubleshooting and configuring Android. He's our go-to guy when it comes to smartphones because he always has a trick or two up his sleeve. In his spare time, he likes to listen to music & binge-watch Netflix.