In this post we look at ways to remove all duplicate rows except one in an SQL database. For all examples in this article, we'll be using the following MySQL "user
" table:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | david |
| 3 | john |
| 4 | john |
| 5 | wayne |
| 6 | david |
+----+--------+
CREATE TABLE user (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(25) NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO user(name) VALUES ('john'), ('david'), ('john'), ('john'), ('wayne'), ('david');
#Create a New Table With Unique Values Copied From Original Table
You can simply create a new table with unique values copied over from the original table like so:
CREATE TABLE user2 AS (
SELECT id, name
FROM user
GROUP BY name
)
Result:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | david |
| 5 | wayne |
+----+--------+
You will have to add original table's constraints and indexes on the new table as they aren't copied over.
#Use Temporary Table to Fill Original Table With Unique Rows
You can use a temporary table to copy over unique values from the original table, and add them back to the original table. This can be useful when a primary key does not exist.
# Step 1: Copy distinct values to temporary table
CREATE TEMPORARY TABLE tmp_user (
SELECT id, name
FROM user
GROUP BY name
);
# Step 2: Remove all rows from original table
DELETE FROM user;
# Step 3: Remove all rows from original table
INSERT INTO user (SELECT * FROM tmp_user);
# Step 4: Remove temporary table
DROP TABLE tmp_user;
Be careful when using this method because if there are foreign key constraints referencing the original table, rows deleted in step 2 may delete referenced rows in child tables as well to maintain referential integrity.
Result:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | david |
| 5 | wayne |
+----+--------+
#Add Unique Constraint and Copy Unique Rows to Original Table
To make sure there are no duplicates added in future, we can extend the previous set of queries to add a UNIQUE
constraint to the column.
# Step 1: Copy distinct values to temporary table
CREATE TEMPORARY TABLE tmp_user (
SELECT id, name
FROM user
GROUP BY name
);
# Step 2: Remove all rows from original table
DELETE FROM user;
# Step 3: Add Unique constraint
ALTER TABLE user ADD UNIQUE(name);
# Step 4: Remove all rows from original table
INSERT IGNORE INTO user (SELECT * FROM tmp_user);
# Step 5: Remove temporary table
DROP TABLE tmp_user;
Be careful when using this method because if there are foreign key constraints referencing the original table, rows deleted in step 2 may delete referenced rows in child tables as well to maintain referential integrity.
Result:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | david |
| 5 | wayne |
+----+--------+
#Remove Duplicates and Keep Row With Lowest ID
Using the methods listed below, row with the lowest id is kept, all other duplicates are removed.
Expected Result:
+----+--------+
| id | name |
+----+--------+
| 1 | john |
| 2 | david |
| 5 | wayne |
+----+--------+
Method #1:
DELETE u1 FROM user u1, user u2
WHERE u1.id > u2.id AND u1.name = u2.name
Method #2:
DELETE u1 FROM user u1
INNER JOIN user u2 ON u1.id > u2.id AND u1.name = u2.name;
Method #3:
DELETE FROM user
WHERE id NOT IN (
SELECT *
FROM (
SELECT MIN(id)
FROM user
GROUP BY name
) temp
)
Method #4:
# Step 1: Copy all unique rows with lowest id
CREATE TEMPORARY TABLE tmp_user (
SELECT MIN(id) id
FROM user
GROUP BY name
);
# Step 2: Delete all rows in original table that are not in the temporary table
DELETE FROM user WHERE id NOT IN (SELECT id FROM tmp_user);
# Step 3: Remove temporary table
DROP TABLE tmp_user;
#Remove Duplicates and Keep Row With Highest ID
Using the methods listed below, row with the highest id is kept, all other duplicates are removed.
Expected Result:
+----+--------+
| id | name |
+----+--------+
| 4 | john |
| 5 | wayne |
| 6 | david |
+----+--------+
Method #1:
DELETE u1 FROM user u1, user u2
WHERE u1.id < u2.id AND u1.name = u2.name
Method #2:
DELETE u1 FROM user u1
INNER JOIN user u2 ON u1.id < u2.id AND u1.name = u2.name;
Method #3:
DELETE FROM user
WHERE id NOT IN (
SELECT *
FROM (
SELECT MAX(id)
FROM user
GROUP BY name
) temp
)
Method #4:
# Step 1: Copy all unique rows with lowest id
CREATE TEMPORARY TABLE tmp_user (
SELECT MAX(id) id
FROM user
GROUP BY name
);
# Step 2: Delete all rows in original table that are not in the temporary table
DELETE FROM user WHERE id NOT IN (SELECT id FROM tmp_user);
# Step 3: Remove temporary table
DROP TABLE tmp_user;
This post was published (and was last revised ) by Daniyal Hamid. Daniyal currently works as the Head of Engineering in Germany and has 20+ years of experience in software engineering, design and marketing. Please show your love and support by sharing this post.