This time, I encountered an SQL question that is supposed to be easy, but it wasn't as easy as expected. It was a problem about deleting duplicate emails. I tried to solve it using SELECT and GROUP BY, but it didn't work.
Question
Table: Person
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| email | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to delete all duplicate emails, keeping only one unique email with the smallest id.
For SQL users, please note that you are supposed to write a DELETE statement and not a SELECT one.
For Pandas users, please note that you are supposed to modify Person in place.
After running your script, the answer shown is the Person table. The driver will first compile and run your piece of code and then show the Person table. The final order of the Person table does not matter.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Output:
+----+------------------+
| id | email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
Explanation: john@example.com is repeated two times. We keep the row with the smallest Id = 1.
I looked up how to use the DELETE statement to solve this problem and wrote the following code:
DELETE FROM Person
WHERE id NOT IN (
SELECT MIN(id)
FROM Person
GROUP BY email
)
But, I got a error message 'You can't specify target table 'Person' for update in FROM clause'
in order to solve this issue, I thought I should approach the problem using a subquery.
WITH CTE AS (
SELECT
id,
email,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) as rn
FROM Person
)
DELETE FROM Person
WHERE id IN (
SELECT id
FROM CTE
WHERE rn > 1
)
After writing it this way, I was able to solve the problem. However, I thought I needed to learn more about the DELETE statement.
Overview of DELETE Statement
The DELETE statement is used to remove rows from a table based on specified conditions. The basic syntax is:
DELETE FROM table_name
WHERE condition;
Examples
1. Deleting Rows Based on a Condition
Suppose we have a table Employees:
id name department
1 Alice HR
2 Bob IT
3 Charlie IT
4 David HR
To delete all employees in the HR department:
DELETE FROM Employees
WHERE department = 'HR'
After running this query, the Employees table will look like:
id name department
2 Bob IT
3 Charlie IT
2. Deleting All Rows
To delete all rows from the Employees table:
DELETE FROM Employees;
After running this query, the Employees table will be empty.
3. Using Subqueries
You can use subqueries to determine which rows to delete. For example, suppose you have another table Departments:
id department_name
1 HR
2 IT
3 Sales
To delete employees whose department is not in the Departments table:
DELETE FROM Employees
WHERE department NOT IN (SELECT department_name FROM Departments)
'Issue Note' 카테고리의 다른 글
Group sold products by the date - SQL (0) | 2024.06.27 |
---|---|
Movie rating - SQL (0) | 2024.06.12 |
Join And function (0) | 2024.04.19 |
1934. Confirmation Rate (0) | 2024.04.12 |
The count of divisors and addition (0) | 2024.04.09 |