I have a products table with the following schema:
id, createdOn, updatedOn, stock, status
createdOn and updatedOn are TIMESTAMP columns. createdOn is the partition field.
Say this is the data I have now:
id  createdOn                   updatedOn                   stock  status
1   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
2   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  5      12
3   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
I have an ETL that appends new rows to this table. When the ETL is finished, the same id can end up with more than one row.
For example:
id  createdOn                   updatedOn                   stock  status
1   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
2   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  5      12
3   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
1   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
3   2018-09-14 14:14:24.305676  2018-09-15 10:00:00.000000  7      5
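To see which ids are affected, something like the following should list them. This is only a sketch in standard SQL; dup_count is an illustrative alias, not a column of the table.

```sql
-- List every id that currently has more than one row.
-- dup_count is an illustrative alias, not part of the schema.
SELECT id, COUNT(*) AS dup_count
FROM products
GROUP BY id
HAVING COUNT(*) > 1;
```

On the example data above this would return ids 1 and 3, each with two rows.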
I want a query that runs over the table and makes sure each id has only one row: the row with the MAX(updatedOn) should stay. There can be more than one row with the MAX(updatedOn) for an id. In that case the rows are guaranteed to be identical, because if they weren't, the updatedOn field would have been modified.
After running the query, the table should look like this:
id  createdOn                   updatedOn                   stock  status
2   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  5      12
1   2018-09-14 14:14:24.305676  2018-09-14 14:14:24.305676  10     5
3   2018-09-14 14:14:24.305676  2018-09-15 10:00:00.000000  7      5
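To make the rule concrete, the rows that should survive can be described with a plain SELECT. This is a sketch in standard SQL, not a tested solution for any particular engine; the alias latest and the column maxUpdatedOn are illustrative names. It relies on the guarantee above that rows sharing the latest updatedOn for an id are identical, so DISTINCT collapses them into one.

```sql
-- Sketch: the rows to keep, i.e. the latest version of each id.
-- latest and maxUpdatedOn are illustrative names, not part of the schema.
SELECT DISTINCT p.id, p.createdOn, p.updatedOn, p.stock, p.status
FROM products AS p
JOIN (
    -- Latest updatedOn per id.
    SELECT id, MAX(updatedOn) AS maxUpdatedOn
    FROM products
    GROUP BY id
) AS latest
  ON p.id = latest.id
 AND p.updatedOn = latest.maxUpdatedOn;
```

This only selects the surviving rows. It does not remove anything from the table, so it does not yet solve the problem.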
How can I write a query that performs this efficiently?
I know it should be something like:
DELETE FROM products
WHERE id NOT IN
(
    SELECT MAX(id)
    FROM products
    GROUP BY id
)
However, this won't work: grouping by id makes MAX(id) return every id, so nothing is deleted, and I don't have an auto-increment field to distinguish the rows.
How can I solve this?