Hey guys. I have some clean up to do on our data hub. I'm trying to eliminate the duplicate source keys on all of our hub tables and ran into a snag. The query I have so far identifies the duplicate keys and shows a count of how many duplicates there are for each.
I'm trying to keep only the keys with the most current timestamp and deleting the rest. Can someone assist me in this?
Code:
SELECT SOURCE_KEY,COUNT(*)
FROM CBISCIR.HUB_BR_PREMISE A
WHERE A.PROCESS_STAT_CD = 'E'
AND UPPER (A.PROCESS_ERROR_TXT) LIKE '%CANNOT INSERT DUPLICATE%'
AND NOT EXISTS (SELECT 1 FROM CBISCIR.HUB_BR_PREMISE B
WHERE B.PROCESS_STAT_CD = 'S'
AND B.SOURCE_KEY = A.SOURCE_KEY
AND B.CREATE_TMSTMP > A.CREATE_TMSTMP)
GROUP BY SOURCE_KEY
HAVING COUNT (*) > 1 ORDER BY COUNT(*) DESC
WITH UR