Need help with my sql query

qtip · Jul 9, 2009

Hey guys. I have some clean up to do on our data hub. I'm trying to eliminate the duplicate source keys on all of our hub tables and ran into a snag. The query I have so far identifies the duplicate keys and shows a count of how many duplicates there are for each.

Code:

SELECT SOURCE_KEY,COUNT(*)
FROM CBISCIR.HUB_BR_PREMISE A
WHERE A.PROCESS_STAT_CD = 'E'
AND UPPER (A.PROCESS_ERROR_TXT) LIKE '%CANNOT INSERT DUPLICATE%'
AND NOT EXISTS (SELECT 1 FROM CBISCIR.HUB_BR_PREMISE B
                WHERE B.PROCESS_STAT_CD = 'S'
                AND   B.SOURCE_KEY = A.SOURCE_KEY
                AND   B.CREATE_TMSTMP > A.CREATE_TMSTMP)
GROUP BY SOURCE_KEY
HAVING COUNT (*) > 1 ORDER BY COUNT(*) DESC
WITH UR

I'm trying to keep only the keys with the most current timestamp and deleting the rest. Can someone assist me in this?

calebb · Jul 9, 2009

qtip said:
Hey guys. I have some clean up to do on our data hub. I'm trying to eliminate the duplicate source keys on all of our hub tables and ran into a snag. The query I have so far identifies the duplicate keys and shows a count of how many duplicates there are for each.

Code:

SELECT SOURCE_KEY,COUNT(*) FROM CBISCIR.HUB_BR_PREMISE A WHERE A.PROCESS_STAT_CD = 'E' AND UPPER (A.PROCESS_ERROR_TXT) LIKE '%CANNOT INSERT DUPLICATE%' AND NOT EXISTS (SELECT 1 FROM CBISCIR.HUB_BR_PREMISE B WHERE B.PROCESS_STAT_CD = 'S' AND B.SOURCE_KEY = A.SOURCE_KEY AND B.CREATE_TMSTMP > A.CREATE_TMSTMP) GROUP BY SOURCE_KEY HAVING COUNT (*) > 1 ORDER BY COUNT(*) DESC WITH UR

I'm trying to keep only the keys with the most current timestamp and deleting the rest. Can someone assist me in this?

Window functions can often have better performance than subqueries - and are often easier to understand.

I'll start with just the inner query:

Code:

SELECT
	SOURCE_KEY,
	ROW_NUMBER() OVER (PARTITION BY SOURCE_KEY ORDER BY CREATE_TMSTMP DESC),
	CREATE_TMPSTMP
FROM
	CBISCIR.HUB_BR_PREMISE A
WHERE
	WHERE A.PROCESS_STAT_CD = 'E'
ORDER BY
	SOURCE_KEY

Run this query and look at the results - you'll have each SOURCE_KEY listed multiple times like:

Code:

ASDFAHKF  1  June 1
ASDFAHKF  2  May 1
ASDFAHKF  3  April 1
ASDFAHKF  4  March 1
BADSFAAS  1  June 1
BADSFAAS  2  May 1
BADSFAAS  3  April 1
BADSFAAS  4  March 1

And you'll notice that the most recent one always has the number "1" next to it.

Now, you can either put this in to a temp table or use a derived table to act on this data set:

Code:

DELETE CBISCIR.HUB_BR_PREMISE
FROM CBISCIR.HUB_BR_PREMISE T1
INNER JOIN
	(SELECT
		SOURCE_KEY,
		ROW_NUMBER() OVER (PARTITION BY SOURCE_KEY ORDER BY CREATE_TMSTMP DESC) AS theRowNumber,
		CREATE_TMPSTMP
	FROM
		CBISCIR.HUB_BR_PREMISE A
	WHERE
		WHERE A.PROCESS_STAT_CD = 'E'
	ORDER BY
		SOURCE_KEY) AS FOO ON FOO.SOURCE_KEY=T1.SOURCE_KEY 
							AND FOO.CREATE_TMSTMP=T1.CREATE_TMSTMP
							AND FOO.theRowNumber <> 1 --this is the one you want to keep - rowNumber 1

qtip · Jul 10, 2009

Thanks for the reply. There is no support for olap functions. Can you recommend another way to count the row numbers?

extide · Jul 10, 2009

What DBMS are you using? You may need to use a cursor for something like this.

calebb · Jul 10, 2009

extide said:
What DBMS are you using? You may need to use a cursor for something like this.

Yeah, please let us know what DBMS and if there are any other specific requirements for a potential solution.

Pwyl_The_Destroyer · Jul 10, 2009

Maybe something like this:
(returns the record of the max "thisnum" for each record that has a dupe "thisname")
Works on sql-server 2k

Code:

select thisname, count(thisnum)as thiscount into #temp2 from test2
group by thisname
having count(thisnum) > 1

select thisname, max(thisnum)as maxnum from test2 where thisname in 
(select thisname from #temp2)
group by thisname

qtip · Jul 11, 2009

calebb said:
Yeah, please let us know what DBMS and if there are any other specific requirements for a potential solution.

Sorry about that. I'm using DB2 QMF (Query Management Facility) v8.1 on a z/OS box. There are really no other requirements other than to get rid of the duplicates. It does not matter how I go about it. To me, the easiest way was to find the max timestamp for each duplicate and eliminate the other keys below the max. If you think there's an easier way to accomplish this, please let me know.

Need help with my sql query

qtip

Weaksauce

calebb

[H]F Junkie

qtip

Weaksauce

extide

2[H]4U

calebb

[H]F Junkie

Pwyl_The_Destroyer

Limp Gawd

qtip

Weaksauce