Comparison Algorithm

Trepidati0n · Mar 24, 2010

I'm an engineer who occassional writes code..

I have an array of N numbers. I want to compare each of these numbers to each other. If the differnce between this two is less than D, I want to flag this as a match.

I could do this with two simple for loops and brute force it. Problem is lets say array(1) and array (6) match...when the loop comes around I will also get a match on array(6) and array(1).

Problem is, N is very large. While I can brute force this in Matlab...they want it to work in "excel". Is there any way to make this more "efficient" in terms of algorithm?

Tawnos · Mar 25, 2010

Are you allowed to sort? If so, you can reduce it to n + nlogn instead of n^2

tim_m · Mar 25, 2010

so your brute force method would be like

Code:

for (int i = 0; i < arr.length; i++)
    for (int j = 0; j < arr.length; j++)
        if (arr[i] == arr[j]) { // note result }

right?

now i've spent all of 30 seconds thinking on this but could the following still work?

Code:

for (int i = 0; i < arr.length; i++)
    for (int j = i + 1; j < arr.length; j++) // no point in comparing arr[0] to arr[0] etc
        if (arr[i] == arr[j]) { // note result }

....
ok so someone may have posted since i started this post but i didn't really want to get going with this other homework so i whipped this little test program in java to see if my 2 methods work just as well and my cursory testing indicates yes

Code:

import java.util.*;

public class ForTest {
	public static void main(String[] args) {
		Random r = new Random();
		int[] arr = new int[10000];
		
		for (int i = 0; i < arr.length; i++) {
			arr[i] = r.nextInt(arr.length);
		}
		
		check(arr);
	}

	static void check(int[] arr) {
		Map<Integer, Void> map1 = new TreeMap<Integer, Void>();
		Map<Integer, Void> map2 = new TreeMap<Integer, Void>();
		
		for (int i = 0; i < arr.length; i++) {
		    for (int j = 0; j < arr.length; j++) {
		    	if (i == j) { // happens often
		    		//System.err.println("i == j");
		    		continue;
		    	}
		    	
		        if (arr[i] == arr[j]) {
		        	map1.put(i, null);
		        	map1.put(j, null);
		        }
		    }
		}
		
		System.out.println("Size: " + map1.size());
		for (int i : map1.keySet()) {
		//	System.out.println(i);
		}
		
		System.out.println();
		System.out.println();
		System.out.println();
		
		
		for (int i = 0; i < arr.length; i++) {
		    for (int j = i + 1; j < arr.length; j++) { // no point in comparing arr[0] to arr[0] etc
		    	//if (i == j) { // never happens with this method
		    	//	System.err.println("i == j");
		    	//	continue;
		    	//}
		    	
		        if (arr[i] == arr[j]) {
		        	map2.put(i, null);
		        	map2.put(j, null);
		        }
		    }
		}
		
		System.out.println("Size: " + map2.size());
		for (int i : map2.keySet()) {
		//	System.out.println(i);
		}
		
		
		if (map1.size() != map2.size()) {
			throw new RuntimeException("Size mismatch");
		}
		
		Iterator<Integer> i1 = map1.keySet().iterator(), i2 = map2.keySet().iterator();
		
		while (i1.hasNext() && i2.hasNext()) {
			if (!i1.next().equals(i2.next())) {
				throw new RuntimeException("Found mismatch");
			}
		}
		
		System.out.println("Both methods found same matches");
	}
}

can't help you with converting it to excel but at the least the inner loop can start with i+1 instead of 0 if i'm correct

Trepidati0n · Mar 25, 2010

Yes that works...no clue why I didn't see that before. Coders just think that way I guess. Massive reduction in duplicate matches.

Trepidati0n · Mar 25, 2010

Quick update....excel evidently is a pig when it comes to constantly repinging cells. I reduced execution time by a factor of ~40 by reading into an array first then crunch...amazing

jimmyb · Mar 25, 2010

You can do this in O(n+d), with some caveats:

Code:

#!/usr/bin/perl
use strict;

my @nums = (0 .. 99, 26,27,28,24,23,21);
our $lookup;

# create LUT; value->indices
for (my $i = 0; $i < @nums; $i++) {
	my $n = $nums[$i];
	push ( @{$lookup->[$n]}, $i );
}

printNeighbours(23, 4);

sub printNeighbours {
	my $base = shift;	
	my $d = shift;	
	
	print "$base is a neighbour of:\n";
	for (my $i = $base - $d; $i <= $base + $d; $i++) {
		print "$i - " . join(',', @{$lookup->[$i]}) . "\n";
	}
}

Results of my sample program:

Code:

23 is a neighbour of:
19 - 19
20 - 20
21 - 21,105
22 - 22
23 - 23,104
24 - 24,103
25 - 25
26 - 26,100
27 - 27,101

This solution also doesn't require reprocessing the array if you choose a different distance.

Comparison Algorithm

Trepidati0n

[H]F Junkie

Tawnos

2[H]4U

tim_m

i'm so nice

Trepidati0n

[H]F Junkie

Trepidati0n

[H]F Junkie

jimmyb

2[H]4U