Comparison Algorithm

Trepidati0n

[H]F Junkie
Joined
Oct 26, 2004
Messages
9,269
I'm an engineer who occassional writes code..

I have an array of N numbers. I want to compare each of these numbers to each other. If the differnce between this two is less than D, I want to flag this as a match.

I could do this with two simple for loops and brute force it. Problem is lets say array(1) and array (6) match...when the loop comes around I will also get a match on array(6) and array(1).

Problem is, N is very large. While I can brute force this in Matlab...they want it to work in "excel". Is there any way to make this more "efficient" in terms of algorithm?
 
Are you allowed to sort? If so, you can reduce it to n + nlogn instead of n^2
 
so your brute force method would be like
Code:
for (int i = 0; i < arr.length; i++)
    for (int j = 0; j < arr.length; j++)
        if (arr[i] == arr[j]) { // note result }
right?

now i've spent all of 30 seconds thinking on this but could the following still work?
Code:
for (int i = 0; i < arr.length; i++)
    for (int j = i + 1; j < arr.length; j++) // no point in comparing arr[0] to arr[0] etc
        if (arr[i] == arr[j]) { // note result }

....
ok so someone may have posted since i started this post but i didn't really want to get going with this other homework so i whipped this little test program in java to see if my 2 methods work just as well and my cursory testing indicates yes
Code:
import java.util.*;

public class ForTest {
	public static void main(String[] args) {
		Random r = new Random();
		int[] arr = new int[10000];
		
		for (int i = 0; i < arr.length; i++) {
			arr[i] = r.nextInt(arr.length);
		}
		
		check(arr);
	}

	static void check(int[] arr) {
		Map<Integer, Void> map1 = new TreeMap<Integer, Void>();
		Map<Integer, Void> map2 = new TreeMap<Integer, Void>();
		
		for (int i = 0; i < arr.length; i++) {
		    for (int j = 0; j < arr.length; j++) {
		    	if (i == j) { // happens often
		    		//System.err.println("i == j");
		    		continue;
		    	}
		    	
		        if (arr[i] == arr[j]) {
		        	map1.put(i, null);
		        	map1.put(j, null);
		        }
		    }
		}
		
		System.out.println("Size: " + map1.size());
		for (int i : map1.keySet()) {
		//	System.out.println(i);
		}
		
		System.out.println();
		System.out.println();
		System.out.println();
		
		
		for (int i = 0; i < arr.length; i++) {
		    for (int j = i + 1; j < arr.length; j++) { // no point in comparing arr[0] to arr[0] etc
		    	//if (i == j) { // never happens with this method
		    	//	System.err.println("i == j");
		    	//	continue;
		    	//}
		    	
		        if (arr[i] == arr[j]) {
		        	map2.put(i, null);
		        	map2.put(j, null);
		        }
		    }
		}
		
		System.out.println("Size: " + map2.size());
		for (int i : map2.keySet()) {
		//	System.out.println(i);
		}
		
		
		if (map1.size() != map2.size()) {
			throw new RuntimeException("Size mismatch");
		}
		
		Iterator<Integer> i1 = map1.keySet().iterator(), i2 = map2.keySet().iterator();
		
		while (i1.hasNext() && i2.hasNext()) {
			if (!i1.next().equals(i2.next())) {
				throw new RuntimeException("Found mismatch");
			}
		}
		
		System.out.println("Both methods found same matches");
	}
}
can't help you with converting it to excel but at the least the inner loop can start with i+1 instead of 0 if i'm correct
 
Yes that works...no clue why I didn't see that before. Coders just think that way I guess. Massive reduction in duplicate matches.
 
Quick update....excel evidently is a pig when it comes to constantly repinging cells. I reduced execution time by a factor of ~40 by reading into an array first then crunch...amazing
 
You can do this in O(n+d), with some caveats:

Code:
#!/usr/bin/perl
use strict;

my @nums = (0 .. 99, 26,27,28,24,23,21);
our $lookup;

# create LUT; value->indices
for (my $i = 0; $i < @nums; $i++) {
	my $n = $nums[$i];
	push ( @{$lookup->[$n]}, $i );
}

printNeighbours(23, 4);

sub printNeighbours {
	my $base = shift;	
	my $d = shift;	
	
	print "$base is a neighbour of:\n";
	for (my $i = $base - $d; $i <= $base + $d; $i++) {
		print "$i - " . join(',', @{$lookup->[$i]}) . "\n";
	}
}

Results of my sample program:
Code:
23 is a neighbour of:
19 - 19
20 - 20
21 - 21,105
22 - 22
23 - 23,104
24 - 24,103
25 - 25
26 - 26,100
27 - 27,101

This solution also doesn't require reprocessing the array if you choose a different distance.
 
Back
Top