Comparison Algorithm

Discussion in 'Webmastering & Programming' started by Trepidati0n, Mar 24, 2010.

  1. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,826
    Joined:
    Oct 26, 2004
    I'm an engineer who occassional writes code..

    I have an array of N numbers. I want to compare each of these numbers to each other. If the differnce between this two is less than D, I want to flag this as a match.

    I could do this with two simple for loops and brute force it. Problem is lets say array(1) and array (6) match...when the loop comes around I will also get a match on array(6) and array(1).

    Problem is, N is very large. While I can brute force this in Matlab...they want it to work in "excel". Is there any way to make this more "efficient" in terms of algorithm?
     
  2. Tawnos

    Tawnos 2[H]4U

    Messages:
    3,807
    Joined:
    Sep 9, 2001
    Are you allowed to sort? If so, you can reduce it to n + nlogn instead of n^2
     
  3. tim_m

    tim_m i'm so nice

    Messages:
    5,528
    Joined:
    Feb 10, 2003
    so your brute force method would be like
    Code:
    for (int i = 0; i < arr.length; i++)
        for (int j = 0; j < arr.length; j++)
            if (arr[i] == arr[j]) { // note result }
    
    right?

    now i've spent all of 30 seconds thinking on this but could the following still work?
    Code:
    for (int i = 0; i < arr.length; i++)
        for (int j = i + 1; j < arr.length; j++) // no point in comparing arr[0] to arr[0] etc
            if (arr[i] == arr[j]) { // note result }
    
    ....
    ok so someone may have posted since i started this post but i didn't really want to get going with this other homework so i whipped this little test program in java to see if my 2 methods work just as well and my cursory testing indicates yes
    Code:
    import java.util.*;
    
    public class ForTest {
    	public static void main(String[] args) {
    		Random r = new Random();
    		int[] arr = new int[10000];
    		
    		for (int i = 0; i < arr.length; i++) {
    			arr[i] = r.nextInt(arr.length);
    		}
    		
    		check(arr);
    	}
    
    	static void check(int[] arr) {
    		Map<Integer, Void> map1 = new TreeMap<Integer, Void>();
    		Map<Integer, Void> map2 = new TreeMap<Integer, Void>();
    		
    		for (int i = 0; i < arr.length; i++) {
    		    for (int j = 0; j < arr.length; j++) {
    		    	if (i == j) { // happens often
    		    		//System.err.println("i == j");
    		    		continue;
    		    	}
    		    	
    		        if (arr[i] == arr[j]) {
    		        	map1.put(i, null);
    		        	map1.put(j, null);
    		        }
    		    }
    		}
    		
    		System.out.println("Size: " + map1.size());
    		for (int i : map1.keySet()) {
    		//	System.out.println(i);
    		}
    		
    		System.out.println();
    		System.out.println();
    		System.out.println();
    		
    		
    		for (int i = 0; i < arr.length; i++) {
    		    for (int j = i + 1; j < arr.length; j++) { // no point in comparing arr[0] to arr[0] etc
    		    	//if (i == j) { // never happens with this method
    		    	//	System.err.println("i == j");
    		    	//	continue;
    		    	//}
    		    	
    		        if (arr[i] == arr[j]) {
    		        	map2.put(i, null);
    		        	map2.put(j, null);
    		        }
    		    }
    		}
    		
    		System.out.println("Size: " + map2.size());
    		for (int i : map2.keySet()) {
    		//	System.out.println(i);
    		}
    		
    		
    		if (map1.size() != map2.size()) {
    			throw new RuntimeException("Size mismatch");
    		}
    		
    		Iterator<Integer> i1 = map1.keySet().iterator(), i2 = map2.keySet().iterator();
    		
    		while (i1.hasNext() && i2.hasNext()) {
    			if (!i1.next().equals(i2.next())) {
    				throw new RuntimeException("Found mismatch");
    			}
    		}
    		
    		System.out.println("Both methods found same matches");
    	}
    }
    
    
    can't help you with converting it to excel but at the least the inner loop can start with i+1 instead of 0 if i'm correct
     
  4. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,826
    Joined:
    Oct 26, 2004
    Yes that works...no clue why I didn't see that before. Coders just think that way I guess. Massive reduction in duplicate matches.
     
  5. Trepidati0n

    Trepidati0n [H]ardForum Junkie

    Messages:
    8,826
    Joined:
    Oct 26, 2004
    Quick update....excel evidently is a pig when it comes to constantly repinging cells. I reduced execution time by a factor of ~40 by reading into an array first then crunch...amazing
     
  6. jimmyb

    jimmyb 2[H]4U

    Messages:
    3,165
    Joined:
    May 24, 2006
    You can do this in O(n+d), with some caveats:

    Code:
    #!/usr/bin/perl
    use strict;
    
    my @nums = (0 .. 99, 26,27,28,24,23,21);
    our $lookup;
    
    # create LUT; value->indices
    for (my $i = 0; $i < @nums; $i++) {
    	my $n = $nums[$i];
    	push ( @{$lookup->[$n]}, $i );
    }
    
    printNeighbours(23, 4);
    
    sub printNeighbours {
    	my $base = shift;	
    	my $d = shift;	
    	
    	print "$base is a neighbour of:\n";
    	for (my $i = $base - $d; $i <= $base + $d; $i++) {
    		print "$i - " . join(',', @{$lookup->[$i]}) . "\n";
    	}
    }
    Results of my sample program:
    Code:
    23 is a neighbour of:
    19 - 19
    20 - 20
    21 - 21,105
    22 - 22
    23 - 23,104
    24 - 24,103
    25 - 25
    26 - 26,100
    27 - 27,101
    This solution also doesn't require reprocessing the array if you choose a different distance.