Optimal pairing algorithm?

shaolin3a · Nov 7, 2006

Here's what I'm trying to accomplish: I have two sets of numbers and and I need to pair the numbers from set A with numbers from set B in any way possible to get the lowest absolute difference in all the pairs combined.

ex. A[4, 10] and B[2, 10] : 4->2, 10->10 [diff: 2]
Wouldn't want this: 4->10, 2->10 [diff: 14]

I know that brute force won't cut it in this problem because getting about 15 numbers will already take too long. I tried thinking about how to do linear regression and maybe borrowing some ideas from that but nothing came of it so far. Also, thinking about whether it would ever be a good idea to pair sets of sequential numbers out of order as in:

ex. A[a, b] where a < b and B[i, j] where i < j

I don't see a situation where it would be advantageous to pair a->j and b->i. This out-of-order pairing would always yield an equal or higher difference.

Anyway, there must be some method to break it down but I've been thinking on it for several days and can't come up with anything yet. Maybe it's something obvious but someone please enlighten me!

Seated · Nov 7, 2006

You have two lists of numbers, and you're looking for a way to find the lowest sum of absolute differences. Every number in list A has to be paired with a number in list B. 2 ways come to mind:

1) Brute force - run through all combinations. Would make a nice recursive function. It would also go quickly.
2) As a thought, would sorting each list in either ascending or descending order, and then summing the differences, would this be the minimum amount?

mikeblas · Nov 7, 2006

Are the sets the same size? If so, how could you ever have 15 numbers?

Let's assume you meant two sets of 15 for 30 numbers total. The brute force method does 15 comparisons to find a best match for the first number, then 14 to find a best match for the second, and so on. It'll take only 120 comparisons for this.

Why does that take too long?

Meanwhile, I think Seated's #2 suggestion about sorting then "merging" is the right way to go. Am I missing a case where it doesn't work?

nameless_centurian · Nov 7, 2006

mikeblas said:
Are the sets the same size? If so, how could you ever have 15 numbers?

Let's assume you meant two sets of 15 for 30 numbers total. The brute force method does 15 comparisons to find a best match for the first number, then 14 to find a best match for the second, and so on. It'll take only 120 comparisons for this.

Why does that take too long?

Meanwhile, I think Seated's #2 suggestion about sorting then "merging" is the right way to go. Am I missing a case where it doesn't work?

well, option number two seems very very feasible, as does option one. now, if you've got two lists that are a few million in length, then your options are getting limited.

for the given input size, i recommend one of the two options listed. for very large input size, i recommend a genetic algorithm or a PSO algorithm

edit: note, a GA and a PSO do not *guarantee* an optimal solution

HHunt · Nov 8, 2006

nameless_centurian said:
well, option number two seems very very feasible, as does option one. now, if you've got two lists that are a few million in length, then your options are getting limited.

for the given input size, i recommend one of the two options listed. for very large input size, i recommend a genetic algorithm or a PSO algorithm

edit: note, a GA and a PSO do not *guarantee* an optimal solution

I rather like genetic algorithms, though. No guarantees, but quite reasonable results.
(Wrote a program for and paper on multilevel GA for solving SAT problems for my bachelor degree. Fascinating stuff.

)

mikeblas · Nov 8, 2006

Why recommend a non-deterministic genetic algorithm when a guaranteed solution is available with a reasonable time guarantee?

Can you provide a sketch of a genetic algorithm for this problem?

shaolin3a · Nov 8, 2006

Yeah I think I was thinking about this in a totally wrong way which is why I was saying it'd take too long. I was doing all permutations which is not necessary.

Are the sets the same size? If so, how could you ever have 15 numbers?

The list do not have to be the same size so I would have to throw away some elements on the longer list.

Meanwhile, I think Seated's #2 suggestion about sorting then "merging" is the right way to go. Am I missing a case where it doesn't work?

I was thinking about the same thing, hence my bit about out of order pairing. But I couldn't prove that so I was hesitant.

BTW, what is PSO? Thanks for everyone's input so far. Lots of ideas for me to look into

HHunt · Nov 8, 2006

Having looked at it, I agree that sorting is very probably a better use of time for this specific problem than a GA. (On the other hand, it should at least scale better than the brute-force method, if you don't need a guaranteed optimal solution. )

A genetic algorithm would be more interesting if the way you compared the objects did not lend itself so easily to sorting. Matching two sets of strings on characters in common, perhaps?
If I find the time I can try and set something up.

Very basically, it could be done something like this:
Leave one of the lists alone, just to simplify things.
Generate the initial population in some way. Random works.
Each chromosome consist of a list of indexes into the second table of strings.
The crossover function is a bit iffy, since you can't just cut/paste. How about:
Select a section. For child 1: Begin with parent 1, then make the selected section identical to parent 2 (swapping values as neccesary, thus making the non-selected region(s) different from the same in parent 1). And the other way around for child 2, of course.
I'm not sure if that's a good crossover, but it'll have to do for now.
Mutation is simpler: Just swap one or more values.
The fitness of a chromosome is the total number of matching characters.

Then it's just the normal stuff:
Generate
do {
Crossover + mutate
Select the best half of the population
} until (good enough or time limit)

I'm trying to think of any other tricks that could be interesting.
Using a local search of some kind in addition to or instead of the mutation requires a suitably cheap local search method. Something like "Select randomly one of the words that match especially bad, and find a better match for it" could work, with some thought put into how you define "match especially bad" and how to find those words cheaply.

Some form of multileveling might also work, I'll get back to that later.

mikeblas · Nov 8, 2006

Huh? Why can't the characters in a string be sorted?

nameless_centurian · Nov 8, 2006

shaolin3a said:
BTW, what is PSO?

Particle Swarm Optimization

wikipedia

rodsfree · Nov 8, 2006

Is it just me, or do we not know enough about the numbers and the sets.

Like

Are only whole numbers allowed, or any number and if so, to what degree of precision?

Are negative numbers allowed in the sets?

Are duplicate numbers allowed within a set?

Is zero allowed?

Are any of the numbers in set "a" allowed to be equal to any of the numbers in set "b"? Basically asking if sets "a" & "b" are allowed to have common members.

Are the sets the same size? And if not, are we allowed to just truncate the larger set or must we also evaluate all of the different members, by using members of the smaller set more than once.

It might be just me, but I feel that I don't have enough information to get a good start on this problem.

Anyway, logic dictates that the minimum possible difference between 2 numbers would be zero. So the first run through should be to look for numbers that are equal to each other in both sets and clear them out. After that it's going to be more interesting.

mikeblas · Nov 8, 2006

I'm not sure how the precision and magnitude of the numbers affects any proposed solution. Can you elaborate on why you need that information to get started?

By sorting the sets, we're trying to put the lowest numbers in one set near the lowest numbers of the other set.

Being concerned with sets that aren't the same length is interesting, and throws a wrench into the sorting approach, but I still don't think it's too hard to implement.

shaolin3a · Nov 8, 2006

rodsfree said:
Is it just me, or do we not know enough about the numbers and the sets.

Like

Are only whole numbers allowed, or any number and if so, to what degree of precision?

Are negative numbers allowed in the sets?

Are duplicate numbers allowed within a set?

Is zero allowed?

Are any of the numbers in set "a" allowed to be equal to any of the numbers in set "b"? Basically asking if sets "a" & "b" are allowed to have common members.

Are the sets the same size? And if not, are we allowed to just truncate the larger set or must we also evaluate all of the different members, by using members of the smaller set more than once.

It might be just me, but I feel that I don't have enough information to get a good start on this problem.

Anyway, logic dictates that the minimum possible difference between 2 numbers would be zero. So the first run through should be to look for numbers that are equal to each other in both sets and clear them out. After that it's going to be more interesting.

Sorry if my post was unclear. The sets have the following properties:
1)They do not have to be the same size, truncate the larger
2)They only contain positive integers and zero
3)There can be duplicates
4)Sets A & B and have some or all common elements

Thanks for pointing these ambiguities out!

rodsfree · Nov 8, 2006

mikeblas said:
I'm not sure how the precision and magnitude of the numbers affects any proposed solution. Can you elaborate on why you need that information to get started?

By sorting the sets, we're trying to put the lowest numbers in one set near the lowest numbers of the other set.

Being concerned with sets that aren't the same length is interesting, and throws a wrench into the sorting approach, but I still don't think it's too hard to implement.

I look at things from a math viewpoint since I'm a Manufacturing Engineer instead of a programming view - like sorting and such. So it's an ingrained response to ask about magnitude and precision.
This problem sends me all the way back to Algebra I and number sets and stuff.

As a thought,
Once you eliminate the zero pairs and get the sets the same size.
Sort both sets in ascending order.
Subtract Item 1a from Item 1b (this would be the smallest from set "a" from the smallest from set "b" ) - set this result as an absolute value in a new set "c"
Iterate through the rest of the pairs doing the same thing.

Then sum all the numbers in set "c"
This should give you the smallest answer.

Anybody see anything wrong with it?

rodsfree · Nov 8, 2006

shaolin3a said:
Sorry if my post was unclear. The sets have the following properties:
1)They do not have to be the same size, truncate the larger
2)They only contain positive integers and zero
3)There can be duplicates
4)Sets A & B and have some or all common elements

Thanks for pointing these ambiguities out!

OK,

So what method do we use to truncate the larger list?

Ideally, we should throw out the numbers that would hurt our result the most.
Best way to do that would be after sorting. Then toss the numbers from the largest list that are greater than the max value in the smallest list until we have 2 lists that have the same number of members.

Another way would be to duplicate the extra numbers from the larger list onto the smaller list, then after subtraction you would get a zero value for those list cells.

I'm starting to think of this like a spreadsheet or an array now.

I think I need an Excel intervention.

shaolin3a · Nov 8, 2006

rodsfree said:
OK,

So what method do we use to truncate the larger list?

Ideally, we should throw out the numbers that would hurt our result the most.
Best way to do that would be after sorting. Then toss the numbers from the largest list that are greater than the max value in the smallest list until we have 2 lists that have the same number of members.

The only problem with that is the largest list may contain numbers all smaller than the the smallest list.

amromousa · Nov 8, 2006

The question is were you given a time constraint (i.e. are you required to do this within a certain number of operations given the two lists?). The sorting idea was what came to mind first, after eliminating equal pairs, of course. . .but I think you might get better answers with the naive "brute force" method.

Alternatively, you could use a binary search sort of deal.

Sort both lists, use binary search to find the matching pairs in each list, if they exist (taking this from O(n^2) [because you have to find a match for every item in the first list] to O(nlogn) [I think? I think it takes O(logn) time per search and we're doing this n times, so O(nlogn)] time).

Having found the "pairs" you can then remove the numbers from the lists and then pair the sorted items by index, HOWEVER, this doesn't guarantee the best match, so you should try to pair both the current index, previous index, and next index with the current item, removing them from the list as you go to see which is best. Since looking at the prior, current, and next item in list two is done in constant time, we're still just left with a single for loop for the pairing procedure. I'm not sure what time this would take to run, but it'd be much less than O(n^2). I guess it'd be O(n+c), meaning the initial matching of like values would take more time.

rodsfree · Nov 8, 2006

shaolin3a said:
The only problem with that is the largest list may contain numbers all smaller than the the smallest list.

I just thought of that.

If that was the case we'd have to toss the extra numbers from the small end of the large set.

HHunt · Nov 8, 2006

mikeblas said:
Huh? Why can't the characters in a string be sorted?

Oh, you can sort both the words and the characters in the words. But does that help you?
Consider these two lists of nonsense words:
(azzz, byyy, cxxx), (xxxx, yyyy, zzzz)

Is there a way to independently sort these to get the ideal result? What if the matching parts were more complex?

amromousa · Nov 8, 2006

rodsfree said:
I just thought of that.

If that was the case we'd have to toss the extra numbers from the small end of the large set.

The issue with brute force is that you can't just find the "matching" number for a single element and be done..you have to see if a set of matching number differences yields the lowest overall difference. Tricky, I suppose you have to find all possibilities and compare they're overall difference -- very bad stuff.

mikeblas · Nov 8, 2006

HHunt said:
Oh, you can sort both the words and the characters in the words. But does that help you?
Consider these two lists of nonsense words:
(azzz, byyy, cxxx), (xxxx, yyyy, zzzz)

Is there a way to independently sort these to get the ideal result? What if the matching parts were more complex?

Why would you do so? I guess you've lost me: if we're comparing characters instead of integers, I see no reason why we can't sort them. Where did words come from? If we're comparing words, we'll first have to decide how to compute the absolute value difference between two different words. Sorting integers has the property of putting the numbers in the order of their magnitude. Does sorting words do the same thing for your definition of the absolute value of the differences between the words?

mikeblas · Nov 8, 2006

rodsfree said:
I look at things from a math viewpoint since I'm a Manufacturing Engineer instead of a programming view - like sorting and such. So it's an ingrained response to ask about magnitude and precision.

Sure. But what I asked about was how magnitude and precision have a bearing on this problem. I still don't see how they do.

rodsfree said:
As a thought,
Once you eliminate the zero pairs and get the sets the same size.
Sort both sets in ascending order.
Subtract Item 1a from Item 1b (this would be the smallest from set "a" from the smallest from set "b" ) - set this result as an absolute value in a new set "c"
Iterate through the rest of the pairs doing the same thing.

Then sum all the numbers in set "c"
This should give you the smallest answer.

Anybody see anything wrong with it?

What's wrong with it is that it is inefficient; you're doing things out of order. Finding the numbers in list A which are equal to numbers in list B requires N(A) * N(B) comparisons, worst case.

But then you'll sort, which will require even more comparisons. By sorting first, you can find the numbers which match accross the lists as well as the numbers which are nearest to their best match in a very efficient way.

HHunt · Nov 8, 2006

mikeblas said:
Why would you do so? I guess you've lost me: if we're comparing characters instead of integers, I see no reason why we can't sort them. Where did words come from? If we're comparing words, we'll first have to decide how to compute the absolute value difference between two different words. Sorting integers has the property of putting the numbers in the order of their magnitude. Does sorting words do the same thing for your definition of the absolute value of the differences between the words?

HHunt said:
A genetic algorithm would be more interesting if the way you compared the objects did not lend itself so easily to sorting. Matching two sets of strings on characters in common, perhaps?

"A set of strings" roughly = a list of words.
I was trying to come up with something related where a GA would make more sense, and landed on:
Instead of numbers: words. Instead of "smallest difference": "largest amount of characters in common".

I think that's complex enough to eliminate sorting as a solution.

drizzt81 · Nov 8, 2006

rodsfree said:
So what method do we use to truncate the larger list?

If the solution involves sorting both lists, the truncation ought to be trivial, since we can "easily" create aggregate statistics. Depending on the range and possibly average and standard deviation, we ought to be able to select a subset of the larger one that best matches the smaller set.

rodsfree · Nov 8, 2006

mikeblas said:
Sure. But what I asked about was how magnitude and precision have a bearing on this problem. I still don't see how they do.

Mike, if you look at the end of my statement it says "I'm a Manufacturing Engineer... So it's an ingrained response to ask about magnitude and precision."
Magnitude and precision matter when you are making physical objects. Like parts for turbine engines and stuff (which is what I do). So I automatically ask about them i.e. ingrained response

mikeblas said:
What's wrong with it is that it is inefficient; you're doing things out of order. Finding the numbers in list A which are equal to numbers in list B requires N(A) * N(B) comparisons, worst case.

But then you'll sort, which will require even more comparisons. By sorting first, you can find the numbers which match accross the lists as well as the numbers which are nearest to their best match in a very efficient way.

Again, I am not a programmer.

Those steps are the way I would do this manually, with a pen and paper.

It's called "Flow Charting" - It's the kind of thing that is taught in first year problem solving classes.
Find a way that works - then refine it. This is the approach that generally works best. Solutions to problems don't have to be perfect to work correctly.

Another thing about my inefficient solution - We don't know the maximum number of items in one of these sets.
If there are less than 20 or even less than a 100 items then efficiency doesn't matter at all - because the time difference between the efficient program and the inefficient program will be negligible.

We are not talking about sorting Amazon's or eBay's database here - this is probably just a first year programming problem. With the constraints the OP's given I'd just about bet on it. So he'll get more points for a logical solution that works instead of the most efficient programming solution to the problem which could very well be language dependent, using a language that he can't use.

Remember, he is asking for an algorithm, which last time I checked, was a method or formula. He isn't asking for the code to do this with.

rodsfree · Nov 8, 2006

drizzt81 said:
If the solution involves sorting both lists, the truncation ought to be trivial, since we can "easily" create aggregate statistics. Depending on the range and possibly average and standard deviation, we ought to be able to select a subset of the larger one that best matches the smaller set.

I think that we might be over complicating things here. Not to take anything away from your statement drizzt81.

The only math that the OP has mentioned is:
subtraction
addition
<>= comparison
number sets

The basic problem is:

Given: 2 different sets of numbers that may be of unequal size
Find: The method of pairing the numbers such that the sum of their differences is the smallest possible number. Include all of the constraints that the OP has mentioned. Truncation of the larger set is allowed.

Stated this way I think that we can avoid statistical analysis of the number sets or any other higher math.

mikeblas · Nov 8, 2006

rodsfree said:
Mike, if you look at the end of my statement it says "I'm a Manufacturing Engineer... So it's an ingrained response to ask about magnitude and precision."

Yep; I saw that. It doesn't tell me why you think those things are important. Your note also said you felt that information was imperative to getting starrted. If you meant that you just ask that because you always ask that and just did so this time again without thinking, then that's fine -- just say so.

On the other hand, maybe your expeirence in Manufacturing can tell us something that, as computer scientists or programmers, we're not used to thinking about. If you have such an insight to share from your different perpsective, I'm very eager to hear it.

rodsfree said:
Find a way that works - then refine it. This is the approach that generally works best. Solutions to problems don't have to be perfect to work correctly.

No, they don't. But to be elegant and scaleable, they have to be significantly refined. I can tell by the tone of your response that you're not interested in feedback which might refine your solution, which is fine. But then, I'm not sure why you want to take such a defensive and condescending tone in your response.

rodsfree said:
If there are less than 20 or even less than a 100 items then efficiency doesn't matter at all - because the time difference between the efficient program and the inefficient program will be negligible.

I, and a few other people here, have pointed that out when discussing the brute-force approach. Thing is, your suggested solution is less efficient than the brute-force approach because it also sorts and merges; and it is less efficient than the sort-and-merge approach because it does the brute force work and discards some of the by-product of that work.

rodsfree said:
We are not talking about sorting Amazon's or eBay's database here

We aren't? I thought you had asserted that we didn't know the maximum size of the lists. The interesting thing about thinking through algorithms (for those of us who are practiced at it) isn't coming up with something good enough to work and stopping. It's finding a solution that scales well from only a few items to astronomically large numbers of items. It's hard to call something "good enough" for a small data set "solved". Data sets only grow larger. If it's a homework assignment in question, I'd argue it's even more important to study the problems of scale in the proposed solution because they offer a far greater learning opportunity.

rodsfree said:
So he'll get more points for a logical solution that works instead of the most efficient programming solution to the problem which could very well be language dependent, using a language that he can't use.

Remember, he is asking for an algorithm, which last time I checked, was a method or formula. He isn't asking for the code to do this with.

Nobody here has presented any code, and certainly nothing language-specific.

mikeblas · Nov 8, 2006

drizzt81 said:
If the solution involves sorting both lists, the truncation ought to be trivial, since we can "easily" create aggregate statistics. Depending on the range and possibly average and standard deviation, we ought to be able to select a subset of the larger one that best matches the smaller set.

Problem is, aggregate statistics don't tell you which numbers to leave out. I'm also a little shy of the word "truncate". You can't just lop off one end of the list or the other, since the best matches might be in the middle. (Even in the problem statement, I'm not sure if "truncate" means "rip one of the ends off" or "arbitrarily reduce the cardinality of the set".)

Consider the case where set A = { 3, 4, 5, 6, 10 } and set B = { 5, 9 }. The solution ends up being { {5, 5}, {10, 9} }. If you're discarding individual elements, or runs of elements, you can find the best solution here. If you're truncating set A (meaning, chopping off a run of elements but only at one end), you're going to miss the best solution.

HHunt said:
"A set of strings" roughly = a list of words.
I was trying to come up with something related where a GA would make more sense, and landed on:
Instead of numbers: words. Instead of "smallest difference": "largest amount of characters in common".

I think that's complex enough to eliminate sorting as a solution.

Oh, I see; thanks for the clarification! Indeed, unless you could wire up a sort that managed to keep the property of differences, then, yes, you'd probably need to investigate some more complicated algorithm. That said, though, I think you might get pretty far with a comparison-based approach that "scored" the words in a stable ranking. "folds" in the domain of that ranking could be identifiable, and then be addressed in a manner similar to that of a hash.

HHunt · Nov 8, 2006

Yes, I was wondering if something like that might be possible. It wouldn't be trivial, though. We need to mesh the basically separate sorting criteria "number of 'A's", 'B's etc. into a result where closeness actually correlates to a good match. Also, a word can be an equally good match for several others, but some of those would be better matched with other words. It's not like the scoring system is anything near linear, either. And then there's the issue of pairs with different length.

It might actually be easier to write a genetic algorithm, and I don't want to guess how they would perform.

(I think I landed on an interesting problem, more so than I suspected myself.)

rodsfree · Nov 8, 2006

mikeblas said:
Yep; I saw that. It doesn't tell me why you think those things are important. Your note also said you felt that information was imperative to getting starrted. If you meant that you just ask that because you always ask that and just did so this time again without thinking, then that's fine -- just say so.

I thought I did say so. It seems that my ingrained responses about magnitude and precision bother you. For me, I have to think of numbers like the difference between 0.1gm and 1.0gm at 25,000rpm meaning that several hundred people could possibly die when the turbine engine hanging on the wing their airplane explodes at 35,000 feet.

And since the FAA, various other government agencies, entire boat loads of lawyers, and quite possibly my own family members could take exception to their plane blowing up - I ALWAYS worry about magnitude and precision.

mikeblas said:
No, they don't. But to be elegant and scaleable, they have to be significantly refined.

That's part of the fun!

mikeblas said:
I can tell by the tone of your response that you're not interested in feedback which might refine your solution, which is fine. But then, I'm not sure why you want to take such a defensive and condescending tone in your response.

No, I am interested in feedback. And if the tone of my response seems defensive. I'm sorry. But it's in response to the tone of YOUR comments. You always appear to be very arrogant and condescending to me.

Mike, it seems that whenever you and I are in the same thread we are always arguing. I just know that if we ever met in person that we could have such great fun arguing with each other. And I really do enjoy arguing.

But we have a communication problem. Your posts always seem stilted, arrogant and condescending to me and apparently mine appear defensive and condescending to you. We will have to learn to communicate with each other better.

mikeblas said:
I, and a few other people here, have pointed that out when discussing the brute-force approach. Thing is, your suggested solution is less efficient than the brute-force approach because it also sorts and merges; and it is less efficient than the sort-and-merge approach because it does the brute force work and discards some of the by-product of that work.

We aren't? I thought you had asserted that we didn't know the maximum size of the lists. The interesting thing about thinking through algorithms (for those of us who are practiced at it) isn't coming up with something good enough to work and stopping. It's finding a solution that scales well from only a few items to astronomically large numbers of items. It's hard to call something "good enough" for a small data set "solved". Data sets only grow larger. If it's a homework assignment in question, I'd argue it's even more important to study the problems of scale in the proposed solution because they offer a far greater learning opportunity.

The only thing that I can say to these statements are.
1) It illustrates our communication problem perfectly - because just the apparent tone of the comments sets my teeth on edge. It is very arrogant and condescending to say something like (for those of us who are practiced at it)
2) The judge of what is "good enough" will be the end user - or instructor - or somebody far different from me. I will be happy to just see a flow chart of individual, ordered steps required to solve this problem. Regardless of elegance.

mikeblas · Nov 8, 2006

rodsfree said:
It seems that my ingrained responses about magnitude and precision bother you.

Only as far as wondering why you think it gates progress on the problem.

rodsfree said:
Mike, it seems that whenever you and I are in the same thread we are always arguing. I just know that if we ever met in person that we could have such great fun arguing with each other. And I really do enjoy arguing.

Once you understand the difference between "arguing" and "discussion", I think you'll have an insight into reading my posts in the spirit they're intended.

drizzt81 · Nov 8, 2006

mikeblas said:
Problem is, aggregate statistics don't tell you which numbers to leave out. I'm also a little shy of the word "truncate". You can't just lop off one end of the list or the other, since the best matches might be in the middle. (Even in the problem statement, I'm not sure if "truncate" means "rip one of the ends off" or "arbitrarily reduce the cardinality of the set".)

Consider the case where set A = { 3, 4, 5, 6, 10 } and set B = { 5, 9 }. The solution ends up being { {5, 5}, {10, 9} }. If you're discarding individual elements, or runs of elements, you can find the best solution here. If you're truncating set A (meaning, chopping off a run of elements but only at one end), you're going to miss the best solution.

Please excuse my inaccurate usage of "truncate". Let me rephrase: "I believe that selecting the numbers to leave out ought to be 'easy' if we have sorted the lists, since compiling aggregate statistics during the sort should not increase complexity by 'much'".

Yes, there are a lot of heuristics in here and more to come.
If we have the mean and the std. deviation of a sample, as well as its size, can we not determine approximately how many numbers are contained in the range of (mean - std.dev, mean + std. dev) or is this limited to normally distributed samples?
...*thinking*

I guess my underlying idea was pointless. Sorry.

Maybe if we sorted the smaller list first and keep track of the range of values. While sorting the larger list we could have a 'filter' that discards numbers that are outside of the range into a thirsd list. This third list will be sorted according to the minimum absolute distance to either the min or max of the initial list.

If we need more values from list 2 (the larger one) we select ones from that third list, though we still run into the problem that mike has pointed out...

mikeblas · Nov 8, 2006

drizzt81 said:
Please excuse my inaccurate usage of "truncate".

No worries; it's not inaccurate -- it's imprecise. And it is imprecise in the problem statement, too, as I mentioned.

drizzt81 said:
Maybe if we sorted the smaller list first and keep track of the range of values. While sorting the larger list we could have a 'filter' that discards numbers that are outside of the range into a thirsd list. This third list will be sorted according to the minimum absolute distance to either the min or max of the initial list.

What's that get you? It seems like a lot of work to do when just comparing and merging cover the best fit.

drizzt81 · Nov 8, 2006

mikeblas said:
No worries; it's not inaccurate -- it's imprecise. And it is imprecise in the problem statement, too, as I mentioned.

What's that get you? It seems like a lot of work to do when just comparing and merging cover the best fit.

I was thinking about how to select the values from the larger list that are best fit.. bottom line: I cannot come up with one at the moment.

mikeblas · Nov 8, 2006

I don't think you can come up with a more direct method than picking the items out of a sorted list. The out-of-range values might be the ones you want:

A: { 4, 4, 4, 4 }
B: { 3, 10}
S: {3, 4}, {10, 4}

or

A: { 1, 1, 99, 99 }
B: { 20, 30}
S: {1,20 }, {1, 30}

If you construct this extra list, you have to decide when to use it (or not), which is more work.

amromousa · Nov 8, 2006

The reason I think sorting and picking the same index isn't ideal is this situation:

A: { 1 2 4 5 7 }
B: { 3 6 8 }

Picking the same index gives an answer of { (1, 3), (2, 6), (4, 8) } where it should be { (2,3), (5,6), (7,8) }.

mikeblas · Nov 8, 2006

Right. The idea isn't to pick the same index; it is to perform a modified merge over the list.

shaolin3a · Nov 8, 2006

mikeblas said:
Right. The idea isn't to pick the same index; it is to perform a modified merge over the list.

What do you mean by "modified merge"? Say you're looking to pair the i-th element from set A, then how can you decided if an element from B is right?

What I'm thinking is the possible pairing can be are anything from B to B[i + n] where n is the difference in # of elements of the two sets. But that can still leave a range for all the elements. How can you narrow it from there?

mikeblas · Nov 8, 2006

I mean that I'd take the well-known merge algorithm and modify it. Since the items are in order, we know that the best match is between where we are and the number of needed remaining elements.

I'm coming down with the flu, but I think it turns out to be just a special case of the subset sum problem, where the sums you're working are the cumulative absolute values of the differences.

shaolin3a · Nov 9, 2006

mikeblas said:
I'm coming down with the flu, but I think it turns out to be just a special case of the subset sum problem, where the sums you're working are the cumulative absolute values of the differences.

Thanks again for all the help you've given me even when you're not feeling well. Hope you get better soon!

Optimal pairing algorithm?

Limp Gawd

Weaksauce

[H]ard|DCer of the Month - May 2006

Gawd

Supreme [H]ardness

[H]ard|DCer of the Month - May 2006

Limp Gawd

Supreme [H]ardness

[H]ard|DCer of the Month - May 2006

Gawd

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd

[H]ard|Gawd

[H]ard|Gawd

Limp Gawd

Gawd

[H]ard|Gawd

Supreme [H]ardness

Gawd

[H]ard|DCer of the Month - May 2006

[H]ard|DCer of the Month - May 2006

Supreme [H]ardness

[H]F Junkie

[H]ard|Gawd

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

[H]ard|DCer of the Month - May 2006

Supreme [H]ardness

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]ard|DCer of the Month - May 2006

Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd