Determining Algorithm Efficiency for Specific Values

Cod · Oct 25, 2012

When dealing with growth functions and time complexity, is finding the exact efficiency for a specific value as simple as plugging in the value and doing the math? Or do you only have to plug in the value for the order of the growth function? Whichever is lower is most efficient?

For example, take the following two growth functions: 5n^2 + n and 10n + 7. If I wanted to determine the efficiency at n = 50, do I just plug in the value old school algebra style? The first value would be 12550 and the second value would be 507, so the second growth function with order O

would be more efficient? Or, like I stated earlier, can I just plug in the value for the part of the function that determines the order to find the most efficient?

This algorithm analysis stuff is torturing me. Any nudging in the correct direction would be greatly appreciated.

BDV · Oct 26, 2012

Check this out: http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o

ShoeLace · Oct 26, 2012

I'm learning about this stuff now too but I don't think you can use comparisons like this all the time. What happens if you only have 1 piece of code to look at and you want to determine its time complexity?

I think you need to measure the time it takes and maybe how much memory it uses but the time is more important. You could start by breaking the algorithm up into steps but the step count won't be the deciding factor. How long it runs is the deciding factor which could be figured out by seeing how many times it needs to loop.

For example:

What is the timing here in relation to n?

Code:

i = 0
while i < 5:
   n *= 2
   i += 1

print n

The answer is O(1) constant time because no matter what n's value is, the running time won't change.

What would the timing be for this one?

Code:

def recurPower(a, b):
   print a, b
   if b == 0:
      return 1
   else:
      return a * recurPower(a, b-1)

Edit:
I forgot to mention this when I wrote it this morning. If there's 2 time complexities the highest (worst/longest) one is the true time measurement.

Cod · Oct 26, 2012

I'm still a total newb at this stuff, but I think the if-statement is O

and the else-statement is O(log n). Thus, the whole recursion loop has an order of O(log n).

Raekwon · Oct 26, 2012

Is anyone else taking MITx 6.00x? Watched a lecture on this material today.

ShoeLace · Oct 26, 2012

Cod said:
I'm still a total newb at this stuff, but I think the if-statement is O and the else-statement is O(log n). Thus, the whole recursion loop has an order of O(log n).

Nope. That function I pasted is to calculate the power of a number recursively. It's linear complexity, O

or in this case O(b).

It's linear complexity because the length of time it takes to compute will go up based on what "b" is. If we set "b" to be 3 initially and then later on changed it to 300 then it'll take 100x longer to run because it has to call itself 100x more to finish the job.

An example of O(log n) would be binary searching.
http://en.wikipedia.org/wiki/Binary_search_algorithm

I'm still a newbie myself so explaining this one will be harder. Think of a phone book. If you looked for Mike Jones you would probably flip to about the half way point right? Now you would decide if you went too far and need to go back or didn't go enough and need to go forward. You can state with 100% certainty that the other side is 100% useless to you.

If you skipped to the Hs right away then you didn't go far enough. Now you know everything prior to H is useless so you can toss it. If you kept repeating this pattern where you picked a guessing point and then compared that guessed value to the value you want within a certain accuracy thresh hold you can approximate an answer very quickly.

If you walked through each page then it would be O

or linear time. This is usually labeled as exhaustive searching. If you added 1,000 pages to the book then it would take 1,000 more iterations. O(log n) is faster than O

. If we used the smarter way of guessing adding 1,000 pages only requires 1 more guess. Big difference right?

What do you think the answer to this one is?

Code:

def recurPowerNew(a, b):
   print a, b
   if b == 0:
      return 1
   elif b%2 == 0:
      return recurPowerNew(a*a, b/2)
   else:
      return a * recurPowerNew(a, b-1)

Is anyone else taking MITx 6.00x? Watched a lecture on this material today.

Yes, I just finished this week's problem set last night.

xSnowmaNx · Oct 27, 2012

Cod said:
When dealing with growth functions and time complexity, is finding the exact efficiency for a specific value as simple as plugging in the value and doing the math? Or do you only have to plug in the value for the order of the growth function? Whichever is lower is most efficient?

For example, take the following two growth functions: 5n^2 + n and 10n + 7. If I wanted to determine the efficiency at n = 50, do I just plug in the value old school algebra style? The first value would be 12550 and the second value would be 507, so the second growth function with order O would be more efficient? Or, like I stated earlier, can I just plug in the value for the part of the function that determines the order to find the most efficient?

This algorithm analysis stuff is torturing me. Any nudging in the correct direction would be greatly appreciated.

Yes, it is as simple as that.

The real question is: How useful is it in reality?

That depends on how accurate your function is with respect to a real algorithm running on a real machine. Even if you could totally account for all of the time (in abstract units) in an algorithm doesn't necessarily mean it will be completely accurate on a real machine with regards to its actual performance.

What exactly are you trying to do? Just trying to reason about the performance of two algorithms with certain input sizes? Plug and chug man. But you have to realize that your answer is only as good as how accurate your model is. The example growth functions you've given are a highly simplified view of algorithms that run on real machines.

You're assuming many things with the numbers in those approximations, such as all memory accesses taking the same amount of time, which is definitely not true on real machines. There's also the issue of performance effects due to factors outside of the algorithm's control, like OS scheduling or resource contention from other programs.

But from a purely academic point of view, you just plug in input size and you get your running time in your abstract units.

Clarification:

When you have an actual input size you wish to use to compare efficiency, you MUST use the entire function! This is where computer science and reality often differ! One of the most commonplace complaints from real programmers is that computer scientists ignore constants. The constant is INCREDIBLY IMPORTANT in many applications. Assembly programming is all about reducing the size of the constant in that time complexity function, but it's an incredibly effective form of optimization in certain cases.

Take for example, the function F

= n + 50000 vs G

= 5n^2. If you decide to only compute the values for each function taking the most dominant parts of the function F'

= n and G'

= n^2 and compute the cost for n = 50, you get F'(50) = 50 and G'(50) = 2500. You would then conclude that F

is faster for value n = 50. However, this is not true at all. When you put in the value n = 50 for the actual cost function F

and G

you get 50050 and 12500, respectively. G

is, in fact, 4 times faster than F

for n = 50.

Asymptotic analysis is incredibly useful, but you need to know when it is acceptable to ignore the constant (or other smaller parts of the function that aren't constant). The point of this kind of analysis is to understand the algorithm behavior for ever increasing problem sizes. If your actual problems are a fixed size or quite small, ignoring the constant can lead to a wildly incorrect analysis, as I've just shown.

jimmyb · Oct 27, 2012

Cod said:
When dealing with growth functions and time complexity, is finding the exact efficiency for a specific value as simple as plugging in the value and doing the math? Or do you only have to plug in the value for the order of the growth function? Whichever is lower is most efficient?
...

I'm going to dissent and say no you can't do that.

If you're talking about runtime efficiency, the big O complexity will not give you an answer because it doesn't give you the runtime. It's also an upper bound an so isn't necessarily typical.

Cod · Oct 27, 2012

ShoeLace, I'm just now looking at the phone book problem. I'll post something soon.

xSnowmaNx, thanks for the clarification. Its amusing you mentioned the importance of constants. I'm a math major taking my second CS class and this stuff is crazy because the big-O notation I learned a few years back in calculus works a little smoother.

jimmyb, to determine runtime efficiency, wouldn't we have to have more detail? Things such as CPU speed, memory, etc. would come into play. For determining algorithm efficiency, if I understand correctly, all I need is a sample input value with the growth function.

mikeblas · Oct 28, 2012

xSnowmaNx said:
Take for example, the function F = n + 50000 vs G = 5n^2. If you decide to only compute the values for each function taking the most dominant parts of the function F' = n and G' = n^2 and compute the cost for n = 50, you get F'(50) = 50 and G'(50) = 2500. You would then conclude that F is faster for value n = 50. However, this is not true at all. When you put in the value n = 50 for the actual cost function F and G you get 50050 and 12500, respectively. G is, in fact, 4 times faster than F for n = 50.

Asymptotic analysis is incredibly useful, but you need to know when it is acceptable to ignore the constant (or other smaller parts of the function that aren't constant). The point of this kind of analysis is to understand the algorithm behavior for ever increasing problem sizes. If your actual problems are a fixed size or quite small, ignoring the constant can lead to a wildly incorrect analysis, as I've just shown.

This is true, but even worse:the constant exists at each term. Most people writing O

analysis just consider the number of iterations or operations that will happen. That gives a coarse estimate and is still very useful to see how the algorithm scales. The issue ends up being that some operations are cheap (hitting memory, for example) and some operations are expensive (going to disk). Some operations seem cheap (getting the next element in an array, rg[n++] for example) some operations are actually expensive (doing division, branches, and tests to find a subsequent element in an array -- division and multiplication are costly, random access to memory is expensive because of the cache miss, and ...).

Point is, two algorithms which are both linear (at O

) might have a clear winner if the constant cost of an individual operation is expense. One might be O

= 3000000 * n, the other might be O

= 10 * n.

aL Mac · Oct 30, 2012

All of these points are good to make, but just saying.. that if you are taking a class and they ask you which is better for a specific N you can just plug in and find the answer with simple algebra. You can compare the performance by graphing the functions or by finding the cross over point(s). If you are a math major, this shouldn't be torture, it should be trivial!

cyclone3d · Nov 1, 2012

aL Mac said:
All of these points are good to make, but just saying.. that if you are taking a class and they ask you which is better for a specific N you can just plug in and find the answer with simple algebra. You can compare the performance by graphing the functions or by finding the cross over point(s). If you are a math major, this shouldn't be torture, it should be trivial!

This isn't always going to be the case though.

Plugging numbers into different algorithms, even if one of them should supposedly be faster, it may not be.

It really depends on the exact code, what language the code is in, how well the compiler can optimize the code, and what architechture that specific piece of code is run on.

The only real way to know which one will actually be better is to compile and test. Even then, it will be dependant on how well the actual algorithm is put into code.

mpoon2489 · Nov 1, 2012

I think a key point to keep in mind when you're evaluating algorithms in O notation, is that O notation doesn't care about constant values. O(10n + 7) = O(5n +2). O notation is specifically for the scenario of "which algorithm is faster when n -> infinity?" If you care about actual performance, you should benchmark implementations of algorithms. There is also some literature out there that has the constant values of algorithsm, for example quicksort is generally considered to have a faster constant than the other O(n log n) sorting algorithms.

aL Mac · Nov 1, 2012

cyclone3d said:
This isn't always going to be the case though.

Plugging numbers into different algorithms, even if one of them should supposedly be faster, it may not be.

It really depends on the exact code, what language the code is in, how well the compiler can optimize the code, and what architechture that specific piece of code is run on.

The only real way to know which one will actually be better is to compile and test. Even then, it will be dependant on how well the actual algorithm is put into code.

You're missing the point completely. He is taking a class. Pretend on an exam it says:

for n = 1000 which is the best algorithm given the following complexities:

Algorithm A : n^2 + 100n + 50
Algorithm B: 1,000,000 * n + 1,000

I suggest in this case you stop making thing over-complicated, plug in the numbers and get the answer instead of missing the question.

jimmyb · Nov 1, 2012

An exam likely wouldn't phrase the question open-endedly, and for advanced analysis the correct answer would actually require raising these caveats.

Cod · Nov 6, 2012

Sorry I haven't responded in awhile, work got a little crazy. Anyways, aL_Mac is correct. For the purpose of what I was trying to figure out, simply plugging in the values was the correct way to find out which of the original growth functions was more efficient. I spoke with my instructor at length after class the other night discussing all this.

I'm sure the caveats brought up throughout this discussion are relevant, but for a guy learning algorithms for the first time (me), simple is where I need to start.

Determining Algorithm Efficiency for Specific Values

Cod

Gawd

BDV

[H]ard|Gawd

ShoeLace

Gawd

Cod

Gawd

Raekwon

2[H]4U

ShoeLace

Gawd

xSnowmaNx

Gawd

jimmyb

2[H]4U

Cod

Gawd

mikeblas

[H]ard|DCer of the Month - May 2006

aL Mac

Gawd

cyclone3d

Fully [H]

mpoon2489

Weaksauce

aL Mac

Gawd

jimmyb

2[H]4U

Cod

Gawd