Syntax and efficiency question

TheDude05 · Aug 18, 2008

Quick question. Is it anymore efficient, in terms of execution speed, to write a line of code like

Code:

answer = sanitizeInput(raw_input(wrapText(questionStr), lineWidth)).upper()

versus breaking it up into multiple lines to something like this maybe

Code:

answer = raw_input(wrapText(questionStr), lineWidth)
answer = sanitizeInput(answer)
answer = answer.upper()

Obviously the latter would be easier to read but is it any slower? This is also just a small example because I don't have any other unreadable chunks of code to compare with but I would think the idea should still be the same. I'm curious for both compiled and scripted languages. Thanks.

Biskquik · Aug 18, 2008

I really couldn't see the benefit to the top example to be faster. Every function within the parenthesis will still need to be evaluated. It might have fewer calls to store to the answer address but the overhead would be insignificant at worst.

Though it still depends on how the compiler/interpreter reads and optimizes the code above, in which case the two examples could translate to something very similar.

If you're a Software Engineer, readability takes precedence over minimal performance boosts because revising nested code can be a pain.

Monkey God · Aug 18, 2008

readability > (miniscule imaginary performance increases * 11tybillion)

ambientZ · Aug 18, 2008

If you can't read what its doing, you aren't going to be able to figure out that you can optimize the algorithm to reduce its complexity for the real performance gains.

mikeblas · Aug 18, 2008

Depends on the language, and the semantics of the type being returned from those functions.

wuggle · Aug 18, 2008

1) Write readable code that gets the job done in a reasonably fast manner
2) If its too slow, find the bottle neck and refactor/optimize it
3) Rinse and repeat as many times as necessary

The basic idea here is that you won't know how slow/fast something will be until its running in front of you. Get it done first, then start picking apart the slow parts. That doesn't excuse badly written / slow code, but don't optimize the hell out of it until its necessary.

mikeblas · Aug 18, 2008

wuggle said:
The basic idea here is that you won't know how slow/fast something will be until its running in front of you.

It's hard for me to agree with this.

Choosing the correct algorithm needs to be done first. The code you write implements an algorithm. Tweaks that you make to the code generally aren't going to change the cost of the implementation with the same order of magnitude as choosing the correct algorithm.

This thread started out asking what language was used to write Windows. It ended up containing some code to compare C++ and assembler which implemented bubble sort. As a part of the exercise, I implemented the same code with quick sort.

The C++ code for bubble sort was slowest. The provided assembler, and the tweaked assembler are only 10 or 20% different. The change in the algorithm, even though it was written with C++, shows an improvement that's about ten times faster.

It doesn't require having written the code to know that quick sort is faster than bubble sort.

jimmyb · Aug 18, 2008

mikeblas said:
This thread started out asking what language was used to write Windows. It ended up containing some code to compare C++ and assembler which implemented bubble sort. As a part of the exercise, I implemented the same code with quick sort.

Pretty interesting read here.

You used the term "double-pumping memory" in that thread, what does this mean?

mikeblas · Aug 19, 2008

The processor has registers, where it does work. They're internal to the processor. The memory is external, obviously. Memory is very expensive to access. Even on fast processors, you might spend 300 or 400 clock cycles to get something out of memory if it isn't in cache. If you've got multiple processors, reading and writing memory also involves doing cache coherency management, which can involve locking the bus and temporarily keeping other procs from doing any work.

Double-pumping memory means that someone has written machine code that gets a value from memory into a register, touches it, then puts it back into memory. If it needs it again, it goes back to memory and loads the value. It shouldn't do that; since it knows the value hasn't changed, it should just rely on the still-current copy in memory.

An example from this post is in this code fragment:

Code:

$L278:
	mov	ecx, DWORD PTR _i$276[ebp]
	add	ecx, 1
	mov	DWORD PTR _i$276[ebp], ecx
$L277:
	mov	edx, DWORD PTR _count$[ebp]
	sub	edx, 1
	cmp	DWORD PTR _i$276[ebp], edx
	jge	$L279

We get something from a temporary into ecx. We add one, then write it back to that temporary. ECX and the temporary are exactly equal, right now.

We get _count out of memory, then subtract one. We compare that to the temporary value. This code would be significantly faster if it didn't store ECX into the temporary, and direcltly compared to ECX. Or, it would be at least a little faster if it compared against ECX after storing, knowing ECX hasn't changed since.

Since x64 and IA32 processors have so few registers, it's important to keep the right data in the registers --the most useful data for optimization. This is compounded by the fact that accessing memory is so expensive. So writing a compiler requires register coloring that remembers what's in each register, and what's changed or not. Since I know the VC++ compiler intimately, I know that it never double-pumps memory as above when asked to produce optimized code.

Arainach · Aug 19, 2008

TheDude05 said:
Quick question. Is it anymore efficient, in terms of execution speed, to write a line of code like

Code:

answer = sanitizeInput(raw_input(wrapText(questionStr), lineWidth)).upper()

versus breaking it up into multiple lines to something like this maybe

Code:

answer = raw_input(wrapText(questionStr), lineWidth) answer = sanitizeInput(answer) answer = answer.upper()

Obviously the latter would be easier to read but is it any slower? This is also just a small example because I don't have any other unreadable chunks of code to compare with but I would think the idea should still be the same. I'm curious for both compiled and scripted languages. Thanks.

If it's a compiled language, any compiler should optimize the two into the same assembler and there should be no difference. On an interpreted language, it should be indistinguishable.

mikeblas · Aug 19, 2008

Arainach said:
If it's a compiled language, any compiler should optimize the two into the same assembler and there should be no difference.

As I pointed out earlier, this may or may not be true depending on the language and the declarations involved.

jimmyb · Aug 19, 2008

Arainach said:
If it's a compiled language, any compiler should optimize the two into the same assembler and there should be no difference. On an interpreted language, it should be indistinguishable.

As Mike said, without knowing more about the program and language, I don't think we can make any assumptions about what the code will compile to.

It's a contrived example, but suppose you have the following:

Code:

char *answer;  // global declaration

...

char *sanitizeInput ( char *a ) {
	return answer;
}

Then inlining the code actually produces different results. I would think that the compiler and linker would not always be able to figure out if this is the case, particularly when you're linking with sourceless libraries.

Oh and thanks for the reply Mike.

mikeblas · Aug 19, 2008

It's not inlining I'm worried about. While inlining does affect performance, it's not going to affect the semantics unless your compiler is buggy. The problem is the semantics between the two code snippets might be different, depending on which operators are defined and which types the classes are taking and returning. If they're accepting references and returning references, maybe they're isn't much difference; but that's not likely the case, since the first line example would return answer as a reference to a now modified questionStr object.

If they're accepting values and returning new values, the semantics involve copying and destroying temporary objects and that will have different ramifications between the two examples. Let's consider an example; I've written a class called CSomething which does nothing but print out something when these different operators are invoked.

Code:

class CSomething
{
public:
	CSomething( const CSomething &other )
	{
		printf( "CSomething copy constructor\n" );
	}

	CSomething& operator=(const CSomething &other )
	{
		printf( "CSomething assignment operator\n" );
		return *this;
	}

	CSomething& upper( )
	{
		return *this;
	}

	CSomething()
	{
		printf( "CSomething default constructor\n" );
	}

	~CSomething()
	{
		printf( "CSomething destructor\n" );
	}
};

CSomething wrapText( CSomething some )
{
	return some;
}

CSomething raw_input( CSomething some )
{
	return some;
}

CSomething sanitizeInput( CSomething some )
{
	return some;
}

int _tmain(int argc, _TCHAR* argv[])
{
	CSomething someInput;
	CSomething someAnswer;

	printf( "== compound statement ==\n" );

	someAnswer = sanitizeInput(raw_input(wrapText(someInput))).upper();


	printf( "== explicit temporaries ==\n" );

	someAnswer = raw_input(wrapText(someInput));
	someAnswer = sanitizeInput( someAnswer );
	someAnswer = someAnswer.upper();

	return 0;
}

When run, this code produces the following output:

Code:

CSomething default constructor
CSomething default constructor
== compound statement ==
CSomething copy constructor
CSomething copy constructor
CSomething destructor
CSomething copy constructor
CSomething destructor
CSomething copy constructor
CSomething destructor
CSomething assignment operator
CSomething destructor
== explicit temporaries ==
CSomething copy constructor
CSomething copy constructor
CSomething destructor
CSomething copy constructor
CSomething destructor
CSomething assignment operator
CSomething destructor
CSomething copy constructor
CSomething copy constructor
CSomething destructor
CSomething assignment operator
CSomething destructor
CSomething assignment operator
CSomething destructor
CSomething destructor

The code that assigns results (that is, the second example, using explicit temporaries) invokes the operator=() function three times. If this function is expensive, then the performance will be substantially different. I hope that makes my point clear, and in particular shows the differences between the two constructs to those who think there's no difference.

jimmyb · Aug 19, 2008

Interesting. While it's nice to assume that the copy and assignment constructors will have same effect, I suppose neither we nor the compiler can.

I misspoke earlier, when I said "inline", I meant as a compound statement. I took a course a couple years back which touched on some of these issues. The classic example they used was with pointer aliasing.

tim_m · Aug 19, 2008

if you're really worried (not that you should be) and your language allows it just put it on multiple lines:

Code:

answer =
    sanitizeInput(
        raw_input(
            wrapText(questionStr),
            lineWidth)
    ).upper()

TheDude05 · Aug 19, 2008

Thanks everyone for your replies and mike for demonstration. The question was more out of pure curiosity and trying to expand my own knowledge. The code is in Python if it makes any difference.

wuggle · Aug 20, 2008

mikeblas said:
It's hard for me to agree with this.

Choosing the correct algorithm needs to be done first. The code you write implements an algorithm. Tweaks that you make to the code generally aren't going to change the cost of the implementation with the same order of magnitude as choosing the correct algorithm.

This thread started out asking what language was used to write Windows. It ended up containing some code to compare C++ and assembler which implemented bubble sort. As a part of the exercise, I implemented the same code with quick sort.

The C++ code for bubble sort was slowest. The provided assembler, and the tweaked assembler are only 10 or 20% different. The change in the algorithm, even though it was written with C++, shows an improvement that's about ten times faster.

It doesn't require having written the code to know that quick sort is faster than bubble sort.

Yes. You're right, choosing the appropriate algorithm is more important than tweaking the algorithm for efficiency reasons. However, my point was slightly different:

My point is that using quick sort vs bubble sort may not even matter in the scope of the bigger picture. If you need to sort an array of 10 ints, what will be the performance difference between bubble sort vs quick sort vs optimized quick sort? nanoseconds? milliseconds? Now compare that against how much time will it take to properly implement and test each one of those algorithms.

I'm pulling these numbers out of thin air, but lets say it takes 5 minutes to implement bubble sort, 30 minutes to implement quick sort and 2 hours to implement optimized quick sort. Lets say (these numbers are also out of thin air) that bubble sort of 10 integers takes 10 ns, qsort takes 5ns and opt qsort takes 4 ns. A "run" of the application takes ~200 ms.

My point is this: 10 ns may be "good enough" for this particular application. It may not be. You don't know until you see it running in front of you.

Note: I am *NOT* justifying using a bad algorithm instead of a good algorithm to solve a problem. You should always use the appropriate solution for the problem you are solving. And you should be aware of all the implications of your decision, be it execution time or implementation time or readability or you-name-it.

I personally recommend to get something up and running as quickly as possible and get as much feedback as possible. For example, solution A would take 1 day to implement, B would take 15 days and C would take 30 days. C is going to be faster than B and B is going to be faster than A. Instead of going for solution "C", consider doing A first to get feedback (maybe A is good enough and you don't even need B or C).

*grabs popcorn*
think we can go a week without doing this mike?

mikeblas · Aug 20, 2008

wuggle said:
My point is this: 10 ns may be "good enough" for this particular application. It may not be. You don't know until you see it running in front of you.

Certainly, dvelopers have to budget how they spend their time. But what I'm disagreeing with is the assertion that you don't know what "good enough" is until you actually run the code. It's possible that, in a complicated system, the end-to-end cost of a feature is hard to judge. Why is it that you don't believe you can decide which algorithm to use based on requirements analysis?

wuggle said:
I personally recommend to get something up and running as quickly as possible and get as much feedback as possible. For example, solution A would take 1 day to implement, B would take 15 days and C would take 30 days. C is going to be faster than B and B is going to be faster than A. Instead of going for solution "C", consider doing A first to get feedback (maybe A is good enough and you don't even need B or C).

What you're advocating is betting, then, that you only ever need A. Without thinking it through, you'll end up implementing A, then tearing it down in favor of B or C. Why spend A+B or A+C when you can do a bit of analysis and get the right solution on the first try?

wuggle said:
*grabs popcorn*
think we can go a week without doing this mike?

Hunh? Without doing what?

wuggle · Aug 20, 2008

mikeblas said:
Certainly, dvelopers have to budget how they spend their time. But what I'm disagreeing with is the assertion that you don't know what "good enough" is until you actually run the code. It's possible that, in a complicated system, the end-to-end cost of a feature is hard to judge. Why is it that you don't believe you can decide which algorithm to use based on requirements analysis?

Can you tell me, without implementing, how long would it take to sort 1000 ints? Bubble is "slower" than quick sort, but can you quantify "slower"? I disagree with the point you seem to be implying: that it is possible to get the performance metrics of an algorithm before you have implemented it, especially in the case of complicated systems with many moving parts. Sure, you can guess relative performance of various algorithms, but I just don't understand how you can figure out the total performance without seeing it in action in some shape or form.

If end-to-end cost is expensive then thats a separate issue. You'd be better off trying to find a way to decrease that cost, such as creating a simple model of the system. If high end-to-end cost is unavoidable, that needs to be factored in to decision making process.

Most of the requirements that I see are business requirements that are a world away from the implementation details (including platform/language/algorithm). eg, design an application that solves this problem. Business doesn't care what algorithm you are using. They care about a) time to market, b) meeting some "minimal performance" figure. Getting back to your question, I don't believe that, given a set of reasonable algorithms/solutions, it is possible to predict which will meet that "minimal performance" and which will not.

Example:
Problem: Sort 1000 random numbers.
Performance constraint: 10 seconds

Given that requirement, what is the quickest (in terms of implementation time) solution? Can bubble sort do it in 10 seconds? Only one way to find out... Implement and see if it can do it in less than 10 seconds. If so, you're done and business can start selling. If not, business can start selling it anyway and you can go back and implement optimized quick sort. Everyone wins. Even the client (who is most impacted by the slower performance) as their immediate problem (sorting the numbers) is solved.

mikeblas said:
What you're advocating is betting, then, that you only ever need A. Without thinking it through, you'll end up implementing A, then tearing it down in favor of B or C. Why spend A+B or A+C when you can do a bit of analysis and get the right solution on the first try?

It depends on what the relative cost of analysis vs "A" vs "B" vs "C" is. If analysis is cheap, then you should analyze first. If "A" is about as cheap as analysis, I'd with A as IMHO a working product is better than some analysis. It depends on what the situation is.

mikeblas said:
Hunh? Without doing what?

These long winded discussions in random threads

tim_m · Aug 20, 2008

linear search ftw

mikeblas · Aug 20, 2008

wuggle said:
Can you tell me, without implementing, how long would it take to sort 1000 ints? [...] Bubble is "slower" than quick sort, but can you quantify "slower"?

Why do you need to know how long, exactly, it'll take? Can I guarantee that, on certain hardware, it'll take less than the time required by the user? Sure. Do I know that, at 1000 integers, I can expect quicksort over simple integer keys to be about 300 times faster? Sure do.

These relative comparisons are all that's really needed. If you want precise times, then yeah--you've got to go measure it. And that will take lots of time to do right, as well.

wuggle said:
Getting back to your question, I don't believe that, given a set of reasonable algorithms/solutions, it is possible to predict which will meet that "minimal performance" and which will not.

Perfectly predict it with tight absolute constraints? Maybe not. Predict that a better algorithm is going to be faster? Of course you can.

wuggle · Aug 20, 2008

mikeblas said:
Why do you need to know how long, exactly, it'll take? Can I guarantee that, on certain hardware, it'll take less than the time required by the user? Sure. Do I know that, at 1000 integers, I can expect quicksort over simple integer keys to be about 300 times faster? Sure do.

These relative comparisons are all that's really needed. If you want precise times, then yeah--you've got to go measure it. And that will take lots of time to do right, as well.

quicksort is 300x faster than bubble sort for n=1000? More like ~100x. n log n is actually n log(base 2) n

But thats besides my point... my point is that you don't know how much time it will take to do all that number crunching. It could be 10s vs 100ms or 10 ms vs 100 ns. *THATS* my point. Yes, quick sort will be faster but for all intents and purposes, if bubble sort takes 10 ms that can be "good enough" and you can move on to the next deliverable. And if it winds up being 10 s, you know that you can always improve the algorithm to use quick sort and it will be 100ms. My whole point is that you don't know the "real time" number until you see it in front of you.

Heres another example: You have a problem with two solutions A and B. A is 10 * nlogn, B is 2*nlogn. B will take 10x longer than A to implement. What do you do?

mikeblas said:
Perfectly predict it with tight absolute constraints? Maybe not. Predict that a better algorithm is going to be faster? Of course you can.

Yes. I agree, quick sort IS faster than bubble sort. Its been tried and proven. But in the real world, users don't care about which algorithm is being used. They just want a solution to their problem. That's all.

Xeno8 · Aug 21, 2008

"Premature optimization is the root of all evil" - Donald E. Knuth

devman · Aug 21, 2008

wuggle said:
Yes. I agree, quick sort IS faster than bubble sort. Its been tried and proven. But in the real world, users don't care about which algorithm is being used. They just want a solution to their problem. That's all.

I'd like to know this where this world is in which end users do not care about performance, sounds like nice clients to me!

With small enough N, differences between algorithms is minimal no argument, but usually when your talking about algorithm efficiency your talking about for sufficiently large N.

mikeblas · Aug 21, 2008

wuggle said:
is log2

Yeah, I realized that after I had posted, but got busy and didn't fix it ... Sorry.

wuggle said:
Heres another example: You have a problem with two solutions A and B. A is 10 * nlogn, B is 2*nlogn. B will take 10x longer than A to implement. What do you do?

Figure out what n is right now, and what it's expected to be in a few months, and ask what it's presumed to be through the length of the software's life.

I need to get it right on the first try. If I don't have 10 time units to spend on a better solution, I certainly don't have 11 time units to spend on the solution, either.

wuggle · Aug 22, 2008

devman said:
I'd like to know this where this world is in which end users do not care about performance, sounds like nice clients to me!

They do care. But time to market > performance atleast for the short term. If you think about it, thats the definition of a product that is in beta testing. People accept the fact that the product is not completely stable and they are okay with that as long as the new features are worth it. Now if something is in permanent-beta-testing stage, thats a different story... Just think about this, how many times have you tried a new piece of software that was absolutely perfect and flawless from the very beginning? Hell, millions of people are using IE at this very moment...

mikeblas said:
Figure out what n is right now, and what it's expected to be in a few months, and ask what it's presumed to be through the length of the software's life.

I wish. Accurately predicting the future is about as likely to happen as requirements not changing.

mikeblas said:
I need to get it right on the first try. If I don't have 10 time units to spend on a better solution, I certainly don't have 11 time units to spend on the solution, either.

This is where I disagree. I'd do the quick solution and get it out the door. For all you know, that 5x slowness may not even matter in terms of the big picture / whole product. And if it does, you can go back and do it properly. The difference is that now you know for whether or not the quick solution is "good enough" and you can accurately predict the performance improvement (in terms of real time) of switching over to the new algorithm.

mikeblas · Aug 22, 2008

wuggle said:
I wish. Accurately predicting the future is about as likely to happen as requirements not changing.

Perfect! You know that n[now] is not equal to n[later]. So, is n[now] > n[later], or is n[now] < n[later]?

wuggle said:
This is where I disagree. I'd do the quick solution and get it out the door. For all you know, that 5x slowness may not even matter in terms of the big picture / whole product.

For all you know, I'm senior enough to be in control of the whole product to see what the big picture is, have convincing estimates of the requirements as we go towards the future, and also know that we're more interested in shipping something fast now than rewriting it later.

And I'm not being flippant (well, not really): the point is that the more experience you have, the better feel you have for knowing which solution is better. If n = customer count, or n = hits per day, or n = something else tied to making money, it's really going to go down?

Certainly, there are times when getting something working is more important than getting it working right. But rare in my long career have been the times where robust software was less desirable than fast software, or even less desirable than demoware.

Syntax and efficiency question

Limp Gawd

Limp Gawd

Mangina Full of Sand

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

Weaksauce

[H]ard|DCer of the Month - May 2006

2[H]4U

[H]ard|DCer of the Month - May 2006

2[H]4U

[H]ard|DCer of the Month - May 2006

2[H]4U

[H]ard|DCer of the Month - May 2006

2[H]4U

i'm so nice

Limp Gawd

Weaksauce

[H]ard|DCer of the Month - May 2006

Weaksauce

i'm so nice

[H]ard|DCer of the Month - May 2006

Weaksauce

Limp Gawd

2[H]4U

[H]ard|DCer of the Month - May 2006

Weaksauce

[H]ard|DCer of the Month - May 2006