Reimplementing a Solaris command in Python gained 17x performance improvement from C

dgz

Here's something you don't read every day. It reminded me of that YT: Jim Keller: Moore’s Law is Not Dead video where, among other interesting things, he also shared his views about abstractions and how they're not inherently bad.

This is especially hilarious to me since just yesterday my boss was ranting about his recent experience with Python and how it's literally the worst language he has ever seen. "Everything about it is an anti-pattern."

The C code was largely untouched since 1988 and was around 800 lines long. It was written in an era when the number of users was fairly small and they probably existed in the local file /etc/passwd or a smallish NIS server.

It turns out that the algorithm to implement listusers is basically some simple set manipulation. With no arguments, listusers just dumps a sorted list of all users in the nameservice; with the -l and -g options it filters down the list of users and groups.

My rewrite of listusers in Python 3 turned out to be roughly a 10th of the number of lines of code, since Python includes native set manipulation and the C code had implemented the set operations via linked lists.

But Python would be slower, right? Turns out it isn't, and in fact for some of my datasets (that had over 100,000 users in them) it was 17 times faster. I also made sure that the Python version doesn't pull the entire nameservice into memory when it knows it is going to be filtering it based on the -l and -g options.
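To make the "simple set manipulation" point concrete, here is a minimal sketch of the idea in Python. It is not the code from the Oracle post: the function name `list_users`, its parameters, and the assumption that -l and -g results are combined as a union are all illustrative guesses, and it works on plain in-memory data rather than a real nameservice.

```python
def list_users(all_users, group_members, logins=None, groups=None):
    """Return a sorted list of login names, optionally filtered.

    Hypothetical helper, not the real listusers implementation.
    all_users     -- iterable of every login name
    group_members -- dict mapping group name -> iterable of member logins
    logins        -- names given with -l (None if the option was not used)
    groups        -- names given with -g (None if the option was not used)
    """
    if logins is None and groups is None:
        # No options: every user, de-duplicated and sorted.
        return sorted(set(all_users))

    selected = set()
    if logins is not None:
        # Keep only the requested logins that actually exist.
        selected |= set(logins) & set(all_users)
    if groups is not None:
        # Add every member of each requested group (assumed union semantics).
        for name in groups:
            selected |= set(group_members.get(name, ()))
    return sorted(selected)
```

Unlike the rewrite described in the post, this sketch makes no attempt to avoid loading every user when filtering; it only shows how little code the set operations themselves need.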

Source: https://blogs.oracle.com/solaris/re...hon-gained-17x-performance-improvement-from-c
 
Interesting concept

most dull and nearly useless command to demonstrate it
 
If you rewrote it in C you could probably make it as fast as the Python code, though it would probably still be more lines of code (maybe not 800).

There are probably C libraries (at least some of which Python no doubt uses) that have been written to do many of the operations which were hardcoded into this program way back when.

Python is very easy and simple for such script-like tasks.
 
Was thinking the same thing... Largely untouched 31-year-old code vs. refreshed new code... Hmm, is the Python rework going to work 'largely' untouched for 31 years? If anything, I see it as a statement of great things for the C language.
Otherwise, I don't think these things are mutually exclusive... the languages just are what they are.
 
Good, but if you're not going to touch the code for another 17 years, I think it would be a good idea to re-implement the whole thing in Rust.

There are two different aspects at work here. One is the time-complexity of the algorithms in use (the O-notation stuff). The new implementation, in this case, probably uses more efficient algorithms, i.e. which have a lower complexity. That does NOT mean Python as a language is faster than C - only that Python allowed the programmer an easy way to switch to more efficient algorithms.

Which leads us to the second aspect. Some programming languages are inherently faster than others even when all implementations use algorithms of the same complexity. If, for example, you re-implemented the Python program in something like Rust (a programming language that aims to be fast while eliminating the drawbacks of C), you would find it to be even faster.
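A rough sketch of the first point, assuming the old code checked for duplicates by walking a list each time (the functions below are made up for illustration and are not taken from either implementation). The list-based union does a linear scan per item, so its work grows roughly quadratically, while the set-based union relies on hashing.

```python
def union_with_list(a, b):
    """Union by scanning a plain list, much as a linked list would require.

    Returns the merged list and the number of equality comparisons made.
    """
    result = []
    comparisons = 0
    for item in list(a) + list(b):
        found = False
        for existing in result:  # linear scan for every incoming item
            comparisons += 1
            if existing == item:
                found = True
                break
        if not found:
            result.append(item)
    return result, comparisons


def union_with_set(a, b):
    """Union using Python's built-in hash-based set."""
    return set(a) | set(b)
```

Both functions produce the same members; the difference is only in how much work it takes to get there, which is the algorithmic gain rather than a language-speed gain.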
 

K
 
probably not everyone's experience, but the speed of languages is pretty academic. in my career the direction has always been-

"it's still basically instant right? we really don't give a **** how fast it runs as long as the feature is getting tested by friday."

-or-

"you could write it in <insert new current-year hotness>, but everyone here will hate you and we will call YOU whenever it breaks"

plus no company is going to let you go back and rewrite code that already works on their time, no matter how many lines it is unless there is some $$$ tied to it.
 
Was going to say. In my limited experience with Python it is just like C, just with a bunch of built-in libraries. It's more akin to C++ in that way. I'm sure any modern programmer could take the original C code and rewrite it in C and achieve the same kind of performance and resource gains as this Python code.
 
The only annoying thing about Python was that going from v1/2 to v3 they changed some syntax for whatever reason. Spacing is important too :)

It's good, but with different versions native to different Linux releases it kinda sucks, and for scripts you need a couple of versions...
 
A Python script's threads will only ever run Python code on a single CPU core at a time. Python's concept of multi-threading is constrained by its implementation (seriously, look up the Python GIL). Python isn't something I'd consider a go-to for performance-critical work.

C, on the other hand: a good implementation of pthreads is glorious when used right.

This speedup is clearly attributable to taking a completely different approach to tackling a similar problem. For the most part, the choice of language is largely irrelevant. Even the original author stated their intent with using Python was maintainability (makes sense, given that the number of Python programmers far outweighs the number of C programmers these days). The large speedup was an added bonus (although I think they are downplaying their vastly superior algorithm/approach).
 