Python vs PHP

cnick79 · Mar 10, 2013

When it comes to programming its all about using the right tool for the job. I dont know if theres one language that rules them all, they each serve their purpose. If i were looking to do multithreading on a website, i would have a backend program written in something faster than php or python.

mikeblas · Mar 10, 2013

tedych said:
- Most of the sorting is (should be) bound to the RDBMS - you don't sort a resultset (or any set) bunch of times per request/second.

It depends. If the sort is required for the correct processing of the desired query, then it belongs on the database server. If not, the site-wide architecture probably benefits from moving the work off the database server because the database server can only scale up and can't scale out. Having cheap worker servers do the sorting before presentation means that any number of those worker servers can be added to scale the sorting problem and reduce load on the database server. Placing the work on the database server means that the database server -- probably the most expensive machine in the herd -- has to do the work.

As you indicate, caching helps a lot. But once you've cached something, you no longer have the DBMS to sort it or filter it so the filtering or sorting of those results ends up happening the PHP code.

tedych said:
If your app needs threading, go and use another language, it's almost that simple. But PHP driving more than 70% of our web today (not sure about the numbers) is still indicative.

Popularity isn't indicative of quality or even propriety.

cnick79 said:
If i were looking to do multithreading on a website, i would have a backend program written in something faster than php or python.

And that might be a good approach, but the tradeoff you make probably ends up adding complexity -- the language you choose will almost certainly be more complicated than PHP. PHP isn't necessarily slow -- while its interpreted, several compilers and trnaslators exist for it. These tools make it easire to apply than changing languages.

Fort_Major · Mar 10, 2013

tedych said:
PHP is normally not used for heavy server side lifting. Most of its time is spent in sql queries, and other external stuff, building response, etc. Server side is meant to service multiple requests at a time. This scales well because each request is normally serviced by a different thread or process (CGI). Unless you have only one or two users of your web app, requiring all the server power at a given time .
PHP and threading?! - you'd ask for too much from all those poor php 'experts' out there . If a great need for this arises in the future, no doubt they'll implement it somehow.

If 80% of your tiime is spend in the SQL Queries than you may need to optimize your database, write better queries and etc.

Fort_Major · Mar 10, 2013

tedych said:
If your app needs threading, go and use another language, it's almost that simple. But PHP driving more than 70% of our web today (not sure about the numbers) is still indicative.

This is quite-possibly a classic example of made-up statistics.

Fort_Major · Mar 10, 2013

cnick79 said:
When it comes to programming its all about using the right tool for the job. I dont know if theres one language that rules them all, they each serve their purpose. If i were looking to do multithreading on a website, i would have a backend program written in something faster than php or python.

That's the beauty about some languages. They are making doing this quite transparent so you don't have to spend 2 million dollars rewriting your application in a completely different language.

tedych · Mar 10, 2013

Once I've cached something, I no longer need to sort it again (in PHP if we go that way). I sort it only once in a while (the timeout of the cache). That was the point when I involved caching in the first place.
For the majority of web apps/sites on the web, it seems PHP is up to the job. Big web sites rarely use only one technology anyway. I for once needed easy PDF routines and used ColdFusion for that on the same site that was mainly using php.

As to popularity - I agree. But even the worst language had to have something to support such high popularity for almost two decades. Yes, PHP is slowly advancing, but the most important is - it just delivers for what it was made. Popularity brings even more benefits, I will not dive into this now.

As to cnick79's approach and complexity - yes, the language he chooses would be more complicated, that was the point - to support at least threading. He was talking about the back-end that will support threading, and it can be plugged into the PHP script, called from it to do the threaded task. And not the entire backend to be implemented in the other language (why not, it all depends on the situation).

"If 80% of your tiime is spend in the SQL Queries than you may need to optimize your database, write better queries and etc. "
- Not necessarily the case. Bunch of scripts' job is to fetch and present DB data, reports etc. A query may run for 100msec, the script itself could just move the data to the html template (~1-5 ms)... You cannot tell 80% is good or bad - that's made-up statistics. I think you are missing the point.

"This is quite-possibly a classic example of made-up statistics. "
- I don't know the exact numbers, in any way I read somewhere it was more than 50%, but 60 or 70, I cannot remember.

toast0 · Mar 12, 2013

I've done a lot of PHP, and have been doing Erlang for the last year or so... I don't think bolting threading onto PHP would be a big win.... that said, I found this, and am frightened.

Most of my PHP has been fetch data from here, here, and here, and smoosh it into a webpage (or, for webservices, a serialized array that someone else will smoosh). multi_curl lets you do multiple http requests more or less simultaneously, if you can hide your DB calls behind a webservice interface (I was usually able to do this), you can get some of them simultaneously too, otherwise I guess you'd have to use threading. In my experience, I usually had enough simultaneous user requests that trying to do the CPU intensive stuff in parallel vs in sequence wouldn't really help (especially if it means going from easy to understand PHP to pthreads).

Re: sort in DB vs sort in PHP; at least with MySQL 4, there were a lot of sorting you could do in MySQL that would end up with a temporary table which kills performance, better to do a faster unsorted query and sort in PHP (webservers are usually cheaper to scale than db servers). Does an ORM know how to do that?

Fort_Major · Mar 13, 2013

toast0 said:
I've done a lot of PHP, and have been doing Erlang for the last year or so... I don't think bolting threading onto PHP would be a big win.... that said, I found this, and am frightened.

Yeah, that's quite horrible. Most of the "Multi-Threading" that happens per web requests actually are not "Threads". They are "Fibers" (20-100 fibers to a single thread, iirc) and transparent "Worker Pools". Meaning you say I want to multi-thread these 3 database requests and the server finds a sleeping thread to throw it on. If it can't it waits, if it does you get multi-threaded - win. But it's all transparent - you don't do any7thiong except say prefix the methodcall with async.

toast0 said:
Re: sort in DB vs sort in PHP; at least with MySQL 4, there were a lot of sorting you could do in MySQL that would end up with a temporary table which kills performance, better to do a faster unsorted query and sort in PHP (webservers are usually cheaper to scale than db servers). Does an ORM know how to do that?

This is probably more a problem with MySQL and maybe the DBA (for not knowing how to create indices (which provide optimization hints)). In MS SQL/Oracle you can create indices for tables that provide hints to how data will be queried / sorted.

E.g. 2 million rows belonging to parents. maybe 20,000 items per parent. If you know common querying patterns are by parent -> date, by parent -> entry , by date -> entry - then you create indices on those columns/combos.

However, as I stated earlier, for huge projects with massive data sets with numerous inter-relationships you can't just slap an ORMs at it as most ORMS will give you horrible queries.

Caching at the Web-App Level "really" in theory "should" almost NEVER happen. If your PHP Application can sort faster than your database then you have a much bigger problem.

nry · Mar 14, 2013

toast0 said:
I don't have a good reference for this, but please please please avoid mixing of html and php by going in and out of php; it's best to just be in PHP all the time, and use echo (with heredocs for large blocks of html).

Sorry but that's some of the worst advice I have seen in some time. If I picked up someones project which was written like that I think I would break down and cry

Anyway I am a PHP developer, been doing this for almost 4 years now and have to agree with the comments previously where a so called 'developer' creates a for loop and classes themselves as a professional. I have picked up projects by previous 'professionals' and finding code duplication so bad in places that the same code was repeated 5+ times in one project!
These projects I picked up on had no design pattern, was all procedural which made them impossible to maintain to an outside developer. So comments about no ORM or procedural style I find very hard to relate to. If a new developer comes into a project there is no standard design pattern for them to follow.

And if you want to see how not to write PHP, look at WordPress. In my opinion that should be deleted from the internet it's so badly written!

As for PHP being slower, I have built full MVC applications which are 10000s lines long and load in the blink of an eye. Which proper Opcode caching/PHP-FPM/Nginx/DB Caching it's actually fairly easy to achieve this if you know what your doing...

As for the Op's question:
If you want easy go with PHP
I'd ignore Phython, if you want to look in languages growing in demand look and Ruby (Ruby on Rails)

mikeblas · Mar 14, 2013

Fort_Major said:
Yeah, that's quite horrible. Most of the "Multi-Threading" that happens per web requests actually are not "Threads". They are "Fibers" (20-100 fibers to a single thread, iirc) and transparent "Worker Pools". Meaning you say I want to multi-thread these 3 database requests and the server finds a sleeping thread to throw it on. If it can't it waits, if it does you get multi-threaded - win. But it's all transparent - you don't do any7thiong except say prefix the methodcall with async.

There's on fixed relationship between fibers and threads -- maybe you're thinking of some specific library or implementation, but fibers are just a description of thread-like constructs that are cooperatively multitasking instead of preemptively multitasking. That is, fibers run until they end or explicitly yield while threads run until their scheduler decides they're to be suspended.

Indeed, you can have worker pools as a simple way to get concurrency, but PHP does not support even this construct -- nor does it support asynchronous functions.

The lack of these features gimps PHP for modern applications which require multiple database requests, multiple back-end requests, multiple second-site requests (either to local servers or cloud services), and so on. To those experienced with only small applications it's easy to understand that such things seem too complicated an unnecessary. For larger applications, such constructs are fundamental tools to build architectures that run well at high scale.

Fort_Major said:
Caching at the Web-App Level "really" in theory "should" almost NEVER happen. If your PHP Application can sort faster than your database then you have a much bigger problem.

The point isn't which sorts faster; the goal is to consider system-wide performance and scalability.

Say that you've got a database server with 16 gigs of memory. You do a request that takes a megabyte of data, sorts it, and returns the result set. If you do this sorting work on the database server, you can only ever have 16,000 such requests in flight at any moment. Really, you'll have far fewer since you've got to have the OS and the DBMS in that memory, too. And you don't want to be using that memory for sorting; you'd rather be using it for page cache so the DBMS is less likely to go to disk. If you need more memory to handle more requests, you've got to take the databaese server down, buy more expensive memory for it, and get it up again. Your performance sucks since you're polishing all the memory with your sort requests and doing physical I/O all the time because you've got little page cache left over. (Or, there's lots of latency as the database query scheduler waits queries which require that memory grant so that it can preserve the page cache memory.)

But what if you have 16 PHP servers, each with 4 gigs of memory? If they were doing the sorting, then you'd have room for 4000 * 16 == 64,000 requests to be sorted at a maximum. Sure, you'd have less -- the OS and PHP itself are there, your request script, and so on. The memory on these machines is not nearly as precious as the memory on the database server, though. The request that the database server sees doesn't include an ORDER BY; it just buffers and returns a page worth of data at a time and doesn't use memory for sorting so it isn't blocking requests or diminishing its page cache. The PHP script owns the sorting. If the PHP servers look like they're starting to need more memory, its easier to upgrade them or add more servers and since they're underneath their load balancer that work can be done without taking the site down.

Fort_Major · Mar 14, 2013

mikeblas said:
There's on fixed relationship between fibers and threads -- maybe you're thinking of some specific library or implementation, but fibers are just a description of thread-like constructs that are cooperatively multitasking instead of preemptively multitasking. That is, fibers run until they end or explicitly yield while threads run until their scheduler decides they're to be suspended.

What I'm talking about is technology that is fully implemented and supported in Microsoft Windows, IIS and .NET 4.5 and ASP.NET MVC 4. It exists, and I know of no other framework + web server + OS that has optimized the whole stack from top to bottom to not only better support this but make it be very transparent. (e.g. You have to make almost no changes to your code)

mikeblas said:
Indeed, you can have worker pools as a simple way to get concurrency, but PHP does not support even this construct -- nor does it support asynchronous functions.

The lack of these features gimps PHP for modern applications which require multiple database requests, multiple back-end requests, multiple second-site requests (either to local servers or cloud services), and so on. To those experienced with only small applications it's easy to understand that such things seem too complicated an unnecessary. For larger applications, such constructs are fundamental tools to build architectures that run well at high scale.

The point isn't which sorts faster; the goal is to consider system-wide performance and scalability.

Say that you've got a database server with 16 gigs of memory. You do a request that takes a megabyte of data, sorts it, and returns the result set. If you do this sorting work on the database server, you can only ever have 16,000 such requests in flight at any moment. Really, you'll have far fewer since you've got to have the OS and the DBMS in that memory, too. And you don't want to be using that memory for sorting; you'd rather be using it for page cache so the DBMS is less likely to go to disk. If you need more memory to handle more requests, you've got to take the databaese server down, buy more expensive memory for it, and get it up again. Your performance sucks since you're polishing all the memory with your sort requests and doing physical I/O all the time because you've got little page cache left over. (Or, there's lots of latency as the database query scheduler waits queries which require that memory grant so that it can preserve the page cache memory.)

But what if you have 16 PHP servers, each with 4 gigs of memory? If they were doing the sorting, then you'd have room for 4000 * 16 == 64,000 requests to be sorted at a maximum. Sure, you'd have less -- the OS and PHP itself are there, your request script, and so on. The memory on these machines is not nearly as precious as the memory on the database server, though. The request that the database server sees doesn't include an ORDER BY; it just buffers and returns a page worth of data at a time and doesn't use memory for sorting so it isn't blocking requests or diminishing its page cache. The PHP script owns the sorting. If the PHP servers look like they're starting to need more memory, its easier to upgrade them or add more servers and since they're underneath their load balancer that work can be done without taking the site down.

That's not how a proper robust database server generally works. A real database coupled with a DBA who understands how data is not only being queried but being sorted/filtered will create indices. Which the database server will then use to return sorted results anywhere from 500-10000 times FASTER than you could sort them NATIVELY.

Hence, why I stated that part of the problem is "the choice of DB" and the other part of the problem is the person who created the database.

ShoeLace · Mar 15, 2013

What I'm talking about is technology that is fully implemented and supported in Microsoft Windows, IIS and .NET 4.5 and ASP.NET MVC 4. It exists, and I know of no other framework + web server + OS that has optimized the whole stack from top to bottom to not only better support this but make it be very transparent. (e.g. You have to make almost no changes to your code)

Go is one language example that has pretty much transparent concurrency in the form of "goroutines". You just prepend a function call with "go foo()" rather than "foo()" to run it. If you need to communicate with it, then you send messages back and forth over a channel. Most of the popular libs like the http and major db drivers abstract that away too. You just use it as if it were normal sequential code and it works concurrently.

Javascript is also optimized from top to bottom to run concurrent code, but it uses the event loop pattern instead of message passing with coroutines.

Dogs · Mar 15, 2013

Fort_Major said:
That's not how a proper robust database server generally works. A real database coupled with a DBA who understands how data is not only being queried but being sorted/filtered will create indices. Which the database server will then use to return sorted results anywhere from 500-10000 times FASTER than you could sort them NATIVELY.

Hence, why I stated that part of the problem is "the choice of DB" and the other part of the problem is the person who created the database.

Indexes or no indexes, everything he said still applies. Since we're talking about scale here, let's also keep in mind the drawbacks of using indexes: Maybe spatial constraints make it difficult/impractical to index every possible field.

mikeblas · Mar 15, 2013

Fort_Major said:
That's not how a proper robust database server generally works. A real database coupled with a DBA who understands how data is not only being queried but being sorted/filtered will create indices. Which the database server will then use to return sorted results anywhere from 500-10000 times FASTER than you could sort them NATIVELY. [/QUOITE]Indexes are better used to satisfy selection predicates and not to implement sorts. Great if they can do both, but the prior has priority for obvious reasons.

Just work through a trivial example to see why: you've got the classic orders table and customer tables:

Code:

CREATE TABLE Customers ( CustomerID INT NOT NULL, CustomerName VARCHAR(80) NOT NULL );
CREATE TABLE Orders ( OrderID INT NOT NULL, CustomerID INT NOT NULL, OrderDate DATETIME NOT NULL, CostCents BIGINT NOT NULL );

Customers gets an index to implement its PK (on CustomerID), and Orders gets an index for its PK, too:

Code:

CREATE UNIQUE CLUSTERED INDEX Customers_PK ON Customers ( CustomerID );
CREATE UNIQUE CLUSTERED INDEX Orders_PK ON Orders ( OrderID );

Could do these with constraints, of course; and we could even declare an FK constraint between Orders.CustomerID and Customers.CustomerID.

An interesting query would be to see the orders placed since March 1st, listing the customers alphabetically and then by order date newest first:

Code:

  SELECT C.CustomerName, Orders.OrderID, Orders.OrderDate
    FROM Customers C
    JOIN Orders O
      ON O.CustomerID = C.CustomerID
   WHERE O.OrderDate >= '2013-03-01'
ORDER BY C.CustomerName ASC, Orders.OrderDate DESC;

Let's suppose, though, that we've been in business for 25 years and have 30 million rows in the Customers table. And we've been doing well for 25 years, too! We have more than 200 million rows in the Orders table. These aren't unrealistic numbers, of course; 25 years is about 9000 days, so that means about 25,000 orders per day. On teh internets, that's actually kind of slow!

Given the physical schema I've described, the most important thing to do is find orders by date. Otherwise, we're going to scan (and lock!) the whole table of 200 million rows to find the orders with dates we want. So, let's create an index over date:

Code:

CREATE INDEX Orders_DateAK ON Orders ( OrderDate );

That's a lot better; now the query processor can emit a plan that filters out the orders by date before trying to resolve the join between the big Customers table and the very large Orders table. That join means we're looking at rows in Orders and finding the matching customer ID. The most appropriate execution pattern is getting one row from the output of the filter finding dates, then seeking to the clustered index to look up the customer's name.

The ORDER BY mandates a sort after that operation completes. Sorts are start-and-stop operations; they require all the rows be present at their input before they start sorting, since any row could be in any position after the sort.

Point is, tough, that there's no index that helps this query do its sort. If we add an index over the CustomerName field on the Customers table, like this:

Code:

CREATE INDEX Customers_CustomerNameAK ON Customers ( CustomerName );

we have the index. But it isn't useful in the execution plan necessary to satisfy the join. If we force the query processor to use it, it will end up reading all 20 million ordered rows from that index to build a hash table on the CustomerID. Then, it can execute the filter over the date getting the orderID and orderDate, plus the CustomerID as a key to probe the hash. This is a horrible execution plan because it means reading the whole Customers table and keeping it in memory -- that's probably more memory than the sort would use, since the sort only sees the filtered rows. On the other hand, the sort sees the rows amplified by the join, so maybe there's more, but it won't likely match the cardinality of the whole table.

If the QP decides to use that table just to avoid the CPU charge for the sort, the I/O load and the memory load go soaring, then. We'd like the database server to do as little work as possible. Of course, necessary work is necessary work, but efficiency means we're only doing the needed work and not extraneous work. Using an index to order the result of a join results in lots more work.

If we have the server do the best possible plan, it ends up doing a sort. But it needs memory and CPU to implement that sort. We can have the database server not do any sort, and simply return the matching rows -- if we distribute the sorting work to other servers, we end up with a more scaleable solution.

There are other ways to skin the cat, of course. Caching could help, but since the results of this query change so frequently, it's not of great value. 25,000 orders per day means the result of this query change at least every 3.5 seconds. The order process would have to invalidate the cache each time it adds an order. Maybe we could make a process that builds a new pool of results by processing the list appropriately (inserting the new order into the existing ordered list, deleting records that we don't need). That's a lot harder than just moving the sort to the PHP servers, though it's quite possible portions of both solution are necessary to meet the requirements of the project.

Since we need the date filter from the Orders table in an index, and we need the CustomerName index on another table, we can infer there's no single index which will implement the sort without blowing out the fundamental query functionality. But that situation means another alternative is to forgo caching and have the DBMS itself maintain the query results using a materialized view. Rich DBMS servers implement MVs intrinsically, so that's not too hard. Even some of those that don't (hint: MySQL) end up offering triggers which can be used to cobble together something similar.

Really, though, relying on PHP to do the sorting is an appropriate way to reduce load on the database server without adding a lot of complexity. Additional complexity might be necessary for other reasons, but getting the sort off the database server can reduce locking, memory, and CPU enough to buy enough headroom that those bigger tools aren't necessary until much later.

Fort_Major · Mar 15, 2013

Wow, multiple people fighting to turn a horrible performing server side language into performing the task of a finely tuned database, which is optimized to query/sort and cache data, by having it cache and sort data.

I bow out to your obviously more extensive experience in this arena than me.

mikeblas · Mar 16, 2013

Fort_Major said:
I bow out to your obviously more extensive experience in this arena than me.

Indeed, learning to make software systems scale isn't something that comes easily. The more you read about it the better you'll get, though there's no better teacher than experience. I'm sure you'll let us know if you have further questions.

krogen · Mar 16, 2013

I don't think spending countless hours on optimizing queries is a great way to scale... It certainly helps but there's a point where sacrificing more hours won't lead to significant performance improvements. Adding replica systems and sharding are the true solutions to them problem. If MySQL is the limit then perhaps it's time to look at different DB solutions.

mikeblas · Mar 16, 2013

Optimizing queries only takes hours if you've got a crummy data model. If you've got a crummy data model, you've got a problem in the development process that you need to work to solve first. (And you've doubtlessly got lots of other data-centric problems that are boiling for attention.) Usually, you can attack it with some education. That's not too bad, as it's an investment in people that ends up paying off repeatedly.

Presumably, the work involved in tuning anything isn't a "sacrifice" -- it too should be an investment, but this time in the product or featureset of the product. If it were an investment, then tuning that work is just a part of delivering it counts as part of the same investment. If the work is randomizing and tuning it is truly a throw away, then it probably shouldn't have been done in the first place.

All that said, replicas can indeed help, and so can sharing; but they also have downsides and repeating costs.

matrisking · Mar 19, 2013

Coldblackice said:
I've found a number of different articles on the subject through Google, but curious what [H]'ers opinion of the two is.

Do you have a preference of one over the other? Why might you use one over the other?

From what I've gathered by general feel over the past year or so, it seems that PHP is much more prevalent and in demand, but that there's a lot of hype and praise for Python. This has led me to believe that Python is gaining some significant speed, although I haven't really seen it much in job listings (compared with PHP, which is basically in any/all listings).

What kinds of programs are you looking to write?

Parmenides · Mar 19, 2013

thanks mikeblas for that post. Would common table expressions or inline views be a good solution for the problem too?

mikeblas · Mar 19, 2013

Parmenides said:
thanks mikeblas for that post. Would common table expressions or inline views be a good solution for the problem too?

Er, which problem? The issue of joining a couple tables and getting high perofrmance?

Parmenides · Mar 19, 2013

mikeblas said:
Er, which problem? The issue of joining a couple tables and getting high perofrmance?

yeah:

Code:

;WITH PreOrdered AS (
  SELECT C.CustomerName, Orders.OrderID, Orders.OrderDate
    FROM Customers C
    JOIN Orders O
      ON O.CustomerID = C.CustomerID
   WHERE O.OrderDate >= '2013-03-01')

select CustomerName, OrderID, OrderDate
FROM PreOrdered 
ORDER BY CustomerName ASC, OrderDate DESC;

Any nasty holes with that approach? I guess it's kind of like making a temp table into memory, but not quite as onimous?

toast0 · Mar 20, 2013

nry said:
Sorry but that's some of the worst advice I have seen in some time. If I picked up someones project which was written like that I think I would break down and cry

Well... seeing all the ins and outs makes me cry... especially when there are several nested blocks.

A survey of the interwebs says most people don't prefer my way though, so as with all internet advice, take mine with a grain of salt.

nry said:
Anyway I am a PHP developer, been doing this for almost 4 years now and have to agree with the comments previously where a so called 'developer' creates a for loop and classes themselves as a professional. I have picked up projects by previous 'professionals' and finding code duplication so bad in places that the same code was repeated 5+ times in one project!
These projects I picked up on had no design pattern, was all procedural which made them impossible to maintain to an outside developer. So comments about no ORM or procedural style I find very hard to relate to. If a new developer comes into a project there is no standard design pattern for them to follow.

Procedural doesn't have to mean hard to maintain. If I've learned anything from Perl, it's that OO methods are just functions with an implied first argument. As an outside developer, if I'm digging into something and I see a function call, I can grep the source and find the function (assuming developers were sensible enough not to reuse function names), if I see a method call, I need to determine which class the object is first, since I grepping for the method definition is likely to show multiple results. That's not to say the objects are always bad, but at least in PHP, I find bad procedural code is much easier to follow (and usually, to fix) than bad OO code; and PHP is certainly a language where bad code is found often. (Of course, if we all have different ideas of what bad code is, then that's part of the problem)

If people are copying code instead of making and calling functions, that's easy to fix; make a function and change the callers. If people made lots of functions that are very short and only get called once, you can inline them. When people spent too much time building OO layers for everything, it takes as much time to tear it out and get to the meat of the problem; I've seen people build a JDBC work alike in PHP, complete with the JDBC iterators, instead of the nice while ($row = mysqli_fetch_assoc($result)) ...

But we can agree on this (although I haven't looked at the code, I can tell it's true based on doing some admin for an install):

nry said:
And if you want to see how not to write PHP, look at WordPress. In my opinion that should be deleted from the internet it's so badly written!

Also, thanks mikeblas for a nice example! Another thing to consider is that adding an index makes each update more expensive too (and you may have to add the index during a scheduled downtime). I think someone mentioned sharding... if you want a sorted query over multiple shards, then you end up needing to do sorting in something that has access to all the records (could be a merge of sorted records; or could be a total sort of unordered records).

It definitely doesn't take countless hours to optimize the queries; if you start with somewhat sensible queries, and they're accessible. For many to most queries, small tweaks to the queries or the database schema will be the best solution, but for some queries letting the database do all the work isn't the best solution.

nry · Mar 20, 2013

In my last post I may have been on a little bit of a rant (bad day ha).

How do you test procedural code? With my ZF1 projects I tend to build a pretty intensive PHPUnit test suite, seems to help keep the code structured and under control.

toast0 · Mar 20, 2013

nry said:
In my last post I may have been on a little bit of a rant (bad day ha).

No problem... ranting is fun!

nry said:
How do you test procedural code? With my ZF1 projects I tend to build a pretty intensive PHPUnit test suite, seems to help keep the code structured and under control.

Well, there's that image macro about testing code; if there are no warnings and no errors in my error log, then everything is cool right? (or some library/extension closed STDERR*). I try to do more testing of the whole system, and less unit testing (there's a lot of difference of opinion here, this is just mine). You can unit test functions of course, and anything that's complex enough to want to do unit tests on can usually be put in a function.

* This actually happened to me, thankfully it was a company internal library, but it was a complete pain in the ass to track that down.

Parmenides · Mar 29, 2013

Parmenides said:
yeah:

Code:

;WITH PreOrdered AS ( SELECT C.CustomerName, Orders.OrderID, Orders.OrderDate FROM Customers C JOIN Orders O ON O.CustomerID = C.CustomerID WHERE O.OrderDate >= '2013-03-01') select CustomerName, OrderID, OrderDate FROM PreOrdered ORDER BY CustomerName ASC, OrderDate DESC;

Any nasty holes with that approach? I guess it's kind of like making a temp table into memory, but not quite as onimous?

Well, I thought of this question again and did a quick bit of reading and in answer to my own question, a CTE is essentially disposable VIEW, so there wouldn't be a performance benefit. A temp table however has the possibility of giving the performance benefit, though depending on the situation could hurt as well.

mikeblas · Mar 29, 2013

Parmenides said:
Any nasty holes with that approach? I guess it's kind of like making a temp table into memory, but not quite as onimous?

I don't think you need a temp table for this query. You don't need a CTE, either. A join with appropriate indexes is just fine. If you compare this query against the join, I think you're going to find they have the same execution plan (unless you've got some pedantically bad physical structure).

Python vs PHP

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd

Limp Gawd

Limp Gawd

Limp Gawd

Supreme [H]ardness

Limp Gawd

Limp Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd

Gawd

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

Limp Gawd

[H]ard|DCer of the Month - May 2006

[H]ard|Gawd

[H]ard|DCer of the Month - May 2006

n00b

Supreme [H]ardness

[H]ard|DCer of the Month - May 2006

Supreme [H]ardness

Supreme [H]ardness

Limp Gawd

Supreme [H]ardness

Supreme [H]ardness

[H]ard|DCer of the Month - May 2006