SQL Order and Group By Question

TheBuzzer · Jun 29, 2010

If i do

SELECT * FROM (SELECT * FROM `styles_domestic_specials` ORDER BY `endingdate` DESC) AS T1 GROUP BY `style`

does it do the same as

SELECT * FROM `styles_domestic_specials` GROUP BY `style` ORDER BY `endingdate` DESC

It seems like group by got to be before order by. So I am wondering does this make any difference?

TheBuzzer · Jun 29, 2010

well it seems like it does make a difference, so is there a way to sort by than group by without doing 2 selects?

PTNL · Jun 29, 2010

Your inner SELECT statement is actually adding unnecessary overhead without any gained value. Use your second statement instead. Also, I hope the "SELECT * " was just for simplicity of the post, and that you're not literally using "SELECT * " in your statement.

Though I am interested in your rationale or what you were trying to achieve with the first statement.

*Edit* This thread is better positioned in the "Webmastering and Programming" forum.

TheBuzzer · Jun 29, 2010

Oh i put it in the wrong section.

Well i am trying to get the correct row.

There can be multiple same style with different colors and start and end date.
I am trying to make it grab only the latest same styles.

Sgraffite · Jun 29, 2010

Does this give you the correct rows?

Code:

SELECT *, MAX(`endingdate`) as `endingdate`
FROM `styles_domestic_specials`
GROUP BY `style`) as temp_table;

I also agree with PTNL, don't use SELECT *, choose what columns you need.

TheBuzzer · Jun 29, 2010

I just tried it. it doesn't select the right row.

TheBuzzer · Jun 29, 2010

well my first thing doesnt take sql that long but I am just trying to learn of better ways. Maybe there is no better way to sort first than group.

TheBuzzer · Jun 29, 2010

would select * be slower than selecting all the cells? I need all the cells in that table.

PTNL · Jun 29, 2010

TheBuzzer said:
well my first thing doesnt take sql that long but I am just trying to learn of better ways. Maybe there is no better way to sort first than group.

From what you've described so far, Sgraffite's suggestion would yield the correct results (barring some some typo, of course). I suspect you're wanting something slightly/entirely different. Could you provide a better explanation, or perhaps split out the query a bit into the relevant parts with a description?

TheBuzzer said:
would select * be slower than selecting all the cells? I need all the cells in that table.

Listing each column individually in your SQL statement is much faster performing. Additionally, it greatly improves the readability of your SQL statement.

keenan · Jun 29, 2010

TheBuzzer said:
would select * be slower than selecting all the cells? I need all the cells in that table.

It's always good style to state your actual intent, thus naming the columns you actually intend to use. As far as performance is concerned, it won't make a notable difference unless you're returning a large number of rows and are selecting more columns that you actually need.

The other reason is that basically any other DBMS other than MySQL doesn't let you abuse GROUP BY in this nonsensical manner. For queries with GROUP BY to make sense, every non-grouped field must be aggregated somehow (which you can't do when you SELECT *). It's the only way GROUP BY makes sense, and probably why your query is not behaving as you want it to.

You will probably need to do something along the lines of, though it may be possible to beat

Code:

SELECT * FROM styles_domestic_specials AS outer 
WHERE 
  style_pk = (SELECT style_pk FROM styles_domestic_specials WHERE style=outer.style LIMIT 1);

TheBuzzer · Jun 29, 2010

I tried Sgraffite, it had the max date but the row it group was still the first row of that style.

keenan you are doing two selects which is what my code does so it isnt good.I was trying to simplify it to one select.

the date is the right date however, the color_code in the table it chose the wrong one.

TheBuzzer · Jun 29, 2010

here is a test table to try

Code:

CREATE TABLE IF NOT EXISTS `styles_domestic_specials` (
  `style` varchar(20) NOT NULL,
  `basestyle` varchar(20) NOT NULL,
  `colorcodes` mediumtext NOT NULL,
  `price` double NOT NULL,
  `startingdate` datetime NOT NULL,
  `endingdate` datetime NOT NULL,
  UNIQUE KEY `style` (`style`,`startingdate`,`endingdate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

INSERT INTO `styles_domestic_specials` (`style`, `basestyle`, `colorcodes`, `price`, `startingdate`, `endingdate`) VALUES
('81-398S', '81-398', '001,002,016', 0.99, '2010-05-03 00:00:00', '2010-06-01 00:00:00'),
('81-398S', '81-398', '002,003,016', 0.99, '2010-06-17 00:00:00', '2010-07-03 00:00:00'),
('81-340S', '81-340', '001,002,016', 0.99, '2010-06-03 00:00:00', '2010-07-01 00:00:00'),
('81-340S', '81-340', '002,003,016', 0.99, '2010-08-17 00:00:00', '2010-09-03 00:00:00');

if you try the different sql commands posted here it doesnt come out with same result.

keenan · Jun 29, 2010

TheBuzzer said:
keenan you are doing two selects which is what my code does so it isnt good.I was trying to simplify it to one select.

What makes you think that? It's a perfectly good approach, and often more clear than other techniques. I'm not even sure there's another way to do it, though it is probably possible with a self join. Don't prematurely optimize. Any query planner, even MySQL's, should be able to do a good job with this simple query.

You also need a proper primary key for this table, and you should probably normalize it a bit better.

Here's my amended query that works with your actual schema:

Code:

SELECT * FROM styles_domestic_specials AS o
WHERE
  id = (SELECT id FROM styles_domestic_specials WHERE style=o.style ORDER BY endingdate DESC LIMIT 1);

Note that this is quite different from your original subselect query which is going to be inefficient and just doesn't make sense because of your attempt to use group by to do something it's not designed to do.

TheBuzzer · Jun 30, 2010

well doing multiple queues is slower than just one queue. sure it isnt that much slower but still it is slower. I was just trying to see if there was any way to do it in one queue.

sure your method works too. But that is still 2 selects. I wanted to see was it possible with just once select.

keenan · Jun 30, 2010

TheBuzzer said:
well doing multiple queues is slower than just one queue. sure it isnt that much slower but still it is slower.

No it's not, that's not even a valid rule of thumb, let alone a fact. One poorly designed query can easily be far slower than using subqueries to accomplish the same task. Don't prematurely optimize, especially when you don't understand where the performance penalties are. Modern RDBMS' have fairly intelligent query planners, and while I know MySQL only recently got subselects so is probably poorly optimized for them, this is a simple case it should be able to do efficiently.

There's absolutely nothing wrong with using subselects if they are appropriate for your problem, but you do need to understand how to construct them so the DBMS can optimize away most of what would otherwise make them inefficient. Either way, write queries that make sense to you and then run them on actual data of a size similar to what you'll see in production and see what the DBMS query planner is doing and if that makes sense, and adjust from there if necessary.

Of course I'm not talking about doing multiple actual queries (e.g. to fake a join with a for loop), which is a whole different animal and should be avoided if a sensible alternative exists in SQL.

sure your method works too. But that is still 2 selects. I wanted to see was it possible with just once select.

I don't think so. Maybe with a self join, but I'm not a fan of doing things this way, and it should (assuming the RDBMS isn't braindead and you implement the same basic logic) result in the same query plan, and thus the same performance.

What you have to understand when working with SQL is that it's not like C or PHP or whatever, where every instruction is executed every time it appears. SQL is declarative - you declare the relationship of the data you want and it will figure out how pull it out of the database. You can't think of a query as a series of instructions, it's just a description of the data you're looking for, and the DBMS will look at the whole query and figure out how to extract that data efficiently. Multiple different ways to describe the same relationship (e.g. a JOIN vs. a subselect) will ideally result in the exact same query plan, and the DBMS executing the query the same way, even though the description is quite different.

PTNL · Jun 30, 2010

keenan's and sgraffite are on the right track with their suggestion. I just haven't yet seen anything definitive from the OP that suggests one suggested approach really trumps the other. But from what concrete details that have been shared, I'm leaning more toward keenan's suggestion.

TheBuzzer said:
well doing multiple queues is slower than just one queue. sure it isnt that much slower but still it is slower. I was just trying to see if there was any way to do it in one queue.

sure your method works too. But that is still 2 selects. I wanted to see was it possible with just once select.

One basic difference between your subselect and keenan's proposed subselect is that keenan's actually adds value by filtering the possible resultset being examined by the query execution. And if what you're reporting back is accurate that keenan's suggestion isn't grabbing the right first row based upon some max/min value, then why not adjust the ORDER BY clause in the subselect statement?
(Admittedly, I'm not as familiar with MySQL so if there's a better or more proper way for MySQL then please share the syntax.)

SQL Order and Group By Question

TheBuzzer

HACK THE WORLD!

TheBuzzer

HACK THE WORLD!

PTNL

Supreme [H]ardness

TheBuzzer

HACK THE WORLD!

Sgraffite

Supreme [H]ardness

TheBuzzer

HACK THE WORLD!

TheBuzzer

HACK THE WORLD!

TheBuzzer

HACK THE WORLD!

PTNL

Supreme [H]ardness

keenan

2[H]4U

TheBuzzer

HACK THE WORLD!

TheBuzzer

HACK THE WORLD!

keenan

2[H]4U

TheBuzzer

HACK THE WORLD!

keenan

2[H]4U

PTNL

Supreme [H]ardness