Ajax / PHP / MySQL question

AngryJim · Oct 19, 2009

I'm designing a site based around movies, and I have a search box with the Google style Ajax suggestion built in. It pulls the movie title from a database called "movies" as users type. The SQL search query uses "like %$title%" so it's searching on both the front and back end of the entered text, so typing "The" will also return "Indiana Jones And The Temple Of Doom", not sure if that's significant. The database "movies" also has tons of other data in it besides just the title, such as movie_id, year, genre, synopsis, etc. etc.

Would it be faster to create a second table called "movie_titles" that was nothing but the titles for the Ajax to search through, or would it not really make a noticeable difference in speed?

nameless_centurian · Oct 19, 2009

I have never written a database package, so others are more qualified to answer... but I am thinking that what you are suggesting wouldn't make much of a difference for searching. I would think it would make more of a difference for inserting and deleting, but this is all assuming that you are indexing that field.

My guess is that the speed increase for the inserts and deletes would come because you've got to maintain proper references to the data associated with a movie. This may slow down the indexing process on the field.

Otherwise, I assume it would make little difference.

*disclaimer* As stated, this is all guesswork. Someone who actually knows should free very free to rebutt me.

keenan · Oct 19, 2009

It might make a difference because this query is not indexable, so the whole table has to be scanned to find matches. That means loading every row into memory, even if you only want the one column. If the table fits in memory and is cached though it shouldn't make a noticeable difference.

You'll get a much bigger gain going to a more intelligent (indexable) search algorithm.

nameless_centurian · Oct 19, 2009

Why can't you index it? You're not indexing the QUERY, you're indexing the TABLE.

keenan · Oct 19, 2009

nameless_centurian said:
Why can't you index it? You're not indexing the QUERY, you're indexing the TABLE.

You can't use the index on a text field when the query contains a text search with '%' at the start of the string. It's pretty logical if you understand how the index is actually constructed and used internally, but there's also a note about it in the manual. Note also that if the '%' is in the middle of the string somewhere, all rows that match the part before the '%' must be retrieved and the remainder searched with an unindexed scan. See: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

Mcot · Oct 20, 2009

AngryJim said:
The database "movies" also has tons of other data in it besides just the title, such as movie_id, year, genre, synopsis, etc. etc.

Would it be faster to create a second table called "movie_titles" that was nothing but the titles for the Ajax to search through, or would it not really make a noticeable difference in speed?

Seems like your db design might be a little inefficient. How exactly is everything laid out? From what I can make of it the query should only be on the movies column and the other columns in the table should not matter.

Now that I see how the suggest feature works I am not so sure I like it. It seems rather inefficient to constantly query using '%' for each letter typed.

nameless_centurian · Oct 20, 2009

keenan said:
You can't use the index on a text field when the query contains a text search with '%' at the start of the string. It's pretty logical if you understand how the index is actually constructed and used internally, but there's also a note about it in the manual. Note also that if the '%' is in the middle of the string somewhere, all rows that match the part before the '%' must be retrieved and the remainder searched with an unindexed scan. See: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

neat post, thanks.

i don't think, however, that the mysql designers did it right. I think you could build a suffix array on top of a B-Tree, and such an index would be possible.

AngryJim · Oct 20, 2009

here's how a snippet of the actual code looks:

Code:

if(strlen($querystring) > 2)
	{
	$query = $dbc->query("SELECT DISTINCT title FROM movies WHERE title LIKE '%$querystring%' AND movie_id IS NOT NULL ORDER BY title ASC LIMIT 7");
	}

And I just noticed a typo in my original post. "movies" is not a database, just a table in the database.

EDIT: Ok I just went ahead and tried it. The "movie_titles" table with just the title and movie_id is around 6% the size of the "movies" table, and everything is very noticeably snappier.

nameless_centurian · Oct 20, 2009

awesome! thanks for posting your results!

any chance on quantifying them?

AngryJim · Oct 20, 2009

I'm not quite sure how, at least with Ajax anyway. I don't know the first thing about Javascript, had to get the auto suggest working with an online tutorial I didn't really understand.

nameless_centurian · Oct 20, 2009

well, if you get the time, here's how i'd do it.

1) report the number of records in the table
2) report the number of columns, and their types.
3) if it's a variable-length type, give a rough estimate of the average length

4) don't go through AJAX, use the layer closest to the database. Do about 10,000 queries under each scenario. If there's no noticeable difference in timing, try 100,000 queries. use the same queries on each scenario. to build a query set, I'd either use the entire set of movie titles, or if it's too large, some subset of it. I'd do 25% or so of your 10,000 queries with valid titles. for the other 75%, just rearrange the words of the 25% so that it they are invalid movies, but get some matches on word length. again, save the queries so that you can use the same exact queries on both scenarios.

5) report the time.

keenan · Oct 20, 2009

Use EXPLAIN at the mysql console to get more information on query planning and timing.

i don't think, however, that the mysql designers did it right. I think you could build a suffix array on top of a B-Tree, and such an index would be possible.

Possible, yes, but it would double the size of the index as well so should be an optional thing and still doesn't help at all with %s at both ends of the LIKE string at the same time. You can hack support for % at the front with an index on REVERSE(string) and modifying the query slightly to query the text 'backwards' using a comparison to REVERSE(string) instead. At least, I think MySQL supports indexes on the values of functions...

The 'right way' of course would be to use some kind of full text search, either implemented manually or using the included full text search. If the indexed scan on the denormalized table gives satisfactory performance though it's probably fine since the data set isn't likely to grow very quickly (it seems, anyway). It won't scale well though.

nameless_centurian · Oct 21, 2009

keenan said:
Use EXPLAIN at the mysql console to get more information on query planning and timing.

Possible, yes, but it would double the size of the index as well so should be an optional thing and still doesn't help at all with %s at both ends of the LIKE string at the same time. You can hack support for % at the front with an index on REVERSE(string) and modifying the query slightly to query the text 'backwards' using a comparison to REVERSE(string) instead. At least, I think MySQL supports indexes on the values of functions...

The 'right way' of course would be to use some kind of full text search, either implemented manually or using the included full text search. If the indexed scan on the denormalized table gives satisfactory performance though it's probably fine since the data set isn't likely to grow very quickly (it seems, anyway). It won't scale well though.

Thanks.

calebb · Oct 21, 2009

nameless_centurian said:
neat post, thanks.

i don't think, however, that the mysql designers did it right. I think you could build a suffix array on top of a B-Tree, and such an index would be possible.

http://en.wikipedia.org/wiki/Sargable

Also, check out Full Text indexes. You'll just need to turn off noise word filtering if you want "The" to return a relevant result. http://en.wikipedia.org/wiki/Full_text_search (check for the parameter ft_stopword_file in your mysqld.conf for noise words)

If you want your search to also be very tolerant of spelling errors/typos, check out the open source algorithm http://en.wikipedia.org/wiki/Double_Metaphone - there are tons of great tutorial articles on the web for implementing this in your system.

Ajax / PHP / MySQL question

AngryJim

Limp Gawd

nameless_centurian

Gawd

keenan

2[H]4U

nameless_centurian

Gawd

keenan

2[H]4U

Mcot

Limp Gawd

nameless_centurian

Gawd

AngryJim

Limp Gawd

nameless_centurian

Gawd

AngryJim

Limp Gawd

nameless_centurian

Gawd

keenan

2[H]4U

nameless_centurian

Gawd

calebb

[H]F Junkie