• Some users have recently had their accounts hijacked. It seems that the now defunct EVGA forums might have compromised your password there and seems many are using the same PW here. We would suggest you UPDATE YOUR PASSWORD and TURN ON 2FA for your account here to further secure it. None of the compromised accounts had 2FA turned on.
    Once you have enabled 2FA, your account will be updated soon to show a badge, letting other members know that you use 2FA to protect your account. This should be beneficial for everyone that uses FSFT.

PostgreSQL Full text search question.

Mcot

Limp Gawd
Joined
Jul 21, 2007
Messages
131
I have a postgres database that has a few tables with columns of type TEXT. These columns store unstructured text.

I want to be able to search the text columns for specific things and also be able to do full text searches. After reading the documentation on the postgres website I can successfully perform ft searches on the columns, but I am having trouble pulling out specific things.

For example lets say in my unstructured text I have some URL's. According to this document http://www.postgresql.org/docs/8.4/interactive/textsearch-parsers.html the default full text parser for postgres will specifically parse URL's. But how do I perform a query that says, "For this table, give me every row where the text column contains a URL.?" I have a feeling I should just use regular expressions for this, but the ft parser is already parsing URL's so I was wondering if there was a way to sort of use it for that.

Also, what if I want to pull out something that the default parser does not look for? I can't seem to find a lot of information about how to extended the parser and add new items.

I know I am trying to use the ft search for something that its not EXACTLY intended for, but if it works it would make my queries a lot easier.
 
I'm familiar with the full text search of PostgreSQL, but from the examples in the manual it would seem the parser returns a table. If you would use subselects to extract the URLs, then you can most likely check whether it the column that contains subselect results has the value you want. There will be a performance penalty using subselects this way, but it could work.

Posting the query you have so far would help.

If nobody else gives you any more hints, I suggest you ask this on the PostgreSQL novice mailing list, you'll probably get a good answer there, almost certainly faster than here.
 
Hi, thanks for the response. I found that i can do what I want to do with regex queries and it works OK, so I think I am going to stay with those unless I see a huge performance problem. I have been asking around on the postgres IRC channel and no one seems to know a whole lot about full text search.

The documentation on the postgres site as a whole is awesome, but I find the fts documentation to be a little less friendly than the rest. I will try the mailing lists through, that's a good suggestion.
 
The documentation on the postgres site as a whole is awesome, but I find the fts documentation to be a little less friendly than the rest. I will try the mailing lists through, that's a good suggestion.

FTS in Postgres is a recent addition to the official distribution (since 8.3); it used to be included as a contrib module called 'tsearch2', and appears to have changed very little. Knowing that might help you find more information, but I'm afraid I don't have an answer to your question. It seems like it should be possible.
 
Code:
SELECT alias, description, token FROM ts_debug('foo-bar-beta1');
      alias      |               description                |     token     
-----------------+------------------------------------------+---------------
 numhword        | Hyphenated word, letters and digits      | foo-bar-beta1
 hword_asciipart | Hyphenated word part, all ASCII          | foo
 blank           | Space symbols                            | -
 hword_asciipart | Hyphenated word part, all ASCII          | bar
 blank           | Space symbols                            | -
 hword_numpart   | Hyphenated word part, letters and digits | beta1

The above query is the only query that I can find that returns the actual alias that the particular token matches.

ts_debug is obviously supposed to be used for debugging.

From a high level perspective I want to return all urls in a column, so I want to use the full text parser to create tokens for me given a column, and then return all the tokens which matched the url alias.
 
Last edited:
Back
Top