
Regular Expressions

pr0pensity

I'm reading up on this, but I don't quite get the syntax. How would I parse [b] [/b] tags, or replace all ASCII characters from 0 to 31 with a space?
 
Here are some examples in Perl.

To strip anything in HTML angle brackets, use the following substitution, which removes the < and > along with everything between them:

$thisLine =~ s/<[^>]*>//g;


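To replace any ASCII character that is not a letter or a number, use the following, which replaces each such character with a space. Note that this is broader than what was asked: besides the control characters 0x00 to 0x1F (0 to 31), it also replaces spaces and punctuation. To touch only 0 to 31, use the class [\x00-\x1F] in place of [^0-9a-zA-Z].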

$thisLine =~ s/[^0-9a-zA-Z]/ /g;

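Please note that the first example assumes the whole tag is inside $thisLine. If you read the input line by line and an HTML <tag> spans multiple lines, it won't match. Either read the whole input into one string first (the class [^>] does match newlines) or be creative with loops.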
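
For comparison, here is a rough Python sketch of the same two substitutions, using the standard `re` module. Python is my own choice, since the thread never settles on a language, and the function names `strip_tags` and `blank_control_chars` are made up for illustration. The second function uses the narrower class `[\x00-\x1f]`, so it only touches ASCII 0 to 31.

```python
import re

def strip_tags(text: str) -> str:
    """Remove anything between < and >, including the brackets."""
    # [^>] also matches newlines, so a tag split across lines is still
    # removed as long as the whole input is in one string.
    return re.sub(r"<[^>]*>", "", text)

def blank_control_chars(text: str) -> str:
    """Replace each ASCII character 0 to 31 with a single space."""
    # Note: this range includes tab, carriage return and line feed.
    return re.sub(r"[\x00-\x1f]", " ", text)

print(strip_tags("<p>hello <b\nclass='x'>world</b></p>"))   # hello world
print(repr(blank_control_chars("a\tb\x00c\nd")))            # 'a b c d'
```

Like the Perl version, `strip_tags` is a blunt instrument: it will also eat a literal < ... > pair that is not a tag.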
 
While generally adequate, REs aren't really up to the job of parsing HTML (or HTML-like markup such as vB-code) where you can legally nest things to arbitrary depths. Basic CS stuff - REs are not powerful enough to handle CFGs.
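
To make the nesting problem concrete, here is a small Python sketch (Python and the `[quote]` tag are my own picks for illustration, not something from the thread). A lazy pattern pairs the outer opening tag with the inner closing tag, and a greedy one glues two sibling pairs together, so neither tracks depth.

```python
import re

LAZY = re.compile(r"\[quote\](.*?)\[/quote\]", re.DOTALL)
GREEDY = re.compile(r"\[quote\](.*)\[/quote\]", re.DOTALL)

nested = "[quote]outer [quote]inner[/quote] tail[/quote]"
siblings = "[quote]a[/quote] and [quote]b[/quote]"

# Lazy: stops at the FIRST closing tag, which belongs to the inner quote.
print(LAZY.search(nested).group(1))      # outer [quote]inner

# Greedy: runs to the LAST closing tag, swallowing both sibling quotes.
print(GREEDY.search(siblings).group(1))  # a[/quote] and [quote]b
```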
 
(?<=<b>).*?(?=</b>) for html bold tags

(?<=\[b\]).*?(?=\[/b\]) for what was in original posting

will find the 'b' tags.
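
As a quick check of what the lookaround pattern actually matches, here is a Python sketch (my choice of language; the sample string is invented). The lookbehind and lookahead only assert that the tags are present, so the match itself is the text between them and the tags are not consumed.

```python
import re

# Same pattern as above: lookbehind for [b], lookahead for [/b].
INNER_BB = re.compile(r"(?<=\[b\]).*?(?=\[/b\])")

text = "plain [b]bold one[/b] and [b]bold two[/b]"
print(INNER_BB.findall(text))   # ['bold one', 'bold two']
```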
 
^^ that only finds the tags, it doesn't replace them. then again, the OP never specified what language was being used
 
ameoba said:
While generally adequate, REs aren't really up to the job of parsing HTML (or HTML-like markup such as vB-code) where you can legally nest things to arbitrary depths. Basic CS stuff - REs are not powerful enough to handle CFGs.

What would I use?
 
A real parser. Although most regex engines nowadays aren't regular expressions in the pure sense that the earlier post was talking about.

I agree, though: regexps are a poor choice for parsing HTML. Take a look at CPAN for some proper parsing modules.
 
tim_m said:
^^ that only finds the tags, it doesn't replace them. then again, the OP never specified what language was being used

(?<=<b>).*?(?=</b>) for html bold tags
can be replaced using the Regex class in the .NET Framework. Using .NET is not the only way to do it, though.
 
you said it yourself
(?<=\[b\]).*?(?=\[/b\]) for what was in original posting

will find the 'b' tags.

as i said, you're using a lookbehind and a lookahead assertion to ensure that the <b> and </b> are there. if the regex matches, that only tells you there is a <b> and a </b> somewhere in the string; nothing gets replaced. you need some sort of regex replace to find [b] and [/b] and replace them with <b> and </b>
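
A minimal sketch of such a replace, in Python (my choice of language; `bb_bold_to_html` is a made-up name). Instead of lookarounds it consumes the tags and captures the text between them, then writes the capture back inside HTML tags. It assumes `[b]` tags are not nested inside each other and does no HTML escaping.

```python
import re

# Consume [b] and [/b], capture what lies between them (lazily).
BOLD = re.compile(r"\[b\](.*?)\[/b\]", re.DOTALL | re.IGNORECASE)

def bb_bold_to_html(text: str) -> str:
    """Rewrite each [b]...[/b] pair as <b>...</b>."""
    return BOLD.sub(r"<b>\1</b>", text)

print(bb_bold_to_html("a [b]x[/b] and [B]y[/B]"))  # a <b>x</b> and <b>y</b>
```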

as for a 'real' parser, the way it works is with a stack or two and lots of black magic

an example:
http://www.christian-seiler.de/projekte/php/bbcode/index_en.html
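
To take some of the black magic out of it, here is a toy one-stack converter in Python. It is only a sketch under my own assumptions and is not how the linked PHP library works: it knows just `[b]`, `[i]` and `[u]`, the name `bbcode_to_html` is invented, and a closing tag that does not match the innermost open tag is simply kept as literal text.

```python
import html
import re

# A regex is still fine for FINDING individual tags; the stack does the pairing.
TAG = re.compile(r"\[(/?)(b|i|u)\]", re.IGNORECASE)

def bbcode_to_html(text: str) -> str:
    out = []     # output fragments
    stack = []   # names of currently open tags, innermost last
    pos = 0
    for m in TAG.finditer(text):
        out.append(html.escape(text[pos:m.start()]))  # plain text before the tag
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif stack and stack[-1] == name:
            stack.pop()
            out.append(f"</{name}>")
        else:
            # stray or mis-nested closing tag: keep it as literal text
            out.append(html.escape(m.group(0)))
        pos = m.end()
    out.append(html.escape(text[pos:]))
    # close anything left open, innermost first
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(bbcode_to_html("[b]bold [i]both[/i][/b] <x>"))
# <b>bold <i>both</i></b> &lt;x&gt;
```

The stack is what lets the output stay well-formed when tags nest or are left unclosed, which is the part a single regex replace cannot guarantee.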
 
tim_m said:
you said it yourself


as i said, you're using a lookbehind and a lookahead assertion to ensure that the <b> and </b> are there. if the regex matches, that only tells you there is a <b> and a </b> somewhere in the string; nothing gets replaced. you need some sort of regex replace to find [b] and [/b] and replace them with <b> and </b>

as for a 'real' parser, the way it works is with a stack or two and lots of black magic

an example:
http://www.christian-seiler.de/projekte/php/bbcode/index_en.html

i understand now, thx.
 