Regular Expressions

pr0pensity · Apr 16, 2005

I'm reading up on this, but I don't quite get the syntax. How would I parse [b] [/b] tags or replace all ascii characters from 0 to 31 with a space?

BollWeevil · Apr 16, 2005

Here are some examples in Perl.

To remove anything inside HTML angle brackets, including the < and > themselves, use the following substitution:

$thisLine =~ s/<[^>]*>//g;

To replace every ASCII character that is not a letter or a digit with a space, use the following. This covers the control characters 0x00 to 0x1F (decimal 0 to 31), but it also replaces spaces and punctuation.

$thisLine =~ s/[^0-9a-zA-Z]/ /g;
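
If only ASCII 0 to 31 should be touched, a character-class range is enough. Since the language was never specified, here is a minimal sketch in Python; the function name blank_control_chars is just made up for illustration.

```python
import re

def blank_control_chars(text):
    # [\x00-\x1f] is a character-class range covering ASCII codes 0 through 31.
    # Letters, digits, spaces and punctuation are left alone.
    return re.sub(r"[\x00-\x1f]", " ", text)

print(repr(blank_control_chars("a\tb\x00c\nd!")))  # 'a b c d!'
```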

Please note that in the first example, if an HTML tag spans multiple lines and you process the input one line at a time, it won't work properly. Be creative with loops.
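
One way around the multi-line problem, assuming the whole document fits in memory, is to run the substitution on a single string instead of line by line. A rough Python sketch of the same pattern (strip_tags is an illustrative name, not from any library):

```python
import re

def strip_tags(text):
    # [^>] matches newlines too, so a tag that is split across lines is
    # still removed as long as the whole text is in one string.
    return re.sub(r"<[^>]*>", "", text)

html = 'one <a\nhref="x">two</a> three'
print(strip_tags(html))  # one two three
```

This still treats any > inside an attribute value as the end of the tag, so it is only a quick filter, not a parser.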

ameoba · Apr 16, 2005

While generally adequate, REs aren't really up to the job of parsing HTML (or HTML-like markup such as vB-code) where you can legally nest things to arbitrary depths. Basic CS stuff - REs are not powerful enough to handle CFGs.
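
A small Python sketch of where this bites. The [quote] tag is only an assumed example of a vB-code tag that can nest; neither a lazy nor a greedy pattern pairs the tags correctly in both inputs.

```python
import re

nested = "[quote]outer [quote]inner[/quote] tail[/quote]"
siblings = "[quote]a[/quote] and [quote]b[/quote]"

lazy = re.compile(r"\[quote\](.*?)\[/quote\]")
greedy = re.compile(r"\[quote\](.*)\[/quote\]")

# Lazy matching stops at the first closing tag, which belongs to the inner quote.
lazy_nested = lazy.search(nested).group(1)
# Greedy matching runs to the last closing tag and swallows both siblings.
greedy_siblings = greedy.search(siblings).group(1)

print(lazy_nested)      # outer [quote]inner
print(greedy_siblings)  # a[/quote] and [quote]b
```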

HJB417 · Apr 16, 2005

(?<=<b>).*?(?=</b>) for HTML bold tags

(?<=\[b\]).*?(?=\[/b\]) for what was in original posting

will find the 'b' tags.
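
For reference, a quick Python check of what the second pattern actually matches. Because the lookbehind and lookahead are zero-width, the match is the text between the tags, not the tags themselves. The sample string is made up.

```python
import re

between_b = re.compile(r"(?<=\[b\]).*?(?=\[/b\])")

text = "plain [b]bold one[/b] and [b]bold two[/b]"
found = between_b.findall(text)
print(found)  # ['bold one', 'bold two']
```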

tim_m · Apr 16, 2005

^^ that only finds the tags, it doesn't replace them. then again, the OP never specified what language was being used

pr0pensity · Apr 16, 2005

ameoba said:
While generally adequate, REs aren't really up to the job of parsing HTML (or HTML-like markup such as vB-code) where you can legally nest things to arbitrary depths. Basic CS stuff - REs are not powerful enough to handle CFGs.

What would I use?

Fryguy8 · Apr 16, 2005

A real parser. Although most regex engines nowadays aren't regular expressions in the pure sense that ameoba mentioned.

I agree, though: regexps are a poor choice for parsing HTML. Take a look at CPAN for some better parsing modules.

HJB417 · Apr 16, 2005

tim_m said:
^^ that only finds the tags, it doesn't replace them. then again, the OP never specified what language was being used

(?<=<b>).*?(?=</b>) for HTML bold tags
can be replaced using the Regex class in the .NET Framework. Using .NET is not the only way to do it, though.

tim_m · Apr 16, 2005

you said it yourself

(?<=\[b\]).*?(?=\[/b\]) for what was in original posting

will find the 'b' tags.

as i said, you're using a look behind and look ahead assertion to ensure that the [b] and [/b] are there. this only lets you know that there is a [b] and [/b] in the string somewhere if the regex matches. you need some sort of regex replace to find [b] and [/b] and replace them with <b> and </b>
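
a regex replace of that kind could look roughly like this in Python (a sketch only; bbcode_bold_to_html is a made-up name, and it rewrites each tag on its own without checking that opening and closing tags pair up):

```python
import re

def bbcode_bold_to_html(text):
    # Group 1 captures the optional slash so that [b] becomes <b>
    # and [/b] becomes </b> in a single substitution.
    return re.sub(r"\[(/?)b\]", r"<\g<1>b>", text, flags=re.IGNORECASE)

print(bbcode_bold_to_html("a [b]hi[/b] z"))  # a <b>hi</b> z
```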

as for a 'real' parser, the way it works is with a stack or two and lots of black magic

an example:
http://www.christian-seiler.de/projekte/php/bbcode/index_en.html
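
to give a feel for the stack idea, here is a toy Python sketch. It is not how the linked PHP library works; the ALLOWED tag set and the name bbcode_to_html are assumptions for illustration, and it does no HTML escaping.

```python
import re

# Hypothetical whitelist for this sketch; a real BBCode parser handles far more.
ALLOWED = {"b", "i", "u"}
TOKEN = re.compile(r"\[(/?)([a-z]+)\]", re.IGNORECASE)

def bbcode_to_html(text):
    out = []    # output fragments
    stack = []  # currently open tags, innermost last
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(text[pos:m.start()])  # literal text before this tag
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name not in ALLOWED:
            out.append(m.group(0))       # unknown tag: keep as plain text
        elif not closing:
            stack.append(name)
            out.append("<%s>" % name)
        elif name in stack:
            # Close everything opened after the matching tag, innermost first.
            while True:
                top = stack.pop()
                out.append("</%s>" % top)
                if top == name:
                    break
        else:
            out.append(m.group(0))       # stray closing tag: keep as plain text
    out.append(text[pos:])
    while stack:                         # close anything left open at the end
        out.append("</%s>" % stack.pop())
    return "".join(out)

print(bbcode_to_html("[b]x [i]y[/b] z"))  # <b>x <i>y</i></b> z
```

the stack is what lets it cope with nesting and with tags closed in the wrong order, which a single regex replace cannot do.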

HJB417 · Apr 16, 2005

tim_m said:
you said it yourself

as i said, you're using a look behind and look ahead assertion to ensure that the [b] and [/b] are there. this only lets you know that there is a [b] and [/b] in the string somewhere if the regex matches. you need some sort of regex replace to find [b] and [/b] and replace them with <b> and </b>

as for a 'real' parser, the way it works is with a stack or two and lots of black magic

an example:
http://www.christian-seiler.de/projekte/php/bbcode/index_en.html

i understand now, thx.
