regex help

Bigbacon

I am trying to get a single bit of information out of an HTM page using regex.

I am reading the page with a StreamReader into a string in ASP.NET.

I am then trying to use regex to pull just what I need and I can't figure it out.

What I'm trying to get is the path out of this string, which comes from a JavaScript function on the page.

NextPage=../module1/M1040.htm"

So I need whatever is between NextPage= and the double quote at the end: start right after 'NextPage=', read until I hit a ", and then stop.
 
Regex sounds like a much more expensive operation than "string.Substring()" parsing.
 
The length could be different, and I still have to find where that string starts within the document (easy).

I have to parse any number of HTM files for this string, so everything could vary greatly.
 
Code:
NextPage=[^"]+"
will capture it, but you'll have to trim the ending quote and the "NextPage=" text. I think it's possible to keep the quote out of the match, but I don't remember how :confused:
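
For what it's worth, one way to keep both the "NextPage=" prefix and the closing quote out of the match is to use lookaround assertions, which .NET regex supports. The following is only a sketch under that assumption; the class and method names (LookaroundExample, ExtractPath) are made up for illustration and are not from anyone's code in this thread.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundExample
{
    // (?<=NextPage=) requires "NextPage=" immediately before the match,
    // [^"]+ consumes everything up to (not including) the next quote,
    // (?=") requires a quote right after, without consuming it.
    private static readonly Regex PathPattern =
        new Regex(@"(?<=NextPage=)[^""]+(?="")");

    // Hypothetical helper: returns the path, or null if nothing matched.
    public static string ExtractPath(string html)
    {
        Match m = PathPattern.Match(html);
        return m.Success ? m.Value : null;
    }
}
```

With this pattern the match value itself is the path, so no trimming should be needed afterwards.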

Or, like ptnl said, you could do something like
Code:
int start = xxx.IndexOf("NextPage=") + "NextPage=".Length;
string path = xxx.Substring(start, xxx.IndexOf("\"", start) - start);
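
Since the same lookup has to run over many HTM files, some of which may not contain the string at all, a guarded version of the Substring approach might be safer. This is a minimal sketch, not a drop-in: the names NextPageParser and ExtractNextPage are hypothetical, and returning null for "not found" is just one possible convention.

```csharp
using System;

public static class NextPageParser
{
    // Returns the text between "NextPage=" and the next double quote,
    // or null when either marker is missing from the input.
    public static string ExtractNextPage(string html)
    {
        const string marker = "NextPage=";

        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null;      // marker not present

        start += marker.Length;          // skip past "NextPage="

        int end = html.IndexOf('"', start);
        if (end < 0) return null;        // no closing quote after the marker

        return html.Substring(start, end - start);
    }
}
```

IndexOf returns -1 when the text is not found, so without the checks Substring would either throw or return the wrong slice on pages that lack the marker.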
 
I got it working with the substring method. Well, cool, I can move on with this at some point when I get some more time. Thanks guys, you are always so helpful when it comes to the silly stuff.
 
Regex is my specialty. Assuming "content" holds the HTML page, the C# code below gives you quick access to the link.

String regexNextPage = "(?:nextpage=(?<NextPageLink>[^\"]+)\")";
Match nextPageMatch = Regex.Match(content, regexNextPage, RegexOptions.Singleline | RegexOptions.IgnoreCase);

String nextPageLink = nextPageMatch.Groups["NextPageLink"].Value;

Hope this helps.
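
One thing to watch with the named-group approach: when the pattern does not match, Groups["NextPageLink"].Value comes back as an empty string rather than throwing, so a page without the link can slip through unnoticed. A possible wrapper that checks Match.Success first is sketched below; NextPageRegex and TryGetNextPage are illustrative names, not part of the code above.

```csharp
using System.Text.RegularExpressions;

public static class NextPageRegex
{
    // Same idea as the pattern above: a named group captures everything
    // after "nextpage=" up to, but not including, the next double quote.
    private static readonly Regex NextPagePattern =
        new Regex("nextpage=(?<NextPageLink>[^\"]+)\"", RegexOptions.IgnoreCase);

    // Returns true and sets link when the pattern matches;
    // returns false and sets link to null otherwise.
    public static bool TryGetNextPage(string content, out string link)
    {
        Match m = NextPagePattern.Match(content);
        link = m.Success ? m.Groups["NextPageLink"].Value : null;
        return m.Success;
    }
}
```

Building the Regex once in a static field also avoids re-parsing the pattern for every file when many pages are processed in a loop.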
 