Find "page" when it is NOT followed by "article"

Last post 05-12-2008, 6:01 PM by Lyndar. 5 replies.
  •  05-09-2008, 6:31 AM 42072

    Find "page" when it is NOT followed by "article"

    Hi guys,

     I'm quite new to the whole RegExp thing (about 1 hour).

     What I'm trying to do is parse a list of URLs I have. They look like this:

    www.domain.com/page.php?Page=123&Article=123

    But sometimes the Article parameter is missing. Then they just look like this:

    www.domain.com/page.php?Page=123

    I want to find those with the missing Article parameter. What I tried was:

    Page.*(Article){0}

    Meaning: I want to find the URL where Page is followed by anything but not "Article". This doesn't work. Do you have any ideas how to do it?

    Thanks a lot,

    pHew
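For readers following along: one likely reason the attempt above does not work is that `{0}` means "repeat zero times", so `(Article){0}` can only match the empty string and the whole pattern behaves like `Page.*`. A minimal sketch in Python (the `re` module is an assumption here; the thread never identifies the regex engine behind the tool):

```python
import re

urls = [
    "www.domain.com/page.php?Page=123&Article=123",
    "www.domain.com/page.php?Page=123",
]

attempt = re.compile(r"Page.*(Article){0}")  # the pattern from the question
plain = re.compile(r"Page.*")                # the same pattern without the {0} group

# Collect the text each pattern matches in each URL.
attempt_hits = [attempt.search(u).group(0) for u in urls]
plain_hits = [plain.search(u).group(0) for u in urls]

for url, hit in zip(urls, attempt_hits):
    print(f"{url!r} -> matched {hit!r}")
```

Both URLs match, and the two patterns match identical text, so the `{0}` group excludes nothing.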

  •  05-09-2008, 9:33 AM 42074 in reply to 42072

    Re: Find "page" when it is NOT followed by "article"

    The solution depends heavily on exactly how you are trying to match your strings; in other words, real examples in real text are needed. You also did not mention the language or platform you are using. As a programmer, you know that every implementation is platform- and language-specific, and regex is no exception. Shooting in the dark, I can offer this syntax:

    \bwww\.\S*?Page((?!Article)\S)+(?=\s|$)

    This assumes that your URLs are matched as separate strings delimited by either whitespace or a newline.
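As a rough check of what this pattern does, here is a small Python sketch run against the two sample URLs from the question. Python's `re` module is an assumption; the behavior may differ in whatever engine the original poster's tool uses.

```python
import re

# The suggested pattern: after "Page", consume non-space characters one at a
# time, refusing any position where "Article" starts, up to whitespace or end.
pattern = re.compile(r"\bwww\.\S*?Page((?!Article)\S)+(?=\s|$)")

with_article = "www.domain.com/page.php?Page=123&Article=123"
without_article = "www.domain.com/page.php?Page=123"

hit_with = pattern.search(with_article)
hit_without = pattern.search(without_article)

print("with Article:   ", hit_with)
print("without Article:", hit_without.group(0) if hit_without else None)
```

Under these assumptions the pattern matches only the URL that lacks the Article parameter. Note that it is case-sensitive, so the lowercase `page` in `page.php` does not count as `Page`.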

  •  05-09-2008, 9:49 AM 42076 in reply to 42074

    Re: Find "page" when it is NOT followed by "article"

    Hello Sergei,

     Thanks for your help. The URLs are not in a file or anything. I use the application A1 Sitemap Generator to generate a sitemap. And it has an option to "Disallow internal links that match regex". So I have no clue what language that is in. It's on Windows Vista, if that helps (I doubt it).

    I was thinking I don't need to worry about \b and $, since it just tries to find the regex in the specific URL. If it is found, the URL is deleted. If not, happy times! The URL is saved.

    Unfortunately I don't understand the expression you suggested, despite having read regex tutorials all day long. But I will try to make it work! Thanks for your help.

  •  05-09-2008, 10:17 AM 42081 in reply to 42076

    Re: Find "page" when it is NOT followed by "article"

    Did you try using the regex? It should not match URLs that contain the string "Article". You can also drop \b and (?=\s|$).

    You can parse the regex in RegexBuddy or Expresso to get the meaning of the syntax. Expresso is a free regex tool.
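One caveat may be worth checking before dropping the trailing `(?=\s|$)`: if the tool searches for the pattern anywhere in the URL rather than requiring a full match, the shortened pattern can match the part of a URL that comes before "Article". A minimal Python sketch, assuming search semantics like those of `re.search` (the tool's actual behavior is unknown):

```python
import re

full = re.compile(r"\bwww\.\S*?Page((?!Article)\S)+(?=\s|$)")
trimmed = re.compile(r"www\.\S*?Page((?!Article)\S)+")  # \b and (?=\s|$) dropped

url = "www.domain.com/page.php?Page=123&Article=123"

full_hit = full.search(url)
trimmed_hit = trimmed.search(url)

print("full pattern:   ", full_hit)
print("trimmed pattern:", trimmed_hit.group(0) if trimmed_hit else None)
```

Here the full pattern rejects the URL, while the trimmed pattern matches the prefix that ends just before `Article`, so the end-of-URL lookahead appears to be doing real work.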

  •  05-09-2008, 10:41 AM 42082 in reply to 42081

    Re: Find "page" when it is NOT followed by "article"

    I'm trying right now. Thanks for the tips.

     

    It didn't work :-(

  •  05-12-2008, 6:01 PM 42142 in reply to 42082

    Re: Find "page" when it is NOT followed by "article"

    Try this:

    ^(?!.*?&Article).*$

     

    Matches:

    Everything between the beginning and the end of a line that does not have &Article anywhere within it.

    Note that if the regex is being applied to a complete list at once, you will need to have the singleLine (or "dot matches all") option turned off.
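The effect of the singleLine option can be sketched in Python, where the corresponding flag is `re.DOTALL` and `re.MULTILINE` lets `^` and `$` match at each line. The three-line list is made up for illustration, and the engine in the original poster's tool may behave differently.

```python
import re

PATTERN = r"^(?!.*?&Article).*$"

# Hypothetical list of URLs, one per line.
url_list = "\n".join([
    "www.domain.com/page.php?Page=123&Article=123",
    "www.domain.com/page.php?Page=123",
    "www.domain.com/page.php?Page=456&Article=9",
])

# Dot does not cross newlines: each line is judged on its own.
line_mode = re.findall(PATTERN, url_list, flags=re.MULTILINE)

# Dot matches newlines: the lookahead also sees "&Article" on later lines.
dotall_mode = re.findall(PATTERN, url_list, flags=re.MULTILINE | re.DOTALL)

print("singleLine off:", line_mode)
print("singleLine on: ", dotall_mode)
```

With the option off, only the line without `&Article` is returned. With it on, the lookahead scans past the line break, finds `&Article` further down the list, and rejects that line too.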

     
