
Regex which match between two signs

Last post 03-31-2011, 6:08 PM by Aussie Susan. 5 replies.
  •  03-30-2011, 9:04 AM 79780

    Regex which match between two signs

    Hi,

    I want to create a regex which matches everything BETWEEN two "\" characters.

    e.g. "Thats\aTest\For\my\Regex"
    then it should match: aTest\For\my
    I have this so far:
    \\.*\\

    But it matches: \aTest\For\my\
    Any ideas?
  •  03-30-2011, 1:53 PM 79783 in reply to 79780

    Re: Regex which match between two signs

    Quantifiers are normally "greedy": they initially try to match as much as possible and try matching less only if they have to.

    What you want are "lazy" quantifiers, which initially try to match as little as possible and try matching more only if they have to.

    Usually a lazy quantifier looks like the equivalent greedy quantifier but with a "?" afterwards, like this:

        \\.*?\\

    You didn't say which regex variant you're using (there isn't a single "standard" regex), so I'm assuming a Perl-like one.
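
    To make the greedy/lazy difference concrete, here is a minimal sketch using Python's re module, which is a Perl-like engine (an assumption for illustration only; it is not the engine the original poster turns out to be using). The variable names are made up for the example.

```python
import re

text = r"Thats\aTest\For\my\Regex"

# Greedy: .* grabs as much as it can, so the match runs to the LAST backslash.
greedy = re.search(r"\\.*\\", text).group(0)

# Lazy: .*? grabs as little as it can, so the match stops at the NEXT backslash.
lazy = re.search(r"\\.*?\\", text).group(0)

print(greedy)  # \aTest\For\my\
print(lazy)    # \aTest\
```

    Note that on this particular input the lazy version stops at the very next "\", so it returns less than the aTest\For\my asked for, and both versions still include the surrounding "\" characters.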

  •  03-30-2011, 1:59 PM 79784 in reply to 79783

    Re: Regex which match between two signs

    Oh, sorry, I'm using CMake regex:

    http://www.cmake.org/cmake/help/syntax.html

     

    Note: "This\is\a\TestFile.cpp"

    should match: "is\a"

  •  03-30-2011, 6:29 PM 79787 in reply to 79784

    Re: Regex which match between two signs

    Looking at the page you reference for the CMAKE regex syntax, it doesn't mention the lazy quantifiers that MRAB was talking about - I suspect that it does not support them (but I have not done an in-depth investigation of the capabilities of the CMAKE regex engine).

    However, going back to your original problem, what exactly are you trying to do? If, going by the example in your 2nd posting, you want to match the characters between the first and 3rd "\" characters, and you know which characters can occur on either side of the 2nd "\" (assumed here to be alphanumeric plus a couple of other common file path characters), then something like (untested):

    [\w$ -]+\\[\w$ -]+

    would provide the match you specify from your example.
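
    Since the pattern above is marked as untested, here is a quick check of it, sketched in Python's re module (an assumption: Python is used only as a convenient Perl-like engine, and as far as I know CMake's own syntax has no "\w" shorthand, so there an explicit class such as [A-Za-z0-9_$ -] would probably be needed). The second pattern is a hypothetical variant, not something from the thread: it requires a leading "\" and captures what follows.

```python
import re

path = r"This\is\a\TestFile.cpp"

# The pattern exactly as posted: a search starts at the leftmost possible
# position, which is the first path component.
as_posted = re.search(r"[\w$ -]+\\[\w$ -]+", path).group(0)

# Hypothetical variant: anchor on the first backslash and capture the two
# components that follow it.
after_first = re.search(r"\\([\w$ -]+\\[\w$ -]+)", path).group(1)

print(as_posted)    # This\is
print(after_first)  # is\a
```

    So in this engine, at least, the pattern needs something in front of it to skip the first component before it yields "is\a".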

    However, if you want the characters between the first and last "\" characters (as shown in your original posting), then the pattern you have suggested yourself is correct. If the only thing that is wrong with it is that it includes the leading and trailing "\" characters then try (again untested):

    \\(.*)\\

    and use the text captured in match group #1.
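
    As a rough illustration of "match group #1", here is the same pattern run through Python's re module (an assumption for demonstration; the variable names are invented). In CMake itself, I believe the captured text is exposed through the CMAKE_MATCH_1 variable after a successful match, but please check the documentation for your version.

```python
import re

text = r"Thats\aTest\For\my\Regex"
m = re.search(r"\\(.*)\\", text)

whole = m.group(0)  # the entire match, including both backslashes
inner = m.group(1)  # only the text matched inside the parentheses

print(whole)  # \aTest\For\my\
print(inner)  # aTest\For\my
```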

    Susan

  •  03-31-2011, 3:01 AM 79798 in reply to 79787

    Re: Regex which match between two signs

    yeah, this is what I wanted:

    \\(.*)\\

     

    Can you please explain the use of "( )" to me?

     

  •  03-31-2011, 6:08 PM 79832 in reply to 79798

    Re: Regex which match between two signs

    The '( )' construct started life as a way of indicating a "capture group" which tells the regex engine to keep a copy of the (last) text that whatever is inside the parentheses matches. For example

    \d(\d)\d

    will match 3 consecutive digits but capture the 2nd digit. Once the text is captured it can be referenced later in the pattern (often referred to as a back-reference) and/or in a replacement string and/or within the program that surrounds the matching operation.
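
    A small sketch of the capture being used both from the surrounding program and in a replacement string, assuming Python's re module (the sample sentence is made up for the example):

```python
import re

sample = "order 472 shipped"

m = re.search(r"\d(\d)\d", sample)
matched = m.group(0)  # all three digits
middle = m.group(1)   # just the captured 2nd digit

# In a replacement string, \1 stands for whatever group #1 captured.
replaced = re.sub(r"\d(\d)\d", r"[\1]", sample)

print(matched)   # 472
print(middle)    # 7
print(replaced)  # order [7] shipped
```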

    This can be useful as in:

    (\w+).*?\1

    which will match 1 or more word characters (letters, digits and underscore) and save the value in match group #1. It will then scan forward looking for a second instance of the captured text. Given (say)

    the cat sat on the mat

    the first "the" might be captured in match group #1 and the '\1' will then match against the 2nd occurrence of those characters. (By the way, don't use this for real - there are MANY things that can go wrong with this example pattern and it is just to try to demonstrate back-references.)
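
    The back-reference behaviour can be tried out like this (a sketch assuming Python's re module; the second input is an invented example of the kind of thing that can go wrong):

```python
import re

pattern = re.compile(r"(\w+).*?\1")

m = pattern.search("the cat sat on the mat")
captured = m.group(1)  # text saved in match group #1
spanned = m.group(0)   # everything from the first copy to the second

# A surprise: \1 only needs the same characters, not the same whole word,
# so the "a" inside "cat" counts as a repeat of the word "a".
surprise = pattern.search("a cat").group(0)

print(captured)  # the
print(spanned)   # the cat sat on the
print(surprise)  # a ca
```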

    The parentheses can also be used to group portions of a pattern together. For example, if you want to find a repeated group of a digit followed by a space you could write

    (\d\s)+

    If you wrote

    \d\s+

    then the '+' quantifier would apply only to the '\s', but with the parentheses the quantifier applies to whatever is between them. In effect it treats the inner sub-pattern as a single item and applies the quantifier to that.
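
    Here is a short comparison of the two patterns on the same input, sketched with Python's re module (an assumption; the test string is invented). It also shows the "(last)" part of the capture description: when a group repeats, only its final repetition is kept.

```python
import re

text = "1 2 3 x"

grouped = re.search(r"(\d\s)+", text)   # '+' repeats the whole "digit, space" pair
ungrouped = re.search(r"\d\s+", text)   # '+' repeats only the whitespace

grouped_text = grouped.group(0)
last_repeat = grouped.group(1)          # only the final repetition is kept
ungrouped_text = ungrouped.group(0)

print(repr(grouped_text))    # '1 2 3 '
print(repr(last_repeat))     # '3 '
print(repr(ungrouped_text))  # '1 '
```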

    As often happens during the evolution of regex pattern syntax, the parentheses have been pressed into service in many other ways. For example, if you combine the above two uses of parentheses, you soon realise that every time you use parentheses to group a set of regex operators, you also tell the regex engine to capture the matched text. However, some regex engines only allow a limited number of captures (often 9, so that you can use '\1' to '\9' as back-references, while something like '\11' is read as an octal literal character value - this just gets messy!), and so a "non-capturing" variant was introduced with the syntax '(?:   )'. It lets you apply a quantifier to a sub-pattern without capturing the matched text.
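
    To see the difference between the capturing and non-capturing forms, here is a minimal sketch assuming Python's re module, which supports the '(?:   )' syntax (whether a given engine does is something to check in its documentation):

```python
import re

text = "1 2 3 x"

capturing = re.compile(r"(\d\s)+")
non_capturing = re.compile(r"(?:\d\s)+")

# Both patterns match exactly the same text...
same_text = capturing.search(text).group(0) == non_capturing.search(text).group(0)

# ...but only the first one uses up a capture group number.
n_capturing = capturing.groups
n_non_capturing = non_capturing.groups

print(same_text)        # True
print(n_capturing)      # 1
print(n_non_capturing)  # 0
```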

    There are many other variations of this but most start with '(?   )' with some character(s) after the '?' that determines the operation within the parentheses.

    I hope this explains what the '(   )' is all about.

    Susan
