
pattern matching problem

Last post 04-17-2012, 11:28 PM by Aussie Susan. 3 replies.
  •  04-16-2012, 12:11 PM 84970

    pattern matching problem

    Hi, I have a regex that should match strings like:

    wstring wstrTxt34 = L"5.2 miles from destination ";

    const wregex distPat(L"\\s*\\d+\\.?\\d*\\s*miles from destination\\s*");

    const wregex distPat2(L"\\s*\\d+\\.?\\d*\\s*miles\\s+(from destination)?\\s*"); 

    if(regex_match(wstrTxt33, distPat))

    {

    m_strVIPSResult += CString(L"Distance");

    However, the regex doesn't match the string, and I can't find what's wrong with it. Where did I go wrong? Thanks.

    cheers 

  •  04-16-2012, 7:56 PM 84972 in reply to 84970

    Re: pattern matching problem

    Ignoring the "wstrTxt34" vs. "wstrTxt33" situation, I have tried both of your patterns against the test string and both match.
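
    A minimal sketch of that check, assuming a standard C++11 <regex> implementation and regex_match as in the original post. The helper name whole_string_matches and the k-prefixed constants are hypothetical; the patterns and test string are copied from the question.

```cpp
#include <regex>
#include <string>

// Returns true only when the pattern covers the ENTIRE string;
// std::regex_match does not accept a partial match.
bool whole_string_matches(const std::wstring& text, const std::wregex& pat) {
    return std::regex_match(text, pat);
}

// Test string and both patterns from the original post.
const std::wstring kSample = L"5.2 miles from destination ";
const std::wregex kDistPat(
    L"\\s*\\d+\\.?\\d*\\s*miles from destination\\s*");
const std::wregex kDistPat2(
    L"\\s*\\d+\\.?\\d*\\s*miles\\s+(from destination)?\\s*");
```

    Calling whole_string_matches(kSample, kDistPat) and whole_string_matches(kSample, kDistPat2) should both return true, which suggests the problem lies in the string actually passed in rather than in the patterns.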

    As the pattern would seem to be correct, then you need to look at some of the other areas that can go wrong.

    It might be something to do with how the regex engine processes whitespace in the pattern (but I'm only guessing): if the default is to "ignore whitespace", then the literal space character between (say) "from" and "destination" may be ignored. You could always try replacing the space characters with '\s' (or '\s+') to check this.

    You could also try a simpler pattern (say '\d+') and check that it does find the correct characters. Then you can build it up slowly (say '\d+\.?\d*' as the next step) until you find what is not working.
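
    The build-it-up approach could be sketched roughly as follows. This assumes std::regex_search (which looks for the pattern anywhere in the text); first_failing_step is a hypothetical helper, not something from the original code.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Tries each pattern in order and returns the index of the first one that
// finds nothing in `text`, or -1 if every step finds a match.
int first_failing_step(const std::wstring& text,
                       const std::vector<std::wstring>& steps) {
    for (std::size_t i = 0; i < steps.size(); ++i) {
        if (!std::regex_search(text, std::wregex(steps[i]))) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Progressively longer versions of the distance pattern.
const std::vector<std::wstring> kSteps = {
    L"\\d+",
    L"\\d+\\.?\\d*",
    L"\\d+\\.?\\d*\\s*miles",
    L"\\d+\\.?\\d*\\s*miles from destination",
};
```

    For the test string in the question, first_failing_step should return -1; a non-negative result points at the first piece of the pattern that stops matching.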

    Susan

  •  04-17-2012, 12:06 PM 84982 in reply to 84972

    Re: pattern matching problem

    Yes, I used the wrong string for testing. I have another problem, though; see the following regex:

    const wregex roadPat(L"\\s*\\bStreet|St|Avenue|Ave|Wharf|Boulevard|Blvd|Highway|Hwy|Road|Rd|Lanes|Gardens?|Gate|Place|Terrace|Rue|Bd|Harbour|Way|Common|Drive|Circle|Freeway|Loop|Expressway|Expy\\b\\s*", regex_constants::icase);

    I set up a word boundary here, but this regex still matches the sample string

    wstring wstrTxt34 = L"5.2 miles from destination ";

    which is wrong. How can I fix this?

    cheers 

  •  04-17-2012, 11:28 PM 84987 in reply to 84982

    Re: pattern matching problem

    There are 2 things going on here.

    The first is that the alternation operator has a very low precedence. This means that:

    \s*\bStreet|Expy\b\s*

    (which is a very cut down version of your pattern) will match 2 alternatives namely '\s*\bStreet' and 'Expy\b\s*'. I suspect that you want the '\s*\b' at the start and the '\b\s*' at the end to apply to all of the alternatives in between. In that case you need to specify this as

    \s*\b(Street|Expy)\b\s*

    (obviously with all of the other alternatives added).
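
    In C++ the grouped form might look like the sketch below. The list of road words is shortened for readability, regex_search is an assumption (the post does not show how roadPat is used), and contains_road_word is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// The parentheses make the leading \s*\b and the trailing \b\s* apply to
// every alternative, not only to the first and the last one.
const std::wregex kRoadPat(
    L"\\s*\\b(Street|St|Avenue|Ave|Road|Rd|Expressway|Expy)\\b\\s*",
    std::regex_constants::icase);

// True when `text` contains one of the road words as a complete word.
bool contains_road_word(const std::wstring& text) {
    return std::regex_search(text, kRoadPat);
}
```

    With this version, contains_road_word(L"5.2 miles from destination ") should be false, while a string such as L"12 Main St " should still match.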

    The second issue is why the pattern is matching the test string. If you look at what is actually being matched you will see that it is the characters "st". Given what I said before, the "|St|" alternative is being seen by the regex engine as being the ONLY characters that are needed for this alternative to match. The way the regex engine works is to set a pointer to the start of the text and then try to see if the pattern can match starting from that point.

    If it fails (as it will with your test string) the regex engine then advances the text pointer 1 step and tries the pattern again. This is repeated until either a match is made or the text pointer gets to the end of the string.

    What is happening is that the text pointer is pointing to the "s" character in "destination" and the pattern is then trying each alternative in turn until it gets to the '|St|' one. As you have the "ignore case" option set, it will declare a match.
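
    One way to see this for yourself is to print what was matched and where. A rough sketch, assuming std::regex_search with a std::wsmatch; the Found struct and first_match helper are hypothetical, and the pattern is a cut-down copy of the ungrouped original.

```cpp
#include <regex>
#include <string>

// What the first match was and where it started (-1 if there was none).
struct Found {
    bool matched;
    std::wstring text;
    long position;
};

// Reports the leftmost match of `pat` in `text`.
Found first_match(const std::wstring& text, const std::wregex& pat) {
    std::wsmatch m;
    if (std::regex_search(text, m, pat)) {
        return {true, m.str(0), static_cast<long>(m.position(0))};
    }
    return {false, L"", -1};
}

// Ungrouped pattern: "St" stands alone as its own alternative.
const std::wregex kUngrouped(L"\\s*\\bStreet|St|Avenue|Ave|Expy\\b\\s*",
                             std::regex_constants::icase);
```

    For L"5.2 miles from destination " this should report the text "st" at offset 17, i.e. the middle of "destination".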

    If you make the change I suggested for the first problem, then this one will go away: the '\b' anchors will be applied to the start and end of each alternative, so the pattern will only match a complete "word" and will not match the "st" characters within a word.

    Susan
