
Negative Lookahead trouble

Last post 06-20-2012, 10:47 AM by gman. 5 replies.
  •  06-18-2012, 8:07 AM 85498

    Negative Lookahead trouble

    I am trying to match 1-7 numeric digits at the beginning of the field, except when "CO" or "LLC" appears at the end of the string. I found that negative lookahead may be right for this situation, but I am having trouble getting it to work.

    This is an example where no match is desired:

    1123 FINANCE CENTER CO

    and one example where I do want a match:

    331 ACR CO RD 12

    As always your help is much appreciated.

    Gerald

  •  06-18-2012, 7:30 PM 85503 in reply to 85498

    Re: Negative Lookahead trouble

    I'm not sure if this is "optimised" but try:

    ^(\d{1,7})(?:(?!co|llc).)*((?:co|llc)(?!\r?$)|\r?$)

    with the "ignore case" and possibly the "multiline" options set, and the "singleline" option clear. If your platform (you don't say which one you are using) ends lines with just "\n", then you can get rid of the '\r?' references.

    The way this works is as follows:

    ^                              - match the start of the line
    (\d{1,7})                    - match the 1 to 7 digits and capture them in group #1
    (?:(?!co|llc).)*            - skip over characters until you get to either "co" or "llc" (or the end of the line, as implied by the "singleline" option NOT being set)
    ((?:co|llc)(?!\r?$)|\r?$) - match the "co" or "llc" unless it is followed by the end of the line OR match the end of the line

    I've sprinkled a few '(?:' constructs through the pattern to stop the captures so that only group #1 will have the digits you are after. They are not really necessary but the grouping structures they are in are.

    I've tried this on:

    1123 FINANCE CENTER CO
    1123 FINANCE CENTER
    331 ACR CO RD 12

    and it matches the 2nd and 3rd lines. I've added the 2nd line as I interpreted your requirement to include this as a valid line.
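    The test above can be reproduced with a minimal sketch, assuming a JavaScript engine. The names digitsPattern and leadingDigits are hypothetical and only for illustration. The "i" and "m" flags correspond to the "ignore case" and "multiline" options, and the "s" (singleline) flag is deliberately left off.

```javascript
// The pattern from this post, with "ignore case" (i) and "multiline" (m) set.
// The "singleline" (s) flag is left off, so "." stops at line breaks.
const digitsPattern = /^(\d{1,7})(?:(?!co|llc).)*((?:co|llc)(?!\r?$)|\r?$)/im;

// Hypothetical helper: returns the digits captured in group #1,
// or null when the line does not match.
function leadingDigits(line) {
  const m = digitsPattern.exec(line);
  return m ? m[1] : null;
}

const digitSamples = [
  "1123 FINANCE CENTER CO", // ends with "CO"          -> null
  "1123 FINANCE CENTER",    // no "CO"/"LLC" at all    -> "1123"
  "331 ACR CO RD 12",       // "CO" is not at the end  -> "331"
];
for (const line of digitSamples) {
  console.log(JSON.stringify(line), "->", leadingDigits(line));
}
```

    Only group #1 is read here. The text matched by the pattern as a whole runs up to the first "co"/"llc" or to the end of the line.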

    Susan

  •  06-19-2012, 12:14 PM 85506 in reply to 85503

    Re: Negative Lookahead trouble

    The regex is being developed in JavaScript in a KETTLE environment. I copied your regex example into RegexBuddy and did some testing, which gave the desired results.

    Thanks so much for your prompt and helpful reply.

    Gerald 

     

  •  06-19-2012, 1:11 PM 85507 in reply to 85506

    Re: Negative Lookahead trouble

    I tried to apply the pattern you supplied to another situation and did not achieve the results that I wanted. Using this regex, I want to match whenever TRUST appears at the beginning of the string except when followed by DEPT. 

    ^(TRUST)(?:(?!DEPT).)*

    Presently, it matches on the word TRUST regardless of whether DEPT is present or not. I'm sure it's something very small which I'm overlooking.

    Thanks again for your support.

    Gerald 

  •  06-19-2012, 6:57 PM 85514 in reply to 85507

    Re: Negative Lookahead trouble

    What your pattern will do is:

    ^                    - match the start of the line/string
    (TRUST)         - match the characters "TRUST" immediately at the start of the line and capture the text (#1)
    (?:(?!DEPT).)* - check to see if the next characters are "DEPT" - if not (this is a negative lookahead) then step forward 1 character and repeat - if so then stop

    (#1 - there is little need to capture the text in this case. The captured text can only be "trust" [perhaps ignoring upper or lower case] and so you know beforehand that if the pattern as a whole matches, then the first characters must be "trust" without capturing them. In your original question, the initial characters could be any digit; therefore there may have been value in capturing them so they could be used later on)

    In effect this makes sure that the word "trust" is at the start of the line and then matches everything until either the word "dept" is found or the end of the line is reached.
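    To see this behaviour directly, here is a small sketch, assuming JavaScript as the regex engine. The names unanchoredTrust and matchedText are made up for illustration. It prints the text the pattern actually consumed, which shows that the match simply stops short of "DEPT" rather than failing.

```javascript
// Gerald's pattern as posted: nothing forces the match to reach the end of the line.
const unanchoredTrust = /^(TRUST)(?:(?!DEPT).)*/;

// Hypothetical helper: returns the full matched text, or null on no match.
function matchedText(line) {
  const m = unanchoredTrust.exec(line);
  return m ? m[0] : null;
}

// The match succeeds in both cases; with DEPT present it just ends early.
console.log(JSON.stringify(matchedText("TRUST DEPT")));    // "TRUST "
console.log(JSON.stringify(matchedText("TRUST COMPANY"))); // "TRUST COMPANY"
```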

    Therefore you will need to add a check that what stops the 2nd part of the pattern is the end of the line and not the word "dept". Something like:

    ^(TRUST)(?:(?!DEPT).)*\r?$

    (my previous comment about the use of '\r?' applies here as well). What this does is to say, of the 2 conditions that cause the '(?:(?!DEPT).)*' part to stop matching, the '$' anchor makes sure that it is the end of the line. If the word "dept" had caused the sub-pattern to stop repeating, then the '$' would not match the "D" and the pattern as a whole would fail.
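    A quick way to check the anchored version, sketched in JavaScript with hypothetical names (anchoredTrust, isTrustLine):

```javascript
// Same pattern with the end-of-line check added.
const anchoredTrust = /^(TRUST)(?:(?!DEPT).)*\r?$/;

// Hypothetical helper: true when the whole line is accepted.
function isTrustLine(line) {
  return anchoredTrust.test(line);
}

console.log(isTrustLine("TRUST COMPANY"));      // true
console.log(isTrustLine("TRUST COMPANY\r"));    // true  (the optional "\r" is absorbed)
console.log(isTrustLine("TRUST DEPT"));         // false (stopped by DEPT, so "$" cannot match)
console.log(isTrustLine("TRUST DEPT OF LAND")); // false
```

    Because the repeated group can never step onto the "D" of "DEPT", the only way for '$' to succeed is for the repetition to have run all the way to the end of the line.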

    One thing you need to be aware of (and I overlooked this in my previous response as well) is that the '^TRUST' part will also match if the text begins "trusting" or any other word that begins with "trust". Similarly, the 'DEPT' will match any character sequence that happens to have those 4 characters in sequence within it - in this case "dept" does not generally occur within English words (at least according to the web site I checked) but when you start using this approach in other circumstances, then this can be an important issue.

    The way around it is to use the '\b' anchor to ensure the beginning/end of a word is taken into account. For example, your pattern could become:

    ^TRUST\b(?:(?!\bDEPT\b).)*\r?$

    There is no need for a '\b' before "trust" because you already have the '^' there which means that the '\b' in '^\bT' must always match and so is redundant.
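    The difference the '\b' anchors make can be sketched as follows, again assuming JavaScript. The names looseTrust, boundedTrust and compareTrust are hypothetical, and the sample lines are invented for illustration.

```javascript
// Without word boundaries: "TRUST" and "DEPT" match as plain character sequences.
const looseTrust = /^(TRUST)(?:(?!DEPT).)*\r?$/;
// With word boundaries: both must be whole words.
const boundedTrust = /^TRUST\b(?:(?!\bDEPT\b).)*\r?$/;

// Hypothetical helper: reports how each pattern treats the same line.
function compareTrust(line) {
  return { loose: looseTrust.test(line), bounded: boundedTrust.test(line) };
}

console.log(compareTrust("TRUSTING FUND"));      // { loose: true,  bounded: false }
console.log(compareTrust("TRUST DEPTH CHARGE")); // { loose: false, bounded: true }
console.log(compareTrust("TRUST DEPT"));         // { loose: false, bounded: false }
console.log(compareTrust("TRUST FUND"));         // { loose: true,  bounded: true }
```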

    Susan

  •  06-20-2012, 10:47 AM 85520 in reply to 85514

    Re: Negative Lookahead trouble

    You shed a ton of light on how the negative lookahead construct works, and I much appreciated the word boundary suggestions, which address cases that would have led to false matches. Thank you again for your help!!

    Gerald
