RegexAdvice

Help with Multi Line Match

Last post 01-15-2013, 7:12 AM by killahbeez. 2 replies.
  •  01-14-2013, 6:12 AM 87211

    Help with Multi Line Match

    I need help matching the following across multiple lines, but I can't figure it out: when there are multiple date/time stamps in a row, my pattern matches the first date/timestamp together with the username (UTVxxx).

    Apologies for the made-up data, but I need to keep my client data secure. Here's my problem: I'm looking through a massive log file for the dates when the user (UTVxxx) accessed our site.

    Here's what I want: find UTV282 and capture the date that immediately precedes it.

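    Example of what I am matching at the moment (each match begins at the first of several timestamps, e.g. 2012-11-18 15:27:04, rather than the one nearest to UTV282):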

    2012-11-18 15:27:04
    Data Here
    Data Here
    Data Here
    2012-11-18 16:10:12
    Data Here
    Data Here
    Data Here, for:UTV282, count now: 2
    2012-11-18 15:17:04
    2012-11-18 18:42:25
    2012-11-18 01:27:34
    Data Here
    Data Here
    Data Here
    Data Here, for:UTV282, count now: 2

     

    Example of how I want it to match, i.e. from the nearest preceding date/time (2012-11-18 16:10:12 and 2012-11-18 01:27:34 here) through to UTV282:

    2012-11-18 15:27:04
    Data Here
    Data Here
    Data Here
    2012-11-18 16:10:12
    Data Here
    Data Here
    Data Here, for:UTV282, count now: 2
    2012-11-18 15:17:04
    2012-11-18 18:42:25
    2012-11-18 01:27:34
    Data Here
    Data Here
    Data Here
    Data Here, for:UTV282, count now: 3

    My pattern at the moment: [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},.*?for:UTV282?

    Can somebody please help! - Many thanks! :)
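
    A minimal Python sketch of why this happens (Python's 're' module is an assumption; the post does not name a regex flavour). The comma after the seconds is dropped from the pattern because the made-up sample has none, and re.DOTALL is assumed so that '.*?' can cross line breaks. The engine returns the leftmost possible match, so each match begins at the first timestamp it reaches; the lazy '.*?' only makes the match end as early as possible.

```python
import re

# Made-up sample log from the question.
LOG = """2012-11-18 15:27:04
Data Here
Data Here
Data Here
2012-11-18 16:10:12
Data Here
Data Here
Data Here, for:UTV282, count now: 2
2012-11-18 15:17:04
2012-11-18 18:42:25
2012-11-18 01:27:34
Data Here
Data Here
Data Here
Data Here, for:UTV282, count now: 2
"""

# The poster's pattern without the comma after the seconds (assumption:
# the sample has no comma); DOTALL lets '.*?' run across line breaks.
LAZY = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.*?for:UTV282?",
    re.DOTALL,
)

# Leftmost match wins, so every match starts at the first timestamp reached.
starts = [m.group(0).splitlines()[0] for m in LAZY.finditer(LOG)]
print(starts)  # ['2012-11-18 15:27:04', '2012-11-18 15:17:04']
```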

  •  01-14-2013, 4:52 PM 87212 in reply to 87211

    Re: Help with Multi Line Match

    Try:

    ^(\d{4}-\d{2}-\d{2})((?!(\d{4}-\d{2}-\d{2}|UTV\d{3})).)*UTV\d{3}

    with the "singleline" and "multiline" options set.

    You don't say which regex variant you are using, so the above pattern assumes that you can use lookaheads (a fairly common capability).
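
    To try the pattern out, here is a minimal sketch in Python (an assumed language; any flavour with lookaheads should behave similarly), where re.MULTILINE and re.DOTALL play the roles of the "multiline" and "singleline" options. LOG is the made-up sample from the question.

```python
import re

# Made-up sample log from the question.
LOG = """2012-11-18 15:27:04
Data Here
Data Here
Data Here
2012-11-18 16:10:12
Data Here
Data Here
Data Here, for:UTV282, count now: 2
2012-11-18 15:17:04
2012-11-18 18:42:25
2012-11-18 01:27:34
Data Here
Data Here
Data Here
Data Here, for:UTV282, count now: 2
"""

# MULTILINE: '^' matches at every line start. DOTALL: '.' also matches newlines.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2})((?!(\d{4}-\d{2}-\d{2}|UTV\d{3})).)*UTV\d{3}",
    re.MULTILINE | re.DOTALL,
)

# Keep the captured date plus the full line each match started on.
results = [(m.group(1), m.group(0).splitlines()[0]) for m in PATTERN.finditer(LOG)]
for date, first_line in results:
    print(date, "|", first_line)
# 2012-11-18 | 2012-11-18 16:10:12
# 2012-11-18 | 2012-11-18 01:27:34
```

    Each match now begins at the timestamp nearest to UTV282 rather than at the first one in the run.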

    The way it works is as follows:

    ^ - with the "multiline" option set, this matches the start of each line in the text

    (\d{4}-\d{2}-\d{2}) - this is a capture group to get the date; I don't know if you really need to capture this date and, if not, then just drop the parentheses. Alternatively, if you need the time to be recorded as well, then extend this part of the pattern as you have done in your trial version.

    ((?!(\d{4}-\d{2}-\d{2}|UTV\d{3})).)*UTV\d{3} - Before getting to the complexity of this pattern, I'll take you through a simpler version. Let's say you want to match everything from where you are in the text up to the next "Z". You could use your idea of a lazy quantifier, but there is an alternative:

    ((?!Z).)*

    What this does is use a negative lookahead to see if the next character is a "Z"; if it is, then the negative lookahead fails and this part of the match stops. If the next character is not "Z", then the negative lookahead succeeds and the '.' matches that character. The '*' quantifier then makes the pattern repeat until either the next character is a "Z" or you reach the end of the text. We can tell these two conditions apart, and require that the next character really is a "Z", with:

    ((?!Z).)*Z 
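
    A quick Python check of the "Z" example (Python's 're' is used only for illustration; the sample strings are made up):

```python
import re

text = "abcZdef"

# ((?!Z).)* consumes characters one at a time, each only if it is not a "Z".
run = re.match(r"((?!Z).)*", text).group(0)
# Appending Z insists that the run stopped because a "Z" really came next.
run_to_z = re.match(r"((?!Z).)*Z", text).group(0)
# With no "Z" at all, the run hits the end of the text and the trailing Z fails.
no_z = re.match(r"((?!Z).)*Z", "abcdef")

print(run, run_to_z, no_z)  # abc abcZ None
```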

    By the way, the use of the '.' operator here is the reason why the "singleline" option must be set; otherwise the match will stop at the end of the line, as the '.' operator matches all characters except the newline unless the "singleline" option is set.
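
    The effect of the "singleline" option can be seen in a small Python sketch, where re.DOTALL is Python's name for it (the input string is made up):

```python
import re

text = "ab\ncdZ"
pattern = r"((?!Z).)*Z"

# Without DOTALL, '.' will not consume the newline, so no match can start at position 0...
no_dotall = re.match(pattern, text)
# ...and a search only succeeds on the second line, after the line break.
second_line_only = re.search(pattern, text).group(0)
# With DOTALL, '.' also consumes the newline and the run reaches the "Z".
dotall = re.match(pattern, text, re.DOTALL).group(0)

print(no_dotall, repr(second_line_only), repr(dotall))
# None 'cdZ' 'ab\ncdZ'
```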

    OK, now let's extend this example so that we match from where we are in the text to the next "Z" unless there is a "Y" in between. The pattern for this is:

    ((?!Z|Y).)*Z

    This works in basically the same way, except that the repetition stops when the next character is either a "Y" or a "Z". The "Z" at the end makes sure that the whole pattern fails if we get to a "Y" (or the end of the text; in fact anything other than a "Z").
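
    The same check in Python for the "Y blocks Z" template (illustrative strings only):

```python
import re

# Template: run forward to the next "Z", but give up if a "Y" comes first.
pattern = re.compile(r"((?!Z|Y).)*Z")

reaches_z = pattern.match("abcZ")   # no "Y" in the way
blocked = pattern.match("abYcZ")    # a "Y" appears before the "Z"

print(reaches_z.group(0), blocked)  # abcZ None
```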

    Now that we have the structure we require, we can look back at your requirement. You have already found a date and you want to move forward looking for "UTV282". Therefore we can substitute "UTV282" for the "Z". However, we also don't want a date in between, so we use the date pattern we had above as the "Y" part of our template pattern. The result is the pattern above (except that I allowed the 3 trailing characters to be any digits, as per your requirements statement).
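
    The substitution can be made explicit in a short Python sketch. The helper up_to_unless is hypothetical (it is not part of any library); it just fills the ((?!Z|Y).)*Z template with a target and a blocker, and the result is the same pattern string as the one suggested above.

```python
import re

def up_to_unless(target: str, blocker: str) -> str:
    """Hypothetical helper: regex source for "run forward to target,
    failing if blocker turns up first" (the ((?!Z|Y).)*Z template)."""
    return rf"((?!({blocker}|{target})).)*{target}"

DATE = r"\d{4}-\d{2}-\d{2}"

# Capture the date at a line start, then run to the user ID with no date in between.
pattern = r"^(" + DATE + r")" + up_to_unless(r"UTV\d{3}", DATE)
print(pattern)
# ^(\d{4}-\d{2}-\d{2})((?!(\d{4}-\d{2}-\d{2}|UTV\d{3})).)*UTV\d{3}

sample = "2012-11-18 15:27:04\nData Here\n2012-11-18 16:10:12\nData Here, for:UTV282\n"
m = re.search(pattern, sample, re.MULTILINE | re.DOTALL)
print(m.group(0).splitlines()[0])  # 2012-11-18 16:10:12
```

    To look for UTV282 only, as in the original question, pass "UTV282" as the target instead of the general UTV\d{3}.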

    By the way, I've used '\d' where you used '[0-9]'. For ordinary ASCII log data these are equivalent (some flavours, such as .NET and Python 3, also let '\d' match other Unicode digits), but I think '\d' is easier to read as it is shorter.
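
    The Unicode caveat can be seen in Python 3, whose 're' module lets '\d' match any Unicode decimal digit in str patterns unless re.ASCII is given (the sample character is chosen only for illustration):

```python
import re

arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

unicode_d = bool(re.fullmatch(r"\d", arabic_indic_three))           # \d: any Unicode digit
ascii_class = bool(re.fullmatch(r"[0-9]", arabic_indic_three))      # [0-9]: ASCII digits only
ascii_d = bool(re.fullmatch(r"\d", arabic_indic_three, re.ASCII))   # re.ASCII: \d means [0-9]

print(unicode_d, ascii_class, ascii_d)  # True False False
```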

    Susan 

  •  01-15-2013, 7:12 AM 87213 in reply to 87212

    Re: Help with Multi Line Match

    (?ms)^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})$(?-m)(?:(?!(?1)|UTV282).)*(UTV282)
    http://portal-vreme.ro
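
    Python's standard 're' module has no '(?1)' subroutine call (the third-party 'regex' package does support it), so this minimal Python sketch of the same approach spells the timestamp sub-pattern out twice through a STAMP constant (a name chosen here for illustration). MULTILINE and DOTALL are set for the whole pattern; turning multiline off part-way is unnecessary because no '^' or '$' follows. LOG is the made-up sample from the question.

```python
import re

# Made-up sample log from the question.
LOG = """2012-11-18 15:27:04
Data Here
Data Here
Data Here
2012-11-18 16:10:12
Data Here
Data Here
Data Here, for:UTV282, count now: 2
2012-11-18 15:17:04
2012-11-18 18:42:25
2012-11-18 01:27:34
Data Here
Data Here
Data Here
Data Here, for:UTV282, count now: 2
"""

# Full "date time" stamp; repeated below because re cannot call group 1 via (?1).
STAMP = r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"

# A line holding only a stamp, then any text (DOTALL) with no further
# stamp before UTV282.
PATTERN = re.compile(
    rf"^({STAMP})$(?:(?!{STAMP}|UTV282).)*(UTV282)",
    re.MULTILINE | re.DOTALL,
)

stamps = [m.group(1) for m in PATTERN.finditer(LOG)]
print(stamps)  # ['2012-11-18 16:10:12', '2012-11-18 01:27:34']
```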