
Repeated chunk of characters

Last post 02-09-2009, 7:56 PM by Aussie Susan. 25 replies.
Page 2 of 2 (26 items)
  •  08-28-2008, 3:02 PM 45761 in reply to 45760

    Re: Repeated chunk of characters

    OK, I tried this instead, but it doesn't give me all the matches:

     \d\d/\d{3}\s+(?:(?![A-Z]{8}).)*?\s+[A-Z]{3,4}\b(?:(?!\d\d/\d{3}).)*?\s+DATA\s+CRITICAL\s+\d{3}\s+M\s+RADIUS\s+AT\s+\d{6}[NS]/\d{7}[WE] 

    Hmm, almost there.

     AGP

     

  •  08-28-2008, 3:07 PM 45762 in reply to 45761

    Re: Repeated chunk of characters

    \d\d/\d{3}\b(?:(?!\d\d/\d{3}).)*?\b[A-Z]{3,4}\b(?:(?!\d\d/\d{3}).)*?\s+DATA\s+CRITICAL\s+\d{3}\s+M\s+RADIUS\s+AT\s+\d{6}[NS]/\d{7}[WE]

  •  08-28-2008, 3:12 PM 45763 in reply to 45761

    Re: Repeated chunk of characters

    OK, I think I was going about it the wrong way. For now I will assume that the extra word is exactly 8 characters, so I could do something like:

     \d\d/\d{3}\s+([A-Z]{8})?(?:(?![A-Z]{8}).)*?\s+(?<myid>[A-Z]{3,4})\b(?:(?!\d\d/\d{3}).)*?\s+DATA\s+CRITICAL\s+\d{3}\s+M\s+RADIUS\s+AT\s+\d{6}[NS]/\d{7}[WE] 

    Once I run it, the myid group matches the first 3- or 4-letter identifier right after the message number. I may be making this more complicated than necessary, though. Do you think the way I wrote it is inefficient? I will try your way as well and report back on my testing.

    AGP

     

  •  08-28-2008, 4:42 PM 45765 in reply to 45763

    Re: Repeated chunk of characters

    I can't comment on efficiency without testing with your data.

  •  08-29-2008, 1:17 AM 45769 in reply to 45765

    Re: Repeated chunk of characters

    Well, what I meant about efficiency was the construction of the regexp. The one I ended up with works, but yours is built differently, and I was wondering whether the construct I came up with looks sound.

     

    AGP

  •  08-29-2008, 9:45 AM 45775 in reply to 45769

    Re: Repeated chunk of characters

    It depends on how much your source varies. I guess I would use my pattern unless it failed to fit your source.


  •  02-05-2009, 1:58 PM 50717 in reply to 45775

    Re: Repeated chunk of characters

    OK, back to this challenge. I have made incremental changes to the base regexp, but only for slight variations in spelling. Aside from that, the message I am parsing looks like this:

     251824 08/380 ZZZ THIS IS WORTHLESS TEXT
     0808251645-0808251830
    251824 08/381 ZZZ NOT WORTH ANYTHING
     0808251645-0808251830
    181225 08/027 ZZZ DATA CRITICAL 305 M RADIUS AT 372419N/1154323W  

    There could be some other words between ZZZ and DATA CRITICAL 305 M RADIUS AT 372419N/1154323W. So, in other words, I want to match text like this:

    08/027 ZZZ DATA CRITICAL 305 M RADIUS AT 372419N/1154323W 

    08/035 ZAA RESPOND OTS DATA CRITICAL 25 M RADIUS AT 3725500N/1152350W  

    07/044 ZBB INDETERMINATE USER NO RESPONSE DATA CRITICAL 9 M RADIUS AT 3725500N/1152350W   

    Now a variation has shown up in the messages that looks like so:

    09/055 KZLA 0902021930 0902060630 1930-2230/0230-0630 DLY QXXXX DATA CRITICAL 35 M RADIUS AT 371908N/1154250W

    When I use my regexp, it matches, but the myid group is assigned the value "DLY" when it should instead be "KZLA".

    (?<msgnum>\d\d/\d{3}) ([A-Z]{8})?(?:(?![A-Z]{8}).)*?(?<myid>[A-Z]{3,4})\b(?:(?!\d\d/\d{3}).)*?(?<event>(DATA CRITICAL )(?<radius>\d{1,}) ?M RADIUS AT (?<lat>\d{5,6}\.?\d?[NS])/(?<lon>\d{5,7}\.?\d?[EW])) 
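    To reproduce this outside .NET, here is a minimal Python sketch. It assumes Python's re module, which spells named groups (?P<name>...) instead of (?<name>...); apart from that one translation the pattern is AGP's, unchanged. The sample strings are the lines quoted in this post.

```python
import re

# AGP's pattern, translated only in its named-group syntax:
# .NET (?<name>...) becomes Python (?P<name>...).
PATTERN = re.compile(
    r"(?P<msgnum>\d\d/\d{3}) ([A-Z]{8})?(?:(?![A-Z]{8}).)*?"
    r"(?P<myid>[A-Z]{3,4})\b(?:(?!\d\d/\d{3}).)*?"
    r"(?P<event>(DATA CRITICAL )(?P<radius>\d{1,}) ?M RADIUS AT "
    r"(?P<lat>\d{5,6}\.?\d?[NS])/(?P<lon>\d{5,7}\.?\d?[EW]))"
)

SAMPLES = [
    "08/027 ZZZ DATA CRITICAL 305 M RADIUS AT 372419N/1154323W",
    "08/035 ZAA RESPOND OTS DATA CRITICAL 25 M RADIUS AT 3725500N/1152350W",
    "07/044 ZBB INDETERMINATE USER NO RESPONSE DATA CRITICAL 9 M RADIUS AT 3725500N/1152350W",
    "09/055 KZLA 0902021930 0902060630 1930-2230/0230-0630 DLY QXXXX "
    "DATA CRITICAL 35 M RADIUS AT 371908N/1154250W",
]

results = []
for text in SAMPLES:
    m = PATTERN.search(text)
    # Record the message number and whatever ended up in the myid group.
    results.append((m.group("msgnum"), m.group("myid")))
    print(*results[-1])
# 08/027 ZZZ
# 08/035 ZAA
# 07/044 ZBB
# 09/055 DLY
```

    The last line shows the same result AGP reports: myid captures DLY, not KZLA.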

    I've been experimenting with different methods to assign the correct value but any help is appreciated. 

    AGP
  •  02-05-2009, 7:00 PM 50727 in reply to 50717

    Re: Repeated chunk of characters

    The problem is that you have two completely optional groups that sit on either side of your '<myid>' group.

    Before it you have:

    ([A-Z]{8})?(?:(?![A-Z]{8}).)*?

    which can match nothing at all. In this case the first part matches nothing, but the second part is then allowed to match anything, stopping only at a sequence of 8 alpha characters, or earlier because of backtracking.

    After it you have

    \b(?:(?!\d\d/\d{3}).)*?

    which again can match nothing, or any sequence of characters that does not contain 'dd/ddd', stopping earlier if backtracking requires it.
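    One way to see why KZLA is not kept is to test that second part on its own. The Python sketch below (Python's re is an assumption; the thread uses .NET) checks whether the guarded run can consume the whole stretch of the 09/055 message between KZLA and DATA CRITICAL. The 'dd/ddd' it refuses to cross turns out to be hidden inside the time range 2230/0230.

```python
import re

# The part of AGP's pattern that follows <myid>: any run of characters,
# provided no position in it starts a dd/ddd sequence.
TEMPERED = re.compile(r"(?:(?!\d\d/\d{3}).)*")

# Text between "KZLA" and "DATA CRITICAL" in the 09/055 message.
after_kzla = " 0902021930 0902060630 1930-2230/0230-0630 DLY QXXXX "
# Text between "DLY" and "DATA CRITICAL" in the same message.
after_dly = " QXXXX "

# The guard fires on "30/023": the tail of "2230" plus the start of "0230".
print(re.search(r"\d\d/\d{3}", after_kzla).group())   # 30/023
print(TEMPERED.fullmatch(after_kzla) is None)          # True: cannot span it
print(TEMPERED.fullmatch(after_dly) is not None)       # True: can span it
```

    This suggests the engine does try KZLA first, but the part after it cannot reach DATA CRITICAL, so backtracking hands the extra text to the unrestricted first part until DLY fits.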

    The key here is that both parts are subject to backtracking, and the regex engine backtracks only as far as it needs to from the last part of the pattern/text it processed.

    With your text:

    09/055 KZLA 090......DLY

    it will match the '<msgnum>' part and then try to match the "KZLA" text. It skips over the '([A-Z]{8})?' part of the pattern and matches KZLA and the following characters with the '(?:(?![A-Z]{8}).)*?' part, until it finds a run of 3 or 4 (3 in this case) alpha characters when it gets to the "DLY" text.

    As it continues, it reaches '\b(?:(?!\d\d/\d{3}).)*?', which (apart from the '\b' word-boundary assertion) can be satisfied by a zero-length match. What follows then matches OK.

    There is never a reason why it should go back and try to remove some of the characters from the first optional match and place them in the second one.

    The solution is to rethink the "optional" patterns and guide the regex so that it finds the "KZLA" and not the "DLY".
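    As one possible way of guiding the regex, here is a Python sketch built on the assumptions AGP stated earlier in the thread: the ID is the first 3- or 4-letter word right after the message number, optionally preceded by one word of exactly 8 letters. It further assumes one message per line, so a plain .*? cannot run into the next message (without DOTALL, '.' stops at a newline). The name GUIDED and the layout of the sample text are illustrative only.

```python
import re

# Hypothetical guided pattern: <myid> is pinned to the position right after
# the message number (plus an optional 8-letter word), so it can no longer
# drift to a later 3-4 letter word such as DLY.
GUIDED = re.compile(
    r"(?P<msgnum>\d\d/\d{3}) "
    r"(?:[A-Z]{8} )?"                # optional 8-letter word, taken as a whole token
    r"(?P<myid>[A-Z]{3,4})\b"        # the ID must start exactly here
    r".*?"                           # anything else on the same line
    r"(?P<event>DATA CRITICAL (?P<radius>\d+) ?M RADIUS AT "
    r"(?P<lat>\d{5,6}\.?\d?[NS])/(?P<lon>\d{5,7}\.?\d?[EW]))"
)

MESSAGES = """\
251824 08/380 ZZZ THIS IS WORTHLESS TEXT
 0808251645-0808251830
251824 08/381 ZZZ NOT WORTH ANYTHING
 0808251645-0808251830
181225 08/027 ZZZ DATA CRITICAL 305 M RADIUS AT 372419N/1154323W
09/055 KZLA 0902021930 0902060630 1930-2230/0230-0630 DLY QXXXX DATA CRITICAL 35 M RADIUS AT 371908N/1154250W
"""

found = [(m["msgnum"], m["myid"]) for m in GUIDED.finditer(MESSAGES)]
print(found)   # [('08/027', 'ZZZ'), ('09/055', 'KZLA')]
```

    The trade-off: a message whose ID is not where the rule expects will simply fail to match instead of matching with the wrong ID, so the rule itself has to be right for the data.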

    Susan 

  •  02-06-2009, 11:07 AM 50740 in reply to 50727

    Re: Repeated chunk of characters

    I think I see what you are saying. Let me experiment with various modifications. In the end I was trying to match two basic groups like so:

    06/095 KDZZNAXX ZBB ...<almost any string including numbers>...DATA CRITICAL 9 M RADIUS AT 3725500N/1152350W 

    07/044 ZBB ...<almost any string including numbers>...DATA CRITICAL 9 M RADIUS AT 3725500N/1152350W

    I used the match-but-don't-capture construct so that intermediate strings like 06/022 would not get captured. My main problem is avoiding unwanted matches between myid (like ZBB) and the keywords DATA CRITICAL. I'll post back with my attempts.
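    To illustrate what the (?:(?!\d\d/\d{3}).)*? construct actually does (it stops a match from running across another message number, which is more than just not capturing it), here is a small Python comparison against a plain .*?. The one-line input holding two message numbers is made up for the demonstration.

```python
import re

# Made-up input: two message numbers on one line, only the second has an event.
text = "06/021 ZBB FOO 06/022 KZZ DATA CRITICAL 9 M RADIUS AT 3725500N/1152350W"

# Plain lazy dot: free to run across 06/022, so it reports the first header.
plain = re.compile(r"(?P<msgnum>\d\d/\d{3}) (?P<myid>[A-Z]{3,4})\b.*?DATA CRITICAL")

# Tempered token: no consumed position may start dd/ddd, so the attempt
# beginning at 06/021 fails and the search moves on to 06/022.
tempered = re.compile(
    r"(?P<msgnum>\d\d/\d{3}) (?P<myid>[A-Z]{3,4})\b(?:(?!\d\d/\d{3}).)*?DATA CRITICAL"
)

results = {}
for name, pattern in (("plain", plain), ("tempered", tempered)):
    m = pattern.search(text)
    results[name] = (m["msgnum"], m["myid"])
    print(name, *results[name])
# plain 06/021 ZBB
# tempered 06/022 KZZ
```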

     AGP

     

  •  02-09-2009, 12:13 PM 50799 in reply to 50740

    Re: Repeated chunk of characters

    I tried a couple of things, but backtracking is not my forte. What I was trying to do was avoid false positives, which sometimes show up when there are strings like 05/255 in the middle. In my case I do need to filter out those unwanted strings. Right now I just take the returned match, manually test the 2nd or 3rd string, and then overwrite my value with the one I found manually. It's crude, but it works. What would you suggest to fix my regexp?
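    One possible refinement, assuming that a genuine message number (and the 05/255-style strings AGP wants to stop at) always stands alone as a token: put word boundaries inside the guard, (?!\b\d\d/\d{3}\b). A digit run such as 2230/0230 in a time range then no longer triggers it. This is a sketch in Python's re; whether the assumption holds is something only AGP's data can confirm.

```python
import re

# The guard from AGP's pattern, and a hypothetical word-bounded variant.
GUARD_ORIGINAL = r"(?:(?!\d\d/\d{3}).)*"
GUARD_BOUNDED = r"(?:(?!\b\d\d/\d{3}\b).)*"

time_range = " 1930-2230/0230-0630 DLY QXXXX "   # taken from the 09/055 message
stray_msgnum = " FOO 05/255 BAR "                # made-up: a standalone dd/ddd

results = {}
for label, guard in (("original", GUARD_ORIGINAL), ("bounded", GUARD_BOUNDED)):
    # True means the guarded run can consume the whole string.
    results[label] = (
        re.fullmatch(guard, time_range) is not None,
        re.fullmatch(guard, stray_msgnum) is not None,
    )
    print(label, *results[label])
# original False False
# bounded True False
```

    With the bounded guard swapped into the middle part of AGP's 02-05 pattern, nothing in the 09/055 message trips it, so the first attempt with KZLA should succeed, while a standalone 05/255 still stops the match.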

     

    AGP

  •  02-09-2009, 7:56 PM 50811 in reply to 50799

    Re: Repeated chunk of characters

    As I said before, the only way we can answer this is for you to give us the rules for deciding what is valid in the various optional parts of your pattern. In other words, is there a rule such as "if there is a particular sequence of digits, then it must come after the 3- or 4-letter code I'm looking for"?

    Unfortunately this is something that only you can answer for us as you know the complete context in which this work is being done.

    As a suggestion, you might try looking at what the various intervening characters represent and see if there is some way you can link their presence/absence/length/whatever with the code(s) you are seeking.

    Susan 
