
Finding Nth match

Last post 03-19-2008, 9:12 AM by ddrudik. 10 replies.
  •  03-18-2008, 7:45 PM 40438

    Finding Nth match

    I'm trying to parse data out of a report file using regex in the .NET Framework.

    The format of the data looks like this: "recordNum","location","lastName","firstName","dob", etc.

    I've been able to grab the first field using both "([0-9A-Z]*)" and "([^"]*)" . I was able to grab the second field using ","([^"]*)"

    But what if I want to grab the 10th or 30th field? I can't seem to find the correct way to implement a \n where 'n' is the occurrence number.
     

  •  03-18-2008, 7:49 PM 40439 in reply to 40438

    Re: Finding Nth match

    Provide an actual data sample to demonstrate with.
  •  03-18-2008, 8:08 PM 40441 in reply to 40439

    Re: Finding Nth match

    I can't give exact data, because it's a medical record, but it looks a lot like this:

    "PATIENT","000099999","O OUT","DOE","JOHN","","01/01/1950",...

    "EMPLOYER","SELF","","","",""

    "GUARANTOR","DOE","JOHN","","999-99-9999",... 

  •  03-18-2008, 8:15 PM 40443 in reply to 40441

    Re: Finding Nth match

    Fair enough, show which nth one you want to match.  I assume you mean nth one per line but please confirm that.
  •  03-18-2008, 8:28 PM 40445 in reply to 40443

    Re: Finding Nth match

    Yes, to clarify, I do mean Nth per line. Well, I have to map all of them, so let's assume I'm looking to grab just the first name, or "John" in the example provided. I'm hoping the solution is something that will allow me to simply cut and paste the regex and change an iteration variable (i.e., \4 for the 4th field, \5 for the fifth, etc.).

    But to further clarify, I'm essentially writing XML templates that will pore through 70,000 records per month. So the solution cannot be dependent on the data in the fields, since that changes. The regex has to work something like "after the 8th comma, grab the alphanumeric characters between the quotes".
     

  •  03-18-2008, 8:41 PM 40447 in reply to 40445

    Re: Finding Nth match

    How about:

    ^("[^"]*",){2}("[^"]*")

    with the 'multiline' option set. This assumes that each item in the source text is surrounded by double-quotes and that there are no 'escaped double quotes' within an item (e.g. "hello"" world"). Look at match group #2 for the item you are after.

    Just set the quantifier to be 1 less than the number of the item you want. This also works if you are after the 1st item (in which case the quantifier would be {0}).

    Susan 
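
    As a rough illustration of the pattern above, here is a minimal Python sketch (the thread itself targets .NET, so the language is a stand-in, and the sample text is adapted from the lines posted earlier with the truncated "..." parts simply left off). It applies the pattern with the multiline option and reads match group #2 for every line.

```python
import re

# Sample lines modeled on the ones posted in the thread (assumed, shortened).
text = (
    '"PATIENT","000099999","O OUT","DOE","JOHN","","01/01/1950"\n'
    '"GUARANTOR","DOE","JOHN","","999-99-9999"\n'
)

# {2} skips two fields, so group 2 holds the 3rd field of each line.
# re.MULTILINE makes ^ match at the start of every line.
pattern = re.compile(r'^("[^"]*",){2}("[^"]*")', re.MULTILINE)

third_fields = [m.group(2) for m in pattern.finditer(text)]
print(third_fields)  # ['"O OUT"', '"JOHN"']
```

    Note that group 2 still carries the surrounding double quotes, because they sit inside the capturing parentheses.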

  •  03-18-2008, 8:56 PM 40448 in reply to 40447

    Re: Finding Nth match

    Susan, that is very close! You are a regex genius!

    Because this is being used within some alpha software our vendor sent us (which is at least smart enough to allow me to select the desired line), I can't use the leading caret. So I used ("[^"]*",){4}("[^"]*"). That grabbed the 5th field, which is the first name. It selected "JOHN", including the quotes and trailing comma, but I think I can play with it from there and see if I can figure it out...not that I'd complain if you wanted to strip out the unnecessary chars.

    I owe you dinner!
     

  •  03-18-2008, 9:55 PM 40449 in reply to 40448

    Re: Finding Nth match

    How about:

    ("[^"]*",){4}"([^"]*)"

    again, look at the match group #2.

    I'm not sure what is going on when you say that you were grabbing the trailing comma - I can understand the surrounding double-quotes as they are inside the grouping brackets (outside in my suggestion immediately above) but in either case the trailing double quote should stop the match! The comma WILL be included in the match group #1 item(s) but those are being ignored anyway.

    Susan
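
    To make the distinction between the overall match and the numbered groups concrete, here is a small Python sketch (Python is only a stand-in for whatever engine the vendor software uses, and the sample line is adapted from the thread). It shows what each group holds for the pattern above.

```python
import re

# Sample line adapted from the thread (assumed, truncated part left off).
line = '"PATIENT","000099999","O OUT","DOE","JOHN","","01/01/1950"'

m = re.search(r'("[^"]*",){4}"([^"]*)"', line)

whole_match = m.group(0)   # everything the pattern consumed, fields 1 to 5
last_skipped = m.group(1)  # a repeated group keeps only its last iteration
first_name = m.group(2)    # the 5th field, without the quotes

print(whole_match)   # "PATIENT","000099999","O OUT","DOE","JOHN"
print(last_skipped)  # "DOE",
print(first_name)    # JOHN
```

    A tool that displays the overall match or group #1 instead of group #2 will therefore show extra quotes and commas.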

     

  •  03-18-2008, 9:57 PM 40450 in reply to 40448

    Re: Finding Nth match

    It all depends what your software supports, you might try:

    (?m:(?:"[^"]*",){2}"([^"]*)")

    ----------------------------------------------------------------------
      (?m:                     group, but do not capture (with ^ and $
                               matching start and end of line) (case-
                               sensitive) (with . not matching \n)
                               (matching whitespace and # normally):
    ----------------------------------------------------------------------
        (?:                      group, but do not capture (2 times):
    ----------------------------------------------------------------------
          "                        '"'
    ----------------------------------------------------------------------
          [^"]*                    any character except: '"' (0 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
          ",                       '",'
    ----------------------------------------------------------------------
        ){2}                     end of grouping
    ----------------------------------------------------------------------
        "                        '"'
    ----------------------------------------------------------------------
        (                        group and capture to \1:
    ----------------------------------------------------------------------
          [^"]*                    any character except: '"' (0 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
        )                        end of \1
    ----------------------------------------------------------------------
        "                        '"'
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------


  •  03-18-2008, 10:51 PM 40452 in reply to 40450

    Re: Finding Nth match

    ddrudik, that did it!

    With that code, I am able to simply alter the number between the braces to select the correct field. Thank you for the code break-down as well.

    I appreciate the help from both you and Aussie Susan, my Regex-Fu is pretty "weak tea" compared to my other coding skills. I think I'm going to have to pick up a copy of Mastering Regular Expressions and start learning in earnest.
     

    Damn, now I owe two dinners!

  •  03-19-2008, 9:12 AM 40471 in reply to 40452

    Re: Finding Nth match

    I see I left out the ^ from the original pattern, thus the (?m:) is not necessary:

    (?:"[^"]*",){2}"([^"]*)"
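
    The "change the number between the braces" idea can be wrapped in a small helper. The following is only a sketch in Python (the function name nth_field is hypothetical, and re.match is used here so that counting starts at the beginning of the line, standing in for the missing ^). It inherits the same assumptions as the pattern: every field is double-quoted and no field contains an escaped double quote.

```python
import re

def nth_field(line, n):
    """Return the n-th (1-based) quoted field of line, or None if absent."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Skip n-1 quoted fields, then capture the contents of the next one.
    pattern = r'(?:"[^"]*",){%d}"([^"]*)"' % (n - 1)
    m = re.match(pattern, line)  # re.match anchors at the start of the string
    return m.group(1) if m else None

# Sample line adapted from the thread (assumed, truncated part left off).
line = '"PATIENT","000099999","O OUT","DOE","JOHN","","01/01/1950"'
print(nth_field(line, 5))  # JOHN
print(nth_field(line, 8))  # None, the line has only 7 fields
```

    Empty fields come back as an empty string rather than None, which lets a caller tell "blank" apart from "not present".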

