
Acquire subset data

Last post 11-18-2008, 7:52 PM by Aussie Susan. 4 replies.
  •  11-17-2008, 7:41 PM 48410

    Acquire subset data

    I have data in one of the following forms: [0008]data_to_return[0009] or [0008]data_to_return#

    This regex: (\[0008\].*?\])|(\[0008\].*?#) returns the whole match, markers included: [0008]data_to_return[0009]. I would like to search on the markers but return only the data between them, for example: "data_to_return"

    Any suggestions?

  •  11-17-2008, 9:13 PM 48412 in reply to 48410

    Re: Acquire subset data

    Can you please read the 'posting guidelines' note at the top of this forum and provide the details mentioned there; in particular the regex and programming language you are using.

    The general answer to your question is along the lines of:

    \[0008\](.*?)(\[0009\]|#)

    and look at match group #1 for the data.
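
    As a minimal sketch of the capture-group approach, here it is in Python rather than C#, which is an assumption made only for illustration. The helper name extract_group is hypothetical. In .NET the equivalent lookup would be Match.Groups[1].Value.

```python
import re

# Pattern from the post: capture everything between "[0008]" and the
# first "[0009]" or "#" that follows it.
PATTERN = re.compile(r"\[0008\](.*?)(\[0009\]|#)")

def extract_group(text):
    """Return capture group 1, or None when the pattern does not match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(extract_group("[0008]data_to_return[0009]"))  # data_to_return
print(extract_group("[0008]data_to_return#"))       # data_to_return
```

    The whole match still includes the markers; only group 1 holds the data.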

    Another approach may be:

    (?<=\[0008\])((?!#|\[0009\]).)*

    which will return only the item you are after, but this requires lookarounds to be supported by your regex engine. Also, this assumes that the data to be retrieved terminates at the first "#" or "[0009]", so that "fred [0008]wombat #3 is the subject of the study[0009]" will return "wombat ".
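
    The lookaround version can be checked the same way. This is again a Python sketch, not C#, and the helper name extract_lookaround is hypothetical. Python only allows fixed-width lookbehind, which "\[0008\]" satisfies.

```python
import re

# Pattern from the post: start right after "[0008]" and consume one
# character at a time for as long as neither "#" nor "[0009]" starts here.
PATTERN = re.compile(r"(?<=\[0008\])((?!#|\[0009\]).)*")

def extract_lookaround(text):
    """Return the whole match (markers excluded), or None if "[0008]" is absent."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(repr(extract_lookaround("fred [0008]wombat #3 is the subject of the study[0009]")))  # 'wombat '
print(repr(extract_lookaround("[0008]data_to_return[0009]")))  # 'data_to_return'
print(repr(extract_lookaround("[0008]no terminator")))         # 'no terminator'
```

    As the last call shows, this pattern does not require a terminator to be present; without one it runs to the end of the line.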

    Susan

  •  11-18-2008, 12:00 AM 48418 in reply to 48412

    Re: Acquire subset data

    So sorry...

    Thanks for the reply.

    I am using the .NET Framework 2.0 and 3.5, with C# 3.0.

  •  11-18-2008, 10:27 AM 48448 in reply to 48412

    Re: Acquire subset data

    Those patterns don't work for me, using the .NET Framework 3.0 (C#).

    I read up on lookahead and lookbehind, and came up with this (notice the square brackets are escaped):

    (?<=\[0008\]).*?(?=\[)|(?<=\[0008\]).*?(?=#)

    This will return the value that I was expecting.

    [0008]somedatawithin[0100]extradata = somedatawithin

    [0008]moredatawithin#andyetmore = moredatawithin
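
    A quick check of this pattern against the two samples above, sketched in Python rather than C# (an assumption for illustration only). The helper name extract_between is hypothetical.

```python
import re

# Pattern from the post: data after "[0008]" up to the next "[",
# or, failing that, up to the next "#".
PATTERN = re.compile(r"(?<=\[0008\]).*?(?=\[)|(?<=\[0008\]).*?(?=#)")

def extract_between(text):
    """Return the first match, or None when no terminator follows "[0008]"."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(extract_between("[0008]somedatawithin[0100]extradata"))  # somedatawithin
print(extract_between("[0008]moredatawithin#andyetmore"))      # moredatawithin
```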

  •  11-18-2008, 7:52 PM 48481 in reply to 48448

    Re: Acquire subset data

    Interesting! Apart from introducing a new element into your requirements (that the terminator is not always "[0009]"), the pattern I provided works exactly the same as yours except that I explicitly test for the terminator character as I move forward rather than rely on any backtracking to find it. If you gather the common factors on either side of the alternation from your pattern you go from:

    (?<=\[0008\]).*?(?=\[)|(?<=\[0008\]).*?(?=#)

    to:

    (?<=\[0008\]).*?((?=\[)|(?=#))

    to:

    (?<=\[0008\]).*?(?=\[|#)

    which is functionally the same as my suggestion. I would therefore be interested in knowing what errors you were getting that led you to say my patterns "don't work for me".
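
    The three forms can be compared side by side. This is a Python sketch, not C# (an assumption for illustration), with hypothetical names. The last sample is an invented edge case that is not from the thread: there the unfactored pattern may differ, because a backtracking engine tries the whole "[" alternative before it considers "#".

```python
import re

ORIGINAL = r"(?<=\[0008\]).*?(?=\[)|(?<=\[0008\]).*?(?=#)"
STEP_1 = r"(?<=\[0008\]).*?((?=\[)|(?=#))"
STEP_2 = r"(?<=\[0008\]).*?(?=\[|#)"

def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

def compare(text):
    """Run all three forms against the same text, in the order shown above."""
    return [first_match(p, text) for p in (ORIGINAL, STEP_1, STEP_2)]

# The two samples from the thread: all three forms agree.
print(compare("[0008]somedatawithin[0100]extradata"))  # ['somedatawithin', 'somedatawithin', 'somedatawithin']
print(compare("[0008]moredatawithin#andyetmore"))      # ['moredatawithin', 'moredatawithin', 'moredatawithin']

# Invented edge case: "#" appears before the next "[".
print(compare("[0008]a#b[0009]"))                      # ['a#b', 'a', 'a']
```

    On the samples posted in this thread the forms return the same text. They only diverge when a "#" precedes a later "[", in which case the factored forms stop at whichever terminator comes first.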

    Susan
