RegexAdvice

Breaking text into parts at keywords

Last post 08-28-2008, 3:46 AM by prometheuzz. 7 replies.
  •  08-27-2008, 12:03 PM 45708

    Breaking text into parts at keywords

    Dear colleagues,

    I'd like to parse a text and split it at specific keywords (e.g. k1 or k2). I'll use it in a .NET application like this:

        MatchCollection matches = Regex.Matches(sSrc, sRegEx);

    but for testing I am using the RadSoftware Regex Designer.

    My sSrc is something like:

    "k1 blakblak\r\nblakblak k2 blablabla k1 data3 k2 bla\r\nbla"

    I'd like to get multiple matches:

    • "k1 blakblak\r\nblakblak "
    • "k2 blablabla "
    • "k1 data3 "
    • "k2 bla\r\nbla"

    I tried "\b(k1|k2)\b[^(k1|k2)]*", but each match ended at any 'k' character rather than at the words k1 or k2, returning "k1 bla" instead of "k1 blakblak\r\nblakblak ".

    I don't know how to stop matching at specific words instead of single characters.

    Thank you for your help.

    Regards, Thomas
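Why the attempt above stops early: inside square brackets, (k1|k2) is not an alternation. [^(k1|k2)] is a character class meaning "any single character except (, k, 1, |, 2 or )", so the first k of "blakblak" ends the match. A minimal C# sketch reproducing this with the input from the question (the variable names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

string src = "k1 blakblak\r\nblakblak k2 blablabla k1 data3 k2 bla\r\nbla";

// [^(k1|k2)] is a character class: "any single character except
// ( k 1 | 2 )", not "anything except the words k1 and k2".
string attempt = @"\b(k1|k2)\b[^(k1|k2)]*";

MatchCollection attemptMatches = Regex.Matches(src, attempt);

// The first match stops just before the 'k' in "blakblak".
Console.WriteLine(attemptMatches[0].Value);              // k1 bla

// Single characters tested against the same class.
Console.WriteLine(Regex.IsMatch("k", "^[^(k1|k2)]$"));   // False
Console.WriteLine(Regex.IsMatch("b", "^[^(k1|k2)]$"));   // True
```

The later parts happen to come out right only because they contain no k, 1 or 2 before the next keyword.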

     

  •  08-27-2008, 12:24 PM 45710 in reply to 45708

    Re: Breaking text into parts at keywords

    Try a regex like this:

    (?:k1|k2)((?!(?:k1|k2)).)*+

    and enable the DOT-all option/flag so that the dot also matches newline characters.
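For the .NET case, a sketch of this answer using System.Text.RegularExpressions: .NET has no possessive quantifiers, so *+ is replaced here by an atomic group (?>...), which has the same meaning, and RegexOptions.Singleline is .NET's DOT-all option. The inner group is made non-capturing, which does not change what is matched. Treat this as an illustration, not the poster's exact code:

```csharp
using System;
using System.Text.RegularExpressions;

string src = "k1 blakblak\r\nblakblak k2 blablabla k1 data3 k2 bla\r\nbla";

// A keyword, then (atomically) every following character that is not the
// start of another keyword. (?>...) stands in for the possessive *+,
// which .NET does not support.
string pattern = @"(?:k1|k2)(?>(?:(?!k1|k2).)*)";

// Singleline makes '.' match '\n' as well ("DOT-all").
MatchCollection parts = Regex.Matches(src, pattern, RegexOptions.Singleline);

foreach (Match m in parts)
{
    // Escape the line breaks so each part prints on a single line.
    Console.WriteLine("[" + m.Value.Replace("\r", "\\r").Replace("\n", "\\n") + "]");
}
```

Each match is one keyword plus everything up to, but not including, the next keyword, which is the list of parts asked for in the question.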

  •  08-27-2008, 12:37 PM 45711 in reply to 45708

    Re: Breaking text into parts at keywords

    Extending that pattern you could also use:

    (?s)(k1|k2)(?:(?!(?1)).)*

    where (?1) reuses the pattern inside the first parenthesized group, in this case "k1|k2". Since you are using .NET, you could also build the keyword list "k1|k2" dynamically if desired.
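A caveat for .NET: as far as I know, the (?1) subroutine call is PCRE syntax that the .NET regex engine does not support, so in C# the alternation has to be written out twice. Building it from a keyword list, as suggested above, keeps it in one place. In this sketch, BuildSplitter and the keywords array are hypothetical names, and Regex.Escape protects keywords that contain regex metacharacters:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the alternation once and uses it both for the
// keyword itself and inside the negative lookahead (instead of (?1)).
static Regex BuildSplitter(string[] keywords)
{
    string alternation = string.Join("|", keywords.Select(Regex.Escape));
    string pattern = $"(?:{alternation})(?:(?!(?:{alternation})).)*";
    return new Regex(pattern, RegexOptions.Singleline);
}

string[] keywords = { "k1", "k2" };
string src = "k1 blakblak\r\nblakblak k2 blablabla k1 data3 k2 bla\r\nbla";

Regex splitter = BuildSplitter(keywords);
foreach (Match m in splitter.Matches(src))
{
    Console.WriteLine("[" + m.Value + "]");
}
```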




  •  08-28-2008, 3:19 AM 45727 in reply to 45710

    Re: Breaking text into parts at keywords

    Thank you for your help.

    (?:k1|k2)((?!(?:k1|k2)).)*+

    What is the ending plus sign for? "*+" reports an error.

    While testing it in the RadSw Regex Designer I did not find a DOT-all option, but replacing the dot with [\s\S] worked fine.

    Thank you again. Regards, Thomas
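Why [\s\S] works: it matches any whitespace or non-whitespace character, i.e. any character at all, with no option needed. In .NET, a plain dot without RegexOptions.Singleline matches everything except \n, so with Windows line endings the first part would end right after the \r. A small C# comparison, assuming the same input string as in the question:

```csharp
using System;
using System.Text.RegularExpressions;

string src = "k1 blakblak\r\nblakblak k2 blablabla k1 data3 k2 bla\r\nbla";

// '.' without Singleline: stops at the first '\n' (the '\r' is still matched).
Match withDot = Regex.Match(src, @"(?:k1|k2)(?:(?!k1|k2).)*");

// [\s\S] = "whitespace or non-whitespace", i.e. any character.
Match withClass = Regex.Match(src, @"(?:k1|k2)(?:(?!k1|k2)[\s\S])*");

Console.WriteLine(withDot.Value == "k1 blakblak\r");               // True
Console.WriteLine(withClass.Value == "k1 blakblak\r\nblakblak ");  // True
```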

  •  08-28-2008, 3:28 AM 45728 in reply to 45727

    Re: Breaking text into parts at keywords

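    The plus after the star makes the star possessive: whatever the star has matched is never given back when the rest of the pattern fails, so the engine does not backtrack into it. Details about possessive quantifiers can be found here: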

    http://www.regular-expressions.info/possessive.html

     Good to hear you've got it sorted, and you're welcome.
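To illustrate what "never given back" means, assuming the .NET engine: a*a matches "aaa" because a* backtracks one a for the final a, whereas the possessive a*+a can never match anything. .NET rejects the *+ syntax (the error Thomas saw), but its atomic group (?>a*) behaves like a*+. A minimal C# sketch:

```csharp
using System;
using System.Text.RegularExpressions;

// Greedy: a* first takes "aaa", then backtracks one 'a' so the final 'a' matches.
bool greedy = Regex.IsMatch("aaa", "a*a");

// Atomic group, .NET's counterpart of the possessive a*+: once a* has taken
// all the a's it never gives one back, so the final 'a' cannot match.
bool atomic = Regex.IsMatch("aaa", "(?>a*)a");

// The possessive syntax itself is rejected by the .NET regex parser.
bool possessiveRejected;
try
{
    _ = new Regex("a*+a");
    possessiveRejected = false;
}
catch (ArgumentException)
{
    possessiveRejected = true;
}

Console.WriteLine($"{greedy} {atomic} {possessiveRejected}"); // True False True
```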

  •  08-28-2008, 3:33 AM 45729 in reply to 45711

    Re: Breaking text into parts at keywords

    Dear ddrudik,

    thank you for your answer. I had not heard of (?1) before. It sounds interesting and makes repetitive parts shorter and easier to read.

    I will try it. Thank you.

    Sincerely, Thomas 

  •  08-28-2008, 3:39 AM 45730 in reply to 45728

    Re: Breaking text into parts at keywords

    Dear prometheuzz,

    thank you for your in-depth guidance. As a regex newbie I definitely need it. ;-)

    As I learned at http://www.regular-expressions.info/dotnet.html, .NET lacks possessive quantifiers.

    But the approach you showed me works fine anyway.

    Thank you and have a nice day,

    Thomas

  •  08-28-2008, 3:46 AM 45731 in reply to 45730

    Re: Breaking text into parts at keywords

    You're welcome Thomas, and thank you for informing me about the lack of possessive quantifiers in .NET: I have learnt something new too!