Trying to split a chunk of text into two parts...

Last post 11-20-2008, 12:24 PM by mash. 5 replies.
  •  11-19-2008, 3:08 PM 48524

    Trying to split a chunk of text into two parts...

    Hi, I'm a regex newbie trying to figure this out. Your help is most appreciated!

    The platform I'm using is Yahoo! Pipes' Regex function -- documentation is here: http://pipes.yahoo.com/pipes/docs?doc=operators#Regex

    What I'm trying to do is take the item.description from an RSS feed (stripped of all HTML using <[a-zA-Z\/][^>]*> ) and return an excerpt of the first 20 words (or so, it's not important how many exactly) followed by an ellipsis. The text can be multiple paragraphs.

    In the library I found this expression to search for the first few words: ^([\s\S]){1,200}([\b\.])

    The problem I'm having is that I can't figure out how to replace the *entire* text with just the excerpt. Yahoo's regex will only let me replace a search-string with a variable-string. So I need to either

     A) split the text into two variables:

    $1: The first 200 characters (without splitting up any words)

    $2: The rest of the text

     Then I can run a search for ($1)($2), and replace the whole text with just $1

     or 

    B) Create a Regex that will search for the *last* chunk of text -- basically the inverse of ^([\s\S]){1,200}([\b\.])

    Then I can replace that last chunk with an ellipsis.

    I hope you'll be able to help -- thanks for your efforts!

     

     

  •  11-19-2008, 7:23 PM 48537 in reply to 48524

    Re: Trying to split a chunk of text into two parts...

    Can you please provide us with some sample text so we can see what you are trying to achieve? We need to see both the 'before' and 'after' versions so that we can understand your problem a little better.

    Susan

  •  11-19-2008, 8:09 PM 48539 in reply to 48524

    Re: Trying to split a chunk of text into two parts...

    In limited tests this seemed to do something like you seek (example for 20 "words" shown, where a "word" is 1 or more non-whitespace characters followed by 0 or more whitespace characters):

    Raw Match Pattern:
    (?s)^((?:\S+\s*){20}).*$

    Raw Replace Pattern:
    $1...

    If your app doesn't like the pattern, try removing "(?s)" from the pattern as the singleline setting might be already assumed by your application.
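
The pattern above can also be tried outside Yahoo! Pipes. Below is a minimal sketch, assuming Python's `re` module as a stand-in for the Pipes Regex operator; the helper name `excerpt` and its `words` parameter are illustrative and not from this thread. Python spells the replacement backreference `\1` rather than `$1`.

```python
import re

def excerpt(text, words=20):
    """Keep the first `words` words of `text` and append an ellipsis.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of Yahoo! Pipes.
    """
    # (?s) lets "." match newlines, so ".*$" swallows every later paragraph.
    pattern = r"(?s)^((?:\S+\s*){%d}).*$" % words
    # The match spans the whole text, so one replacement is enough.
    return re.sub(pattern, r"\1...", text, count=1)

sample = " ".join("w%d" % i for i in range(1, 31))  # w1 w2 ... w30
print(excerpt(sample))
```

A text with fewer non-whitespace characters than the requested word count cannot match and comes back unchanged. Because `\S+` can backtrack and split a word into pieces, a text with fewer than 20 words but at least 20 non-whitespace characters can still match and pick up an ellipsis, so very short descriptions may need a separate check.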




  •  11-20-2008, 1:19 AM 48551 in reply to 48524

    Re: Trying to split a chunk of text into two parts...

    This sounds very similar to my word break regex http://regexadvice.com/blogs/mash/archive/2005/02/09/324.aspx

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  11-20-2008, 9:37 AM 48563 in reply to 48539

    Re: Trying to split a chunk of text into two parts...

    ddrudik, that works perfectly! Thank you so much!

    Michael, I did look at your regex but I couldn't figure out how to modify it to do what I needed to do.

  •  11-20-2008, 12:24 PM 48566 in reply to 48563

    Re: Trying to split a chunk of text into two parts...

    IshMEL:

    ddrudik, that works perfectly! Thank you so much!

    Michael, I did look at your regex but I couldn't figure out how to modify it to do what I needed to do.

    Well, the original idea behind my pattern was just to grab a snippet for display purposes, not to replace anything, but if you want to replace, it's a small modification.

    1) Make the first group a capturing group by removing the ?:

    2) Match the rest of the input outside of the capture.

    ^(?:[ -~]{2,100}(?:$|(?:[\w!?.])\s))[\s\S]+$

    My original pattern:

    ^(?:[ -~]{2,100}(?:$|(?:[\w!?.])\s))

    My modified pattern:

    Raw Match Pattern:

    ^([ -~]{2,100}(?:$|(?:[\w!?.])\s))[\s\S]+$



    Raw Replace Pattern:

    $1...



    $sourcestring after replacement:

    Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived ...

     The source was the Gettysburg Address.
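
The modified pattern can be checked the same way. This is a sketch using Python's `re` module, assumed here only as a convenient test bed; the variable names are illustrative, and `\1` is Python's spelling of the `$1` backreference. The source string is the opening sentence of the Gettysburg Address.

```python
import re

# Opening sentence of the Gettysburg Address, used as the source string.
source = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal."
)

# Group 1: 2 to 100 printable ASCII characters, then either end of input
# or one word/punctuation character followed by whitespace.
# [\s\S]+$ consumes the rest of the input so the replacement discards it.
pattern = r"^([ -~]{2,100}(?:$|(?:[\w!?.])\s))[\s\S]+$"

result = re.sub(pattern, r"\1...", source)
print(result)
```

The greedy `[ -~]{2,100}` first takes 100 characters and then backs off until the next two characters are a word or punctuation character followed by whitespace, which is why the cut lands after a whole word.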


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein