RegexAdvice

Regex content transformation

Last post 04-06-2011, 5:17 PM by BaichtalJohn. 10 replies.
  •  04-04-2011, 5:07 AM 80249

    Regex content transformation

    Hi, I have this input I am trying to transform:

     

    -1 | Select Size
    980 | S
    979 | M - Not Available
    978 | L

     

    I want to remove all the numbers and the "|" character. There may also be a fourth item from time to time.

     

    Thanks so much in advance.

    Mikey

  •  04-04-2011, 5:25 PM 80379 in reply to 80249

    Re: Regex content transformation

    Try to match the digits and the pipe first with this regex:

    \d+\s*\|

    Then replace the match with an empty string.
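    A minimal sketch of that suggestion, written in Python's `re` module rather than in the .NET-based tool from this thread (an assumption: this simple pattern behaves the same way in both flavors). `re.sub` replaces every non-overlapping match by default, much like a global replace; `strip_ids` and `SAMPLE` are illustrative names only.

```python
import re

# Sample text from the original question, one item per line.
SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# Sergei's pattern: one or more digits, optional whitespace, a literal pipe.
PATTERN = r"\d+\s*\|"

def strip_ids(text: str) -> str:
    # Replace every match with an empty string (re.sub replaces all matches by default).
    return re.sub(PATTERN, "", text)

print(strip_ids(SAMPLE).splitlines())
# ['- Select Size', ' S', ' M - Not Available', ' L']
```

    Note that the "-" of "-1" survives because '\d' does not match a minus sign, and the space after each pipe is kept because the pattern stops at the "|".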

    What is your programming language/platform/regex tool? Regex flavors differ, so what I wrote might not work in your situation.

  •  04-05-2011, 5:37 AM 80389 in reply to 80379

    Re: Regex content transformation

    Thanks for the help. It's regex within Visual Web Ripper. I tried using your line of code but got an error.

    \d+\s*\|

    $1

    is what I tried. Any further help would be very much appreciated, as I am completely lost!

    Thanks in advance

     

  •  04-05-2011, 10:51 AM 80391 in reply to 80389

    Re: Regex content transformation

    What kind of error are you getting?

  •  04-05-2011, 5:40 PM 80404 in reply to 80391

    Re: Regex content transformation

    Hi, thanks again for helping me out.

    using

    \d+\s*\|(.*)
    $1

     it gives

     Select Size
    980 | S
    979 | M - Not Available
    978 | L

    That removes the first part before "Select Size". How do I get it to apply to lines 2, 3, 4 and potentially 5?

    Thank you again for your help; it is VERY appreciated!

    The program's regex is based on the .NET Framework.

  •  04-05-2011, 7:10 PM 80406 in reply to 80404

    Re: Regex content transformation

    I'm not sure why you have added '(.*)' to the end of Sergei's suggested pattern, or included (what I assume is) the replacement string "$1", as these can change the operation of the regex completely.

    You have not told us which options you are using that may affect what is happening, nor how you are using the regex pattern. For example, if the "Singleline" option is set, then the '.*' will grab all characters after the first "|" to the end of the entire string, and the '$1' will simply put them all back again. Even if you have asked for a global replace, the next match attempt starts where the first match ended; that is now the end of the string, so no more matches can be made.
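    To make the "Singleline" point concrete, here is a sketch in Python, where the counterpart of .NET's Singleline option is the `re.DOTALL` flag (Python also writes the group reference "$1" as "\1"). It runs the pattern with the trailing '(.*)' both with and without that flag on the sample text:

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# The pattern with a trailing capture group; "\1" plays the role of "$1".
PATTERN = r"\d+\s*\|(.*)"

# With DOTALL, '.' also matches newlines, so group 1 swallows the rest of the
# string and the replacement puts it all back; no further matches are possible.
singleline = re.sub(PATTERN, r"\1", SAMPLE, flags=re.DOTALL)

# Without DOTALL, '.' stops at each newline, so every line is matched separately.
per_line = re.sub(PATTERN, r"\1", SAMPLE)

print(singleline.splitlines())
# ['- Select Size', '980 | S', '979 | M - Not Available', '978 | L']
print(per_line.splitlines())
# ['- Select Size', ' S', ' M - Not Available', ' L']
```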

    On the other hand, if you do not have the "Singleline" option set, which of the various "Replace" methods are you using? Some specify an upper limit on the number of replacements that will be made.
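    A replacement limit alone can also produce "only the first line changed". In .NET the instance `Regex.Replace` method has an overload that takes a count; the Python sketch below uses the `count` argument of `re.sub` to show the same effect (an illustration only, since the tool's internals are unknown):

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"
PATTERN = r"\d+\s*\|"

# count=0 (the default) replaces every match; count=1 stops after the first.
all_lines = re.sub(PATTERN, "", SAMPLE)
first_only = re.sub(PATTERN, "", SAMPLE, count=1)

print(first_only.splitlines())
# ['- Select Size', '980 | S', '979 | M - Not Available', '978 | L']
```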

    I would suggest a slight change to Sergei's pattern to get the correct behaviour for the first of the 4 lines you have given as your sample text:

    -?\d+\s*\|

    again with the null replacement string. Your first line has the value "-1" as the "number", but the '-' is not matched by '\d'. With the '-?' at the start, you will also remove any leading minus sign from the "number". (By the way, as with Sergei's suggestion, this one does not depend on the setting of any options.)

    Finally, I don't understand what you mean by "...apply to lines 2,3,4 and potentially 5?" I can only see 4 lines in your sample text.
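    Applying the revised pattern to the sample text, again sketched with Python's `re.sub` as a stand-in for the .NET tool, removes the minus sign along with the digits and the pipe:

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# Susan's revision: an optional leading minus sign before the digits.
PATTERN = r"-?\d+\s*\|"

result = re.sub(PATTERN, "", SAMPLE)
print(result.splitlines())
# [' Select Size', ' S', ' M - Not Available', ' L']
```

    The single space that followed each pipe is still present, because the pattern ends at the "|".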

    Susan

  •  04-05-2011, 7:34 PM 80407 in reply to 80406

    Re: Regex content transformation

    Thanks for the reply, Susan. The program allows manipulation of plain-text data via regex. To be honest, I know nothing about regex but have tried furiously to find out more.

    Sorry if I am not explaining myself well.

    The plain text the program outputs is:

    -1 | Select Size
    980 | S
    979 | M - Not Available
    978 | L

    all on separate lines. I want to remove the numbers and the "|" from every line, so my result would be:

    Select Size
    S
    M - Not Available
    L

    From time to time there may be an additional item, i.e. XL, hence the 5th item.

    If I put your code into the program:

    "-?\d+\s*\|"

    the output I get is "-1" and nothing else.

    If I put

    Line 1 "-?\d+\s*\|(.*)"

    Line 2 "$1"

    the output I get is

     Select Size
    980 | S
    979 | M - Not Available
    978 | L

    Thank you for trying to help me out; I really appreciate it!

     

  •  04-05-2011, 11:04 PM 80408 in reply to 80407

    Re: Regex content transformation

    I must admit that I only just noticed your comment about the regex being .NET based; I've only now gone back and seen that you are using Visual Web Ripper. (My testing has been with the Expresso regex test tool, which is .NET based. I do not have access to Visual Web Ripper, nor am I able to download and run the trial version.)

    I found a page on the web (http://www.visualwebripper.com/forum/default.aspx?g=posts&t=32) that says "...VWR only returns the first match". I really don't know this program, but it might explain why you are only getting a hit on the first line.
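    The difference between reporting only the first match and finding every match can be sketched in Python. The assumption here, based only on the forum page quoted above, is that the tool's plain "match" mode behaves like `re.search`, which returns just the first hit:

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"
PATTERN = r"-?\d+\s*\|"

# Only the first hit, as a "returns the first match" tool would report it.
first = re.search(PATTERN, SAMPLE)
print(first.group())  # '-1 |'

# Every hit in the text, for comparison.
all_hits = re.findall(PATTERN, SAMPLE)
print(all_hits)
# ['-1 |', '980 |', '979 |', '978 |']
```

    That first hit, "-1 |", is close to the "-1" output reported above, which would fit the tool showing the match rather than performing a replacement.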

    Also, I'm making a wild guess here, but could it be that your program distinguishes between a "match" and a "replace" operation by whether there is something in "Line 2"? That would explain why you appear to get a match when "Line 2" is blank (i.e. you are seeing the text that the pattern locates), but see the result of a replacement (the matched text is deleted and replaced with the constructed replacement string) when there is something there. If that is the case, then try Sergei's and my suggestions but use something like "$1" as the replacement string. There is no match group #1 defined in our patterns, so the "$1" reference will always be a null string: enough to fool the program into performing a replacement operation while still not putting anything back into the text.
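    One caveat on the "$1" trick: regex flavors treat a reference to an undefined group differently. Python's `re` rejects it with an error, and the .NET documentation states that a `$number` naming a group that does not exist is inserted as literal text. A hedged alternative (a suggestion, not something tested in Visual Web Ripper) is to append an explicit empty group `()` so that group 1 exists and is always empty:

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# Hypothetical variant: an empty group "()" appended so that \1 (.NET: $1)
# is defined and always empty.
PATTERN_WITH_GROUP = r"-?\d+\s*\|()"
result = re.sub(PATTERN_WITH_GROUP, r"\1", SAMPLE)
print(result.splitlines())
# [' Select Size', ' S', ' M - Not Available', ' L']

# Without any group, Python refuses the \1 reference outright.
error_message = None
try:
    re.sub(r"-?\d+\s*\|", r"\1", SAMPLE)
except re.error as exc:
    error_message = str(exc)
print(error_message)
```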

    Susan

  •  04-05-2011, 11:49 PM 80409 in reply to 80408

    Re: Regex content transformation

    Thanks for the help, Susan. I think I am there or thereabouts; this seemed to work:

    -?\d+\s+\s*\|\W(.*?)
    replace $1

    returning what I was after.

    Again, thanks for your help!
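    A sketch of why this pattern gives the desired output, again using Python's `re.sub` in place of the .NET-based tool. The '\W' consumes the space after the pipe, and the lazy '(.*?)' has nothing after it, so it always matches an empty string; the "$1" (here "\1") therefore contributes nothing and the replace amounts to deletion:

```python
import re

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# Mikey's pattern: optional minus, digits, whitespace, a pipe, one non-word
# character (the space after the pipe), then a lazy group that stays empty.
PATTERN = r"-?\d+\s+\s*\|\W(.*?)"

result = re.sub(PATTERN, r"\1", SAMPLE)
print(result.splitlines())
# ['Select Size', 'S', 'M - Not Available', 'L']
```

    Because '\s+' requires at least one whitespace character before the pipe, a line written without that space, such as "980|S", would not be matched.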

    Legend

  •  04-06-2011, 1:06 AM 80410 in reply to 80409

    Re: Regex content transformation

    Glad it is working for you.

    By the way, '\s+\s*' is exactly the same as '\s+'. The '\s+' will match all of the consecutive whitespace characters and only stop at the first non-whitespace character or at the end of the string, so there is nothing left for the '\s*' to match. Because the '*' quantifier will happily match zero instances, you will still get an overall match, but the '\s*' will never do anything useful.
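    This can be checked by capturing the two quantified pieces separately. The Python sketch below (a stand-in for the .NET engine, whose greedy quantifiers behave the same way here) shows the '\s*' part always coming back empty, and both patterns finding identical matches in the sample:

```python
import re

# Capture '\s+' and '\s*' in separate groups to see what each one consumes.
m = re.search(r"\d+(\s+)(\s*)\|", "980   | S")
print(repr(m.group(1)), repr(m.group(2)))  # '   ' ''

SAMPLE = "-1 | Select Size\n980 | S\n979 | M - Not Available\n978 | L"

# Both patterns find exactly the same matches in the sample text.
same = re.findall(r"-?\d+\s+\s*\|", SAMPLE) == re.findall(r"-?\d+\s+\|", SAMPLE)
print(same)  # True
```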

    Susan

  •  04-06-2011, 5:17 PM 80423 in reply to 80410

    Re: Regex content transformation

    Yes, you are right. Good post.