
How to catch all occurrences?

Last post 04-07-2011, 7:17 PM by hyle. 3 replies.
  •  04-07-2011, 6:47 AM 80428

    How to catch all occurrences?


     I am working in PHP and came up with this:


         preg_replace('/(.*?)[Cc]-? ?[Ss]harp(.*)/i', '$1C#$2', $wgTitle);

     It is intended to catch "csharp", "c sharp", and "c-sharp" regardless of case and replace each with "C#". The trouble is that I do not know how to make it catch all occurrences in the string. I have tried, but I end up with a huge, unwieldy pattern which doesn't work anyway.

     Any help appreciated...

    C# Online.NET
  •  04-07-2011, 12:52 PM 80442 in reply to 80428

    Re: How to catch all occurrences?

    I would try:

    preg_replace('/c[ -]?sharp/i', 'C#', $wgTitle)


  •  04-07-2011, 7:06 PM 80450 in reply to 80428

    Re: How to catch all occurrences?

    By way of explanation as to why your approach will not work, you are falling into a (seemingly common) trap of thinking that you need to match the entire string with the pattern, rather than just the characters you are interested in.

    What will happen with your pattern is as follows:

    - (.*?)[Cc]      - this will match zero or more characters (as few as possible) until it comes to an upper- or lower-case "C"
    - -? ?[Ss]harp   - this will match an optional hyphen, an optional space, and then the literal characters "sharp"; by the way, if the "c" that stopped the above match is NOT followed by these characters, then that "c" is absorbed into the '.*?' match and the step above is repeated until this part of the pattern does match
    - (.*)           - this will match everything to the end of the line (not necessarily the end of the string, as the "singleline" option has not been specified)
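
    To see the three parts at work, here is a small sketch that runs the pattern from the question through preg_match and inspects the captured groups. The sample title is made up for illustration; in the original post the subject is $wgTitle.

```php
<?php
// Hypothetical sample text standing in for $wgTitle.
$title = "Learn c sharp and C-Sharp today";

// The pattern from the question, unchanged.
$pattern = '/(.*?)[Cc]-? ?[Ss]harp(.*)/i';

preg_match($pattern, $title, $m);

// $m[0] is the whole match, $m[1] the lazy prefix, $m[2] the greedy tail.
// Here the single match spans the entire string:
//   $m[1] is "Learn "
//   $m[2] is " and C-Sharp today" (the second instance is swallowed by '(.*)')
var_dump($m);
```

    Because the second "C-Sharp" ends up inside group 2, it is copied back unchanged by '$2' and never gets its own match.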

    Let's say that you have found an instance of the target characters. The '(.*)' at the end will have extended the match to the end of the line. When the regex engine tries to repeat the match/replace, it always starts from the character after where it finished the last time, which here is the end of the line (or the end of the string). Nothing is left on that line for the '(.*?)' and the target characters to match, so any further instances on the same line have already been swallowed by the '(.*)' and are never replaced.

    In this case, simply removing the '(.*)' from the end of the pattern may have got things going if there were 2 or more instances of the target string on the same line. Note that because neither '(.*?)' nor '(.*)' can cross a newline without the "singleline" option, each match is confined to a single line, so the pattern as written replaces at most the first instance on each line.
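
    As a rough check of the single-line case, the sketch below compares the pattern from the question with the same pattern minus the trailing '(.*)'. The sample string is an assumption chosen to have three instances on one line.

```php
<?php
// Hypothetical single-line sample with three instances of the target.
$title = "c sharp, C-Sharp and CSharp";

// Pattern from the question: the trailing '(.*)' consumes the rest of the line.
$original = preg_replace('/(.*?)[Cc]-? ?[Ss]harp(.*)/i', '$1C#$2', $title);

// Same pattern with the trailing '(.*)' (and its '$2') removed.
$trimmed = preg_replace('/(.*?)[Cc]-? ?[Ss]harp/i', '$1C#', $title);

echo $original, "\n"; // only the first instance is replaced
echo $trimmed, "\n";  // every instance on the line is replaced
```

    The leading '(.*?)' is still not needed for the replacement to work; it only copies text that preg_replace would leave untouched anyway.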

    (By the way, including the "singleline" option, which is the 's' modifier in PHP, would have let the '(.*?)' run across newlines, but the '(.*)' at the end would then have matched everything to the end of the string, so you would still only ever have got 1 match.)
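
    A minimal sketch of that "singleline" case, assuming a made-up two-line string. The fifth argument of preg_replace receives the number of replacements performed.

```php
<?php
// Hypothetical two-line sample, one instance per line.
$title = "c sharp\nC-Sharp";

// Pattern from the question with the 's' modifier added, so '.' also matches newlines.
$result = preg_replace('/(.*?)[Cc]-? ?[Ss]harp(.*)/is', '$1C#$2', $title, -1, $count);

// The greedy '(.*)' now runs to the end of the string, so there is a single match
// and the instance on the second line is copied back unchanged via '$2'.
echo $count, "\n";
echo $result, "\n";
```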

    Doug's suggested pattern gets around all of these problems by only trying to match the target characters (as well as fixing up the character set definitions etc). What the regex engine will do is start trying to match the pattern at the first character in the string. If it fails, it will automatically step forward one character (regardless of what that character might be, including newlines etc) and try the match again. This is repeated until either a match is made or it reaches the end of the string (in which case it finishes). If it does make a match and it has been told to make multiple matches then, as I said above, it will start looking for the next match at the character after the end of the last match. Note that it does NOT just step forward 1 character in this case but effectively jumps over the entire previous match.
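
    The scanning behaviour described above can be observed with preg_match_all and the PREG_OFFSET_CAPTURE flag, which reports where each match starts. This is only a sketch; the sample string is an assumption, and the pattern is the one suggested earlier in the thread.

```php
<?php
// Hypothetical sample spanning two lines, with three spellings of the target.
$title = "csharp, c sharp\nand C-SHARP";

// The suggested pattern: match only the target characters.
$pattern = '/c[ -]?sharp/i';

// Collect every match together with its byte offset in the subject.
$n = preg_match_all($pattern, $title, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $hit) {
    // $hit[0] is the matched text, $hit[1] is its starting offset.
    printf("%2d: %s\n", $hit[1], $hit[0]);
}

// preg_replace replaces all of them in one call; $count receives the total.
$fixed = preg_replace($pattern, 'C#', $title, -1, $count);
echo $fixed, "\n";
```

    With this input the matches start at offsets 0, 8 and 20: each search resumes just after the previous match, and the newline is stepped over like any other character.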


  •  04-07-2011, 7:17 PM 80451 in reply to 80442

    Re: How to catch all occurrences?

    Thx! You are a genius!! (Compared to me.)
    C# Online.NET