Welcome to RegexAdvice

Matching repeated subpatterns

Last post 03-11-2010, 5:16 PM by Aussie Susan. 10 replies.
  •  03-10-2010, 12:16 AM 60626

    Matching repeated subpatterns

    I would like to match a string made of repeated subpatterns in PHP.

    For example:

    string:  abcdefghij
    regex: /^(\w{2})+$/
    expected result:
    [0] => "abcdefghij"
    [1] => "ab"
    [2] => "cd"
    [3] => "ef"
    [4] => "gh"
    [5] => "ij"

    actual result:
    [0] => "abcdefghij"
    [1] => "ij"

    I found that my regex matches the whole string but captures only the last subpattern. Is there any way to capture all the subpatterns and separate them into different variables, as in the expected result? (Assume the string can be of arbitrary length.)

     Thanks dude!

  •  03-10-2010, 6:26 AM 60643 in reply to 60626

    Re: Matching repeated subpatterns

    /(\w{2})/

    If you want to make sure that the entire string is composed of a multiple of 2 characters, get the length of the string and check that.
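    A minimal sketch of this suggestion, using Python's `re` module as a stand-in for PHP's `preg_match_all` (`re.findall` plays the same role here). The helper name `split_pairs` is hypothetical. Besides the length check, it confirms that the pairs joined back together reproduce the whole string, which catches characters that `\w` does not match.

```python
import re

def split_pairs(s):
    """Hypothetical helper: return the 2-character chunks of s, or None if s
    is not made up entirely of consecutive 2-character word-character pairs."""
    # Repeated complete matches, the same idea as preg_match_all('/(\w{2})/', ...)
    # in PHP: each pair is its own match, so no pair overwrites another.
    pairs = re.findall(r"\w{2}", s)
    # The length check suggested above, plus a check that the pairs cover the
    # whole string (findall silently skips characters that \w does not match).
    if not pairs or len(s) % 2 != 0 or "".join(pairs) != s:
        return None
    return pairs

print(split_pairs("abcdefghij"))  # ['ab', 'cd', 'ef', 'gh', 'ij']
print(split_pairs("abcdefghi"))   # None
```

    In PHP the corresponding call is `preg_match_all('/\w{2}/', $s, $m)`, with the pairs ending up in `$m[0]`.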


    http://portal-vreme.ro
  •  03-10-2010, 7:20 AM 60646 in reply to 60643

    Re: Matching repeated subpatterns

    What I plan to do is much more complicated than that, but it requires a similar concept.

    Can a regex achieve that?

  •  03-10-2010, 7:25 AM 60647 in reply to 60646

    Re: Matching repeated subpatterns

    Achieve what?

    http://portal-vreme.ro
  •  03-10-2010, 9:11 AM 60651 in reply to 60647

    Re: Matching repeated subpatterns

    Achieve a result similar to what the first post describes.
  •  03-10-2010, 9:37 AM 60652 in reply to 60651

    Re: Matching repeated subpatterns

    I already showed you the solution for that. Use it with preg_match_all and that's it.

    http://portal-vreme.ro
  •  03-10-2010, 11:06 AM 60653 in reply to 60646

    Re: Matching repeated subpatterns

    joe1205:

    What I plan to do is much more complicated than that, but it requires a similar concept.

    Can a regex achieve that?

    What do you plan to do? In the Posting Guidelines we ask you to tell us your overall task, so we can determine whether a regex can (or should) be used and, if so, suggest an approach.


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  03-10-2010, 11:51 PM 60676 in reply to 60653

    Re: Matching repeated subpatterns

    Here are the details.

    I am working on a parser that can parse logical conditions entered by the user.

    The basic format is:

    Opening Brace* + Relational Operator + Number + Closing Brace* + (Logical Operator + Opening Brace* + Relational Operator + Number + Closing Brace*), with the parenthesized part repeated zero or more times

    Key:

    * = optional

    Brace = ( or )

    Relational Operator = >, >=, <=, <, <>, =

    Logical Operator = &, |

    Example 1

    User input: >4&<10

    Meaning: find products with price > 4 and < 10

    Example 2

    User input: >1000|(<10&>5)

    Meaning: find products with price > 1000, or with price < 10 and > 5

    I have finished a draft parser, but it doesn't work very well.

    /\s*(\(?)\s*(>|<|>=|<=|=|<>)\s*(\d+)\s*(\)?)\s*(?:(&|\|)\s*(\(?)\s*(>|<|>=|<=|=|<>)\s*(\d+)\s*(\)?))*/

    For Example 1, it works:

    [0] => >4&<10
    [1] =>
    [2] => >
    [3] => 4
    [4] =>
    [5] => &
    [6] =>
    [7] => <
    [8] => 10
    [9] => 

    For Example 2, the input is a full match, but for some reason it doesn't capture "<10" as sub-elements:

    [0] => >1000|(<10&>5)
    [1] =>
    [2] => >
    [3] => 1000
    [4] =>
    [5] => &
    [6] =>
    [7] => >
    [8] => 5
    [9] => )
     

    I know this parser is far from perfect, any comments are welcome.

  •  03-11-2010, 12:56 AM 60678 in reply to 60676

    Re: Matching repeated subpatterns

    I suggest that you don't even start down the path of using a regex as a general language parser; you are simply asking for a world of pain and frustration. You will face too many restrictions (for example, a repeating capturing group only keeps the text from its last iteration) to make this viable without a whole lot of supporting code (embedding multiple regexes within recursive procedures, etc.), and that much code makes the other alternatives look easy.
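    The restriction in the parenthesis can be seen directly. Here is a minimal sketch in Python's `re`, which, like PCRE, keeps only the last iteration of a repeated capturing group. It runs the pattern from the first post and the draft pattern from Example 2 (PHP delimiters removed):

```python
import re

# A repeated group is still one group: each iteration overwrites the previous one.
m = re.match(r"^(\w{2})+$", "abcdefghij")
print(m.group(0), m.group(1))  # abcdefghij ij

# The draft expression pattern, unchanged apart from the PHP delimiters.
draft = (r"\s*(\(?)\s*(>|<|>=|<=|=|<>)\s*(\d+)\s*(\)?)\s*"
         r"(?:(&|\|)\s*(\(?)\s*(>|<|>=|<=|=|<>)\s*(\d+)\s*(\)?))*")
m = re.match(draft, ">1000|(<10&>5)")
# Groups 5-9 only hold the last pass through the repeated (?:...)* group,
# i.e. "&>5)"; the earlier pass "|(<10" was overwritten.
print(m.groups())  # ('', '>', '1000', '', '&', '', '>', '5', ')')
```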

    There are so many language parsers out there (lex and all of its cousins for a start) that are designed for exactly this type of work.

    Susan

  •  03-11-2010, 5:52 AM 60685 in reply to 60678

    Re: Matching repeated subpatterns

    Thanks, Aussie Susan.

    I only need a simple parser, and I am just one step away from what I need.

    I just hope the missing subpattern can be captured.

    Since I am working in PHP, I am afraid lex is not my cup of tea.

     

  •  03-11-2010, 5:16 PM 60789 in reply to 60685

    Re: Matching repeated subpatterns

    The only way I know of to achieve what you state in your OP (i.e. to both validate that the string is made up of a sequence of 2-character pairs AND get the value of each pair) is to use 2 passes. The first pass does what your original pattern '^(\w{2})+$' does: it validates that the string as a whole has the right length and composition, but all it gives you as output is the full validated string. You then pass this string through a second pattern, '\w{2}', which matches each of the character pairs as a separate match.

    You will note that the second pattern does NOT involve a repeating group but a repeated sequence of complete matches by the regex engine.
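    A minimal sketch of the two passes, again using Python's `re` as a stand-in for PHP's `preg_match` (first pass) and `preg_match_all` (second pass). The function name `two_pass_pairs` is hypothetical. The first-pass group is made non-capturing, since a capture there would only hold the last pair anyway.

```python
import re

def two_pass_pairs(s):
    # Hypothetical helper name. Pass 1: validate the string as a whole.
    # The group is non-capturing because a capture would only hold the last pair.
    if re.fullmatch(r"(?:\w{2})+", s) is None:
        return None
    # Pass 2: no repeating group; the engine returns each pair as a
    # separate, complete match.
    return [m.group(0) for m in re.finditer(r"\w{2}", s)]

print(two_pass_pairs("abcdefghij"))  # ['ab', 'cd', 'ef', 'gh', 'ij']
print(two_pass_pairs("abc"))         # None
```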

    Your made-up example does not really need the first pass, as there are other (possibly better, in the circumstances) ways of performing the first validation. However, if you wanted to (say) parse the declaration of a procedure with a name, a comma-separated list of parameters and a return type, you would use the first pass to separate it into the 3 parts (name, parameters, return type) and then use the second pass on the parameters to get each parameter definition. If you tried to get the individual parameters by simply capturing them in a repeating group, you would capture only the last one (as you show in your OP).
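    A minimal sketch of the procedure-declaration case in Python's `re`. The declaration syntax `name(type arg, ...) : return_type` is invented purely for illustration, not taken from any real language. The last lines show the repeating-group version for contrast.

```python
import re

# Hypothetical declaration syntax, for illustration only:
#   name(type arg, type arg, ...) : return_type
decl = "getArea(int width, int height) : double"

# Pass 1: split the declaration into its three parts.
m = re.fullmatch(r"\s*(\w+)\s*\((.*?)\)\s*:\s*(\w+)\s*", decl)
name, params, return_type = m.groups()

# Pass 2: match each "type name" parameter as a separate, complete match.
param_list = re.findall(r"(\w+)\s+(\w+)", params)

print(name, return_type)  # getArea double
print(param_list)         # [('int', 'width'), ('int', 'height')]

# For contrast: capturing the parameters in a repeating group keeps only the last.
r = re.fullmatch(r"\s*(\w+)\s*\((?:\s*(\w+\s+\w+)\s*,?)*\)\s*:\s*(\w+)\s*", decl)
print(r.group(2))         # int height
```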

    In some of your subsequent posts, you seem to want to parse an expression that contains nested parentheses. The PCRE library (which PHP uses) does have a technique, based on PCRE-specific syntax, to handle this type of pattern matching. Again, though, what you get out is the total match at the highest level rather than the individual sub-expressions (unless you add a whole lot of special coding, which in my opinion makes this approach practically unworkable). You then need to do your own repeated checking of each nested sub-expression for nested parentheses, and parse those if any are found.
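    Since the nesting has to be handled in your own code anyway, here is a minimal sketch of that approach in Python, whose `re` module does not support PCRE's recursive patterns. A tokenizer uses repeated complete matches, each anchored at the current position so nothing is skipped. A small recursive-descent parser then deals with the parentheses. The grammar is an assumption based on the format and examples given earlier in the thread. In particular, `&` and `|` get equal precedence and are evaluated left to right. The names `TOKEN`, `tokenize` and `parse` are hypothetical.

```python
import re

# Hypothetical names throughout (TOKEN, tokenize, parse). One token per
# complete match; longer operators come first so ">=" is not read as ">".
TOKEN = re.compile(r"\s*(?:(>=|<=|<>|>|<|=)|(\d+)|([&|()]))")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        # Anchored at pos, so an unexpected character is reported, not skipped.
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        relop, number, punct = m.groups()
        if relop:
            tokens.append(("REL", relop))
        elif number:
            tokens.append(("NUM", int(number)))
        else:
            tokens.append((punct, punct))
        pos = m.end()
    return tokens

def parse(text):
    """Assumed grammar (& and | have equal precedence, left to right):
         expr := term (('&' | '|') term)*
         term := '(' expr ')' | RELOP NUMBER
    """
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise ValueError(f"expected {kind!r} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    def term():
        if peek() == "(":
            take("(")
            node = expr()  # recursion handles nested parentheses
            take(")")
            return node
        return (take("REL"), take("NUM"))

    def expr():
        node = term()
        while peek() in ("&", "|"):
            op = take(peek())
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token at {pos}")
    return tree

print(parse(">4&<10"))          # ('&', ('>', 4), ('<', 10))
print(parse(">1000|(<10&>5)"))  # ('|', ('>', 1000), ('&', ('<', 10), ('>', 5)))
```

    The parentheses are handled by the mutual recursion between `expr` and `term`, not by the regex, which only ever sees one token at a time.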

    Susan
