
need advice

  •  07-11-2012, 6:31 PM

    need advice

    Here is a problem I have been banging my head against for a couple of days.
    I was wondering if anyone could help.

    I have this class of strings with this pattern:

    (s1=)? s2? s3 s4?          '=' appears only directly after s1, and only if s1 is not empty

    s1 is  ({?\(?word( word)*\)?\}?)?  (it may be empty). If there is more than one word they MUST have () around them.
    word is made out of a specific set of characters like [%a-k]+ (it is not \w+ but this is irrelevant)
    If { is present so is }. Same for (). If they are both present, the {} are on the outside. (I don't think it matters here but I might as well say it.)
    5 examples: 'abc', '(a)', '{bbc}', '{(a bc def)}', '' (nothing)
    These are not possible: 'a bc' (no parens around 2 words), '(a}' (mismatched brackets), '({a b c})' ({} inside the parens).
    If s1 is not empty it WILL be followed by '='.
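
    To make the s1 rule concrete, here is a minimal sketch using Python's re module rather than PCRE itself. The WORD class is an assumption (any run of characters that is not whitespace, a bracket or '='), standing in for the real, narrower set. Unlike the loose pattern above, it spells out the alternatives so that mismatched brackets are rejected. The names WORD, GROUP, S1 and is_s1 are made up for this illustration.

```python
import re

# Assumption: a "word" is any run of characters other than whitespace,
# brackets and '='. The real set is narrower (something like [%a-k]+).
WORD = r"[^\s(){}=]+"

# One bare word, or one or more words wrapped in parentheses.
GROUP = r"(?:" + WORD + r"|\(" + WORD + r"(?: " + WORD + r")*\))"

# s1: empty, a group, or a group wrapped in braces (braces outermost).
S1 = r"(?:" + GROUP + r"|\{" + GROUP + r"\})?"


def is_s1(text):
    """Return True when the whole of text is a valid s1."""
    return re.fullmatch(S1, text) is not None


valid = ["abc", "(a)", "{bbc}", "{(a bc def)}", ""]
invalid = ["a bc", "(a}", "({a b c})"]
print([is_s1(t) for t in valid])    # all five should be accepted
print([is_s1(t) for t in invalid])  # all three should be rejected
```

    The same pattern text should carry over to PCRE largely unchanged, since it only uses character classes, non-capturing groups and alternation.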

    s2 is like s1, it can also be empty

    s4 is like s1 except that it cannot have {}s

    s3 is either a single word or 2 or 3 words in parens.
    Ex: '%ab', '(daa KK)', '(abc DEF ghi)'
    If there is no paren, brace or '=' to separate s3 from its neighbour, then 1 space is inserted so that adjacent words are not merged into a single word (e.g. 'a b c' would become 'abc' and we wouldn't know whether it's 1, 2 or 3 words).
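
    The spacing rule can be sketched as a small Python function that builds a string from its four parts. This is only an illustration of the rule as described; the function names needs_space and assemble are hypothetical, and the parts are assumed to be already valid.

```python
def needs_space(left, right):
    """A space is needed only when two bare words would otherwise touch."""
    if not left or not right:
        return False
    # A closing bracket or '=' on the left, or an opening bracket on the
    # right, already separates the two parts.
    return left[-1] not in ")}=" and right[0] not in "({"


def assemble(s1, s2, s3, s4):
    """Join the four parts, adding '=' after a non-empty s1."""
    out = (s1 + "=") if s1 else ""
    for part in (s2, s3, s4):
        if needs_space(out, part):
            out += " "
        out += part
    return out


print(assemble("a", "b", "AND", "(c d e)"))
print(assemble("", "", "F", "x"))
```

    Empty parts are passed as '' so that the same call shape covers every combination of missing s1, s2 and s4.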

    Examples of strings belonging to this "language" are:

    'a=b AND(c d e)'  :space added before AND
    'F x'             :s3 s4, space added after F
    '{(a plus b)}=X'  :s1=s3
    'a(ff COMP gg)(alpha %%%)'   :s2 s3 s4
    '{(many items%£)}={oblabioblada}(x ROWBOAT)(a b c d e f)'    :s1=s2 s3 s4
    'XYZ'             :s3 only as s1, s2 and s4 are all empty

    My problem is to find, in s3, either the 1st word if there is only one, or the second one if there are more (2 or 3).
    They are capitalized in the examples. Those are (let's call them) the MAIN words.
    There is no need to check that the parens balance; we know that they do.
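
    For an s3 taken on its own, the MAIN rule can be sketched like this in Python's re module (not PCRE). It assumes the same stand-in WORD class as a guess at the real character set, and it does not attempt the harder part, which is locating s3 inside a full string. The names S3 and main_word are hypothetical.

```python
import re

# Assumption: a "word" is any run of characters other than whitespace,
# brackets and '='. The real set is narrower.
WORD = r"[^\s(){}=]+"

# s3 is a single bare word, or 2 or 3 words in parentheses.
# MAIN is the bare word in the first case, the second word otherwise.
S3 = (
    r"(?P<single>" + WORD + r")"
    + r"|\(" + WORD + r" (?P<second>" + WORD + r")(?: " + WORD + r")?\)"
)


def main_word(s3):
    """Return the MAIN word of an isolated s3, or None if it is not an s3."""
    m = re.fullmatch(S3, s3)
    if m is None:
        return None
    return m.group("single") or m.group("second")


for sample in ["%ab", "(daa KK)", "(abc DEF ghi)"]:
    print(sample, "->", main_word(sample))
```

    Two named groups are used because Python has no way to share one group number across alternatives. In PCRE a branch-reset group, (?|...), might let both alternatives capture into group 1 instead.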

    I am using PCRE.
    I have found patterns to retrieve MAIN when s2 is empty and s3 is a single word but not the general case.

    This is not critical, I am doing this for fun, as I try to learn more about regexes.
    If anyone is inclined to give me advice I would appreciate it.

    Thanks
