RegexAdvice

pattern ({[|something|]})

Last post 04-20-2012, 11:50 AM by googerdi. 10 replies.
  •  04-06-2012, 5:15 AM 84908

    pattern ({[|something|]})

    Hi,

    What regular expression would grep "something" out of this pattern?

    Thanks in advance

  •  04-09-2012, 8:15 PM 84930 in reply to 84908

    Re: pattern ({[|something|]})

    I assume you are trying to use the "grep" utility to locate the word "something" anywhere in a file.

    As far as "grep" is concerned this could be something as simple as:

    grep something myfile.txt

    However, there are 3 levels of complexity that you need to consider. The "lowest" level (as it were) is the actual pattern that is seen by the "grep" utility. There are a large number of operators that can be used to build up the pattern, but which syntax variant applies depends on the version of "grep" that you are using (GNU grep is not the same as some of the older UNIX versions of grep).

    The next level of complexity is the various command-line options that control how the "grep" utility interprets the pattern. For example, there are options to direct "grep" to use a specific syntax or to assume certain pattern modifiers are set (e.g. "ignore case" or "multiline"). These are included in the "grep" command line in various ways and also depend on the "grep" variant you are using.

    The final level of complexity is to make sure that "grep" actually gets to see the pattern as you intend it to be seen. The "grep" command line is typically parsed by the shell which can perform character substitutions and interpretations of its own on the characters used in a pattern. Therefore you need to understand the way the shell works to make sure that you actually specify the pattern in such a way that it is passed to "grep" in the way you intend.
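    To make the three levels concrete: the poster later turns out to be using PHP, so here is a minimal sketch, assuming a POSIX shell, of how a PHP script could protect the pattern from the shell with escapeshellarg() and use grep's -F option (fixed string, so the regex syntax level drops out entirely). The command is only built and printed here, not run, and "myfile.txt" is just the placeholder file name from above.

```php
<?php
// Sketch: quoting a pattern for a POSIX shell before it reaches grep.
// "myfile.txt" is only the placeholder file name used in the post.
$pattern = '({[|something|]})';

// escapeshellarg() wraps the argument in single quotes (on POSIX systems),
// so the shell passes ( { [ | ] } ) through to grep unchanged.
$quoted = escapeshellarg($pattern);

// -F tells grep to treat the pattern as a fixed string, not a regex.
$command = 'grep -F ' . $quoted . ' myfile.txt';
echo $command, "\n";
```

    On Windows, escapeshellarg() uses double quotes instead, so the printed command would differ.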

    I would suggest that you tell us the actual problem you are facing along with the environment you are using (Windows vs Unix vs whatever, the actual version of "grep" etc).

    By the way, if I have misunderstood your question and you are trying to understand what the pattern '({[|something|]})' would match, then it would either return a syntax error (incorrect use of the '{' and '}' characters) or match a single character from the set "e, g, h, i, m, n, o, s, t, |" enclosed in literal braces. Depending on the "grep" you are using, the parentheses either save that match in a capture group or must themselves appear literally around it. Either way, the character set definition is poorly written, as it contains a duplicate '|' character; the pattern parser will generally ignore the duplicate.
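    For comparison, a minimal sketch of how PHP's preg_match() (PCRE, not grep) treats the same text as a pattern. It relies on PCRE's rule that a brace which does not start a valid quantifier is literal; under grep's basic syntax the parentheses would be literal as well.

```php
<?php
// Sketch: what '({[|something|]})' matches when PHP's PCRE engine reads it.
// "(" opens a capture group, "{" and "}" are literal (they do not form a
// valid {n,m} quantifier), and [|something|] is a one-character class.
$pattern = '/({[|something|]})/';

foreach (['{t}', '{|}', '{x}', '{so}', '({s})'] as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        echo "$sample matches, group 1 = {$m[1]}\n";
    } else {
        echo "$sample does not match\n";
    }
}
```

    Only a single character from the class can sit between the braces, which is why "{so}" fails.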

    Susan

  •  04-10-2012, 3:57 AM 84931 in reply to 84930

    Re: pattern ({[|something|]})

    Thanks Aussie Susan,

    Sorry, I didn't explain exactly what I meant. What I want to do is this:

    I store ({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]}) in a table. After fetching it from the database, I want to extract something1, something2, something3 and something4.

  •  04-10-2012, 9:58 PM 84936 in reply to 84931

    Re: pattern ({[|something|]})

    What database are you using? Some (such as MySQL) support the use of regex patterns in a SELECT statement, but many don't. Therefore you might not be able to perform the selection directly, and will instead need to extract the record to a text file and then use some other program to perform the extraction. Going by your original comment about using "grep", I'm guessing that this is what you are doing.

    So, does your text line look EXACTLY like this:

    ({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]})

    Are there multiple lines in the text file? Do they all look like this or are there other variations and you only want to select these lines?

    Do you want to extract the character sequences you mention as a line or as separate items?

    Perhaps providing us with an example of exactly how the source looks and how you want the output to look would help.

    If you are using grep, please remember that it works by finding lines within a file that match the pattern, but it does not do anything with the lines it finds other than send them to the output. You may find something like "awk" or "sed" better if you want the manipulation part as well.

    You might try a pattern such as

    \(\{\[\|\w+\|\]\}\)

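    to locate a single item on a line and have grep output that line (this form needs "grep -E" or "grep -P"; in plain grep, "\(" and "\{" are operators rather than literal characters). We can talk about what to do after you have answered the above questions, to make sure that I'm on the right track.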
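    The same pattern can be tried in PHP with preg_match_all() (PCRE syntax, where each backslash makes the following bracket or bar literal). A minimal sketch using the example line from above:

```php
<?php
$line = '({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]})';

// Literal "({[|", one or more word characters (\w), literal "|]})".
$pattern = '/\(\{\[\|\w+\|\]\}\)/';

$count = preg_match_all($pattern, $line, $matches);
echo "items found: $count\n";
print_r($matches[0]);

// \w does not cover spaces or punctuation, so this item is not found.
var_dump(preg_match($pattern, '({[|two words|]})'));
```

    The last call shows the limitation: any value containing characters other than letters, digits or underscores is skipped.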

    Susan

  •  04-16-2012, 7:24 AM 84967 in reply to 84936

    Re: pattern ({[|something|]})

    Thanks Susan

    First, ({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]}) is the pattern, and it can continue for as many items as needed.

    Second, "something" can be anything, even |]} or {[|. I use this to store several values in one field (and I have to, because the overhead of joining is huge; I would need a self-join, which is not possible).

    I chose this pattern because I wanted to be reasonably sure that nobody would use this combination.

    Third, I use PHP.

    Finally, that pattern doesn't solve my problem.
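    A minimal sketch of how such a field could be built in PHP, using a hypothetical encode_items() helper (the thread does not show the poster's actual storage code). It illustrates why only the complete four-character sequence "|]})" can be treated as the end of a value: a value may contain "|]}" on its own, but a value containing "|]})" would make the field ambiguous.

```php
<?php
// Hypothetical helper (not from the thread): pack values into one field
// using the "({[|" ... "|]})" markers described in the post.
function encode_items(array $items): string
{
    $field = '';
    foreach ($items as $item) {
        $field .= '({[|' . $item . '|]})';
    }
    return $field;
}

// The third value contains "|]}" but not the full "|]})" closing marker.
$field = encode_items(['something1', 'something2', 'a|]}b']);
echo $field, "\n";
```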

  •  04-16-2012, 7:33 PM 84971 in reply to 84967

    Re: pattern ({[|something|]})

    I must admit that I'm struggling to understand exactly what you are trying to do here.

    What I suggest is that you start with a sample of the line of text you are dealing with. Then with pencil and paper, write down the steps you would use to go through the line, character by character, to locate and extract the text you are after. Pretend you are a computer and what you write down is the "program" so you need to keep the steps clear and simple ("is the next character alphanumeric" or "skip forward to the next "(" character" etc.).

    When you have the set of rules, then get several other samples of the input text and check that the rules work with all of them.

    Then let us know the samples of text you used and the rules you have defined. With those we may be able to help you define a suitable pattern.
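    As an illustration of the kind of rules described above, here is one possible rule set written directly as PHP instead of on paper: find the opening marker, skip forward to the next "|]})", and take what lies between. extract_items() is a hypothetical name, and the sketch assumes a value never contains the full "|]})" sequence.

```php
<?php
// Rule-based extraction without a regex: find each opening marker,
// then skip forward to the next closing marker.
function extract_items(string $text): array
{
    $open  = '({[|';
    $close = '|]})';
    $items = [];
    $pos   = 0;

    while (($start = strpos($text, $open, $pos)) !== false) {
        $start += strlen($open);                 // first character of the value
        $end = strpos($text, $close, $start);    // next complete closing marker
        if ($end === false) {
            break;                               // unterminated item: stop
        }
        $items[] = substr($text, $start, $end - $start);
        $pos = $end + strlen($close);            // continue after the marker
    }
    return $items;
}

print_r(extract_items('({[|something1|]})({[|a |]} b|]})'));
```

    The second sample value contains a stray "|]}" followed by a space; because the rule looks for the complete "|]})", it is kept as part of the value.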

    Susan

  •  04-18-2012, 9:00 AM 84989 in reply to 84971

    Re: pattern ({[|something|]})

    How can I tell a regular expression to extract everything between ({[| and |]})? It could be anything: words, numbers, white space and even punctuation such as | ] } ).

    The only thing that I'll do before parsing is to remove ({[| and |]}) to prevent extracting unwanted ones.

    I don't know how to say "stop when you reach the |]}) sequence". ^ works only for a single character, so I can't write something like this: (\(\{\[\|(^\|\]\}\)))+

    Thanks

  •  04-18-2012, 7:13 PM 84994 in reply to 84989

    Re: pattern ({[|something|]})

    I can see a couple of issues with your pattern.

    Firstly, if we break down your pattern it becomes:

    ( - start a numbered capture group #1

    \( - match a literal open parenthesis
    \{ - match a literal open curly bracket
    \[ - match a literal open square bracket
    \| - match a literal vertical bar

    ( - start a numbered capture group #2

    ^ - match the beginning of the text or line (depending on the setting of the "multiline" option)
    \| - match a literal vertical bar
    \] - match a literal close square bracket
    \} - match a literal close curly bracket
    \) - match a literal close parenthesis

    ) - end the numbered capture group #2

    )+ - end the numbered capture group #1 and repeat it one or more times
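    Running the poster's pattern through PHP's preg_match() on a sample line shows the effect of that '^': without the multiline option it can never match, because the '^' is only reached after four characters have already been consumed. A minimal sketch:

```php
<?php
$value = '({[|something1|]})({[|something2|]})';

// The poster's attempt. Outside [...] the "^" is an anchor, so after
// "({[|" the engine would need to be at the start of the text again.
$attempt = '/(\(\{\[\|(^\|\]\}\)))+/';

var_dump(preg_match($attempt, $value));
```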

    At the start of this thread, you mentioned that you were using "grep". What follows is possibly closer to what you want, BUT it will NOT work with "grep". It will work with several other regex libraries and language classes, but you have not mentioned anything else except an unspecified database (which I think you are using to store the strings and not to process them).

    My previously suggested pattern was based on the understanding that what was between the opening and closing character sequences were alphanumeric characters as shown in your examples (one of the reasons why we always ask that you DON'T make up your example text). However if you can have any character sequence EXCEPT "|]})" then you could try

    \(\{\[\|.*?\|\]\}\)

    with the "singleline" option set if you also want it to match across line terminators. This pattern requires that the regex variant can handle lazy quantifiers. (Just a warning - the performance of this pattern can be quite poor because of the backtracking that the lazy quantifier can trigger.)
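    A minimal PHP sketch of the lazy pattern above. Two additions are assumptions of this sketch rather than part of the suggestion: a capture group around '.*?' so the value can be read from $result[1], and the 's' modifier, which is PCRE's name for the "singleline" option.

```php
<?php
$value = "({[|something1|]})({[|a |]} b|]})({[|line one\nline two|]})";

// Lazy pattern with a capture group added around ".*?", plus the "s"
// modifier so that "." also matches newline characters.
$pattern = '/\(\{\[\|(.*?)\|\]\}\)/s';

preg_match_all($pattern, $value, $result);
print_r($result[1]);
```

    Because the quantifier is lazy, the second value ends at the first complete "|]})", not at the stray "|]}" inside it.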

    An alternative is:

    \(\{\[\|((?!\|\]\}\)).)+

    again with the "singleline" option set if you want this to cross line terminators. This pattern requires that the regex variant can handle negative lookaheads. It also has the advantage (or perhaps disadvantage) that it only needs the "({[|" sequence at the start, so it will still match when the "|]})" sequence is missing right at the very end. Finally, the performance is better than the previous example because there is no backtracking involved.
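    A PHP sketch of the lookahead pattern above, run against a line whose last item has no closing marker. Each match includes the leading "({[|" (there is no capture group around the value yet), and an empty value such as "({[||]})" is not matched at all, because '+' needs at least one character.

```php
<?php
$value = '({[|something1|]})({[|unterminated';

// Each repetition checks that "|]})" does not start at the current
// position, then consumes one character.
$pattern = '/\(\{\[\|((?!\|\]\}\)).)+/s';

preg_match_all($pattern, $value, $result);
print_r($result[0]);
```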

    Susan

  •  04-19-2012, 5:18 AM 84996 in reply to 84994

    Re: pattern ({[|something|]})

    Thanks Susan

    Here is what I did:

    $value = '({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]})';
    $pattern = '/\(\{\[\|((?!\|\]\}\)).)+/';

    preg_match_all($pattern, $value, $result);
    var_dump($result);

    I get this result:

    array(2) {
      [0]=>
      array(4) {
        [0]=>
        string(14) "({[|something1"
        [1]=>
        string(14) "({[|something2"
        [2]=>
        string(14) "({[|something3"
        [3]=>
        string(14) "({[|something4"
      }
      [1]=>
      array(4) {
        [0]=>
        string(1) "1"
        [1]=>
        string(1) "2"
        [2]=>
        string(1) "3"
        [3]=>
        string(1) "4"
      }
    }
    

     

    Finally, I'd like to know what the ! in your pattern means.

  •  04-19-2012, 7:31 PM 84999 in reply to 84996

    Re: pattern ({[|something|]})

    Now we are making progress.

    It looks like you are using PHP which in turn uses the PCRE regex library - that would have been useful to know (rather than references to "grep").

    Now try changing your pattern to:

    $pattern = "/\(\{\[\|(((?!\|\]\}\)).)+)/";

    and then look at "$result[1]", which will be an array containing all of the "somethingx" strings but without the leading marker sequence.

    The '!' is not an operator in its own right - it is part of the '(?!   )' lookahead. If you break down the revised pattern I've just given, it becomes:

     /\(\{\[\|   - this is the literal match for the starting marker
    (            - start of capture group #1 - this will be the main target for extracting the text you are after
       (         - start of an inner capture group #2 - its purpose is to match one character at a time and let the quantifier repeat it as many times as necessary
          (?!\|\]\}\))   - this is a negative lookahead group. The '?!' at the start is part of the operator and the rest are the literal characters that need to be matched
          .      - if the lookahead lets us get this far, then match the next character
       )+       - the end of matching group #2 and the quantifier to make it repeat as often as possible
    )            - the end of capture group #1

    Lookaheads are a fairly advanced concept with regex patterns. I like to think of them a bit like a subroutine in a program: the environment marks where you are, it goes off to do something and then comes back and carries on from the saved spot. The same is basically true with lookaheads: the text pointer is (in effect) saved and the pattern within the parentheses is used to test the characters that follow in the text; once the test has been completed then the previous environment is restored and the regex engine continues processing as before.

    What a lookahead does is really make a "match/no-match" decision, and it is one of a group of "zero-width assertions". When the pattern contains an ordinary character such as 'D', the regex engine will try to match the next character in the text with a "D" - if it matches, the text pointer is moved on and the character is included in the current "match"; if it doesn't match, the text pointer stays where it is and the pattern is checked to see if there are any alternatives that should be tested. As you can see, when a match is made, the text pointer "consumes" the matched character.

    A "zero-width assertion" does almost the same thing, but instead of moving the text pointer on a successful match, it only tests the next character(s) to see if they match. An example is the '$' operator, which checks whether the next character is a line terminator (or the end of the text). If it is, the line terminator character remains the next character; it is not consumed.
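    A small PHP illustration of the difference between consuming and zero-width matching, assuming PCRE's default behaviour, where '$' also matches just before a newline at the very end of the text:

```php
<?php
$text = "abc\n";

// "$" is zero-width: the match stops before the newline.
preg_match('/abc$/', $text, $withAnchor);
var_dump($withAnchor[0]);

// A literal "\n" in the pattern consumes the newline, so it is part of the match.
preg_match('/abc\n/', $text, $withNewline);
var_dump($withNewline[0]);
```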

    A lookahead is exactly the same except that it can use a more complex pattern that you provide.

    All of these assertions return a "match/no-match" result. There are 2 forms of lookahead, referred to as "positive" and "negative". A positive lookahead will return a "match" if the entire lookahead pattern matches the next character(s) in the text.

    A 'negative' lookahead is similar to the "not" expression in a program - it flips the "match/no-match" result from the lookahead pattern and returns that.

    In this case we want to match each character but stop when we get to the "|]})" sequence. Therefore the lookahead pattern is used to check for exactly that character sequence but we want it to "match" if the next characters are NOT the exact sequence - therefore we need to use the negative lookahead.
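    Both forms can be compared on a short sample in PHP. A minimal sketch; the '^' anchor is added here only so that the negative example starts at the beginning of the sample:

```php
<?php
$text = 'something1|]})';

// Positive lookahead: match "1" only if "|]})" follows it.
// The lookahead is zero-width, so "|]})" is not part of the match.
preg_match('/1(?=\|\]\}\))/', $text, $positive);

// Negative lookahead, as in the thread: keep taking one character at a
// time while "|]})" does NOT start at the current position.
preg_match('/^((?!\|\]\}\)).)+/', $text, $negative);

var_dump($positive[0], $negative[0]);
```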

    I hope this all makes sense - as I said, lookaheads are regarded as an advanced topic and they can make your head spin as you try to understand them.

    Susan

  •  04-20-2012, 11:50 AM 85000 in reply to 84999

    Re: pattern ({[|something|]})

    Thanks Susan

    Your pattern worked and your description was very useful.

    Now I get this result:

    array(3) {
      [0]=>
      array(4) {
        [0]=>
        string(14) "({[|something1"
        [1]=>
        string(14) "({[|something2"
        [2]=>
        string(14) "({[|something3"
        [3]=>
        string(14) "({[|something4"
      }
      [1]=>
      array(4) {
        [0]=>
        string(10) "something1"
        [1]=>
        string(10) "something2"
        [2]=>
        string(10) "something3"
        [3]=>
        string(10) "something4"
      }
      [2]=>
      array(4) {
        [0]=>
        string(1) "1"
        [1]=>
        string(1) "2"
        [2]=>
        string(1) "3"
        [3]=>
        string(1) "4"
      }
    }
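    The third array ($result[2]) comes from inner group #2: it is repeated by '+', and a repeated capture group only keeps the text from its last repetition, which here is the final character of each value. If that array is not wanted, PCRE's non-capturing group syntax '(?:...)' can be used for the inner group. A sketch of that variation (a suggestion, not something tried in the thread):

```php
<?php
$value = '({[|something1|]})({[|something2|]})({[|something3|]})({[|something4|]})';

// Same pattern, but the inner group is written as (?: ... ) so it groups
// without capturing; only the wanted text ends up in $result[1].
$pattern = '/\(\{\[\|((?:(?!\|\]\}\)).)+)/';

preg_match_all($pattern, $value, $result);
var_dump(count($result));
print_r($result[1]);
```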
    

     
