
Need help with regex to match variables in a MediaWiki template

Last post 02-09-2010, 11:29 AM by mash. 1 replies.
  •  02-09-2010, 3:43 AM 59450

    Need help with regex to match variables in a MediaWiki template

    I'm working on a MediaWiki extension that needs to read template contents and build an array of variables from the template. The programming language is PHP 5 (I'm actually using PHP 5.2.11, but the code needs to be backward compatible to PHP 5.2).

    The variables are identified in a template as:

     {{{1}}} -- need to capture "1"

    {{{var}}} -- need to capture "var"

    {{{var|}}} -- need to capture "var"

    {{{var|undefined}}} -- need to capture "var"

    {{{var| }}} -- need to capture "var"

    {{{var name| }}} -- need to capture "var name"

    {{{var name}}} -- need to capture "var name"

    {{{var name|undefined}}} -- need to capture "var name"

    {{{var name |}}} -- need to capture "var name" (note the ending space after var name)

     

    The expression I have so far is:

    $b = preg_match_all('/\{\{\{([\w]*?)[\s+]?[\|+]?[\s+]?[a-zA-Z0-9_]?\}\}\}/', $a, $matches);

    where

    \{\{\{([\w]*?)[\s+]?[\|+]?[\s+]?[a-zA-Z0-9_]*?\}\}\}

    is my actual regex string.

    I'm not currently able to capture the name in {{{var}}}: the placeholder is matched, but the captured variable comes back blank. It looks like the second part of the expression, which matches words after the pipe symbol, is backtracking over the first match, but I don't know how to write this so that doesn't happen (I hope I'm using the right terminology here; regular expressions are fairly new for me).

    Any assistance provided will be greatly appreciated.

    Lisa

  •  02-09-2010, 11:29 AM 59504 in reply to 59450

    Re: Need help with regex to match variables in a MediaWiki template

    OK there are several issues here.

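    1. You've made every part of the pattern inside the braces optional, and since the first group is also non-greedy it tries the empty string first. Because the rest of your pattern can match the remainder of the string on its own, the engine never has to backtrack and expand the group, so group 1 is always empty.
    2. You are misusing character classes. Inside square brackets, + is a literal plus sign rather than a quantifier, so [\s+]? matches at most one character: a whitespace character or a "+". Only the last class is used correctly. This is a common error: http://regexadvice.com/blogs/mash/archive/2008/01/31/A-touch-of-Character-Class.aspx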
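
Both points can be checked with a short script. This is only a sketch: it uses the second form of the pattern quoted in the question, and the variable names ($original, $m, $plus, $spaces) are made up for illustration.

```php
<?php
// The pattern from the question (the form with the trailing *? quantifier).
$original = '/\{\{\{([\w]*?)[\s+]?[\|+]?[\s+]?[a-zA-Z0-9_]*?\}\}\}/';

// Point 1: the placeholder is matched, but the lazy group stays empty
// because the final character class consumes "var" instead.
preg_match($original, '{{{var}}}', $m);
var_dump($m[0]); // whole match
var_dump($m[1]); // group 1

// Point 2: inside brackets "+" is a literal, not a quantifier.
$plus   = preg_match('/^[\s+]$/', '+');  // a lone plus sign
$spaces = preg_match('/^[\s+]$/', '  '); // two spaces
var_dump($plus, $spaces);
```

Here the whole match is {{{var}}} while group 1 is an empty string, and the class [\s+] accepts a single "+" but rejects two consecutive spaces.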

    Based on the samples you provided, the following will return your value in group 1:

    Raw Match Pattern:
    \{\{\{([\w ]+\w)

    PHP Code Example:
     
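    <?php
    // Replace with the template text to scan.
    $sourcestring="your source string";
    // Group 1 captures the name: word characters and spaces, ending on a word character.
    preg_match_all('/\{\{\{([\w ]+\w)/',$sourcestring,$matches);
    // Dump the full match array for inspection.
    echo "<pre>".print_r($matches,true)."</pre>";
    ?>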
     

    $matches Array:
    (
        [0] => Array
            (
                [0] => {{{var
                [1] => {{{var
                [2] => {{{var
                [3] => {{{var
                [4] => {{{var name
                [5] => {{{var name
                [6] => {{{var name
                [7] => {{{var name
            )

        [1] => Array
            (
                [0] => var
                [1] => var
                [2] => var
                [3] => var
                [4] => var name
                [5] => var name
                [6] => var name
                [7] => var name
            )

    )
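
Note that [\w ]+\w needs at least two characters, which is why the listing above has no entry for {{{1}}}. A possible variant that also accepts single-character names and checks for the closing braces is sketched below. It assumes names consist only of word characters and spaces and that default values contain no braces; $samples, $pattern and $names are illustrative names, not part of the original code.

```php
<?php
// The nine sample placeholders from the question.
$samples = array(
    '{{{1}}}', '{{{var}}}', '{{{var|}}}', '{{{var|undefined}}}', '{{{var| }}}',
    '{{{var name| }}}', '{{{var name}}}', '{{{var name|undefined}}}', '{{{var name |}}}',
);
$sourcestring = implode("\n", $samples);

// \w(?:[\w ]*\w)?  : a name that starts and ends with a word character
// \s*(?:\|[^{}]*)? : optional padding and an optional "|default" part
$pattern = '/\{\{\{(\w(?:[\w ]*\w)?)\s*(?:\|[^{}]*)?\}\}\}/';

preg_match_all($pattern, $sourcestring, $matches);

// Collapse repeated names into a list of distinct variables.
$names = array_values(array_unique($matches[1]));
print_r($names);
```

With the sample input this matches all nine placeholders and leaves three distinct names: 1, var and var name. It only uses array() and PCRE features that should be available in PHP 5.2, though that has not been tested against that version here.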


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein