converting self-closing xhtml tags to html

  08-17-2010, 12:16 AM

    I have a CMS that is written to output XHTML. I am trying to write a PHP script that will convert the system, completely and permanently, to output HTML instead. The sticking point is creating a regex that finds self-closing XHTML tags (e.g., <img />) and converts them to non-self-closing HTML tags (e.g., <img>). I am running PHP on my local machine and working with the CMS source code, not its output. The CMS generates both XHTML and XML, so I can't blindly replace /> with >. My requirements:

    1. needs to replace all valid, self-closing XHTML tags (e.g., <img />) with valid, non-self-closing HTML tags (<img>).
    2. needs to leave self-closing XML (non-XHTML, e.g., <xmldoc><xmltag /></xmldoc>) alone.
    3. needs to catch optional whitespace (e.g., both <tag /> and <tag/> are converted to <tag>).
    4. needs to skip over <?php ?> blocks, including ones embedded inside a tag's attribute values.

    Right now, number 4 is the problem. (See my example below.)

    What I have so far (see my original thread at Dynamic Drive):

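    <?php
    function file_2html($file) {
        // Read the source file (PHP, XHTML, or TXT) into a string.
        $xhtmlfile = file_get_contents($file);

        // Rewrite self-closing void tags such as <img ... /> as <img ...>.
        $htmlfile = preg_replace('/<(img|hr|br|link|meta|input)([^>]*?)\s?\/>/', '<$1$2>', $xhtmlfile);

        // Overwrite the original file with the converted markup.
        file_put_contents($file, $htmlfile);
    }
    ?>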

    The $file variable is the path to a PHP, XHTML, or TXT file in the CMS. This regex works in initial testing (it meets requirements 1, 2, and 3) but fails when the tag contains a <?php ?> block, for example where the CMS inserts a variable. A tag like <link rel="stylesheet" media="screen" type="text/css" href="<?php echo $this->getStyleSheet('main.css')?>" /> isn't matched, because ([^>]*?) cannot get past the > of the closing ?>, so the match fails before it ever reaches />.
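    One possible fix, assuming every PHP block inside a tag is opened with <? and closed with ?> within that same tag: let the attribute part match either a whole <? ... ?> block or any single character other than >. The function name convert_void_tags is hypothetical. The tag list and replacement are the same as in file_2html, with \b added after the tag name (so an XML tag such as <linker /> is left alone), \s* instead of \s? so any amount of whitespace before /> is dropped, and the s modifier so a PHP block spanning several lines still matches. This is a sketch, not something tested against the real CMS.

```php
<?php
/*
 * Sketch (assumption, not tested against the real CMS): the attribute part
 * may contain either a single non-'>' character or a whole PHP block, so a
 * '>' belonging to a PHP closing tag no longer ends the match early.
 */
function convert_void_tags(string $markup): string
{
    $pattern = '/<(img|hr|br|link|meta|input)\b((?:<\?.*?\?>|[^>])*?)\s*\/>/s';
    $result = preg_replace($pattern, '<$1$2>', $markup);
    if ($result === null) {
        // preg_replace returns null on failure, e.g. when a PCRE limit is hit.
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}

$in = '<link rel="stylesheet" href="<?php echo $this->getStyleSheet(\'main.css\')?>" />';
echo convert_void_tags($in), "\n";
```

    In file_2html, the preg_replace line would become a call to convert_void_tags($xhtmlfile). Since a regex does not actually parse PHP or HTML, it is worth diffing a few converted files before running it over the whole CMS.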

    If anyone has an idea or a question, please let me know. Thanks much!
