I have a CMS that is written to output XHTML. I am trying to write a PHP script that will convert the system, completely and permanently, to output HTML instead. The sticking point is creating a regex that finds self-closing XHTML tags (e.g., <img />) and converts them to non-self-closing HTML tags (e.g., <img>). I am running PHP on my local machine and working with the CMS source code, not its output. The CMS generates both XHTML and XML, so I can't blindly replace /> with >. My requirements:
1. Needs to replace all valid, self-closing XHTML tags (e.g., <img />) with valid, non-self-closing HTML tags (<img>).
2. Needs to leave self-closing XML (non-XHTML, e.g., <xmldoc><xmltag /></xmldoc>) alone.
3. Needs to handle optional whitespace (e.g., both <tag /> and <tag/> are converted to <tag>).
4. Needs to cope with <?php ?> tags, including ones that sit inside an XHTML tag.
Right now, the fourth requirement (handling <?php ?> tags) is the problem. (See my example below.)
What I have so far (see my original thread at Dynamic Drive):
$xhtmlfile = file_get_contents($file);
$htmlfile = preg_replace('/<(img|hr|br|link|meta|input)([^>]*?)\s?\/>/', '<$1$2>', $xhtmlfile);
The $file variable is the path to a php, xhtml, or txt file in the CMS. This regex works in initial testing (it meets requirements 1, 2, and 3), but it fails when there is a <?php ?> block inside the tag, e.g., where the CMS inserts a variable. This tag isn't matched:
<link rel="stylesheet" media="screen" type="text/css" href="<?php echo $this->getStyleSheet('main.css')?>" />
As far as I can tell, the cause is that [^>]*? cannot cross the > characters in $this->getStyleSheet and in the closing ?>, so the match never reaches the final />.
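One direction that might work, as a minimal sketch and not a confirmed fix: let the attribute part of the pattern consume a whole <?php ... ?> (or <?= ... ?>) block as a single unit, so the > characters inside it no longer end the match. The function name xhtml_void_tags_to_html is hypothetical (it is not part of the CMS), and the sketch assumes that no PHP block embedded in a tag contains a literal PHP closing tag inside a string.

```php
<?php
/*
 * Hypothetical helper: converts self-closing XHTML void tags to HTML form.
 * Assumption: only the tag names listed below need converting, so XML
 * elements with other names are left untouched.
 */
function xhtml_void_tags_to_html(string $source): string
{
    $pattern = '/<(img|hr|br|link|meta|input)\b'  // tag name, whole word only
             . '((?:<\?(?:php|=).*?\?>'           // either a complete PHP block
             . '|[^<>])*?)'                       // or any character except < and >
             . '\s*\/>/s';                        // optional whitespace, then the self-closing end

    // preg_replace returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '<$1$2>', $source) ?? $source;
}

// Example input: the tag from the question plus an XML fragment.
$sample = <<<'XHTML'
<link rel="stylesheet" href="<?php echo $this->getStyleSheet('main.css')?>" />
<xmldoc><xmltag /></xmldoc>
XHTML;

echo xhtml_void_tags_to_html($sample), "\n";
```

In this sketch the link tag should come out ending in "> with the PHP block intact, and the xmltag line should be unchanged. The pattern is deliberately case-sensitive so that XML elements such as <Link /> are not touched, and the s modifier lets a PHP block span several lines.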
If anyone has an idea or a question, please let me know. Thanks much!