
finding doctype Regex

Last post 05-19-2008, 11:41 AM by mash. 6 replies.
  •  05-17-2008, 4:05 PM 42363

    finding doctype Regex

    Hi,
    I have a function that uses a regular expression to save the whole DOCTYPE line of a page in a variable:

    function get_doctype($file){
        $h1tags = preg_match('/<!DOCTYPE (\w.*)dtd">/is',$file,$patterns);
        $res1 = array();
        array_push($res1,$patterns[0]);
        array_push($res1,count($patterns[0]));
        
        return $res1;
    }
    So the variable contains the string:
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Basic//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd">

    But I would like to save only "SVG 1.1 Basic".

    So I am trying to extract it with a regular expression, like this:

    $res2 = str_replace("(.*)//DTD","", $res1);
    $res = str_replace("(^//EN) .* (dtd\")$","", $res2);

    What am I doing wrong?
    Thank you

    P.S.: If you know how, could you also help me with the encoding?
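    A note on the str_replace() attempt above: str_replace() compares its search argument literally, so "(.*)//DTD" is looked for as those exact characters and is never found. The regex counterpart is preg_replace(). The sketch below is a minimal example, assuming the DOCTYPE string has already been captured as in get_doctype(); the two patterns are illustrative, not the only way to write them.

```php
<?php
// The DOCTYPE string from the question, as captured by get_doctype().
$doctype = '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Basic//EN" '
         . '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd">';

// str_replace() searches for the literal text "(.*)//DTD", which does not
// occur in $doctype, so the string comes back unchanged.
$literal = str_replace('(.*)//DTD', '', $doctype);

// preg_replace() interprets the pattern as a regex. First strip everything
// up to and including the first "//DTD " ...
$step1 = preg_replace('#^.*?//DTD\s+#is', '', $doctype);
// ... then strip everything from "//EN" to the end of the string.
$name = preg_replace('#//EN.*$#is', '', $step1);

echo $name, "\n"; // prints: SVG 1.1 Basic
```

    The # delimiter is used so that the // sequences inside the patterns do not need escaping.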

     

     

  •  05-18-2008, 6:23 AM 42366 in reply to 42363

    Re: finding doctype Regex

    Please ...
  •  05-18-2008, 7:51 PM 42369 in reply to 42363

    Re: finding doctype Regex

    I'm not sure what the connection is between using string-replace functions (str_replace() treats its search string literally, not as a regex) and asking for a regex, but I'll let that slide. Also, I'm not sure what the rules are for how DOCTYPE declarations are formatted, or how much variability there is in the text you want to extract, but based on your pattern, I would suggest:

    <!doctype.*?//dtd\s+([^/]*)//EN.*?dtd">

    and look at match group 1 for the extracted text. I'm using the same options as in your posting (the i and s flags: case-insensitive, with . also matching newlines).

    You might be able to shorten the pattern if, for example, the text you are looking for only ever appears between the '//dtd' and '//en' strings and that pair is known not to appear elsewhere in the text; otherwise, you might want to keep the first and last parts of my suggestion so the match stays tied to the DOCTYPE declaration.
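    For reference, here is one way Susan's pattern might be used from PHP to read match group 1. It is a sketch, not a tested drop-in: the helper name doctype_name() is hypothetical, and the pattern is wrapped in # delimiters (instead of /) so its // sequences need no escaping.

```php
<?php
// Hypothetical helper: returns capture group 1 of Susan's pattern,
// or null when the page has no DOCTYPE of that shape.
function doctype_name(string $page): ?string
{
    // Susan's pattern with the i (case-insensitive) and s (dot matches newline) flags.
    $pattern = '#<!doctype.*?//dtd\s+([^/]*)//EN.*?dtd">#is';
    if (preg_match($pattern, $page, $m) === 1) {
        return $m[1]; // group 1: the text between "//DTD " and "//EN"
    }
    return null;
}

// Sample page: the DOCTYPE is split over two lines, which the s flag handles.
$page = "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1 Basic//EN\"\n"
      . "  \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd\">\n"
      . "<svg/>";

echo doctype_name($page), "\n"; // prints: SVG 1.1 Basic
```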

    Susan

     

  •  05-19-2008, 2:07 AM 42379 in reply to 42363

    Re: finding doctype Regex

    I am sorry, Susan, I don't understand.

    Could you explain it more clearly?

     

    Thanks a lot! 

  •  05-19-2008, 2:41 AM 42381 in reply to 42379

    Re: finding doctype Regex

    OK, I think I finally understand what you are trying to do. Your question wasn't very clear; in the future, please try to follow the posting guidelines as much as possible. I don't know the purpose of your application, so I don't know why you can't just match the string 'SVG 1.1 Basic'. Even if you must match it within the DOCTYPE declaration, you can match it with a second regex applied to the result of your first. I don't know PHP, so I don't know if this code is correct:

    $res2 = str_replace("(.*)//DTD","", $res1);
    $res = str_replace("(^//EN) .* (dtd\")$","", $res2);

    $res = preg_match('/SVG 1.1 Basic/is',$res1);
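    A sketch of the two-step approach described above, assuming the goal is simply to test whether the captured DOCTYPE mentions "SVG 1.1 Basic". The key detail is that the second preg_match() must receive the matched string ($patterns[0]), not an array such as $res1. The variable names $doctype and $isSvgBasic are illustrative only.

```php
<?php
$file = '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Basic//EN" '
      . '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd"><svg/>';

// Step 1: capture the whole DOCTYPE declaration (the pattern from the question).
$doctype = '';
if (preg_match('/<!DOCTYPE (\w.*)dtd">/is', $file, $patterns) === 1) {
    $doctype = $patterns[0]; // a string, not the $patterns array
}

// Step 2: run the second regex against that string. The dots are escaped
// so "1.1" only matches a literal "1.1".
$isSvgBasic = preg_match('/SVG 1\.1 Basic/i', $doctype) === 1;

var_dump($isSvgBasic); // bool(true)
```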


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  05-19-2008, 4:42 AM 42382 in reply to 42363

    Re: finding doctype Regex

    The doctype is not constant.

    That is the reason why I would like a regular expression.

     

    I am trying:

    function get_doctype($file){
        $h1tags = preg_match('/<!DOCTYPE (\w.*)dtd">/is',$file,$patterns);
        $res = array();
        array_push($res,$patterns[0]);
        array_push($res,count($patterns[0]));
        
        $res = preg_match('/XHTML 1.0 Strict/is',$res);
        
        return $res;
    }

    on a page with an XHTML 1.0 Strict doctype.

     But I get:

     Warning: preg_match() expects parameter 2 to be string, array given in my_file.php

    Thank you for replying! 

  •  05-19-2008, 11:41 AM 42388 in reply to 42382

    Re: finding doctype Regex

    OK, then Susan's regex is what you want. The value you want is in group 1. You'll need to read your documentation on how to read groups in your programming language. Here is a brief generic overview: http://regexadvice.com/blogs/mash/archive/2007/06/01/You_2700_ve-got-your-sub_2D00_matches-in-my-matches.aspx
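    In PHP, preg_match() fills its third argument with the groups: index 0 holds the whole match and index 1 holds group 1. The warning "expects parameter 2 to be string, array given" appears when that array (or another array) is passed back in as the subject. The following is a hypothetical rewrite of get_doctype() built on Susan's pattern (with # delimiters), handling pages whose DOCTYPE varies and returning null when there is none; treat it as a sketch rather than a finished function.

```php
<?php
// Hypothetical rewrite of get_doctype(): returns the text in capture group 1
// of Susan's pattern, or null when no such DOCTYPE is present.
function get_doctype(string $file): ?string
{
    $pattern = '#<!doctype.*?//dtd\s+([^/]*)//EN.*?dtd">#is';
    // preg_match() returns 1 on a match and fills $matches:
    //   $matches[0] = the whole match, $matches[1] = capture group 1.
    if (preg_match($pattern, $file, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

$pages = [
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    '<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1 Basic//EN" '
        . '"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11-basic.dtd">',
    '<html></html>',
];

foreach ($pages as $page) {
    // get_doctype() returns a string (or null), so its result can be passed
    // straight to another preg_match() without the "array given" warning.
    $name = get_doctype($page);
    echo $name === null ? '(no DOCTYPE)' : $name, "\n";
}
```

    With these three sample pages the loop should print "XHTML 1.0 Strict", "SVG 1.1 Basic" and "(no DOCTYPE)", one per line.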

    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein