
Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

Last post 11-30-2007, 2:22 PM by pritesh. 4 replies.
  •  11-30-2007, 1:12 PM 37131

    Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

    Hi,

    I am looking for a regex that strips all HTML and its text, except for <h3>, <h4> and <h5> elements and their contents.

    I am planning to use this regex in JavaScript.

     
    Example input:

    <ul>
          <li><h4>Sub Menu</h4>
              <ul>
                  <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub Menu</a></li>
                  <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub Menu</a></li>
                  <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub Menu</a></li>
                  
                  <li><h5>Sub-Sub Menu</h5>
                      <ul>
                          <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub-Sub Menu</a></li>
                          <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub-Sub Menu</a></li>
                          <li><a href="http://regexadvice.com/forums/AddPost.aspx?ForumID=68#">Sub-Sub-Sub Menu</a></li>
                      </ul>
                    </li>
              </ul>
          </li>
        </ul>

    Output: <h4>Sub Menu</h4><h5>Sub-Sub Menu</h5>

     

    Please help.

    - Pritesh
     

  •  11-30-2007, 2:05 PM 37136 in reply to 37131

    Re: Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

    Do you also need to strip HTML tags (and their contents) that are nested inside the heading tags?

    If not, you could use something like this:

    var newString = (oldString.match(/<(h[345])\b[^>]*>[\S\s]*?<\/\1>/gi) || []).join("").replace(/<\/?h[345]\b[^>]*>/gi, "");

    However, due to the nature of HTML, this task would probably be better accomplished using the DOM.
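
    For reference, here is a minimal sketch of the match-and-join idea that keeps the heading tags in the result, which is what the example output in the first post shows. The function name keepHeadings is only illustrative. The `|| []` guard is there so that input without any <h3>, <h4> or <h5> elements yields an empty string instead of throwing. The trailing .replace() from the one-liner above would remove the heading tags themselves, so it is left out here.

```javascript
// Illustrative helper (the name is hypothetical, not from the thread).
// Keeps every <h3>, <h4> and <h5> element, tags included, and drops the rest.
function keepHeadings(html) {
  // \1 forces the closing tag to match the opening one (h3 with h3, etc.).
  var headings = html.match(/<(h[345])\b[^>]*>[\S\s]*?<\/\1>/gi);
  // match() returns null when nothing is found, so fall back to an empty list.
  return (headings || []).join("");
}

var sample = '<div><p>Some text</p><h3><a href ="#">link text here</a></h3></div>';
var result = keepHeadings(sample);
console.log(result); // <h3><a href ="#">link text here</a></h3>
```

    Like any regex run over HTML, this assumes reasonably regular markup: a closing heading tag that appears inside a comment or script block, or a heading nested inside another heading of the same level, would confuse it.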


    My regex-centric blog :: JavaScript regex tester
  •  11-30-2007, 2:12 PM 37137 in reply to 37136

    Re: Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

    I need the heading tags and the text inside them in the output; everything else should be stripped, as in my example.
     

  •  11-30-2007, 2:14 PM 37138 in reply to 37137

    Re: Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

    That doesn't explicitly clarify anything.

    In any case, try what I just posted. If you need help with implementing a DOM-based solution (which would be preferable anyway in this case), you can probably find better support for that elsewhere. 
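
    A DOM-based version could look roughly like the sketch below. It assumes an environment where the parsed markup supports querySelectorAll and outerHTML (a reasonably modern browser, for instance). The function name and the DOMParser usage are illustrative assumptions, not something posted in this thread.

```javascript
// Hypothetical helper: collects the markup of all h3/h4/h5 elements found
// under `root`, in document order, and concatenates it.
function headingsFromDom(root) {
  var nodes = root.querySelectorAll("h3, h4, h5");
  var parts = [];
  for (var i = 0; i < nodes.length; i++) {
    parts.push(nodes[i].outerHTML); // outerHTML includes the heading tag itself
  }
  return parts.join("");
}

// Browser-only usage; skipped where DOMParser does not exist.
if (typeof DOMParser !== "undefined") {
  var doc = new DOMParser().parseFromString(
    '<div><p>Some text</p><h3><a href="#">link text here</a></h3></div>',
    "text/html"
  );
  console.log(headingsFromDom(doc));
}
```

    Because the browser re-serializes the parsed elements, details such as attribute spacing or quoting may differ slightly from the input string.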


  •  11-30-2007, 2:22 PM 37140 in reply to 37138

    Re: Regex to strip everything except <h3>, <h4> and <h5> tags and the text within them

    Sorry for not being clear. I think this example will help.

     Input: <div><p>Some text</p><h3><a href ="#">link text here</a></h3></div>

     Output:  

    <h3><a href ="#">link text here</a></h3>