RegexAdvice

Help with removing unwanted attributes from anchors (hyperlinks)

Last post 05-15-2008, 6:35 AM by robstoves. 5 replies.
  •  02-28-2008, 1:35 PM 39947


    Hi there,

    I've signed up on this forum in the hope that one of the RegEx experts can help me through a problem. I'm working on a site which is an exchange for MindManager maps. Users can upload their own maps and enter a title, description, tags, etc.

    I am currently using a regular expression to remove all traces of HTML from the description field before saving to the db when a user adds a new map. This works just fine; however, I would like people to be able to include a hyperlink or two in the description if they so desire.

    My goal is to remove all attributes other than:

    • href
    • target

    Target must be forced to "_blank".

    I don't want any onclick, class, id, etc. attributes left in.

    Here's an example:

    User input - <a href="http://www.mycoolsite.com" onclick="sneakyFunction()" name="myLink" class="clsLink">Visit my cool site</a> 

    Becomes -  <a href="http://www.mycoolsite.com" target="_blank">Visit my cool site</a>

    I plan to use the RegExp replace function to achieve this (provided I can sort out the pattern!) 

    I am using ASP VBScript on this site.

    If anyone could offer some advice on this I would be most grateful!

    All the best,

    Nick

  •  02-28-2008, 3:01 PM 39949 in reply to 39947


    <%
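    ' Keeps plain text, reduces each <a ...> tag to its href attribute,
    ' keeps closing </a> tags, and drops every other tag.
    Function sanitizehtml(ByVal html)
      Dim regEx, Matches, Matches2, z, text
      Set regEx = New RegExp
      regEx.Global = True
      regEx.IgnoreCase = True
      ' Tokenize: each match is either a tag or a run of text up to the next tag.
      ' The "|$" alternative keeps text that follows the last tag; without it,
      ' trailing text such as " text2" in the sample below never matches and is lost.
      regEx.Pattern = "<[^>]*>|[\S\s]+?(?=<[^>]*>|$)"
      Set Matches = regEx.Execute(html)
      For z = 0 To Matches.Count - 1
        regEx.Pattern = "^<[^>]*>$"
        If regEx.Test(Matches(z)) Then
          ' The token is a tag: look for an anchor with a quoted href.
          regEx.Pattern = "^<a\s+[^>]*(href=(['""])[\S\s]*?\2)"
          If regEx.Test(Matches(z)) Then
            Set Matches2 = regEx.Execute(Matches(z))
            ' Rebuild the tag with the href attribute only.
            text = "<a " & Matches2(0).SubMatches(0) & ">"
          Else
            regEx.Pattern = "^<\s*/a>$"
            If regEx.Test(Matches(z)) Then
              text = Matches(z)   ' closing </a>: keep as-is
            Else
              text = ""           ' any other tag: drop
            End If
          End If
        Else
          text = Matches(z)       ' plain text: keep as-is
        End If
        sanitizehtml = sanitizehtml & text
      Next
    End Function

    input = "text0<a test> text1 <a href=""http://www.mycoolsite.com"" " & _
            "onclick=""sneakyFunction()"" name=""myLink"" class=""clsLink"">" & _
            "Visit my cool site</a> text00 text01 <sometag> text2"
    response.write Server.HTMLEncode(sanitizehtml(input))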
    %>

     

    In your code, you wouldn't use:

    response.write Server.HTMLEncode(sanitizehtml(input))

    you would just use:

    yourvar = sanitizehtml(input)

    The HTMLEncode was just to display what the code is doing to the string.
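For readers who want to try the same token-by-token idea outside ASP, here is a minimal Python sketch. Python is an assumption of this note (the thread itself uses VBScript), and the names `sanitize_html`, `TOKEN`, `TAG`, `ANCHOR_OPEN`, and `ANCHOR_CLOSE` are made up for illustration. The token pattern accepts end of input as well as the next tag, so text after the last tag is kept.

```python
import re

# A token is either a tag, or a run of text up to the next tag or end of input.
TOKEN = re.compile(r"<[^>]*>|[\s\S]+?(?=<[^>]*>|\Z)")
# Whole-token check: is this token a tag?
TAG = re.compile(r"<[^>]*>")
# Opening anchor carrying a quoted href; group 1 is the href attribute itself.
ANCHOR_OPEN = re.compile(r"""^<a\s+[^>]*(href=(['"])[\s\S]*?\2)""", re.IGNORECASE)
# Closing anchor tag.
ANCHOR_CLOSE = re.compile(r"^<\s*/a>$", re.IGNORECASE)


def sanitize_html(html: str) -> str:
    """Keep text, keep anchors reduced to href only, drop all other tags."""
    out = []
    for match in TOKEN.finditer(html):
        token = match.group(0)
        if not TAG.fullmatch(token):
            out.append(token)  # plain text: keep as-is
            continue
        opened = ANCHOR_OPEN.match(token)
        if opened:
            out.append("<a " + opened.group(1) + ">")  # href only
        elif ANCHOR_CLOSE.match(token):
            out.append(token)  # closing </a>: keep
        # any other tag is dropped
    return "".join(out)


sample = (
    'text0<a test> text1 <a href="http://www.mycoolsite.com" '
    'onclick="sneakyFunction()" name="myLink" class="clsLink">'
    'Visit my cool site</a> text00 text01 <sometag> text2'
)
print(sanitize_html(sample))
```

Like the VBScript above, this is a regex filter rather than an HTML parser, so attribute values that contain `>` will confuse it.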




  •  02-29-2008, 4:27 AM 39958 in reply to 39949


    Hi ddrudik,

    This is great! You sir - are a God. 

    I made one tweak to add in the target="_blank" attribute. It's done. I owe you a pint.

    Many thanks,

    Nick
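The tweak itself is not shown in the thread. One plausible shape for it, sketched in Python rather than VBScript, is to rebuild each opening anchor from its href and append a fixed `target="_blank"`. The function name `force_blank_target` and the pattern name `ANCHOR_OPEN` are hypothetical; in the VBScript the change would presumably go in the line that builds the `"<a " & ... & ">"` string.

```python
import re

# Opening anchor with a quoted href; group 1 captures the href attribute.
# The trailing [^>]*> consumes every other attribute up to the end of the tag.
ANCHOR_OPEN = re.compile(
    r"""<a\s+[^>]*(href=(['"])[\s\S]*?\2)[^>]*>""", re.IGNORECASE
)


def force_blank_target(html: str) -> str:
    """Rewrite each <a ...> to carry only its href plus target="_blank"."""
    return ANCHOR_OPEN.sub(
        lambda m: '<a ' + m.group(1) + ' target="_blank">', html
    )


user_input = (
    '<a href="http://www.mycoolsite.com" onclick="sneakyFunction()" '
    'name="myLink" class="clsLink">Visit my cool site</a>'
)
print(force_blank_target(user_input))
```

This only rewrites anchors; it does not strip other tags, so it would sit alongside the tag-removal step rather than replace it.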

  •  05-14-2008, 6:36 AM 42235 in reply to 39949


    Dear God (ddrudik)

    I need to do something very similar to Nick but am having difficulty adapting your code.  I wonder if you could cast your knowledgeable eyes over my requirements?

    Input string
    <div>test</div><p>Paragraph text <a href="someurl.com">link text</a></p>

    Required output string
    <div>test</div><p>Paragraph text link text</p>

    So basically I need to remove the start and end anchor tags but leave the link text and any other html intact.

    Thanks in advance for any help you can give me

    Rob

  •  05-14-2008, 7:14 PM 42265 in reply to 42235


    Rob,

    Sorry I'm not God/ddrudik, but I have a suggestion: try using the pattern

    <a\s+[^>]*>([^<]*)</a>

    with the replacement string of

    $1

    and the 'ignore case' option set if you are ever likely to have the tag name in upper case.
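As a quick check of this pattern against Rob's sample, here is a small Python sketch (Python is an assumption of this note; in VBScript the same pattern would go through `RegExp.Replace` with `$1`). Python writes the replacement as `\1`, and the names `ANCHOR` and `strip_anchors` are made up for illustration.

```python
import re

# Susan's pattern: an opening <a ...>, link text with no nested tags, then </a>.
ANCHOR = re.compile(r"<a\s+[^>]*>([^<]*)</a>", re.IGNORECASE)


def strip_anchors(html: str) -> str:
    """Replace each matched anchor element with its link text (group 1)."""
    return ANCHOR.sub(r"\1", html)


rob_input = '<div>test</div><p>Paragraph text <a href="someurl.com">link text</a></p>'
print(strip_anchors(rob_input))
```

Because the link text is matched with `[^<]*`, an anchor whose text contains another tag (for example `<a href="x"><b>bold</b></a>`) is left untouched.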

    If you want to be more specific and only get tags with the 'href' attribute in them somewhere, then use the pattern:

    <a\s+(?:(?=.*?href)[^>]*)>([^<]*)</a>

    Susan
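One thing worth checking with the href-specific pattern: the lookahead `(?=.*?href)` is not limited to the tag, so it can be satisfied by the word "href" appearing later on the same line. The Python sketch below (an illustration, not part of Susan's answer) compares it with a variant whose lookahead is bounded by `[^>]*`. The names `LOOSE` and `BOUNDED` and the sample strings are hypothetical.

```python
import re

# Pattern as posted: the lookahead may scan past the end of the tag.
LOOSE = re.compile(r"<a\s+(?:(?=.*?href)[^>]*)>([^<]*)</a>", re.IGNORECASE)
# Possible tightening: the lookahead cannot cross the closing ">" of the tag.
BOUNDED = re.compile(r"<a\s+(?=[^>]*href)[^>]*>([^<]*)</a>", re.IGNORECASE)

with_href = '<a href="someurl.com">link text</a>'
no_href = '<a name="top">see the href docs</a>'

# Both patterns strip an anchor that really has an href attribute.
print(LOOSE.sub(r"\1", with_href))
print(BOUNDED.sub(r"\1", with_href))

# Only the loose pattern strips the anchor that merely mentions "href" in its text.
print(LOOSE.sub(r"\1", no_href))
print(BOUNDED.sub(r"\1", no_href))
```

Whether the stricter behavior matters depends on the input; for Rob's sample both patterns give the same result.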

     

  •  05-15-2008, 6:35 AM 42293 in reply to 42265


    Thanks so much Susan, that works a treat. :)
