
removing html links from block of text

Last post 07-24-2008, 6:34 AM by ddrudik. 3 replies.
  •  07-23-2008, 12:56 PM 44493

    removing html links from block of text

    I have a public form on a website where users enter information that is saved to a database. Lately I have been getting a lot of submissions containing links to porn sites. I would like to remove any links from the submissions before they are saved, and would like to use a regular expression to do that. I am using VBScript on an IIS server. I wouldn't mind removing all HTML tags from the submission. I tried the HTML 4.01 Elements pattern from regexlib.com, but it gave me a script error, so I thought I would try removing just the anchor tags.

    Any help would be appreciated!


    Here is what I am trying, but it removes everything:

    'this is one of the strings that was submitted recently
    strNatComplaint = "<a href="">test</a> Buy Viagra Buy Viagra: http://www.google.com/notebook/public/18056478031080516769/BDQPCQwoQqtfT27Qj <a href=http://www.google.com/notebook/public/18056478031080516769/BDQPCQwoQqtfT27Qj>Buy Viagra</a>"

    strRegPattern = "(<[aA]\s).+(</[aA]>)"
    blnIgnoreCase = true
    strReplace = ""
    strNatComplaint = ereg_replace(strNatComplaint,strRegPattern,strReplace,blnIgnoreCase)

    'found this function on the web which looked like it would work
    function ereg_replace(strOriginalString, strPattern, strReplacement, varIgnoreCase)
        ' Function replaces pattern with replacement
        ' varIgnoreCase must be TRUE (match is case insensitive) or FALSE (match is case sensitive)
        dim objRegExp : set objRegExp = new RegExp
        with objRegExp
            .Pattern = strPattern
            .IgnoreCase = varIgnoreCase
            .Global = True
        end with
        ereg_replace = objRegExp.replace(strOriginalString, strReplacement)
        set objRegExp = nothing
    end function

     

  •  07-23-2008, 1:16 PM 44496 in reply to 44493

    Re: removing html links from block of text

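    The .+ in your pattern is greedy, so it matches from the first <a all the way to the last </a>, which is why everything gets removed. Try a lazy quantifier (.+?) instead:

    strRegPattern = "<[aA]\s+href.+?</[aA]>"

    That should delete each hyperlink separately. I don't do VBS, so no warranties are attached.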
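
    To see the difference between the greedy and lazy versions, here is a minimal sketch. It assumes the script is run from the command line with cscript (in an ASP page, WScript.Echo would be Response.Write). StripPattern and strSample are hypothetical names made up for this example, and since IgnoreCase is set, a plain a stands in for [aA].

    ```vbscript
    ' Hypothetical helper for this sketch: deletes every match of strPattern.
    Function StripPattern(strInput, strPattern)
        Dim objRegExp
        Set objRegExp = New RegExp
        objRegExp.Pattern = strPattern
        objRegExp.IgnoreCase = True   ' so "a" also matches "A"
        objRegExp.Global = True       ' replace every match, not just the first
        StripPattern = objRegExp.Replace(strInput, "")
        Set objRegExp = Nothing
    End Function

    Dim strSample
    strSample = "<a href=x>one</a> keep this <a href=y>two</a>"

    ' Greedy: .+ runs from the first "<a " to the last "</a>",
    ' so the text between the two links is removed as well.
    WScript.Echo "[" & StripPattern(strSample, "<a\s.+</a>") & "]"          ' prints []

    ' Lazy: .+? stops at the nearest "</a>", so each link is removed on its own.
    WScript.Echo "[" & StripPattern(strSample, "<a\s+href.+?</a>") & "]"    ' prints [ keep this ]
    ```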

     

  •  07-23-2008, 2:02 PM 44497 in reply to 44496

    Re: removing html links from block of text

    Thanks, that worked!
    I was thinking that we might not want to rely on href being the first attribute (<a href...), so I took out the href piece, added a couple of attributes to the anchor tag before the href, and it still works:

     
    strRegPattern = "<[aA]\s+.+?</[aA]>"

    thanks again!
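
    For anyone checking the same thing, a rough sketch of that test follows. It assumes the relaxed pattern is the earlier one with only the href removed, and that it is run with cscript (WScript.Echo would be Response.Write in ASP). The sample string and the example.com URL are made up for illustration.

    ```vbscript
    Dim objRegExp, strSample
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True   ' so "a" also matches "A"
    objRegExp.Global = True

    ' Hypothetical submission where another attribute comes before href.
    strSample = "before <a class=""spam"" href=""http://example.com/"">Buy</a> after"

    ' href must directly follow "<a ", so this link is not matched and survives.
    objRegExp.Pattern = "<a\s+href.+?</a>"
    WScript.Echo objRegExp.Replace(strSample, "")

    ' Anything may follow "<a ", so the link is removed whatever the attribute order.
    objRegExp.Pattern = "<a\s+.+?</a>"
    WScript.Echo objRegExp.Replace(strSample, "")
    ```

    The \s after the a is worth keeping: without it the pattern would also start matching at other tags whose names begin with a.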
     

  •  07-24-2008, 6:34 AM 44520 in reply to 44497

    Re: removing html links from block of text

    Note that in ASP/VBScript the . (dot) does not match \n, so you would need to use something like [\S\s] instead to match links that contain newline characters.
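
    A small sketch of the difference, assuming a link whose text is split across two lines and a script run with cscript (use Response.Write instead of WScript.Echo in ASP). The sample string is made up for illustration.

    ```vbscript
    Dim objRegExp, strSample
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True

    ' Hypothetical submission: the link text contains a line break.
    strSample = "x <a href=y>Buy" & vbCrLf & "now</a> z"

    ' The dot cannot cross the line break, so this pattern finds no match
    ' and the link would be left in place.
    objRegExp.Pattern = "<a\s+.+?</a>"
    WScript.Echo objRegExp.Test(strSample)

    ' [\s\S] matches any character, line breaks included, so the link is removed.
    objRegExp.Pattern = "<a\s+[\s\S]+?</a>"
    WScript.Echo objRegExp.Replace(strSample, "")
    ```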

