
Trying to flag a string containing odd/strange characters

  •  10-15-2006, 12:35 AM


    I've looked all over this site, and while I can find bits and pieces that may be helpful for what I'm needing, I can't find anything that covers my exact needs.  I'm hoping someone can help me with filling in the gaps.


    I get a ton of email on some of my accounts, and sometimes the majority of it is spam.  I've been using filters to delete the messages that I know are clearly not valid.


    What I'm stuck on is how to flag the messages with either a From name or a Subject line that looks something like these examples:


    ß⁄¶b∫Ù∏Ù•@¨…¡»ø˙±N™Ò§Q¶~§F°A®Ï§µ§—ß⁄§~™æπD°A≥o≠”§~¨O≥ön™∫§Ë™k


    /©›/±z/∑~/¡Z/™∫/¶n/¿∞/§‚


    ßÔ≈‹¶€§v§@•Õ™∫≠›¬æ


    ª¥√P®…®¸®S≠tæ·


    ∫Ù∏Ù≠›Æt¡Õ∂’ßA§@©w≠n™æπD


    •˛∞Í≥çjProxy®—¿≥Ø∏


    °Ω∞”∞»πq§l≥¯•Dæ˜Ø≤•Œ°Ω


    From my searches on this site, I came up with this RegEx:  [^(\p{IsBasicLatin}){Sm}]  That still flags some valid characters, though, and since I want to use this not just to flag the emails but to delete them automatically, I need a more accurate result.  This RegEx still allows characters such as ™, ©, ®, /, …, etc.  While I know I can list these characters explicitly, I'm concerned about the characters I don't know about that I should also add to the RegEx.  I would hope these kinds of characters are grouped together somewhere and not mixed in with the odder characters.
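
    One way to see how these characters are grouped is to sort them by Unicode general category.  The sketch below is in Python rather than the regex flavor above, and the function name and sample string are made up for illustration.  It is only a way to inspect the characters, not a finished filter.  (As far as I can tell, inside a character class the parentheses and {Sm} are treated as literal characters, so the class above ends up matching any character outside Basic Latin.)

```python
import unicodedata

def non_ascii_by_category(text):
    """Group each non-ASCII character in text by its Unicode general category."""
    groups = {}
    for ch in text:
        if ord(ch) > 0x7F:  # outside Basic Latin (U+0000..U+007F)
            groups.setdefault(unicodedata.category(ch), set()).add(ch)
    return groups

# Hypothetical sample: a few legitimate symbols followed by a fragment
# taken from one of the subject lines above.
sample = "Acme™ © 2006 … ßÔ≈‹"
for category, chars in sorted(non_ascii_by_category(sample).items()):
    print(category, "".join(sorted(chars)))
```

    Symbols such as ™ and © land in category So (other symbols), while characters such as ß and Ô are ordinary letters (Ll, Lu), so a category alone will not separate spam from text that is legitimately accented.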


    I've thought about requiring that the string contain 3 or 5 of the odd characters before deleting the email, which would catch the majority of these emails.  But it might not catch emails in which the spammers use the odd characters as replacements for alpha characters (π = n, Ë = E, etc.).
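
    The threshold idea could be sketched roughly like this in Python.  The allow-list contents, the function names, and the default of 3 are assumptions chosen for illustration, not tested settings.

```python
# Hypothetical allow-list of non-ASCII symbols that show up in valid mail.
ALLOWED = set("™©®…")

def count_odd(text, allowed=ALLOWED):
    """Count characters outside Basic Latin that are not on the allow-list."""
    return sum(1 for ch in text if ord(ch) > 0x7F and ch not in allowed)

def looks_like_spam(text, min_count=3):
    """Flag the text only when it holds at least min_count odd characters."""
    return count_odd(text) >= min_count

print(looks_like_spam("ßÔ≈‹¶€§v§@•Õ™∫≠›¬æ"))       # subject line from above
print(looks_like_spam("Acme™ newsletter © 2006"))  # only allow-listed symbols
```

    A subject with only one or two substituted letters stays under the threshold, so this sketch has the same gap described above.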


    Can anyone suggest how to go about handling this?

