Hashtag regex

Last post 07-29-2010, 4:07 AM by IanBlack. 4 replies.
  •  07-27-2010, 7:29 AM 70151

    Hashtag regex

    Hi there,

    I'm trying to match hashtags within a string in Javascript.

    So far I have "/#\w+/g" which is working great and matches all of the hashtags within the string except that it is also picking up on html codes.

    So for example the #33 inside &#33; is being matched. I would like to only match the hashtags if they do not start with a & and end in a ;

    I've tried using a negative lookbehind, ?!=& but that doesn't seem to be working.

    Any help is greatly appreciated!

    Ian. 

  •  07-27-2010, 11:56 AM 70156 in reply to 70151

    Re: Hashtag regex

    Without knowing what your source looks like I don't really have a suggestion to handle all cases. http://regexadvice.com/blogs/mash/archive/2007/10/01/Remember-where-you-come-from.aspx

    But the JavaScript regex engine does not support lookbehinds, so you'll have to approach your problem differently regardless.
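
    One possible way to approach it differently, as a rough sketch only: capture an optional "&" before the "#" and an optional ";" after it, then decide in code whether the match was an HTML code. The function name findHashtags and the \w+ tag pattern are illustrative assumptions, not something from the original question.

```javascript
// Sketch: stand in for a "not preceded by &" check without using lookbehind.
// findHashtags is a hypothetical helper name; \w+ is an assumed tag pattern.
function findHashtags(text) {
    var re = /(&?)#(\w+)(;?)/g;
    var tags = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        // "&#...;" looks like a character reference, so skip it
        if (m[1] === "&" && m[3] === ";") {
            continue;
        }
        tags.push("#" + m[2]);
    }
    return tags;
}
```

    Capturing the surrounding characters and checking them afterwards keeps the pattern simple, at the cost of doing the filtering outside the regex.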


    Michael

    "In theory, theory and practice are the same. In practice, they are not."
    Albert Einstein
  •  07-28-2010, 5:06 PM 70183 in reply to 70156

    Re: Hashtag regex

    Sorry, I thought I had given enough information about what was in the string.

    The string will be a tweet from twitter so could be:

    "Just finished watching season 1 of #kings and sad It got cancelled &#33; Would have been an amazing show if it was given a chance"

    As well as hashtags, tweets sometimes include HTML codes. I'm not sure why, but it just happens sometimes; the one in that tweet corresponds to an exclamation mark.

    I would like to find all occurrences of the hashtags so that I can turn them into links, but leave alone any instance that is actually an HTML code (starts with a & and ends with a ;).

    Thanks for the info about the lookbehind. I didn't realise that it didn't work in JavaScript! :)

    Ian. 

  •  07-28-2010, 7:07 PM 70185 in reply to 70183

    Re: Hashtag regex

    Given the limitations of the JavaScript regex variant (not just the lack of lookbehinds that Mash mentioned, but also the lack of atomic groups), a possible way would be to use:

    \#(x?[\da-f]+;|(\w+))

    You may not need the '\' before the '#': the JavaScript regex doesn't support comments either, so it may well not treat the '#' as the start of a comment. Then look to see whether anything is captured in match group #2. If there is, it is your "hashtag" (assuming '\w+' will match any possible hashtag; I have no idea, so you may need to modify this bit). If not, the match was a character encoding.

    Based on this, you can decide if you want to process the match (using the location and length information in the match object) or skip over it.

    By the way, I've not fully tested this but that part of the pattern should handle both decimal and hexadecimal encoded characters.
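
    As a rough sketch of that idea (not fully tested against real tweets): loop over the matches with exec and keep only those where group 2 captured something. The function name listHashtags, the returned object shape, and the added "g" and "i" flags are assumptions made for illustration.

```javascript
// Sketch: match both character encodings and hashtags, then keep only hashtags.
// listHashtags and the { tag, index, length } shape are hypothetical.
function listHashtags(text) {
    // "g" finds every match; "i" is an assumption so uppercase hex digits match too
    var re = /#(x?[\da-f]+;|(\w+))/gi;
    var found = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        // Group 2 only participates when the \w+ alternative matched
        if (m[2]) {
            found.push({ tag: m[0], index: m.index, length: m[0].length });
        }
    }
    return found;
}
```

    One limitation to be aware of: a hashtag made up only of hex digits and followed directly by a semi-colon (for example "#cafe;") would be treated as a character encoding and skipped.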

    Susan

  •  07-29-2010, 4:07 AM 70209 in reply to 70185

    Re: Hashtag regex

    Thanks Susan, you have helped me solve it. I wasn't thinking that I could match both first and then check afterwards whether one of them ended in a semi-colon.

    I've ended up using this regex which also allows hyphens:

    #([a-z0-9-]+;|[a-z0-9-]+)

     

    So, just for interest, or for anyone else who comes across the same problem, my code looks like this:

    function convertLinks(text)
    {
        return text.replace(/#([a-z0-9-]+;|[a-z0-9-]+)/ig, function (t) {
            if (t.substring(t.length - 1, t.length) == ";") {
                // ends with a semi-colon so must be an html code: just return the original text
                return t;
            } else {
                // otherwise it is a hashtag: encode the "#" and wrap the tag in a search link
                var tag = t.replace("#", "%23");
                return "<a href='http://search.twitter.com/search?q=" + tag + "' target='_blank'>" + t + "</a>";
            }
        });
    }
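
    For comparison, here is a variant sketch that follows the capture-group idea more directly: the replace callback receives the group as its second argument, so there is no need to inspect the last character. The name linkHashtags and the use of encodeURIComponent are assumptions for illustration; the search URL is the one used above.

```javascript
// Sketch: decide inside the replace callback using the capture group.
// linkHashtags is a hypothetical name; the pattern combines the two posted above.
function linkHashtags(text) {
    return text.replace(/#(?:x?[\da-f]+;|([a-z0-9-]+))/gi, function (match, tag) {
        // No captured tag means the first alternative matched: a character encoding
        if (!tag) {
            return match;
        }
        var query = encodeURIComponent("#" + tag); // "#" becomes "%23"
        return "<a href='http://search.twitter.com/search?q=" + query +
            "' target='_blank'>" + match + "</a>";
    });
}
```

    With this variant a hashtag that happens to be followed by a semi-colon, such as "#kings;", is still linked, because the first alternative only accepts decimal or hexadecimal digits before the semi-colon.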

    Thanks to both of you for the help!

    Ian. 
