Got more questions? Find advice on: ASP | SQL | XML | Windows
in Search
Welcome to RegexAdvice Sign in | Join | Help

Optimizing a string replacement algorithm in javascript

Last post 05-10-2011, 12:56 AM by Aussie Susan. 3 replies.
Sort Posts: Previous Next
  •  05-09-2011, 12:04 AM 81965

    Optimizing a string replacement algorithm in javascript

    Ok so I have a rather slow algorithm that I would really love some advice to speed up! It may repeat many times over different large data strings. The string 'data' contains "variables" (denoted by ${myVarName} in the data) that must be replaced with the value of the corresponding (either by id or name) element on the html page. I use javascript and jquery to do this. In particular, the line '\\$\\{'+rVars[key].replace(/([(])/g,"\\(").replace(/([)])/g,"\\)")+'\\}' surely can be replaced with some more efficient regex that I am not wizardy enough to invent. Perhaps I should be approaching this an entirely different way? I appreciate any help or advice!
     

    remainingVars = data.match(/\$\{.*?\}/g);
    rVars = new Array();
    reg = '';
    if(remainingVars != null) {
    for(key=0; key<=remainingVars.length; key++) {
    if(remainingVars[key] != undefined && remainingVars[key].length != undefined) {

    rVars[key] = remainingVars[key].substr(2,remainingVars[key].length-3);

    // In practice the following $('body') call is refined to a smaller context. 
    varVal = $('body').filter('[name="'+rVars[key]+'"],[id="'+rVars[key]+'"]').val();

    reg = '\\$\\{'+rVars[key].replace(/([(])/g,"\\(").replace(/([)])/g,"\\)")+'\\}';

    if(varVal != undefined) {
    data = data.replace(RegExp(reg,'g'),varVal);
    } else {
    data = data.replace(RegExp(reg,'g'),'');
    }
    }
    }
  •  05-09-2011, 2:42 AM 81966 in reply to 81965

    Re: Optimizing a string replacement algorithm in javascript

    (For what follows, I'm using "raw" patterns etc - you may need to add whatever is needed to make the characters be passed through to the regex engine correctly.)

    For a start, I would replace the '([(])' with '\('. There is no need to create a complete character set definition within the regex just to match against a single character. Also, as you are not using anything that is captured as part of the matching process, then I would also drop the surrounding parentheses. That would match the matching patterns

    \(

    and 

    \)

    Without being told what "rVars[key]" could possibly be expanded to, it is hard to say what other improvements could be made.

    However, when anyone starts talking about speeding up a regex match (or replacement), the first question is "how much of a problem is this really?". For example, if it takes 1 second and is repeated every hour, then is is really not worth spending much time on. However if it takes 1 second and is repeated every second, then it IS worth the effort.

    The next part is to examine exactly what is the slowest part of the process? You are doing 3 regex replacements as far as I can see within each iteration of the outer loop. The first 2 would appear to be modifying a regex pattern that is then used for (one of the) 3rd replacement operation. Which is actually causing the problem.

    If the "rVars" values are not changing, is it possible to do the replacements within these strings outside the loop so they get done once and are then used whenever the main part gets called?

    Also, it looks like you are scanning some piece of text ("data" ?) once for each of the "rVars" patterns (guessing here). It may be easier to scan the document once (or as few times as possible) with a combined pattern (and possibly more complex replacement string) - benchmarking may be needed once you know where the bottlenecks are.

    I would suggest that you give us some of the text that you are scanning and some examples of the patterns you are using and their corresponding replacement strings - it could also be that you are doing something with the patterns themselves (which we can't see) that is bordering on one of the pathological problem cases.

    Susan

     

  •  05-09-2011, 10:02 PM 81981 in reply to 81966

    Re: Optimizing a string replacement algorithm in javascript

    Thanks for the reply! The suggestion about brackets is certainly helpful. To expand on my situation: the above replacements may be done hundreds of times (on different datasets that cannot be combined) with the user waiting, resulting in a lag of a few seconds. It would be nice to make this lag less noticeable.

    The actual contents of 'data' may be very large, but are essentially snippets of xml. There may be hundreds of variables to be replaced. There will not be any curly brackets except for cases that need replacement. There may be open brackets outside the variable names. Something like:

    data = "<div>Some text ${replaceMe}</div><div>Some text with the same variable again ${replaceMe} and more text with some (brackets).</div><div>More text ${imAnArray(1)}</div>";

    Then on the page there are html elements for the values, like:

    "<input id='replaceMe' value='thanks' /><select><option id='imAnArray(1)' value='iAm1' /><option id='imAnArray(2)' value='iAm2' /></select>"

    The above form of remainingVars[key]=${imAnArray(1)} is about as complicated as it gets. There will not be quotes or further brackets inside it. rVars[key] is changing within the loop to correspond to each of 'replaceMe' and 'imAnArray(1)'. As you mentioned, the two replacements are done just to escape the brackets (if they exist) for use in the regex.

    Thanks again for your help! 

  •  05-10-2011, 12:56 AM 81983 in reply to 81981

    Re: Optimizing a string replacement algorithm in javascript

    I'm probably having a "grey" moment (that is a "blonde" moment but for us oldies) but I really don't understand what you are trying to do.

    With the double-replace line it looks like you are simply replacing one character with another. Rather than using a regex, a simple string function may give better performance, especially if the pattern changes all the time (I think that is what you have said) which means that the regex engine can't cache the pattern from one invocation to the next.Most programming languages have faster string processing functions than any regex.

    if you are repeatedly scanning the source text looking for the "${xxx}" pattern, it might be quicker to get the regex to look just for that generic pattern and capture the 'xxx" in between - something like:

    \$\{([^}]*)\}

    and use the captured text in match group #1. Then you can use the "xxx" as the key in an associative array to lookup the value that should be the "replacement" string. Remembering to work backwards (so the match offset values are not altered by string replacements further down the text, some pseudo-code such as:

    match-collection AllTargets = source-text.matchall( "\$\{([^}]*)\}")
    from the last AllTargets to the first as CurrentTarget
      TargetText = TargetText.substring(CurrentTarget.StartOffset, currentTarget.Length, ReplacementValues[CurrentTarget.CapturedText(1)])
    next

    This will use a single regex (generally the heaviest processing) one and then use the (generally light-weight) string functions to do the sub-string replacements.

    Susan

View as RSS news feed in XML