Wednesday, 1 February 2012

C# Replace Optimization

You have a method in your C# program that replaces many strings with other strings. This method can require a lot of time to run because it tries to replace so many things. In this article, we look at a way you can improve the performance of this kind of method without using any advanced data structures.

Examples

Let's look at the initial version of the method. It runs four Replace calls on the formal parameter to the method. String literals are used as the argument. When looking at the string literals, you can see that they contain some common substrings: two contain "<span>C" and two contain "<span>D". This property can be used to optimize.
First version of method [C#]

static string A(string text)
{
    text = text.Replace("<span>Cat ", "<span>Cats ");
    text = text.Replace("<span>Clear ", "<span>Clears ");
    text = text.Replace("<span>Dog ", "<span>Dogs ");
    text = text.Replace("<span>Draw ", "<span>Draws ");
    return text;
}

Second version of method [C#]

static string B(string text)
{
    if (text.Contains("<span>C"))
    {
 text = text.Replace("<span>Cat ", "<span>Cats ");
 text = text.Replace("<span>Clear ", "<span>Clears ");
    }
    if (text.Contains("<span>D"))
    {
 text = text.Replace("<span>Dog ", "<span>Dogs ");
 text = text.Replace("<span>Draw ", "<span>Draws ");
    }
    return text;
}
Second version. Now let's look at the optimized version. This version uses the Contains method around all the Replace calls with the specified common string literal. Thus, in version B, the first two Replace calls are only run when the common pattern is found; the second two Replace calls are also guarded.
Note: Another approach would be to use text.Contains("<span>"). This could be beneficial depending on how common that string is in the hypothetical data set.

What about StringBuilder?

The StringBuilder type contains a Replace method. This can help optimize certain situations, but the Replace call will require a search through the entire string still. If not many replacements occur, the StringBuilder will actually be slower because of the initial cost of the StringBuilder.

Summary

This replace optimization uses a heuristic to improve performance. If all the replacements always are run, guarding them in this way will hurt performance. If there are no good substrings to test for, this optimization will also not help. However, if many replacements are run, not all actually change the string, and they contain common character sequences, this technique can reduce the computational requirements of your method considerably.

No comments:

Post a Comment