Performance Gotchas in .Net 2 - Regex timeouts
Every programming framework has certain corner cases that suck the performance out of an application. The .NET Framework is no exception. I’ve discovered a few in my work with C#, and I blog about them as I find time.
I love using regex for search/replace/text manipulation tasks in programming. You don’t have to go to full Perl mode for the awesomeness that is
Catastrophic backtracking can take time exponential in the length of the input, leading to long page load times, unresponsive applications and denial of service attacks on websites that use broken regexes. Jeff Atwood covered the bare basics of horrendous regex performance here, ending with a wish for a way to keep regular expressions from going into a full ReDoS on your server. Some years later, in .NET 4.5, his wish has been granted. C# and VB.NET now allow you to specify a TimeSpan that denotes how long a regular expression is allowed to take before giving up. When that limit is exceeded, the match operation throws a RegexMatchTimeoutException.
Below is a C# snippet based on Atwood’s code that can be run in LINQPad to illustrate the timeout in action:
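string pattern = "(x+x+)+y";
RegexOptions options = RegexOptions.None;
// Give up after one second instead of backtracking indefinitely.
Regex re = new Regex(pattern, options, TimeSpan.FromMilliseconds(1000));
string success = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxy";
string failure = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
Console.WriteLine(re.IsMatch(success)); // the trailing 'y' lets this match quickly
try { Console.WriteLine(re.IsMatch(failure)); } // no 'y', so the engine has to backtrack
catch (RegexMatchTimeoutException) { Console.WriteLine("Timed out"); }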
This is no excuse for writing sloppy regular expressions, but I like that it adds another layer of defense in depth against denial of service attacks triggered by the clever intern’s bad regex from last summer.
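For that defense-in-depth angle, a timeout that also covers regexes created without an explicit TimeSpan is handy. The sketch below is one way this might look, not a definitive recipe: it assumes the "REGEX_DEFAULT_MATCH_TIMEOUT" application domain property is set before the Regex class is first used, and the helper name SafeIsMatch and the two-second limit are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexTimeoutSketch
{
    // Hypothetical helper: treats a timed-out match as "no match"
    // so a pathological input cannot tie up the calling thread.
    static bool SafeIsMatch(Regex re, string input)
    {
        try
        {
            return re.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // A real application would probably log the pattern and input here.
            return false;
        }
    }

    static void Main()
    {
        // Assumption: this runs before any Regex is created in the process,
        // because the default is read when the Regex class is first used.
        AppDomain.CurrentDomain.SetData(
            "REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(2));

        // No timeout passed here, so the regex falls back to the default above.
        Regex re = new Regex("(x+x+)+y");
        Console.WriteLine(re.MatchTimeout);

        Console.WriteLine(SafeIsMatch(re, "xxxxy"));
        Console.WriteLine(SafeIsMatch(re, new string('x', 32)));
    }
}
```

Swallowing the exception and returning false is a design choice that suits input validation, where a string that takes too long to check can simply be rejected. For search or replace operations it is probably better to let the exception propagate so the caller knows the result is incomplete.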