It is possible to write a Regex which needs in some cases exponential running time. Such an example is (aa|aa)*
. If there is an input of an odd number of a
s it needs exponential running time.
It is easy to test this. If the input contains only a
s and has length 51, the Regex needs some seconds to compute (on my machine). Instead if the input length is 52 its computing time is not noticeable (I tested this with the built-in Regex-parser of the JavaRE).
I have written a Regex-parser to find the reason for this behavior, but I didn't find it. My parser can build an AST or a NFA based on a Regex. After that it can translate the NFA to a DFA. To do this it uses the powerset construction algorithm.
When I parse the Rgex mentioned above, the parser creates a NFA with 7 states - after conversion there are only 3 states left in the DFA. The DFA represents the more sensible Regex (aa)*
, which can be parsed very fast.
Thus, I don't understand why there are parsers which can be so slow. What is the reason for this? Do they not translate the NFA to a DFA? If yes, why not? And what's the technical reasons why they compute so slow?
See Question&Answers more detail:os