Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I'm a little confused about regular expressions and greedy vs lazy. It's really very simple and it feels like I'm missing something obvious.

I've simplified my problem as much as I can to make it clear. Consider the following string and regex pattern.

string:
aaxxxb

pattern:
(?<=a)(.*?)(?=b)

result:
axxx

what I expected:
xxx

This result is what I would expect from using .* instead of .*?, what am I missing?

Obviously, same thing if I use a.*?b gives me aaxxxb. Why is this? Shouldn't lazy (like .*?) return as few characters as possible?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
521 views
Welcome To Ask or Share your Answers For Others

1 Answer

You are missing the fact that a regex engine works from left to right, position by position, and succeeds as soon as it finds a match at the current position.

In your example, the first position where the pattern succeeds is at the second "a".

The laziness works only on the right side.

If you want to obtain "xxx", a better way is to use a negated character class [^ab]* instead of .*?

Note: not exactly related to the subject, but good to know: a DFA regex engine will try to get the largest result in case of alternation, a NFA gives you the first that succeeds.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...