Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

This is an example string:

123456#p654321

Currently, I am using this pattern to capture 123456 and 654321 into two different groups:

([0-9].*)#p([0-9].*)

But sometimes the #p654321 part of the string is missing, and then I only want to capture the first group. I tried to make the second group "optional" by appending ? to it, which works, but only as long as there is still a #p at the end of the remaining string.

What would be the best way to solve this problem?
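
To make the symptom concrete, here is a minimal sketch in Python's re module. The choice of Python and of fullmatch is an assumption for illustration only, since the question does not say which regex engine is in use. It applies the pattern from above with ? appended to the second group only.

```python
import re

# The question's pattern with "?" appended to the second group only.
# Python's re module is an assumption; the question names no engine.
pattern = re.compile(r"([0-9].*)#p([0-9].*)?")

with_suffix = pattern.fullmatch("123456#p654321")
bare_suffix = pattern.fullmatch("123456#p")
without_suffix = pattern.fullmatch("123456")

print(with_suffix.groups())   # both groups are captured
print(bare_suffix.groups())   # second group is None, but "#p" was still needed
print(without_suffix)         # None: the literal "#p" is still required
```

The literal #p stays mandatory because the ? applies only to the group that follows it.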

question from:https://stackoverflow.com/questions/66066650/r-using-tidyrs-extract-and-regex-to-extract-values-from-structured-character


1 Answer

You have the #p outside of the capturing group, which makes it a required part of the match. You are also using the dot character (.) improperly. In most regex flavors the dot matches any character (usually except a newline), so [0-9].* means one digit followed by anything, not a run of digits. Change it to:

([0-9]*)(?:#p([0-9]*))?

The (?:...) syntax is how you get a non-capturing group. Inside it, we capture just the digits that you're interested in. Finally, the trailing ? makes the whole #p part optional.
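
As a rough check of that pattern, the sketch below runs it through Python's re module. Python and the helper name split_ids are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Digits, then an optional non-capturing "#p<digits>" tail.
pattern = re.compile(r"([0-9]*)(?:#p([0-9]*))?")


def split_ids(text):
    """Return (first, second) for a full match, or None if there is none.

    split_ids is a hypothetical helper name used only for this example.
    """
    match = pattern.fullmatch(text)
    return match.groups() if match else None


print(split_ids("123456#p654321"))  # both parts present
print(split_ids("123456"))          # second part absent, reported as None
print(pattern.groups)               # the (?:...) group is not counted
```

Because the #p now lives inside the optional group, the first group still matches when the suffix is absent, and the non-capturing group adds no extra entry to the result.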

Also, most regex flavors have a \d character class for digits. So you could simplify even further:

(\d*)(?:#p(\d*))?

As another person has pointed out, the * operator could potentially match zero digits. To prevent this, use the + operator instead:

(\d+)(?:#p(\d+))?
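
The difference between the two quantifiers shows up on empty or digit-free input. The comparison below is a small sketch, again assuming Python's re module; the variable names star and plus are chosen only for this example.

```python
import re

star = re.compile(r"(\d*)(?:#p(\d*))?")  # allows zero digits
plus = re.compile(r"(\d+)(?:#p(\d+))?")  # requires at least one digit

# The "*" version accepts input that contains no digits at all.
empty_star = star.fullmatch("")
bare_star = star.fullmatch("#p")

# The "+" version rejects both, yet still handles the real inputs.
empty_plus = plus.fullmatch("")
bare_plus = plus.fullmatch("#p")
full_plus = plus.fullmatch("123456#p654321")
short_plus = plus.fullmatch("123456")

print(empty_star.groups(), bare_star.groups())
print(empty_plus, bare_plus)
print(full_plus.groups(), short_plus.groups())
```

With +, a string such as a lone #p no longer yields a match with empty captures, which is usually the safer behavior when the digits are required.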
