Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I want to find all consecutive, repeated character blocks in a string. For example, consider the following:

s = r'http://www.google.com/search=ooo-jjj'

I want to find these: www, ooo and jjj.

I tried to do it like this:

m = re.search(r'(\w)\1\1', s)

But it doesn't seem to work as I expect. Any ideas?

Also, how can I do it in Bash?


1 Answer

((\w)\2{2,}) matches 3 or more consecutive identical characters:

In [71]: import re
In [72]: s = r'http://www.google.com/search=ooo-jjjj'
In [73]: re.findall(r'((\w)\2{2,})', s)
Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]

In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
Out[78]: ['www', 'ooo', 'jjjj']

(\w) matches any word character (a letter, digit, or underscore).

((\w)\2) matches any word character followed by the same character, since \2 matches the contents of group number 2. Because the parentheses are nested, group 1 is the whole run and group 2 is the single character matched by \w.

Putting it all together, ((\w)\2{2,}) matches any word character followed by the same character repeated 2 or more additional times.

In total, the regex requires the character to appear 3 or more times in a row.
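
If the tuples returned by findall are a nuisance, one possible variation is to drop the outer group and read the whole match from re.finditer instead. The sketch below is only an illustration of the same back-reference idea; the function name repeated_blocks and the min_len parameter are made up for this example and are not part of the re module.

```python
import re

def repeated_blocks(text, min_len=3):
    """Return runs of one repeated word character, at least min_len long.

    repeated_blocks and min_len are hypothetical names used for illustration.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # (\w) captures one character; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    # m.group(0) is the entire run, so no outer capturing group is needed.
    return [m.group(0) for m in pattern.finditer(text)]

s = r'http://www.google.com/search=ooo-jjjj'
print(repeated_blocks(s))  # ['www', 'ooo', 'jjjj']
```

Lowering min_len to 2 would also pick up shorter runs such as the tt in http and the oo in google.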

