Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a strange list of items and nested lists that uses | as a delimiter and [[ ]] as brackets. It looks like this:

| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

I want to match the items inside lists named Ulist* (items 4-8) using a regex and rename them to Uitem*. The result should look like this:

| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

I have tried almost everything I know about regex, but I haven't found a pattern that matches each item inside the Ulists. My current regex:

/Ulist(\d+)\[\[(\s*(\|\s*[^\s|]*)*\s*)*\]\]/i

What is wrong? I am a beginner with regex.

This is Python 2.7; specifically, my code is:

    def fixDirtyLists(self, text):
        text = textlib.replaceExcept(text, r'Ulist(\d+)\[\[(\s*(\|\s*[^\s|]*)*\s*)*\]\]', r'Ulist\1[[ U\3 ]]', '', site=self.site)
        return text

text holds that list, and textlib.replaceExcept substitutes each regex match with the replacement pattern. Nothing complicated.


1 Answer

If you install the PyPI regex module (with Python 2.7.9+ a plain pip install regex, run from the Python27\Scripts folder, is enough), you can match nested square brackets. Match each Ulist block as a whole, then replace item with Uitem only inside the matched substrings.
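As for what is wrong with the single-pattern attempt: the short check below, using only the standard re module, illustrates two limits that matter here. A repeated capturing group keeps only its last repetition, so one backreference such as \3 cannot stand for every item, and re has no group recursion for balanced [[ ]] pairs. The sample strings and the names last_item and recursion_supported are made up for this sketch.

```python
import re

# A repeated capturing group only remembers its final repetition,
# so group 1 ends up holding the last item, not all three.
m = re.match(r'(?:\|\s*(\w+)\s*)+', '| item4 | item5 | item6 ')
last_item = m.group(1)

# The standard re module rejects PCRE-style group recursion such as (?1).
try:
    re.compile(r'(\[\[(?:[^\[\]]|(?1))*\]\])')
    recursion_supported = True
except re.error:
    recursion_supported = False

print(last_item)
print(recursion_supported)
```

This is why the approach here matches the whole block first and renames the items in a second step.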

The pattern (note that recursion in the PyPI regex module resembles that of PCRE):

(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])
^-Group1-^^---------------Group2----------------^

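A short explanation: (Ulist\d+) is Group 1, which matches the literal word Ulist followed by one or more digits. It is followed by Group 2, (\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]]), which matches a substring starting with [[ up to the corresponding ]]. Inside the atomic group, [^][] consumes any character that is not a bracket, ](?!]) and \[(?!\[) consume a single bracket that is not part of a pair, and (?2) recurses into Group 2 to consume a nested [[ ... ]] block.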
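To make "the corresponding ]]" concrete, here is a plain-Python sketch that finds the same closing position by counting nesting depth. It is only an illustration of what the recursive group accomplishes, not how the regex module works internally; find_block_end and the sample string are hypothetical names chosen for this example.

```python
def find_block_end(text, open_pos):
    """Return the index just past the ]] that closes the [[ at open_pos."""
    if not text.startswith("[[", open_pos):
        raise ValueError("no [[ at position %d" % open_pos)
    depth = 0
    i = open_pos
    while i < len(text):
        if text.startswith("[[", i):
            depth += 1          # entering a (possibly nested) block
            i += 2
        elif text.startswith("]]", i):
            depth -= 1          # leaving the innermost open block
            i += 2
            if depth == 0:
                return i        # the outermost block is now closed
        else:
            i += 1              # any other character, single brackets included
    raise ValueError("unbalanced [[ at position %d" % open_pos)


sample = "a Ulist1[[ | x | Ulist2[[ | y ]] | z ]] | w"
start = sample.index("[[")
block = sample[start:find_block_end(sample, start)]
print(block)
```

The slice covers the outer block including the nested Ulist2 block, which is the span Group 2 captures for the first Ulist.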

And the Python code:

>>> import regex
>>> s = "| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14"
>>> pat = r'(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])'
>>> res = regex.sub(pat, lambda m: m.group(1) + m.group(2).replace("item", "Uitem"), s)
>>> print(res)
| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

To leave plain lists (list*) that are nested inside a Ulist unmodified, use

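def repl(m):
    # The capturing group makes split() keep the plain list blocks so they pass through unchanged.
    parts = regex.split(r'(\blist\d*\[{2}[^]]*(?:](?!])[^]]*)*]])', m.group(0))
    return "".join([x if x.startswith("list") else x.replace("item", "Uitem") for x in parts])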

and replace the regex.sub call with

res = regex.sub(pat, repl, s)
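If installing a third-party module is not an option, the same renaming can be sketched with only the standard re module by tracking which list is currently open. This is a different technique from the recursive pattern above: a tokenizer plus a stack. It assumes an item should be renamed only when the innermost enclosing list has a name starting with Ulist, and that items always look like item followed by digits. TOKEN and rename_items are hypothetical names for this sketch.

```python
import re

# Tokens of interest: an opening "name[[", a closing "]]", or a bare "itemN".
TOKEN = re.compile(r'(?P<open>\w*)\[\[|(?P<close>\]\])|\b(?P<item>item\d+)')


def rename_items(text):
    stack = []  # names of the lists currently open, innermost last

    def repl(m):
        if m.group('open') is not None:
            stack.append(m.group('open'))      # entering a list
        elif m.group('close') is not None:
            if stack:
                stack.pop()                    # leaving the innermost list
        elif stack and stack[-1].startswith('Ulist'):
            return 'U' + m.group('item')       # item directly inside a Ulist
        return m.group(0)                      # everything else is unchanged

    return TOKEN.sub(repl, text)


s = ("| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 "
     "| item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] "
     "| item13 | item14")
res = rename_items(s)
print(res)
```

Because the stack records the innermost open list, a plain list nested inside a Ulist keeps its items untouched, and running the function a second time changes nothing since \bitem does not match inside Uitem.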
