Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have URLs from an access log. Examples:

/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w

/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen

I cannot make any assumptions about the service name or the function name.

I'm trying to find a regex that matches only these parts of the first URL:

67814
alloy%20nudge%20w

and in the second:

asdNmasdf423-asd342e
FS443GH
front%20parking%20sen

As a heuristic, I tried [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,} to match only long strings, but the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) were also caught.
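For illustration, the attempt above can be reproduced with Python's `re` module (assuming Python, which the linked question's URL suggests; the names `ATTEMPT_RE` and `urls` are hypothetical). Running it on the two example URLs shows the function names being caught alongside the wanted segments:

```python
import re

# The question's first attempt: a run of 15+ URL-ish characters,
# OR a run of 5+ uppercase letters/digits.
ATTEMPT_RE = re.compile(r"[a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,}")

urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(ATTEMPT_RE.findall(url))
# ['getPersonFromAllAccessoriesByDescription', '67814', 'alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'getDealerFromSomethingSomething', 'FS443GH', 'front%20parking%20sen']
```

The function names slip through the first alternative simply because they are longer than 15 characters.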

I was thinking about a regex that does the same as [a-zA-Z0-9_%-]{15,}, but with the condition that the match must contain at least one digit, so that the function names are skipped.
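The idea as literally stated (keep `{15,}`, require a digit) can be sketched with a lookahead. This is only an illustration of that idea, not the answer's final pattern, and the name `LONG_WITH_DIGIT_RE` is hypothetical:

```python
import re

# Lookahead: from the current position, skip letters, '_', '%', '-' and then
# require a digit. Since '/' is not in the class, the digit must lie in the
# same path segment that the main run then consumes (15+ characters).
LONG_WITH_DIGIT_RE = re.compile(r"(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{15,}")

urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(LONG_WITH_DIGIT_RE.findall(url))
# ['alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'front%20parking%20sen']
```

With the digit requirement the function names drop out, but so do `67814` and `FS443GH`, because they are shorter than 15 characters; the length threshold has to come down as well.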

Thank you

Question from: https://stackoverflow.com/questions/65944214/regexpython-extract-from-url-path-parameters


1 Answer

Your heuristic is fine. Use:

\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}

See proof.
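A minimal Python sketch of applying the pattern with `re.findall`, assuming Python's `re` module and including the leading `\b` word boundary that the explanation describes (`PARAM_RE` and `urls` are hypothetical names):

```python
import re

# \b: start at a word boundary
# (?=[a-zA-Z_%-]*[0-9]): a digit must occur in this run of allowed characters
# [a-zA-Z0-9_%-]{5,}: then consume at least 5 allowed characters
PARAM_RE = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")

urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(PARAM_RE.findall(url))
# ['67814', 'alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```

On these two URLs the result is the same with or without the leading `\b`. Note that `%20` counts as containing digits, so percent-encoded segments such as `front%20parking%20sen` pass the digit check.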

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [a-zA-Z_%-]*             any character of: 'a' to 'z', 'A' to
                             'Z', '_', '%', '-' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    [0-9]                    any character of: '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9_%-]{5,}       any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9', '_', '%', '-' (at least 5
                           times (matching the most amount possible))
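If the pattern lives in Python code, the explanation above can be kept next to it by compiling with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments outside character classes. This is a readability variant under the assumption that Python is the target; `PARAM_VERBOSE_RE` is a hypothetical name:

```python
import re

PARAM_VERBOSE_RE = re.compile(
    r"""
    \b                  # boundary between a word char (\w) and a non-word char
    (?=                 # look ahead to see if there is:
        [a-zA-Z_%-]*    #   zero or more letters, '_', '%' or '-'
        [0-9]           #   followed by a digit
    )                   # end of lookahead (nothing consumed yet)
    [a-zA-Z0-9_%-]{5,}  # then consume 5+ letters, digits, '_', '%' or '-'
    """,
    re.VERBOSE,
)

urls = [
    "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w",
    "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen",
]

for url in urls:
    print(PARAM_VERBOSE_RE.findall(url))
# ['67814', 'alloy%20nudge%20w']
# ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```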
