Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'd like to extract the id from URLs like this:

input:

x = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"

get_id(x)

output:

com.alibaba.aliexpresshd

What is the best way to do this with re in Python?

def get_id(toParse):
    return re.search('id=(WHAT TO WRITE HERE?)', toParse).groups()[0]

I have only found examples that handle an id with exactly one dot.

1 Answer

You could try this as your regex:

r'\?id=([a-zA-Z.]+)'

and use it like so:

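import re

def get_id(toParse):
    # \? matches a literal question mark; the group captures letters and dots
    regex = r'\?id=([a-zA-Z.]+)'
    # findall returns a list of the captured group's matches; take the first
    x = re.findall(regex, toParse)[0]
    return x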

Regex -

By adding r before the regex, we mark it as a raw string, so Python passes backslashes through to the regex engine unchanged and we don't have to double every backslash in the pattern.

? holds special meaning for the regex system, so to match a literal question mark, we precede it with a backslash, like \?
id= matches the id= part of the URL
([a-zA-Z.]+) is the first capturing group (group 1) of the regex. It matches the id in the URL: one or more letters or dots, so any number of dots is fine. The match stops at the first character outside that set, such as &.
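
To make the difference between the whole match and the captured group concrete, here is a small sketch using re.search on the URL from the question. The variable names are only illustrative.

```python
import re

url = "https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"

# Same pattern as above: a literal "?id=" followed by letters and dots.
match = re.search(r'\?id=([a-zA-Z.]+)', url)

whole_match = match.group(0)  # everything the pattern matched, including "?id="
package_id = match.group(1)   # only the text inside the parentheses

print(whole_match)
print(package_id)
```

Here group(0) still contains the leading ?id=, while group(1) is just the id, which is what the question asks for.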

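Note - I have used re.findall for this because, when the pattern has exactly one capturing group, it returns a list of the text captured by that group, so the element at index 0 is the extracted id. If the URL has no id, the list is empty and [0] raises an IndexError.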
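
If a missing id should not raise an error, a variant along these lines might help. It is only a sketch under a few assumptions that are not in the question: that ids may also contain digits and underscores (hence \w), and that id= may follow either ? or &. The name get_id_or_none and the second test URL are made up for illustration.

```python
import re

# Assumption: ids consist of letters, digits, underscores and dots,
# and the id parameter may appear after "?" or after "&".
ID_PATTERN = re.compile(r'[?&]id=([\w.]+)')

def get_id_or_none(url):
    """Return the value of the id parameter, or None if there is none."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None

print(get_id_or_none("https://play.google.com/store/apps/details?id=com.alibaba.aliexpresshd&hl=en"))
print(get_id_or_none("https://play.google.com/store/apps/details?hl=en&id=com.example.app2"))
print(get_id_or_none("https://play.google.com/store"))
```

re.search returns None when nothing matches, so the function can return None instead of failing on an empty list.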

I recommend you take a look at rexegg.com for a full list of regex syntax.

