Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I need a regular expression to capture a given URL's SLD.

Examples:

jack.bop.com -> bop
bop.com -> bop
bop.de -> bop
bop.co.uk -> bop
bop.com.br -> bop

All bops :). So this regex needs to ignore ccTLDs, gTLDs and ccSLDs. The last of these is the difficult part, since I want to keep the regex as simple as possible.

The first task would be to remove ccTLDs then gTLDs, and then check for ccSLDs and remove them if present.

Any help is much appreciated :)

--

If it helps, ccTLDs are matched by:

\.([a-z]{2})$

And gTLDs are matched by:

\.([a-z]{3,6})$

Luckily, these two patterns are mutually exclusive.
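A minimal Python sketch of the two patterns, written with the dot escaped and the length range as {3,6}. The helper name tld_kind is hypothetical. It only classifies the last label, which is why 'bop.co.uk' still needs separate handling for the 'co' part:

```python
import re

# Patterns from the question, with the dot escaped and the range written {3,6}.
CC_TLD = re.compile(r"\.([a-z]{2})$")
G_TLD = re.compile(r"\.([a-z]{3,6})$")

def tld_kind(host):
    """Return ('cc', tld) or ('g', tld) for the last label, or None."""
    m = CC_TLD.search(host)
    if m:
        return ("cc", m.group(1))
    m = G_TLD.search(host)
    if m:
        return ("g", m.group(1))
    return None

print(tld_kind("bop.de"))     # ('cc', 'de')
print(tld_kind("bop.com"))    # ('g', 'com')
print(tld_kind("bop.co.uk"))  # ('cc', 'uk'), the 'co' label is untouched
```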



1 Answer

Technically, '.co.uk' is the second level domain in 'bop.co.uk'. What you seem to be asking for is the highest level part of the domain that was open to public registration, and you want to strip off the domain of the registrar.

RFC 6265 §5.3 calls the suffix that you don't want a "public suffix":

A "public suffix" is a domain that is controlled by a public registry, such as "com", "co.uk", and "pvt.k12.wy.us".

Mozilla maintains a list of all known public suffixes.

To create your regex, you'll have to enumerate all of the public suffixes. Order them so that any entry that is a suffix of another entry appears later (for example, 'co.uk' before 'uk'). An easy way to do this is to sort by descending length. It looks like reversing Mozilla's list would also suffice.
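The ordering step could be sketched in Python roughly as follows. The function name build_suffix_alternation and the hand-picked sample list are assumptions for illustration; the real Mozilla list also contains wildcard and exception rules, which this sketch ignores:

```python
import re

def build_suffix_alternation(suffixes):
    """Join public suffixes into a regex alternation, longest first."""
    # Descending length guarantees "co.uk" is tried before "uk";
    # the secondary alphabetical key just makes the output stable.
    ordered = sorted(set(suffixes), key=lambda s: (-len(s), s))
    # re.escape turns each "." into a literal "\." in the pattern.
    return "|".join(re.escape(s) for s in ordered)

# Hypothetical sample, not the full public suffix list.
sample = ["uk", "com", "co.uk", "org", "gov.uk", "net", "ac.uk", "us"]
print(build_suffix_alternation(sample))
# gov\.uk|ac\.uk|co\.uk|com|net|org|uk|us
```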

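After that, the regex is pretty straightforward. The optional subdomain group must be lazy (.+?); a greedy .+ would swallow 'bop.' in 'bop.co.uk' and leave 'co' as the captured label:

^(.+?\.)?([^.]+)\.(?:<suffixes>)$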

Where <suffixes> would be the | separated list of suffixes. A piece of it would look something like:

gov\.uk|ac\.uk|co\.uk|com|org|net|us|uk

There are ways to make this shorter, by collapsing common-suffixes, though this makes the regex (and the process of computing it) much more complex. For example:

(?:gov\.|ac\.|co\.|)uk|com|org|net|us
