Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I need to write some javascript to strip the hostname:port part from a url, meaning I want to extract the path part only.

i.e. I want to write a function getPath(url) such that getPath("http://host:8081/path/to/something") returns "/path/to/something"

Can this be done using regular expressions?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
340 views
Welcome To Ask or Share your Answers For Others

1 Answer

RFC 3986 ( http://www.ietf.org/rfc/rfc3986.txt ) says in Appendix B

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression as $. For example, matching the above expression to

  http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

  $1 = http:
  $2 = http
  $3 = //www.ics.uci.edu
  $4 = www.ics.uci.edu
  $5 = /pub/ietf/uri/
  $6 = <undefined>
  $7 = <undefined>
  $8 = #Related
  $9 = Related

where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as

  scheme    = $2
  authority = $4
  path      = $5
  query     = $7
  fragment  = $9

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...