For this set of example URLs/links, here are all the semantic components of a URL/link:
https://google.com:443 - https:// is the protocol, :443 is the port
www.google.com - www is the service, google.com is the domain, www.google.com is the FQDN
www.clients6.gc.google.com/project - clients6.gc is the subdomain, /project is the path
www.google.com/?query=x&#hash=y - ?query=x& is the query string and #hash=y is the hash string
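I'm happy not to parse the user/password (don't do that; it's poor security) or the query parameters (those are implementation-specific, for example some Java CGI toolkits use ; instead of &).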
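If you do need the individual parameters from a query string that may use either separator, a minimal sketch could look like the following. The helper names `splitQuery` and `decodePart` are made up for illustration, and treating `+` as a space is an assumption that matches form encoding but not every server.

```javascript
// Hypothetical helper: decode one key or value.
// Assumes form-style encoding, where "+" stands for a space.
function decodePart(text) {
  return decodeURIComponent(text.replace(/\+/g, ' '));
}

// Hypothetical helper: split a raw query string on either "&" or ";".
// Returns an array of [key, value] pairs so repeated keys are preserved.
function splitQuery(query) {
  const pairs = [];
  for (const piece of query.replace(/^\?/, '').split(/[&;]/)) {
    if (piece === '') continue; // skip empty pieces such as a trailing "&"
    const eq = piece.indexOf('=');
    const rawKey = eq === -1 ? piece : piece.slice(0, eq);
    const rawValue = eq === -1 ? '' : piece.slice(eq + 1);
    pairs.push([decodePart(rawKey), decodePart(rawValue)]);
  }
  return pairs;
}

console.log(splitQuery('a=1;b=2&c'));
```

Note that `decodeURIComponent` throws on malformed percent-escapes, so untrusted input would need a try/catch around the call.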
Using the MDN Groups and Backreferences docs and the Regex101 online regex authoring/testing tool, I arrived at this solution, which uses lookahead, non-capturing, and named capturing groups:
var url = "https://www.sub.domain.google.com:443/maps/place/Arc+De+Triomphe/@48.8737917,2.2928388,17z?query=1&foo#hash";
var parts = url.match(/^(?<protocol>https?:\/\/)?(?=(?<fqdn>[^:/]+))(?:(?<service>www|ww\d|cdn|mail|pop\d+|ns\d+|git)\.)?(?:(?<subdomain>[^:/]+)\.)*(?<domain>[^:/]+\.[a-z0-9]+)(?::(?<port>\d+))?(?<path>\/[^?#]*)?(?:\?(?<query>[^#]*))?(?:#(?<hash>.*))?/i).groups;
console.log(parts);
Yields...
{
"protocol": "https://",
"fqdn": "www.sub.domain.google.com",
"service": "www",
"subdomain": "sub.domain",
"domain": "google.com",
"port": "443",
"path": "/maps/place/Arc+De+Triomphe/@48.8737917,2.2928388,17z",
"query": "query=1&foo",
"hash": "hash"
}
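One thing to watch: `.match()` returns `null` when the pattern does not match (for instance, a host with no dot), so reading `.groups` directly throws. A small wrapper, sketched here under the hypothetical name `matchUrlParts`, avoids that. The pattern below follows the one above and makes sure the path stops at `#` as well as `?`, so that a fragment without a query string is not absorbed into the path.

```javascript
// The pattern is stored once so it can be reused across calls.
const URL_PARTS = /^(?<protocol>https?:\/\/)?(?=(?<fqdn>[^:/]+))(?:(?<service>www|ww\d|cdn|mail|pop\d+|ns\d+|git)\.)?(?:(?<subdomain>[^:/]+)\.)*(?<domain>[^:/]+\.[a-z0-9]+)(?::(?<port>\d+))?(?<path>\/[^?#]*)?(?:\?(?<query>[^#]*))?(?:#(?<hash>.*))?/i;

// Hypothetical helper: returns a plain object of named groups,
// or null when the text does not match at all.
function matchUrlParts(text) {
  const match = URL_PARTS.exec(text);
  return match ? { ...match.groups } : null;
}

console.log(matchUrlParts('localhost'));            // null: no dot, so <domain> cannot match
console.log(matchUrlParts('google.com/path#frag')); // unmatched optional groups are undefined
```

Optional groups that did not take part in the match (port, query, and so on) come back as `undefined`, so callers should check for that before using them.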
Pointy pointed out that most browsers provide a URL class that parses URLs into their components.
console.log(new URL('www.google.com:443/path?query#hash'))
Yields...
URL {origin: 'null', protocol: 'www.google.com:', username: '', password: '', host: '', …}
hash: "#hash"
host: ""
hostname: ""
href: "www.google.com:443/path?query#hash"
origin: "null"
password: ""
pathname: "443/path"
port: ""
protocol: "www.google.com:"
search: "?query"
searchParams: URLSearchParams {}
username: ""
[[Prototype]]: URL
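The output above shows one failure mode for a link with no protocol: everything before the first colon is taken as the scheme. The other failure mode is that a link with no colon at all makes the constructor throw. A minimal sketch that surfaces both cases, using a hypothetical helper named `describeUrlInput`:

```javascript
// Hypothetical helper: report how `new URL` treats a string, without throwing.
function describeUrlInput(text) {
  try {
    const u = new URL(text);
    return { ok: true, protocol: u.protocol, hostname: u.hostname };
  } catch (err) {
    // With no base URL and no scheme, the constructor throws a TypeError.
    return { ok: false, error: err.name };
  }
}

console.log(describeUrlInput('www.google.com:443/path')); // parsed, but "www.google.com:" is the protocol
console.log(describeUrlInput('www.google.com/path'));     // not parsed: TypeError
console.log(describeUrlInput('https://www.google.com/path'));
```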
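If you test the string for a protocol and add one when it's missing, then it will work, but you may find its magic properties challenging to work with, because URL class instances refuse to be cloned or iterated.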
const link = 'www.sub.domain.google.com:443/path?query=1#hash';
function parseLink(link){
// Prepend a scheme unless the link already starts with one followed by "//"
if (!/^[a-z][a-z\d+.-]*:\/\//i.test(link)) link = 'https://' + link;
var u = new URL(link);
// Match against hostname rather than host, since host includes any non-default port
var m = u.hostname.match(/^(?:(?<service>www|ww\d|cdn|mail|pop\d+|ns\d+|git)\.)?(?:(?<subdomain>[^:/]+)\.)*(?<domain>[^:/]+\.[a-z0-9]+)/).groups;
return {
fqdn:u.hostname,
domain:m.domain,
subdomain:m.subdomain,
service:m.service,
path:u.pathname,
port:u.port,
query:u.search,
hash:u.hash,
params:Object.fromEntries(u.searchParams.entries())
}
}
console.log(parseLink(link))
Yields...
{
"fqdn": "www.sub.domain.google.com",
"domain": "google.com",
"subdomain": "sub.domain",
"service": "www",
"path": "/path",
"port": "",
"query": "?query=1",
"hash": "#hash",
"params": {
"query": "1"
}
}
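Note that it collapses the (default) port value, but it's nice that it also handles parsing the query parameters, although, at least on Chrome 110, calling .getAll() with no arguments does not work (it expects a parameter name), hence the Object.fromEntries() call.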
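One caveat with `Object.fromEntries()` is that a repeated key keeps only its last value. If repeated keys matter, a sketch like the following collects every value by calling `getAll(name)` once per distinct key. The helper name `paramsToArrays` and the example URL are made up for illustration.

```javascript
// Hypothetical helper: map each distinct key to an array of all its values.
function paramsToArrays(searchParams) {
  const result = {};
  for (const key of new Set(searchParams.keys())) {
    result[key] = searchParams.getAll(key); // getAll needs the key name
  }
  return result;
}

const example = new URL('https://www.google.com/?tag=a&tag=b&query=1');
console.log(Object.fromEntries(example.searchParams.entries())); // { tag: 'b', query: '1' }
console.log(paramsToArrays(example.searchParams));               // { tag: [ 'a', 'b' ], query: [ '1' ] }
```

Returning arrays for every key, even single-valued ones, keeps the shape predictable for callers.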