A developer recently wrote a function for me that normalizes a URL, given a base URL and the URL to normalize. See the function below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 if str begins with pre, non-zero otherwise. */
int
starts_with(const char *str, const char *pre)
{
    size_t lenstr;
    size_t lenpre;

    if (str == NULL || pre == NULL)
        return (-1);
    lenstr = strlen(str);
    lenpre = strlen(pre);
    if (lenstr < lenpre)
        return (-1);
    return (memcmp(pre, str, lenpre));
}

/*
 * Builds an absolute URL from url in a newly allocated string.
 * At most size bytes of url are copied. The caller must free() the result.
 */
char *
url_sanitize(char *base_url, char *url, int size)
{
    char *newurl;
    int base_url_len = strlen(base_url);

    if (starts_with(url, "http") == 0) {
        /* Already absolute: copy as-is. */
        newurl = malloc(size+1);
        if (newurl == NULL) {
            fprintf(stderr, "1 malloc() of %d bytes, failed\n", size);
            exit(1);
        }
        strncpy(newurl, url, size);
        newurl[size] = '\0';
    } else {
        if (starts_with(url, "//") == 0) {
            /* Scheme-relative: prepend "https:". */
            newurl = malloc(size+7);
            if (newurl == NULL) {
                fprintf(stderr, "2 malloc() of %d bytes, failed\n", size);
                exit(1);
            }
            strncpy(newurl, "https:", 6);
            strncpy(newurl+6, url, size);
            newurl[size+6] = '\0';
        } else {
            /* Relative: append url to base_url. */
            newurl = malloc(base_url_len + size + 2);
            if (newurl == NULL) {
                fprintf(stderr, "3 malloc() of %d bytes, failed\n", size);
                exit(1);
            }
            strncpy(newurl, base_url, base_url_len);
            strncpy(newurl + base_url_len, url, size);
            newurl[size + base_url_len] = '\0';
        }
    }
    return (newurl);
}
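The problem is that this function does not remove ./ or ../ segments. It also does not lowercase the host and scheme, so it does not normalize properly. How can I modify the function above so that it normalizes URLs correctly?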
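The uriparser C library at https://uriparser.github.io/ can not only parse URLs but also recompose them, so if you parse with that library and then recompose the result, you may get what you are after.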
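If adding a dependency is not an option, the two gaps named in the question (dot segments and lowercasing the scheme and host) can be sketched by hand in plain C. This is only a rough sketch, not uriparser's implementation: the names remove_dot_segments and normalize_url are hypothetical, the URL is assumed to be absolute in the form scheme://host[:port]/path with no userinfo part, and percent-encoding is not touched. The dot-segment handling is modeled on RFC 3986 section 5.2.4.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove "." and ".." segments from a path in place,
 * in the spirit of RFC 3986 section 5.2.4. Assumes path starts with '/'. */
void
remove_dot_segments(char *path)
{
    char *in = path;    /* read cursor, always at the start of a "/segment" */
    char *out = path;   /* write cursor, never ahead of in */

    assert(path != NULL);
    while (*in != '\0') {
        size_t len = strcspn(in + 1, "/");   /* segment length, slash excluded */
        int last = (in[len + 1] == '\0');

        if (len == 1 && in[1] == '.') {
            /* "/." : drop it */
        } else if (len == 2 && in[1] == '.' && in[2] == '.') {
            /* "/.." : also drop the previously written segment, if any */
            while (out > path && *--out != '/')
                ;
        } else {
            /* ordinary segment: keep it */
            memmove(out, in, len + 1);
            out += len + 1;
            in += len + 1;
            continue;
        }
        if (last)
            *out++ = '/';   /* a trailing dot segment leaves a trailing slash */
        in += len + 1;
    }
    *out = '\0';
}

/* Hypothetical helper: lowercase the scheme and host, then clean the path.
 * Works in place; the result is never longer than the input. */
void
normalize_url(char *url)
{
    char *auth, *path, *tail, *p;
    char saved;
    size_t newlen;

    assert(url != NULL);
    auth = strstr(url, "://");
    if (auth == NULL)
        return;                              /* not absolute: leave untouched */

    /* The authority ends at the first '/', '?' or '#'. */
    path = auth + 3 + strcspn(auth + 3, "/?#");
    for (p = url; p < path; p++)
        *p = (char)tolower((unsigned char)*p);
    if (*path != '/')
        return;                              /* no path component */

    /* Cut off the query/fragment, clean the path, then reattach the tail. */
    tail = path + strcspn(path, "?#");
    saved = *tail;
    *tail = '\0';
    remove_dot_segments(path);
    newlen = strlen(path);
    *tail = saved;
    memmove(path + newlen, tail, strlen(tail) + 1);
}
```

Calling normalize_url on a writable buffer holding HTTPS://Example.COM/a/b/../c/./d?x=1 leaves https://example.com/a/c/d?x=1 in it. Since it works in place, it could be applied to the string returned by url_sanitize before that string is handed back to the caller.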