href values I get when using Cheerio

Question

I'm trying to scrape with cheerio, but I've run into a small problem. Every href value I get on the client starts with "/url?q=". For example:

'/url?q=https://www.nimh.nih.gov/health/topics/auti… pkCZkQFnoECAYQAg&usg=AOvVaw1E4L1bLVm9OdBSFMkjJftQ'

The element on the Google search page is:

<a jsname="UWckNb" href="https://www.nimh.nih.gov/health/topics/autism-spectrum-disorders-asd"...

It doesn't contain "/url?q=". Where does "/url?q=" come from?

app.get('/scrape', (req, res) => {
    request('https://www.google.com/search?q=asd', (error, response, html) => {
        if (response.statusCode == 200) {
            
            const $ = cheerio.load(html);
            const results = [];
            const links = $('a'); 
            links.each((index, link) => {
                const href = $(link).prop('href'); 
                const h3 = $(link).find('h3'); 
                
                if (h3.length > 0) {
                    const textContent = h3.text().trim();
                    results.push({ href, textContent }); 
                }
            });
        
            const responseData = {
                links: results,
                total: results.length
            };

            res.json(responseData); 
        } else {
            console.error('Unexpected status code:', response.statusCode);
            res.status(500).send('Unexpected status code.'); 
        }
    });
});

I know I can work around it like this:

 const actualUrl = decodeURIComponent(href.split('/url?q=')[1].split('&')[0]);

But I'd like to know where this "/url?q=" comes from and what I'm doing wrong.

Answer 1

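That is what the URL looks like in the static HTML sent by the server. Apparently some JS runs after the page loads and trims the href, but since Cheerio doesn't run JS, there is nothing you can do about that. Be wary of what you see in dev tools: it shows the DOM after dynamic JS scripts have modified it.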

decodeURIComponent(href.split('/url?q=')[1].split('&')[0]);

seems like a bit of overkill. I would use:

href.replace(/^\/url\?q=/, "")
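As a possible middle ground between the two snippets, here is a minimal sketch that reads the `q` parameter with the built-in `URLSearchParams`. The helper name `unwrapGoogleHref` is hypothetical and not part of the question or the answer. The sketch assumes the redirect hrefs always start with `/url?`, as in the example in the question. Stripping only the prefix leaves trailing parameters such as the `&usg=` visible in that example, whereas reading `q` drops them and decodes the value in one step.

```javascript
// Hypothetical helper (an illustration, not the answer author's code).
// Returns the target of a Google "/url?q=..." redirect href and
// leaves any other href unchanged.
function unwrapGoogleHref(href) {
  const prefix = "/url?";
  if (!href.startsWith(prefix)) return href;

  // URLSearchParams splits on "&" and percent-decodes each value.
  const params = new URLSearchParams(href.slice(prefix.length));

  // Fall back to the original href if there is no "q" parameter.
  return params.get("q") ?? href;
}
```

In the loop from the question this could be applied as `results.push({ href: unwrapGoogleHref(href), textContent })`.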

Also:

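- Your selector can be simplified to just search for `a > h3`, or you can use a class, which appears to be fairly stable (this answer from last year uses a class, still works, and is more direct).
- There is no need to send `.length` to the client. Arrays have a built-in length, so the client can read it directly. Caching the length is bad practice because it is unnecessary and can easily get out of sync with the actual length.
- Don't use `request`. It is deprecated, and callbacks have fallen out of favor. Prefer promises: fetch is standard in Node 18+, and axios is also preferred over `request`.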
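Taken together, the three suggestions above could look roughly like the sketch below. It assumes Node 18+ (for the global `fetch`), CommonJS modules, and that the `cheerio` package is installed. The names `buildSearchUrl` and `scrapeResults` are hypothetical and appear nowhere in the original post. Whether Google serves the same markup to a plain `fetch` request is not guaranteed.

```javascript
// Sketch only: assumes Node 18+ (global fetch), CommonJS, and `npm install cheerio`.

// Hypothetical helper: builds the search URL with the query safely encoded.
function buildSearchUrl(query) {
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

// Hypothetical helper: resolves to an array of { href, textContent } objects.
async function scrapeResults(query) {
  const cheerio = require("cheerio"); // loaded lazily; assumed to be installed

  const response = await fetch(buildSearchUrl(query));
  if (!response.ok) {
    throw new Error(`Unexpected status code: ${response.status}`);
  }

  const $ = cheerio.load(await response.text());

  // Select the headings directly, then read the href from the parent <a>.
  return $("a > h3")
    .map((_, h3) => ({
      href: $(h3).parent().attr("href"),
      textContent: $(h3).text().trim(),
    }))
    .get();
}
```

Inside the Express route, the handler could then call `res.json(await scrapeResults('asd'))` within a `try`/`catch`, and the client can read the array's own `length` instead of a separate `total` field.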
