I made a web crawler, but the output is garbled

Question

I am building a web crawler to fetch specific content and print it. However, the crawler's output is garbled.

Code:

import fetch from 'node-fetch';
import { JSDOM } from 'jsdom';

// Trying to crawl bbs posts
fetch('https://target-website.tld/bbs.cgi')
  .then(response => response.text())
  .then(htmlString => {
    const dom = new JSDOM(htmlString);
    // Posts are inside a <pre> nested in a <blockquote>
    const blockQuotes = dom.window.document.querySelectorAll('blockquote');
    blockQuotes.forEach(blockQuote => {
      const preElements = blockQuote.querySelectorAll('pre');
      preElements.forEach(pre => {
        console.log(pre.textContent.trim());
      });
    });
  })
  .catch(error => console.error('Error fetching HTML:', error));

Example output:

�Ôg�I�����āI낻��܂��....����(;�L�D`)

I tried setting UTF-8 in the fetch headers, but that did not work. I want the output to be text without garbled characters.

P.S. 99% of the posts are written in Japanese, so I thought using UTF-8 would solve the problem. I don't know how to fix it.

Answer 1

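You may need to share the actual website you are trying to scrape, but at first glance it looks like you are trying to convert binary data (possibly an image) to text.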
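Another possibility, which the question does not confirm, is that the page is HTML served in a legacy Japanese encoding such as Shift_JIS or EUC-JP. `response.text()` always decodes the body as UTF-8, so those bytes would come out as replacement characters much like the output above. Below is a minimal sketch under that assumption: it reads the raw bytes and decodes them with an explicit charset. The helper names `decodeBytes` and `fetchDecoded` and the default `'shift_jis'` are hypothetical, and the sketch assumes a Node.js build with full ICU (for legacy encodings in `TextDecoder`) and a global `fetch`.

```javascript
// Decode raw bytes with an explicit charset instead of relying on
// response.text(), which always assumes UTF-8.
// The default 'shift_jis' is an assumption; check the page's Content-Type
// header or <meta charset> to find the real encoding.
function decodeBytes(bytes, charset = 'shift_jis') {
  return new TextDecoder(charset).decode(bytes);
}

// Hypothetical helper: fetch a page and decode it with the given charset.
async function fetchDecoded(url, charset = 'shift_jis') {
  const response = await fetch(url);
  const bytes = await response.arrayBuffer(); // raw bytes, not yet decoded
  return decodeBytes(bytes, charset);
}

// Demonstration without network access: "日本語" encoded as Shift_JIS.
const sample = new Uint8Array([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);
console.log(decodeBytes(sample, 'shift_jis')); // 日本語
console.log(decodeBytes(sample, 'utf-8'));     // garbled: contains U+FFFD
```

In the original code, the string returned by `fetchDecoded` would take the place of `htmlString` passed to `new JSDOM(...)`. If the output is still garbled with every plausible charset, the response may really be binary data, as the answer suggests.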
