Problem with a Python web scraper in PyCharm (beginner)

Question

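I recently started learning Python. While learning web scraping, I followed an example that scrapes Google News. When I run my code, I get the message "Process finished with exit code 0" and no results. If I change the URL to "https://yahoo.com", I do get results. Can anyone point out what I'm doing wrong?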

Code:

import urllib.request
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        # Download the page and parse it with Python's built-in HTML parser
        r = urllib.request.urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup(html, parser)
        # Print every link whose href contains the substring "html"
        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            if "html" in url:
                print("\n" + url)

news = "https://news.google.com/"
Scraper(news).scrape()

Answer 1

Try this:

import urllib.request
from bs4 import BeautifulSoup


class Scraper:

    def __init__(self, site):
        self.site = site

    def scrape(self):
        r = urllib.request.urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        sp = BeautifulSoup(html, parser)
        # Print every link that has an href, with no extra filtering
        for tag in sp.find_all("a"):
            url = tag.get("href")
            if url is None:
                continue
            else:
                print("\n" + url)


if __name__ == '__main__':
    news = "https://news.google.com/"
    Scraper(news).scrape()

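Originally you were checking whether each link contains "html". Exit code 0 only means the script finished without an error; since nothing was printed, none of the links found on the Google News page contained that substring. I assume the example you were following checked whether links end with '.html'.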
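To make the difference between the two filters concrete, here is a small sketch that runs both over a list of hrefs. The sample hrefs and the helper names (contains_html, ends_with_html, filter_links) are hypothetical, made up for illustration; they are not taken from Google News or from the original example.

```python
# Hypothetical sample hrefs for illustration only; real pages will differ.
sample_hrefs = [
    "./articles/abc123",
    "https://example.com/story.html",
    "https://example.com/html-guide/",
    "/topics/world",
]


def contains_html(url):
    """The filter from the question: True if "html" appears anywhere in the URL."""
    return "html" in url


def ends_with_html(url):
    """A stricter filter: True only for URLs that end with ".html"."""
    return url.endswith(".html")


def filter_links(hrefs, predicate):
    """Return the hrefs for which predicate(href) is True, skipping None values."""
    return [href for href in hrefs if href is not None and predicate(href)]


if __name__ == "__main__":
    print(filter_links(sample_hrefs, contains_html))
    print(filter_links(sample_hrefs, ends_with_html))
    # No filter at all, which is what the answer's version does
    print(filter_links(sample_hrefs, lambda url: True))
```

If none of a page's hrefs match the chosen predicate, the list is empty, nothing is printed, and the script still ends with exit code 0, which matches the symptom in the question.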

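Beautiful Soup works very well, but you need to inspect the source of the site you're scraping to understand how its markup is laid out. Chrome's DevTools are really useful for this; press F12 to open them quickly.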
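Besides DevTools, another way to see how a page's links look before choosing a filter is to dump every href. The sketch below is an assumption-laden alternative, not the answer's method: it swaps in the standard library's html.parser.HTMLParser instead of Beautiful Soup so it runs with no extra installs, and it uses urllib.parse.urljoin to turn relative hrefs (such as "./articles/...") into absolute URLs. The class and function names and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; skip missing or empty hrefs
        href = dict(attrs).get("href")
        if href:
            # urljoin resolves relative hrefs like "./articles/x" against base_url
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    """Parse an HTML string and return all <a> hrefs as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample markup; a real page would come from urllib.request.urlopen
    sample_html = """
    <html><body>
      <a href="./articles/abc123">Relative link</a>
      <a href="https://example.com/story.html">Absolute link</a>
      <a>No href</a>
    </body></html>
    """
    for link in collect_links(sample_html, "https://news.google.com/"):
        print(link)
```

Printing the full list like this makes it easy to spot a pattern (a common path prefix, a file extension, or none at all) before writing a condition such as `if "html" in url`.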

I removed:

if "html" in url:
    print("\n" + url)

and replaced it with:

else:
    print("\n" + url)
