Get the content attribute of meta tags with BeautifulSoup and Python

Question

I'm trying to use Python and Beautiful Soup to extract the content part of the tags below:

<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

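I'm using BeautifulSoup to load the page and find other things (it also pulls the article id from the id attribute of the article tags in the source), but I don't know the right way to search the HTML and find these bits. I've tried variations of find and findAll to no avail. The code currently iterates over a list of URLs...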

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# importing the libraries (Python 3: urlopen lives in urllib.request)
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_data(page_no):
    # use the page_no parameter, not the global loop variable i
    webpage = urlopen('http://superfunevents.com/?p=' + str(page_no)).read()
    soup = BeautifulSoup(webpage, "lxml")
    for tag in soup.find_all("article"):
        article_id = tag.get('id')
        print(article_id)
    # the hard part that doesn't work - I know this example is well off the mark!
    title = soup.find("og:title", "content")
    print(title.get_text())
    url = soup.find("og:url", "content")
    print(url.get_text())
    # end of problem

for i in range(1, 100):
    get_data(i)

If anyone could help me sort out finding the og:title and og:url content, that would be awesome!
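For reference, here is a minimal reproduction of why the attempt above finds nothing. It assumes the bs4 package with Python's built-in "html.parser" backend (the question uses "lxml", which should behave the same for this input) and uses a hard-coded copy of the two tags instead of a fetched page. In BeautifulSoup, the first positional argument of find() is a tag name, and a string passed as the second is matched against the CSS class, so the call searches for an element named og:title with class="content". Note also that a meta tag has no text, so the value has to be read from its content attribute rather than with get_text().

```python
from bs4 import BeautifulSoup

# Hard-coded sample from the question (assumption: stands in for a fetched page).
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head><body></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's call: "og:title" is treated as a tag *name* and the string
# "content" as a CSS class, i.e. it looks for <og:title class="content">.
attempt = soup.find("og:title", "content")
print(attempt)  # None, which is why .get_text() then raises AttributeError

# Even the correct <meta> tag has no text; the value lives in an attribute.
meta = soup.find("meta", property="og:title")
print(repr(meta.get_text()))  # '' (a <meta> element has no text content)
print(meta["content"])        # Super Fun Event 1
```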

Answer 1

Pass the meta tag name as the first argument to find(). Then use a keyword argument to match the specific attribute:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

If you know that the og:title and og:url meta tags will always be present, the if/else checks here are optional.
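The keyword form works for property, but an attribute called name cannot be passed that way, because name is already find()'s own first parameter (the tag name), so Python reports multiple values for that argument. Passing an attrs dictionary avoids the clash and works for property too. The sketch below wraps this in a small helper; the helper name meta_content and the extra description tag are illustrative assumptions, not part of the original answer, and bs4 with the "html.parser" backend is assumed.

```python
from bs4 import BeautifulSoup

# Illustrative page: the og: tags from the question plus a hypothetical
# <meta name="description"> tag to show the `name` attribute case.
SAMPLE_HTML = """
<head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
<meta name="description" content="An example description" />
</head>
"""


def meta_content(soup, attr, value, default=None):
    """Return the content attribute of the first <meta attr="value">, or default.

    Hypothetical helper: uses attrs={...} so that attr may also be "name".
    """
    tag = soup.find("meta", attrs={attr: value})
    # .get() avoids a KeyError if the tag exists but has no content attribute.
    return tag.get("content", default) if tag else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(meta_content(soup, "property", "og:title"))  # Super Fun Event 1
print(meta_content(soup, "property", "og:url"))
print(meta_content(soup, "name", "description"))   # An example description
print(meta_content(soup, "property", "og:image", "No meta image given"))
```

The default argument plays the same role as the if/else fallback in the answer, so missing tags return a placeholder instead of raising.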

Answer 2

Try this:

soup = BeautifulSoup(webpage, "lxml")
for tag in soup.find_all("meta"):
    if tag.get("property") == "og:title":
        print(tag.get("content"))
    elif tag.get("property") == "og:url":
        print(tag.get("content"))
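Building on the same loop, one pass over the meta tags can gather every og: property into a dictionary, which avoids a separate branch per property. This is a sketch under stated assumptions: the function name open_graph_properties is hypothetical, it keeps only the first value when a property repeats (Open Graph allows arrays, such as several og:image tags), bs4 with the "html.parser" backend is assumed, and the HTML is a hard-coded sample rather than a downloaded page.

```python
from bs4 import BeautifulSoup

# Hard-coded sample (assumption: stands in for the fetched `webpage`).
SAMPLE_HTML = """
<head>
<meta charset="utf-8" />
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head>
"""


def open_graph_properties(soup):
    """Map each og:* property on the page to its content (first occurrence wins)."""
    props = {}
    for tag in soup.find_all("meta"):
        prop = tag.get("property", "")
        # Skip non-Open-Graph tags and og: tags without a content attribute.
        if prop.startswith("og:") and "content" in tag.attrs:
            props.setdefault(prop, tag["content"])
    return props


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
og = open_graph_properties(soup)
print(og.get("og:title", "No meta title given"))
print(og.get("og:url", "No meta url given"))
```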
