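I'm new to Python and am trying to use xpath and requests to log in and get some data from here, using the method demonstrated in this tutorial. My Python script currently looks like this: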
from lxml import html
import requests
url = "http://www.londoncoffeeguide.com/Venues/Profile/26-Grains"
session_requests = requests.session()
login_url = "http://www.londoncoffeeguide.com/signin?returnurl=%2fVenues"
result = session_requests.get(login_url)
tree = html.fromstring(result.content)
authenticity_token = list(set(tree.xpath("//input[@name='__CMSCsrfToken']/@value")))[0]
payload = {
    "p$lt$ctl01$LogonForm_SignIn$Login1$UserName": 'XXX',
    "p$lt$ctl01$LogonForm_SignIn$Login1$Password": 'XXX',
    "__CMSCsrfToken": authenticity_token
}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0'}
with requests.session() as s:
    p = s.post(login_url, data=payload, headers=headers)
    print(p.text)
Unfortunately, the text returned by the POST request shows...
<head><title>
System error
</title>
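...followed by the rest of the login page's HTML. I've tried adding the headers line shown above, I've double-checked that the login details I'm using are correct, and I'm satisfied that the CMSCsrfToken is correct, but the login doesn't work. Any help with this would be much appreciated. I've been googling it, but the various answers I've found to similar questions don't seem to help (so far!).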
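You put your username and password in the wrong fields. In addition, a few more fields, such as viewstategenerator and viewstate, need to be added to the payload for the script to work. The following script will log you in and then fetch the different profile item titles.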
from lxml.html import fromstring
import requests
login_url = "http://www.londoncoffeeguide.com/signin?returnurl=%2fVenues"
username = ""  # fill this in
password = ""  # fill this in as well
with requests.session() as session:
    session.headers['User-Agent'] = 'Mozilla/5.0'

    # Load the sign-in page in the same session that will post the form
    result = session.get(login_url)
    tree = fromstring(result.text)

    # Hidden fields that have to be sent back with the login form
    auth_token = tree.xpath("//input[@id='__CMSCsrfToken']/@value")[0]
    viewstate = tree.xpath("//input[@id='__VIEWSTATE']/@value")[0]
    viewgen = tree.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")[0]

    payload = {
        "__CMSCsrfToken": auth_token,
        "__VIEWSTATEGENERATOR": viewgen,
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$UserName": username,
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$Password": password,
        "__VIEWSTATE": viewstate,
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$LoginButton": "Log on"
    }

    p = session.post(login_url, data=payload)
    root = fromstring(p.text)
    for iteminfo in root.cssselect(".ProfileItem .ProfileItemTitle"):
        print(iteminfo.text)
Make sure to fill in the username and password fields in the script before running it.
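If listing the hidden fields one by one feels fragile, one possible alternative is to collect every hidden input on the page and look up the long field names by their suffix. The snippet below is only a rough sketch of that idea: it uses the standard library's html.parser rather than lxml so that it runs on its own, and the sample HTML, the shortened `p$lt$demo$...` field names, the `FormFieldCollector` class and the `find_field` helper are all made up for illustration and are not taken from the real sign-in page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names (and hidden values) of every <input> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}      # name -> value for type="hidden" inputs
        self.all_names = []   # names of every input, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if not name:
            return
        self.all_names.append(name)
        if (attr.get("type") or "").lower() == "hidden":
            self.hidden[name] = attr.get("value") or ""


def find_field(names, suffix):
    """Return the first input name that ends with the given suffix."""
    return next(n for n in names if n.endswith(suffix))


# Made-up HTML standing in for the real sign-in page (hypothetical names).
sample_html = """
<form>
  <input type="hidden" name="__CMSCsrfToken" value="token123" />
  <input type="hidden" name="__VIEWSTATE" value="state456" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="gen789" />
  <input type="text" name="p$lt$demo$Login1$UserName" />
  <input type="password" name="p$lt$demo$Login1$Password" />
  <input type="submit" name="p$lt$demo$Login1$LoginButton" value="Log on" />
</form>
"""

collector = FormFieldCollector()
collector.feed(sample_html)

# Start from all hidden fields, then add the visible login fields.
payload = dict(collector.hidden)
payload[find_field(collector.all_names, "$UserName")] = "my_user"
payload[find_field(collector.all_names, "$Password")] = "my_pass"
payload[find_field(collector.all_names, "$LoginButton")] = "Log on"

for key, value in payload.items():
    print(key, "=", value)
```

In a real script you would feed it the text of the sign-in page fetched with the session (for example `collector.feed(result.text)`) and post the resulting payload with that same session. Whether the site accepts every hidden field being echoed back is an assumption you would need to check.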