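I am logging in to a website with Python requests, using the site's cookies copied from Chrome. This works fine on localhost (http://127.0.0.1:8000/), but after I deploy it to Azure App Service it no longer works and prints "WARNING: Login Failed!!"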
import requests
import scrapy

class MySpider(scrapy.Spider):
    def __init__(self, link, text):
        self.link = link
        self.cookie = 'my_cookies'
        self.user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
        self.text = text

    def login(self, url):
        headers = {
            'User-Agent': self.user_agent,
            'Cookie': self.cookie,
        }
        session = requests.Session()
        response = session.get(url, headers=headers)
        response.text.encode('utf-8')  # result is discarded, so this line has no effect
        if response.status_code != 200:
            print('WARNING: Login Failed!!')
        return response
I tried changing the cookie and binding the IP, but I still get WARNING: Login Failed!!
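The problem is most likely the use of the requests module. You are calling requests.Session() inside a Scrapy spider, but Scrapy has its own mechanisms for handling requests, sessions, and cookies. It is better to use Scrapy's built-in tools for cookie and session management than to mix requests into the code. Scrapy's Request object accepts a cookies parameter.
Code: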
import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"

    def __init__(self, link, text, **kwargs):
        # Let the base Spider class finish its own setup
        super().__init__(**kwargs)
        self.link = link
        # Cookies go in as a dict of name -> value
        self.cookies = {'my_cookies': 'cookie_value'}
        self.user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
        self.text = text

    def start_requests(self):
        headers = {
            'User-Agent': self.user_agent,
        }
        yield scrapy.Request(
            url=self.link,
            headers=headers,
            cookies=self.cookies,
            callback=self.parse
        )

    def parse(self, response):
        # Process the login response here
        if response.status != 200:
            self.logger.warning('Login Failed!!')
        else:
            self.logger.info('Login successful.')
            # Continue with your parsing logic here
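If the cookie was copied from Chrome as one header string, as in the question, it needs to become a dict before it can be passed as cookies. Below is a minimal sketch using the standard library's http.cookies.SimpleCookie. The helper name cookie_header_to_dict and the sample string are made up for illustration, and SimpleCookie is fairly strict, so values with unusual characters may be dropped.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(header: str) -> dict:
    """Convert a 'name=value; name2=value2' Cookie header into a dict."""
    jar = SimpleCookie()
    jar.load(header)  # parses the raw header string into morsels
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    # Placeholder cookie string, not real credentials
    raw = "cookie_name=value; other_cookie=value2"
    print(cookie_header_to_dict(raw))
```

The resulting dict can then be assigned to self.cookies in the spider.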
The local domain and the Azure domain are different. Check whether the cookie is valid for the Azure domain, and try retrieving a new cookie that is valid for it.
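A quick way to sanity-check this is to compare the domain a cookie is scoped to with the host you are requesting. The sketch below is a simplified version of cookie domain matching: it ignores IP-address hosts, paths, and expiry. The function name domain_matches is hypothetical and the hosts are sample values only.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Return True if a cookie scoped to cookie_domain would be sent to host.

    Simplified: exact match, or host is a subdomain of cookie_domain.
    """
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    if not domain:
        return False
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    azure_host = "testappt-c4g0abb6e3ffhkff.eastus-01.azurewebsites.net"
    print(domain_matches(azure_host, ".example.com"))         # False
    print(domain_matches("www.example.com", ".example.com"))  # True
```

If the check fails for the deployed host, the copied cookie will not authenticate there and a fresh one has to be obtained for that domain.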
Logged in:
2024-09-23 12:45:32 - INFO - Attempting login to [testappt-c4g0abb6e3ffhkff.eastus-01.azurewebsites.net] https://testappt-c4g0abb6e3ffhkff.eastus-01.azurewebsites.net/login with headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36', 'Cookie': 'cookie_name=value; other_cookie=value2'}
2024-09-23 12:45:34 - INFO - Response Status Code: 200
2024-09-23 12:45:34 - INFO - Response Cookies: <RequestsCookieJar[<Cookie sessionid=abcdef1234567890 for .example.com/>]>
2024-09-23 12:45:34 - INFO - Response Headers: {'Date': 'Mon, 23 Sep 2024 12:45:34 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Set-Cookie': 'sessionid=abcdef12339402890; expires=Mon, 30-Sep-2024 12:45:34 GMT; HttpOnly; Path=/'}
2024-09-23 12:45:34 - INFO - Login successful.