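I'm parsing Instagram, and it has started redirecting me to the login page. The script only reads content from the account's main page (window._sharedData), without logging in.
How can I stop the redirect and still load the target account's main page?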
import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}
for i in list_of_urls:
    responce = requests.get(i, headers=headers)
    response_text = responce.text
    # Take the text assigned to window._sharedData in the page source
    shared_data = response_text.split('window._sharedData = ')[1].split(';</script>')[0]
    # etc...
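The chained split above raises an IndexError whenever the returned page does not contain the marker. A minimal sketch of a more defensive extraction follows, assuming the value after the marker is valid JSON. The helper name extract_shared_data and the sample string are hypothetical and only illustrate the shape of the input.

```python
import json

MARKER = "window._sharedData = "


def extract_shared_data(html):
    """Return the parsed _sharedData dict, or None if it cannot be found."""
    _, found, rest = html.partition(MARKER)
    if not found:
        return None  # the page did not contain the marker
    raw, end_found, _ = rest.partition(";</script>")
    if not end_found:
        return None  # the closing ';</script>' was missing
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None  # the text after the marker was not valid JSON


# Hypothetical page fragment, not real Instagram output
sample = '<script>window._sharedData = {"config": {"viewer": null}};</script>'
print(extract_shared_data(sample))
print(extract_shared_data("<html>some other page</html>"))
```

Returning None instead of raising lets the loop skip or log a URL that came back with an unexpected page and carry on with the rest of the list.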
I checked for redirects with the following:
for i in list_of_urls:
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}
    responce = requests.get(i, headers=headers)
    if responce.history:
        print("Request was redirected")
        for resp in responce.history:
            print(resp.status_code, resp.url)
        print("Final destination:")
        print(responce.status_code, responce.url)
    else:
        print("Request was not redirected")
And it does redirect:
https://www.instagram.com/_linails_/
https://www.instagram.com/alena.nails.tallinn/
Request was redirected
302 https://www.instagram.com/_linails_/
Final destination:
200 https://www.instagram.com/accounts/login/
Request was redirected
302 https://www.instagram.com/alena.nails.tallinn/
Final destination:
200 https://www.instagram.com/accounts/login/
Request was not redirected
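Any ideas on how to make it stop redirecting, or how to get the page without logging in?
P.S. It looks like Instagram has worked out that a script is opening the page. Here is its full response: https://yadi.sk/d/2vcng8VTBDz35A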
You can try the following:
>>> r = requests.get('<url>', allow_redirects=False)
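Note that allow_redirects=False only stops requests from following the redirect; the response you get back is then the 302 itself, not the profile page. A minimal sketch of how the loop could detect that case follows. The helper names is_login_redirect and fetch_profile are hypothetical, and the login path is taken from the output shown in the question.

```python
from urllib.parse import urlparse

LOGIN_PATH = "/accounts/login/"  # path seen in the question's output
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_login_redirect(status_code, location):
    """Return True if the response is a redirect that points at the login page."""
    if status_code not in REDIRECT_CODES or not location:
        return False
    return urlparse(location).path.startswith(LOGIN_PATH)


def fetch_profile(url, headers=None):
    """Fetch url without following redirects; return the page text or None."""
    import requests  # third-party; imported here so the helper above needs only the stdlib

    r = requests.get(url, headers=headers, allow_redirects=False, timeout=10)
    if is_login_redirect(r.status_code, r.headers.get("Location")):
        print("Login required for", url)
        return None
    # Any other non-200 answer is also treated as "no profile page"
    return r.text if r.status_code == 200 else None
```

This does not get around the login requirement. It only lets the script tell a login redirect apart from a normal page, so it can skip that URL or retry it later.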
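Update: Friday, 11 October 2019 07:21:15 UTC
I can't reproduce this problem, so I can't see why it happens for you; I don't get any redirect. Perhaps it is some restriction based on location: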
>>> r = requests.get('https://www.instagram.com/_linails_/')
>>> r.history
[]
>>> r.url
'https://www.instagram.com/_linails_/'