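I have a fairly basic scraper that loops through a list of real estate agent profiles on homes.com. It worked for months, but then it started throwing sporadic ERR_HTTP2_PROTOCOL_ERROR errors. I'm using python/selenium/chromedriver. At first I thought my IP might have been blocked or flagged somehow, but when I hit the same URL in a different browser it loads perfectly (the error shows up when the page is loaded through chromedriver as part of my scraper). The error used to happen once in a while, maybe once every 1000 URLs; now it happens nearly every 5-10ish URLs, to the point that my scraper is unusable. Once my code hits the error once, nearly every other URL in the loop gets the same error.
Here's what I've tried: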
import random
from selenium import webdriver
import time
urllist = ['https://www.homes.com/real-estate-agents/susanne-guthrie/x5lx8zp/',
'https://www.homes.com/real-estate-agents/sue-pearce/362zpwe/',
'https://www.homes.com/real-estate-agents/katie-mihelich/kzppjy8/',
'https://www.homes.com/real-estate-agents/matt-pittman/etryj3q/',
'https://www.homes.com/real-estate-agents/mateen-ansari/mg2qg9l/',
'https://www.homes.com/real-estate-agents/rachael-real/dk3q8gl/',
'https://www.homes.com/real-estate-agents/annamarie-moise/21qtbtb/',
'https://www.homes.com/real-estate-agents/madison-verdun/0qvxe13/',
'https://www.homes.com/real-estate-agents/david-stob/hsztzf1/',
'https://www.homes.com/real-estate-agents/samuel-chrusciel/ww3525k/',
'https://www.homes.com/real-estate-agents/cathie-smith/b32xp59/',
'https://www.homes.com/real-estate-agents/jean-reedy-baren/7vk95ky/',
'https://www.homes.com/real-estate-agents/randy-stob/y9j077s/',
'https://www.homes.com/real-estate-agents/jeanne-jordan/sh90wf7/',
'https://www.homes.com/real-estate-agents/anthony-janega/p42zbfv/']
driver = webdriver.Chrome()
for url in urllist:
    # Pause 1-3 seconds between requests
    delay = random.randint(1, 3)
    time.sleep(delay)
    driver.get(url)
    # SCRAPE DATA
Here is the cURL request for the attempt that failed in the browser, copied from Chrome Developer Tools. When I run it in a terminal, it works:
curl 'https://www.homes.com/real-estate-agents/sue-pearce/362zpwe/' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
-H 'accept-language: en-US,en;q=0.9' \
-b 'gp=%257b%2522k%2522%253a%257b%2522key%2522%253a%2522lqjs6ple4qbcy%2522%257d%252c%2522v%2522%253a4%252c%2522d%2522%253a%257b%2522lt%2522%253a33.392%252c%2522ln%2522%253a-84.57%257d%257d; hab=%257b%2522e%2522%253atrue%252c%2522r%2522%253a%255b%255d%257d; vr=vr-rU%2FpWRFUWU6wYzYaX5AuCg.1741881982; at=bAJyo0SDJ4EhYVyUtPvuVhUymGMGPlZLdn3tcKdBI0Xud82LQsbfeiq3TkganjvB; v=v-1; _gcl_au=1.1.1820879348.1741881984; sr=%7B%22h%22%3A849%2C%22w%22%3A1728%2C%22p%22%3A2%7D; ak_bmsc=12DA12F3DC9B2ABF4F47558632340484~000000000000000000000000000000~YAAQZP49F0HFSI2VAQAAI+wTkRsCvktt8G11jnkV+T4UIawsN55RsmH07lgCfYsaUcenprvfkFJYtQW8XJbekntDA3wSbEGFtp9/iVS5Usz1Jlb4BI7JmnEyCSVu8rhCkchU/JXrh4faaeEZo5cyb5Bb05sJNMLogmMjYOBjg4GK94+xREVCc8dc3ad6BwDROYZxK1iimOBrtLV+f2fSylGJSCJ9Yi16jyGvfPla4GNLGJ85FMItsAoZMeYcWV8kvOZ5izen2RNZ+ZlpHX4BJoCjDmQMw1oJz6DVfJC6JwK9Sn2L8cofye5Bvc9dTyocwUR6wYucqAZrXnkTBaDommnaP3BwNnbXvYRJoWeed3UrLSZFtcFVFCixOeSozKmEOJj5rXUNtisY; bm_s=YAAQbP49FxKX242VAQAAJiwykQNd7lmE3pGGW5rEW/e9aipMi2HPRTPv47MpvK1LD1mdWpD4yPN+3udsD/irFBZDQWjIHURN0JXwwkeWQZbBDXgO01UNoFbytX8qNZJOo0iaXKMkOHMUIYxQ6b4g+lw3/Co6NOQCgPX0SUqn67YbMmLoxASs0/Fa77d1+Q07ZqiXNOurWmPuDKWly9nOsvsAwst5Xz99SlRllSr4WQTN6EcfqN7n+wJhSlbcsFHl4OooRc12Tpw+pFomIOLEBT2WMaBhbIKEIdOesIKu9KjA6XTIXWJCGaOhVViH3p0rMUd3DoFVhU5UWsfB3JtqMYKwsXvEMjDphsSpsYdobIe71Xkoa+r4dRe4aFNf64oXIzocru00lPFKz2yTXxBcbekSqEkJKXQrAoSUXZFHXWN5NaFftyB4v1ajj1cJ1Te0ctlIVOEtCmE=; bm_so=ADB501C194325C53480AF6BD1A2519FE3E71A4085DF823AD806F6D4092C38F4E~YAAQbP49FxOX242VAQAAJiwykQKLX26FZYg/mwv+WVQCJRxmcyYl5XqJfcfH5WPg48BqbUCvyEmIdYXVijj9GnIA4xnp9yMGGgUAB7JUfRO7rZfctGbNvESzTZAgDc3BArgVCahd7woid4XxnxGRwsQFKKA4hYsmJZ74IxlNRaTf9Qyk13DB8/Xixt4btfCT3mcpudUNgGZmZayR+WaFVa+8lAIGkogZtuXLp9NgHRExXjqK3VULGfuMdSMAhMo5yQDI5t6ehwO17JmOckIaVgB7mM8Z5z8zN9i1EqM4CXaGhCkX9YLgEE20gZlQvmLhJ5edp/u+mN/SCKuOB1R5/LNqkVfrLVNHR9Yvchedwb46olXBLZ98cbT69LdqAT63qys6IC/tV4VWTlK0XpnOSFNKekz+E4kEMwgO9Mw5cmzoG/hxRfClnCdEdoAgzLX+7AelMv9vc8UsZxuL1Q==; 
bm_lso=ADB501C194325C53480AF6BD1A2519FE3E71A4085DF823AD806F6D4092C38F4E~YAAQbP49FxOX242VAQAAJiwykQKLX26FZYg/mwv+WVQCJRxmcyYl5XqJfcfH5WPg48BqbUCvyEmIdYXVijj9GnIA4xnp9yMGGgUAB7JUfRO7rZfctGbNvESzTZAgDc3BArgVCahd7woid4XxnxGRwsQFKKA4hYsmJZ74IxlNRaTf9Qyk13DB8/Xixt4btfCT3mcpudUNgGZmZayR+WaFVa+8lAIGkogZtuXLp9NgHRExXjqK3VULGfuMdSMAhMo5yQDI5t6ehwO17JmOckIaVgB7mM8Z5z8zN9i1EqM4CXaGhCkX9YLgEE20gZlQvmLhJ5edp/u+mN/SCKuOB1R5/LNqkVfrLVNHR9Yvchedwb46olXBLZ98cbT69LdqAT63qys6IC/tV4VWTlK0XpnOSFNKekz+E4kEMwgO9Mw5cmzoG/hxRfClnCdEdoAgzLX+7AelMv9vc8UsZxuL1Q==^1741897741056; AKA_A2=A; akaalb_www_homes_prd=1741902608~op=homes_Prd_Edge_US:www_homes_prd_usw2|~rv=79~m=www_homes_prd_usw2:0|~os=48c4c61a41b922746ef5062cf402e343~id=cba36d26f4d2ae56ef348d9619bda3d8' \
-H 'priority: u=0, i' \
-H 'sec-ch-ua: "Chromium";v="134", "Not:A-Brand";v="24", "Google Chrome";v="134"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'sec-gpc: 1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'
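Interestingly, after my program bombed, I hit "refresh" and Chrome then loaded this URL just fine. Once it succeeded, I copied the cURL request again to see whether it had changed; it is here. The one thing I see is that the browser now sends a 'cache-control: max-age=0' header. I don't know enough to say why this would change (or whether it's relevant). What would you suggest I try so that my Python code runs consistently? Any help pointing me in the right direction would be much appreciated, because this is driving me crazy! Thanks in advance!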
curl 'https://www.homes.com/real-estate-agents/sue-pearce/362zpwe/' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
-H 'accept-language: en-US,en;q=0.9' \
-H 'cache-control: max-age=0' \
-b 'gp=%257b%2522k%2522%253a%257b%2522key%2522%253a%2522lqjs6ple4qbcy%2522%257d%252c%2522v%2522%253a4%252c%2522d%2522%253a%257b%2522lt%2522%253a33.392%252c%2522ln%2522%253a-84.57%257d%257d; hab=%257b%2522e%2522%253atrue%252c%2522r%2522%253a%255b%255d%257d; vr=vr-rU%2FpWRFUWU6wYzYaX5AuCg.1741881982; at=bAJyo0SDJ4EhYVyUtPvuVhUymGMGPlZLdn3tcKdBI0Xud82LQsbfeiq3TkganjvB; v=v-1; _gcl_au=1.1.1820879348.1741881984; ak_bmsc=12DA12F3DC9B2ABF4F47558632340484~000000000000000000000000000000~YAAQZP49F0HFSI2VAQAAI+wTkRsCvktt8G11jnkV+T4UIawsN55RsmH07lgCfYsaUcenprvfkFJYtQW8XJbekntDA3wSbEGFtp9/iVS5Usz1Jlb4BI7JmnEyCSVu8rhCkchU/JXrh4faaeEZo5cyb5Bb05sJNMLogmMjYOBjg4GK94+xREVCc8dc3ad6BwDROYZxK1iimOBrtLV+f2fSylGJSCJ9Yi16jyGvfPla4GNLGJ85FMItsAoZMeYcWV8kvOZ5izen2RNZ+ZlpHX4BJoCjDmQMw1oJz6DVfJC6JwK9Sn2L8cofye5Bvc9dTyocwUR6wYucqAZrXnkTBaDommnaP3BwNnbXvYRJoWeed3UrLSZFtcFVFCixOeSozKmEOJj5rXUNtisY; AKA_A2=A; akaalb_www_homes_prd=1741902608~op=homes_Prd_Edge_US:www_homes_prd_usw2|~rv=79~m=www_homes_prd_usw2:0|~os=48c4c61a41b922746ef5062cf402e343~id=cba36d26f4d2ae56ef348d9619bda3d8; vt=vt-LvPAD0gIJ0%2Br2gUpwbd1Hg; bm_ss=ab8e18ef4e; bm_so=2CF5C8E88F5A3FDEA0BB1148DC4B3E92B6CC33F58F3FE3D3146676D320D41C49~YAAQyDhjaGttCZCVAQAA+Ah1kQIgEX4lhXLMZwfe+NKUNGvgdpESA2rhSFmMZp6fyvgRZmvV7ToF+9hu5+pdmIryMhW1fvwXRpnL+M5c6LbJgjqJLunjBOrQ0/5txCQ6lKjqy71KqIPPKF8ngY4i69GLBew1Al0IeUKXxv+L0XdmlpcfvnILoy4pLjw700cwUpZfLRMYok40ObcCy7kaB0+bU9CK69GRSmCHo7bhWV/Lutv2cCulXn081knh1lLoVfZrR21C8074pIC4zMSrHhVkrEQV5NapK8+Bd/lz7u5EM5pw9pk+X+ozvrgUrsWvmo6LOl5yVxr74SRa1IwIQmFRrjLwFxDFmHsSEzgyiXPfbI55cH4EeTwXo9XDG34GwnuNcAxN/fTtglHUZyUC08VVtbGltUb6fVUO+yGVpSRvMPuHK85mR1TPZL9WR2Ujh45Ia4QGOOsL/F/fPQ==; sr=%7B%22h%22%3A350%2C%22w%22%3A1728%2C%22p%22%3A2%7D; 
bm_lso=2CF5C8E88F5A3FDEA0BB1148DC4B3E92B6CC33F58F3FE3D3146676D320D41C49~YAAQyDhjaGttCZCVAQAA+Ah1kQIgEX4lhXLMZwfe+NKUNGvgdpESA2rhSFmMZp6fyvgRZmvV7ToF+9hu5+pdmIryMhW1fvwXRpnL+M5c6LbJgjqJLunjBOrQ0/5txCQ6lKjqy71KqIPPKF8ngY4i69GLBew1Al0IeUKXxv+L0XdmlpcfvnILoy4pLjw700cwUpZfLRMYok40ObcCy7kaB0+bU9CK69GRSmCHo7bhWV/Lutv2cCulXn081knh1lLoVfZrR21C8074pIC4zMSrHhVkrEQV5NapK8+Bd/lz7u5EM5pw9pk+X+ozvrgUrsWvmo6LOl5yVxr74SRa1IwIQmFRrjLwFxDFmHsSEzgyiXPfbI55cH4EeTwXo9XDG34GwnuNcAxN/fTtglHUZyUC08VVtbGltUb6fVUO+yGVpSRvMPuHK85mR1TPZL9WR2Ujh45Ia4QGOOsL/F/fPQ==^1741902123250; bm_s=YAAQyDhjaNt8CZCVAQAAnj91kQM5FGaQxhEb31qC1Khm9ee44yiodp6KEpP89IIZt7Z3ojnCfyzXc813U5XxrxoQx0xdEqRNI+0MRmW7+/9D3iNzrA8Wb/CycNmbveJ1D86DkidL5Se9ZioFHNjAmfAuVvQDbB7k0xyHVbZO17i26rzpHZzVY09CdSj5YOlnw/A9rcLO2KNsFsvJHWIQfxd/U1mOJ1vOwvdiupeVV8ughbEQneQ+JP+vJ5f9au186PIkbgGQl3EaC3pUEMlMoOeA3kduXISq0uwGMuhY7M3AJWCnStfnT3EzF2pnYi1CPAaxvUdPaEv81nZ29HKpRStmUH4yGbke3i7CIiht3vg4efff9CnYCynRvmf+zpVlt0ns+OQmpJNPjGUvB1P10y2iPyvIuj4WWStkxsyRE7oSmHj2KQugyW4HCnWGRFyTvIS5DpqfH9Q=' \
-H 'priority: u=0, i' \
-H 'sec-ch-ua: "Chromium";v="134", "Not:A-Brand";v="24", "Google Chrome";v="134"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'sec-gpc: 1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'
The problem is the cookies: you need to keep updating them, or reuse them.
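A minimal sketch of one way to keep and reuse cookies with Selenium, assuming that is what the fix amounts to here. The helper names `save_cookies` and `load_cookies` and the cookie file path are hypothetical; only `driver.get_cookies()` and `driver.add_cookie()` are Selenium calls. Whether restoring cookies is enough to stop ERR_HTTP2_PROTOCOL_ERROR on this site is not confirmed.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file.

    Calling this after every successful page load keeps the saved
    copy up to date with whatever the site last set.
    """
    Path(path).write_text(json.dumps(driver.get_cookies()))


def load_cookies(driver, path):
    """Add previously saved cookies to the driver; return how many were added.

    Selenium only accepts cookies for the domain of the page that is
    currently open, so call this after driver.get() on the same site.
    Returns 0 if no cookie file exists yet.
    """
    cookie_file = Path(path)
    if not cookie_file.exists():
        return 0
    cookies = json.loads(cookie_file.read_text())
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

In the loop from the question, this could mean calling `save_cookies(driver, "cookies.json")` after each successful `driver.get(url)`. When starting a new `webdriver.Chrome()`, open a page on the site first, then call `load_cookies(driver, "cookies.json")` before continuing with the remaining URLs.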