Selenium can't click the button on www.carmax.com/cars?

Question  Votes: 0  Answers: 2

Any help would be appreciated. New information as of October 5, 2022.

I need help with Selenium while trying to scrape the car listings from the CarMax website. url = 'https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000'

Outside of Selenium, I can open the URL (in Chrome on a Mac) and then click "SEE MORE MATCHES" several times. Each click adds 22 car tiles. I want to get all 228 cars that match the filters.
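As a quick sanity check on those numbers, the sketch below estimates how many clicks are needed to reveal every tile. The function name `clicks_needed` is hypothetical, and it assumes the batch size is constant; the request logged further down uses `take=24`, so the real batch size may differ from 22.

```python
import math


def clicks_needed(total, shown_initially, added_per_click):
    """Number of "SEE MORE MATCHES" clicks needed to reveal `total` tiles."""
    remaining = max(total - shown_initially, 0)
    return math.ceil(remaining / added_per_click)


# 228 matching cars, 22 shown on the first page, 22 added per click.
print(clicks_needed(228, 22, 22))  # 10
```

A value like this is useful as an upper bound for any click loop, so that the loop cannot run forever if the button never disappears.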

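When I use Selenium, I get the initial page with a list of 22 tiles (cars). However, when I manually click "SEE MORE MATCHES" (inside the Selenium-controlled browser), I get "We're sorry, An error occurred in your search."

So, in the Selenium browser window, I pasted the URL manually and got this message: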

Access Denied
You don't have permission to access "http://www.carmax.com/cars?" on this server.
Reference #18.61f1eb8.1664947333.87596fdb

Below is the code I am trying to run to loop through all the pages and see all 228 car tiles.

import time
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# The following works and I see a list of cars:
# browser = chromedriver()
# browser.get('https://www.carmax.com/cars?includenontransferables=False&year=2018-2023&mileage=30000&price=18000-30000')

# The following works because "SEE MORE MATCHES" at the bottom is displayed in the browser
e = browser.find_element(By.ID, "see-more")
eBut = e.find_element(By.XPATH, ".//a")
print(eBut.text)

# The following works because the button lights up in blue
hover = ActionChains(browser).move_to_element(eBut)
hover.perform()

# The following causes the error "We're sorry, An error occurred in your search."
eBut.click()
time.sleep(3)

I inspected the network log in Chrome. When I click the button manually...

Note the visitorID in the Request URL.

Good request/response when the button is clicked:
> General 
Request URL: https://www.carmax.com/cars/api/search/run?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023%26mileage%3D30000%26price%3D18000-32000&skip=48&take=24&zipCode=76210&radius=radius-nationwide&shipping=-1&sort=lowest-price&scoringProfile=segment_4&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263
Request Method: GET
Status Code: 200 
Remote Address: [2600:1404:6400:1988::1c4e]:443
Referrer Policy: strict-origin-when-cross-origin

> Response Headers
cache-control: public,max-age=120
content-encoding: gzip
content-length: 24290
content-security-policy: upgrade-insecure-requests
content-type: application/json; charset=utf-8
date: Thu, 06 Oct 2022 03:05:31 GMT
request-context: appId=cid-v1:43e71566-b7e7-4ca6-b692-9f3f68fd9719
server: Microsoft-IIS/10.0
server-timing: cdn-cache; desc=MISS
server-timing: edge; dur=65
server-timing: origin; dur=546
set-cookie: KmxSession_0=SessionId=ef0ffdc3-143d-4dde-9e1c-d16c6ec16e2e&logOdds=0.16263300000000003&logOddsA=-1.103987916&logOddsI=0.8484898; domain=.carmax.com; path=/; expires=Thu, 06-Oct-2022 03:35:31 GMT
set-cookie: KmxVisitor_0=StoreId=6095&Zip=76210&Lat=33.1508&Lon=-97.094&ZipConfirmed=True&ZipDate=10/6/2022 3:05:31 AM&VisitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263&IsFirstVisit=False&UsingStoreProxy=false&AdCode=SEMGAAB&AdCodeDate=10/3/2022 2:54 PM&DistanceShippingTestBucket=2&sRadius=radius-nationwide&LastSearch=638006222940089089&Sort=lowest-price&Shipping=-1; domain=.carmax.com; path=/; expires=Fri, 06-Oct-2023 03:05:31 GMT
set-cookie: bm_sv=A2572EFA5D77E5B9E212CF1F5E3EA1AA~YAAQjDgvF0lum4KDAQAAWV9BqxHO/mA6UGF3uH6Sqq7uQkZArnVAbp5XVaBvnRCWuL1zIgva6mSQmfTX1laMRUXpfsxv1+r/RI7NmAocHADTrGEH5s2EmRWsYB7OXs/nDyx7KiaT+F6qzTLnrAhFKv5hAnT3cfDY2QrducB3BpE3+x/2qCUG7FXEHZZ8Y4vFob+917bdn4LW9rRUjPBvHheQ4eu2Po9mQ8fTtCEQfoTz+em4VRXDYFgmVwWsDpUkeA==~1; Domain=.carmax.com; Path=/; Expires=Thu, 06 Oct 2022 05:02:07 GMT; Max-Age=6996; Secure
strict-transport-security: max-age=31536000
timing-allow-origin: *
vary: Accept-Encoding
x-frame-options: sameorigin
x-powered-by: ASP.NE

> REQUEST HEADERS
:authority: www.carmax.com
:method: GET
:path: /cars/api/search/run?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023%26mileage%3D30000%26price%3D18000-32000&skip=48&take=24&zipCode=76210&radius=radius-nationwide&shipping=-1&sort=lowest-price&scoringProfile=segment_4&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
content-type: application/json
cookie: kndctr_0C1038B35278345B0A490D4C_AdobeOrg_identity=CiY2NDEyMjEzMzg1MTI4Njg1NTY5MTUzOTg1ODIwODUzMjcxNzEzN1IOCI-D3PK5MBgBKgNPUjLwAY-D3PK5MA==; _fbp=fb.1.1664808847128.839021144; _gcl_au=1.1.1062470335.1664808847; _gcl_aw=GCL.1664808848.Cj0KCQjwkOqZBhDNARIsAACsbfLBnIzuFAqQwL3--e31KfdmgSD6rJHg3lUTFwSJ8tfceih1AymJoW8aAutBEALw_wcB; _gcl_dc=GCL.1664808848.Cj0KCQjwkOqZBhDNARIsAACsbfLBnIzuFAqQwL3--e31KfdmgSD6rJHg3lUTFwSJ8tfceih1AymJoW8aAutBEALw_wcB; s_fid=7D110C609D492208-3EDD85A763A86C1B; ai_user=e2RLFbYVOZHSuYmJMZxkXo|2022-10-03T14:54:11.532Z; _gid=GA1.2.802830010.1664808852; KmxBestMatch=Bucket=Test; KmxStore=StoreId=6095; at_check=true; AMCVS_0C1038B35278345B0A490D4C%40AdobeOrg=1; s_cc=true; fs_cid=1.0; _clck=vizc8f|1|f5h|0; AKA_A2=A; bm_sz=56BC4464F92D8D3854014390299384A2~YAAQjDgvF0BSm4KDAQAAJkM+qxGtUaRM9kgKs3OhjlPMND6oKDS9L9JrclpSJtoVlcFyP7frV8YD1xCVgcRdw5uFc4++0cxpEv6gpgWh/CigS4uh70WMMwMrSkDHPy2JNGg1vhMIhuwUamy/wLad5DGd71D+cRQicNKzDMPyWJX7e3B4sGONFIQ8VJgq+XW07Y6inJC5kDssxm2FpuI+AqIL/WKcCQ8EWJvk2sXe2r5V8u/oxKUCI3LZ5kcp5dm3m5c2EJ9mSSeGQ34mZPVilnXKDNdt/L5RwAs0lVuW5ogBrSs=~3421507~3224888; KmxSession_0=SessionId=ef0ffdc3-143d-4dde-9e1c-d16c6ec16e2e&logOdds=0.16263300000000003&logOddsA=-1.103987916&logOddsI=0.8484898; bm_mi=F58243EB46DA811B0A46D45132FFFD84~YAAQjDgvF1tSm4KDAQAAJUc+qxGQeUt5Cp1D7OyTa+nWNRnuzi/Ci2BmD4+4Qm0W1sHJA30Ap3m6mceXOzh5wfK03HRe2phSECTcw4RJ5uZBY5eLLkAQpQq3KKGKs0PPcJfrMvauuj9k38zru/2XffC0/Zu/RmhjOvGltYTXUom0lHni/1NId4QNlZH+Dinwy+dQRQsrngcHD/7oF26xgE4ud/TqHYs9HaEeRbP9eypGSng6pEs4oN4gD37JVHz9Uwv1AQaleut5m/tW4BejdCyks9j41mdfB8AqC4+0PlXptnrYyQa5n4cbidpZ7jM=~1; 
_abck=F2920DB117607824AC32F9ABD87E4CF0~0~YAAQjDgvF3pSm4KDAQAA7kk+qwiI2supb0Wj6jjIVZu5Js77gCQOYAS6Cz5QkS00G8u5W4qQbAInqHTLJ2F54vEUvjFBYsnudLSolWZQ2uSRIOV3FG4VffT+zR2NDBYn+mFGr9Oi0v9ioiaE6xsjOGOwk4UtEc1Y73ft9q9ut4Dl+b1rfqGo1hEUdPSp+Ie2mefY0fFQmhtEJ722KeKJSDg/AmiCQWxrOytVt4V4fLTaDNzByMwQmBxL0GOovHnOo8xxvFpYHV3YE3+nFOBsImR3jPdMqRx833/BKU+EL4g9W87VmtdGBp3/MmBqKBTFJjcx2j59QLbqOHDXG45fLpApfi1ducqf3j9++utrry4yhEQaAr7U3td+W0XHi2xi20UuAyLMuxzwA5iQFMQn1rDlyJhy~-1~-1~1665028912; mbox=PC#40b79aa81c9a40cbaa4d6bda16734a30.35_0#1728270130|session#29f54aa1f68e41e18055c862bd4f0314#1665027190; adobeTransID=9536546868bc6444b2c19840b7ac69c0; s_ppvl=Cars%2C94%2C21%2C3503%2C853%2C805%2C1792%2C1120%2C2%2CP; gpv_v4=Cars; s_visit=1; s_vnc365=1696561330104%26vn%3D7; s_ivc=true; cto_bundle=Lnd0i19yJTJCU3BkWldGeVZGT0lPSkE1dFU1TGJqWE55Q2RBa3BTbUttaHBPWmlGcG5MZyUyQjVOaktVNzQ4eHFndWlBVlVGeiUyQm9HMEVGUWtIeGo2ZzMwWSUyQlh1dXRha2trRGdiaHI5RXZUZHhCJTJCU1Y4SnVYcTl0U3Y5bmtXakFnUjNsVG5jTm42RSUyRlpBSyUyRlpZTGZoeE51UUVXeGk2QzNBcjNPamtoN3gxR25jeFhKSU5qQ2doWTF2eEgxVXFWbllqa2hFbDF6Mg; _uetsid=3a9d8740432b11edb0f42d600c354438; _uetvid=1eca7e007c5911ec859199f79f07ee47; _ga=GA1.1.2103228906.1664808852; fs_uid=#J90WC#5786631356157952:5474652164165632:::#/1687899589; AMCV_0C1038B35278345B0A490D4C%40AdobeOrg=-1124106680%7CMCMID%7C64122133851286855691539858208532717137%7CMCIDTS%7C19271%7CMCAID%7C304705C596F1394B-6000151B443909A6%7CMCOPTOUT-1665032530s%7CNONE%7CMCAAMLH-1665630130%7C7%7CMCAAMB-1665630130%7Cj8Odv6LonN4r3an7LhD3WZrU1bUpAkFkkiY1ncBR96t2PTI%7CvVersion%7C5.2.0; 
QSI_HistorySession=https%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-30000~1664946926834%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-32000~1665003893328%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-30000~1665006891845%7Chttps%3A%2F%2Fwww.carmax.com%2Fcars%3Furi%3D%2Fcars%2Fcrossovers%3Fincludenontransferables%3DFalse%26year%3D2018-2023%26mileage%3D30000%26price%3D18000-32000~1665025330891; ak_bmsc=AA0B2B31F438AA727AE20B131E3F04B4~000000000000000000000000000000~YAAQjDgvF5xTm4KDAQAAdGs+qxHUP44aCXjWPXfnud5T2nWxIt03lGiHJyxa7I1CCz0VpriGdiwkPRZafBeMrrGr74RZLcTRkJxxFXlJLHIlaDlNL2C9++bvBKZvHCMekKb+3tTkH2Ik4pG05Uas/qdjnLd33R1RHvJZukc/EuIZVOs/hl7IfzrrlRgUk/FYZxpAasr8WlhB5yM6MFWiDUihvTOX7kDu03ti4HoCgoabfB6hPvqRkOiG2e5OTKGKmR13ZHbi9egXov8opwXnOzbCqvvKRJdULfCH1htnsHyJwoIMKgWwE5dF2xpjdKX55g4XE4H7KdeZOhPeVzAj1ElUvFaSALv0RH+IHysLyMpPq+bGMi74nVjwTUf1rfJiw05MpVwD/oUPjsCWZxNtBx+3rFPgF44zEVJ+LFMTHy5zeWR3E48rJCBc41s4sM+Loj+7Ox8y9bSB7GfZCUoCKLIXv8883NvuNIzapUyGLnrXpLzOiMOAJZ2qlEpzhU1ZEgOelVa9; _clsk=1mpyjk0|1665025366793|3|0|m.clarity.ms/collect; _ga_NTWN6LKPPS=GS1.1.1665025330.7.1.1665025421.0.0.0; ai_session=dwGcZfAd9Et49laIoOjdG+|1665025342556|1665025492736; KmxVisitor_0=StoreId=6095&Zip=76210&Lat=33.1508&Lon=-97.094&ZipConfirmed=True&ZipDate=10/6/2022 3:03:45 AM&VisitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263&IsFirstVisit=False&UsingStoreProxy=false&AdCode=SEMGAAB&AdCodeDate=10/3/2022 2:54 PM&DistanceShippingTestBucket=2&sRadius=radius-nationwide&LastSearch=638006222940089089&Sort=lowest-price&Shipping=-1; 
bm_sv=A2572EFA5D77E5B9E212CF1F5E3EA1AA~YAAQjDgvFwJpm4KDAQAARsxAqxHE8+frQ64O+0FfncRNlVXCb+PpwuH3zPhQed95YyfQA7k6RmdSdyyRPy28Kh2w0pFvZqpnTi7tuolj+jSUtlS0Za3NunPBLI2e1cXOrd6kwLQ6YMOTBYeRZAvwwUxEFEm4gCa+BKfL6Wh5liEdEVPouU9MEqfK7EYrVfxPXPLNiK4yp40G3fAbZR01Tx+GgmagirDOo9fgoyGa2kjS7dQGnjESxyLKGBG6Dj8ywg==~1; s_ppv=Cars%2C99%2C22%2C9664%2C750%2C805%2C1792%2C1120%2C2%2CL; RT="z=1&dm=carmax.com&si=26ef3d4b-afa7-46e4-bc18-af86a66d0072&ss=l8wh3k9n&sl=4&tt=2jl&bcn=%2F%2F17de4c1c.akstat.io%2F&ld=1tbj&nu=9y8m6cy&cl=413k"; s_sq=carmaxadaptive%3D%2526c.%2526a.%2526activitymap.%2526page%253DCars%2526link%253DSEE%252520MORE%252520MATCHES%2526region%253Dsee-more%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253DCars%2526pidt%253D1%2526oid%253Dfunctionzr%252528%252529%25257B%25257D%2526oidt%253D2%2526ot%253DA
referer: https://www.carmax.com/cars/crossovers?year=2018-2023&mileage=30000&price=18000-32000
sec-ch-ua: "Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
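To make the logged Request URL easier to read, the sketch below decodes its query string with the standard library. It only parses the URL copied from the log above and sends no request; the helper names `query_params` and `with_skip` are hypothetical, and whether the endpoint accepts other `skip` values outside a normal browser session is not confirmed by the log.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Request URL copied from the log above.
REQUEST_URL = (
    "https://www.carmax.com/cars/api/search/run"
    "?uri=%2Fcars%2Fcrossovers%3Fyear%3D2018-2023%26mileage%3D30000%26price%3D18000-32000"
    "&skip=48&take=24&zipCode=76210&radius=radius-nationwide&shipping=-1"
    "&sort=lowest-price&scoringProfile=segment_4"
    "&visitorID=509f6eb9-eddb-4472-b412-b7e4d73fa263"
)


def query_params(url):
    """Return the query string as a dict with one decoded value per key."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def with_skip(url, skip):
    """Return the same URL with a different `skip` offset (hypothetical helper)."""
    parts = urlsplit(url)
    params = query_params(url)
    params["skip"] = str(skip)
    return urlunsplit(parts._replace(query=urlencode(params)))


params = query_params(REQUEST_URL)
print(params["uri"])                   # the page URI, percent-decoded
print(params["skip"], params["take"])  # paging offset and batch size
print(params["visitorID"])             # the ID noted in the question
```

Decoded this way, the button click appears to be a paged GET: `skip` is the offset, `take` is the batch size, and `visitorID` matches the value in the `KmxVisitor_0` cookie.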
python selenium-webdriver web-scraping
2 Answers
1
vote

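You should use options to remove every hint that the browser is driven by an automated bot. When the site's JavaScript detects those flags, it simply freezes your session. Initializing your bot with the following code should be enough: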

    options = Options()
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
    # Selenium 4 takes the driver path through a Service object
    # (from selenium import webdriver; from selenium.webdriver.chrome.service import Service).
    driver = webdriver.Chrome(service=Service(driver_path), options=options)

The complete code is:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
import bs4

# Spawn the WebDriver with the automation hints removed:
options = Options()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36")
# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service("chromedriver.exe"), options=options)

# Go to the page:
driver.get("https://www.carmax.com/cars?includenontransferables=false&year=2018-2023&mileage=30000&price=18000-30000")
wait = WebDriverWait(driver, 600)

# Click on See More:
ef = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]/div/a')))
time.sleep(2)
ef.click()

# Parse the page with bs4 (the "lxml" parser requires the lxml package):
soup = bs4.BeautifulSoup(driver.page_source, "lxml")

# Repeat the process...
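The options used above are meant to hide values that page scripts can read to spot automation. A minimal sketch for checking two of them from inside the session follows; `automation_hints` is a hypothetical helper, `driver` is assumed to be the Selenium driver created above, and a clean result here does not guarantee that the site will not block the session, since other signals may be in play.

```python
def automation_hints(driver):
    """Collect two values that page scripts can read to detect automation.

    `driver` is assumed to be a Selenium WebDriver; only its
    `execute_script` method is used.
    """
    return {
        # True when the browser reports that it is controlled by WebDriver.
        "navigator.webdriver": driver.execute_script("return navigator.webdriver"),
        # The user agent as seen by the page, to confirm the override applied.
        "user_agent": driver.execute_script("return navigator.userAgent"),
    }
```

Calling `print(automation_hints(driver))` right after `driver.get(...)` shows what the page can see. With `--disable-blink-features=AutomationControlled` set, `navigator.webdriver` is commonly reported as no longer `True`, but that is worth confirming on your own Chrome version.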

Example of iterating through the pages until the end:

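import re

total = 0  # updated after each click; avoids a NameError if the link never appears

while True:

    # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
    if len(driver.find_elements(By.XPATH, '//*[@id="see-more"]/div/a')) > 0:

        # Click on See More:
        ef = wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="see-more"]/div/a')))
        time.sleep(2)
        ef.click()

        # Read the progress label and keep only the digits of its first and last words:
        see_more_text = bs4.BeautifulSoup(driver.page_source, "lxml").find("span", {"class": "see-more--blue"}).get_text()
        total = int(re.sub(r"[^\d]", '', see_more_text.split(' ')[-1]))
        current = int(re.sub(r"[^\d]", '', see_more_text.split(' ')[0]))

        print(f"Status: Currently Viewing {current} of {total} Matches")

    else:
        # The link is gone, so every match is on the page.
        print(f"Status: Currently Viewing {total} of {total} Matches")
        break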
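The loop above extracts the two counts by splitting the label on spaces, which depends on the exact wording. A slightly more tolerant sketch is shown below; `parse_progress` is a hypothetical helper, and the sample label texts are assumptions, since the real text of the `see-more--blue` span is not shown here.

```python
import re


def parse_progress(text):
    """Return (current, total) from a label containing two counts.

    The first number found is treated as the current count and the last
    as the total; thousands separators such as "1,228" are accepted.
    """
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)]
    if len(numbers) < 2:
        raise ValueError(f"expected two numbers in {text!r}")
    return numbers[0], numbers[-1]


# Assumed label wording, for illustration only.
print(parse_progress("22 of 228 matches"))
```

Because it searches for numbers anywhere in the text, this version keeps working if the site adds a leading word or trailing punctuation to the label.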
    

0
votes

I solved this with a different automation library, which automates the user's own browser instead of a WebDriver-controlled one as Selenium does.

from time import sleep
from clicknium import clicknium as cc

if not cc.chrome.extension.is_installed():
    cc.chrome.extension.install_or_update()

tab = cc.chrome.open("https://www.carmax.com/cars?includenontransferables=false&year=2018-2023")

tab.wait_appear_by_xpath('//*[@id="see-more"]/div/div/span[1]', wait_timeout=5)
while tab.is_existing_by_xpath('//*[@id="see-more"]/div/a'):
    tab.find_element_by_xpath('//*[@id="see-more"]/div/a').click()
    sleep(3)
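The loop above runs until the link disappears, so it never ends if the site keeps showing it. A library-independent sketch of the same pattern with an upper bound follows; `click_until_gone` and its parameters are hypothetical names, and the default of 50 clicks is an arbitrary assumption.

```python
import time


def click_until_gone(exists, click, pause=3.0, max_clicks=50):
    """Call `click()` while `exists()` is true, at most `max_clicks` times.

    `exists` and `click` are plain callables, so the helper works with any
    automation library. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks and exists():
        click()
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch
    return clicks
```

With the `tab` object from the code above, it could be called as `click_until_gone(lambda: tab.is_existing_by_xpath(xp), lambda: tab.find_element_by_xpath(xp).click())`, where `xp` is the same `//*[@id="see-more"]/div/a` XPath.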