Unable to scrape data from a table on a dynamic website

Question

OK, so I'm trying to scrape the table from this website: https://www.diamondsfactory.co.uk/design/combined-band-look-diamond-engagement-ring-clrn0717801

Basically this section: the rows in the table.

So far I have tried this script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Initialize the Chrome driver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

# Open the webpage
url = "https://www.diamondsfactory.co.uk/design/combined-band-look-diamond-engagement-ring-clrn0717801"
driver.get(url)

# Find the table row by its id
row = driver.find_elements(By.CLASS_NAME, "tdStone  odd")

# Extract the data from each cell in the row
td_elements = row.find_elements(By.TAG_NAME, "td")
data = [td.text for td in td_elements]

# Print the extracted data
print(data)

# Close the driver
driver.quit()

However, I got this error:

Traceback (most recent call last):
  File "C:\Users\Payalkumavat\scrape_diamonds.py\scrape.py", line 17, in <module>
    td_elements = row.find_elements(By.TAG_NAME, "td")
                  ^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'find_elements'
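The error itself comes from `find_elements` (plural) returning a Python list, which has no `find_elements` method of its own, so each row has to be queried individually. In addition, `By.CLASS_NAME` does not accept a space-separated compound class name such as "tdStone  odd"; a CSS selector can express that instead. Since the table is loaded dynamically, an explicit wait is probably needed too. The following is only a minimal sketch: the selector `tr.tdStone` assumes the rows are `tr` elements carrying the `tdStone` class (not verified against the live page), and `rows_to_data` and `scrape` are hypothetical helper names.

```python
def rows_to_data(rows, tag_locator="tag name"):
    """Return one list of cell texts per row element.

    "tag name" is the string value behind Selenium's By.TAG_NAME.
    """
    return [[td.text for td in row.find_elements(tag_locator, "td")] for row in rows]


def scrape(url, row_selector="tr.tdStone", timeout=15):
    """Open the page, wait for the rows to appear, and return their cell texts."""
    # Imported here so rows_to_data can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes Selenium 4.6+, which resolves the Chrome driver automatically.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one matching row exists (assumed selector).
        rows = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, row_selector))
        )
        return rows_to_data(rows, By.TAG_NAME)
    finally:
        driver.quit()
```

Calling `scrape(url)` with the URL from the question would then return a list of rows, each a list of cell strings, provided the assumed selector matches the page.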

Answer 1

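Note: This answer takes a different approach to reach your goal. (Modules used: requests, json, time)

From your question, I assume you are trying to get all the information tied to the `tdStone` class in the HTML page. I found a better solution than using Selenium. Here is my reasoning:

The target application has an endpoint named `index.php?route=` (URL: https://www.diamondsfactory.co.uk/index.php?route=product/product/lazyloadDiamond) that acts as a kind of API route and fetches all the details from the server (all of this information is then rendered into the HTML source under the `tdStone` class). So if we send this endpoint a request that specifies the target (for example, diamonds matching your criteria), we can easily get the data with the Python requests library and a little code. Here is my code:

Note: To avoid rate-limiting issues, I use `time.sleep(3)` to space out the requests.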

import json
import requests
import time
from requests.packages.urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

def getData(content):
    loaded_content = json.loads(content)
    result = loaded_content['stones']
    for i in result:
        shape = i['shape']
        diamond_code = i['diamond_code']
        color = i['color']
        weight = i['weight_display']
        clarity = i['clarity']
        certificate = i['lab']
        image = i['image_url']
        polish = i['polish']
        symmetry = i['symmetry']
        price = i['csprice']
        video_url = i['video_url']
        mm = i['meas']
        depth = i['depth']
        table = i['table']

        print(f"Diamond Shape:  {shape}\nDiamond Color:  {color}\nDiamond Code:  {diamond_code}\nDiamond Weight:  {weight}\nDiamond Depth:  {depth}\nDiamond Table:  {table}\nDiamond MM:  {mm}\nDiamond Certificate:  {certificate}\nDiamond Image:  {image}\nDiamond Video:  {video_url}\nDiamond Polish:  {polish}\nDiamond Symmetry:  {symmetry}\nDiamond Price:  {price}\n====================================")

def sendRequest(url):
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:130.0) Gecko/20100101"
    }
    for i in range(1,100):
        data = f"stone_type=LAB&ring_size=R16_7&metal_purity=GL_18K_W&stone_carat_min=0.20&stone_carat_max=30.00&stone_price_min=100&stone_price_max=5000000&active_diamond_tab=LAB&page={i}"
        time.sleep(3)
        res = requests.post(url, data=data, verify=False, headers=headers).text
        if '"stone_price_id":' in res:
            getData(res)
        else:
            break

sendRequest('https://www.diamondsfactory.co.uk/index.php?route=product/product/lazyloadDiamond')
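If the goal is to keep the data rather than print it, the parsing step in `getData` could return rows instead. The sketch below reuses only the JSON keys that appear in the script above; `parse_stones`, `to_csv`, and `FIELDS` are hypothetical names, and the response shape (a top-level `stones` list) is taken from that script rather than from any documented API.

```python
import csv
import io
import json

# Keys read by the script above; anything missing in a stone becomes None.
FIELDS = [
    "shape", "diamond_code", "color", "weight_display", "clarity", "lab",
    "image_url", "polish", "symmetry", "csprice", "video_url", "meas",
    "depth", "table",
]


def parse_stones(content):
    """Parse one JSON response body into a list of dicts keyed by FIELDS."""
    stones = json.loads(content).get("stones", [])
    return [{key: stone.get(key) for key in FIELDS} for stone in stones]


def to_csv(rows):
    """Serialize the parsed rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Inside the request loop, `getData(res)` could then be swapped for something like `all_rows.extend(parse_stones(res))`, with `to_csv(all_rows)` written to a file once the loop ends.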

If you are only interested in one kind of result, you can analyze these parameters to narrow the response down to what you expect:

&stone_shape=MQS&stone_carat_min=0.20&stone_carat_max=30.00&stone_clarity=&stone_color=&stone_certificate=&stone_cut=&stone_polish=&stone_symmetry=&stone_fluorescence=&stone_price_min=100&stone_price_max=5000000&show_image=&show_video=&show_instock=&show_heart_arrows=&markup=&tax_class_id=10&design_id=49&image_stone=di&side_stone=&metal_purity=GL_18K_W&product_id=15265&ring_size=R16_7&active_diamond_tab=LAB&diamond_code=&edit_product=&order=asc&search=&page=1
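One way to experiment with these parameters is to parse the query string into a dict, override the fields of interest, and re-encode it. This is only a sketch: `override_params` is a hypothetical helper, the shortened `base` string is a subset of the parameters listed above, and what each value means to the server is not documented here.

```python
from urllib.parse import parse_qsl, urlencode


def override_params(query, **overrides):
    """Parse a query string into a dict and replace the given parameters."""
    # keep_blank_values=True preserves empty fields such as "stone_clarity=".
    params = dict(parse_qsl(query.lstrip("&"), keep_blank_values=True))
    params.update(overrides)
    return params


# A shortened subset of the parameter string, for illustration only.
base = "&stone_shape=MQS&stone_carat_min=0.20&stone_carat_max=30.00&stone_clarity=&page=1"
params = override_params(base, stone_carat_max="2.00", page="2")
body = urlencode(params)
print(body)
```

Since `requests.post` accepts a dict for `data=` and form-encodes it, `params` could also be passed directly instead of building the string by hand.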

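Some key points:

I did not use any sorting or filtering in my script; that is up to you.
If you want to sort or filter the values, feel free to analyze the parameters I mentioned above.
What is `product_id`? After analyzing the URL you mentioned, my guess is that the product ID is the last five digits of that URL (`17801` is the `product_id` in `clrn0717801`).
I used a loop to extract as many results as possible that meet your expectations. (I hope so :( )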
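The `product_id` guess can be expressed as a small helper that takes the last five digits of the URL. This merely encodes the answerer's guess and is unverified: the sample parameter string earlier in this answer carries `product_id=15265`, which does not match, so the value should be checked against the requests the page actually sends. `guess_product_id` is a hypothetical name.

```python
import re


def guess_product_id(url):
    """Return the last five digits at the end of the URL, or None if absent."""
    match = re.search(r"(\d{5})/?$", url)
    return match.group(1) if match else None


url = "https://www.diamondsfactory.co.uk/design/combined-band-look-diamond-engagement-ring-clrn0717801"
print(guess_product_id(url))
```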

Hope this helps.

Thanks
