Trying to scrape a page from a website, but the sales are not showing

Question

I am trying to scrape this website to get artist names, songs, and sales figures. With the code I wrote below, I do get the artist names and songs, but the sales never show up in my csv file.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv

url = 'https://circlechart.kr/page_chart/album.circle?nationGbn=T&targetTime=2012&hitYear=2012&termGbn=year&yearTime=3'

# Initialize WebDriver

driver = webdriver.Chrome()
driver.get(url)

# Wait until the table is loaded

WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'ChartTablePC')))

# Locate the table

table = driver.find_element(By.CLASS_NAME, 'ChartTablePC')
rows = table.find_elements(By.TAG_NAME, 'tr')

# Print structure of each row to verify

for row in rows[:5]:  # Print the first 5 rows for inspection
    cells = row.find_elements(By.TAG_NAME, 'td')
    print([cell.text for cell in cells])

# Define headers

HEADERS = ['Rank', 'Album Artist', 'Sales']

# Open CSV file for writing

with open('EXOalbumSales2012.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(HEADERS)

    # Iterate over rows
    for row in rows[1:]:  # Skip the header row
        cells = row.find_elements(By.TAG_NAME, 'td')
        if len(cells) >= 5:  # Ensure there are enough columns
            rank = cells[0].text.strip()
            album_artist = cells[2].text.strip()
            sales = cells[4].text.strip()
            
            data_out = [rank, album_artist, sales]
            writer.writerow(data_out)
            print(data_out)

# Close the WebDriver

driver.quit()

The expected output is a csv file containing the artist name, song title, and album sales.

python selenium-webdriver web-scraping beautifulsoup
1 Answer

Replace

sales = cells[4].text.strip()

with

sales = cells[3].get_attribute("textContent").strip()

The index of the sales column should be 3 rather than 4, and you need to use get_attribute("textContent") to get the text.
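
Why the .text call comes back empty is worth spelling out: Selenium's .text returns only text that the browser actually renders, so a cell whose content is hidden reads as an empty string, while the textContent DOM property still carries it. Here is a minimal sketch of that difference against a throwaway local page (the hidden cell is a hypothetical stand-in, not circlechart.kr's real markup):

import pathlib

from selenium import webdriver
from selenium.webdriver.common.by import By

# Build a tiny page with one visible cell and one hidden cell.
page = pathlib.Path("demo.html")
page.write_text(
    "<table><tr>"
    "<td>1</td>"
    "<td style='display:none'>257,113</td>"
    "</tr></table>",
    encoding="utf-8",
)

driver = webdriver.Chrome()
driver.get(page.resolve().as_uri())

cells = driver.find_elements(By.TAG_NAME, "td")
print(repr(cells[1].text))                          # '' -- hidden, so .text skips it
print(repr(cells[1].get_attribute("textContent")))  # '257,113'

driver.quit()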

Full code

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import csv

url = 'https://circlechart.kr/page_chart/album.circle?nationGbn=T&targetTime=2012&hitYear=2012&termGbn=year&yearTime=3'

# Initialize WebDriver

driver = webdriver.Chrome()
driver.get(url)

# Wait until the table is loaded

WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'ChartTablePC')))

# Locate the table

table = driver.find_element(By.CLASS_NAME, 'ChartTablePC')
rows = table.find_elements(By.TAG_NAME, 'tr')

# Print structure of each row to verify

for row in rows[:5]:  # Print the first 5 rows for inspection
    cells = row.find_elements(By.TAG_NAME, 'td')
    print([cell.text for cell in cells])

# Define headers

HEADERS = ['Rank', 'Album Artist', 'Sales']

# Open CSV file for writing

with open('EXOalbumSales2012.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(HEADERS)

    # Iterate over rows
    for row in rows[1:]:  # Skip the header row
        cells = row.find_elements(By.TAG_NAME, 'td')
        if len(cells) >= 5:  # Ensure there are enough columns
            rank = cells[0].text.strip()
            album_artist = cells[2].text.strip()
            sales = cells[3].get_attribute("textContent").strip()
            
            data_out = [rank, album_artist, sales]
            writer.writerow(data_out)
            print(data_out)

# Close the WebDriver

driver.quit()
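
As a sanity check before writing the CSV (and while the driver session is still open, i.e. before driver.quit()), you can print every cell of one data row alongside its index to confirm which column holds the sales figure. This diagnostic loop is a hypothetical addition that reuses the rows list from the code above; it is not part of the original answer:

# Hypothetical diagnostic: list each cell's index with its full DOM text.
for i, cell in enumerate(rows[1].find_elements(By.TAG_NAME, 'td')):
    print(i, repr(cell.get_attribute("textContent").strip()))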