Problem getting tags from a specific page using BeautifulSoup in Python

Question  Votes: 2  Answers: 1

I am trying to get the information for each post on this page, www.toctoc.com, with this code:

import requests
from bs4 import BeautifulSoup

page = requests.get('website_url')  # website URL was too long to include
soup = BeautifulSoup(page.content, 'html.parser')

name_box = soup.find_all('div', attrs={'class': 'item'})

Output: []

Does anyone know how to get all the markup inside each element with that class (that is, each post)?

Screenshot of website with inspection tool

python web-scraping
1 Answer
0
votes

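JavaScript has to run on the page before those elements exist. requests only downloads the initial HTML and does not execute scripts, which is why find_all returns an empty list. You can use Selenium to wait until the elements are present and then access the specific ones you need. Below I only show the top level for your class (.item):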

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.toctoc.com/search/index2/?dormitorios=0&banos=0&superficieDesde=0&superficieHasta=0&precioDesde=0&precioHasta=0&moneda=UF&tipoArriendo=true&tipoVentaUsado=false&tipoVentaNuevo=false&casaDepto=8&ordenarPorMoneda=UFCLP&ordenarDesc=false&ordernarPorFechaPublicacion=false&ordernarPorSuperficie=false&ordernarPorPrecio=false&pagina=1&esMobile=false&textoBusqueda=Regi%C3%B3n%20Metropolitana&textoOriginal=Regi%C3%B3n%20Metropolitana&tipoVista=lista&viewport=-71.715363%2C-34.29047%2C-69.769737%2C-32.922085&comuna=&region=Regi%C3%B3n%20Metropolitana%20de%20Santiago&atributos=&idle=true&zoom=7.053707424896949&buscando=true&vuelveBuscar=false&dibujaPoligono=true&resetMapa=true&animacion=false&idZonaHomogenea=0&esPrimeraBusqueda=false'
driver = webdriver.Chrome()
driver.get(url)
# Wait up to 10 seconds for the JavaScript-rendered .item elements to appear
items = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item")))
for item in items:
    print(item.text)
#driver.quit()
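
If you would rather keep your original BeautifulSoup parsing, one option is to let Selenium render the page and then hand driver.page_source to BeautifulSoup. The following is only a rough sketch: the helper names extract_items and fetch_rendered_html are made up for illustration, and sample_html is invented markup, not the site's real HTML.

```python
from bs4 import BeautifulSoup


def extract_items(html):
    """Return the text of every <div class="item"> found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any div whose class list contains "item"
    return [div.get_text(" ", strip=True) for div in soup.find_all("div", class_="item")]


def fetch_rendered_html(url, timeout=10):
    """Load url in Chrome, wait for .item elements, and return the rendered HTML."""
    # Imported here so extract_items() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


# Stand-in for rendered markup (hypothetical, for illustration only).
sample_html = (
    '<div class="item"><h3>Depto A</h3><span>UF 10</span></div>'
    '<div class="item">Casa B</div>'
)
print(extract_items(sample_html))
```

With a real browser available, the two helpers would be combined as extract_items(fetch_rendered_html(url)), using the same url as above. Wrapping the wait in try/finally is a design choice so the Chrome window does not stay open if the elements never show up.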