Is there a faster way to do this web scraping without using Selenium?


I have a piece of Python code that feeds a snippet of English text into a website (https://edu.visl.dk/visl/en/parsing/automatic/trees.php) and extracts the syntax tree from the result. So far I have been doing this with Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup as bs

# Run Chrome headless so no browser window opens.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
service = Service(executable_path=r"c:\Program Files (x86)\chromedriver.exe")
driver = webdriver.Chrome(service=service, options=options)

# Fill in the parsing form and submit it.
driver.get("https://edu.visl.dk/visl/en/parsing/automatic/trees.php")
form = driver.find_element(By.NAME, "theform")
dropdown = form.find_element(By.NAME, "visual")
drop = Select(dropdown)
drop.select_by_visible_text("Vertical")
search = form.find_element(By.NAME, "text")
search.send_keys("John killed the cat with a hammer.")
submit = form.find_element(By.TAG_NAME, "input")
submit.click()

# The parse tree is rendered inside a <pre> element.
results = driver.find_element(By.TAG_NAME, "pre")
soup = str(bs(results.get_attribute("innerHTML"), "html.parser"))

However, I need to do this over and over again (in a loop), and this is too slow for my purposes. Is there a faster way to do it?

python selenium-webdriver web-scraping
1 Answer

Use requests instead:

import requests
from bs4 import BeautifulSoup

# The fields submitted by the page's <form name="theform">.
payload = {
    'text': 'John killed the cat with a hammer.',
    'export': 'Export and Download',
    'parser': 'tree',
    'visual': 'vertical',
    'symbol': 'default',
}

url = 'https://edu.visl.dk/visl/en/parsing/automatic/trees.php'
response = requests.post(url, data=payload)

# The parse tree is returned inside a <pre> element.
soup = BeautifulSoup(response.text, 'html.parser')
result = soup.body.pre.get_text(strip=True)

print(result)
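
If you are calling this in a loop, reusing a single requests.Session can speed things up a little more, since it keeps the HTTP connection to the server open between requests instead of reconnecting every time. A minimal sketch, assuming the same payload fields as above; the sentences list is a hypothetical stand-in for your own input:

import requests
from bs4 import BeautifulSoup

url = 'https://edu.visl.dk/visl/en/parsing/automatic/trees.php'

# Hypothetical input; replace with the sentences you actually need to parse.
sentences = [
    'John killed the cat with a hammer.',
    'The cat sat on the mat.',
]

# A single Session reuses the underlying connection for every request in the loop.
with requests.Session() as session:
    for sentence in sentences:
        payload = {
            'text': sentence,
            'export': 'Export and Download',
            'parser': 'tree',
            'visual': 'vertical',
            'symbol': 'default',
        }
        response = session.post(url, data=payload)
        soup = BeautifulSoup(response.text, 'html.parser')
        print(soup.body.pre.get_text(strip=True))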