How do I fill out a JavaScript form with Python?

Question  Votes: 0  Answers: 1

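I want to fill out this form using Python.

I tried Mechanize, but this is a Microsoft form that is rendered with JavaScript: it has no form tag and no GET/POST URL in the page source. BeautifulSoup or Selenium might be able to do it, but I have no experience scraping JS forms. Can anyone help and suggest how to approach this?

Here is what I tried. Mechanize does not recognize any forms on the page: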

import mechanize

def main():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response  = br.open("https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u")
    for form in br.forms():
        print("Form name:", form.name) #prints nothing
        print(form) #prints nothing

if __name__ == '__main__':
    main()
selenium web-scraping beautifulsoup scrapy mechanize
1 Answer
1 vote

Selenium works well for this.

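You need to install a couple of components:

  • Install Selenium
    pip install selenium
  • Make sure you download the correct chromedriver (or other driver) for your browser and OS version, and add it to your PATH
Then run: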

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = "https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u"
driver.get(url)

# Text question: the input that follows the "NAME" title box
name = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='NAME']]/following-sibling::*//input")
name.send_keys("hello, World")

# Choice question: click the option whose value matches the selection
section_selection = "F"
section = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='Section']]/following-sibling::*//input[@value='" + section_selection + "']")
section.click()

# Date question: located by its placeholder text
date = driver.find_element(By.XPATH, "//input[contains(@placeholder, 'Please input date')]")
date.send_keys("01/12/2020")

submit = driver.find_element(By.XPATH, "//div[text()='Submit']")
submit.click()

The XPaths are a bit long, but they are based on the question text, so they should be reasonably stable.
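
Since every locator above follows the same pattern (find the title box by its question text, then take the input that follows it), the XPath strings could be generated rather than written by hand. The following is only a sketch: `xpath_literal` and `question_input_xpath` are hypothetical helper names, and the `question-title-box` class is taken from the answer's XPaths, so it may change if Microsoft updates the form markup.

```python
from typing import Optional


def xpath_literal(text: str) -> str:
    """Quote text as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote characters occur: stitch the pieces together with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def question_input_xpath(title: str, value: Optional[str] = None) -> str:
    """Build the XPath for the input belonging to the question titled `title`.

    If `value` is given, the XPath is narrowed to the input with that
    value attribute (used above for the choice question).
    """
    xpath = (
        "//div[@class='question-title-box']"
        f"[.//span[text()={xpath_literal(title)}]]"
        "/following-sibling::*//input"
    )
    if value is not None:
        xpath += f"[@value={xpath_literal(value)}]"
    return xpath


name_xpath = question_input_xpath("NAME")
section_xpath = question_input_xpath("Section", "F")
print(name_xpath)
print(section_xpath)
```

The resulting strings can be passed to `driver.find_element(By.XPATH, ...)` in place of the literal XPaths.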

[Screenshot: Selenium working]


Another approach: when you say there is no POST URL, have you checked devtools? It exposes the form's destination:

Request URL: https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses
Request Method: POST

It also exposes the payload. This is the first submission:

{startDate: "2020-08-17T10:40:18.504Z", submitDate: "2020-08-17T10:40:18.507Z",…}
answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]"
startDate: "2020-08-17T10:40:18.504Z"
submitDate: "2020-08-17T10:40:18.507Z"

The UUIDs/GUIDs in the POST URL and the question IDs appear to be fixed for this form. They did not change between my runs. This is the second run:

{startDate: "2020-08-17T10:43:48.544Z", submitDate: "2020-08-17T10:43:48.546Z",…}
answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]"
startDate: "2020-08-17T10:43:48.544Z"
submitDate: "2020-08-17T10:43:48.546Z"

Once you have captured this, you can probably do it through the API, without the GUI.
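
One way to check that the question IDs really are stable is to parse the `answers` strings from the two captured payloads and compare them. This is a small sketch using only the values shown above; `question_ids` is a hypothetical helper name.

```python
import json

# The "answers" field from each captured submission is a JSON-encoded string.
first_answers = (
    '[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},'
    '{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},'
    '{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]'
)
second_answers = (
    '[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},'
    '{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},'
    '{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]'
)


def question_ids(answers_json: str) -> list:
    """Return the question IDs, in order, from an encoded answers string."""
    return [entry["questionId"] for entry in json.loads(answers_json)]


ids_match = question_ids(first_answers) == question_ids(second_answers)
print("Question IDs identical across runs:", ids_match)
```

Two matching runs do not guarantee the IDs never change (for example, if the form owner edits the questions), so it is worth repeating the check before relying on them.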

...and just to be sure, I tried it and it succeeded...


import requests

url = "https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses"

# The "answers" value is itself a JSON-encoded string, as captured in devtools
myobj = {
    "startDate": "2020-08-17T10:48:40.118Z",
    "submitDate": "2020-08-17T10:48:40.121Z",
    "answers": "[{\"questionId\":\"r8f09d63e6f6f42feb2f8f4f8ed3f9389\",\"answer1\":\"Hello again, World\"},{\"questionId\":\"r28fe12073dfa47399f8ce95ae679dccf\",\"answer1\":\"F\"},{\"questionId\":\"r8f9e9fedcc2e410c80bfa1e0e3ef9750\",\"answer1\":\"2020-08-26\"}]",
}

x = requests.post(url, data=myobj)

My answers are simply hard-coded into the data object, but it seems to work.
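
Rather than hard-coding the escaped string, the payload could be built from plain Python values. The sketch below assumes the server only needs the three fields seen in the captured request and that timestamps in the same millisecond UTC format are accepted. `to_forms_timestamp` and `build_payload` are hypothetical helper names, and the question IDs are the ones captured above.

```python
import json
from datetime import datetime, timezone


def to_forms_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like the captured values,
    e.g. 2020-08-17T10:48:40.118Z (UTC, millisecond precision)."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_payload(answers: dict, started: datetime, submitted: datetime) -> dict:
    """Build the POST data; `answers` maps question ID to answer text."""
    answer_list = [
        {"questionId": question_id, "answer1": text}
        for question_id, text in answers.items()
    ]
    return {
        "startDate": to_forms_timestamp(started),
        "submitDate": to_forms_timestamp(submitted),
        # The captured request sends the answers as a compact JSON string.
        "answers": json.dumps(answer_list, separators=(",", ":")),
    }


payload = build_payload(
    {
        "r8f09d63e6f6f42feb2f8f4f8ed3f9389": "Hello again, World",
        "r28fe12073dfa47399f8ce95ae679dccf": "F",
        "r8f9e9fedcc2e410c80bfa1e0e3ef9750": "2020-08-26",
    },
    started=datetime(2020, 8, 17, 10, 48, 40, 118000, tzinfo=timezone.utc),
    submitted=datetime(2020, 8, 17, 10, 48, 40, 121000, tzinfo=timezone.utc),
)
print(payload)
```

The resulting dictionary can be passed as `data=payload` to `requests.post`, in place of the hard-coded `myobj`. In real use, `datetime.now(timezone.utc)` would replace the fixed timestamps.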

If you haven't already, remember to run:

    pip install requests


    
