Web scraping script is not working

Question

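I have been trying to build a web scraping script that monitors a website's HTML for any changes and, once it sees one, emails and texts me. The problem is that the script never sees any change; it just restarts after 60 seconds. There are no errors at all. I don't know if I missed something in the code that keeps it from searching, so that it simply moves on and restarts.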

Here is the code:

import time
print('>>> Time Imported')
time.sleep(1)
from bs4 import BeautifulSoup as soup
print('>>> BeautifulSoup Imported')
time.sleep(1)
import requests
print('>>> Requests Imported')
time.sleep(1)
import ssl
print('>>> SSL Imported')
time.sleep(1)
import smtplib
print('>>> smtplib Imported')
time.sleep(1)
from lxml import html
print('>>> LMXL and HTML Imported')
time.sleep(1)
from twilio.rest import Client
print('Twilio Imported')
time.sleep(1)
# End Imports

#start Script
while True:
    url = 'http://A****.com'
    print('>>> We have connected to ' +url)
    time.sleep(1)

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    print('>>> Headers Initiating')
    time.sleep(1)

    page_response = requests.get(url, timeout=5)
    print('>>> We got a response from ' +url)
    time.sleep(1)

    page_content = soup(page_response.content, "html.parser") # Takes 1 Min 48 Seconds to run
    print('>>> Content Imported')
    time.sleep(2)

    print('>>> To prove i have connected, here is ' +url+ ' headers')
    time.sleep(2)
    print(' ')
    print(page_content.title)
    #tree = html.fromstring(page_response.content)
    #price = tree.xpath('//span[@class="bid-price-val current-bid"]/text()')
    #print(price)
    time.sleep(2)
    print(' ')
    time.sleep(1)
    print('>>> Initiating WebMonitor, If a change is found. That will be the next line')
    time.sleep(7)

    if str(soup).find('["330000"]') == -1:
        time.sleep(60)                       #The script restarts here 
                                             #never sees the change
                                             #Even tho there was one
        continue
    else:
        print('>>> Theres been a change in '+url)
        from twilio.rest import TwilioRestClient
        accountSID = 'A*******'
        authToken = 'a********'
        twilioCli = TwilioRestClient(accountSID, authToken)
        myTwilioNumber = '1******'
        myCellPhone = '7*****'
        message = client.messages.create(
            body = "There has been a change at "+url,
            from_= "+14955551234",
            to = "7862199047",
            )

        print(message.sid)

        msg = 'Subject: This is the script talking, Check '+url
        fromaddr = 'r****'
        toaddrs = ['m****','2','3']

        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login("r****", 'r****')

        print('From: ' + fromaddr)
        print('To: ' + str(toaddrs))
        print('Message: ' + msg)
        server.sendmail(fromaddr, toaddrs, msg)
        server.quit()
        break
    #def monitor():
python python-3.x web-scraping
1 Answer

It looks like your problem is on this line:

 if str(soup).find('["330000"]') == -1:

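In your script, soup is the alias you gave the BeautifulSoup class at import time (from bs4 import BeautifulSoup as soup), so str(soup) converts the class itself to a string, not the page you downloaded. That just produces a string like "<class 'bs4.BeautifulSoup'>". Calling the string's find() method on it will never find '["330000"]', so the result is always -1 whether or not anything changed. Search the parsed page instead, for example str(page_content), or the raw page_response.text.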
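As a rough illustration of the point above, the check can be written against the downloaded text rather than the class. This is only a sketch: the helper names fetch_html, page_contains and check_once are hypothetical and not part of the original script, and it assumes the marker '["330000"]' appears literally in the page source.

```python
from typing import Callable


def fetch_html(url: str) -> str:
    """Download the page and return its HTML as text (needs `requests`)."""
    import requests  # imported lazily so the helpers below work without it

    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.text


def page_contains(html_text: str, marker: str) -> bool:
    """Return True if `marker` occurs anywhere in the downloaded HTML."""
    return marker in html_text


def check_once(url: str, marker: str,
               fetch: Callable[[str], str] = fetch_html) -> bool:
    """Fetch `url` once and report whether `marker` is present.

    `fetch` is injectable so the check can be tried without network access.
    """
    return page_contains(fetch(url), marker)


if __name__ == "__main__":
    # str() of a class object is only its repr, never any page contents;
    # this is the kind of string str(soup) produced in the question.
    class Placeholder:
        pass

    print(str(Placeholder))

    # Offline usage example with a fake fetcher (hypothetical page content).
    fake_page = '<span class="bid">["330000"]</span>'
    print(check_once("http://example.invalid", '["330000"]',
                     fetch=lambda _url: fake_page))
```

In the original loop, the smallest equivalent change would be to replace str(soup) with str(page_content) in the if condition; the helpers here only separate the download from the string search so that the search can be tested on its own.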
