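I need to extract data from a website and schedule a refresh every 15 days. After each update, an email about that update must be sent to specific people. Any suggestions on how I should approach this?
I am using Scrapy for the web scraping and storing the results in Azure Blob Storage. My plan is to put the scraping script in an Azure Function, schedule the refresh there, and also trigger the email from it.
Your approach is correct. You can use Scrapy to scrape the data, store it in a blob in an Azure Storage account, and send an email to specific people with the HTTP-triggered function code below: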
import os
import smtplib
import subprocess
import sys
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

import azure.functions as func
from azure.storage.blob import BlobServiceClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Scraping process. scrapy.cmdline.execute() finishes by calling sys.exit(),
    # which would stop the function before the upload and email steps, so the
    # spider is run in a child process instead.
    subprocess.run(
        [sys.executable, "-m", "scrapy", "crawl", "your_spider_name"],
        check=True,
    )

    # Saving data to Azure Blob Storage. The connection string is read from an
    # application setting instead of being hardcoded; the setting name used
    # here is an assumption. Never publish a real account key.
    connection_string = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    container_name = "data"
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)

    # Upload the scraped data to Azure Blob Storage. overwrite=True lets each
    # refresh replace the blob written by the previous run.
    with open("path/to/your/scraped/data", "rb") as data:
        container_client.upload_blob(name="scraped_data.csv", data=data, overwrite=True)

    # Sending email notification
    sender_email = "[email protected]"
    receiver_email = "[email protected]"
    smtp_server = "smtp.example.com"
    smtp_port = 587
    smtp_username = "your_smtp_username"
    smtp_password = "your_smtp_password"

    message = MIMEMultipart()
    message["From"] = sender_email
    message["To"] = receiver_email
    message["Subject"] = "Data Update Notification"
    body = "The data has been updated. You can access it from Azure Blob Storage."
    message.attach(MIMEText(body, "plain"))

    # Connect over STARTTLS, authenticate, and send the notification.
    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(smtp_username, smtp_password)
        server.sendmail(sender_email, receiver_email, message.as_string())

    return func.HttpResponse(
        "Scraping process completed. Check your email for updates.",
        status_code=200,
    )
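An HTTP-triggered function only runs when something calls it, so the 15-day refresh from the question still needs a schedule. A timer-triggered function is one way to get that, but as far as I know an NCRONTAB expression such as `0 0 0 */15 * *` fires on days 1, 16, and 31 of each month rather than at exact 15-day intervals. A possible workaround, sketched below, is to run the timer daily and only scrape when 15 days have passed since the last successful run. The names `is_refresh_due` and `REFRESH_INTERVAL` are hypothetical, and where the last-run timestamp is stored (for example in blob metadata) is left open as an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical constant: the refresh interval asked for in the question.
REFRESH_INTERVAL = timedelta(days=15)


def is_refresh_due(
    last_run: Optional[datetime],
    now: datetime,
    interval: timedelta = REFRESH_INTERVAL,
) -> bool:
    """Return True if no run is recorded yet or a full interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= interval


if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
    # 9 days after the last run: not due yet.
    print(is_refresh_due(last_run, datetime(2024, 1, 10, tzinfo=timezone.utc)))  # False
    # Exactly 15 days after the last run: due.
    print(is_refresh_due(last_run, datetime(2024, 1, 16, tzinfo=timezone.utc)))  # True
```

A daily timer function would call this check first and return early when it yields False, then record the new timestamp after the upload and email steps succeed.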
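Another option is to use Power Automate Desktop to run the Python web-scraping script, or to use the web data extraction action directly in a flow to extract the data from the web page and send it as an email.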