Recursively downloading XML pages via href links using Python [closed]

Question

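I have an XML page containing an href link that leads to the next page. The last XML page has no href element. I need to download all of these XML pages recursively, and I am looking for Python code that can help me do this quickly.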

Any help?

Answer 1

You can use the following code to collect, visit, or otherwise process the hrefs obtained from the successive pages:

import xml.etree.ElementTree as ET
import requests
from requests.auth import HTTPBasicAuth

def iterate_xml_automate(link):
    auth = HTTPBasicAuth('login', 'Password')  # replace with your credentials
    # Parse the parent page
    all_href = [link]
    tree = ET.fromstring(requests.get(link, auth=auth).content)  # parser object
    # Access the href attribute of every <link> element in the XML tree
    href = [el.attrib['href'] for el in tree.iter('link') if 'href' in el.attrib]
    all_href.extend(href)
    # Keep following the first href until a page contains no href element
    while href:
        tree_1 = ET.fromstring(requests.get(href[0], auth=auth).content)
        # Update href to access the next XML link
        href = [el.attrib['href'] for el in tree_1.iter('link') if 'href' in el.attrib]
        all_href.extend(href)

    # Return all the hrefs collected from the subsequent pages
    return all_href
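The question asks for every page to be downloaded (saved), and the title says "recursively", while the code above only collects the URLs with a loop. The following is a minimal recursive sketch under a few assumptions: pages link onward through `<link href="...">` elements as in the answer; the fetch function is passed in as a parameter so it can be swapped or tested; and the output directory, the `page_<n>.xml` file naming, and the helper names (`extract_hrefs`, `download_xml_chain`, `make_requests_fetch`) are all hypothetical. It also resolves relative hrefs with `urljoin`, matches `<link>` in any namespace via the `{*}` wildcard (Python 3.8+), and keeps a `visited` set so two pages that link to each other cannot cause endless recursion.

```python
import os
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Set
from urllib.parse import urljoin


def extract_hrefs(xml_bytes: bytes, base_url: str) -> List[str]:
    """Return absolute URLs from every <link href="..."> element.

    '{*}link' matches <link> in any namespace or in none (Python 3.8+).
    Relative hrefs are resolved against the URL of the page they came from.
    """
    root = ET.fromstring(xml_bytes)
    return [
        urljoin(base_url, el.attrib["href"])
        for el in root.iterfind(".//{*}link")
        if "href" in el.attrib
    ]


def download_xml_chain(
    url: str,
    fetch: Callable[[str], bytes],
    out_dir: str,
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Recursively download url and every page reachable through its hrefs.

    Each page is saved as page_<n>.xml in out_dir (hypothetical naming).
    Returns the URLs in the order they were downloaded.
    """
    if visited is None:
        visited = set()
    if url in visited:  # already downloaded: avoid loops between pages
        return []
    visited.add(url)

    data = fetch(url)
    os.makedirs(out_dir, exist_ok=True)
    # len(visited) only grows, so every page gets a distinct file name
    path = os.path.join(out_dir, f"page_{len(visited)}.xml")
    with open(path, "wb") as f:
        f.write(data)

    downloaded = [url]
    # Base case: the last page has no href, so this loop body never runs
    for next_url in extract_hrefs(data, url):
        downloaded.extend(download_xml_chain(next_url, fetch, out_dir, visited))
    return downloaded


def make_requests_fetch(user: str, password: str) -> Callable[[str], bytes]:
    """Build a fetch function that uses requests with HTTP Basic auth."""
    import requests  # third-party; imported here so the code above runs without it
    from requests.auth import HTTPBasicAuth

    auth = HTTPBasicAuth(user, password)

    def fetch(url: str) -> bytes:
        resp = requests.get(url, auth=auth, timeout=30)
        resp.raise_for_status()  # fail loudly on 401/404 instead of parsing an error page
        return resp.content  # raw bytes, so ElementTree honours the XML encoding declaration

    return fetch
```

A call would look like `download_xml_chain("https://example.com/start.xml", make_requests_fetch("login", "Password"), "xml_pages")`, where the URL and directory are placeholders. Each page adds one stack frame, so a chain longer than Python's default recursion limit (about 1000) would raise `RecursionError`; for very long chains the `while` loop in the answer above avoids that.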
