IMPORTXML formula: solving pagination when scraping a website

Question Votes: 0 Answers: 2

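The website I am trying to scrape is a marketplace whose results are spread over many pages. The IMPORTXML formula works when I point it at one specific page (for example https://cars.av.by/filter?brands[0][brand]=6&brands[0][model]=1992&year[min]=2010&year[max]=2018&page=3).

Is there a way to make the IMPORTXML formula pull data from all pages?

A similar request is here: List with multiple auto-generated importxml queries

Example of my formula for page 1: =FLATTEN(IMPORTXML("https://cars.av.by/filter?brands[0][brand]=6&brands[0][model]=1992&year[min]=2010&year[max]=2018&page=1", "//main//h3/a"))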

xml google-sheets pagination google-sheets-formula
2 Answers
0
votes

Try:

=INDEX(FLATTEN(SPLIT(QUERY(IFNA(BYROW(A1&SEQUENCE(20), 
 LAMBDA(x, "×"&TEXTJOIN("×", 1, IMPORTXML(x, "//main//h3/a"))))),,9^9), "×")))
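
For readers who want to see the logic outside of Sheets, here is a rough Python sketch of what the formula appears to do. It assumes that A1 holds the filter URL ending in `&page=` (the answer does not say so explicitly), so that `A1&SEQUENCE(20)` produces the URLs for pages 1 to 20. The names `page_urls`, `collect_titles` and `fetch_titles` are hypothetical; `fetch_titles` stands in for `IMPORTXML(x, "//main//h3/a")` and is passed in by the caller rather than implemented here.

```python
from typing import Callable, Iterable, List


def page_urls(base: str, pages: int) -> List[str]:
    """Rough equivalent of A1&SEQUENCE(pages): append 1..pages to the base URL."""
    return [f"{base}{n}" for n in range(1, pages + 1)]


def collect_titles(
    base: str,
    pages: int,
    fetch_titles: Callable[[str], Iterable[str]],
) -> List[str]:
    """Call fetch_titles once per page and flatten the results into one list.

    Pages whose fetch raises are skipped, loosely similar to the IFNA
    wrapper in the formula (which only hides #N/A results).
    """
    titles: List[str] = []
    for url in page_urls(base, pages):
        try:
            titles.extend(fetch_titles(url))
        except Exception:
            continue  # skip pages that return nothing usable
    return titles
```

With a fetcher of your own, `collect_titles(base, 20, fetch_titles)` would return one flat list of listing titles, which is the role FLATTEN and SPLIT play in the formula.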



0
votes

Use:

=INDEX(QUERY(REGEXREPLACE(HLOOKUP("×", {"×"; QUERY(FLATTEN(IFERROR(SPLIT(FLATTEN({
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=1", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=2", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=3", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=4", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""}); 
 IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=5", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""})}), ", ", ))), 
 "where Col1 is not null and not Col1 ends with 'VIN' and not Col1 contains 'ТОП'", )}, 
 SEQUENCE(5*ROWS(B:B)/11, 11, 2), ), " г.| \$$|^≈ ", ), 
 "select Col1,Col3,Col8,Col5,Col6,Col7,Col10 where Col1 is not null and not Col1 = '#REF!'", ))


Demo sheet
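
The least obvious part of the formula is the REGEXREPLACE pattern `" г.| \$$|^≈ "`. As a small sketch, the same pattern can be tried in Python, whose `re` module treats this particular expression the same way. The sample strings below are made up for illustration and are not taken from the site; the helper name `clean` is hypothetical.

```python
import re

# Same pattern as in the formula: strip " г" plus one following character,
# a trailing " $", or a leading "≈ ".
CLEANUP = re.compile(r" г.| \$$|^≈ ")


def clean(cell: str) -> str:
    """Remove the year suffix and price decorations from one cell value."""
    return CLEANUP.sub("", cell)


# Hypothetical sample values, assumed to resemble the scraped cells.
print(clean("2015 г."))      # 2015
print(clean("≈ 12 345 $"))   # 12 345
```

Note that the dot after `г` is unescaped, so it matches any single character, not only a literal period.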
