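The site I am trying to scrape is a marketplace with many pages of results. The IMPORTXML formula works well when it targets one specific page (for example https://cars.av.by/filter?brands[0][brand]=6&brands[0][model]=1992&year[min]=2010&year[max]=2018&page=3).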
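Is there a way to make the IMPORTXML formula fetch data from all of the pages?
The same request was made here: List with multiple automatically generated importxml queries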
An example of my formula for page 1: =FLATTEN(IMPORTXML("https://cars.av.by/filter?brands[0][brand]=6&brands[0][model]=1992&year[min]=2010&year[max]=2018&page=1", "//main//h3/a"))
Use:
=INDEX(QUERY(REGEXREPLACE(HLOOKUP("×", {"×"; QUERY(FLATTEN(IFERROR(SPLIT(FLATTEN({
IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=1", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""});
IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=2", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""});
IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=3", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""});
IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=4", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""});
IFERROR(ARRAY_CONSTRAIN(IMPORTXML(B1&"&page=5", "//div[@class='listing__items']/div/div/div"), 9^9, 3), {"","",""})}), ", ", ))),
"where Col1 is not null and not Col1 ends with 'VIN' and not Col1 contains 'ТОП'", )},
SEQUENCE(5*ROWS(B:B)/11, 11, 2), ), " г.| \$$|^≈ ", ),
"select Col1,Col3,Col8,Col5,Col6,Col7,Col10 where Col1 is not null and not Col1 = '#REF!'", ))