Unable to find nodes in the document using querySelector


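The elusive NodeList


I have been stuck on this for 3 weeks. I have made literally hundreds of attempts to get it working, and I simply cannot figure out what is wrong. I have exhausted every solution I could think of.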

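That being said...

Quick summary of the application

I recently took on a side job building an application that notifies a client whenever a new Jeep Wrangler is posted to Facebook Marketplace. It sounded simple, so I accepted the job and estimated it would take three days at most. That was last month. Below is a flowchart of how the application works: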

Facebook Marketplace Scraper Flowchart

Technology

After a lot of research, I decided to build the application with Node.js and Puppeteer.

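Step 1: Scraping

First I had to get familiar with the page that holds the information I need. This is the search results page:

https://www.facebook.com/marketplace/tampa/search/?query=jeep%20wrangler&sortBy=creation_time_descend&exact=false

As you will see, the results are laid out in a grid, so I need to grab the container that holds the list of items...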

Facebook Search Results Container

<!-- This is the DIV that is the container nodelist --> <div class="bq4bzpyk j83agx80 btwxx1t3 lhclo0ds jifvfom9 muag1w35 dlv3wnog enqfppq2 rl04r1d5">

I found this by inspecting the page in the browser console:

Inspecting

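I used the same method to find the elements that contain the title, price, and image. That information lives in child nodes of the container element above.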
Sample Listing

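Step 2: The code

After a lot of trial and error, I got a minimal working version that scrapes results for the search query "iPhone X". It works great! However, when I modify that code by simply changing the search keyword to "Jeep Wrangler", it fails. Here is a gist of the working code: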

https://gist.github.com/johnsdeveloper/1a7d02554dbfd682ee274b2ef0696f00

My code

Starting from that working code, I came up with the source code below. It does not work. I get this error every time:

(node:4928) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'length' of undefined
    at removeDupes (C:\wamp64\www\puppeteer\index.js:56:32)
    at initScraper (C:\wamp64\www\puppeteer\index.js:86:27)
(node:4928) UnhandledPromiseRejectionWarning: Unhandled promise rejection.

It looks like the new listing results come back empty, which means the scrape itself is not working.
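One plausible reading of the stack trace, offered as a sketch rather than a confirmed diagnosis: the `.catch(() => console.log("Selector error."))` handler in `getItems` (in the full code below) returns nothing, so if the selector wait fails, the chain resolves to `undefined` and `removeDupes` then reads `.length` of `undefined`. The helper name `withFallback` is hypothetical; it only illustrates returning an empty array from the failure path.

```javascript
// Hypothetical helper: run an async scrape step and fall back to an empty
// array if it rejects, so callers can always read `.length` on the result.
async function withFallback(scrapeStep) {
  try {
    const result = await scrapeStep();
    // A step that resolves to undefined/null is treated as "no items".
    return Array.isArray(result) ? result : [];
  } catch (err) {
    console.error('Scrape step failed:', err.message);
    return [];
  }
}

// The pattern from the question: a catch handler that only logs makes the
// whole promise chain resolve to undefined.
const swallowed = Promise.reject(new Error('timeout'))
  .catch(() => console.log('Selector error.'));

swallowed.then(value => console.log(typeof value)); // logs "undefined"
```

Logging `err.message` instead of a fixed string also shows whether the failure is a timeout on the selector or something else.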

Here is my full code:

const puppeteer = require('puppeteer');
const jsonfile = require("jsonfile");

var fileName = "./public/data/newjeeps.json";
var changed = false;
var browser;

const getItems = async searchTerm => {
    //{headless: false, defaultViewport: null} --> put this in launch() method below as parameter for development purposes --> opens up browser window
    //const browser = await puppeteer.launch({headless: false, defaultViewport: null});
    browser = await puppeteer.launch({
        headless: true,
        args: ["--no-sandbox"]
    });
    const page = await browser.newPage();
    await page.goto(`https://www.facebook.com/marketplace/tampa/search/?query=jeep%20wrangler&sortBy=creation_time_descend&exact=false`);
    await autoScroll(page);

    const itemList = await page.waitForSelector('#mount_0_0 > div > div > div.rq0escxv.l9j0dhe7.du4w35lb > div > div > div.j83agx80.cbu4d94t.d6urw2fd.dp1hu0rb.l9j0dhe7.du4w35lb > div > div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.cbu4d94t.d2edcug0.rj1gh0hx.buofh1pr.g5gj957u.hpfvmrgz.dp1hu0rb > div')
        .then(() => page.evaluate(() => {
            const itemArray = [];
            const itemNodeList = document.querySelectorAll('#mount_0_0 > div > div > div.rq0escxv.l9j0dhe7.du4w35lb > div > div > div.j83agx80.cbu4d94t.d6urw2fd.dp1hu0rb.l9j0dhe7.du4w35lb > div > div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.cbu4d94t.d2edcug0.rj1gh0hx.buofh1pr.g5gj957u.hpfvmrgz.dp1hu0rb > div > div > div > div > div > div > span > div > a');
            itemNodeList.forEach(item => {
                console.log(item);
            });
            return itemArray;
        }))
        .catch(() => console.log("Selector error."));

    return itemList;
}

/**
 * Remove any duplicates from JSON files
 *
 * @param {*} existingResults
 * @param {*} newResults
 * @returns array Returns new array of unique listings
 */
const removeDupes = async function (existingResults, newResults) {
    var existingTitle;
    var newTitle;
    var newResults;

    /* Loop through EXISTING (marketplacebot.json) */
    for (var i = 0; i < existingResults.length; i++) {
        /* Existing Title & Price TODO*/
        existingTitle = existingResults[i].itemTitle;

        /* Loop through NEW data */
        for (y = 0; y < newResults.length; y++) {
            /* New Title */
            newTitle = newResults[y].itemTitle;

            /* Do we have a match? */
            if (existingTitle == newTitle) {
                console.log("match");
                // Remove from new results
                newResults.splice(y, 1);
                // Change detected?
                changed = true;
            }
        }
    }
    return newResults;
}

const initScraper = async() => {
    // Get currently listed items on Marketplace
    const items = await getItems('Jeep Wrangler');
    //items.sort(function(a, b){return a.itemPrice - b.itemPrice});

    // Get existing JSON from file
    const existing = await jsonfile.readFile(fileName);

    // Compare the two, save only the differences
    const results = await removeDupes(existing,items);
    //console.log(results);

    // Now save the differences back to the JSON files, the
    // web page will pick up and display.
    var success = await jsonfile.writeFile(fileName, results);

    // If there were any differences, notify me
    if(changed == true){
        const page2 = await browser.newPage();
        await page2.goto('http://sendmail.com/mail.php');
    }
}

initScraper();

// This takes care of the auto scrolling problem
async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise(resolve => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeight || scrollHeight>7000) {
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}
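The title comparison that `removeDupes` performs can also be written without calling `splice` inside the loop, which can skip the element that shifts into the removed index. This is a minimal sketch, assuming each listing is an object with an `itemTitle` string as in the code above; `filterNewListings` is a hypothetical name and not part of the original script.

```javascript
// Hypothetical alternative to removeDupes: returns the listings in
// `newResults` whose title does not already appear in `existingResults`.
// Missing inputs are treated as empty arrays, and neither input is mutated.
function filterNewListings(existingResults, newResults) {
  const existing = Array.isArray(existingResults) ? existingResults : [];
  const incoming = Array.isArray(newResults) ? newResults : [];
  const knownTitles = new Set(existing.map(item => item.itemTitle));
  return incoming.filter(item => !knownTitles.has(item.itemTitle));
}

// Usage with stand-in data:
const existingListings = [{ itemTitle: '2015 Jeep Wrangler' }];
const scrapedListings = [
  { itemTitle: '2015 Jeep Wrangler' },
  { itemTitle: '2018 Jeep Wrangler Sport' },
];
console.log(filterNewListings(existingListings, scrapedListings));
// [ { itemTitle: '2018 Jeep Wrangler Sport' } ]
```

Because the function tolerates `undefined` input, a failed scrape yields an empty result instead of a TypeError.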

javascript node.js dom puppeteer
1 answer
0 votes
Those classes are dynamic. You will want to select by some other attribute on the page, or by the displayed text.
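As a hedged sketch of that advice: instead of the generated class names, match on something less likely to change, such as part of the link's `href`. The `/marketplace/item/` path fragment and the selector `a[href*="/marketplace/item/"]` are assumptions about Facebook's markup that the question does not confirm, and `extractListings` is a hypothetical helper. It accepts any iterable of anchor-like objects so it can be tried without a browser.

```javascript
// Assumption: listing links contain this path fragment. Verify in DevTools.
const LISTING_SELECTOR = 'a[href*="/marketplace/item/"]';

// Hypothetical helper: turn anchor-like objects ({ href, textContent }) into
// plain serializable records, skipping anything that is not a listing link.
function extractListings(anchors) {
  const listings = [];
  for (const a of anchors) {
    const href = a.href || '';
    if (!href.includes('/marketplace/item/')) continue;
    listings.push({
      itemUrl: href,
      itemText: (a.textContent || '').trim(),
    });
  }
  return listings;
}

// Browser-free usage with stand-in objects:
console.log(extractListings([
  { href: 'https://www.facebook.com/marketplace/item/123/', textContent: ' $9,500 Jeep Wrangler ' },
  { href: 'https://www.facebook.com/help', textContent: 'Help' },
]));
```

In Puppeteer the same idea could be applied with `page.$$eval(LISTING_SELECTOR, anchors => anchors.map(a => ({ itemUrl: a.href, itemText: a.textContent.trim() })))`. Functions defined in the Node script are not visible inside the page context, so the mapping logic has to live in the callback passed to Puppeteer.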