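I've been stuck on this for 3 weeks. I've literally made hundreds of attempts to get it working. I simply have no idea what's wrong. I've exhausted every solution I can think of.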
That said...
Quick Summary of the App
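I recently took on a side job building an app that notifies a client whenever a new Jeep Wrangler is posted on Facebook Marketplace. It sounded simple, so I accepted, estimating three days at most. That was last month. Here is a flowchart of how the app works: [flowchart image]
Technology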
After a lot of research, I decided to build the app with Node.js and Puppeteer.
Step 1: Scraping
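First I had to get familiar with the page that holds the information I need: the search results page. The results are laid out in a grid, so I need to grab the container that holds the list of items: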
<!-- This is the DIV that is the container nodelist -->
<div class="bq4bzpyk j83agx80 btwxx1t3 lhclo0ds jifvfom9 muag1w35 dlv3wnog enqfppq2 rl04r1d5">
I found it by inspecting the page in the browser's developer console.
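I used the same method to find the elements that hold the title, price and image. That information lives in child nodes of the container element above.
Step 2: The Code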
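If the title, price and image from step 1 are read with textContent, they arrive as plain strings (a price string such as "$24,500", for example). The commented-out sort on itemPrice in the code further down only works when prices are numbers. Here is a minimal sketch of a normalizing step, assuming the raw fields are called title, price and image; parsePrice and normalizeListing are hypothetical helpers, not part of the original code.

```javascript
// Hypothetical helper: converts a scraped price string into a number.
// Keeps only digits and the decimal point, e.g. "$24,500" -> 24500.
function parsePrice(text) {
  const digits = String(text).replace(/[^0-9.]/g, "");
  if (digits === "") return null; // non-numeric text or an empty node
  const value = Number(digits);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: maps raw scraped strings (assumed field names
// title/price/image) to the itemTitle/itemPrice shape used elsewhere.
function normalizeListing(raw) {
  return {
    itemTitle: (raw.title || "").trim(),
    itemPrice: parsePrice(raw.price),
    itemImage: raw.image || null,
  };
}
```

With numeric prices, `items.sort((a, b) => a.itemPrice - b.itemPrice)` sorts as intended; subtracting two strings like "$24,500" yields NaN instead.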
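After a lot of trial and error, I got a minimal working version that scrapes the results for the search query "iPhone X". It worked great! But when I modified the code by simply changing the search keyword to "Jeep Wrangler", it failed. Here's a gist of the working code: https://gist.github.com/johnsdeveloper/1a7d02554dbfd682ee274b2ef0696f00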
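Since the only intended difference between the working and failing versions is the search keyword, one way to rule out a mistake in a hand-edited URL is to build it from the term. A minimal sketch; buildSearchUrl is a hypothetical helper, and the tampa city and query parameters are copied from the page.goto() URL in the code below.

```javascript
// Hypothetical helper: builds the Marketplace search URL from a search term
// instead of hard-coding "jeep%20wrangler" in page.goto().
function buildSearchUrl(searchTerm, city = "tampa") {
  const base = `https://www.facebook.com/marketplace/${encodeURIComponent(city)}/search/`;
  // encodeURIComponent turns spaces into %20, matching the question's URL.
  const query = [
    `query=${encodeURIComponent(searchTerm)}`,
    "sortBy=creation_time_descend",
    "exact=false",
  ].join("&");
  return `${base}?${query}`;
}
```

getItems already receives a searchTerm argument that it never uses; with this helper it could call `await page.goto(buildSearchUrl(searchTerm));`.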
My Code
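Starting from that working code, I came up with the source below. It doesn't work. I get this error every time:
(node:4928) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'length' of undefined
    at removeDupes (C:\wamp64\www\puppeteer\index.js:56:32)
    at initScraper (C:\wamp64\www\puppeteer\index.js:86:27)
(node:4928) UnhandledPromiseRejectionWarning: Unhandled promise rejection.
It seems the new listing results come back empty, which means the scraping isn't working.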
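The TypeError is raised inside removeDupes, which suggests one of the arrays it receives is undefined. In the code below, getItems ends with `.catch(() => console.log("Selector error."))`: if page.waitForSelector rejects (for example, because the long generated selector never matches on the Jeep results page), the handler's return value, undefined, becomes the value of items. The sketch below reproduces this without Puppeteer; fakeWaitForSelector, scrapeWithLoggingOnly and scrapeWithFallback are hypothetical stand-ins, not part of the original code.

```javascript
// Stand-in for page.waitForSelector(): rejects when the selector never appears.
function fakeWaitForSelector(selectorFound) {
  return selectorFound
    ? Promise.resolve()
    : Promise.reject(new Error("waiting for selector failed"));
}

// Mirrors the question's chain: the .catch() handler only logs,
// so on failure the chain resolves to console.log's return value, undefined.
function scrapeWithLoggingOnly(selectorFound) {
  return fakeWaitForSelector(selectorFound)
    .then(() => ["listing"])
    .catch(() => console.log("Selector error."));
}

// Same chain, but the handler returns an empty array, so a caller such as
// removeDupes() always receives something with a .length.
function scrapeWithFallback(selectorFound) {
  return fakeWaitForSelector(selectorFound)
    .then(() => ["listing"])
    .catch((err) => {
      console.log("Selector error:", err.message);
      return [];
    });
}
```

If this is what happens, "Selector error." should appear in the terminal just before the warning. Separately, itemArray in getItems is never pushed to, so even when the selector matches, the function returns an empty array.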
Here is my full code:
const puppeteer = require('puppeteer');
const jsonfile = require("jsonfile");
var fileName = "./public/data/newjeeps.json";
var changed = false;
var browser;
const getItems = async searchTerm => {
    //{headless: false, defaultViewport: null} --> put this in launch() method below as parameter for development purposes --> opens up browser window
    //const browser = await puppeteer.launch({headless: false, defaultViewport: null});
    browser = await puppeteer.launch({
        headless: true,
        args: ["--no-sandbox"]
    });
    const page = await browser.newPage();
    await page.goto(`https://www.facebook.com/marketplace/tampa/search/?query=jeep%20wrangler&sortBy=creation_time_descend&exact=false`);
    await autoScroll(page);
    const itemList = await page.waitForSelector('#mount_0_0 > div > div > div.rq0escxv.l9j0dhe7.du4w35lb > div > div > div.j83agx80.cbu4d94t.d6urw2fd.dp1hu0rb.l9j0dhe7.du4w35lb > div > div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.cbu4d94t.d2edcug0.rj1gh0hx.buofh1pr.g5gj957u.hpfvmrgz.dp1hu0rb > div')
        .then(() => page.evaluate(() => {
            const itemArray = [];
            const itemNodeList = document.querySelectorAll('#mount_0_0 > div > div > div.rq0escxv.l9j0dhe7.du4w35lb > div > div > div.j83agx80.cbu4d94t.d6urw2fd.dp1hu0rb.l9j0dhe7.du4w35lb > div > div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.cbu4d94t.d2edcug0.rj1gh0hx.buofh1pr.g5gj957u.hpfvmrgz.dp1hu0rb > div > div > div > div > div > div > span > div > a');
            itemNodeList.forEach(item => {
                // Runs inside the browser: this logs to the page's console, not the Node terminal
                console.log(item);
            });
            return itemArray;
        }))
        .catch(() => console.log("Selector error."));
    return itemList;
}
/**
 * Remove any duplicates from JSON files
 *
 * @param {*} existingResults
 * @param {*} newResults
 * @returns array Returns new array of unique listings
 */
const removeDupes = async function (existingResults, newResults) {
    var existingTitle;
    var newTitle;
    /* Loop through EXISTING (marketplacebot.json) */
    for (var i = 0; i < existingResults.length; i++) {
        /* Existing Title & Price TODO*/
        existingTitle = existingResults[i].itemTitle;
        /* Loop through NEW data */
        for (let y = 0; y < newResults.length; y++) {
            /* New Title */
            newTitle = newResults[y].itemTitle;
            /* Do we have a match? */
            if (existingTitle == newTitle) {
                console.log("match");
                // Remove from new results, then step back so the element
                // that slid into index y is not skipped
                newResults.splice(y, 1);
                y--;
                // Change detected?
                changed = true;
            }
        }
    }
    return newResults;
}
const initScraper = async () => {
    // Get currently listed items on Marketplace
    const items = await getItems('Jeep Wrangler');
    //items.sort(function(a, b){return a.itemPrice - b.itemPrice});
    // Get existing JSON from file
    const existing = await jsonfile.readFile(fileName);
    // Compare the two, save only the differences
    const results = await removeDupes(existing, items);
    //console.log(results);
    // Now save the differences back to the JSON files, the
    // web page will pick up and display.
    await jsonfile.writeFile(fileName, results);
    // If there are any new listings, notify me
    // (removeDupes sets `changed` on a match, so check the results instead)
    if (results && results.length > 0) {
        const page2 = await browser.newPage();
        await page2.goto('http://sendmail.com/mail.php');
    }
    // Close Chromium so the Node process can exit
    await browser.close();
}
initScraper();
// This takes care of the auto scrolling problem
async function autoScroll(page) {
    await page.evaluate(async () => {
        await new Promise(resolve => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                if (totalHeight >= scrollHeight || scrollHeight > 7000) {
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}
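Removing items with splice() inside a forward loop needs an index adjustment (otherwise the element that slides into the freed slot is skipped), and removeDupes also assumes both arguments are arrays. A hedged alternative sketch: findNewListings is a hypothetical name. It keeps the same itemTitle comparison, but builds a Set of known titles and filters the scraped list without mutating it.

```javascript
// Hypothetical non-mutating alternative to removeDupes, keyed on itemTitle.
function findNewListings(existingResults, newResults) {
  // Treat a missing (undefined) list as empty instead of throwing.
  const existing = Array.isArray(existingResults) ? existingResults : [];
  const scraped = Array.isArray(newResults) ? newResults : [];
  // Titles already saved in the JSON file.
  const seenTitles = new Set(existing.map((item) => item.itemTitle));
  // Keep only listings whose title has not been seen before.
  return scraped.filter((item) => !seenTitles.has(item.itemTitle));
}
```

In initScraper this could read `const results = findNewListings(existing, items);`, followed by a notification when `results.length > 0`. That avoids relying on the changed flag, which removeDupes sets when a scraped listing matches an existing one.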