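This is my current code; it is quite simple. It reads the file once, and each line is written to a new file named after the original with _part and a number appended, where the number is incremented every 50,000 lines. Once reading has finished, each file name is passed to a function that processes the file. For some reason, though, it only grabs the end of each line and prints that 10,000 times (the number of lines in the original file). It worked at first; I changed a few things and it started doing this, and it keeps doing it even after I reverted those changes.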
const fs = require('fs');
const csv = require('csv-parser');
//File containing unprocessed addresses
let fileName = ("Refinitiv_Address_GBR_10000.csv");
//Country we are looking at address of
let country = "UK";
let fileRead;
let fileWrite;
let fileNum = 1;
DivideFile();
async function DivideFile() {
    let lineNum = 0;
    fileWrite = fs.createWriteStream(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`);
    fileRead = fs.createReadStream(`./Originals/${fileName}`)
        .pipe(csv())
        //Indicate start of reading
        .on('resume', () => {
            console.log("Processing file");
        })
        .on('data', (data) => {
            lineNum++;
            console.log(Object.values(data).toString());
            fs.appendFile(`./Originals/${fileName.split('.')[0]}_part${fileNum}.${fileName.split('.')[1]}`, Object.values(data).toString() + '\n', () => {
                //Nothing to go here at the moment
            });
            if (lineNum == 50000) {
                fileNum++;
                lineNum = 0;
            }
        })
        .on('end', () => {
            for (var file in fileNum) {
                RunFunc(`${fileName.split('.')[0]}_part${file}.${fileName.split('.')[1]}`);
            }
        });
}
Here is a sample of the raw data. All of the information comes from public sources and is not private.
,,,,GBR,
"Todd Campus, West of Scotland Science Park,Maryhill Road",GLASGOW,UNITED KINGDOM-NA,G20 0UA,GBR,GBR
,,,,GBR,GBR
,,,,GBR,
"Horsfield Way,, Bredbury Industrial Park",STOCKPORT,CHESHIRE,SK6 2SU,GBR,GBR
"Brunel Way, The Nucleus",Dartford,KENT,DA1 5GA,GBR,
,,,,GBR,
,,,,GBR,
5 New Street Square,London,London,EC4A 3TW,GBR,
"Pentwyn Farm, Huntingdon",,,HR5 3PQ,GBR,GBR
124 Horseferry Road,LONDON,UNITED KINGDOM-NA,SW1P 2TX,GBR,GBR
,,,,GBR,
Unit 700 Fareham Reach Fareham Road,,,,GBR,GBR
"Eastwood House, Glebe Road",CHELMSFORD,ESSEX,CM1 1RS,GBR,GBR
Fineshade Abbey,CORBY,NORTHAMPTONSHIRE,NN17 3BA,GBR,GBR
,,,,,GBR
,,,,GBR,
3 Hempstead Close,,ESSEX,IG9 5JQ,GBR,GBR
,,,,GBR,
,,,,,GBR
,,,,GBR,
,,,,GBR,
25 Farringdon Street,LONDON,UNITED KINGDOM-NA,EC4A 4AB,GBR,GBR
100 Wigmore St,London,X0,,GBR,GBR
,,,,GBR,
These are the first 25 lines written to _part1:
GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
,GBR
GBR,GBR
GBR,GBR
GBR,
,GBR
,GBR
GBR,GBR
GBR,
,GBR
GBR,GBR
GBR,GBR
,GBR
,GBR
GBR,GBR
I even stripped the code down so that it only prints each line, and it still does this.
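One plausible explanation for the two-column output, assuming csv-parser's default behaviour of treating the first line as the header row: the first line of the sample is `,,,,GBR,`, so five of the six header names are the empty string, and a row object keyed by header name can only hold two distinct keys. The sketch below does not call csv-parser; `rowToObject` is a hypothetical helper that only mimics keying each cell by its header name.

```javascript
// Hypothetical helper: build a row object keyed by header name,
// the way a header-based CSV parser would.
function rowToObject(headers, cells) {
  const row = {};
  cells.forEach((cell, i) => {
    // A repeated header name overwrites the value stored earlier.
    row[headers[i]] = cell;
  });
  return row;
}

// First line of the sample data, interpreted as the header row.
const headers = ['', '', '', '', 'GBR', ''];
// One of the complete rows from the sample data.
const cells = ['124 Horseferry Road', 'LONDON', 'UNITED KINGDOM-NA', 'SW1P 2TX', 'GBR', 'GBR'];

// Only the keys '' and 'GBR' survive, so only two values are left.
const collapsed = Object.values(rowToObject(headers, cells)).toString();
console.log(collapsed); // prints "GBR,GBR"
```

If this is indeed the cause, passing `headers: false` to `csv()` (an option listed in the csv-parser README) should key the cells by column index instead and keep all six columns.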
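This isn't the ideal approach, but it is basically your code, splitting the file into nice small chunks. You should use the csv-parse library instead of csv-parser, and update the file reference on each loop iteration. As others have mentioned, the Unix split utility would also be a good choice. I used file.csv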
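To illustrate the idea of updating the file reference as the line count grows, here is a minimal sketch that uses only Node's built-in modules instead of csv-parse. It splits on physical lines, much like the Unix `split` utility, so it assumes that no quoted field contains a line break. The names `splitFile` and `closeStream` are hypothetical, and backpressure on the write stream is ignored for brevity.

```javascript
const fs = require('fs');
const path = require('path');
const readline = require('readline');

// Hypothetical helper: end a write stream and resolve once it has flushed.
function closeStream(stream) {
  return new Promise((resolve) => stream.end(resolve));
}

// Hypothetical helper: copy inputPath into numbered "_partN" files of at
// most linesPerPart lines each, and resolve with the list of part paths.
async function splitFile(inputPath, linesPerPart) {
  const { dir, name, ext } = path.parse(inputPath);
  const parts = [];
  let out = null;
  let lineNum = 0;

  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  for await (const line of rl) {
    // Open a fresh output file at the start and after every linesPerPart lines.
    if (lineNum % linesPerPart === 0) {
      if (out) await closeStream(out);
      const partPath = path.join(dir, `${name}_part${parts.length + 1}${ext}`);
      parts.push(partPath);
      out = fs.createWriteStream(partPath);
    }
    out.write(line + '\n');
    lineNum++;
  }

  if (out) await closeStream(out);
  return parts;
}
```

A call such as `splitFile('./Originals/Refinitiv_Address_GBR_10000.csv', 50000)` would resolve with the part file names only after every part has been written, so each name can then be handed to the processing function in a plain `for...of` loop.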