Below is the content of a CSV file generated in HDFS (after running a Spark job) and then moved to a remote server.
H1|EDHSADB2|2022-08-11 11:10:23|||
H2|RMT_ACC_NM|RMT_ACC_SRT_CD|PDD_TYP_CD|PDD_IGT_VL|REC_LOD_TS
DL|08510968|771103|FPFQP |{"insights":[{"ACC_NM":"00000000","BNC_CD":"772900","BNM":"LAND SERVICES","TRN_REF_TX":"5404633777075358","PMT_AMT":33.0,"LST_PMT_DT":"2022-08-05","AGG_PMT_AMT":67.18,"PMT_CNT":3},
{"ACC_NM":"00213765","BNC_CD":"802045","BNM":"AQUA MASTERCARD","TRN_REF_TX":"5404633753077758","LST_PMT_AMT":56.78,"LST_PMT_DT":"2022-08-06","AGG_PMT_AMT":272.16,"PMT_CNT":6},
{"ACC_NM":"00213765","BNC_CD":"802045","BNM":"AFIX CARD SERVICES","TRN_REF_TX":"5434298992481650","LST_PMT_AMT":56.78,"LST_PMT_DT":"2022-08-06","AGG_PMT_AMT":272.16,"PMT_CNT":6}]}|
T1|0000000003||||
How can the tail value 0000000003 above be extracted and validated against the expected data using Java?
This method reads the text file in reverse and prints only the required field.
It counts the field delimiters (|) and captures the bytes between the fourth and fifth delimiters, counted from the end of the file.
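Applied to the trailer line T1|0000000003||||, the scan skips the four trailing delimiters, collects the digits, and stops at the fifth delimiter's field boundary; because the bytes are visited right to left, they must be reversed before use. A minimal in-memory sketch of the same scan (class and method names are illustrative):

```java
public class ReverseScan {

    // Scan a line from its end, counting '|' delimiters, and collect the
    // characters lying between the 4th and 5th delimiters from the end.
    static String fieldFromEnd(String line) {
        StringBuilder data = new StringBuilder();
        int delimiters = 0;
        for (int i = line.length() - 1; i >= 0; i--) {
            char c = line.charAt(i);
            if (c == '|') {
                delimiters++;
                continue;
            }
            if (delimiters == 4)
                data.append(c);
            else if (delimiters == 5)
                break;
        }
        // Characters were collected right to left, so reverse them.
        return data.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(fieldFromEnd("T1|0000000003||||"));  // prints 0000000003
    }
}
```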
Result
0000000003
Source
package com.github.btafarelo;

import java.io.File;
import java.io.RandomAccessFile;

public class TailFile {

    public static void main(String[] args) throws Exception {
        File file = new File("/home/btafarelo/tail.txt");

        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            long fileLength = file.length() - 1;
            int delimiters = 0;
            StringBuilder data = new StringBuilder();

            // Walk the file backwards, one byte at a time.
            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                char c = (char) randomAccessFile.read();

                if (c == '|') {
                    delimiters++;
                    continue;
                }

                // The wanted field lies between the 4th and 5th
                // delimiters counted from the end of the file.
                if (delimiters == 4)
                    data.append(c);
                else if (delimiters == 5)
                    break;
            }

            // Bytes were collected right to left, so reverse before printing.
            System.out.println(data.reverse().toString());
        }
    }
}
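To cover the validation half of the question, a forward-reading sketch that locates the last non-blank line (the T1 trailer), takes its second pipe-delimited field, and compares it with the expected count. The class name, expected value, and the use of a temp file are illustrative assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TailValidator {

    // Extract the record count from a trailer line such as "T1|0000000003||||".
    // The count is the second field; split with limit -1 keeps trailing empties.
    static String trailerCount(String trailerLine) {
        return trailerLine.split("\\|", -1)[1];
    }

    // Read the file and return the count from the last non-blank line.
    static String extractTailCount(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        for (int i = lines.size() - 1; i >= 0; i--) {
            String line = lines.get(i).trim();
            if (!line.isEmpty())
                return trailerCount(line);
        }
        throw new IllegalStateException("no trailer line found");
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample file; in practice this would be the
        // CSV moved from HDFS to the remote server.
        Path tmp = Files.createTempFile("tail", ".csv");
        Files.write(tmp, List.of(
                "H1|EDHSADB2|2022-08-11 11:10:23|||",
                "T1|0000000003||||"));

        String count = extractTailCount(tmp);
        // Validate against the expected data.
        if (!"0000000003".equals(count))
            throw new AssertionError("unexpected trailer count: " + count);
        System.out.println(count);

        Files.delete(tmp);
    }
}
```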