Parsing an Apache log file


I have an Apache log file that looks like this:

64.242.88.10 - - [08/Mar/2004:07:56:34 -0800] "GET /twiki/bin/search/Main/SearchResult?scope=text&regex=on&search=Web%20*Index[^A-Za-z] HTTP/1.1" 200 4163

64.242.88.10 - - [08/Mar/2004:08:04:46 -0800] "GET /SpamAssassin.html HTTP/1.1" 200 7368
p5083cd5d.dip0.t-ipconnect.de - - [08/Mar/2004:08:09:32 -0800] "GET /SpamAssassin.html HTTP/1.0" 200 7368

64.242.88.10 - - [08/Mar/2004:08:12:50 -0800] "GET /twiki/bin/view/TWiki/ChangePassword?rev=r1.6 HTTP/1.1" 200 5181

64.242.88.10 - - [08/Mar/2004:08:14:15 -0800] "GET /twiki/bin/edit/TWiki/HaroldGottschalk?t=1078717948 HTTP/1.1" 401 12846

64.242.88.10 - - [08/Mar/2004:08:15:21 -0800] "GET /twiki/bin/edit/Main/Expand_owner_alias?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846

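I am trying to get the byte count of each line (the last run of digits at the end of each line) and add them up to obtain the total number of bytes transferred.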

My code looks like this:

static String logEntryFormat = "^(\\S+) (\\S+) (\\S+) \\[(.*?)\\] \"(.*?)\" (\\S+) (\\S+)( \"(.*?)\" \"(.*?)\")?";

public static void countTotalBytes() throws IOException {

    // First read file into an arrayList
    Scanner scanner = new Scanner(accessLog);
    ArrayList<String> stringList = new ArrayList<String>();
    while (scanner.hasNextLine()) {
        stringList.add(scanner.nextLine());
    }
    scanner.close();

    // Convert to string
    String listString = String.join(" ", stringList);

    Pattern findBytes = Pattern.compile(logEntryFormat);
    Matcher matchBytes = findBytes.matcher(listString);

    // Test
    if (matchBytes.matches()) {
        System.out.println(matchBytes.group(7));
    }
}

Ultimately I want to grab this value, convert it to an int, and compute the total. In my output, however, I only get the single byte value "7368".

I'm not sure whether my approach is completely wrong, but I've been stuck on this for a few hours.

java regex apache parsing
1 Answer

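Since the goal of the countTotalBytes() method is to sum all the byte values found at the end of each data line in the file, perhaps it should return that result, for example: long totalBytes = countTotalBytes("LogFile.log");. You can then do whatever you like with totalBytes.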

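Also, you really don't need to load the file into an ArrayList for this task. Simply sum the byte values as you read the file and return the total once the file has been read completely.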

Here is an example:

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public static long countTotalBytes(String accessLog) throws IOException {
    // A long accumulator matches the return type and avoids int overflow on large logs:
    long totalByteCount = 0;
    // Try With Resources used here to auto-close the reader:
    try (Scanner reader = new Scanner(new File(accessLog))) {
        String line;
        while (reader.hasNextLine()) {
            line = reader.nextLine().trim();
            // Skip the current line if it is blank:
            if (!line.isEmpty()) {
                // Split the data line on a whitespace delimiter (one or more):
                String[] lineParts = line.split("\\s+");
                /* Get the bytes value from the end of the data line.
                   It is the last element of the array generated above. */
                String numBytes = lineParts[lineParts.length - 1];
                /* Is the value we pulled out a string representation of a
                   signed or unsigned integer? If so, convert it to a long
                   and add it to totalByteCount. A "-" placeholder (no bytes
                   sent) does not match and is skipped. */
                if (numBytes.matches("-?\\d+")) {
                    totalByteCount += Long.parseLong(numBytes);
                }
            }
        }
    }
    return totalByteCount;
}

Using the method above:

try {
    System.out.println(countTotalBytes("AccessLog.log"));
}
catch (IOException ex) {
    System.err.println(ex.getMessage());
}

Based on the data file you provided, the console window should display:

49772
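
If you would rather keep a regular expression, as in the question, one option is to apply it to each line separately instead of to the whole file joined into one string: Matcher.matches() succeeds only when the entire input matches the pattern, so a single call over the joined text cannot give one result per line. The following is only a minimal sketch under assumptions, not part of the method above: the class name RegexByteCounter, the helper bytesOf, and the simplified pattern are made up for illustration, and the pattern assumes the request field contains no embedded double quotes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class, named for illustration only.
final class RegexByteCounter {

    // host ident user [timestamp] "request" status bytes
    // Group 1 is the bytes field: digits, or "-" when nothing was sent.
    // Assumption: the quoted request contains no embedded double quotes.
    private static final Pattern LOG_LINE = Pattern.compile(
            "^\\S+ \\S+ \\S+ \\[[^\\]]*\\] \"[^\"]*\" \\S+ (\\d+|-)(?=\\s|$)");

    // Returns the bytes value of one log line, or 0 when the line is blank,
    // does not match the pattern, or carries "-" in the bytes field.
    static long bytesOf(String line) {
        Matcher m = LOG_LINE.matcher(line.trim());
        if (!m.find() || m.group(1).equals("-")) {
            return 0L;
        }
        return Long.parseLong(m.group(1));
    }

    // Streams the file line by line and sums the per-line values,
    // so the whole file is never held in memory at once.
    static long countTotalBytes(Path accessLog) throws IOException {
        try (Stream<String> lines = Files.lines(accessLog)) {
            return lines.mapToLong(RegexByteCounter::bytesOf).sum();
        }
    }
}
```

Because the pattern is matched with find() and is not anchored at the end of the line, it should also tolerate lines that carry extra quoted fields after the bytes value. It could be called with something like RegexByteCounter.countTotalBytes(Paths.get("AccessLog.log")); note that Files.lines reads the file as UTF-8 by default.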
