How to speed up reading, writing, compressing and decompressing large files in Java

Question

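The task is to compress/decompress very large data (> 2 GB) that cannot be held in a single String or byte array. My solution is to write the compressed/decompressed data to a file chunk by chunk. It works, but it is not fast enough.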

Compress: plain text file -> gzip -> base64 encode -> compressed file
Decompress: compressed file -> base64 decode -> gunzip -> plain text file

Test results on a laptop with 16 GB of memory:

Created compressed file, takes 571346 millis
Created decompressed file, takes 378441 millis

Code:

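import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Base64;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class LineBasedCodec {

  // Compress: plain text -> gzip -> base64 -> file, processed line by line (assumes UTF-8 text).
  public static void compress(final InputStream inputStream, final Path outputFile) throws IOException {
    final byte[] newline = System.lineSeparator().getBytes(StandardCharsets.UTF_8);
    // Closed in reverse order: reader, gzip (writes trailer), base64 (writes padding), file.
    try (final OutputStream outputStream = new FileOutputStream(outputFile.toString());
        final OutputStream base64Output = Base64.getEncoder().wrap(outputStream);
        final GzipCompressorOutputStream gzipOutput = new GzipCompressorOutputStream(base64Output);
        final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {

      reader.lines().forEach(line -> {
        try {
          gzipOutput.write(line.getBytes(StandardCharsets.UTF_8));
          gzipOutput.write(newline);
        } catch (final IOException e) {
          // Propagate the failure instead of printing it and silently continuing.
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  // Decompress: file -> base64 decode -> gunzip -> plain text, processed line by line.
  public static void decompress(final InputStream inputStream, final Path outputFile) throws IOException {
    final byte[] newline = System.lineSeparator().getBytes(StandardCharsets.UTF_8);
    try (final OutputStream outputStream = new FileOutputStream(outputFile.toString());
        final GzipCompressorInputStream gzipStream = new GzipCompressorInputStream(Base64.getDecoder().wrap(inputStream));
        final BufferedReader reader = new BufferedReader(new InputStreamReader(gzipStream, StandardCharsets.UTF_8))) {

      reader.lines().forEach(line -> {
        try {
          outputStream.write(line.getBytes(StandardCharsets.UTF_8));
          outputStream.write(newline);
        } catch (final IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}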

I also tried batching the writes to the file, but didn't see much improvement:

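// Replaces the reader.lines().forEach(...) loop inside compress().
// Note: chunkSize counts lines, not bytes, so about 2.1 million lines are buffered per write.
StringBuilder stringBuilder = new StringBuilder();
final int chunkSize = Integer.MAX_VALUE / 1000;
final String lineSeparator = System.lineSeparator();

String line;
int counter = 0;
while ((line = reader.readLine()) != null) {
  counter++;
  stringBuilder.append(line).append(lineSeparator);
  if (counter >= chunkSize) {
    gzipOutput.write(stringBuilder.toString().getBytes(StandardCharsets.UTF_8));
    counter = 0;
    stringBuilder.setLength(0); // reuse the builder for the next chunk
  }
}

// Write whatever is left after the last full chunk.
if (counter > 0) {
  gzipOutput.write(stringBuilder.toString().getBytes(StandardCharsets.UTF_8));
}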

Questions:
1. Looking for suggestions on how to speed up the overall process.
2. What would the bottleneck be?

Answer 1

Large files will always take some time, but I see two significant opportunities:

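If possible, drop the Base64 step. It makes the file bigger (by about a third), and more data takes more time to read and write. On top of that there is the cost of the Base64 conversion itself.
Don't use line-based IO. In fact, don't use strings at all. Searching for line breaks and converting the data between raw bytes and String objects costs time and is useless here: the work is simply undone again, and the fact that the data is made of lines is never actually used. It's just an arbitrary way to split the data.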
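To put a number on the first point: Base64 turns every 3 input bytes into 4 output characters (with padding), so N bytes of gzip output become 4 * ceil(N / 3) bytes on disk, roughly a third more to write now and read back later. A small check against the JDK's java.util.Base64; the class name Base64Overhead and the helper encodedLength are made up for this illustration.

```java
import java.util.Base64;

public class Base64Overhead {
    // Basic Base64 (no line breaks): every started group of 3 input bytes
    // becomes 4 output bytes, i.e. 4 * ceil(n / 3).
    static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] data = new byte[3_000_000]; // contents don't matter, only the size
        byte[] encoded = Base64.getEncoder().encode(data);
        System.out.println("input bytes:   " + data.length);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("formula:       " + encodedLength(data.length));
    }
}
```

The gzip stage has already shrunk the data by this point, so the overhead applies to the compressed size, but it is pure extra IO in both directions.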

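For a faster stream-to-stream copy you can use, for example, IOUtils.copy(in, out) from Apache Commons (which it looks like you are already using), or implement a similar strategy yourself: read a chunk of data into a byte[] (a few KB, not something tiny), write it to the output stream, and repeat until all the input has been read.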
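A minimal sketch of that byte-chunk approach, assuming the Base64 step can be dropped. It swaps in the JDK's java.util.zip.GZIPOutputStream / GZIPInputStream for Commons Compress's GzipCompressorOutputStream / GzipCompressorInputStream so it runs without extra dependencies. The class name ChunkedGzip and the 64 KB buffer size are illustrative choices, not tuned values.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkedGzip {
    // "A few KB, not something tiny": 64 KB is an assumption worth benchmarking.
    static final int BUFFER_SIZE = 64 * 1024;

    // Copies raw bytes chunk by chunk until the input is exhausted.
    // No line splitting, no String conversion.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // plain file -> gzip -> compressed file (Base64 step dropped)
    static void compress(Path input, Path output) throws IOException {
        try (InputStream in = Files.newInputStream(input);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(output), BUFFER_SIZE)) {
            copy(in, out);
        } // closing the GZIPOutputStream writes the gzip trailer and closes the file
    }

    // compressed file -> gunzip -> plain file
    static void decompress(Path input, Path output) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(input), BUFFER_SIZE);
             OutputStream out = Files.newOutputStream(output)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path plain = Files.createTempFile("plain", ".txt");
        Path gz = Files.createTempFile("data", ".gz");
        Path restored = Files.createTempFile("restored", ".txt");
        byte[] original = "line one\nline two\n".repeat(100_000).getBytes(StandardCharsets.UTF_8);
        Files.write(plain, original);

        compress(plain, gz);
        decompress(gz, restored);

        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + Files.size(gz));
        System.out.println("round trip equal: " + Arrays.equals(original, Files.readAllBytes(restored)));

        for (Path p : new Path[] {plain, gz, restored}) {
            Files.deleteIfExists(p);
        }
    }
}
```

On Java 9+, in.transferTo(out) performs an equivalent copy loop. If the Base64 step has to stay, the same structure should still work by wrapping the file streams, e.g. new GZIPOutputStream(Base64.getEncoder().wrap(new BufferedOutputStream(Files.newOutputStream(output))), BUFFER_SIZE) on the way out and new GZIPInputStream(Base64.getDecoder().wrap(new BufferedInputStream(Files.newInputStream(input))), BUFFER_SIZE) on the way in. The extra buffering is a cheap precaution, since the Base64 wrapper streams may issue small reads and writes against the stream underneath them.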
