I have a task that inserts 800,000 rows into an Oracle database table. I split the work across the 8 cores of the CPU; here is the sample code.
final long start = System.currentTimeMillis();
final int batchSize = 800000;
final int nCore = Runtime.getRuntime().availableProcessors();
final int batchPerCore = batchSize / nCore;
final CountDownLatch taskCountDown = new CountDownLatch(nCore);
final CountDownLatch kickoffLatch = new CountDownLatch(1);
final ExecutorService es = Executors.newFixedThreadPool(nCore);
Class.forName("oracle.jdbc.driver.OracleDriver");
for (int n = 0; n < nCore; n++) {
    es.submit(() -> {
        log.info("Thread {} starts working on {} insertion jobs.", Thread.currentThread().getName(), batchPerCore);
        try (Connection conn = DriverManager.getConnection(DB_URL, USER_NAME, PASSWORD)) {
            Assert.assertNotNull("connection established", conn);
            conn.setAutoCommit(false);
            PreparedStatement pstmt = conn.prepareStatement(PREP_STATEMENT);
            for (int i = 0; i < batchPerCore; i++) {
                initPrepTranStatement(pstmt);
                pstmt.addBatch();
            }
            int[] resultTran = pstmt.executeBatch();
            Assert.assertEquals(resultTran.length, batchPerCore);
            conn.commit();
            log.info("Thread {} completed its job!", Thread.currentThread().getName());
            taskCountDown.countDown();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    });
} // ~ for ~
es.shutdown();
kickoffLatch.countDown();
taskCountDown.await();
log.info("It takes {} milliseconds to complete all {} tasks for {} insertions each. ",
        (System.currentTimeMillis() - start), nCore, batchPerCore);
While it was running, the log showed that 3 tasks completed at 2:09 PM, but the others were still pending at the time of writing (3:29 PM).
Here is the log.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-7 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-2 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-5 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-8 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-4 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-3 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-6 starts working on 100000 insertion jobs.
Apr 27, 2020 1:25:17 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-1 starts working on 100000 insertion jobs.
Apr 27, 2020 2:09:53 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-4 completed its job!
Apr 27, 2020 2:09:53 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-6 completed its job!
Apr 27, 2020 2:09:53 PM com.db.loader.DBFlexDataLoadingPerfTest lambda$4
INFO: Thread pool-1-thread-7 completed its job!
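Using DriverManager here is not a good idea and I should use a DataSource implementation instead; however, that is not relevant and does not affect the purpose of this test.
Given that all the tasks are identical in the work they do and the way they run, I cannot understand why some tasks lag so far behind the others, or, more likely, are simply hanging.
No special JVM parameters are set. The machine runs Windows 10 Pro with an i7-6700 (8 logical cores) and 32 GB of RAM.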
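One thing that may be worth ruling out in the code above: `taskCountDown.countDown()` sits inside the `try` block, so a task that throws an `SQLException` never counts down and `taskCountDown.await()` would then block indefinitely. The following is only a minimal sketch of counting down in `finally` and waiting with a timeout. The JDBC work is replaced by a simulated failure, and the class name, method name, and 5-second timeout are assumptions made for illustration, not part of the original test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchInFinallySketch {

    /**
     * Runs nTasks simulated jobs; the job whose index equals failingIndex throws.
     * Returns true if every task counted down before the timeout expired.
     */
    static boolean runAll(int nTasks, int failingIndex, AtomicInteger failures) {
        CountDownLatch done = new CountDownLatch(nTasks);
        ExecutorService es = Executors.newFixedThreadPool(nTasks);
        for (int n = 0; n < nTasks; n++) {
            final int index = n;
            es.submit(() -> {
                try {
                    // Placeholder for the JDBC batch insert (hypothetical failure).
                    if (index == failingIndex) {
                        throw new IllegalStateException("simulated failure in task " + index);
                    }
                } catch (RuntimeException e) {
                    failures.incrementAndGet(); // record the failure instead of losing it
                } finally {
                    done.countDown(); // runs on success and on failure
                }
            });
        }
        es.shutdown();
        try {
            // Bounded wait: a stuck task shows up as 'false' rather than an endless block.
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger failures = new AtomicInteger();
        boolean finished = runAll(8, 3, failures);
        System.out.println("finished=" + finished + ", failures=" + failures.get());
    }
}
```

This does not explain why a thread would be slow or blocked inside `executeBatch()`; it only makes sure that a failed task cannot be mistaken for a pending one.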
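Thread scheduling is non-deterministic. There is no guarantee that the tasks will progress evenly across all threads. In the most pessimistic case the tasks could even end up running one after another (unlikely, but not impossible). It all depends on the JVM implementation and the operating system's scheduler.
Beyond that, there are many implementation-dependent factors that are not visible in the code snippet provided (for example, how the driver or the database handles batches committed in parallel).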
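To see how unevenly the tasks actually progress, one option is to record the elapsed time of each task and collect it through `Future`s. The sketch below assumes simulated work (a sleep) in place of the inserts; `PerTaskTimingSketch`, `timeTasks`, and `sleep` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerTaskTimingSketch {

    /** Runs each job on its own pool thread and returns elapsed milliseconds per job, in submission order. */
    static List<Long> timeTasks(List<Runnable> jobs) {
        ExecutorService es = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            List<Callable<Long>> timed = new ArrayList<>();
            for (Runnable job : jobs) {
                timed.add(() -> {
                    long start = System.nanoTime();
                    job.run();
                    return (System.nanoTime() - start) / 1_000_000;
                });
            }
            List<Long> elapsed = new ArrayList<>();
            // invokeAll blocks until every task has finished, normally or by throwing.
            for (Future<Long> f : es.invokeAll(timed)) {
                elapsed.add(f.get());
            }
            return elapsed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            es.shutdown();
        }
    }

    /** Stand-in for real work; sleeps for the given number of milliseconds. */
    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<Runnable> jobs = new ArrayList<>();
        for (int n = 0; n < 4; n++) {
            final long pauseMs = 50L * (n + 1);
            jobs.add(() -> sleep(pauseMs));
        }
        System.out.println("elapsed ms per task: " + timeTasks(jobs));
    }
}
```

Per-task timings like these only show where the time goes on the client side. Whether the spread comes from scheduling, the driver, or the database still has to be checked separately, for example on the database side.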