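I ran some benchmarks to compare my new M3 Pro MacBook Pro with my old 2019 Intel MacBook Pro, and I am getting some really bad timings for matrix operations on the M3. Is there an explanation for this, either in the hardware itself or in the way I have set up the computer?
For example, the cross-product test on the Intel machine and then on the M3 Pro: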
2,500 x 2,500 cross-product matrix (b = a' * a): 0.0902 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 7.53 (sec).
2.3 GHz 8-core Intel Core i9
> devtools::session_info(info="all")
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Sonoma 14.4.1
system x86_64, darwin20
[1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
BLAS /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
lapack /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib
lapack_version 3.11.0
Apple M3 Pro
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.4.1
system aarch64, darwin20
BLAS /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
lapack /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib
lapack_version 3.12.0
Intel Core i9 benchmark:
> benchmarkme::benchmark_std(runs = 10)
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.174 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.577 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.216 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.07 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 3.05 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.503 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.146 (sec).
Sorting of 7,000,000 random values: 0.627 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 0.0902 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.047 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 0.413 (sec).
Determinant of a 2,500 x 2,500 random matrix: 0.495 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.672 (sec).
FFT over 2,500,000 random values: 0.233 (sec).
Inverse of a 1,600 x 1,600 random matrix: 0.575 (sec).
M3 Pro benchmark:
> benchmarkme::benchmark_std(runs = 10)
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.0826 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.192 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.101 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 0.488 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 0.406 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.162 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.0794 (sec).
Sorting of 7,000,000 random values: 0.481 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 7.53 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.629 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 4.13 (sec).
Determinant of a 2,500 x 2,500 random matrix: 1.48 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.349 (sec).
FFT over 2,500,000 random values: 0.0689 (sec).
Inverse of a 1,600 x 1,600 random matrix: 1.16 (sec).
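Putting the matrix timings from the two runs side by side makes the pattern easier to see. This is only a small sketch in R with the numbers copied from the output above. The variable names are mine, and I am assuming the first run is the Intel machine and the second is the M3 Pro (the second run is the one with the 7.53 sec cross-product).

```r
# Timings in seconds, copied from the two benchmark_std() outputs above.
# Assumption: the first run is the Intel i9, the second is the M3 Pro.
tests <- c("crossprod", "lin_regr", "cholesky", "determinant",
           "eigen", "fft", "inverse")
intel <- c(0.0902, 0.047, 0.413, 0.495, 0.672, 0.233, 0.575)
m3    <- c(7.53, 0.629, 4.13, 1.48, 0.349, 0.0689, 1.16)

comparison <- data.frame(
  test          = tests,
  intel_sec     = intel,
  m3_sec        = m3,
  m3_over_intel = round(m3 / intel, 1)  # > 1 means the M3 run was slower
)

# Sort so the largest slowdowns come first.
comparison <- comparison[order(-comparison$m3_over_intel), ]
print(comparison, row.names = FALSE)
```

The largest ratios fall on the dense linear algebra tests (cross-product, regression, Cholesky), while FFT and the programming benchmarks are faster on the M3. That would be consistent with the M3 run not actually using an optimized BLAS, although the table alone does not prove it.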
The problem was that devtools::session_info incorrectly told me I was using the Accelerate BLAS when I was in RStudio, but then told me I was using the standard R BLAS when I ran a script outside RStudio.
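One way to cross-check this is to ask base R directly which BLAS and LAPACK libraries the current process has loaded, and to run the same lines both inside RStudio and from a script with Rscript. A minimal sketch follows. The "Accelerate|vecLib" pattern is my own heuristic based on the paths shown in the session info above, not an official test, and I have not verified how these calls behave in every setup.

```r
# Ask base R which BLAS and LAPACK shared libraries this process loaded.
blas_in_use   <- extSoftVersion()[["BLAS"]]
lapack_in_use <- La_library()

cat("R home:", R.home(), "\n")
cat("BLAS:  ", blas_in_use, "\n")
cat("LAPACK:", lapack_in_use, "\n")

# Heuristic (an assumption): Apple's Accelerate BLAS lives under a path
# containing "Accelerate" or "vecLib"; R's reference BLAS does not.
uses_accelerate <- grepl("Accelerate|vecLib", blas_in_use)
cat("Looks like Accelerate:", uses_accelerate, "\n")
```

If the BLAS path differs between RStudio and Rscript, the two are not seeing the same configuration, and the timings are a better guide than the reported path.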
When I relinked the Accelerate BLAS using the instructions here: https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Which-BLAS-is-used-and-how-can-it-be-changed_003f
the terrible matrix performance went away.
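To confirm the change without rerunning the whole benchmark suite, the slow test can be timed on its own before and after relinking. This is a rough sketch that only approximates the 2,500 x 2,500 cross-product test. It is not the benchmarkme code itself, and the seed and variable names are arbitrary.

```r
# Time b = a' * a on a 2,500 x 2,500 matrix of normal random values.
set.seed(42)
n <- 2500
a <- matrix(rnorm(n * n), nrow = n, ncol = n)

timing <- system.time(b <- crossprod(a))
cat(sprintf("crossprod on %d x %d: %.3f sec elapsed\n",
            n, n, timing[["elapsed"]]))
```

In the runs above this test took several seconds with the slow configuration and well under a second with the fast one, so the difference should be obvious from a single run.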