Questions tagged matrix-multiplication

Questions about matrix multiplication, especially its implementation. For mathematical questions, consider the linear-algebra tag.

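Why is there such a large performance difference between C and Fortran? I am comparing the Fortran and C programming languages for matrix operations. This time, I wrote two files that do the same thing (matmul.c and matmul.f90), namely multiply matri ...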

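In both files I took efficient memory access into account to improve performance. However, for the multiplication of matrices A and B, Fortran is noticeably faster: when N equals 1024, Fortran is about three times as fast. In this case, C takes about 0.57 seconds to complete the matrix multiplication, while Fortran takes 0.18 seconds.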
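
One thing worth checking in a comparison like this is loop order relative to storage order: C stores 2-D arrays row by row and Fortran stores them column by column, so the cache-friendly innermost index differs between the two. The asker's code is not shown in the excerpt, so the following is only a minimal NumPy sketch of the two layouts, seen through strides; the 4x3 shape is an arbitrary assumption.

```python
import numpy as np

# Hypothetical 4x3 array of 8-byte floats, stored in both conventions.
c_order = np.zeros((4, 3), dtype=np.float64, order="C")  # row-major, like C
f_order = np.zeros((4, 3), dtype=np.float64, order="F")  # column-major, like Fortran

# strides = bytes to step to the next element along (rows, columns).
print(c_order.strides)  # (24, 8): neighbours within a row are adjacent in memory
print(f_order.strides)  # (8, 32): neighbours within a column are adjacent in memory
```

A C loop nest therefore wants the last index innermost, while a Fortran one wants the first index innermost; the same triple loop transcribed literally is cache-friendly in only one of the two languages.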

c matrix fortran matrix-multiplication

1 answer, 0 votes

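Why is there such a large difference in matrix multiplication between C and Fortran? I am comparing the Fortran and C programming languages for matrix operations. This time, I wrote two files that do the same thing (matmul.c and matmul.f90), namely multiply matri ...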

c matrix fortran matrix-multiplication

1 answer, 0 votes

HLSL/GLSL float2x2 mul() operation

What is the result of this HLSL/GLSL code (do they differ)? float2x2 m2x2 = {a, b, c, d}; float2 xy = {x, y}; float2 result = mul(m2x2, xy); Is result = float2(a*x + b*y, c*x ...
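
The arithmetic in question can be sketched as follows, assuming that an HLSL initializer list fills the matrix row by row and `mul(m, v)` treats `v` as a column vector, while GLSL's `mat2(a, b, c, d)` fills the matrix column by column. NumPy is used here only as a stand-in for the shader math, and the numeric values are arbitrary.

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 4.0   # arbitrary example values
x, y = 5.0, 6.0
v = np.array([x, y])

# HLSL-style (assumed): {a, b, c, d} read as rows [[a, b], [c, d]]; mul(m, v) = m @ v.
hlsl_m = np.array([[a, b], [c, d]])
hlsl_result = hlsl_m @ v           # (a*x + b*y, c*x + d*y)

# GLSL-style (assumed): mat2(a, b, c, d) read as columns, i.e. [[a, c], [b, d]].
glsl_m = np.array([[a, c], [b, d]])
glsl_result = glsl_m @ v           # (a*x + c*y, b*x + d*y)

print(hlsl_result, glsl_result)
```

Under these assumptions the same four scalars give transposed matrices in the two languages, so the products differ unless b equals c.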

matrix-multiplication hlsl

1 answer, 0 votes

Eigen matrix multiplication is much less efficient than loop traversal

    #include <Eigen/Dense>
    #include <iostream>
    #include <random>
    #include <chrono>

    int main()
    {
        const int num_points = 300000000;
        Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> init_points(num_points, 3);
        Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> transformed_points(num_points, 3);

        std::mt19937 rng(42);
        std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
        for (int i = 0; i < num_points; ++i)
        {
            init_points(i, 0) = dist(rng); // x
            init_points(i, 1) = dist(rng); // y
            init_points(i, 2) = dist(rng); // z
        }

        float theta = 3.14159265358 / 4; // pi/4
        Eigen::Matrix3f rotation;
        rotation = Eigen::AngleAxisf(theta, Eigen::Vector3f::UnitZ());
        Eigen::Vector3f translation(10.0f, 20.0f, 30.0f);

        auto start_time = std::chrono::high_resolution_clock::now();

        //transformed_points = init_points * rotation; //uncomment this line to use the Matrix multiply version
        //transformed_points.rowwise() += translation.transpose(); //uncomment this line to use the Matrix multiply version
        for (int i = 0; i < num_points; ++i) //comment this for loop to use the Matrix multiply version
        {
            Eigen::Vector3f v = init_points.row(i).transpose();
            v = rotation * v;
            v += translation;
            transformed_points.row(i) = v.transpose();
        }

        auto end_time = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> duration_ms = end_time - start_time;
        std::cout << "total consume: " << duration_ms.count() << "ms" << std::endl;

        std::cout << "first 5 points:(x,y,z)" << std::endl;
        for (int i = 0; i < 5; ++i)
        {
            std::cout << "(" << transformed_points(i, 0) << "," << transformed_points(i, 1) << ", " << transformed_points(i, 2) << ")" << std::endl;
        }
        return 0;
    }

I am currently performing a coordinate transformation on an object obtained through point cloud sampling. I need to rotate a set of points first and then translate them to obtain the transformed point cloud. The point cloud contains 300,000,000 points in total, and I am storing it in Eigen dynamic matrices. The code above is a test example in which I initialize 300,000,000 points with random values and then perform the rotation and translation. It includes both a loop-traversal version and a matrix-multiplication version. To use matrix multiplication, uncomment the two commented-out lines and comment out the for loop. Here is a comparison of the time taken by the two versions:

1. For the rotation and translation, I multiplied all 300,000,000 points by the rotation matrix and then added the translation vector. This takes about 2044 ms on my i7-13700K CPU.
2. I wrote a for loop that goes through each point and rotates and translates it one by one. The total time is only 600 ms.

I know that Eigen uses many CPU instruction sets to optimize matrix multiplication, and I am using the Intel ICC compiler with various SIMD and AVX optimizations enabled. Why is the for loop here about 3 times faster than the matrix multiplication? Also, is there any room to optimize the code above further? Can someone help analyze this? I would really appreciate it.

PS: I do have optimization turned on; with optimization off, the time taken is much higher than it is now. What I mean is that with optimization on, the for loop is faster than the matrix multiplication.

I just did some timing measurements. Since I don't have the same CPU as you, and you did not specify the exact compiler options you used, the results may or may not differ. I tested with:

Intel(R) oneAPI DPC++/C++ Compiler 2025.0.4
Compiler flags: -DNDEBUG -O3 -march=native
on an i7-9850H

All values below are the median of 3 measurements:

The row-wise for loop took 1247.91 ms.
The matrix multiplication version initially took 2583.93 ms.
With noalias (as in transformed_points.noalias() = init_points * rotation;), the matrix multiplication version took 1451.23 ms. noalias helps get rid of a temporary; see https://eigen.tuxfamily.org/dox/group__TopicAliasing.html
After swapping rows and columns and changing the fixed dimension from Eigen::Dynamic to 3, the matrix multiplication version took 1278.85 ms. Since Eigen is column-major by default, this means that the x, y and z of each point are now next to each other in memory.

Other optimizations may be possible, but at least the initial row-wise for loop and the matrix multiplication now take (almost) the same time. Here is the final code:

    const int num_points = 300000000;
    Eigen::Matrix<float, 3, Eigen::Dynamic> init_points(3, num_points);
    Eigen::Matrix<float, 3, Eigen::Dynamic> transformed_points(3, num_points);

    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-100.0f, 100.0f);
    for (int i = 0; i < num_points; ++i)
    {
        init_points(0, i) = dist(rng); // x
        init_points(1, i) = dist(rng); // y
        init_points(2, i) = dist(rng); // z
    }

    float theta = -3.14159265358 / 4; // minus to transpose/invert rotation
    Eigen::Matrix3f rotation;
    rotation = Eigen::AngleAxisf(theta, Eigen::Vector3f::UnitZ());
    Eigen::Vector3f translation(10.0f, 20.0f, 30.0f);

    auto start_time = std::chrono::high_resolution_clock::now();
    transformed_points.noalias() = rotation * init_points;
    transformed_points.colwise() += translation;
    auto end_time = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double, std::milli> duration_ms = end_time - start_time;
    std::cout << "total consume: " << duration_ms.count() << "ms" << std::endl;

    std::cout << "first 5 points:(x,y,z)" << std::endl;
    for (int i = 0; i < 5; ++i)
    {
        std::cout << "(" << transformed_points(0, i) << "," << transformed_points(1, i) << ", " << transformed_points(2, i) << ")" << std::endl;
    }

c++ eigen matrix-multiplication eigen3
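
The sign flip on theta in the answer's final code reflects a general identity: multiplying row-vector points on the right by R (as in init_points * rotation) gives the same numbers as applying the transpose of R to column-vector points. The NumPy sketch below stands in for the Eigen code, with a handful of made-up points, and checks how the per-point loop relates to the N x 3 and 3 x N layouts. It says nothing about timing.

```python
import numpy as np

theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
# Rotation about the z axis by theta (the matrix an angle-axis rotation about Z yields).
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([10.0, 20.0, 30.0])

rng = np.random.default_rng(42)
points_nx3 = rng.uniform(-100.0, 100.0, size=(5, 3))  # one point per row

# Per-point loop: R @ p + t for each point p.
looped = np.array([R @ p + t for p in points_nx3])

# N x 3 layout: row vectors need R transposed on the right to match the loop.
rows = points_nx3 @ R.T + t

# 3 x N layout: one point per column, R applied on the left.
cols = R @ points_nx3.T + t[:, None]

print(np.allclose(looped, rows), np.allclose(looped, cols.T))
```

Multiplying by R itself on the right (points_nx3 @ R) would instead rotate by the opposite angle, which is why the two variants in the question do not compute the same transform unless the rotation is transposed or theta is negated.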

1 answer, 0 votes

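Is there a way to compute only the real part of a NumPy matmul? Let's say I have two arrays a and b, both of dtype np.complex128, and I want to compute c = np.matmul(a, b).real. That is, I don't care about the imaginary part, only the real part. Is there a ...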
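
One identity that applies here is Re(AB) = Re(A)Re(B) - Im(A)Im(B), which needs two real matrix products rather than a full complex one. The sketch below only checks the identity on small random arrays; the shapes and seed are arbitrary assumptions, and whether this is actually faster depends on array sizes and on the cost of the strided `.real`/`.imag` views.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
b = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Reference: full complex product, then discard the imaginary part.
reference = np.matmul(a, b).real

# Real part only: Re(A)Re(B) - Im(A)Im(B), two real matmuls.
real_only = a.real @ b.real - a.imag @ b.imag

print(np.allclose(reference, real_only))
```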

python numpy matrix-multiplication complex-numbers numpy-einsum

1 answer, 0 votes

Memory problems with Polars streaming when computing an outer product on a large dataset

python streaming matrix-multiplication python-polars polars

1 answer, 0 votes

How can I prove that a modification of an array item succeeded?

I am verifying a matrix multiplication procedure. Here is my code:

    predicate abvalid(a:array<array<int>>, b:array<array<int>>)
        reads a, b
    {
        a.Length > 0 && b.Length > 0
        && (forall i, j :: (0 <= i < a.Length && 0 <= j < a.Length) ==> (a[i].Length == a[j].Length == b.Length))
        && forall i, j :: (0 <= i < b.Length && 0 <= j < b.Length) ==> (b[i].Length == b[j].Length > 0)
    }

    function rowcolmulAux(a:array<array<int>>, b:array<array<int>>, row:int, col:int, k:int) : int
        requires abvalid(a, b)
        requires 0 <= row < a.Length
        requires 0 <= col < b[0].Length
        requires 0 <= k <= a[0].Length
        decreases a[0].Length - k
        reads a[..]
        reads b[..]
        reads a, b
        ensures rowcolmulAux(a, b, row, col, k) ==
            if k == a[0].Length then 0
            else a[row][k] * b[k][col] + rowcolmulAux(a, b, row, col, k+1)
        ensures (k == a[0].Length ==> rowcolmulAux(a, b, row, col, k) == 0)
            && (k < a[0].Length ==> rowcolmulAux(a, b, row, col, k) == a[row][k] * b[k][col] + rowcolmulAux(a, b, row, col, k+1))
    {
        if k == a[0].Length then 0
        else a[row][k] * b[k][col] + rowcolmulAux(a, b, row, col, k+1)
    }

    function rowcolmul(a:array<array<int>>, b:array<array<int>>, row:int, col:int) : int
        reads a
        reads b
        reads a[..]
        reads b[..]
        requires abvalid(a, b)
        requires 0 <= row < a.Length
        requires 0 <= col < b[0].Length
        ensures rowcolmul(a, b, row, col) == rowcolmulAux(a, b, row, col, 0)
        ensures rowcolmul(a, b, row, col) == rowcolmul(a, b, row, col)
    {
        rowcolmulAux(a, b, row, col, 0)
    }

    method rowmul(a:array<array<int>>, b:array<array<int>>, c1:array<int>, index:int, indexc:int)
        requires abvalid(a, b)
        requires 0 <= index < a.Length
        requires c1.Length == b[0].Length
        requires 0 <= indexc < c1.Length
        modifies c1
        ensures forall i :: 0 <= i < c1.Length && i != indexc ==> c1[i] == old(c1[i])
        ensures c1[indexc] == rowcolmul(a, b, index, indexc)
    {
        c1[indexc] := rowcolmul(a, b, index, indexc);
    }
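
For readers less familiar with Dafny, the recursive specification rowcolmulAux is the dot product of one row of a with one column of b, accumulated from index k upward. A rough Python transcription makes that reading concrete. It is an illustration only, carries none of the verification content, and uses made-up sample matrices.

```python
def rowcolmul_aux(a, b, row, col, k):
    """Sum of a[row][i] * b[i][col] for i from k to len(a[0]) - 1."""
    if k == len(a[0]):
        return 0
    return a[row][k] * b[k][col] + rowcolmul_aux(a, b, row, col, k + 1)


def rowcolmul(a, b, row, col):
    """Entry (row, col) of the matrix product of a and b."""
    return rowcolmul_aux(a, b, row, col, 0)


a = [[1, 2, 3], [4, 5, 6]]          # 2 x 3 sample matrix
b = [[7, 8], [9, 10], [11, 12]]     # 3 x 2 sample matrix
c = [[rowcolmul(a, b, i, j) for j in range(len(b[0]))] for i in range(len(a))]
print(c)
```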

matrix-multiplication verification dafny

0 answers, 0 votes

Multiplying matrices that contain variables instead of values

a <- matrix(c(0,0,0,0), nrow = 2, byrow = TRUE)
b <- matrix(c("b1","b2","b3","b4"), nrow = 2, byrow = TRUE)
a %*% b

r rstudio linear-algebra matrix-multiplication

1 answer, 0 votes

How to multiply a 2x3x3x3 matrix by a 2x3 matrix to get a 2x3 matrix

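I am trying to compute some derivatives of a neural network's output. To be precise, I need the Jacobian matrix of the function represented by the neural network and ...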

python pytorch matrix-multiplication autograd

1 answer, 0 votes

Multiplication of a 2D and a 1D matrix

Example:
import numpy as np
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.ones(3)
np.dot(x, y)
Result: array([ 6., 15.])
How is this possible when I have a (2x3) x (1x3) matrix? It...
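
The key point is that `np.ones(3)` has shape `(3,)`, not `(1, 3)`: it is a 1-D array, and `np.dot` of a 2-D array with a 1-D array sums over the last axis of the first and the only axis of the second. A short sketch contrasting this with a genuine `(1, 3)` array (the names `y_row` and `mismatch` are introduced here for illustration):

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])  # shape (2, 3)
y = np.ones(3)                              # shape (3,), a 1-D array

result = np.dot(x, y)      # each row of x summed against y
print(result.shape)        # (2,)

# A true (1, 3) matrix does not line up with (2, 3) and raises ValueError.
y_row = y.reshape(1, 3)
try:
    np.dot(x, y_row)
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)
```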