Bandwidth measured by my test code differs from the bandwidth perftest measures


To learn RDMA, I found an example online, similar to the one MELLANOX provides. When I ran it on two machines, I noticed the following problems:

1. The bandwidth measured by the example code is far below the bandwidth measured by perftest.

2. In addition, using GID 0 or 2 on one of the two machines reduces bandwidth drastically.

Machine A:

Configuration:

hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         20.39.3004
        node_guid:                      1070:fd03:00e5:f118
        sys_image_guid:                 1070:fd03:00e5:f118
        vendor_id:                      0x02c9
        vendor_part_id:                 4123
        hw_ver:                         0x0
        board_id:                       MT_0000000224
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_bond_0     1       0       fe80:0000:0000:0000:b0fc:4eff:feb3:1112                 v1      bond0
mlx5_bond_0     1       1       fe80:0000:0000:0000:b0fc:4eff:feb3:1112                 v2      bond0
mlx5_bond_0     1       2       0000:0000:0000:0000:0000:ffff:0a77:2e3d 10.119.46.61    v1      bond0
mlx5_bond_0     1       3       0000:0000:0000:0000:0000:ffff:0a77:2e3d 10.119.46.61    v2      bond0

perftest run against GID 1:

---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
RX depth:               1
post_list:              1
inline_size:            0
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x1659 PSN 0xd4858a OUT 0x10 RKey 0x203e00 VAddr 0x007f38d0d07000
 GID: 254:128:00:00:00:00:00:00:176:252:78:255:254:179:17:18
 remote address: LID 0000 QPN 0x1c86 PSN 0xc2e51a OUT 0x10 RKey 0x013f00 VAddr 0x007f123fc62000
 GID: 254:128:00:00:00:00:00:00:100:155:154:255:254:172:09:41
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 65536      1000             10829.53            10829.17       0.173267
---------------------------------------------------------------------------------------

Machine B:

hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         20.39.3004
        node_guid:                      e8eb:d303:0032:b212
        sys_image_guid:                 e8eb:d303:0032:b212
        vendor_id:                      0x02c9
        vendor_part_id:                 4123
        hw_ver:                         0x0
        board_id:                       MT_0000000224
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

DEV     PORT    INDEX   GID                                     IPv4              VER     DEV
---     ----    -----   ---                                     ------------      ---     ---
mlx5_bond_0     1       0       fe80:0000:0000:0000:649b:9aff:feac:0929                   v1      bond0
mlx5_bond_0     1       1       fe80:0000:0000:0000:649b:9aff:feac:0929                   v2      bond0
mlx5_bond_0     1       2       0000:0000:0000:0000:0000:ffff:0a77:2e3e   10.119.46.62    v1      bond0
mlx5_bond_0     1       3       0000:0000:0000:0000:0000:ffff:0a77:2e3e   10.119.46.62    v2      bond0
n_gids_found=4

perftest run against GID 0:

                    RDMA_Read BW Test
RX depth:               1
post_list:              1
inline_size:            0
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x1659 PSN 0xd4858a OUT 0x10 RKey 0x203e00 VAddr 0x007f38d0d07000
 GID: 254:128:00:00:00:00:00:00:176:252:78:255:254:179:17:18
 remote address: LID 0000 QPN 0x1c86 PSN 0xc2e51a OUT 0x10 RKey 0x013f00 VAddr 0x007f123fc62000
 GID: 254:128:00:00:00:00:00:00:100:155:154:255:254:172:09:41
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MiB/sec]    BW average[MiB/sec]   MsgRate[Mpps]
 65536      1000             10829.53            10829.17       0.173267
---------------------------------------------------------------------------------------

Testing with the example code: when M1 uses GID 0 and M2 uses GID 0 or GID 1, the bandwidth is about 0.0124 GB/s; when both M1 and M2 use GID 1, it is about 6 GB/s. I would like to know what optimizations the perftest code makes, or what flaws in the example code above cause such a large gap in the measured bandwidth.

networking rdma infiniband mellanox
1 Answer

The reason is that the example code does not exercise the hardware's full capability: the messages are too small! For small transfers the per-message setup time is not negligible, whereas large transfers are big enough to sustain full-speed transmission.

Try running the `perftest` program with the `--all` flag and compare the speeds for message sizes between 2 and 2^23 bytes.

© www.soinside.com 2019 - 2024. All rights reserved.