RDMA access flag validation

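If we receive a Send or Send with Immediate packet on an RC, UC, or UD QP, but the receive buffer has no local write access, what behavior should we expect?

I think the target side gets a local protection error and the initiator sees a failure.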

rdma infiniband
1 Answer

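First, we need to distinguish between QP types and memory access flags.

RC, UC, and UD are QP types. The type is a field of `ibv_qp_init_attr` and is specified when the QP is created. Here is the definition of `ibv_qp_init_attr`: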

struct ibv_qp_init_attr {
    void                   *qp_context;     /* Associated context of the QP */
    struct ibv_cq          *send_cq;        /* CQ to be associated with the Send Queue (SQ) */
    struct ibv_cq          *recv_cq;        /* CQ to be associated with the Receive Queue (RQ) */
    struct ibv_srq         *srq;            /* SRQ handle if QP is to be associated with an SRQ, otherwise NULL */
    struct ibv_qp_cap       cap;            /* QP capabilities */
    enum ibv_qp_type        qp_type;        /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, IBV_QPT_UD, IBV_QPT_RAW_PACKET or IBV_QPT_DRIVER */
    int                     sq_sig_all;     /* If set, each Work Request (WR) submitted to the SQ generates a completion entry */
};

struct ibv_qp_cap {
    uint32_t                max_send_wr;    /* Requested max number of outstanding WRs in the SQ */
    uint32_t                max_recv_wr;    /* Requested max number of outstanding WRs in the RQ */
    uint32_t                max_send_sge;   /* Requested max number of scatter/gather (s/g) elements in a WR in the SQ */
    uint32_t                max_recv_sge;   /* Requested max number of s/g elements in a WR in the RQ */
    uint32_t                max_inline_data;/* Requested max number of data (bytes) that can be posted inline to the SQ, otherwise 0 */
};

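Memory access flags, on the other hand, are the permissions we grant when registering a block of memory with the RDMA adapter. They are specified through the `access` parameter of `ibv_reg_mr`: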

struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access);

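The `access` argument describes the desired memory protection attributes; it is either 0 or the bitwise OR of one or more of the following flags: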

   IBV_ACCESS_LOCAL_WRITE  Enable Local Write Access

   IBV_ACCESS_REMOTE_WRITE  Enable Remote Write Access

   IBV_ACCESS_REMOTE_READ Enable Remote Read Access

   IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access (if supported)

   IBV_ACCESS_MW_BIND Enable Memory Window Binding

   IBV_ACCESS_ZERO_BASED Use byte offset from beginning of MR to access this MR, instead of a pointer address

   IBV_ACCESS_ON_DEMAND Create an on-demand paging MR

   IBV_ACCESS_HUGETLB Huge pages are guaranteed to be used for this MR, applicable with IBV_ACCESS_ON_DEMAND in explicit mode only

   IBV_ACCESS_RELAXED_ORDERING Allow system to reorder accesses to the MR to improve performance

   If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, then IBV_ACCESS_LOCAL_WRITE must be set too.

   Local read access is always enabled for the MR.

   ... excerpt from `man ibv_reg_mr`
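
The constraint quoted at the end of the excerpt can be checked before calling `ibv_reg_mr`. The following is only a minimal sketch: the `ACC_*` constants and both helper functions are hypothetical stand-ins defined here so the example compiles without libibverbs, and their numeric values are an assumption. Real code should include `<infiniband/verbs.h>` and use the `IBV_ACCESS_*` constants directly.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for four IBV_ACCESS_* bits. The numeric values
 * are an assumption made for illustration only. */
enum {
    ACC_LOCAL_WRITE   = 1 << 0,
    ACC_REMOTE_WRITE  = 1 << 1,
    ACC_REMOTE_READ   = 1 << 2,
    ACC_REMOTE_ATOMIC = 1 << 3
};

/* Rule from the man page excerpt: remote write or remote atomic access
 * requires local write access to be set as well. */
static bool access_flags_consistent(int access)
{
    if (access & (ACC_REMOTE_WRITE | ACC_REMOTE_ATOMIC))
        return (access & ACC_LOCAL_WRITE) != 0;
    return true;
}

/* A buffer the adapter has to write into, such as one posted to the
 * receive queue, needs local write access. With access == 0 the region
 * is readable only. */
static bool usable_as_receive_buffer(int access)
{
    return (access & ACC_LOCAL_WRITE) != 0;
}
```

With these helpers, `access_flags_consistent(0)` holds because 0 is a legal value, while `usable_as_receive_buffer(0)` does not, which is the case the question asks about.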

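So "no local write access" means that we register a block of memory with `access` equal to 0. We can then inspect the work completion, `struct ibv_wc`, to see the completion status and the vendor error: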

struct ibv_wc {
    uint64_t                wr_id;          /* ID of the completed Work Request (WR) */
    enum ibv_wc_status      status;         /* Status of the operation */
    enum ibv_wc_opcode      opcode;         /* Operation type specified in the completed WR */
    uint32_t                vendor_err;     /* Vendor error syndrome */
    uint32_t                byte_len;       /* Number of bytes transferred */
    union {
        __be32              imm_data;         /* Immediate data (in network byte order) */
        uint32_t            invalidated_rkey; /* Local RKey that was invalidated */
    };
    uint32_t                qp_num;         /* Local QP number of completed WR */
    uint32_t                src_qp;         /* Source QP number (remote QP number) of completed WR (valid only for UD QPs) */
    unsigned int            wc_flags;       /* Flags of the completed WR */
    uint16_t                pkey_index;     /* P_Key index (valid only for GSI QPs) */
    uint16_t                slid;           /* Source LID */
    uint8_t                 sl;             /* Service Level */
    uint8_t                 dlid_path_bits; /* DLID path bits (not applicable for multicast messages) */
};

Printing `ibv_wc.status` and `ibv_wc.vendor_err` shows what happened. See this article (https://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) for more information.
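
As a rough illustration of printing those two fields, here is a small self-contained sketch. It does not link against libibverbs: `wc_status_name` and `format_wc` are hypothetical helpers, and the table lists only the status values that appear in this answer plus success. In real code the library's own `ibv_wc_status_str()` can be applied to the `status` field of a completion returned by `ibv_poll_cq()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Names indexed by numeric status. Only the values discussed in this
 * answer are filled in; every other slot is NULL. */
static const char *const wc_status_names[] = {
    [0]  = "IBV_WC_SUCCESS",
    [4]  = "IBV_WC_LOC_PROT_ERR",
    [11] = "IBV_WC_REM_OP_ERR"
};

/* Hypothetical helper: map a numeric status to its name, or "unlisted"
 * when this sketch has no entry for it. */
static const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);
    if (status < 0 || (size_t)status >= n || wc_status_names[status] == NULL)
        return "unlisted";
    return wc_status_names[status];
}

/* Hypothetical helper: render status and vendor_err the way one might
 * log them after polling a completion. Returns snprintf's result. */
static int format_wc(char *buf, size_t len, int status, unsigned vendor_err)
{
    return snprintf(buf, len, "status=%d (%s), vendor_err=%u",
                    status, wc_status_name(status), vendor_err);
}
```

For example, `format_wc(buf, sizeof buf, 4, 51)` fills `buf` with `status=4 (IBV_WC_LOC_PROT_ERR), vendor_err=51`. The vendor error value is adapter specific, so the sketch prints it as a plain number.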

A simple test (QP type = RC):

  • Server: normal

  • Client: memory registered with `access` equal to 0

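The results:

Server: vendor_err = 137, status = 11, which means `IBV_WC_REM_OP_ERR` (Remote Operation Error): the responder could not complete the operation successfully. Possible causes include a responder QP related error that prevented the responder from completing the request, or a malformed WQE on the Receive Queue. Relevant for RC QPs.

Client: vendor_err = 51, status = 4, which means `IBV_WC_LOC_PROT_ERR` (Local Protection Error): the buffers of the locally posted Work Request, as listed in its scatter/gather list, do not reference a Memory Region that is valid for the requested operation.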
