If we receive a Send or Send with Immediate packet but do not have local write access, what behavior should we expect in RC, UC, or UD?
I think the target gets a local protection error and the initiator sees a failure.
First, we need to distinguish between the QP type and the memory access flags.
RC, UC, and UD are QP types. They are set in ibv_qp_init_attr, which is filled in before the QP is created. Here is the structure of ibv_qp_init_attr:
struct ibv_qp_init_attr {
void *qp_context; /* Associated context of the QP */
struct ibv_cq *send_cq; /* CQ to be associated with the Send Queue (SQ) */
struct ibv_cq *recv_cq; /* CQ to be associated with the Receive Queue (RQ) */
struct ibv_srq *srq; /* SRQ handle if QP is to be associated with an SRQ, otherwise NULL */
struct ibv_qp_cap cap; /* QP capabilities */
enum ibv_qp_type qp_type; /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, IBV_QPT_UD, IBV_QPT_RAW_PACKET or IBV_QPT_DRIVER */
int sq_sig_all; /* If set, each Work Request (WR) submitted to the SQ generates a completion entry */
};
struct ibv_qp_cap {
uint32_t max_send_wr; /* Requested max number of outstanding WRs in the SQ */
uint32_t max_recv_wr; /* Requested max number of outstanding WRs in the RQ */
uint32_t max_send_sge; /* Requested max number of scatter/gather (s/g) elements in a WR in the SQ */
uint32_t max_recv_sge; /* Requested max number of s/g elements in a WR in the RQ */
uint32_t max_inline_data;/* Requested max number of data (bytes) that can be posted inline to the SQ, otherwise 0 */
};
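Memory access flags, on the other hand, are the permissions we grant to a piece of memory when we register it with the RDMA card. They are specified through the access parameter of ibv_reg_mr: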
struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, int access);
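The access argument describes the desired memory protection attributes; it is either 0 or the bitwise OR of one or more of the following flags: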
IBV_ACCESS_LOCAL_WRITE Enable Local Write Access
IBV_ACCESS_REMOTE_WRITE Enable Remote Write Access
IBV_ACCESS_REMOTE_READ Enable Remote Read Access
IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access (if supported)
IBV_ACCESS_MW_BIND Enable Memory Window Binding
IBV_ACCESS_ZERO_BASED Use byte offset from beginning of MR to access this MR, instead of a pointer address
IBV_ACCESS_ON_DEMAND Create an on-demand paging MR
IBV_ACCESS_HUGETLB Huge pages are guaranteed to be used for this MR, applicable with IBV_ACCESS_ON_DEMAND in explicit mode only
IBV_ACCESS_RELAXED_ORDERING Allow system to reorder accesses to the MR to improve performance
If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, then IBV_ACCESS_LOCAL_WRITE must be set too.
Local read access is always enabled for the MR.
... excerpt from `man ibv_reg_mr`
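So "no local write access" means registering the memory without IBV_ACCESS_LOCAL_WRITE; the simplest case is registering it with access equal to
0
.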
We can then inspect the completion status and the vendor error by checking the work completion, ibv_wc:
struct ibv_wc {
uint64_t wr_id; /* ID of the completed Work Request (WR) */
enum ibv_wc_status status; /* Status of the operation */
enum ibv_wc_opcode opcode; /* Operation type specified in the completed WR */
uint32_t vendor_err; /* Vendor error syndrome */
uint32_t byte_len; /* Number of bytes transferred */
union {
__be32 imm_data; /* Immediate data (in network byte order) */
uint32_t invalidated_rkey; /* Local RKey that was invalidated */
};
uint32_t qp_num; /* Local QP number of completed WR */
uint32_t src_qp; /* Source QP number (remote QP number) of completed WR (valid only for UD QPs) */
unsigned int wc_flags; /* Flags of the completed WR */
uint16_t pkey_index; /* P_Key index (valid only for GSI QPs) */
uint16_t slid; /* Source LID */
uint8_t sl; /* Service Level */
uint8_t dlid_path_bits; /* DLID path bits (not applicable for multicast messages) */
};
We can see what happened by printing the status and vendor_err fields of ibv_wc. See this article (https://www.rdmamojo.com/2013/02/15/ibv_poll_cq/) for more information.
A simple test (QP type = RC):
Server: normal
Client: memory registered with access equal to 0
Result:
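Server: vendor_err = 137, status = 11, which is
IBV_WC_REM_OP_ERR
(Remote Operation Error): the operation could not be completed successfully by the responder. Possible causes include a responder QP related error that prevented the responder from completing the request, or a malformed WQE on the Receive Queue. Relevant for RC QPs.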
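Client: vendor_err = 51, status = 4, which is
IBV_WC_LOC_PROT_ERR
(Local Protection Error): the locally posted Work Request's buffers in the scatter/gather list do not reference a Memory Region that is valid for the requested operation.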