2018-08-14 22:35:24

by Howell, Seth

[permalink] [raw]
Subject: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

By making this enum accessible to the userspace API, we can avoid
conflicts between the kernel-space ib_wr_opcode and the user-space
ibv_wr_opcode enums.

When using the rxe software driver interface between kernel and user
space, the enum value IBV_WR_SEND_WITH_INV was being improperly
translated to IB_WR_READ_WITH_INV causing the userspace application to
fail.

Signed-off-by: Seth Howell <[email protected]>
---
include/rdma/ib_verbs.h | 31 -------------------------------
include/uapi/rdma/ib_user_verbs.h | 31 +++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6c003995347a..02fe3960378d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1295,37 +1295,6 @@ struct ib_qp_attr {
u32 rate_limit;
};

-enum ib_wr_opcode {
- IB_WR_RDMA_WRITE,
- IB_WR_RDMA_WRITE_WITH_IMM,
- IB_WR_SEND,
- IB_WR_SEND_WITH_IMM,
- IB_WR_RDMA_READ,
- IB_WR_ATOMIC_CMP_AND_SWP,
- IB_WR_ATOMIC_FETCH_AND_ADD,
- IB_WR_LSO,
- IB_WR_SEND_WITH_INV,
- IB_WR_RDMA_READ_WITH_INV,
- IB_WR_LOCAL_INV,
- IB_WR_REG_MR,
- IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
- IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
- IB_WR_REG_SIG_MR,
- /* reserve values for low level drivers' internal use.
- * These values will not be used at all in the ib core layer.
- */
- IB_WR_RESERVED1 = 0xf0,
- IB_WR_RESERVED2,
- IB_WR_RESERVED3,
- IB_WR_RESERVED4,
- IB_WR_RESERVED5,
- IB_WR_RESERVED6,
- IB_WR_RESERVED7,
- IB_WR_RESERVED8,
- IB_WR_RESERVED9,
- IB_WR_RESERVED10,
-};
-
enum ib_send_flags {
IB_SEND_FENCE = 1,
IB_SEND_SIGNALED = (1<<1),
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 4f9991de8e3a..7c1132f953d8 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -46,6 +46,37 @@
#define IB_USER_VERBS_ABI_VERSION 6
#define IB_USER_VERBS_CMD_THRESHOLD 50

+enum ib_wr_opcode {
+ IB_WR_RDMA_WRITE,
+ IB_WR_RDMA_WRITE_WITH_IMM,
+ IB_WR_SEND,
+ IB_WR_SEND_WITH_IMM,
+ IB_WR_RDMA_READ,
+ IB_WR_ATOMIC_CMP_AND_SWP,
+ IB_WR_ATOMIC_FETCH_AND_ADD,
+ IB_WR_LSO,
+ IB_WR_SEND_WITH_INV,
+ IB_WR_RDMA_READ_WITH_INV,
+ IB_WR_LOCAL_INV,
+ IB_WR_REG_MR,
+ IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
+ IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+ IB_WR_REG_SIG_MR,
+ /* reserve values for low level drivers' internal use.
+ * These values will not be used at all in the ib core layer.
+ */
+ IB_WR_RESERVED1 = 0xf0,
+ IB_WR_RESERVED2,
+ IB_WR_RESERVED3,
+ IB_WR_RESERVED4,
+ IB_WR_RESERVED5,
+ IB_WR_RESERVED6,
+ IB_WR_RESERVED7,
+ IB_WR_RESERVED8,
+ IB_WR_RESERVED9,
+ IB_WR_RESERVED10,
+};
+
enum {
IB_USER_VERBS_CMD_GET_CONTEXT,
IB_USER_VERBS_CMD_QUERY_DEVICE,
--
2.14.4



2018-08-15 23:35:26

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Tue, Aug 14, 2018 at 03:33:02PM -0700, Seth Howell wrote:
> By making this enum accessible to the userspace API, we can avoid
> conflicts between the kernel-space ib_wr_opcode and the user-space
> ibv_wr_opcode enums.
>
> When using the rxe software driver interface between kernel and user
> space, the enum value IBV_WR_SEND_WITH_INV was being improperly
> translated to IB_WR_READ_WITH_INV causing the userspace application to
> fail.
>
> Signed-off-by: Seth Howell <[email protected]>
> include/rdma/ib_verbs.h | 31 -------------------------------
> include/uapi/rdma/ib_user_verbs.h | 31 +++++++++++++++++++++++++++++++
> 2 files changed, 31 insertions(+), 31 deletions(-)

I've looked at this for a bit now, and I think we should just do the
below and not change userspace:

From e3c1b1b81373601fa6cbad2ba5fadd2c8cfdfaed Mon Sep 17 00:00:00 2001
From: Seth Howell <[email protected]>
Date: Tue, 14 Aug 2018 15:33:02 -0700
Subject: [PATCH] IB/rxe: Revise the ib_wr_opcode enum

This enum has become part of the uABI, as both RXE and the
ib_uverbs_post_send() command expect userspace to supply values from this
enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of
libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
turns out) into libiberbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC
values over the years, but has remained stable on IB_WR_LOCAL_INV and
below.

Based on this we can conclude that there is no real user space user of the
values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
rdma-core. This is confirmed by inspection, only rxe uses the kernel enum
and implements the latter operations. rxe has clearly never worked with
these attributes from userspace. Other drivers that support these opcodes
implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
choose to renumber the IB_WR enum in the kernel to match the uABI that
userspace has bee using since before Soft RoCE was merged. This is an
overall simpler configuration for the whole software stack, and obviously
can't break anything existing.

Reported-by: Seth Howell <[email protected]>
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Cc: <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
---
include/rdma/ib_verbs.h | 34 ++++++++++++++++++-------------
include/uapi/rdma/ib_user_verbs.h | 20 +++++++++++++++++-
2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5d404c20b49f27..5e51ea9832a0c8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1278,21 +1278,27 @@ struct ib_qp_attr {
};

enum ib_wr_opcode {
- IB_WR_RDMA_WRITE,
- IB_WR_RDMA_WRITE_WITH_IMM,
- IB_WR_SEND,
- IB_WR_SEND_WITH_IMM,
- IB_WR_RDMA_READ,
- IB_WR_ATOMIC_CMP_AND_SWP,
- IB_WR_ATOMIC_FETCH_AND_ADD,
- IB_WR_LSO,
- IB_WR_SEND_WITH_INV,
- IB_WR_RDMA_READ_WITH_INV,
- IB_WR_LOCAL_INV,
- IB_WR_REG_MR,
- IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
- IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+ /* These are shared with userspace */
+ IB_WR_RDMA_WRITE = IB_UVERBS_WR_RDMA_WRITE,
+ IB_WR_RDMA_WRITE_WITH_IMM = IB_UVERBS_WR_RDMA_WRITE_WITH_IMM,
+ IB_WR_SEND = IB_UVERBS_WR_SEND,
+ IB_WR_SEND_WITH_IMM = IB_UVERBS_WR_SEND_WITH_IMM,
+ IB_WR_RDMA_READ = IB_UVERBS_WR_RDMA_READ,
+ IB_WR_ATOMIC_CMP_AND_SWP = IB_UVERBS_WR_ATOMIC_CMP_AND_SWP,
+ IB_WR_ATOMIC_FETCH_AND_ADD = IB_UVERBS_WR_ATOMIC_FETCH_AND_ADD,
+ IB_WR_LSO = IB_UVERBS_WR_TSO,
+ IB_WR_SEND_WITH_INV = IB_UVERBS_WR_SEND_WITH_INV,
+ IB_WR_RDMA_READ_WITH_INV = IB_UVERBS_WR_RDMA_READ_WITH_INV,
+ IB_WR_LOCAL_INV = IB_UVERBS_WR_LOCAL_INV,
+ IB_WR_MASKED_ATOMIC_CMP_AND_SWP =
+ IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP,
+ IB_WR_MASKED_ATOMIC_FETCH_AND_ADD =
+ IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+
+ /* These are kernel only and can not be issued by userspace */
+ IB_WR_REG_MR = 0x20,
IB_WR_REG_SIG_MR,
+
/* reserve values for low level drivers' internal use.
* These values will not be used at all in the ib core layer.
*/
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25a16760de2ad1..1254b51a551a1c 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -763,10 +763,28 @@ struct ib_uverbs_sge {
__u32 lkey;
};

+enum ib_uverbs_wr_opcode {
+ IB_UVERBS_WR_RDMA_WRITE = 0,
+ IB_UVERBS_WR_RDMA_WRITE_WITH_IMM = 1,
+ IB_UVERBS_WR_SEND = 2,
+ IB_UVERBS_WR_SEND_WITH_IMM = 3,
+ IB_UVERBS_WR_RDMA_READ = 4,
+ IB_UVERBS_WR_ATOMIC_CMP_AND_SWP = 5,
+ IB_UVERBS_WR_ATOMIC_FETCH_AND_ADD = 6,
+ IB_UVERBS_WR_LOCAL_INV = 7,
+ IB_UVERBS_WR_BIND_MW = 8,
+ IB_UVERBS_WR_SEND_WITH_INV = 9,
+ IB_UVERBS_WR_TSO = 10,
+ IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
+ IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
+ IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+ /* Review enum ib_wr_opcode before modifying this */
+};
+
struct ib_uverbs_send_wr {
__aligned_u64 wr_id;
__u32 num_sge;
- __u32 opcode;
+ __u32 opcode; /* see enum ib_uverbs_wr_opcode */
__u32 send_flags;
union {
__be32 imm_data;
--
2.18.0


2018-08-16 00:10:50

by Howell, Seth

[permalink] [raw]
Subject: RE: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

Thank you for taking the time to review this issue. The patch you submitted will definitely fix the problem I was facing.

-----Original Message-----
From: Jason Gunthorpe [mailto:[email protected]]
Sent: Wednesday, August 15, 2018 3:50 PM
To: Howell, Seth <[email protected]>
Cc: Walker, Benjamin <[email protected]>; Doug Ledford <[email protected]>; [email protected]; [email protected]; Moni Shoua <[email protected]>
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Tue, Aug 14, 2018 at 03:33:02PM -0700, Seth Howell wrote:
> By making this enum accessible to the userspace API, we can avoid
> conflicts between the kernel-space ib_wr_opcode and the user-space
> ibv_wr_opcode enums.
>
> When using the rxe software driver interface between kernel and user
> space, the enum value IBV_WR_SEND_WITH_INV was being improperly
> translated to IB_WR_READ_WITH_INV causing the userspace application to
> fail.
>
> Signed-off-by: Seth Howell <[email protected]>
> include/rdma/ib_verbs.h | 31 -------------------------------
> include/uapi/rdma/ib_user_verbs.h | 31
> +++++++++++++++++++++++++++++++
> 2 files changed, 31 insertions(+), 31 deletions(-)

I've looked at this for a bit now, and I think we should just do the below and not change userspace:

From e3c1b1b81373601fa6cbad2ba5fadd2c8cfdfaed Mon Sep 17 00:00:00 2001
From: Seth Howell <[email protected]>
Date: Tue, 14 Aug 2018 15:33:02 -0700
Subject: [PATCH] IB/rxe: Revise the ib_wr_opcode enum

This enum has become part of the uABI, as both RXE and the
ib_uverbs_post_send() command expect userspace to supply values from this enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV, IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it turns out) into libiberbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC values over the years, but has remained stable on IB_WR_LOCAL_INV and below.

Based on this we can conclude that there is no real user space user of the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via rdma-core. This is confirmed by inspection, only rxe uses the kernel enum and implements the latter operations. rxe has clearly never worked with these attributes from userspace. Other drivers that support these opcodes implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we choose to renumber the IB_WR enum in the kernel to match the uABI that userspace has bee using since before Soft RoCE was merged. This is an overall simpler configuration for the whole software stack, and obviously can't break anything existing.

Reported-by: Seth Howell <[email protected]>
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Cc: <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
---
include/rdma/ib_verbs.h | 34 ++++++++++++++++++-------------
include/uapi/rdma/ib_user_verbs.h | 20 +++++++++++++++++-
2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5d404c20b49f27..5e51ea9832a0c8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1278,21 +1278,27 @@ struct ib_qp_attr { };

enum ib_wr_opcode {
- IB_WR_RDMA_WRITE,
- IB_WR_RDMA_WRITE_WITH_IMM,
- IB_WR_SEND,
- IB_WR_SEND_WITH_IMM,
- IB_WR_RDMA_READ,
- IB_WR_ATOMIC_CMP_AND_SWP,
- IB_WR_ATOMIC_FETCH_AND_ADD,
- IB_WR_LSO,
- IB_WR_SEND_WITH_INV,
- IB_WR_RDMA_READ_WITH_INV,
- IB_WR_LOCAL_INV,
- IB_WR_REG_MR,
- IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
- IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+ /* These are shared with userspace */
+ IB_WR_RDMA_WRITE = IB_UVERBS_WR_RDMA_WRITE,
+ IB_WR_RDMA_WRITE_WITH_IMM = IB_UVERBS_WR_RDMA_WRITE_WITH_IMM,
+ IB_WR_SEND = IB_UVERBS_WR_SEND,
+ IB_WR_SEND_WITH_IMM = IB_UVERBS_WR_SEND_WITH_IMM,
+ IB_WR_RDMA_READ = IB_UVERBS_WR_RDMA_READ,
+ IB_WR_ATOMIC_CMP_AND_SWP = IB_UVERBS_WR_ATOMIC_CMP_AND_SWP,
+ IB_WR_ATOMIC_FETCH_AND_ADD = IB_UVERBS_WR_ATOMIC_FETCH_AND_ADD,
+ IB_WR_LSO = IB_UVERBS_WR_TSO,
+ IB_WR_SEND_WITH_INV = IB_UVERBS_WR_SEND_WITH_INV,
+ IB_WR_RDMA_READ_WITH_INV = IB_UVERBS_WR_RDMA_READ_WITH_INV,
+ IB_WR_LOCAL_INV = IB_UVERBS_WR_LOCAL_INV,
+ IB_WR_MASKED_ATOMIC_CMP_AND_SWP =
+ IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP,
+ IB_WR_MASKED_ATOMIC_FETCH_AND_ADD =
+ IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD,
+
+ /* These are kernel only and can not be issued by userspace */
+ IB_WR_REG_MR = 0x20,
IB_WR_REG_SIG_MR,
+
/* reserve values for low level drivers' internal use.
* These values will not be used at all in the ib core layer.
*/
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25a16760de2ad1..1254b51a551a1c 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -763,10 +763,28 @@ struct ib_uverbs_sge {
__u32 lkey;
};

+enum ib_uverbs_wr_opcode {
+ IB_UVERBS_WR_RDMA_WRITE = 0,
+ IB_UVERBS_WR_RDMA_WRITE_WITH_IMM = 1,
+ IB_UVERBS_WR_SEND = 2,
+ IB_UVERBS_WR_SEND_WITH_IMM = 3,
+ IB_UVERBS_WR_RDMA_READ = 4,
+ IB_UVERBS_WR_ATOMIC_CMP_AND_SWP = 5,
+ IB_UVERBS_WR_ATOMIC_FETCH_AND_ADD = 6,
+ IB_UVERBS_WR_LOCAL_INV = 7,
+ IB_UVERBS_WR_BIND_MW = 8,
+ IB_UVERBS_WR_SEND_WITH_INV = 9,
+ IB_UVERBS_WR_TSO = 10,
+ IB_UVERBS_WR_RDMA_READ_WITH_INV = 11,
+ IB_UVERBS_WR_MASKED_ATOMIC_CMP_AND_SWP = 12,
+ IB_UVERBS_WR_MASKED_ATOMIC_FETCH_AND_ADD = 13,
+ /* Review enum ib_wr_opcode before modifying this */ };
+
struct ib_uverbs_send_wr {
__aligned_u64 wr_id;
__u32 num_sge;
- __u32 opcode;
+ __u32 opcode; /* see enum ib_uverbs_wr_opcode */
__u32 send_flags;
union {
__be32 imm_data;
--
2.18.0


2018-08-16 18:49:21

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Wed, Aug 15, 2018 at 11:27:36PM +0000, Howell, Seth wrote:

> Thank you for taking the time to review this issue. The patch you
> submitted will definitely fix the problem I was facing.

Can you test it to be certain?

Thanks,
Jason

2018-08-20 21:34:51

by Howell, Seth

[permalink] [raw]
Subject: RE: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

Hi Jason,

I apologize for the few days of radio silence on this one. I was able to apply your patch on my local configuration and can confirm that it fixes the issue of send with invalidate being improperly mapped between a user-space process and the kernel rxe driver.

Thanks again for your time and help.

Seth Howell

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jason Gunthorpe
Sent: Thursday, August 16, 2018 11:01 AM
To: Howell, Seth <[email protected]>
Cc: Walker, Benjamin <[email protected]>; Doug Ledford <[email protected]>; [email protected]; [email protected]; Moni Shoua <[email protected]>
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Wed, Aug 15, 2018 at 11:27:36PM +0000, Howell, Seth wrote:

> Thank you for taking the time to review this issue. The patch you
> submitted will definitely fix the problem I was facing.

Can you test it to be certain?

Thanks,
Jason

2018-09-17 20:39:42

by Ben Walker

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Mon, 2018-08-20 at 14:32 -0700, Howell, Seth wrote:
> Hi Jason,
>
> I apologize for the few days of radio silence on this one. I was able to apply
> your patch on my local configuration and can confirm that it fixes the issue
> of send with invalidate being improperly mapped between a user-space process
> and the kernel rxe driver.
>
> Thanks again for your time and help.
>
> Seth Howell
>
> -----Original Message-----
> From: [email protected] [mailto:
> [email protected]] On Behalf Of Jason Gunthorpe
> Sent: Thursday, August 16, 2018 11:01 AM
> To: Howell, Seth <[email protected]>
> Cc: Walker, Benjamin <[email protected]>; Doug Ledford <
> [email protected]>; [email protected]; [email protected]
> ; Moni Shoua <[email protected]>
> Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi
>
> On Wed, Aug 15, 2018 at 11:27:36PM +0000, Howell, Seth wrote:
>
> > Thank you for taking the time to review this issue. The patch you
> > submitted will definitely fix the problem I was facing.
>
> Can you test it to be certain?
>
> Thanks,
> Jason

We've recently run into this same issue on i40iw, which appears to make the same
mistake of using the kernel version of the enum instead of the userspace
version. This proposed patch will address it there too. What's the current
status here? Can it be merged? I just checked and do not see it merged to Linux
master.

Running a user-space NVMe-oF target with RDMA and a recent Linux kernel
initiator is not currently possible on rxe or i40iw because it requires send
with invalidate support.

Thanks,
Ben

2018-09-17 21:08:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Mon, Sep 17, 2018 at 08:38:16PM +0000, Walker, Benjamin wrote:
> On Mon, 2018-08-20 at 14:32 -0700, Howell, Seth wrote:
> > Hi Jason,
> >
> > I apologize for the few days of radio silence on this one. I was able to apply
> > your patch on my local configuration and can confirm that it fixes the issue
> > of send with invalidate being improperly mapped between a user-space process
> > and the kernel rxe driver.
> >
> > Thanks again for your time and help.
> >
> > Seth Howell
> >
> > From: [email protected] [mailto:
> > [email protected]] On Behalf Of Jason Gunthorpe
> > Sent: Thursday, August 16, 2018 11:01 AM
> > To: Howell, Seth <[email protected]>
> > Cc: Walker, Benjamin <[email protected]>; Doug Ledford <
> > [email protected]>; [email protected]; [email protected]
> > ; Moni Shoua <[email protected]>
> > Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi
> >
> > On Wed, Aug 15, 2018 at 11:27:36PM +0000, Howell, Seth wrote:
> >
> > > Thank you for taking the time to review this issue. The patch you
> > > submitted will definitely fix the problem I was facing.
> >
> > Can you test it to be certain?
> >
> > Thanks,
> > Jason
>
> We've recently run into this same issue on i40iw, which appears to make the same
> mistake of using the kernel version of the enum instead of the userspace
> version.

Confused by this?? i40iw_upost_send does not handle the kernel numbers
at all, as far as I can see? How does it develop a kernel dependency??

> What's the current status here? Can it be merged? I just checked and
> do not see it merged to Linux master.

Oh! This apparently got lost, thanks for bringing it up again.

> Running a user-space NVMe-oF target with RDMA and a recent Linux kernel
> initiator is not currently possible on rxe or i40iw because it requires send
> with invalidate support.

Okay, but i40iw doesn't seem to support send with invalidate at all in
userspace?

i40iw_upost_send() swithces on opcode, doesn't handle SEND_INV and
then blows up in the default clause - how does this patch make any
difference???

Jason

2018-09-17 22:30:28

by Ben Walker

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Mon, 2018-09-17 at 15:08 -0600, Jason Gunthorpe wrote:
> On Mon, Sep 17, 2018 at 08:38:16PM +0000, Walker, Benjamin wrote:
> > We've recently run into this same issue on i40iw, which appears to make the
> > same
> > mistake of using the kernel version of the enum instead of the userspace
> > version.
>
> Confused by this?? i40iw_upost_send does not handle the kernel numbers
> at all, as far as I can see? How does it develop a kernel dependency??
>
> > What's the current status here? Can it be merged? I just checked and
> > do not see it merged to Linux master.
>
> Oh! This apparently got lost, thanks for bringing it up again.
>
> > Running a user-space NVMe-oF target with RDMA and a recent Linux kernel
> > initiator is not currently possible on rxe or i40iw because it requires send
> > with invalidate support.
>
> Okay, but i40iw doesn't seem to support send with invalidate at all in
> userspace?
>
> i40iw_upost_send() swithces on opcode, doesn't handle SEND_INV and
> then blows up in the default clause - how does this patch make any
> difference???

It appears I've read the error message incorrectly and was looking at the kernel
version (i40iw_post_send) as opposed to the user version (i40iw_upost_send).
Indeed, the NIC does not support SEND_WTIH_INVAL at all in that function. The
NIC does support SEND_WITH_INVAL in the kernel i40iw_post_send.

What is the correct way for a user space application to check whether a NIC
supports SEND_WITH_INVAL? We are currently examining the device_cap_flags in the
structure returned by ibv_query_device. Specifically, we're looking at
IBV_DEVICE_MEM_MGT_EXTENSIONS. However, for i40iw, that flag is set. I'm
concerned that the feature support flags are common between user space and the
kernel, but the actual support differs in this case.

Thanks,
Ben

>
> Jason

2018-09-17 22:49:24

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

On Mon, Sep 17, 2018 at 10:29:18PM +0000, Walker, Benjamin wrote:
> On Mon, 2018-09-17 at 15:08 -0600, Jason Gunthorpe wrote:
> > On Mon, Sep 17, 2018 at 08:38:16PM +0000, Walker, Benjamin wrote:
> > > We've recently run into this same issue on i40iw, which appears to make the
> > > same
> > > mistake of using the kernel version of the enum instead of the userspace
> > > version.
> >
> > Confused by this?? i40iw_upost_send does not handle the kernel numbers
> > at all, as far as I can see? How does it develop a kernel dependency??
> >
> > > What's the current status here? Can it be merged? I just checked and
> > > do not see it merged to Linux master.
> >
> > Oh! This apparently got lost, thanks for bringing it up again.
> >
> > > Running a user-space NVMe-oF target with RDMA and a recent Linux kernel
> > > initiator is not currently possible on rxe or i40iw because it requires send
> > > with invalidate support.
> >
> > Okay, but i40iw doesn't seem to support send with invalidate at all in
> > userspace?
> >
> > i40iw_upost_send() swithces on opcode, doesn't handle SEND_INV and
> > then blows up in the default clause - how does this patch make any
> > difference???
>
> It appears I've read the error message incorrectly and was looking at the kernel
> version (i40iw_post_send) as opposed to the user version (i40iw_upost_send).
> Indeed, the NIC does not support SEND_WTIH_INVAL at all in that function. The
> NIC does support SEND_WITH_INVAL in the kernel i40iw_post_send.

OK

> What is the correct way for a user space application to check whether a NIC
> supports SEND_WITH_INVAL? We are currently examining the device_cap_flags in the
> structure returned by ibv_query_device. Specifically, we're looking at
> IBV_DEVICE_MEM_MGT_EXTENSIONS. However, for i40iw, that flag is set. I'm
> concerned that the feature support flags are common between user space and the
> kernel, but the actual support differs in this case.

That is the correct thing to do. Sadly the i40iw driver is broken, it
should be fixed to either mask those flags or implement the
functionality.

Jason

2018-09-17 23:09:55

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

> >From e3c1b1b81373601fa6cbad2ba5fadd2c8cfdfaed Mon Sep 17 00:00:00 2001
> From: Seth Howell <[email protected]>
> Date: Tue, 14 Aug 2018 15:33:02 -0700
> Subject: [PATCH] IB/rxe: Revise the ib_wr_opcode enum
>
> This enum has become part of the uABI, as both RXE and the
> ib_uverbs_post_send() command expect userspace to supply values from
> this enum. So it should be properly placed in include/uapi/rdma.
>
> In userspace this enum is called 'enum ibv_wr_opcode' as part of
> libibverbs.h. That enum defines different values for
> IB_WR_LOCAL_INV, IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were
> introduced (incorrectly, it turns out) into libiberbs in 2015.
>
> The kernel has changed its mind on the numbering for several of the
> IB_WC values over the years, but has remained stable on
> IB_WR_LOCAL_INV and below.
>
> Based on this we can conclude that there is no real user space user
> of the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never
> worked via rdma-core. This is confirmed by inspection, only rxe uses
> the kernel enum and implements the latter operations. rxe has
> clearly never worked with these attributes from userspace. Other
> drivers that support these opcodes implement the functionality
> without calling out to the kernel.
>
> To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
> choose to renumber the IB_WR enum in the kernel to match the uABI
> that userspace has bee using since before Soft RoCE was merged. This
> is an overall simpler configuration for the whole software stack,
> and obviously can't break anything existing.
>
> Reported-by: Seth Howell <[email protected]>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Cc: <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> include/rdma/ib_verbs.h | 34 ++++++++++++++++++-------------
> include/uapi/rdma/ib_user_verbs.h | 20 +++++++++++++++++-
> 2 files changed, 39 insertions(+), 15 deletions(-)

Applied to for-next

Thanks,
Jason

2018-09-17 23:16:05

by Howell, Seth

[permalink] [raw]
Subject: RE: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

Thank you Jason!

-----Original Message-----
From: Jason Gunthorpe [mailto:[email protected]]
Sent: Monday, September 17, 2018 4:08 PM
To: Howell, Seth <[email protected]>
Cc: Walker, Benjamin <[email protected]>; Doug Ledford <[email protected]>; [email protected]; [email protected]; Moni Shoua <[email protected]>
Subject: Re: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

> >From e3c1b1b81373601fa6cbad2ba5fadd2c8cfdfaed Mon Sep 17 00:00:00
> >2001
> From: Seth Howell <[email protected]>
> Date: Tue, 14 Aug 2018 15:33:02 -0700
> Subject: [PATCH] IB/rxe: Revise the ib_wr_opcode enum
>
> This enum has become part of the uABI, as both RXE and the
> ib_uverbs_post_send() command expect userspace to supply values from
> this enum. So it should be properly placed in include/uapi/rdma.
>
> In userspace this enum is called 'enum ibv_wr_opcode' as part of
> libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
> IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced
> (incorrectly, it turns out) into libiberbs in 2015.
>
> The kernel has changed its mind on the numbering for several of the
> IB_WC values over the years, but has remained stable on
> IB_WR_LOCAL_INV and below.
>
> Based on this we can conclude that there is no real user space user of
> the values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never
> worked via rdma-core. This is confirmed by inspection, only rxe uses
> the kernel enum and implements the latter operations. rxe has clearly
> never worked with these attributes from userspace. Other drivers that
> support these opcodes implement the functionality without calling out
> to the kernel.
>
> To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
> choose to renumber the IB_WR enum in the kernel to match the uABI that
> userspace has bee using since before Soft RoCE was merged. This is an
> overall simpler configuration for the whole software stack, and
> obviously can't break anything existing.
>
> Reported-by: Seth Howell <[email protected]>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Cc: <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
> include/rdma/ib_verbs.h | 34 ++++++++++++++++++-------------
> include/uapi/rdma/ib_user_verbs.h | 20 +++++++++++++++++-
> 2 files changed, 39 insertions(+), 15 deletions(-)

Applied to for-next

Thanks,
Jason

2018-09-19 23:04:43

by Nikolova, Tatyana E

[permalink] [raw]
Subject: RE: [PATCH] rdma: move the ib_wr_opcode enum to include/uapi

Hi Jason,

> > What is the correct way for a user space application to check whether
> > a NIC supports SEND_WITH_INVAL? We are currently examining the
> > device_cap_flags in the structure returned by ibv_query_device.
> > Specifically, we're looking at IBV_DEVICE_MEM_MGT_EXTENSIONS.
> However,
> > for i40iw, that flag is set. I'm concerned that the feature support
> > flags are common between user space and the kernel, but the actual
> support differs in this case.
>
> That is the correct thing to do. Sadly the i40iw driver is broken, it should be
> fixed to either mask those flags or implement the functionality.
>

A fix to add SEND_WITH_INVAL support to the user space i40iw is coming.
It is currently in regression testing.

Thank you,
Tatyana

> Jason