2019-07-08 04:49:27

by Stephen Rothwell

[permalink] [raw]
Subject: linux-next: build failure after merge of the rdma tree

Hi all,

After merging the rdma tree, today's linux-next build (x86_64
allmodconfig) failed like this:

In file included from <command-line>:32:
./usr/include/rdma/rvt-abi.h:13:10: fatal error: rdma/ib_verbs.h: No such file or directory
#include <rdma/ib_verbs.h>
^~~~~~~~~~~~~~~~~

Caused by commits

dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")

interacting with commit

0c422a3d4e1b ("kbuild: compile-test exported headers to ensure they are self-contained")

from the kbuild tree.

You can't reference the include/linux headers from uapi headers ...

I have used the rmda tree from 20190628 again today (given the previous
errors).
--
Cheers,
Stephen Rothwell


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2019-07-08 22:42:19

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rdma tree

On Mon, Jul 08, 2019 at 12:57:25PM +1000, Stephen Rothwell wrote:
> Hi all,
>
> After merging the rdma tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> In file included from <command-line>:32:
> ./usr/include/rdma/rvt-abi.h:13:10: fatal error: rdma/ib_verbs.h: No such file or directory
> #include <rdma/ib_verbs.h>
> ^~~~~~~~~~~~~~~~~
>
> Caused by commits
>
> dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")
>
> interacting with commit
>
> 0c422a3d4e1b ("kbuild: compile-test exported headers to ensure they are self-contained")
>
> from the kbuild tree.
>
> You can't reference the include/linux headers from uapi headers ...
>
> I have used the rmda tree from 20190628 again today (given the previous
> errors).

This is a bug that will break our userspace package too, we must fix
it, very happy to see the functionality in "kbuild: compile-test
exported headers to ensure they are self-contained"

Dennis, you must put stuff in rdma-core and run the rdma-core CI if
you are messing with the uapi headers.

I'm adding this fixup so we can progress with the merge window. Please
check it right away.

From f10ff380fd7dfba4a36d40f8dd00fe17da8a1a10 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <[email protected]>
Date: Mon, 8 Jul 2019 12:17:48 -0300
Subject: [PATCH] RDMA/rvt: Do not use a kernel header in the ABI

rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce
a new struct with the same layout and use it instead.

Fixes: dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
---
drivers/infiniband/sw/rdmavt/qp.c | 32 ++++++++++++++++++++++++++-----
include/uapi/rdma/rvt-abi.h | 9 +++++++--
2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 11b4d3c1efd486..0b0a241c57ff37 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -1847,8 +1847,11 @@ int rvt_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr,
wqe = rvt_get_rwqe_ptr(&qp->r_rq, wq->head);
wqe->wr_id = wr->wr_id;
wqe->num_sge = wr->num_sge;
- for (i = 0; i < wr->num_sge; i++)
- wqe->sg_list[i] = wr->sg_list[i];
+ for (i = 0; i < wr->num_sge; i++) {
+ wqe->sg_list[i].addr = wr->sg_list[i].addr;
+ wqe->sg_list[i].length = wr->sg_list[i].length;
+ wqe->sg_list[i].lkey = wr->sg_list[i].lkey;
+ }
/*
* Make sure queue entry is written
* before the head index.
@@ -2250,8 +2253,11 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
wqe = rvt_get_rwqe_ptr(&srq->rq, wq->head);
wqe->wr_id = wr->wr_id;
wqe->num_sge = wr->num_sge;
- for (i = 0; i < wr->num_sge; i++)
- wqe->sg_list[i] = wr->sg_list[i];
+ for (i = 0; i < wr->num_sge; i++) {
+ wqe->sg_list[i].addr = wr->sg_list[i].addr;
+ wqe->sg_list[i].length = wr->sg_list[i].length;
+ wqe->sg_list[i].lkey = wr->sg_list[i].lkey;
+ }
/* Make sure queue entry is written before the head index. */
smp_store_release(&wq->head, next);
spin_unlock_irqrestore(&srq->rq.kwq->p_lock, flags);
@@ -2259,6 +2265,22 @@ int rvt_post_srq_recv(struct ib_srq *ibsrq, const struct ib_recv_wr *wr,
return 0;
}

+/*
+ * rvt used the internal kernel struct as part of its ABI, for now make sure
+ * the kernel struct does not change layout. FIXME: rvt should never cast the
+ * user struct to a kernel struct.
+ */
+static struct ib_sge *rvt_cast_sge(struct rvt_wqe_sge *sge)
+{
+ BUILD_BUG_ON(offsetof(struct ib_sge, addr) !=
+ offsetof(struct rvt_wqe_sge, addr));
+ BUILD_BUG_ON(offsetof(struct ib_sge, length) !=
+ offsetof(struct rvt_wqe_sge, length));
+ BUILD_BUG_ON(offsetof(struct ib_sge, lkey) !=
+ offsetof(struct rvt_wqe_sge, lkey));
+ return (struct ib_sge *)sge;
+}
+
/*
* Validate a RWQE and fill in the SGE state.
* Return 1 if OK.
@@ -2282,7 +2304,7 @@ static int init_sge(struct rvt_qp *qp, struct rvt_rwqe *wqe)
continue;
/* Check LKEY */
ret = rvt_lkey_ok(rkt, pd, j ? &ss->sg_list[j - 1] : &ss->sge,
- NULL, &wqe->sg_list[i],
+ NULL, rvt_cast_sge(&wqe->sg_list[i]),
IB_ACCESS_LOCAL_WRITE);
if (unlikely(ret <= 0))
goto bad_lkey;
diff --git a/include/uapi/rdma/rvt-abi.h b/include/uapi/rdma/rvt-abi.h
index d2e35d24f1a9e6..7328293c715cfb 100644
--- a/include/uapi/rdma/rvt-abi.h
+++ b/include/uapi/rdma/rvt-abi.h
@@ -10,11 +10,16 @@

#include <linux/types.h>
#include <rdma/ib_user_verbs.h>
-#include <rdma/ib_verbs.h>
#ifndef RDMA_ATOMIC_UAPI
#define RDMA_ATOMIC_UAPI(_type, _name) struct{ _type val; } _name
#endif

+struct rvt_wqe_sge {
+ __aligned_u64 addr;
+ __u32 length;
+ __u32 lkey;
+};
+
/*
* This structure is used to contain the head pointer, tail pointer,
* and completion queue entries as a single memory allocation so
@@ -39,7 +44,7 @@ struct rvt_rwqe {
__u64 wr_id;
__u8 num_sge;
__u8 padding[7];
- struct ib_sge sg_list[];
+ struct rvt_wqe_sge sg_list[];
};

/*
--
2.21.0


2019-07-09 03:14:40

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rdma tree

Hi Jason,

On Mon, 8 Jul 2019 16:08:27 +0000 Jason Gunthorpe <[email protected]> wrote:
>
> From f10ff380fd7dfba4a36d40f8dd00fe17da8a1a10 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <[email protected]>
> Date: Mon, 8 Jul 2019 12:17:48 -0300
> Subject: [PATCH] RDMA/rvt: Do not use a kernel header in the ABI
>
> rvt was using ib_sge as part of it's ABI, which is not allowed. Introduce
> a new struct with the same layout and use it instead.
>
> Fixes: dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")
> Reported-by: Stephen Rothwell <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>

I applied that to linux-next today.

--
Cheers,
Stephen Rothwell


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2019-07-25 18:33:54

by Dennis Dalessandro

[permalink] [raw]
Subject: Re: linux-next: build failure after merge of the rdma tree

On 7/8/2019 12:08 PM, Jason Gunthorpe wrote:
> On Mon, Jul 08, 2019 at 12:57:25PM +1000, Stephen Rothwell wrote:
>> Hi all,
>>
>> After merging the rdma tree, today's linux-next build (x86_64
>> allmodconfig) failed like this:
>>
>> In file included from <command-line>:32:
>> ./usr/include/rdma/rvt-abi.h:13:10: fatal error: rdma/ib_verbs.h: No such file or directory
>> #include <rdma/ib_verbs.h>
>> ^~~~~~~~~~~~~~~~~
>>
>> Caused by commits
>>
>> dabac6e460ce ("IB/hfi1: Move receive work queue struct into uapi directory")
>>
>> interacting with commit
>>
>> 0c422a3d4e1b ("kbuild: compile-test exported headers to ensure they are self-contained")
>>
>> from the kbuild tree.
>>
>> You can't reference the include/linux headers from uapi headers ...
>>
>> I have used the rmda tree from 20190628 again today (given the previous
>> errors).
>
> This is a bug that will break our userspace package too, we must fix
> it, very happy to see the functionality in "kbuild: compile-test
> exported headers to ensure they are self-contained"
>
> Dennis, you must put stuff in rdma-core and run the rdma-core CI if
> you are messing with the uapi headers.

Sorry for the delay, I've been on vacation the past few weeks, just now
seeing this...

I'm pretty sure Kamenee did when she prepared the patches in the first
place and sent the PR. Not sure where things went off the rails but
we'll be more careful in the future. Thanks for fixing.

-Denny