2018-02-20 21:00:40

by Arnd Bergmann

[permalink] [raw]
Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

The ib_wc structure has grown so much that putting 16 of them on the stack
hits the warning limit for dangerous kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Using half that number brings us comfortably below that limit again.

Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Arnd Bergmann <[email protected]>
---
drivers/infiniband/core/cq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..2626adbb978e 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -16,7 +16,7 @@
#include <rdma/ib_verbs.h>

/* # of WCs to poll for with a single call to ib_poll_cq */
-#define IB_POLL_BATCH 16
+#define IB_POLL_BATCH 8

/* # of WCs to iterate over before yielding */
#define IB_POLL_BUDGET_IRQ 256
--
2.9.0



2018-02-20 21:15:32

by Parav Pandit

[permalink] [raw]
Subject: RE: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

Hi Arnd Bergmann,

> -----Original Message-----
> From: [email protected] [mailto:linux-rdma-
> [email protected]] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford <[email protected]>; Jason Gunthorpe <[email protected]>
> Cc: Arnd Bergmann <[email protected]>; Leon Romanovsky
> <[email protected]>; Sagi Grimberg <[email protected]>; Bart Van Assche
> <[email protected]>; [email protected]; linux-
> [email protected]
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>
> The ib_wc structure has grown so much that putting 16 of them on the stack hits
> the warning limit for dangerous kernel stack consumption:
>
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
> than 1024 bytes [-Werror=frame-larger-than=]
>
> Using half that number brings us comfortably below that limit again.
>
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")

It is not clear to me how the above commit 02d8883f520e introduced this stack issue.

Bodong and I came across the ib_wc size increase in [1], and it was fixed in [2].
Did you hit this error before or after applying patch [2]?

[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/

2018-02-20 21:17:39

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
> /* # of WCs to poll for with a single call to ib_poll_cq */
> -#define IB_POLL_BATCH 16
> +#define IB_POLL_BATCH 8

The purpose of batch polling is to minimize contention on the cq spinlock.
Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
the performance impact of this change been verified for all affected drivers
(ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
over RDMA, ...)?

Bart.




2018-02-20 21:50:35

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <[email protected]> wrote:
>
> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>> /* # of WCs to poll for with a single call to ib_poll_cq */
>> -#define IB_POLL_BATCH 16
>> +#define IB_POLL_BATCH 8
>
> The purpose of batch polling is to minimize contention on the cq spinlock.
> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
> the performance impact of this change been verified for all affected drivers
> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
> over RDMA, ...)?

Only users of the DIRECT polling method use an on-stack
array of ib_wc's, and that is only the SRP drivers.

The other two modes use a dynamically allocated array of
ib_wc's that hangs off the ib_cq. They shouldn't need any
reduction in the size of this array, and they are the common
case.

IMO a better solution would be to change ib_process_cq_direct
to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

--
Chuck Lever
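
For readers following along, here is a rough consumer-side sketch of the
split Chuck describes. The CQ depth, comp vector, and the synchronous drain
at the end are illustrative assumptions, not taken from any particular
driver:

#include <rdma/ib_verbs.h>

/* Hypothetical consumer: completions normally run in softirq context and
 * consume the cq->wc array that ib_alloc_cq() allocated, while a
 * synchronous drain (the pattern the SRP drivers use) is the only path
 * that reaches ib_process_cq_direct() and its on-stack ib_wc array.
 */
static int example_setup_and_drain(struct ib_device *dev)
{
        struct ib_cq *cq;

        cq = ib_alloc_cq(dev, NULL, 128, 0, IB_POLL_SOFTIRQ);
        if (IS_ERR(cq))
                return PTR_ERR(cq);

        /* ... post work requests whose wr_cqe->done handlers are set ... */

        /* Synchronous drain; budget -1 means "poll until the CQ is empty". */
        ib_process_cq_direct(cq, -1);

        ib_free_cq(cq);
        return 0;
}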




2018-02-20 21:55:35

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <[email protected]> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-rdma-
>> [email protected]] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <[email protected]>; Jason Gunthorpe <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>; Leon Romanovsky
>> <[email protected]>; Sagi Grimberg <[email protected]>; Bart Van Assche
>> <[email protected]>; [email protected]; linux-
>> [email protected]
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown so much that putting 16 of them on the stack hits
>> the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
>> than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how above commit 02d8883f520e introduced this stack issue.

My mistake, I misread the git history.

I did a proper bisection now and ended up with the commit that added the
IB_POLL_BATCH-sized array on the stack, i.e. commit 246d8b184c10 ("IB/cq:
Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct").

> Bodong and I came across ib_wc size increase in [1] and it was fixed in [2].
> Did you hit this error after/before applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/

I did the analysis a few weeks ago when I first hit the problem, but didn't
send it out at the time. Today I saw that the problem still persists on
mainline (4.16-rc2), which does contain the patch from [2].

What I see is that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes
of padding, so 16 of them get us exactly to the warning limit, and then there
are a few more bytes for the function itself.

Arnd
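
As a back-of-the-envelope check of those numbers (the per-entry size is
taken from Arnd's figures for 32-bit ARM, not re-measured here), the
on-stack array alone already fills the default frame-size budget:

/* 59 bytes of ib_wc payload + 5 bytes of padding = 64 bytes per entry,
 * so sixteen on-stack entries consume 16 * 64 = 1024 bytes, exactly the
 * default -Wframe-larger-than= threshold, before the function's own
 * locals are counted; hence the reported 1032-byte frame.
 */
#define EXAMPLE_IB_WC_SIZE      64      /* assumed padded sizeof(struct ib_wc) */
#define EXAMPLE_IB_POLL_BATCH   16

_Static_assert(EXAMPLE_IB_POLL_BATCH * EXAMPLE_IB_WC_SIZE == 1024,
               "wc array alone reaches the 1024-byte frame warning limit");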

2018-02-21 09:49:01

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



On 2/20/2018 11:47 PM, Chuck Lever wrote:
>
>
>> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <[email protected]> wrote:
>>
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH 16
>>> +#define IB_POLL_BATCH 8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's. This is only the SRP drivers.
>
> The other two modes have use of a dynamically allocated array
> of ib_wc's that hangs off the ib_cq. These shouldn't need any
> reduction in the size of this array, and they are the common
> case.
>
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

Yup, good idea.
You can define IB_DIRECT_POLL_BATCH to be 8 and use it in
ib_process_cq_direct, *but* please make sure to use the right value in
the ib_poll_cq call, since the wcs array must be able to hold the requested
number of WCs.
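
Concretely, the hazard is asking ib_poll_cq() for more completions than the
caller's array can hold. A rough sketch of a direct path with a per-call
batch size follows; the constant name and the simplified completion dispatch
are illustrative only, and a complete patch along these lines appears later
in the thread:

#define IB_POLL_BATCH_DIRECT    8       /* illustrative name and value */

static int __ib_process_cq(struct ib_cq *cq, int budget,
                           struct ib_wc *wcs, int batch)
{
        int i, n, completed = 0;

        /* Never ask ib_poll_cq() for more WCs than wcs[] can hold:
         * 'batch' must match the size the caller allocated.  budget may
         * be -1 (unbounded), which the unsigned min_t() handles. */
        while ((n = ib_poll_cq(cq, min_t(u32, batch, budget - completed),
                               wcs)) > 0) {
                for (i = 0; i < n; i++)
                        if (wcs[i].wr_cqe)
                                wcs[i].wr_cqe->done(cq, &wcs[i]);

                completed += n;
                if (n != batch || (budget != -1 && completed >= budget))
                        break;
        }

        return completed;
}

int ib_process_cq_direct(struct ib_cq *cq, int budget)
{
        struct ib_wc wcs[IB_POLL_BATCH_DIRECT]; /* small on-stack array */

        return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
}

The irq and workqueue paths would keep passing cq->wc and IB_POLL_BATCH, so
their behaviour is unchanged.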

-Max.

>
> --
> Chuck Lever

2018-02-21 14:48:29

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



On 2/21/2018 3:44 PM, Sagi Grimberg wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH            16
>>>> +#define IB_POLL_BATCH            8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has
>>> the performance impact of this change been verified for all affected
>>> drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>>> Direct, NFS
>>> over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's. This is only the SRP drivers.
>>
>> The other two modes have use of a dynamically allocated array
>> of ib_wc's that hangs off the ib_cq. These shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>>
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers
> that did not use ib_alloc_cq api to call it, but that seems like a
> wrong decision when thinking it over again (as probably these users
> did not set the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add
> a WARN_ON statement (and fail it gracefully) if the caller used this
> API without calling ib_alloc_cq?

But we tried to avoid concurrent access to cq->wc.
Why can't we use the solution I wrote above?

>
> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
>  #define IB_POLL_FLAGS \
>         (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
>
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
>  {
>         int i, n, completed = 0;
> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
> +       struct ib_wc *wcs = cq->wc;
>
>         /*
>          * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int
> budget, struct ib_wc *poll_wc)
>   */
>  int ib_process_cq_direct(struct ib_cq *cq, int budget)
>  {
> -       struct ib_wc wcs[IB_POLL_BATCH];
> -
> -       return __ib_process_cq(cq, budget, wcs);
> +       if (unlikely(WARN_ON_ONCE(!cq->wc)))
> +               return 0;
> +       return __ib_process_cq(cq, budget);
>  }
>  EXPORT_SYMBOL(ib_process_cq_direct);
>
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int
> budget)
>         struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
>         int completed;
>
> -       completed = __ib_process_cq(cq, budget, NULL);
> +       completed = __ib_process_cq(cq, budget);
>         if (completed < budget) {
>                 irq_poll_complete(&cq->iop);
>                 if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
>         struct ib_cq *cq = container_of(work, struct ib_cq, work);
>         int completed;
>
> -       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> +       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
>         if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>             ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>                 queue_work(ib_comp_wq, &cq->work);
> --

2018-02-21 18:46:18

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant


>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH 16
>>> +#define IB_POLL_BATCH 8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's. This is only the SRP drivers.
>
> The other two modes have use of a dynamically allocated array
> of ib_wc's that hangs off the ib_cq. These shouldn't need any
> reduction in the size of this array, and they are the common
> case.
>
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

The only reason I added this array on-stack was to allow consumers
that did not use the ib_alloc_cq API to call it, but that seems like a
wrong decision when thinking it over again (as those users probably
did not set wr_cqe correctly).

How about we make ib_process_cq_direct use the cq->wc array and add
a WARN_ON statement (and fail gracefully) if the caller used this
API without calling ib_alloc_cq?

--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..cd3e9e124834 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -25,10 +25,10 @@
 #define IB_POLL_FLAGS \
         (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)

-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget)
 {
         int i, n, completed = 0;
-        struct ib_wc *wcs = poll_wc ? : cq->wc;
+        struct ib_wc *wcs = cq->wc;

         /*
          * budget might be (-1) if the caller does not
@@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-        struct ib_wc wcs[IB_POLL_BATCH];
-
-        return __ib_process_cq(cq, budget, wcs);
+        if (unlikely(WARN_ON_ONCE(!cq->wc)))
+                return 0;
+        return __ib_process_cq(cq, budget);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);

@@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
         struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
         int completed;

-        completed = __ib_process_cq(cq, budget, NULL);
+        completed = __ib_process_cq(cq, budget);
         if (completed < budget) {
                 irq_poll_complete(&cq->iop);
                 if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
         struct ib_cq *cq = container_of(work, struct ib_cq, work);
         int completed;

-        completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+        completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
         if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
             ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                 queue_work(ib_comp_wq, &cq->work);
--

2018-02-21 18:56:03

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



> On Feb 21, 2018, at 8:44 AM, Sagi Grimberg <[email protected]> wrote:
>
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH 16
>>>> +#define IB_POLL_BATCH 8
>>>
>>> The purpose of batch polling is to minimize contention on the cq spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>>> the performance impact of this change been verified for all affected drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>>> over RDMA, ...)?
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's. This is only the SRP drivers.
>> The other two modes have use of a dynamically allocated array
>> of ib_wc's that hangs off the ib_cq. These shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers
> that did not use ib_alloc_cq api to call it, but that seems like a
> wrong decision when thinking it over again (as probably these users
> did not set the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add
> a WARN_ON statement (and fail it gracefully) if the caller used this
> API without calling ib_alloc_cq?

Agreed, I prefer that all three modes use dynamically allocated
memory for that array.


> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
> #define IB_POLL_FLAGS \
> (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
>
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
> {
> int i, n, completed = 0;
> - struct ib_wc *wcs = poll_wc ? : cq->wc;
> + struct ib_wc *wcs = cq->wc;
>
> /*
> * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> */
> int ib_process_cq_direct(struct ib_cq *cq, int budget)
> {
> - struct ib_wc wcs[IB_POLL_BATCH];
> -
> - return __ib_process_cq(cq, budget, wcs);
> + if (unlikely(WARN_ON_ONCE(!cq->wc)))
> + return 0;
> + return __ib_process_cq(cq, budget);
> }
> EXPORT_SYMBOL(ib_process_cq_direct);
>
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
> struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
> int completed;
>
> - completed = __ib_process_cq(cq, budget, NULL);
> + completed = __ib_process_cq(cq, budget);
> if (completed < budget) {
> irq_poll_complete(&cq->iop);
> if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
> struct ib_cq *cq = container_of(work, struct ib_cq, work);
> int completed;
>
> - completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> + completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
> if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
> ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> queue_work(ib_comp_wq, &cq->work);
> --

--
Chuck Lever




2018-02-22 15:41:08

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant


>> The only reason why I added this array on-stack was to allow consumers
>> that did not use ib_alloc_cq api to call it, but that seems like a
>> wrong decision when thinking it over again (as probably these users
>> did not set the wr_cqe correctly).
>>
>> How about we make ib_process_cq_direct use the cq wc array and add
>> a WARN_ON statement (and fail it gracefully) if the caller used this
>> API without calling ib_alloc_cq?
>
> but we tried to avoid concurrent access to cq->wc.

Not sure it's a valid use case. But if there is a compelling
reason to keep it as is, then we can use a smaller on-stack
array.

2018-02-27 22:10:59

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>
> >>The only reason why I added this array on-stack was to allow consumers
> >>that did not use ib_alloc_cq api to call it, but that seems like a
> >>wrong decision when thinking it over again (as probably these users
> >>did not set the wr_cqe correctly).
> >>
> >>How about we make ib_process_cq_direct use the cq wc array and add
> >>a WARN_ON statement (and fail it gracefully) if the caller used this
> >>API without calling ib_alloc_cq?
> >
> >but we tried to avoid concurrent access to cq->wc.
>
> Not sure its a valid use-case. But if there is a compelling
> reason to keep it as is, then we can do smaller on-stack
> array.

Did we come to a conclusion on what to do here?

Jason

2018-02-27 22:17:37

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



On 2/28/2018 12:09 AM, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>>
>>>> The only reason why I added this array on-stack was to allow consumers
>>>> that did not use ib_alloc_cq api to call it, but that seems like a
>>>> wrong decision when thinking it over again (as probably these users
>>>> did not set the wr_cqe correctly).
>>>>
>>>> How about we make ib_process_cq_direct use the cq wc array and add
>>>> a WARN_ON statement (and fail it gracefully) if the caller used this
>>>> API without calling ib_alloc_cq?
>>>
>>> but we tried to avoid concurrent access to cq->wc.
>>
>> Not sure its a valid use-case. But if there is a compelling
>> reason to keep it as is, then we can do smaller on-stack
>> array.
>
> Did we come to a conclusion what to do here?

Guys,
what do you think about the following solution (untested):


diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8..59d2835 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -17,6 +17,7 @@

 /* # of WCs to poll for with a single call to ib_poll_cq */
 #define IB_POLL_BATCH 16
+#define IB_POLL_BATCH_DIRECT 8

 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ 256
@@ -25,17 +26,25 @@
 #define IB_POLL_FLAGS \
         (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)

-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
+                           int batch)
 {
-        int i, n, completed = 0;
-        struct ib_wc *wcs = poll_wc ? : cq->wc;
-
+        int i, n, ib_poll_batch, completed = 0;
+        struct ib_wc *wcs;
+
+        if (poll_wc) {
+                wcs = poll_wc;
+                ib_poll_batch = batch;
+        } else {
+                wcs = cq->wc;
+                ib_poll_batch = IB_POLL_BATCH;
+        }
         /*
          * budget might be (-1) if the caller does not
          * want to bound this call, thus we need unsigned
          * minimum here.
          */
-        while ((n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH,
+        while ((n = ib_poll_cq(cq, min_t(u32, ib_poll_batch,
                         budget - completed), wcs)) > 0) {
                 for (i = 0; i < n; i++) {
                         struct ib_wc *wc = &wcs[i];
@@ -48,7 +57,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)

                 completed += n;

-                if (n != IB_POLL_BATCH ||
+                if (n != ib_poll_batch ||
                     (budget != -1 && completed >= budget))
                         break;
         }
@@ -72,9 +81,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-        struct ib_wc wcs[IB_POLL_BATCH];
+        struct ib_wc wcs[IB_POLL_BATCH_DIRECT];

-        return __ib_process_cq(cq, budget, wcs);
+        return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);

@@ -88,7 +97,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
         struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
         int completed;

-        completed = __ib_process_cq(cq, budget, NULL);
+        completed = __ib_process_cq(cq, budget, NULL, 0);
         if (completed < budget) {
                 irq_poll_complete(&cq->iop);
                 if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +117,7 @@ static void ib_cq_poll_work(struct work_struct *work)
         struct ib_cq *cq = container_of(work, struct ib_cq, work);
         int completed;

-        completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+        completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL, 0);
         if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
             ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                 queue_work(ib_comp_wq, &cq->work);



>
> Jason
>

2018-02-28 00:22:20

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

On 02/27/18 14:15, Max Gurtovoy wrote:
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
> +                           int batch)
>  {
> -        int i, n, completed = 0;
> -        struct ib_wc *wcs = poll_wc ? : cq->wc;
> +        int i, n, ib_poll_batch, completed = 0;
> +        struct ib_wc *wcs;
> +
> +        if (poll_wc) {
> +                wcs = poll_wc;
> +                ib_poll_batch = batch;
> +        } else {
> +                wcs = cq->wc;
> +                ib_poll_batch = IB_POLL_BATCH;
> +        }

Since this code has to be touched, I think we can use this
opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
instead use what the caller passes. That will require updating all
__ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
pass ib_poll_batch instead of figuring it out in this function.
Otherwise the approach of this patch looks fine to me.

Thanks,

Bart.
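
A rough sketch of the rework Bart is asking for follows; it is an editorial
illustration that borrows the IB_POLL_BATCH_DIRECT constant from Max's
patch, and the patch that was eventually resubmitted is not shown in this
thread. Every caller passes its own ib_wc array together with its size, so
__ib_process_cq() needs neither the "poll_wc ? : cq->wc" fallback nor a
NULL convention:

/* Assumed signature after the rework:
 *   static int __ib_process_cq(struct ib_cq *cq, int budget,
 *                              struct ib_wc *wcs, int batch);
 */
int ib_process_cq_direct(struct ib_cq *cq, int budget)
{
        struct ib_wc wcs[IB_POLL_BATCH_DIRECT]; /* small on-stack array */

        return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
}

static int ib_poll_handler(struct irq_poll *iop, int budget)
{
        struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
        int completed;

        /* irq path: the full-size array allocated by ib_alloc_cq() */
        completed = __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH);
        if (completed < budget) {
                irq_poll_complete(&cq->iop);
                if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                        irq_poll_sched(&cq->iop);
        }

        return completed;
}

static void ib_cq_poll_work(struct work_struct *work)
{
        struct ib_cq *cq = container_of(work, struct ib_cq, work);
        int completed;

        /* workqueue path: same full-size cq->wc array */
        completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE,
                                    cq->wc, IB_POLL_BATCH);
        if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
            ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                queue_work(ib_comp_wq, &cq->work);
}

The direct path keeps its small stack array while the irq and workqueue
paths keep the full IB_POLL_BATCH-sized cq->wc, so the frame-size fix does
not touch the common, performance-sensitive paths.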

2018-02-28 09:52:51

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> On 02/27/18 14:15, Max Gurtovoy wrote:
>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
>> +                           int batch)
>>  {
>> -        int i, n, completed = 0;
>> -        struct ib_wc *wcs = poll_wc ? : cq->wc;
>> +        int i, n, ib_poll_batch, completed = 0;
>> +        struct ib_wc *wcs;
>> +
>> +        if (poll_wc) {
>> +                wcs = poll_wc;
>> +                ib_poll_batch = batch;
>> +        } else {
>> +                wcs = cq->wc;
>> +                ib_poll_batch = IB_POLL_BATCH;
>> +        }
>
> Since this code has to be touched I think that we can use this
> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
> instead use what the caller passes. That will require to update all
> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
> pass ib_poll_batch instead of figuring it out in this function.
> Otherwise the approach of this patch looks fine to me.

Thanks Bart.
I'll make these changes and submit.

>
> Thanks,
>
> Bart.

-Max.

2018-02-28 18:56:41

by Doug Ledford

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant

On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>
> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> > On 02/27/18 14:15, Max Gurtovoy wrote:
> > > -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> > > *poll_wc)
> > > +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> > > *poll_wc,
> > > + int batch)
> > > {
> > > - int i, n, completed = 0;
> > > - struct ib_wc *wcs = poll_wc ? : cq->wc;
> > > + int i, n, ib_poll_batch, completed = 0;
> > > + struct ib_wc *wcs;
> > > +
> > > + if (poll_wc) {
> > > + wcs = poll_wc;
> > > + ib_poll_batch = batch;
> > > + } else {
> > > + wcs = cq->wc;
> > > + ib_poll_batch = IB_POLL_BATCH;
> > > + }
> >
> > Since this code has to be touched I think that we can use this
> > opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
> > instead use what the caller passes. That will require to update all
> > __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
> > pass ib_poll_batch instead of figuring it out in this function.
> > Otherwise the approach of this patch looks fine to me.
>
> Thanks Bart.
> I'll make these changes and submit.

That sounds reasonable to me too, thanks for reworking and resubmitting.

--
Doug Ledford <[email protected]>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD



2018-03-01 09:38:15

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant



On 2/28/2018 8:55 PM, Doug Ledford wrote:
> On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>>
>> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
>>> On 02/27/18 14:15, Max Gurtovoy wrote:
>>>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>>>> *poll_wc)
>>>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>>>> *poll_wc,
>>>> + int batch)
>>>> {
>>>> - int i, n, completed = 0;
>>>> - struct ib_wc *wcs = poll_wc ? : cq->wc;
>>>> + int i, n, ib_poll_batch, completed = 0;
>>>> + struct ib_wc *wcs;
>>>> +
>>>> + if (poll_wc) {
>>>> + wcs = poll_wc;
>>>> + ib_poll_batch = batch;
>>>> + } else {
>>>> + wcs = cq->wc;
>>>> + ib_poll_batch = IB_POLL_BATCH;
>>>> + }
>>>
>>> Since this code has to be touched I think that we can use this
>>> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
>>> instead use what the caller passes. That will require to update all
>>> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
>>> pass ib_poll_batch instead of figuring it out in this function.
>>> Otherwise the approach of this patch looks fine to me.
>>
>> Thanks Bart.
>> I'll make these changes and submit.
>
> That sounds reasonable to me too, thanks for reworking and resubmitting.
>

Sure, NP.
We've run NVMe-oF and SRP with the new patch.
I'll send it through the Mellanox maintainers' pull request.

Thanks for reporting and reviewing.

-Max.