2016-06-15 03:15:19

by Chuck Lever III

Subject: [PATCH v2 00/24] NFS/RDMA client patches proposed for v4.8

This series implements the following:

- Fixes to FMR disconnect recovery
- Removal of the insecure ALLPHYSICAL memory registration mode
- Significant reductions in per-transport memory consumption
- Support for sec=krb5, sec=krb5i, and sec=krb5p with NFS/RDMA
(with no performance impact on sec=sys)
- Pre-requisites for device removal support

Kerberos with NFS/RDMA is useful mainly for secure authentication of
each RPC transaction (sec=krb5). The Kerberos integrity and privacy
services are also available, providing feature parity with TCP in
environments where the use of sec=krb5i or sec=krb5p is mandated by
IT policy.

Sagi's proposed fix for mlx4's new FRWR API is included. I'll drop
it from my series once the official version of this fix is merged.


Available in the "nfs-rdma-for-4.8" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.8


Changes since v1:
- Rebased on v4.7-rc3
- Re-ordered series so FMR fixes come first
- Fixed ib_unmap_fmr list handling
- Replaced the bogus 256-MRs-per-QP patch with dynamic MR allocation
- Added performance counters to expose MR allocation and recovery behavior
- Included Sagi's proposed mlx4 priv pages fix
- Made it easier to merge my datatouch patch with Scott's bug fix
- Split some patches for readability
- Dropped some clean-up patches to keep the series patch count down

---

Chuck Lever (23):
xprtrdma: Remove FMRs from the unmap list after unmapping
xprtrdma: Create common scatterlist fields in rpcrdma_mw
xprtrdma: Move init and release helpers
xprtrdma: Rename fields in rpcrdma_fmr
xprtrdma: Use scatterlist for DMA mapping and unmapping under FMR
xprtrdma: Refactor MR recovery work queues
xprtrdma: Do not leak an MW during a DMA map failure
xprtrdma: Remove ALLPHYSICAL memory registration mode
xprtrdma: Remove rpcrdma_map_one() and friends
xprtrdma: Reply buffer exhaustion can be catastrophic
xprtrdma: Honor ->send_request API contract
xprtrdma: Chunk list encoders must not return zero
xprtrdma: Allocate MRs on demand
xprtrdma: Release orphaned MRs immediately
xprtrdma: Place registered MWs on a per-req list
xprtrdma: Chunk list encoders no longer share one rl_segments array
xprtrdma: rpcrdma_inline_fixup() overruns the receive page list
xprtrdma: Do not update {head,tail}.iov_len in rpcrdma_inline_fixup()
xprtrdma: Update only specific fields in private receive buffer
xprtrdma: Clean up fixup_copy_count accounting
xprtrdma: No direct data placement with krb5i and krb5p
svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
NFS: Don't drop CB requests with invalid principals

Sagi Grimberg (1):
mlx4-ib: Use coherent memory for priv pages


drivers/infiniband/hw/mlx4/mlx4_ib.h | 1
drivers/infiniband/hw/mlx4/mr.c | 42 +---
fs/nfs/callback_xdr.c | 6 -
include/linux/sunrpc/auth.h | 3
include/linux/sunrpc/gss_api.h | 2
net/sunrpc/auth_gss/auth_gss.c | 2
net/sunrpc/auth_gss/gss_krb5_mech.c | 2
net/sunrpc/auth_gss/gss_mech_switch.c | 12 +
net/sunrpc/svc.c | 8 +
net/sunrpc/xprtrdma/Makefile | 2
net/sunrpc/xprtrdma/fmr_ops.c | 372 +++++++++++++++------------------
net/sunrpc/xprtrdma/frwr_ops.c | 352 +++++++++++--------------------
net/sunrpc/xprtrdma/physical_ops.c | 122 -----------
net/sunrpc/xprtrdma/rpc_rdma.c | 274 +++++++++++++-----------
net/sunrpc/xprtrdma/transport.c | 40 ++--
net/sunrpc/xprtrdma/verbs.c | 189 +++++++++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 116 ++++------
17 files changed, 684 insertions(+), 861 deletions(-)
delete mode 100644 net/sunrpc/xprtrdma/physical_ops.c

--
Chuck Lever


2016-06-15 03:15:27

by Chuck Lever III

Subject: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

From: Sagi Grimberg <[email protected]>

kmalloc doesn't guarantee the returned memory is all on one page.
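
For context, the change below swaps the kmalloc/dma_map_single pattern
for a single coherent DMA allocation. A rough before/after sketch,
condensed from the hunks that follow:

	/* before: kmalloc plus dma_map_single, aligned by hand */
	mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
	mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
	mr->page_map = dma_map_single(device->dma_device, mr->pages,
				      size, DMA_TO_DEVICE);

	/* after: one DMA-coherent allocation holds the page list */
	mr->pages = dma_alloc_coherent(device->dma_device, size,
				       &mr->page_map, GFP_KERNEL);

	/* and on release */
	dma_free_coherent(device->dma_device, size, mr->pages, mr->page_map);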

Fixes: 1b2cd0fc673c ("IB/mlx4: Support the new memory ... ")
Signed-off-by: Sagi Grimberg <[email protected]>
---
drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 -
drivers/infiniband/hw/mlx4/mr.c | 42 +++++-----------------------------
2 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 6c5ac5d..4a8bbe4 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -139,7 +139,6 @@ struct mlx4_ib_mr {
u32 max_pages;
struct mlx4_mr mmr;
struct ib_umem *umem;
- void *pages_alloc;
};

struct mlx4_ib_mw {
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 6312721..c4c2044 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -278,30 +278,13 @@ mlx4_alloc_priv_pages(struct ib_device *device,
int max_pages)
{
int size = max_pages * sizeof(u64);
- int add_size;
- int ret;
-
- add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);

- mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
- if (!mr->pages_alloc)
+ mr->pages = dma_alloc_coherent(device->dma_device, size,
+ &mr->page_map, GFP_KERNEL);
+ if (!mr->pages)
return -ENOMEM;

- mr->pages = PTR_ALIGN(mr->pages_alloc, MLX4_MR_PAGES_ALIGN);
-
- mr->page_map = dma_map_single(device->dma_device, mr->pages,
- size, DMA_TO_DEVICE);
-
- if (dma_mapping_error(device->dma_device, mr->page_map)) {
- ret = -ENOMEM;
- goto err;
- }
-
return 0;
-err:
- kfree(mr->pages_alloc);
-
- return ret;
}

static void
@@ -311,9 +294,8 @@ mlx4_free_priv_pages(struct mlx4_ib_mr *mr)
struct ib_device *device = mr->ibmr.device;
int size = mr->max_pages * sizeof(u64);

- dma_unmap_single(device->dma_device, mr->page_map,
- size, DMA_TO_DEVICE);
- kfree(mr->pages_alloc);
+ dma_free_coherent(device->dma_device, size,
+ mr->pages, mr->page_map);
mr->pages = NULL;
}
}
@@ -532,19 +514,7 @@ int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
unsigned int *sg_offset)
{
struct mlx4_ib_mr *mr = to_mmr(ibmr);
- int rc;

mr->npages = 0;
-
- ib_dma_sync_single_for_cpu(ibmr->device, mr->page_map,
- sizeof(u64) * mr->max_pages,
- DMA_TO_DEVICE);
-
- rc = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx4_set_page);
-
- ib_dma_sync_single_for_device(ibmr->device, mr->page_map,
- sizeof(u64) * mr->max_pages,
- DMA_TO_DEVICE);
-
- return rc;
+ return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx4_set_page);
}


2016-06-15 03:15:36

by Chuck Lever III

Subject: [PATCH v2 02/24] xprtrdma: Remove FMRs from the unmap list after unmapping

ib_unmap_fmr() takes a list of FMRs to unmap. However, it does not
remove the FMRs from this list as it processes them. Other
ib_unmap_fmr() call sites are careful to remove FMRs from the list
after ib_unmap_fmr() returns.
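
A minimal sketch of that careful call-site pattern (it matches the
helper updated below; error handling elided):

	LIST_HEAD(l);
	int rc;

	list_add(&fmr->list, &l);	/* put the FMR on a local list */
	rc = ib_unmap_fmr(&l);		/* invalidate it */
	list_del_init(&fmr->list);	/* then take it back off the list */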

Since commit 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR"),
fmr_op_unmap_sync() passes more than one FMR to ib_unmap_fmr(), but it
does not remove the FMRs from that list once the call completes.

I've noticed some instability that could be related to list
tangling by the new fmr_op_unmap_sync() logic. In an abundance
of caution, add some defensive logic to clean up properly after
ib_unmap_fmr().

Fixes: 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 6326ebe..958c792 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -63,9 +63,12 @@ static int
__fmr_unmap(struct rpcrdma_mw *mw)
{
LIST_HEAD(l);
+ int rc;

list_add(&mw->fmr.fmr->list, &l);
- return ib_unmap_fmr(&l);
+ rc = ib_unmap_fmr(&l);
+ list_del_init(&mw->fmr.fmr->list);
+ return rc;
}

/* Deferred reset of a single FMR. Generate a fresh rkey by
@@ -149,6 +152,7 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
r->fmr.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
if (IS_ERR(r->fmr.fmr))
goto out_fmr_err;
+ INIT_LIST_HEAD(&r->fmr.fmr->list);

r->mw_xprt = r_xprt;
list_add(&r->mw_list, &buf->rb_mws);
@@ -267,7 +271,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
seg = &req->rl_segments[i];
mw = seg->rl_mw;

- list_add(&mw->fmr.fmr->list, &unmap_list);
+ list_add_tail(&mw->fmr.fmr->list, &unmap_list);

i += seg->mr_nsegs;
}
@@ -280,7 +284,9 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
*/
for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
seg = &req->rl_segments[i];
+ mw = seg->rl_mw;

+ list_del_init(&mw->fmr.fmr->list);
__fmr_dma_unmap(r_xprt, seg);
rpcrdma_put_mw(r_xprt, seg->rl_mw);



2016-06-15 03:15:52

by Chuck Lever III

Subject: [PATCH v2 04/24] xprtrdma: Move init and release helpers

Clean up: Moving these helpers in a separate patch makes later
patches more readable.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 120 +++++++++++++++++++++++++---------------
net/sunrpc/xprtrdma/frwr_ops.c | 90 +++++++++++++++---------------
2 files changed, 120 insertions(+), 90 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 958c792..b3f8699 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -35,6 +35,12 @@
/* Maximum scatter/gather per FMR */
#define RPCRDMA_MAX_FMR_SGES (64)

+/* Access mode of externally registered pages */
+enum {
+ RPCRDMA_FMR_ACCESS_FLAGS = IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_READ,
+};
+
static struct workqueue_struct *fmr_recovery_wq;

#define FMR_RECOVERY_WQ_FLAGS (WQ_UNBOUND)
@@ -60,6 +66,45 @@ fmr_destroy_recovery_wq(void)
}

static int
+__fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)
+{
+ static struct ib_fmr_attr fmr_attr = {
+ .max_pages = RPCRDMA_MAX_FMR_SGES,
+ .max_maps = 1,
+ .page_shift = PAGE_SHIFT
+ };
+
+ mw->fmr.physaddrs = kcalloc(RPCRDMA_MAX_FMR_SGES,
+ sizeof(u64), GFP_KERNEL);
+ if (!mw->fmr.physaddrs)
+ goto out_free;
+
+ mw->mw_sg = kcalloc(RPCRDMA_MAX_FMR_SGES,
+ sizeof(*mw->mw_sg), GFP_KERNEL);
+ if (!mw->mw_sg)
+ goto out_free;
+
+ sg_init_table(mw->mw_sg, RPCRDMA_MAX_FMR_SGES);
+
+ mw->fmr.fmr = ib_alloc_fmr(pd, RPCRDMA_FMR_ACCESS_FLAGS,
+ &fmr_attr);
+ if (IS_ERR(mw->fmr.fmr))
+ goto out_fmr_err;
+
+ INIT_LIST_HEAD(&mw->fmr.fmr->list);
+ return 0;
+
+out_fmr_err:
+ dprintk("RPC: %s: ib_alloc_fmr returned %ld\n", __func__,
+ PTR_ERR(mw->fmr.fmr));
+
+out_free:
+ kfree(mw->mw_sg);
+ kfree(mw->fmr.physaddrs);
+ return -ENOMEM;
+}
+
+static int
__fmr_unmap(struct rpcrdma_mw *mw)
{
LIST_HEAD(l);
@@ -71,6 +116,30 @@ __fmr_unmap(struct rpcrdma_mw *mw)
return rc;
}

+static void
+__fmr_dma_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+{
+ struct ib_device *device = r_xprt->rx_ia.ri_device;
+ int nsegs = seg->mr_nsegs;
+
+ while (nsegs--)
+ rpcrdma_unmap_one(device, seg++);
+}
+
+static void
+__fmr_release(struct rpcrdma_mw *r)
+{
+ int rc;
+
+ kfree(r->fmr.physaddrs);
+ kfree(r->mw_sg);
+
+ rc = ib_dealloc_fmr(r->fmr.fmr);
+ if (rc)
+ pr_err("rpcrdma: final ib_dealloc_fmr for %p returned %i\n",
+ r, rc);
+}
+
/* Deferred reset of a single FMR. Generate a fresh rkey by
* replacing the MR. There's no recovery if this fails.
*/
@@ -119,12 +188,6 @@ static int
fmr_op_init(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- int mr_access_flags = IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ;
- struct ib_fmr_attr fmr_attr = {
- .max_pages = RPCRDMA_MAX_FMR_SGES,
- .max_maps = 1,
- .page_shift = PAGE_SHIFT
- };
struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
struct rpcrdma_mw *r;
int i, rc;
@@ -138,36 +201,22 @@ fmr_op_init(struct rpcrdma_xprt *r_xprt)
i *= buf->rb_max_requests; /* one set for each RPC slot */
dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);

- rc = -ENOMEM;
while (i--) {
r = kzalloc(sizeof(*r), GFP_KERNEL);
if (!r)
- goto out;
-
- r->fmr.physaddrs = kmalloc(RPCRDMA_MAX_FMR_SGES *
- sizeof(u64), GFP_KERNEL);
- if (!r->fmr.physaddrs)
- goto out_free;
+ return -ENOMEM;

- r->fmr.fmr = ib_alloc_fmr(pd, mr_access_flags, &fmr_attr);
- if (IS_ERR(r->fmr.fmr))
- goto out_fmr_err;
- INIT_LIST_HEAD(&r->fmr.fmr->list);
+ rc = __fmr_init(r, pd);
+ if (rc) {
+ kfree(r);
+ return rc;
+ }

r->mw_xprt = r_xprt;
list_add(&r->mw_list, &buf->rb_mws);
list_add(&r->mw_all, &buf->rb_all);
}
return 0;
-
-out_fmr_err:
- rc = PTR_ERR(r->fmr.fmr);
- dprintk("RPC: %s: ib_alloc_fmr status %i\n", __func__, rc);
- kfree(r->fmr.physaddrs);
-out_free:
- kfree(r);
-out:
- return rc;
}

/* Use the ib_map_phys_fmr() verb to register a memory region
@@ -236,16 +285,6 @@ out_maperr:
return rc;
}

-static void
-__fmr_dma_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
-{
- struct ib_device *device = r_xprt->rx_ia.ri_device;
- int nsegs = seg->mr_nsegs;
-
- while (nsegs--)
- rpcrdma_unmap_one(device, seg++);
-}
-
/* Invalidate all memory regions that were registered for "req".
*
* Sleeps until it is safe for the host CPU to access the
@@ -338,18 +377,11 @@ static void
fmr_op_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_mw *r;
- int rc;

while (!list_empty(&buf->rb_all)) {
r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
list_del(&r->mw_all);
- kfree(r->fmr.physaddrs);
-
- rc = ib_dealloc_fmr(r->fmr.fmr);
- if (rc)
- dprintk("RPC: %s: ib_dealloc_fmr failed %i\n",
- __func__, rc);
-
+ __fmr_release(r);
kfree(r);
}
}
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index f02ab80..9cd60bf0 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -99,6 +99,50 @@ frwr_destroy_recovery_wq(void)
}

static int
+__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, unsigned int depth)
+{
+ struct rpcrdma_frmr *f = &r->frmr;
+ int rc;
+
+ f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
+ if (IS_ERR(f->fr_mr))
+ goto out_mr_err;
+
+ r->mw_sg = kcalloc(depth, sizeof(*r->mw_sg), GFP_KERNEL);
+ if (!r->mw_sg)
+ goto out_list_err;
+
+ sg_init_table(r->mw_sg, depth);
+ init_completion(&f->fr_linv_done);
+ return 0;
+
+out_mr_err:
+ rc = PTR_ERR(f->fr_mr);
+ dprintk("RPC: %s: ib_alloc_mr status %i\n",
+ __func__, rc);
+ return rc;
+
+out_list_err:
+ rc = -ENOMEM;
+ dprintk("RPC: %s: sg allocation failure\n",
+ __func__);
+ ib_dereg_mr(f->fr_mr);
+ return rc;
+}
+
+static void
+__frwr_release(struct rpcrdma_mw *r)
+{
+ int rc;
+
+ rc = ib_dereg_mr(r->frmr.fr_mr);
+ if (rc)
+ pr_err("rpcrdma: final ib_dereg_mr for %p returned %i\n",
+ r, rc);
+ kfree(r->mw_sg);
+}
+
+static int
__frwr_reset_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
{
struct rpcrdma_frmr *f = &r->frmr;
@@ -165,52 +209,6 @@ __frwr_queue_recovery(struct rpcrdma_mw *r)
}

static int
-__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, unsigned int depth)
-{
- struct rpcrdma_frmr *f = &r->frmr;
- int rc;
-
- f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
- if (IS_ERR(f->fr_mr))
- goto out_mr_err;
-
- r->mw_sg = kcalloc(depth, sizeof(*r->mw_sg), GFP_KERNEL);
- if (!r->mw_sg)
- goto out_list_err;
-
- sg_init_table(r->mw_sg, depth);
-
- init_completion(&f->fr_linv_done);
-
- return 0;
-
-out_mr_err:
- rc = PTR_ERR(f->fr_mr);
- dprintk("RPC: %s: ib_alloc_mr status %i\n",
- __func__, rc);
- return rc;
-
-out_list_err:
- rc = -ENOMEM;
- dprintk("RPC: %s: sg allocation failure\n",
- __func__);
- ib_dereg_mr(f->fr_mr);
- return rc;
-}
-
-static void
-__frwr_release(struct rpcrdma_mw *r)
-{
- int rc;
-
- rc = ib_dereg_mr(r->frmr.fr_mr);
- if (rc)
- dprintk("RPC: %s: ib_dereg_mr status %i\n",
- __func__, rc);
- kfree(r->mw_sg);
-}
-
-static int
frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
struct rpcrdma_create_data_internal *cdata)
{


2016-06-15 03:15:44

by Chuck Lever III

Subject: [PATCH v2 03/24] xprtrdma: Create common scatterlist fields in rpcrdma_mw

Clean up: FMR is about to replace the rpcrdma_map_one code with
scatterlists. Move the scatterlist fields out of the FRWR-specific
union and into the generic part of rpcrdma_mw.
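
For reference, the generic part of rpcrdma_mw ends up looking like
this (a condensed view of the xprt_rdma.h hunk at the end of this
patch):

	struct rpcrdma_mw {
		struct list_head	mw_list;
		struct scatterlist	*mw_sg;
		int			mw_nents;
		enum dma_data_direction	mw_dir;
		union {
			struct rpcrdma_fmr	fmr;
			struct rpcrdma_frmr	frmr;
		};
		/* ... */
	};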

One minor change: -EIO is now returned if FRWR registration fails.
The RPC is terminated immediately, since the problem is likely due
to a software bug and retrying is unlikely to help.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/frwr_ops.c | 85 +++++++++++++++++++--------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 8 ++--
2 files changed, 46 insertions(+), 47 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c094754..f02ab80 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -125,17 +125,16 @@ __frwr_reset_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
}

static void
-__frwr_reset_and_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mw *mw)
+__frwr_reset_and_unmap(struct rpcrdma_mw *mw)
{
+ struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct rpcrdma_frmr *f = &mw->frmr;
int rc;

rc = __frwr_reset_mr(ia, mw);
- ib_dma_unmap_sg(ia->ri_device, f->fr_sg, f->fr_nents, f->fr_dir);
+ ib_dma_unmap_sg(ia->ri_device, mw->mw_sg, mw->mw_nents, mw->mw_dir);
if (rc)
return;
-
rpcrdma_put_mw(r_xprt, mw);
}

@@ -152,8 +151,7 @@ __frwr_recovery_worker(struct work_struct *work)
struct rpcrdma_mw *r = container_of(work, struct rpcrdma_mw,
mw_work);

- __frwr_reset_and_unmap(r->mw_xprt, r);
- return;
+ __frwr_reset_and_unmap(r);
}

/* A broken MR was discovered in a context that can't sleep.
@@ -167,8 +165,7 @@ __frwr_queue_recovery(struct rpcrdma_mw *r)
}

static int
-__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
- unsigned int depth)
+__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, unsigned int depth)
{
struct rpcrdma_frmr *f = &r->frmr;
int rc;
@@ -177,11 +174,11 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
if (IS_ERR(f->fr_mr))
goto out_mr_err;

- f->fr_sg = kcalloc(depth, sizeof(*f->fr_sg), GFP_KERNEL);
- if (!f->fr_sg)
+ r->mw_sg = kcalloc(depth, sizeof(*r->mw_sg), GFP_KERNEL);
+ if (!r->mw_sg)
goto out_list_err;

- sg_init_table(f->fr_sg, depth);
+ sg_init_table(r->mw_sg, depth);

init_completion(&f->fr_linv_done);

@@ -210,7 +207,7 @@ __frwr_release(struct rpcrdma_mw *r)
if (rc)
dprintk("RPC: %s: ib_dereg_mr status %i\n",
__func__, rc);
- kfree(r->frmr.fr_sg);
+ kfree(r->mw_sg);
}

static int
@@ -350,7 +347,6 @@ static int
frwr_op_init(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct ib_device *device = r_xprt->rx_ia.ri_device;
unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
int i;
@@ -372,7 +368,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)
if (!r)
return -ENOMEM;

- rc = __frwr_init(r, pd, device, depth);
+ rc = __frwr_init(r, pd, depth);
if (rc) {
kfree(r);
return rc;
@@ -386,7 +382,7 @@ frwr_op_init(struct rpcrdma_xprt *r_xprt)
return 0;
}

-/* Post a FAST_REG Work Request to register a memory region
+/* Post a REG_MR Work Request to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
static int
@@ -394,8 +390,6 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct ib_device *device = ia->ri_device;
- enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw;
struct rpcrdma_frmr *frmr;
@@ -421,15 +415,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,

if (nsegs > ia->ri_max_frmr_depth)
nsegs = ia->ri_max_frmr_depth;
-
for (i = 0; i < nsegs;) {
if (seg->mr_page)
- sg_set_page(&frmr->fr_sg[i],
+ sg_set_page(&mw->mw_sg[i],
seg->mr_page,
seg->mr_len,
offset_in_page(seg->mr_offset));
else
- sg_set_buf(&frmr->fr_sg[i], seg->mr_offset,
+ sg_set_buf(&mw->mw_sg[i], seg->mr_offset,
seg->mr_len);

++seg;
@@ -440,26 +433,20 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
- frmr->fr_nents = i;
- frmr->fr_dir = direction;
-
- dma_nents = ib_dma_map_sg(device, frmr->fr_sg, frmr->fr_nents, direction);
- if (!dma_nents) {
- pr_err("RPC: %s: failed to dma map sg %p sg_nents %u\n",
- __func__, frmr->fr_sg, frmr->fr_nents);
- return -ENOMEM;
- }
+ mw->mw_nents = i;
+ mw->mw_dir = rpcrdma_data_dir(writing);

- n = ib_map_mr_sg(mr, frmr->fr_sg, frmr->fr_nents, NULL, PAGE_SIZE);
- if (unlikely(n != frmr->fr_nents)) {
- pr_err("RPC: %s: failed to map mr %p (%u/%u)\n",
- __func__, frmr->fr_mr, n, frmr->fr_nents);
- rc = n < 0 ? n : -EINVAL;
- goto out_senderr;
- }
+ dma_nents = ib_dma_map_sg(ia->ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
+ if (!dma_nents)
+ goto out_dmamap_err;
+
+ n = ib_map_mr_sg(mr, mw->mw_sg, mw->mw_nents, NULL, PAGE_SIZE);
+ if (unlikely(n != mw->mw_nents))
+ goto out_mapmr_err;

dprintk("RPC: %s: Using frmr %p to map %u segments (%u bytes)\n",
- __func__, mw, frmr->fr_nents, mr->length);
+ __func__, mw, mw->mw_nents, mr->length);

key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
@@ -484,13 +471,25 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
seg1->rl_mw = mw;
seg1->mr_rkey = mr->rkey;
seg1->mr_base = mr->iova;
- seg1->mr_nsegs = frmr->fr_nents;
+ seg1->mr_nsegs = mw->mw_nents;
seg1->mr_len = mr->length;

- return frmr->fr_nents;
+ return mw->mw_nents;
+
+out_dmamap_err:
+ pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
+ mw->mw_sg, mw->mw_nents);
+ return -ENOMEM;
+
+out_mapmr_err:
+ pr_err("rpcrdma: failed to map mr %p (%u/%u)\n",
+ frmr->fr_mr, n, mw->mw_nents);
+ rc = n < 0 ? n : -EIO;
+ __frwr_queue_recovery(mw);
+ return rc;

out_senderr:
- dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc);
+ pr_err("rpcrdma: ib_post_send status %i\n", rc);
__frwr_queue_recovery(mw);
return rc;
}
@@ -582,8 +581,8 @@ unmap:
mw = seg->rl_mw;
seg->rl_mw = NULL;

- ib_dma_unmap_sg(ia->ri_device, f->fr_sg, f->fr_nents,
- f->fr_dir);
+ ib_dma_unmap_sg(ia->ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
rpcrdma_put_mw(r_xprt, mw);

i += seg->mr_nsegs;
@@ -630,7 +629,7 @@ frwr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
mw = seg->rl_mw;

if (sync)
- __frwr_reset_and_unmap(r_xprt, mw);
+ __frwr_reset_and_unmap(mw);
else
__frwr_queue_recovery(mw);

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 95cdc66..c53abd1 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -221,9 +221,6 @@ enum rpcrdma_frmr_state {
};

struct rpcrdma_frmr {
- struct scatterlist *fr_sg;
- int fr_nents;
- enum dma_data_direction fr_dir;
struct ib_mr *fr_mr;
struct ib_cqe fr_cqe;
enum rpcrdma_frmr_state fr_state;
@@ -240,13 +237,16 @@ struct rpcrdma_fmr {
};

struct rpcrdma_mw {
+ struct list_head mw_list;
+ struct scatterlist *mw_sg;
+ int mw_nents;
+ enum dma_data_direction mw_dir;
union {
struct rpcrdma_fmr fmr;
struct rpcrdma_frmr frmr;
};
struct work_struct mw_work;
struct rpcrdma_xprt *mw_xprt;
- struct list_head mw_list;
struct list_head mw_all;
};



2016-06-15 03:16:08

by Chuck Lever III

Subject: [PATCH v2 06/24] xprtrdma: Use scatterlist for DMA mapping and unmapping under FMR

The use of a scatterlist for handling DMA mapping and unmapping
was recently introduced in frwr_ops.c in commit 4143f34e01e9
("xprtrdma: Port to new memory registration API"). That commit did
not make a similar update to xprtrdma's FMR support because the
core ib_map_phys_fmr() and ib_unmap_fmr() APIs have not been changed
to take a scatterlist argument.

However, FMR still needs to do DMA mapping and unmapping. RDS, for
example, appears to use a scatterlist for this, then builds the DMA
address array for the ib_map_phys_fmr() call separately. SRP likewise
uses a scatterlist for DMA mapping. xprtrdma can do something similar.
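
In outline, that looks like the following (a sketch of the
fmr_op_map() hunk below; dma_pages points at the fm_physaddrs array
introduced earlier in this series):

	if (!ib_dma_map_sg(r_xprt->rx_ia.ri_device,
			   mw->mw_sg, mw->mw_nents, mw->mw_dir))
		goto out_dmamap_err;

	/* build the DMA address array that ib_map_phys_fmr() expects */
	for (i = 0; i < mw->mw_nents; i++)
		dma_pages[i] = sg_dma_address(&mw->mw_sg[i]);

	rc = ib_map_phys_fmr(mw->fmr.fm_mr, dma_pages, mw->mw_nents,
			     dma_pages[0]);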

This modernization is used immediately to properly defer DMA
unmapping during fmr_unmap_safe (addressing an existing FIXME). It
separates the DMA unmapping coordinates from the rl_segments array.
This array, being part of an rpcrdma_req, is always re-used
immediately when an RPC exits. A scatterlist is allocated in memory
independent of the rl_segments array, so it can be preserved
indefinitely (i.e., until the MR invalidation and DMA unmapping can
actually be done by a worker thread).

The FRWR and FMR DMA mapping code are slightly different from each
other now, and will diverge further when the "Check for holes" logic
can be removed from FRWR (support for SG_GAP MRs). So I chose not to
create helpers for the common-looking code.

Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method")
Suggested-by: Sagi Grimberg <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 96 ++++++++++++++++++++++++-----------------
1 file changed, 57 insertions(+), 39 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index a6a67b4..3044593 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -117,13 +117,28 @@ __fmr_unmap(struct rpcrdma_mw *mw)
}

static void
-__fmr_dma_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
+__fmr_dma_unmap(struct rpcrdma_mw *mw)
{
- struct ib_device *device = r_xprt->rx_ia.ri_device;
- int nsegs = seg->mr_nsegs;
+ struct rpcrdma_xprt *r_xprt = mw->mw_xprt;

- while (nsegs--)
- rpcrdma_unmap_one(device, seg++);
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
+ rpcrdma_put_mw(r_xprt, mw);
+}
+
+static void
+__fmr_reset_and_unmap(struct rpcrdma_mw *mw)
+{
+ int rc;
+
+ /* ORDER */
+ rc = __fmr_unmap(mw);
+ if (rc) {
+ pr_warn("rpcrdma: ib_unmap_fmr status %d, fmr %p orphaned\n",
+ rc, mw);
+ return;
+ }
+ __fmr_dma_unmap(mw);
}

static void
@@ -147,11 +162,9 @@ static void
__fmr_recovery_worker(struct work_struct *work)
{
struct rpcrdma_mw *mw = container_of(work, struct rpcrdma_mw,
- mw_work);
- struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
+ mw_work);

- __fmr_unmap(mw);
- rpcrdma_put_mw(r_xprt, mw);
+ __fmr_reset_and_unmap(mw);
return;
}

@@ -226,12 +239,10 @@ static int
fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int nsegs, bool writing)
{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct ib_device *device = ia->ri_device;
- enum dma_data_direction direction = rpcrdma_data_dir(writing);
struct rpcrdma_mr_seg *seg1 = seg;
int len, pageoff, i, rc;
struct rpcrdma_mw *mw;
+ u64 *dma_pages;

mw = seg1->rl_mw;
seg1->rl_mw = NULL;
@@ -253,8 +264,14 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (nsegs > RPCRDMA_MAX_FMR_SGES)
nsegs = RPCRDMA_MAX_FMR_SGES;
for (i = 0; i < nsegs;) {
- rpcrdma_map_one(device, seg, direction);
- mw->fmr.fm_physaddrs[i] = seg->mr_dma;
+ if (seg->mr_page)
+ sg_set_page(&mw->mw_sg[i],
+ seg->mr_page,
+ seg->mr_len,
+ offset_in_page(seg->mr_offset));
+ else
+ sg_set_buf(&mw->mw_sg[i], seg->mr_offset,
+ seg->mr_len);
len += seg->mr_len;
++seg;
++i;
@@ -263,25 +280,37 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
break;
}
+ mw->mw_nents = i;
+ mw->mw_dir = rpcrdma_data_dir(writing);
+
+ if (!ib_dma_map_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir))
+ goto out_dmamap_err;

- rc = ib_map_phys_fmr(mw->fmr.fm_mr, mw->fmr.fm_physaddrs,
- i, seg1->mr_dma);
+ for (i = 0, dma_pages = mw->fmr.fm_physaddrs; i < mw->mw_nents; i++)
+ dma_pages[i] = sg_dma_address(&mw->mw_sg[i]);
+ rc = ib_map_phys_fmr(mw->fmr.fm_mr, dma_pages, mw->mw_nents,
+ dma_pages[0]);
if (rc)
goto out_maperr;

seg1->rl_mw = mw;
seg1->mr_rkey = mw->fmr.fm_mr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = i;
+ seg1->mr_base = dma_pages[0] + pageoff;
+ seg1->mr_nsegs = mw->mw_nents;
seg1->mr_len = len;
- return i;
+ return mw->mw_nents;
+
+out_dmamap_err:
+ pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
+ mw->mw_sg, mw->mw_nents);
+ return -ENOMEM;

out_maperr:
- dprintk("RPC: %s: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
- __func__, len, (unsigned long long)seg1->mr_dma,
- pageoff, i, rc);
- while (i--)
- rpcrdma_unmap_one(device, --seg);
+ pr_err("rpcrdma: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
+ len, (unsigned long long)dma_pages[0],
+ pageoff, mw->mw_nents, rc);
+ __fmr_dma_unmap(mw);
return rc;
}

@@ -326,8 +355,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
mw = seg->rl_mw;

list_del_init(&mw->fmr.fm_mr->list);
- __fmr_dma_unmap(r_xprt, seg);
- rpcrdma_put_mw(r_xprt, seg->rl_mw);
+ __fmr_dma_unmap(mw);

i += seg->mr_nsegs;
seg->mr_nsegs = 0;
@@ -339,11 +367,6 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)

/* Use a slow, safe mechanism to invalidate all memory regions
* that were registered for "req".
- *
- * In the asynchronous case, DMA unmapping occurs first here
- * because the rpcrdma_mr_seg is released immediately after this
- * call. It's contents won't be available in __fmr_dma_unmap later.
- * FIXME.
*/
static void
fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
@@ -357,15 +380,10 @@ fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
seg = &req->rl_segments[i];
mw = seg->rl_mw;

- if (sync) {
- /* ORDER */
- __fmr_unmap(mw);
- __fmr_dma_unmap(r_xprt, seg);
- rpcrdma_put_mw(r_xprt, mw);
- } else {
- __fmr_dma_unmap(r_xprt, seg);
+ if (sync)
+ __fmr_reset_and_unmap(mw);
+ else
__fmr_queue_recovery(mw);
- }

i += seg->mr_nsegs;
seg->mr_nsegs = 0;


2016-06-15 03:16:10

by Chuck Lever III

Subject: [PATCH v2 05/24] xprtrdma: Rename fields in rpcrdma_fmr

Clean up: Use the same naming convention used in other
RPC/RDMA-related data structures.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 36 ++++++++++++++++++------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++--
2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index b3f8699..a6a67b4 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -74,9 +74,9 @@ __fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)
.page_shift = PAGE_SHIFT
};

- mw->fmr.physaddrs = kcalloc(RPCRDMA_MAX_FMR_SGES,
- sizeof(u64), GFP_KERNEL);
- if (!mw->fmr.physaddrs)
+ mw->fmr.fm_physaddrs = kcalloc(RPCRDMA_MAX_FMR_SGES,
+ sizeof(u64), GFP_KERNEL);
+ if (!mw->fmr.fm_physaddrs)
goto out_free;

mw->mw_sg = kcalloc(RPCRDMA_MAX_FMR_SGES,
@@ -86,21 +86,21 @@ __fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)

sg_init_table(mw->mw_sg, RPCRDMA_MAX_FMR_SGES);

- mw->fmr.fmr = ib_alloc_fmr(pd, RPCRDMA_FMR_ACCESS_FLAGS,
- &fmr_attr);
- if (IS_ERR(mw->fmr.fmr))
+ mw->fmr.fm_mr = ib_alloc_fmr(pd, RPCRDMA_FMR_ACCESS_FLAGS,
+ &fmr_attr);
+ if (IS_ERR(mw->fmr.fm_mr))
goto out_fmr_err;

- INIT_LIST_HEAD(&mw->fmr.fmr->list);
+ INIT_LIST_HEAD(&mw->fmr.fm_mr->list);
return 0;

out_fmr_err:
dprintk("RPC: %s: ib_alloc_fmr returned %ld\n", __func__,
- PTR_ERR(mw->fmr.fmr));
+ PTR_ERR(mw->fmr.fm_mr));

out_free:
kfree(mw->mw_sg);
- kfree(mw->fmr.physaddrs);
+ kfree(mw->fmr.fm_physaddrs);
return -ENOMEM;
}

@@ -110,9 +110,9 @@ __fmr_unmap(struct rpcrdma_mw *mw)
LIST_HEAD(l);
int rc;

- list_add(&mw->fmr.fmr->list, &l);
+ list_add(&mw->fmr.fm_mr->list, &l);
rc = ib_unmap_fmr(&l);
- list_del_init(&mw->fmr.fmr->list);
+ list_del_init(&mw->fmr.fm_mr->list);
return rc;
}

@@ -131,10 +131,10 @@ __fmr_release(struct rpcrdma_mw *r)
{
int rc;

- kfree(r->fmr.physaddrs);
+ kfree(r->fmr.fm_physaddrs);
kfree(r->mw_sg);

- rc = ib_dealloc_fmr(r->fmr.fmr);
+ rc = ib_dealloc_fmr(r->fmr.fm_mr);
if (rc)
pr_err("rpcrdma: final ib_dealloc_fmr for %p returned %i\n",
r, rc);
@@ -254,7 +254,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
nsegs = RPCRDMA_MAX_FMR_SGES;
for (i = 0; i < nsegs;) {
rpcrdma_map_one(device, seg, direction);
- mw->fmr.physaddrs[i] = seg->mr_dma;
+ mw->fmr.fm_physaddrs[i] = seg->mr_dma;
len += seg->mr_len;
++seg;
++i;
@@ -264,13 +264,13 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
break;
}

- rc = ib_map_phys_fmr(mw->fmr.fmr, mw->fmr.physaddrs,
+ rc = ib_map_phys_fmr(mw->fmr.fm_mr, mw->fmr.fm_physaddrs,
i, seg1->mr_dma);
if (rc)
goto out_maperr;

seg1->rl_mw = mw;
- seg1->mr_rkey = mw->fmr.fmr->rkey;
+ seg1->mr_rkey = mw->fmr.fm_mr->rkey;
seg1->mr_base = seg1->mr_dma + pageoff;
seg1->mr_nsegs = i;
seg1->mr_len = len;
@@ -310,7 +310,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
seg = &req->rl_segments[i];
mw = seg->rl_mw;

- list_add_tail(&mw->fmr.fmr->list, &unmap_list);
+ list_add_tail(&mw->fmr.fm_mr->list, &unmap_list);

i += seg->mr_nsegs;
}
@@ -325,7 +325,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
seg = &req->rl_segments[i];
mw = seg->rl_mw;

- list_del_init(&mw->fmr.fmr->list);
+ list_del_init(&mw->fmr.fm_mr->list);
__fmr_dma_unmap(r_xprt, seg);
rpcrdma_put_mw(r_xprt, seg->rl_mw);

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c53abd1..04696c0 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -232,8 +232,8 @@ struct rpcrdma_frmr {
};

struct rpcrdma_fmr {
- struct ib_fmr *fmr;
- u64 *physaddrs;
+ struct ib_fmr *fm_mr;
+ u64 *fm_physaddrs;
};

struct rpcrdma_mw {


2016-06-15 03:16:17

by Chuck Lever III

Subject: [PATCH v2 07/24] xprtrdma: Refactor MR recovery work queues

I found that commit ead3f26e359e ("xprtrdma: Add ro_unmap_safe
memreg method"), which introduces ro_unmap_safe, never wired up the
FMR recovery worker.

The FMR and FRWR recovery work queues both do the same thing.
Instead of setting up separate work queues for this, schedule a
delayed worker to deal with stale MRs, since recovering MRs is not
performance-critical.
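
The deferral path itself is small (a sketch of
rpcrdma_defer_mr_recovery() as added in verbs.c below):

	/* the caller may not sleep: park the MW and kick the worker */
	spin_lock(&buf->rb_recovery_lock);
	list_add(&mw->mw_list, &buf->rb_stale_mrs);
	spin_unlock(&buf->rb_recovery_lock);

	schedule_delayed_work(&buf->rb_recovery_worker, 0);

	/* rpcrdma_mr_recovery_worker() then drains rb_stale_mrs and
	 * invokes ->ro_recover_mr on each MW in process context.
	 */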

Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 147 ++++++++++++++++-----------------------
net/sunrpc/xprtrdma/frwr_ops.c | 82 +++++-----------------
net/sunrpc/xprtrdma/transport.c | 16 +---
net/sunrpc/xprtrdma/verbs.c | 43 +++++++++++
net/sunrpc/xprtrdma/xprt_rdma.h | 13 ++-
5 files changed, 135 insertions(+), 166 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 3044593..43bfb5d 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -19,13 +19,6 @@
* verb (fmr_op_unmap).
*/

-/* Transport recovery
- *
- * After a transport reconnect, fmr_op_map re-uses the MR already
- * allocated for the RPC, but generates a fresh rkey then maps the
- * MR again. This process is synchronous.
- */
-
#include "xprt_rdma.h"

#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@@ -41,30 +34,6 @@ enum {
IB_ACCESS_REMOTE_READ,
};

-static struct workqueue_struct *fmr_recovery_wq;
-
-#define FMR_RECOVERY_WQ_FLAGS (WQ_UNBOUND)
-
-int
-fmr_alloc_recovery_wq(void)
-{
- fmr_recovery_wq = alloc_workqueue("fmr_recovery", WQ_UNBOUND, 0);
- return !fmr_recovery_wq ? -ENOMEM : 0;
-}
-
-void
-fmr_destroy_recovery_wq(void)
-{
- struct workqueue_struct *wq;
-
- if (!fmr_recovery_wq)
- return;
-
- wq = fmr_recovery_wq;
- fmr_recovery_wq = NULL;
- destroy_workqueue(wq);
-}
-
static int
__fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)
{
@@ -117,65 +86,55 @@ __fmr_unmap(struct rpcrdma_mw *mw)
}

static void
-__fmr_dma_unmap(struct rpcrdma_mw *mw)
-{
- struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
-
- ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
- mw->mw_sg, mw->mw_nents, mw->mw_dir);
- rpcrdma_put_mw(r_xprt, mw);
-}
-
-static void
-__fmr_reset_and_unmap(struct rpcrdma_mw *mw)
-{
- int rc;
-
- /* ORDER */
- rc = __fmr_unmap(mw);
- if (rc) {
- pr_warn("rpcrdma: ib_unmap_fmr status %d, fmr %p orphaned\n",
- rc, mw);
- return;
- }
- __fmr_dma_unmap(mw);
-}
-
-static void
__fmr_release(struct rpcrdma_mw *r)
{
+ LIST_HEAD(unmap_list);
int rc;

kfree(r->fmr.fm_physaddrs);
kfree(r->mw_sg);

+ /* In case this one was left mapped, try to unmap it
+ * to prevent dealloc_fmr from failing with EBUSY
+ */
+ rc = __fmr_unmap(r);
+ if (rc)
+ pr_err("rpcrdma: final ib_unmap_fmr for %p failed %i\n",
+ r, rc);
+
rc = ib_dealloc_fmr(r->fmr.fm_mr);
if (rc)
pr_err("rpcrdma: final ib_dealloc_fmr for %p returned %i\n",
r, rc);
}

-/* Deferred reset of a single FMR. Generate a fresh rkey by
- * replacing the MR. There's no recovery if this fails.
+/* Reset of a single FMR.
+ *
+ * There's no recovery if this fails. The FMR is abandoned, but
+ * remains in rb_all. It will be cleaned up when the transport is
+ * destroyed.
*/
static void
-__fmr_recovery_worker(struct work_struct *work)
+fmr_op_recover_mr(struct rpcrdma_mw *mw)
{
- struct rpcrdma_mw *mw = container_of(work, struct rpcrdma_mw,
- mw_work);
+ struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
+ int rc;

- __fmr_reset_and_unmap(mw);
- return;
-}
+ /* ORDER: invalidate first */
+ rc = __fmr_unmap(mw);

-/* A broken MR was discovered in a context that can't sleep.
- * Defer recovery to the recovery worker.
- */
-static void
-__fmr_queue_recovery(struct rpcrdma_mw *mw)
-{
- INIT_WORK(&mw->mw_work, __fmr_recovery_worker);
- queue_work(fmr_recovery_wq, &mw->mw_work);
+ /* ORDER: then DMA unmap */
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
+ if (rc) {
+ pr_err("rpcrdma: FMR reset status %d, %p orphaned\n",
+ rc, mw);
+ r_xprt->rx_stats.mrs_orphaned++;
+ return;
+ }
+
+ rpcrdma_put_mw(r_xprt, mw);
+ r_xprt->rx_stats.mrs_recovered++;
}

static int
@@ -246,16 +205,11 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,

mw = seg1->rl_mw;
seg1->rl_mw = NULL;
- if (!mw) {
- mw = rpcrdma_get_mw(r_xprt);
- if (!mw)
- return -ENOMEM;
- } else {
- /* this is a retransmit; generate a fresh rkey */
- rc = __fmr_unmap(mw);
- if (rc)
- return rc;
- }
+ if (mw)
+ rpcrdma_defer_mr_recovery(mw);
+ mw = rpcrdma_get_mw(r_xprt);
+ if (!mw)
+ return -ENOMEM;

pageoff = offset_in_page(seg1->mr_offset);
seg1->mr_offset -= pageoff; /* start of page */
@@ -310,7 +264,7 @@ out_maperr:
pr_err("rpcrdma: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
len, (unsigned long long)dma_pages[0],
pageoff, mw->mw_nents, rc);
- __fmr_dma_unmap(mw);
+ rpcrdma_defer_mr_recovery(mw);
return rc;
}

@@ -333,7 +287,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
/* ORDER: Invalidate all of the req's MRs first
*
* ib_unmap_fmr() is slow, so use a single call instead
- * of one call per mapped MR.
+ * of one call per mapped FMR.
*/
for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
seg = &req->rl_segments[i];
@@ -345,7 +299,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
}
rc = ib_unmap_fmr(&unmap_list);
if (rc)
- pr_warn("%s: ib_unmap_fmr failed (%i)\n", __func__, rc);
+ goto out_reset;

/* ORDER: Now DMA unmap all of the req's MRs, and return
* them to the free MW list.
@@ -355,7 +309,9 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
mw = seg->rl_mw;

list_del_init(&mw->fmr.fm_mr->list);
- __fmr_dma_unmap(mw);
+ ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
+ mw->mw_sg, mw->mw_nents, mw->mw_dir);
+ rpcrdma_put_mw(r_xprt, mw);

i += seg->mr_nsegs;
seg->mr_nsegs = 0;
@@ -363,6 +319,20 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
}

req->rl_nchunks = 0;
+ return;
+
+out_reset:
+ pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc);
+
+ for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
+ seg = &req->rl_segments[i];
+ mw = seg->rl_mw;
+
+ list_del_init(&mw->fmr.fm_mr->list);
+ fmr_op_recover_mr(mw);
+
+ i += seg->mr_nsegs;
+ }
}

/* Use a slow, safe mechanism to invalidate all memory regions
@@ -381,9 +351,9 @@ fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
mw = seg->rl_mw;

if (sync)
- __fmr_reset_and_unmap(mw);
+ fmr_op_recover_mr(mw);
else
- __fmr_queue_recovery(mw);
+ rpcrdma_defer_mr_recovery(mw);

i += seg->mr_nsegs;
seg->mr_nsegs = 0;
@@ -408,6 +378,7 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap_sync = fmr_op_unmap_sync,
.ro_unmap_safe = fmr_op_unmap_safe,
+ .ro_recover_mr = fmr_op_recover_mr,
.ro_open = fmr_op_open,
.ro_maxpages = fmr_op_maxpages,
.ro_init = fmr_op_init,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 9cd60bf0..cbb2d05 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -73,31 +73,6 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif

-static struct workqueue_struct *frwr_recovery_wq;
-
-#define FRWR_RECOVERY_WQ_FLAGS (WQ_UNBOUND | WQ_MEM_RECLAIM)
-
-int
-frwr_alloc_recovery_wq(void)
-{
- frwr_recovery_wq = alloc_workqueue("frwr_recovery",
- FRWR_RECOVERY_WQ_FLAGS, 0);
- return !frwr_recovery_wq ? -ENOMEM : 0;
-}
-
-void
-frwr_destroy_recovery_wq(void)
-{
- struct workqueue_struct *wq;
-
- if (!frwr_recovery_wq)
- return;
-
- wq = frwr_recovery_wq;
- frwr_recovery_wq = NULL;
- destroy_workqueue(wq);
-}
-
static int
__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, unsigned int depth)
{
@@ -168,8 +143,14 @@ __frwr_reset_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
return 0;
}

+/* Reset of a single FRMR. Generate a fresh rkey by replacing the MR.
+ *
+ * There's no recovery if this fails. The FRMR is abandoned, but
+ * remains in rb_all. It will be cleaned up when the transport is
+ * destroyed.
+ */
static void
-__frwr_reset_and_unmap(struct rpcrdma_mw *mw)
+frwr_op_recover_mr(struct rpcrdma_mw *mw)
{
struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
@@ -177,35 +158,15 @@ __frwr_reset_and_unmap(struct rpcrdma_mw *mw)

rc = __frwr_reset_mr(ia, mw);
ib_dma_unmap_sg(ia->ri_device, mw->mw_sg, mw->mw_nents, mw->mw_dir);
- if (rc)
+ if (rc) {
+ pr_err("rpcrdma: FRMR reset status %d, %p orphaned\n",
+ rc, mw);
+ r_xprt->rx_stats.mrs_orphaned++;
return;
- rpcrdma_put_mw(r_xprt, mw);
-}
-
-/* Deferred reset of a single FRMR. Generate a fresh rkey by
- * replacing the MR.
- *
- * There's no recovery if this fails. The FRMR is abandoned, but
- * remains in rb_all. It will be cleaned up when the transport is
- * destroyed.
- */
-static void
-__frwr_recovery_worker(struct work_struct *work)
-{
- struct rpcrdma_mw *r = container_of(work, struct rpcrdma_mw,
- mw_work);
-
- __frwr_reset_and_unmap(r);
-}
+ }

-/* A broken MR was discovered in a context that can't sleep.
- * Defer recovery to the recovery worker.
- */
-static void
-__frwr_queue_recovery(struct rpcrdma_mw *r)
-{
- INIT_WORK(&r->mw_work, __frwr_recovery_worker);
- queue_work(frwr_recovery_wq, &r->mw_work);
+ rpcrdma_put_mw(r_xprt, mw);
+ r_xprt->rx_stats.mrs_recovered++;
}

static int
@@ -401,7 +362,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
seg1->rl_mw = NULL;
do {
if (mw)
- __frwr_queue_recovery(mw);
+ rpcrdma_defer_mr_recovery(mw);
mw = rpcrdma_get_mw(r_xprt);
if (!mw)
return -ENOMEM;
@@ -483,12 +444,11 @@ out_mapmr_err:
pr_err("rpcrdma: failed to map mr %p (%u/%u)\n",
frmr->fr_mr, n, mw->mw_nents);
rc = n < 0 ? n : -EIO;
- __frwr_queue_recovery(mw);
+ rpcrdma_defer_mr_recovery(mw);
return rc;

out_senderr:
- pr_err("rpcrdma: ib_post_send status %i\n", rc);
- __frwr_queue_recovery(mw);
+ rpcrdma_defer_mr_recovery(mw);
return rc;
}

@@ -627,9 +587,9 @@ frwr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
mw = seg->rl_mw;

if (sync)
- __frwr_reset_and_unmap(mw);
+ frwr_op_recover_mr(mw);
else
- __frwr_queue_recovery(mw);
+ rpcrdma_defer_mr_recovery(mw);

i += seg->mr_nsegs;
seg->mr_nsegs = 0;
@@ -642,9 +602,6 @@ frwr_op_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_mw *r;

- /* Ensure stale MWs for "buf" are no longer in flight */
- flush_workqueue(frwr_recovery_wq);
-
while (!list_empty(&buf->rb_all)) {
r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
list_del(&r->mw_all);
@@ -657,6 +614,7 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap_sync = frwr_op_unmap_sync,
.ro_unmap_safe = frwr_op_unmap_safe,
+ .ro_recover_mr = frwr_op_recover_mr,
.ro_open = frwr_op_open,
.ro_maxpages = frwr_op_maxpages,
.ro_init = frwr_op_init,
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 99d2e5b..4c8e7f1 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -660,7 +660,7 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
xprt->stat.bad_xids,
xprt->stat.req_u,
xprt->stat.bklog_u);
- seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %lu %lu %lu %lu\n",
+ seq_printf(seq, "%lu %lu %lu %llu %llu %llu %llu %lu %lu %lu %lu ",
r_xprt->rx_stats.read_chunk_count,
r_xprt->rx_stats.write_chunk_count,
r_xprt->rx_stats.reply_chunk_count,
@@ -672,6 +672,9 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
r_xprt->rx_stats.failed_marshal_count,
r_xprt->rx_stats.bad_reply_count,
r_xprt->rx_stats.nomsg_call_count);
+ seq_printf(seq, "%lu %lu\n",
+ r_xprt->rx_stats.mrs_recovered,
+ r_xprt->rx_stats.mrs_orphaned);
}

static int
@@ -741,7 +744,6 @@ void xprt_rdma_cleanup(void)
__func__, rc);

rpcrdma_destroy_wq();
- frwr_destroy_recovery_wq();

rc = xprt_unregister_transport(&xprt_rdma_bc);
if (rc)
@@ -753,20 +755,13 @@ int xprt_rdma_init(void)
{
int rc;

- rc = frwr_alloc_recovery_wq();
- if (rc)
- return rc;
-
rc = rpcrdma_alloc_wq();
- if (rc) {
- frwr_destroy_recovery_wq();
+ if (rc)
return rc;
- }

rc = xprt_register_transport(&xprt_rdma);
if (rc) {
rpcrdma_destroy_wq();
- frwr_destroy_recovery_wq();
return rc;
}

@@ -774,7 +769,6 @@ int xprt_rdma_init(void)
if (rc) {
xprt_unregister_transport(&xprt_rdma);
rpcrdma_destroy_wq();
- frwr_destroy_recovery_wq();
return rc;
}

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b044d98a..77a371d 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -777,6 +777,41 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
ib_drain_qp(ia->ri_id->qp);
}

+static void
+rpcrdma_mr_recovery_worker(struct work_struct *work)
+{
+ struct rpcrdma_buffer *buf = container_of(work, struct rpcrdma_buffer,
+ rb_recovery_worker.work);
+ struct rpcrdma_mw *mw;
+
+ spin_lock(&buf->rb_recovery_lock);
+ while (!list_empty(&buf->rb_stale_mrs)) {
+ mw = list_first_entry(&buf->rb_stale_mrs,
+ struct rpcrdma_mw, mw_list);
+ list_del_init(&mw->mw_list);
+ spin_unlock(&buf->rb_recovery_lock);
+
+ dprintk("RPC: %s: recovering MR %p\n", __func__, mw);
+ mw->mw_xprt->rx_ia.ri_ops->ro_recover_mr(mw);
+
+ spin_lock(&buf->rb_recovery_lock);
+ };
+ spin_unlock(&buf->rb_recovery_lock);
+}
+
+void
+rpcrdma_defer_mr_recovery(struct rpcrdma_mw *mw)
+{
+ struct rpcrdma_xprt *r_xprt = mw->mw_xprt;
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+
+ spin_lock(&buf->rb_recovery_lock);
+ list_add(&mw->mw_list, &buf->rb_stale_mrs);
+ spin_unlock(&buf->rb_recovery_lock);
+
+ schedule_delayed_work(&buf->rb_recovery_worker, 0);
+}
+
struct rpcrdma_req *
rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
{
@@ -837,8 +872,12 @@ rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)

buf->rb_max_requests = r_xprt->rx_data.max_requests;
buf->rb_bc_srv_max_requests = 0;
- spin_lock_init(&buf->rb_lock);
atomic_set(&buf->rb_credits, 1);
+ spin_lock_init(&buf->rb_lock);
+ spin_lock_init(&buf->rb_recovery_lock);
+ INIT_LIST_HEAD(&buf->rb_stale_mrs);
+ INIT_DELAYED_WORK(&buf->rb_recovery_worker,
+ rpcrdma_mr_recovery_worker);

rc = ia->ri_ops->ro_init(r_xprt);
if (rc)
@@ -923,6 +962,8 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_ia *ia = rdmab_to_ia(buf);

+ cancel_delayed_work_sync(&buf->rb_recovery_worker);
+
while (!list_empty(&buf->rb_recv_bufs)) {
struct rpcrdma_rep *rep;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 04696c0..4e03037 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -245,7 +245,6 @@ struct rpcrdma_mw {
struct rpcrdma_fmr fmr;
struct rpcrdma_frmr frmr;
};
- struct work_struct mw_work;
struct rpcrdma_xprt *mw_xprt;
struct list_head mw_all;
};
@@ -341,6 +340,10 @@ struct rpcrdma_buffer {
struct list_head rb_allreqs;

u32 rb_bc_max_requests;
+
+ spinlock_t rb_recovery_lock; /* protect rb_stale_mrs */
+ struct list_head rb_stale_mrs;
+ struct delayed_work rb_recovery_worker;
};
#define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)

@@ -387,6 +390,8 @@ struct rpcrdma_stats {
unsigned long bad_reply_count;
unsigned long nomsg_call_count;
unsigned long bcall_count;
+ unsigned long mrs_recovered;
+ unsigned long mrs_orphaned;
};

/*
@@ -400,6 +405,7 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_req *);
void (*ro_unmap_safe)(struct rpcrdma_xprt *,
struct rpcrdma_req *, bool);
+ void (*ro_recover_mr)(struct rpcrdma_mw *);
int (*ro_open)(struct rpcrdma_ia *,
struct rpcrdma_ep *,
struct rpcrdma_create_data_internal *);
@@ -477,6 +483,8 @@ void rpcrdma_buffer_put(struct rpcrdma_req *);
void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);

+void rpcrdma_defer_mr_recovery(struct rpcrdma_mw *);
+
struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
size_t, gfp_t);
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
@@ -484,9 +492,6 @@ void rpcrdma_free_regbuf(struct rpcrdma_ia *,

int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);

-int frwr_alloc_recovery_wq(void);
-void frwr_destroy_recovery_wq(void);
-
int rpcrdma_alloc_wq(void);
void rpcrdma_destroy_wq(void);



2016-06-15 03:16:24

by Chuck Lever III

Subject: [PATCH v2 08/24] xprtrdma: Do not leak an MW during a DMA map failure

Based on code audit.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 1 +
net/sunrpc/xprtrdma/frwr_ops.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 43bfb5d..eb42d7f 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -258,6 +258,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
out_dmamap_err:
pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
mw->mw_sg, mw->mw_nents);
+ rpcrdma_defer_mr_recovery(mw);
return -ENOMEM;

out_maperr:
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index cbb2d05..c9ead2b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -438,6 +438,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
out_dmamap_err:
pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
mw->mw_sg, mw->mw_nents);
+ rpcrdma_defer_mr_recovery(mw);
return -ENOMEM;

out_mapmr_err:


2016-06-15 03:16:32

by Chuck Lever III

Subject: [PATCH v2 09/24] xprtrdma: Remove ALLPHYSICAL memory registration mode

No HCA or RNIC in the kernel tree requires the use of ALLPHYSICAL.

ALLPHYSICAL advertises in the clear on the network fabric an R_key
that is good for all of the client's memory. No known exploit
exists, but theoretically any user on the server can use that R_key
on the client's QP to read or update any part of the client's memory.

ALLPHYSICAL exposes the client to server bugs, including:
o base/bounds errors causing data outside the I/O buffer to be
accessed
o RDMA access after reply causing data corruption and/or integrity
failure

ALLPHYSICAL can't protect application memory regions from server
update after a local signal or soft timeout has terminated an RPC.

ALLPHYSICAL chunks are no larger than a page. Special cases to
handle small chunks and long chunk lists have been a source of
implementation complexity and bugs.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/Makefile | 2 -
net/sunrpc/xprtrdma/physical_ops.c | 122 ------------------------------------
net/sunrpc/xprtrdma/verbs.c | 15 ----
net/sunrpc/xprtrdma/xprt_rdma.h | 5 -
4 files changed, 2 insertions(+), 142 deletions(-)
delete mode 100644 net/sunrpc/xprtrdma/physical_ops.c

diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile
index dc9f3b5..ef19fa4 100644
--- a/net/sunrpc/xprtrdma/Makefile
+++ b/net/sunrpc/xprtrdma/Makefile
@@ -1,7 +1,7 @@
obj-$(CONFIG_SUNRPC_XPRT_RDMA) += rpcrdma.o

rpcrdma-y := transport.o rpc_rdma.o verbs.o \
- fmr_ops.o frwr_ops.o physical_ops.o \
+ fmr_ops.o frwr_ops.o \
svc_rdma.o svc_rdma_backchannel.o svc_rdma_transport.o \
svc_rdma_marshal.o svc_rdma_sendto.o svc_rdma_recvfrom.o \
module.o
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
deleted file mode 100644
index 3750596..0000000
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ /dev/null
@@ -1,122 +0,0 @@
-/*
- * Copyright (c) 2015 Oracle. All rights reserved.
- * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved.
- */
-
-/* No-op chunk preparation. All client memory is pre-registered.
- * Sometimes referred to as ALLPHYSICAL mode.
- *
- * Physical registration is simple because all client memory is
- * pre-registered and never deregistered. This mode is good for
- * adapter bring up, but is considered not safe: the server is
- * trusted not to abuse its access to client memory not involved
- * in RDMA I/O.
- */
-
-#include "xprt_rdma.h"
-
-#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
-# define RPCDBG_FACILITY RPCDBG_TRANS
-#endif
-
-static int
-physical_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
- struct rpcrdma_create_data_internal *cdata)
-{
- struct ib_mr *mr;
-
- /* Obtain an rkey to use for RPC data payloads.
- */
- mr = ib_get_dma_mr(ia->ri_pd,
- IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_READ);
- if (IS_ERR(mr)) {
- pr_err("%s: ib_get_dma_mr for failed with %lX\n",
- __func__, PTR_ERR(mr));
- return -ENOMEM;
- }
- ia->ri_dma_mr = mr;
-
- rpcrdma_set_max_header_sizes(ia, cdata, min_t(unsigned int,
- RPCRDMA_MAX_DATA_SEGS,
- RPCRDMA_MAX_HDR_SEGS));
- return 0;
-}
-
-/* PHYSICAL memory registration conveys one page per chunk segment.
- */
-static size_t
-physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
-{
- return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
- RPCRDMA_MAX_HDR_SEGS);
-}
-
-static int
-physical_op_init(struct rpcrdma_xprt *r_xprt)
-{
- return 0;
-}
-
-/* The client's physical memory is already exposed for
- * remote access via RDMA READ or RDMA WRITE.
- */
-static int
-physical_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
- int nsegs, bool writing)
-{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
-
- rpcrdma_map_one(ia->ri_device, seg, rpcrdma_data_dir(writing));
- seg->mr_rkey = ia->ri_dma_mr->rkey;
- seg->mr_base = seg->mr_dma;
- return 1;
-}
-
-/* DMA unmap all memory regions that were mapped for "req".
- */
-static void
-physical_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
-{
- struct ib_device *device = r_xprt->rx_ia.ri_device;
- unsigned int i;
-
- for (i = 0; req->rl_nchunks; --req->rl_nchunks)
- rpcrdma_unmap_one(device, &req->rl_segments[i++]);
-}
-
-/* Use a slow, safe mechanism to invalidate all memory regions
- * that were registered for "req".
- *
- * For physical memory registration, there is no good way to
- * fence a single MR that has been advertised to the server. The
- * client has already handed the server an R_key that cannot be
- * invalidated and is shared by all MRs on this connection.
- * Tearing down the PD might be the only safe choice, but it's
- * not clear that a freshly acquired DMA R_key would be different
- * than the one used by the PD that was just destroyed.
- * FIXME.
- */
-static void
-physical_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
- bool sync)
-{
- physical_op_unmap_sync(r_xprt, req);
-}
-
-static void
-physical_op_destroy(struct rpcrdma_buffer *buf)
-{
-}
-
-const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops = {
- .ro_map = physical_op_map,
- .ro_unmap_sync = physical_op_unmap_sync,
- .ro_unmap_safe = physical_op_unmap_safe,
- .ro_open = physical_op_open,
- .ro_maxpages = physical_op_maxpages,
- .ro_init = physical_op_init,
- .ro_destroy = physical_op_destroy,
- .ro_displayname = "physical",
-};
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 77a371d..5ee98e9 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -379,8 +379,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
struct rpcrdma_ia *ia = &xprt->rx_ia;
int rc;

- ia->ri_dma_mr = NULL;
-
ia->ri_id = rpcrdma_create_id(xprt, ia, addr);
if (IS_ERR(ia->ri_id)) {
rc = PTR_ERR(ia->ri_id);
@@ -418,9 +416,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
case RPCRDMA_FRMR:
ia->ri_ops = &rpcrdma_frwr_memreg_ops;
break;
- case RPCRDMA_ALLPHYSICAL:
- ia->ri_ops = &rpcrdma_physical_memreg_ops;
- break;
case RPCRDMA_MTHCAFMR:
ia->ri_ops = &rpcrdma_fmr_memreg_ops;
break;
@@ -585,8 +580,6 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
out2:
ib_free_cq(sendcq);
out1:
- if (ia->ri_dma_mr)
- ib_dereg_mr(ia->ri_dma_mr);
return rc;
}

@@ -600,8 +593,6 @@ out1:
void
rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
{
- int rc;
-
dprintk("RPC: %s: entering, connected is %d\n",
__func__, ep->rep_connected);

@@ -615,12 +606,6 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)

ib_free_cq(ep->rep_attr.recv_cq);
ib_free_cq(ep->rep_attr.send_cq);
-
- if (ia->ri_dma_mr) {
- rc = ib_dereg_mr(ia->ri_dma_mr);
- dprintk("RPC: %s: ib_dereg_mr returned %i\n",
- __func__, rc);
- }
}

/*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 4e03037..bcb168e 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -68,7 +68,6 @@ struct rpcrdma_ia {
struct ib_device *ri_device;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
- struct ib_mr *ri_dma_mr;
struct completion ri_done;
int ri_async_rc;
unsigned int ri_max_frmr_depth;
@@ -269,8 +268,7 @@ struct rpcrdma_mw {
* NOTES:
* o RPCRDMA_MAX_SEGS is the max number of addressible chunk elements we
* marshal. The number needed varies depending on the iov lists that
- * are passed to us, the memory registration mode we are in, and if
- * physical addressing is used, the layout.
+ * are passed to us and the memory registration mode we are in.
*/

struct rpcrdma_mr_seg { /* chunk descriptors */
@@ -417,7 +415,6 @@ struct rpcrdma_memreg_ops {

extern const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops;
extern const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops;
-extern const struct rpcrdma_memreg_ops rpcrdma_physical_memreg_ops;

/*
* RPCRDMA transport -- encapsulates the structures above for


2016-06-15 03:16:40

by Chuck Lever III


Clean up: ALLPHYSICAL is gone and FMR has been converted to use
scatterlists. There are no more users of these functions.

This patch shrinks the size of struct rpcrdma_req by about 3500
bytes on x86_64. There is one of these structs for each RPC credit
(128 credits per transport connection).
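
Back-of-the-envelope, those two figures combine as follows (a rough
check only; the constants are taken straight from the paragraph
above):

#include <stdio.h>

int main(void)
{
        const unsigned long per_req_saving = 3500;  /* bytes, from above */
        const unsigned long credits = 128;      /* RPC credits per connection */

        /* 3500 bytes x 128 credits is roughly 448 kB per transport */
        printf("per-transport saving: ~%lu bytes\n",
               per_req_saving * credits);
        return 0;
}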

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 8 --------
net/sunrpc/xprtrdma/xprt_rdma.h | 36 ------------------------------------
2 files changed, 44 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 5ee98e9..b80e767f 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1086,14 +1086,6 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
* Wrappers for internal-use kmalloc memory registration, used by buffer code.
*/

-void
-rpcrdma_mapping_error(struct rpcrdma_mr_seg *seg)
-{
- dprintk("RPC: map_one: offset %p iova %llx len %zu\n",
- seg->mr_offset,
- (unsigned long long)seg->mr_dma, seg->mr_dmalen);
-}
-
/**
* rpcrdma_alloc_regbuf - kmalloc and register memory for SEND/RECV buffers
* @ia: controlling rpcrdma_ia
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index bcb168e..f1b6f2f 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -277,9 +277,6 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
u32 mr_rkey; /* registration result */
u32 mr_len; /* length of chunk or segment */
int mr_nsegs; /* number of segments in chunk or 0 */
- enum dma_data_direction mr_dir; /* segment mapping direction */
- dma_addr_t mr_dma; /* segment mapping address */
- size_t mr_dmalen; /* segment mapping length */
struct page *mr_page; /* owning page, if any */
char *mr_offset; /* kva if no page, else offset */
};
@@ -496,45 +493,12 @@ void rpcrdma_destroy_wq(void);
* Wrappers for chunk registration, shared by read/write chunk code.
*/

-void rpcrdma_mapping_error(struct rpcrdma_mr_seg *);
-
static inline enum dma_data_direction
rpcrdma_data_dir(bool writing)
{
return writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
}

-static inline void
-rpcrdma_map_one(struct ib_device *device, struct rpcrdma_mr_seg *seg,
- enum dma_data_direction direction)
-{
- seg->mr_dir = direction;
- seg->mr_dmalen = seg->mr_len;
-
- if (seg->mr_page)
- seg->mr_dma = ib_dma_map_page(device,
- seg->mr_page, offset_in_page(seg->mr_offset),
- seg->mr_dmalen, seg->mr_dir);
- else
- seg->mr_dma = ib_dma_map_single(device,
- seg->mr_offset,
- seg->mr_dmalen, seg->mr_dir);
-
- if (ib_dma_mapping_error(device, seg->mr_dma))
- rpcrdma_mapping_error(seg);
-}
-
-static inline void
-rpcrdma_unmap_one(struct ib_device *device, struct rpcrdma_mr_seg *seg)
-{
- if (seg->mr_page)
- ib_dma_unmap_page(device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
- else
- ib_dma_unmap_single(device,
- seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
-}
-
/*
* RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c
*/


2016-06-15 03:16:48

by Chuck Lever III


Not having an rpcrdma_rep at call_allocate time can be a problem.
It means that send_request can't post a receive buffer to catch
the RPC's reply. Possible consequences are RPC timeouts or even
transport deadlock.

Instead of allowing an RPC to proceed if an rpcrdma_rep is
not available, return NULL to force call_allocate to wait and
try again.
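
As a rough user-space sketch of the "all or nothing" policy (the
names buf_pool and get_pair are invented; the real logic lives in
rpcrdma_buffer_get()), a request is handed out only when a reply
buffer is also available, otherwise the caller simply gets NULL and
retries later:

#include <stddef.h>
#include <pthread.h>

struct entry { struct entry *next; };

struct buf_pool {
        pthread_mutex_t lock;
        struct entry *free_reqs;
        struct entry *free_reps;
};

/* Hand out a request only if a reply buffer is also available;
 * otherwise leave both free lists untouched and return NULL so
 * the caller waits and tries again.
 */
static struct entry *get_pair(struct buf_pool *p, struct entry **rep)
{
        struct entry *req = NULL;

        pthread_mutex_lock(&p->lock);
        if (p->free_reqs && p->free_reps) {
                req = p->free_reqs;
                p->free_reqs = req->next;
                *rep = p->free_reps;
                p->free_reps = (*rep)->next;
        }
        pthread_mutex_unlock(&p->lock);
        return req;
}

int main(void)
{
        struct buf_pool pool = { .lock = PTHREAD_MUTEX_INITIALIZER };
        struct entry *rep = NULL;

        /* empty pool: the caller gets NULL and must wait, then retry */
        return get_pair(&pool, &rep) == NULL ? 0 : 1;
}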

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/verbs.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b80e767f..8b8abd6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1004,8 +1004,6 @@ rpcrdma_put_mw(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mw *mw)

/*
* Get a set of request/reply buffers.
- *
- * Reply buffer (if available) is attached to send buffer upon return.
*/
struct rpcrdma_req *
rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
@@ -1024,13 +1022,13 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)

out_reqbuf:
spin_unlock(&buffers->rb_lock);
- pr_warn("RPC: %s: out of request buffers\n", __func__);
+ pr_warn("rpcrdma: out of request buffers (%p)\n", buffers);
return NULL;
out_repbuf:
+ list_add(&req->rl_free, &buffers->rb_send_bufs);
spin_unlock(&buffers->rb_lock);
- pr_warn("RPC: %s: out of reply buffers\n", __func__);
- req->rl_reply = NULL;
- return req;
+ pr_warn("rpcrdma: out of reply buffers (%p)\n", buffers);
+ return NULL;
}

/*


2016-06-15 03:16:56

by Chuck Lever III


Commit c93c62231cf5 ("xprtrdma: Disconnect on registration failure")
added a disconnect for some RPC marshaling failures. This is needed
only in a handful of cases, but it was triggering for simple stuff
like temporary resource shortages. Try to straighten this out.

Fix up the lower layers so they don't return -ENOMEM or other error
codes that the RPC client's FSM doesn't explicitly recognize.

Also fix up the places in the send_request path that do want a
disconnect. For example, when ib_post_send or ib_post_recv fails,
that indicates a send or receive queue resource miscalculation. Such
a miscalculation should be rare and points to a software bug, but
xprtrdma can recover: disconnect to reset the transport and start
over.
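
A hypothetical caller-side view of that contract (illustrative only;
the real dispatch is done by the generic RPC client state machine,
and the function name after_send_request is invented):

#include <errno.h>

enum next_step { SEND_DONE, RETRY_LATER, RECONNECT, FAIL_RPC };

/* Map a ->send_request result to the caller's recovery action */
static enum next_step after_send_request(int rc)
{
        switch (rc) {
        case 0:
                return SEND_DONE;       /* request is on the wire */
        case -ENOBUFS:
                return RETRY_LATER;     /* transient resource shortage */
        case -ENOTCONN:
                return RECONNECT;       /* run connect logic, then resend */
        case -EIO:
        default:
                return FAIL_RPC;        /* permanent failure, don't retransmit */
        }
}

int main(void)
{
        /* e.g. a transient -ENOBUFS maps to a later retry, not an error */
        return after_send_request(-ENOBUFS) == RETRY_LATER ? 0 : 1;
}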

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 6 +++---
net/sunrpc/xprtrdma/frwr_ops.c | 13 +++++++------
net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
net/sunrpc/xprtrdma/transport.c | 20 +++++++++++++++-----
net/sunrpc/xprtrdma/verbs.c | 22 +++++++++++++---------
5 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index eb42d7f..1ee2b10 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -209,7 +209,7 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
rpcrdma_defer_mr_recovery(mw);
mw = rpcrdma_get_mw(r_xprt);
if (!mw)
- return -ENOMEM;
+ return -ENOBUFS;

pageoff = offset_in_page(seg1->mr_offset);
seg1->mr_offset -= pageoff; /* start of page */
@@ -259,14 +259,14 @@ out_dmamap_err:
pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
mw->mw_sg, mw->mw_nents);
rpcrdma_defer_mr_recovery(mw);
- return -ENOMEM;
+ return -EIO;

out_maperr:
pr_err("rpcrdma: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n",
len, (unsigned long long)dma_pages[0],
pageoff, mw->mw_nents, rc);
rpcrdma_defer_mr_recovery(mw);
- return rc;
+ return -EIO;
}

/* Invalidate all memory regions that were registered for "req".
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c9ead2b..e77e40a 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -365,7 +365,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
rpcrdma_defer_mr_recovery(mw);
mw = rpcrdma_get_mw(r_xprt);
if (!mw)
- return -ENOMEM;
+ return -ENOBUFS;
} while (mw->frmr.fr_state != FRMR_IS_INVALID);
frmr = &mw->frmr;
frmr->fr_state = FRMR_IS_VALID;
@@ -439,18 +439,18 @@ out_dmamap_err:
pr_err("rpcrdma: failed to dma map sg %p sg_nents %u\n",
mw->mw_sg, mw->mw_nents);
rpcrdma_defer_mr_recovery(mw);
- return -ENOMEM;
+ return -EIO;

out_mapmr_err:
pr_err("rpcrdma: failed to map mr %p (%u/%u)\n",
frmr->fr_mr, n, mw->mw_nents);
- rc = n < 0 ? n : -EIO;
rpcrdma_defer_mr_recovery(mw);
- return rc;
+ return -EIO;

out_senderr:
+ pr_err("rpcrdma: FRMR registration ib_post_send returned %i\n", rc);
rpcrdma_defer_mr_recovery(mw);
- return rc;
+ return -ENOTCONN;
}

static struct ib_send_wr *
@@ -552,7 +552,8 @@ unmap:
return;

reset_mrs:
- pr_warn("%s: ib_post_send failed %i\n", __func__, rc);
+ pr_err("rpcrdma: FRMR invalidate ib_post_send returned %i\n", rc);
+ rdma_disconnect(ia->ri_id);

/* Find and reset the MRs in the LOCAL_INV WRs that did not
* get posted. This is synchronous, and slow.
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 35a8109..77e002f 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -251,7 +251,7 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
/* alloc the pagelist for receiving buffer */
ppages[p] = alloc_page(GFP_ATOMIC);
if (!ppages[p])
- return -ENOMEM;
+ return -EAGAIN;
}
seg[n].mr_page = ppages[p];
seg[n].mr_offset = (void *)(unsigned long) page_base;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4c8e7f1..be4dd2c 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -558,7 +558,6 @@ out_sendbuf:

out_fail:
rpcrdma_buffer_put(req);
- r_xprt->rx_stats.failed_marshal_count++;
return NULL;
}

@@ -590,8 +589,19 @@ xprt_rdma_free(void *buffer)
rpcrdma_buffer_put(req);
}

-/*
+/**
+ * xprt_rdma_send_request - marshal and send an RPC request
+ * @task: RPC task with an RPC message in rq_snd_buf
+ *
+ * Return values:
+ * 0: The request has been sent
+ * ENOTCONN: Caller needs to invoke connect logic then call again
+ * ENOBUFS: Call again later to send the request
+ * EIO: A permanent error occurred. The request was not sent,
+ * and don't try it again
+ *
* send_request invokes the meat of RPC RDMA. It must do the following:
+ *
* 1. Marshal the RPC request into an RPC RDMA request, which means
* putting a header in front of data, and creating IOVs for RDMA
* from those in the request.
@@ -600,7 +610,6 @@ xprt_rdma_free(void *buffer)
* the request (rpcrdma_ep_post).
* 4. No partial sends are possible in the RPC-RDMA protocol (as in UDP).
*/
-
static int
xprt_rdma_send_request(struct rpc_task *task)
{
@@ -630,11 +639,12 @@ xprt_rdma_send_request(struct rpc_task *task)
return 0;

failed_marshal:
- r_xprt->rx_stats.failed_marshal_count++;
dprintk("RPC: %s: rpcrdma_marshal_req failed, status %i\n",
__func__, rc);
if (rc == -EIO)
- return -EIO;
+ r_xprt->rx_stats.failed_marshal_count++;
+ if (rc != -ENOTCONN)
+ return rc;
drop_connection:
xprt_disconnect_done(xprt);
return -ENOTCONN; /* implies disconnect */
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8b8abd6..35f2176 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1166,7 +1166,7 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
if (rep) {
rc = rpcrdma_ep_post_recv(ia, ep, rep);
if (rc)
- goto out;
+ return rc;
req->rl_reply = NULL;
}

@@ -1191,10 +1191,12 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,

rc = ib_post_send(ia->ri_id->qp, &send_wr, &send_wr_fail);
if (rc)
- dprintk("RPC: %s: ib_post_send returned %i\n", __func__,
- rc);
-out:
- return rc;
+ goto out_postsend_err;
+ return 0;
+
+out_postsend_err:
+ pr_err("rpcrdma: RDMA Send ib_post_send returned %i\n", rc);
+ return -ENOTCONN;
}

/*
@@ -1219,11 +1221,13 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
DMA_BIDIRECTIONAL);

rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
-
if (rc)
- dprintk("RPC: %s: ib_post_recv returned %i\n", __func__,
- rc);
- return rc;
+ goto out_postrecv;
+ return 0;
+
+out_postrecv:
+ pr_err("rpcrdma: ib_post_recv returned %i\n", rc);
+ return -ENOTCONN;
}

/**


2016-06-15 03:17:04

by Chuck Lever III


Clean up, based on code audit: Remove the possibility that the
chunk list XDR encoders can return zero, which would be interpreted
as a NULL.
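
The hazard is easiest to see with the ERR_PTR()/IS_ERR() convention
the chunk encoders use. This stand-alone sketch uses simplified
user-space copies of those macros (assuming the usual MAX_ERRNO of
4095): a zero return becomes ERR_PTR(0), which is just NULL and is
not caught by IS_ERR():

#include <assert.h>
#include <stdio.h>

#define MAX_ERRNO       4095
#define ERR_PTR(err)    ((void *)(long)(err))
#define IS_ERR(ptr)     ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

int main(void)
{
        void *p = ERR_PTR(0);   /* what a zero return from ro_map becomes */

        assert(p == NULL);      /* looks like an ordinary NULL pointer ... */
        assert(!IS_ERR(p));     /* ... and IS_ERR() does not flag it */
        printf("ERR_PTR(0) is NULL and passes IS_ERR(); the caller would\n"
               "dereference it later instead of failing cleanly\n");
        return 0;
}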

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 2 ++
net/sunrpc/xprtrdma/frwr_ops.c | 2 ++
net/sunrpc/xprtrdma/rpc_rdma.c | 6 +++---
3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 1ee2b10..e86fe83 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -236,6 +236,8 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
}
mw->mw_nents = i;
mw->mw_dir = rpcrdma_data_dir(writing);
+ if (i == 0)
+ goto out_dmamap_err;

if (!ib_dma_map_sg(r_xprt->rx_ia.ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir))
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e77e40a..d7e63bd 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -394,6 +394,8 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
}
mw->mw_nents = i;
mw->mw_dir = rpcrdma_data_dir(writing);
+ if (i == 0)
+ goto out_dmamap_err;

dma_nents = ib_dma_map_sg(ia->ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 77e002f..8fde0ab 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -329,7 +329,7 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,

do {
n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, false);
- if (n <= 0)
+ if (n < 0)
return ERR_PTR(n);

*iptr++ = xdr_one; /* item present */
@@ -397,7 +397,7 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
nchunks = 0;
do {
n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, true);
- if (n <= 0)
+ if (n < 0)
return ERR_PTR(n);

iptr = xdr_encode_rdma_segment(iptr, seg);
@@ -462,7 +462,7 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
nchunks = 0;
do {
n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, true);
- if (n <= 0)
+ if (n < 0)
return ERR_PTR(n);

iptr = xdr_encode_rdma_segment(iptr, seg);


2016-06-15 03:17:12

by Chuck Lever III


Frequent MR list exhaustion can impact I/O throughput, so enough MRs
are always created during transport set-up to prevent running out.
This means more MRs are created than most workloads need.

Commit 94f58c58c0b4 ("xprtrdma: Allow Read list and Reply chunk
simultaneously") introduced support for sending two chunk lists per
RPC, which consumes more MRs per RPC.

Instead of trying to provision more MRs, introduce a mechanism for
allocating MRs on demand. A few MRs are allocated during transport
set-up to kick things off.

This significantly reduces the average number of MRs per transport
while allowing the MR count to grow for workloads or devices that
need more MRs.

FRWR with mlx4 allocated almost 400 MRs per transport before this
patch. Now it starts with 16.
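
As a rough illustration of the growth policy (not the kernel code:
the patch defers the refill to a workqueue and makes the caller
retry, while this sketch refills synchronously; mr_pool, pool_get,
and pool_refill are invented names), a pool that starts empty and
grows in batches of 16 only when it is found exhausted looks like:

#include <stdlib.h>

struct mr { struct mr *next; };

struct mr_pool {
        struct mr *free_list;
        unsigned long allocated;        /* mirrors the new mrs_allocated stat */
};

static void pool_refill(struct mr_pool *pool, unsigned int count)
{
        while (count--) {
                struct mr *mr = calloc(1, sizeof(*mr));

                if (!mr)
                        break;          /* a partial refill is acceptable */
                mr->next = pool->free_list;
                pool->free_list = mr;
                pool->allocated++;
        }
}

static struct mr *pool_get(struct mr_pool *pool)
{
        struct mr *mr = pool->free_list;

        if (!mr) {
                pool_refill(pool, 16);  /* grow only under pressure */
                mr = pool->free_list;
        }
        if (mr)
                pool->free_list = mr->next;
        return mr;                      /* NULL: caller must retry later */
}

int main(void)
{
        struct mr_pool pool = { NULL, 0 };

        /* the first request finds the pool empty and triggers a refill */
        return pool_get(&pool) ? 0 : 1;
}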

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 64 +++-------------------------
net/sunrpc/xprtrdma/frwr_ops.c | 64 +++-------------------------
net/sunrpc/xprtrdma/transport.c | 5 +-
net/sunrpc/xprtrdma/verbs.c | 90 ++++++++++++++++++++++++++++++++++++---
net/sunrpc/xprtrdma/xprt_rdma.h | 7 ++-
5 files changed, 106 insertions(+), 124 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index e86fe83..840479f 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -35,7 +35,7 @@ enum {
};

static int
-__fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)
+fmr_op_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *mw)
{
static struct ib_fmr_attr fmr_attr = {
.max_pages = RPCRDMA_MAX_FMR_SGES,
@@ -55,7 +55,7 @@ __fmr_init(struct rpcrdma_mw *mw, struct ib_pd *pd)

sg_init_table(mw->mw_sg, RPCRDMA_MAX_FMR_SGES);

- mw->fmr.fm_mr = ib_alloc_fmr(pd, RPCRDMA_FMR_ACCESS_FLAGS,
+ mw->fmr.fm_mr = ib_alloc_fmr(ia->ri_pd, RPCRDMA_FMR_ACCESS_FLAGS,
&fmr_attr);
if (IS_ERR(mw->fmr.fm_mr))
goto out_fmr_err;
@@ -86,7 +86,7 @@ __fmr_unmap(struct rpcrdma_mw *mw)
}

static void
-__fmr_release(struct rpcrdma_mw *r)
+fmr_op_release_mr(struct rpcrdma_mw *r)
{
LIST_HEAD(unmap_list);
int rc;
@@ -106,13 +106,11 @@ __fmr_release(struct rpcrdma_mw *r)
if (rc)
pr_err("rpcrdma: final ib_dealloc_fmr for %p returned %i\n",
r, rc);
+
+ kfree(r);
}

/* Reset of a single FMR.
- *
- * There's no recovery if this fails. The FMR is abandoned, but
- * remains in rb_all. It will be cleaned up when the transport is
- * destroyed.
*/
static void
fmr_op_recover_mr(struct rpcrdma_mw *mw)
@@ -156,41 +154,6 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES);
}

-static int
-fmr_op_init(struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
- struct rpcrdma_mw *r;
- int i, rc;
-
- spin_lock_init(&buf->rb_mwlock);
- INIT_LIST_HEAD(&buf->rb_mws);
- INIT_LIST_HEAD(&buf->rb_all);
-
- i = max_t(int, RPCRDMA_MAX_DATA_SEGS / RPCRDMA_MAX_FMR_SGES, 1);
- i += 2; /* head + tail */
- i *= buf->rb_max_requests; /* one set for each RPC slot */
- dprintk("RPC: %s: initalizing %d FMRs\n", __func__, i);
-
- while (i--) {
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (!r)
- return -ENOMEM;
-
- rc = __fmr_init(r, pd);
- if (rc) {
- kfree(r);
- return rc;
- }
-
- r->mw_xprt = r_xprt;
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
- return 0;
-}
-
/* Use the ib_map_phys_fmr() verb to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -364,19 +327,6 @@ fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
}
}

-static void
-fmr_op_destroy(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- __fmr_release(r);
- kfree(r);
- }
-}
-
const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_map = fmr_op_map,
.ro_unmap_sync = fmr_op_unmap_sync,
@@ -384,7 +334,7 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_recover_mr = fmr_op_recover_mr,
.ro_open = fmr_op_open,
.ro_maxpages = fmr_op_maxpages,
- .ro_init = fmr_op_init,
- .ro_destroy = fmr_op_destroy,
+ .ro_init_mr = fmr_op_init_mr,
+ .ro_release_mr = fmr_op_release_mr,
.ro_displayname = "fmr",
};
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index d7e63bd..f603c3a 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -74,12 +74,13 @@
#endif

static int
-__frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, unsigned int depth)
+frwr_op_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
{
+ unsigned int depth = ia->ri_max_frmr_depth;
struct rpcrdma_frmr *f = &r->frmr;
int rc;

- f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
+ f->fr_mr = ib_alloc_mr(ia->ri_pd, IB_MR_TYPE_MEM_REG, depth);
if (IS_ERR(f->fr_mr))
goto out_mr_err;

@@ -106,7 +107,7 @@ out_list_err:
}

static void
-__frwr_release(struct rpcrdma_mw *r)
+frwr_op_release_mr(struct rpcrdma_mw *r)
{
int rc;

@@ -115,6 +116,7 @@ __frwr_release(struct rpcrdma_mw *r)
pr_err("rpcrdma: final ib_dereg_mr for %p returned %i\n",
r, rc);
kfree(r->mw_sg);
+ kfree(r);
}

static int
@@ -302,45 +304,6 @@ frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
complete_all(&frmr->fr_linv_done);
}

-static int
-frwr_op_init(struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- unsigned int depth = r_xprt->rx_ia.ri_max_frmr_depth;
- struct ib_pd *pd = r_xprt->rx_ia.ri_pd;
- int i;
-
- spin_lock_init(&buf->rb_mwlock);
- INIT_LIST_HEAD(&buf->rb_mws);
- INIT_LIST_HEAD(&buf->rb_all);
-
- i = max_t(int, RPCRDMA_MAX_DATA_SEGS / depth, 1);
- i += 2; /* head + tail */
- i *= buf->rb_max_requests; /* one set for each RPC slot */
- dprintk("RPC: %s: initalizing %d FRMRs\n", __func__, i);
-
- while (i--) {
- struct rpcrdma_mw *r;
- int rc;
-
- r = kzalloc(sizeof(*r), GFP_KERNEL);
- if (!r)
- return -ENOMEM;
-
- rc = __frwr_init(r, pd, depth);
- if (rc) {
- kfree(r);
- return rc;
- }
-
- r->mw_xprt = r_xprt;
- list_add(&r->mw_list, &buf->rb_mws);
- list_add(&r->mw_all, &buf->rb_all);
- }
-
- return 0;
-}
-
/* Post a REG_MR Work Request to register a memory region
* for remote access via RDMA READ or RDMA WRITE.
*/
@@ -601,19 +564,6 @@ frwr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
}
}

-static void
-frwr_op_destroy(struct rpcrdma_buffer *buf)
-{
- struct rpcrdma_mw *r;
-
- while (!list_empty(&buf->rb_all)) {
- r = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
- list_del(&r->mw_all);
- __frwr_release(r);
- kfree(r);
- }
-}
-
const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_map = frwr_op_map,
.ro_unmap_sync = frwr_op_unmap_sync,
@@ -621,7 +571,7 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_recover_mr = frwr_op_recover_mr,
.ro_open = frwr_op_open,
.ro_maxpages = frwr_op_maxpages,
- .ro_init = frwr_op_init,
- .ro_destroy = frwr_op_destroy,
+ .ro_init_mr = frwr_op_init_mr,
+ .ro_release_mr = frwr_op_release_mr,
.ro_displayname = "frwr",
};
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index be4dd2c..b1dd42a 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -682,9 +682,10 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
r_xprt->rx_stats.failed_marshal_count,
r_xprt->rx_stats.bad_reply_count,
r_xprt->rx_stats.nomsg_call_count);
- seq_printf(seq, "%lu %lu\n",
+ seq_printf(seq, "%lu %lu %lu\n",
r_xprt->rx_stats.mrs_recovered,
- r_xprt->rx_stats.mrs_orphaned);
+ r_xprt->rx_stats.mrs_orphaned,
+ r_xprt->rx_stats.mrs_allocated);
}

static int
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 35f2176..4a7a712 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -797,6 +797,55 @@ rpcrdma_defer_mr_recovery(struct rpcrdma_mw *mw)
schedule_delayed_work(&buf->rb_recovery_worker, 0);
}

+static void
+rpcrdma_create_mrs(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
+ struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ unsigned int count;
+ LIST_HEAD(free);
+ LIST_HEAD(all);
+
+ for (count = 0; count < 16; count++) {
+ struct rpcrdma_mw *mw;
+ int rc;
+
+ mw = kzalloc(sizeof(*mw), GFP_KERNEL);
+ if (!mw)
+ break;
+
+ rc = ia->ri_ops->ro_init_mr(ia, mw);
+ if (rc) {
+ kfree(mw);
+ break;
+ }
+
+ mw->mw_xprt = r_xprt;
+
+ list_add(&mw->mw_list, &free);
+ list_add(&mw->mw_all, &all);
+ }
+
+ spin_lock(&buf->rb_mwlock);
+ list_splice(&free, &buf->rb_mws);
+ list_splice(&all, &buf->rb_all);
+ r_xprt->rx_stats.mrs_allocated += count;
+ spin_unlock(&buf->rb_mwlock);
+
+ dprintk("RPC: %s: created %u MRs\n", __func__, count);
+}
+
+static void
+rpcrdma_mr_refresh_worker(struct work_struct *work)
+{
+ struct rpcrdma_buffer *buf = container_of(work, struct rpcrdma_buffer,
+ rb_refresh_worker.work);
+ struct rpcrdma_xprt *r_xprt = container_of(buf, struct rpcrdma_xprt,
+ rx_buf);
+
+ rpcrdma_create_mrs(r_xprt);
+}
+
struct rpcrdma_req *
rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
{
@@ -852,21 +901,23 @@ int
rpcrdma_buffer_create(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
int i, rc;

buf->rb_max_requests = r_xprt->rx_data.max_requests;
buf->rb_bc_srv_max_requests = 0;
atomic_set(&buf->rb_credits, 1);
+ spin_lock_init(&buf->rb_mwlock);
spin_lock_init(&buf->rb_lock);
spin_lock_init(&buf->rb_recovery_lock);
+ INIT_LIST_HEAD(&buf->rb_mws);
+ INIT_LIST_HEAD(&buf->rb_all);
INIT_LIST_HEAD(&buf->rb_stale_mrs);
+ INIT_DELAYED_WORK(&buf->rb_refresh_worker,
+ rpcrdma_mr_refresh_worker);
INIT_DELAYED_WORK(&buf->rb_recovery_worker,
rpcrdma_mr_recovery_worker);

- rc = ia->ri_ops->ro_init(r_xprt);
- if (rc)
- goto out;
+ rpcrdma_create_mrs(r_xprt);

INIT_LIST_HEAD(&buf->rb_send_bufs);
INIT_LIST_HEAD(&buf->rb_allreqs);
@@ -946,6 +997,10 @@ void
rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_ia *ia = rdmab_to_ia(buf);
+ struct rpcrdma_xprt *r_xprt = container_of(buf, struct rpcrdma_xprt,
+ rx_buf);
+ struct rpcrdma_mw *mw;
+ unsigned int count;

cancel_delayed_work_sync(&buf->rb_recovery_worker);

@@ -970,7 +1025,21 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
}
spin_unlock(&buf->rb_reqslock);

- ia->ri_ops->ro_destroy(buf);
+ count = 0;
+ spin_lock(&buf->rb_mwlock);
+ while (!list_empty(&buf->rb_all)) {
+ mw = list_entry(buf->rb_all.next, struct rpcrdma_mw, mw_all);
+ list_del(&mw->mw_all);
+
+ spin_unlock(&buf->rb_mwlock);
+ ia->ri_ops->ro_release_mr(mw);
+ count++;
+ spin_lock(&buf->rb_mwlock);
+ }
+ spin_unlock(&buf->rb_mwlock);
+ r_xprt->rx_stats.mrs_allocated = 0;
+
+ dprintk("RPC: %s: released %u MRs\n", __func__, count);
}

struct rpcrdma_mw *
@@ -988,8 +1057,17 @@ rpcrdma_get_mw(struct rpcrdma_xprt *r_xprt)
spin_unlock(&buf->rb_mwlock);

if (!mw)
- pr_err("RPC: %s: no MWs available\n", __func__);
+ goto out_nomws;
return mw;
+
+out_nomws:
+ dprintk("RPC: %s: no MWs available\n", __func__);
+ schedule_delayed_work(&buf->rb_refresh_worker, 0);
+
+ /* Allow the reply handler and refresh worker to run */
+ cond_resched();
+
+ return NULL;
}

void
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index f1b6f2f..0bde4c0 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -339,6 +339,7 @@ struct rpcrdma_buffer {
spinlock_t rb_recovery_lock; /* protect rb_stale_mrs */
struct list_head rb_stale_mrs;
struct delayed_work rb_recovery_worker;
+ struct delayed_work rb_refresh_worker;
};
#define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)

@@ -387,6 +388,7 @@ struct rpcrdma_stats {
unsigned long bcall_count;
unsigned long mrs_recovered;
unsigned long mrs_orphaned;
+ unsigned long mrs_allocated;
};

/*
@@ -405,8 +407,9 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_ep *,
struct rpcrdma_create_data_internal *);
size_t (*ro_maxpages)(struct rpcrdma_xprt *);
- int (*ro_init)(struct rpcrdma_xprt *);
- void (*ro_destroy)(struct rpcrdma_buffer *);
+ int (*ro_init_mr)(struct rpcrdma_ia *,
+ struct rpcrdma_mw *);
+ void (*ro_release_mr)(struct rpcrdma_mw *);
const char *ro_displayname;
};



2016-06-15 03:17:20

by Chuck Lever III


Instead of leaving orphaned MRs to be released when the transport
is destroyed, release them immediately. The MR free list can now be
replenished if it becomes exhausted.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 19 +++++++++++++------
net/sunrpc/xprtrdma/frwr_ops.c | 19 +++++++++++++------
2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 840479f..2ad117f 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -124,15 +124,22 @@ fmr_op_recover_mr(struct rpcrdma_mw *mw)
/* ORDER: then DMA unmap */
ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
- if (rc) {
- pr_err("rpcrdma: FMR reset status %d, %p orphaned\n",
- rc, mw);
- r_xprt->rx_stats.mrs_orphaned++;
- return;
- }
+ if (rc)
+ goto out_release;

rpcrdma_put_mw(r_xprt, mw);
r_xprt->rx_stats.mrs_recovered++;
+ return;
+
+out_release:
+ pr_err("rpcrdma: FMR reset failed (%d), %p released\n", rc, mw);
+ r_xprt->rx_stats.mrs_orphaned++;
+
+ spin_lock(&r_xprt->rx_buf.rb_mwlock);
+ list_del(&mw->mw_all);
+ spin_unlock(&r_xprt->rx_buf.rb_mwlock);
+
+ fmr_op_release_mr(mw);
}

static int
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index f603c3a..38de90c 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -160,15 +160,22 @@ frwr_op_recover_mr(struct rpcrdma_mw *mw)

rc = __frwr_reset_mr(ia, mw);
ib_dma_unmap_sg(ia->ri_device, mw->mw_sg, mw->mw_nents, mw->mw_dir);
- if (rc) {
- pr_err("rpcrdma: FRMR reset status %d, %p orphaned\n",
- rc, mw);
- r_xprt->rx_stats.mrs_orphaned++;
- return;
- }
+ if (rc)
+ goto out_release;

rpcrdma_put_mw(r_xprt, mw);
r_xprt->rx_stats.mrs_recovered++;
+ return;
+
+out_release:
+ pr_err("rpcrdma: FRMR reset failed %d, %p release\n", rc, mw);
+ r_xprt->rx_stats.mrs_orphaned++;
+
+ spin_lock(&r_xprt->rx_buf.rb_mwlock);
+ list_del(&mw->mw_all);
+ spin_unlock(&r_xprt->rx_buf.rb_mwlock);
+
+ frwr_op_release_mr(mw);
}

static int


2016-06-15 03:17:29

by Chuck Lever III


Instead of placing registered MWs sparsely into the rl_segments
array, place these MWs on a per-req list.

ro_unmap_{sync,safe} can then simply pull those MWs off the list
instead of walking through the array.

This change significantly reduces the size of struct rpcrdma_req
by removing the rl_mw, mr_rkey, mr_base, and mr_nsegs fields from
every array element.

As an additional clean-up, chunk co-ordinates are returned in the
"*mw" output argument so they are no longer needed in every
array element.
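
A simplified sketch of the new bookkeeping, using a plain
singly-linked list in place of the kernel's list_head (req_track and
req_untrack_all are invented names): registration records the MW on
the request, and invalidation walks only that list, so no array scan
and no per-segment rl_mw pointer are needed.

#include <stddef.h>

struct mw {
        struct mw *next;        /* stand-in for mw_list linkage */
        unsigned int handle;    /* stand-in for mw_handle */
};

struct req {
        struct mw *registered;  /* stand-in for rl_registered */
};

/* ro_map time: record the MW on the request as it is registered */
static void req_track(struct req *req, struct mw *mw)
{
        mw->next = req->registered;
        req->registered = mw;
}

/* ro_unmap time: hand every tracked MW to "put" and empty the list */
static void req_untrack_all(struct req *req, void (*put)(struct mw *))
{
        struct mw *mw, *next;

        for (mw = req->registered; mw; mw = next) {
                next = mw->next;
                put(mw);
        }
        req->registered = NULL;
}

static void put_noop(struct mw *mw)
{
        (void)mw;               /* a real put would return the MW to its pool */
}

int main(void)
{
        struct req req = { NULL };
        struct mw a = { NULL, 1 }, b = { NULL, 2 };

        req_track(&req, &a);
        req_track(&req, &b);
        req_untrack_all(&req, put_noop);
        return req.registered != NULL;
}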

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/fmr_ops.c | 65 ++++++++++---------------------
net/sunrpc/xprtrdma/frwr_ops.c | 72 ++++++++++++-----------------------
net/sunrpc/xprtrdma/rpc_rdma.c | 81 ++++++++++++++++++---------------------
net/sunrpc/xprtrdma/transport.c | 3 +
net/sunrpc/xprtrdma/verbs.c | 1
net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++--
6 files changed, 94 insertions(+), 139 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 2ad117f..8d23317 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -91,6 +91,10 @@ fmr_op_release_mr(struct rpcrdma_mw *r)
LIST_HEAD(unmap_list);
int rc;

+ /* Ensure MW is not on any rl_registered list */
+ if (!list_empty(&r->mw_list))
+ list_del(&r->mw_list);
+
kfree(r->fmr.fm_physaddrs);
kfree(r->mw_sg);

@@ -166,17 +170,13 @@ fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
*/
static int
fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
- int nsegs, bool writing)
+ int nsegs, bool writing, struct rpcrdma_mw **out)
{
struct rpcrdma_mr_seg *seg1 = seg;
int len, pageoff, i, rc;
struct rpcrdma_mw *mw;
u64 *dma_pages;

- mw = seg1->rl_mw;
- seg1->rl_mw = NULL;
- if (mw)
- rpcrdma_defer_mr_recovery(mw);
mw = rpcrdma_get_mw(r_xprt);
if (!mw)
return -ENOBUFS;
@@ -220,11 +220,11 @@ fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (rc)
goto out_maperr;

- seg1->rl_mw = mw;
- seg1->mr_rkey = mw->fmr.fm_mr->rkey;
- seg1->mr_base = dma_pages[0] + pageoff;
- seg1->mr_nsegs = mw->mw_nents;
- seg1->mr_len = len;
+ mw->mw_handle = mw->fmr.fm_mr->rkey;
+ mw->mw_length = len;
+ mw->mw_offset = dma_pages[0] + pageoff;
+
+ *out = mw;
return mw->mw_nents;

out_dmamap_err:
@@ -245,13 +245,13 @@ out_maperr:
*
* Sleeps until it is safe for the host CPU to access the
* previously mapped memory regions.
+ *
+ * Caller ensures that req->rl_registered is not empty.
*/
static void
fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
- struct rpcrdma_mr_seg *seg;
- unsigned int i, nchunks;
- struct rpcrdma_mw *mw;
+ struct rpcrdma_mw *mw, *tmp;
LIST_HEAD(unmap_list);
int rc;

@@ -262,14 +262,8 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
* ib_unmap_fmr() is slow, so use a single call instead
* of one call per mapped FMR.
*/
- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
-
+ list_for_each_entry(mw, &req->rl_registered, mw_list)
list_add_tail(&mw->fmr.fm_mr->list, &unmap_list);
-
- i += seg->mr_nsegs;
- }
rc = ib_unmap_fmr(&unmap_list);
if (rc)
goto out_reset;
@@ -277,34 +271,22 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
/* ORDER: Now DMA unmap all of the req's MRs, and return
* them to the free MW list.
*/
- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
-
+ list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+ list_del_init(&mw->mw_list);
list_del_init(&mw->fmr.fm_mr->list);
ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
rpcrdma_put_mw(r_xprt, mw);
-
- i += seg->mr_nsegs;
- seg->mr_nsegs = 0;
- seg->rl_mw = NULL;
}

- req->rl_nchunks = 0;
return;

out_reset:
pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc);

- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
-
+ list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
list_del_init(&mw->fmr.fm_mr->list);
fmr_op_recover_mr(mw);
-
- i += seg->mr_nsegs;
}
}

@@ -315,22 +297,17 @@ static void
fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
bool sync)
{
- struct rpcrdma_mr_seg *seg;
struct rpcrdma_mw *mw;
- unsigned int i;

- for (i = 0; req->rl_nchunks; req->rl_nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
+ while (!list_empty(&req->rl_registered)) {
+ mw = list_first_entry(&req->rl_registered,
+ struct rpcrdma_mw, mw_list);
+ list_del_init(&mw->mw_list);

if (sync)
fmr_op_recover_mr(mw);
else
rpcrdma_defer_mr_recovery(mw);
-
- i += seg->mr_nsegs;
- seg->mr_nsegs = 0;
- seg->rl_mw = NULL;
}
}

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 38de90c..bacb859 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -111,6 +111,10 @@ frwr_op_release_mr(struct rpcrdma_mw *r)
{
int rc;

+ /* Ensure MW is not on any rl_registered list */
+ if (!list_empty(&r->mw_list))
+ list_del(&r->mw_list);
+
rc = ib_dereg_mr(r->frmr.fr_mr);
if (rc)
pr_err("rpcrdma: final ib_dereg_mr for %p returned %i\n",
@@ -316,10 +320,9 @@ frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
*/
static int
frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
- int nsegs, bool writing)
+ int nsegs, bool writing, struct rpcrdma_mw **out)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct rpcrdma_mr_seg *seg1 = seg;
struct rpcrdma_mw *mw;
struct rpcrdma_frmr *frmr;
struct ib_mr *mr;
@@ -328,8 +331,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
int rc, i, n, dma_nents;
u8 key;

- mw = seg1->rl_mw;
- seg1->rl_mw = NULL;
+ mw = NULL;
do {
if (mw)
rpcrdma_defer_mr_recovery(mw);
@@ -399,12 +401,11 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
if (rc)
goto out_senderr;

- seg1->rl_mw = mw;
- seg1->mr_rkey = mr->rkey;
- seg1->mr_base = mr->iova;
- seg1->mr_nsegs = mw->mw_nents;
- seg1->mr_len = mr->length;
+ mw->mw_handle = mr->rkey;
+ mw->mw_length = mr->length;
+ mw->mw_offset = mr->iova;

+ *out = mw;
return mw->mw_nents;

out_dmamap_err:
@@ -426,9 +427,8 @@ out_senderr:
}

static struct ib_send_wr *
-__frwr_prepare_linv_wr(struct rpcrdma_mr_seg *seg)
+__frwr_prepare_linv_wr(struct rpcrdma_mw *mw)
{
- struct rpcrdma_mw *mw = seg->rl_mw;
struct rpcrdma_frmr *f = &mw->frmr;
struct ib_send_wr *invalidate_wr;

@@ -448,16 +448,16 @@ __frwr_prepare_linv_wr(struct rpcrdma_mr_seg *seg)
*
* Sleeps until it is safe for the host CPU to access the
* previously mapped memory regions.
+ *
+ * Caller ensures that req->rl_registered is not empty.
*/
static void
frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
struct ib_send_wr *invalidate_wrs, *pos, *prev, *bad_wr;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct rpcrdma_mr_seg *seg;
- unsigned int i, nchunks;
+ struct rpcrdma_mw *mw, *tmp;
struct rpcrdma_frmr *f;
- struct rpcrdma_mw *mw;
int rc;

dprintk("RPC: %s: req %p\n", __func__, req);
@@ -467,22 +467,18 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
* Chain the LOCAL_INV Work Requests and post them with
* a single ib_post_send() call.
*/
+ f = NULL;
invalidate_wrs = pos = prev = NULL;
- seg = NULL;
- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
-
- pos = __frwr_prepare_linv_wr(seg);
+ list_for_each_entry(mw, &req->rl_registered, mw_list) {
+ pos = __frwr_prepare_linv_wr(mw);

if (!invalidate_wrs)
invalidate_wrs = pos;
else
prev->next = pos;
prev = pos;
-
- i += seg->mr_nsegs;
+ f = &mw->frmr;
}
- f = &seg->rl_mw->frmr;

/* Strong send queue ordering guarantees that when the
* last WR in the chain completes, all WRs in the chain
@@ -507,20 +503,12 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
* them to the free MW list.
*/
unmap:
- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
- seg->rl_mw = NULL;
-
+ list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
+ list_del_init(&mw->mw_list);
ib_dma_unmap_sg(ia->ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
rpcrdma_put_mw(r_xprt, mw);
-
- i += seg->mr_nsegs;
- seg->mr_nsegs = 0;
}
-
- req->rl_nchunks = 0;
return;

reset_mrs:
@@ -530,17 +518,12 @@ reset_mrs:
/* Find and reset the MRs in the LOCAL_INV WRs that did not
* get posted. This is synchronous, and slow.
*/
- for (i = 0, nchunks = req->rl_nchunks; nchunks; nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
+ list_for_each_entry(mw, &req->rl_registered, mw_list) {
f = &mw->frmr;
-
if (mw->frmr.fr_mr->rkey == bad_wr->ex.invalidate_rkey) {
__frwr_reset_mr(ia, mw);
bad_wr = bad_wr->next;
}
-
- i += seg->mr_nsegs;
}
goto unmap;
}
@@ -552,22 +535,17 @@ static void
frwr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
bool sync)
{
- struct rpcrdma_mr_seg *seg;
struct rpcrdma_mw *mw;
- unsigned int i;

- for (i = 0; req->rl_nchunks; req->rl_nchunks--) {
- seg = &req->rl_segments[i];
- mw = seg->rl_mw;
+ while (!list_empty(&req->rl_registered)) {
+ mw = list_first_entry(&req->rl_registered,
+ struct rpcrdma_mw, mw_list);
+ list_del_init(&mw->mw_list);

if (sync)
frwr_op_recover_mr(mw);
else
rpcrdma_defer_mr_recovery(mw);
-
- i += seg->mr_nsegs;
- seg->mr_nsegs = 0;
- seg->rl_mw = NULL;
}
}

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 8fde0ab..6d34c1f 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -286,11 +286,11 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
}

static inline __be32 *
-xdr_encode_rdma_segment(__be32 *iptr, struct rpcrdma_mr_seg *seg)
+xdr_encode_rdma_segment(__be32 *iptr, struct rpcrdma_mw *mw)
{
- *iptr++ = cpu_to_be32(seg->mr_rkey);
- *iptr++ = cpu_to_be32(seg->mr_len);
- return xdr_encode_hyper(iptr, seg->mr_base);
+ *iptr++ = cpu_to_be32(mw->mw_handle);
+ *iptr++ = cpu_to_be32(mw->mw_length);
+ return xdr_encode_hyper(iptr, mw->mw_offset);
}

/* XDR-encode the Read list. Supports encoding a list of read
@@ -311,6 +311,7 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
__be32 *iptr, enum rpcrdma_chunktype rtype)
{
struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mw *mw;
unsigned int pos;
int n, nsegs;

@@ -328,9 +329,11 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
return ERR_PTR(nsegs);

do {
- n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, false);
+ n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
+ false, &mw);
if (n < 0)
return ERR_PTR(n);
+ list_add(&mw->mw_list, &req->rl_registered);

*iptr++ = xdr_one; /* item present */

@@ -338,13 +341,12 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
* have the same "position".
*/
*iptr++ = cpu_to_be32(pos);
- iptr = xdr_encode_rdma_segment(iptr, seg);
+ iptr = xdr_encode_rdma_segment(iptr, mw);

- dprintk("RPC: %5u %s: read segment pos %u "
- "%d@0x%016llx:0x%08x (%s)\n",
+ dprintk("RPC: %5u %s: pos %u %u@0x%016llx:0x%08x (%s)\n",
rqst->rq_task->tk_pid, __func__, pos,
- seg->mr_len, (unsigned long long)seg->mr_base,
- seg->mr_rkey, n < nsegs ? "more" : "last");
+ mw->mw_length, (unsigned long long)mw->mw_offset,
+ mw->mw_handle, n < nsegs ? "more" : "last");

r_xprt->rx_stats.read_chunk_count++;
req->rl_nchunks++;
@@ -376,6 +378,7 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
enum rpcrdma_chunktype wtype)
{
struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mw *mw;
int n, nsegs, nchunks;
__be32 *segcount;

@@ -396,17 +399,18 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,

nchunks = 0;
do {
- n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, true);
+ n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
+ true, &mw);
if (n < 0)
return ERR_PTR(n);
+ list_add(&mw->mw_list, &req->rl_registered);

- iptr = xdr_encode_rdma_segment(iptr, seg);
+ iptr = xdr_encode_rdma_segment(iptr, mw);

- dprintk("RPC: %5u %s: write segment "
- "%d@0x016%llx:0x%08x (%s)\n",
+ dprintk("RPC: %5u %s: %u@0x016%llx:0x%08x (%s)\n",
rqst->rq_task->tk_pid, __func__,
- seg->mr_len, (unsigned long long)seg->mr_base,
- seg->mr_rkey, n < nsegs ? "more" : "last");
+ mw->mw_length, (unsigned long long)mw->mw_offset,
+ mw->mw_handle, n < nsegs ? "more" : "last");

r_xprt->rx_stats.write_chunk_count++;
r_xprt->rx_stats.total_rdma_request += seg->mr_len;
@@ -443,6 +447,7 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
__be32 *iptr, enum rpcrdma_chunktype wtype)
{
struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mw *mw;
int n, nsegs, nchunks;
__be32 *segcount;

@@ -461,17 +466,18 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,

nchunks = 0;
do {
- n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, true);
+ n = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
+ true, &mw);
if (n < 0)
return ERR_PTR(n);
+ list_add(&mw->mw_list, &req->rl_registered);

- iptr = xdr_encode_rdma_segment(iptr, seg);
+ iptr = xdr_encode_rdma_segment(iptr, mw);

- dprintk("RPC: %5u %s: reply segment "
- "%d@0x%016llx:0x%08x (%s)\n",
+ dprintk("RPC: %5u %s: %u@0x%016llx:0x%08x (%s)\n",
rqst->rq_task->tk_pid, __func__,
- seg->mr_len, (unsigned long long)seg->mr_base,
- seg->mr_rkey, n < nsegs ? "more" : "last");
+ mw->mw_length, (unsigned long long)mw->mw_offset,
+ mw->mw_handle, n < nsegs ? "more" : "last");

r_xprt->rx_stats.reply_chunk_count++;
r_xprt->rx_stats.total_rdma_request += seg->mr_len;
@@ -690,10 +696,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
out_overflow:
pr_err("rpcrdma: send overflow: hdrlen %zd rpclen %zu %s/%s\n",
hdrlen, rpclen, transfertypes[rtype], transfertypes[wtype]);
- /* Terminate this RPC. Chunks registered above will be
- * released by xprt_release -> xprt_rmda_free .
- */
- return -EIO;
+ iptr = ERR_PTR(-EIO);

out_unmap:
r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
@@ -705,15 +708,13 @@ out_unmap:
* RDMA'd by server. See map at rpcrdma_create_chunks()! :-)
*/
static int
-rpcrdma_count_chunks(struct rpcrdma_rep *rep, unsigned int max, int wrchunk, __be32 **iptrp)
+rpcrdma_count_chunks(struct rpcrdma_rep *rep, int wrchunk, __be32 **iptrp)
{
unsigned int i, total_len;
struct rpcrdma_write_chunk *cur_wchunk;
char *base = (char *)rdmab_to_msg(rep->rr_rdmabuf);

i = be32_to_cpu(**iptrp);
- if (i > max)
- return -1;
cur_wchunk = (struct rpcrdma_write_chunk *) (*iptrp + 1);
total_len = 0;
while (i--) {
@@ -960,14 +961,13 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
(headerp->rm_body.rm_chunks[1] == xdr_zero &&
headerp->rm_body.rm_chunks[2] != xdr_zero) ||
(headerp->rm_body.rm_chunks[1] != xdr_zero &&
- req->rl_nchunks == 0))
+ list_empty(&req->rl_registered)))
goto badheader;
if (headerp->rm_body.rm_chunks[1] != xdr_zero) {
/* count any expected write chunks in read reply */
/* start at write chunk array count */
iptr = &headerp->rm_body.rm_chunks[2];
- rdmalen = rpcrdma_count_chunks(rep,
- req->rl_nchunks, 1, &iptr);
+ rdmalen = rpcrdma_count_chunks(rep, 1, &iptr);
/* check for validity, and no reply chunk after */
if (rdmalen < 0 || *iptr++ != xdr_zero)
goto badheader;
@@ -997,11 +997,11 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
if (headerp->rm_body.rm_chunks[0] != xdr_zero ||
headerp->rm_body.rm_chunks[1] != xdr_zero ||
headerp->rm_body.rm_chunks[2] != xdr_one ||
- req->rl_nchunks == 0)
+ list_empty(&req->rl_registered))
goto badheader;
iptr = (__be32 *)((unsigned char *)headerp +
RPCRDMA_HDRLEN_MIN);
- rdmalen = rpcrdma_count_chunks(rep, req->rl_nchunks, 0, &iptr);
+ rdmalen = rpcrdma_count_chunks(rep, 0, &iptr);
if (rdmalen < 0)
goto badheader;
r_xprt->rx_stats.total_rdma_reply += rdmalen;
@@ -1014,14 +1014,9 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)

badheader:
default:
- dprintk("%s: invalid rpcrdma reply header (type %d):"
- " chunks[012] == %d %d %d"
- " expected chunks <= %d\n",
- __func__, be32_to_cpu(headerp->rm_type),
- headerp->rm_body.rm_chunks[0],
- headerp->rm_body.rm_chunks[1],
- headerp->rm_body.rm_chunks[2],
- req->rl_nchunks);
+ dprintk("RPC: %5u %s: invalid rpcrdma reply (type %u)\n",
+ rqst->rq_task->tk_pid, __func__,
+ be32_to_cpu(headerp->rm_type));
status = -EIO;
r_xprt->rx_stats.bad_reply_count++;
break;
@@ -1035,7 +1030,7 @@ out:
* control: waking the next RPC waits until this RPC has
* relinquished all its Send Queue entries.
*/
- if (req->rl_nchunks)
+ if (!list_empty(&req->rl_registered))
r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt, req);

spin_lock_bh(&xprt->transport_lock);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index b1dd42a..81f0e87 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -619,6 +619,9 @@ xprt_rdma_send_request(struct rpc_task *task)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
int rc = 0;

+ /* On retransmit, remove any previously registered chunks */
+ r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
+
rc = rpcrdma_marshal_req(rqst);
if (rc < 0)
goto failed_marshal;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4a7a712..ae99c04 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -862,6 +862,7 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
spin_unlock(&buffer->rb_reqslock);
req->rl_cqe.done = rpcrdma_wc_send;
req->rl_buffer = &r_xprt->rx_buf;
+ INIT_LIST_HEAD(&req->rl_registered);
return req;
}

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 0bde4c0..025365c 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -245,6 +245,9 @@ struct rpcrdma_mw {
struct rpcrdma_frmr frmr;
};
struct rpcrdma_xprt *mw_xprt;
+ u32 mw_handle;
+ u32 mw_length;
+ u64 mw_offset;
struct list_head mw_all;
};

@@ -272,11 +275,7 @@ struct rpcrdma_mw {
*/

struct rpcrdma_mr_seg { /* chunk descriptors */
- struct rpcrdma_mw *rl_mw; /* registered MR */
- u64 mr_base; /* registration result */
- u32 mr_rkey; /* registration result */
u32 mr_len; /* length of chunk or segment */
- int mr_nsegs; /* number of segments in chunk or 0 */
struct page *mr_page; /* owning page, if any */
char *mr_offset; /* kva if no page, else offset */
};
@@ -294,6 +293,7 @@ struct rpcrdma_req {
struct ib_sge rl_send_iov[RPCRDMA_MAX_IOVS];
struct rpcrdma_regbuf *rl_rdmabuf;
struct rpcrdma_regbuf *rl_sendbuf;
+ struct list_head rl_registered; /* registered segments */
struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];
struct rpcrdma_mr_seg *rl_nextseg;

@@ -397,7 +397,8 @@ struct rpcrdma_stats {
struct rpcrdma_xprt;
struct rpcrdma_memreg_ops {
int (*ro_map)(struct rpcrdma_xprt *,
- struct rpcrdma_mr_seg *, int, bool);
+ struct rpcrdma_mr_seg *, int, bool,
+ struct rpcrdma_mw **);
void (*ro_unmap_sync)(struct rpcrdma_xprt *,
struct rpcrdma_req *);
void (*ro_unmap_safe)(struct rpcrdma_xprt *,


2016-06-15 03:17:37

by Chuck Lever III


Currently, all three chunk list encoders each use a portion of the
one rl_segments array in rpcrdma_req. This is because the MWs for
each chunk list were preserved in rl_segments so that ro_unmap could
find and invalidate them after the RPC was complete.

However, now that MWs are placed on a per-req linked list as they
are registered, there is no longer any information in rpcrdma_mr_seg
that is shared between ro_map and ro_unmap_{sync,safe}, and thus
nothing in rl_segments needs to be preserved after
rpcrdma_marshal_req is complete.

Thus the rl_segments array can now serve just the needs of each
rpcrdma_convert_iovs call. Once a chunk list is encoded, the next
chunk list encoder is free to re-use all of rl_segments.

This means all three chunk lists in one RPC request can now each
encode a full size data payload with no increase in the size of
rl_segments.

This is a key requirement for Kerberos support, since both the Call
and Reply for a single RPC transaction are conveyed via Long
messages (RDMA Read/Write). Both can be large.
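
For a feel of the numbers, assuming 4KB pages (other page sizes give
different values), the new per-chunk-list segment budget works out as
below; because each encoder now starts from rl_segments[0], the whole
budget is available to one chunk list at a time rather than being
split across three:

#include <stdio.h>

int main(void)
{
        const unsigned int page_size = 4096;    /* assumption: x86_64 */
        const unsigned int max_iov_segs = 3;    /* RPCRDMA_MAX_IOV_SEGS */
        const unsigned int max_data_segs = (1024 * 1024) / page_size + 1;
        const unsigned int max_segs = max_data_segs + max_iov_segs;

        /* 257 data segments + 3 iov segments = 260 on 4KB pages */
        printf("segments available to each chunk list: %u\n", max_segs);
        return 0;
}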

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 61 ++++++++++++++++++---------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 36 ++++++++++-------------
2 files changed, 44 insertions(+), 53 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 6d34c1f..f60d229 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -196,8 +196,7 @@ rpcrdma_tail_pullup(struct xdr_buf *buf)
* MR when they can.
*/
static int
-rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg,
- int n, int nsegs)
+rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg, int n)
{
size_t page_offset;
u32 remaining;
@@ -206,7 +205,7 @@ rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg,
base = vec->iov_base;
page_offset = offset_in_page(base);
remaining = vec->iov_len;
- while (remaining && n < nsegs) {
+ while (remaining && n < RPCRDMA_MAX_SEGS) {
seg[n].mr_page = NULL;
seg[n].mr_offset = base;
seg[n].mr_len = min_t(u32, PAGE_SIZE - page_offset, remaining);
@@ -230,23 +229,23 @@ rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg,

static int
rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
- enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg, int nsegs)
+ enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg)
{
- int len, n = 0, p;
- int page_base;
+ int len, n, p, page_base;
struct page **ppages;

+ n = 0;
if (pos == 0) {
- n = rpcrdma_convert_kvec(&xdrbuf->head[0], seg, n, nsegs);
- if (n == nsegs)
- return -EIO;
+ n = rpcrdma_convert_kvec(&xdrbuf->head[0], seg, n);
+ if (n == RPCRDMA_MAX_SEGS)
+ goto out_overflow;
}

len = xdrbuf->page_len;
ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
page_base = xdrbuf->page_base & ~PAGE_MASK;
p = 0;
- while (len && n < nsegs) {
+ while (len && n < RPCRDMA_MAX_SEGS) {
if (!ppages[p]) {
/* alloc the pagelist for receiving buffer */
ppages[p] = alloc_page(GFP_ATOMIC);
@@ -257,7 +256,7 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
seg[n].mr_offset = (void *)(unsigned long) page_base;
seg[n].mr_len = min_t(u32, PAGE_SIZE - page_base, len);
if (seg[n].mr_len > PAGE_SIZE)
- return -EIO;
+ goto out_overflow;
len -= seg[n].mr_len;
++n;
++p;
@@ -265,8 +264,8 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
}

/* Message overflows the seg array */
- if (len && n == nsegs)
- return -EIO;
+ if (len && n == RPCRDMA_MAX_SEGS)
+ goto out_overflow;

/* When encoding the read list, the tail is always sent inline */
if (type == rpcrdma_readch)
@@ -277,12 +276,16 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
* xdr pad bytes, saving the server an RDMA operation. */
if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
return n;
- n = rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, n, nsegs);
- if (n == nsegs)
- return -EIO;
+ n = rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, n);
+ if (n == RPCRDMA_MAX_SEGS)
+ goto out_overflow;
}

return n;
+
+out_overflow:
+ pr_err("rpcrdma: segment array overflow\n");
+ return -EIO;
}

static inline __be32 *
@@ -310,7 +313,7 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_req *req, struct rpc_rqst *rqst,
__be32 *iptr, enum rpcrdma_chunktype rtype)
{
- struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mr_seg *seg;
struct rpcrdma_mw *mw;
unsigned int pos;
int n, nsegs;
@@ -323,8 +326,8 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
pos = rqst->rq_snd_buf.head[0].iov_len;
if (rtype == rpcrdma_areadch)
pos = 0;
- nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg,
- RPCRDMA_MAX_SEGS - req->rl_nchunks);
+ seg = req->rl_segments;
+ nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

@@ -349,11 +352,9 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
mw->mw_handle, n < nsegs ? "more" : "last");

r_xprt->rx_stats.read_chunk_count++;
- req->rl_nchunks++;
seg += n;
nsegs -= n;
} while (nsegs);
- req->rl_nextseg = seg;

/* Finish Read list */
*iptr++ = xdr_zero; /* Next item not present */
@@ -377,7 +378,7 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
struct rpc_rqst *rqst, __be32 *iptr,
enum rpcrdma_chunktype wtype)
{
- struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mr_seg *seg;
struct rpcrdma_mw *mw;
int n, nsegs, nchunks;
__be32 *segcount;
@@ -387,10 +388,10 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
return iptr;
}

+ seg = req->rl_segments;
nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
rqst->rq_rcv_buf.head[0].iov_len,
- wtype, seg,
- RPCRDMA_MAX_SEGS - req->rl_nchunks);
+ wtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

@@ -414,12 +415,10 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,

r_xprt->rx_stats.write_chunk_count++;
r_xprt->rx_stats.total_rdma_request += seg->mr_len;
- req->rl_nchunks++;
nchunks++;
seg += n;
nsegs -= n;
} while (nsegs);
- req->rl_nextseg = seg;

/* Update count of segments in this Write chunk */
*segcount = cpu_to_be32(nchunks);
@@ -446,7 +445,7 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
struct rpcrdma_req *req, struct rpc_rqst *rqst,
__be32 *iptr, enum rpcrdma_chunktype wtype)
{
- struct rpcrdma_mr_seg *seg = req->rl_nextseg;
+ struct rpcrdma_mr_seg *seg;
struct rpcrdma_mw *mw;
int n, nsegs, nchunks;
__be32 *segcount;
@@ -456,8 +455,8 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
return iptr;
}

- nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
- RPCRDMA_MAX_SEGS - req->rl_nchunks);
+ seg = req->rl_segments;
+ nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

@@ -481,12 +480,10 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,

r_xprt->rx_stats.reply_chunk_count++;
r_xprt->rx_stats.total_rdma_request += seg->mr_len;
- req->rl_nchunks++;
nchunks++;
seg += n;
nsegs -= n;
} while (nsegs);
- req->rl_nextseg = seg;

/* Update count of segments in the Reply chunk */
*segcount = cpu_to_be32(nchunks);
@@ -656,8 +653,6 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* send a Call message with a Position Zero Read chunk and a
* regular Read chunk at the same time.
*/
- req->rl_nchunks = 0;
- req->rl_nextseg = req->rl_segments;
iptr = headerp->rm_body.rm_chunks;
iptr = rpcrdma_encode_read_list(r_xprt, req, rqst, iptr, rtype);
if (IS_ERR(iptr))
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 025365c..c713483 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -171,23 +171,14 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
* o recv buffer (posted to provider)
* o ib_sge (also donated to provider)
* o status of reply (length, success or not)
- * o bookkeeping state to get run by tasklet (list, etc)
+ * o bookkeeping state to get run by reply handler (list, etc)
*
- * These are allocated during initialization, per-transport instance;
- * however, the tasklet execution list itself is global, as it should
- * always be pretty short.
+ * These are allocated during initialization, per-transport instance.
*
* N of these are associated with a transport instance, and stored in
* struct rpcrdma_buffer. N is the max number of outstanding requests.
*/

-#define RPCRDMA_MAX_DATA_SEGS ((1 * 1024 * 1024) / PAGE_SIZE)
-
-/* data segments + head/tail for Call + head/tail for Reply */
-#define RPCRDMA_MAX_SEGS (RPCRDMA_MAX_DATA_SEGS + 4)
-
-struct rpcrdma_buffer;
-
struct rpcrdma_rep {
struct ib_cqe rr_cqe;
unsigned int rr_len;
@@ -267,13 +258,18 @@ struct rpcrdma_mw {
* of iovs for send operations. The reason is that the iovs passed to
* ib_post_{send,recv} must not be modified until the work request
* completes.
- *
- * NOTES:
- * o RPCRDMA_MAX_SEGS is the max number of addressible chunk elements we
- * marshal. The number needed varies depending on the iov lists that
- * are passed to us and the memory registration mode we are in.
*/

+/* Maximum number of page-sized "segments" per chunk list to be
+ * registered or invalidated. Must handle a Reply chunk:
+ */
+enum {
+ RPCRDMA_MAX_IOV_SEGS = 3,
+ RPCRDMA_MAX_DATA_SEGS = ((1 * 1024 * 1024) / PAGE_SIZE) + 1,
+ RPCRDMA_MAX_SEGS = RPCRDMA_MAX_DATA_SEGS +
+ RPCRDMA_MAX_IOV_SEGS,
+};
+
struct rpcrdma_mr_seg { /* chunk descriptors */
u32 mr_len; /* length of chunk or segment */
struct page *mr_page; /* owning page, if any */
@@ -282,10 +278,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */

#define RPCRDMA_MAX_IOVS (2)

+struct rpcrdma_buffer;
struct rpcrdma_req {
struct list_head rl_free;
unsigned int rl_niovs;
- unsigned int rl_nchunks;
unsigned int rl_connect_cookie;
struct rpc_task *rl_task;
struct rpcrdma_buffer *rl_buffer;
@@ -293,13 +289,13 @@ struct rpcrdma_req {
struct ib_sge rl_send_iov[RPCRDMA_MAX_IOVS];
struct rpcrdma_regbuf *rl_rdmabuf;
struct rpcrdma_regbuf *rl_sendbuf;
- struct list_head rl_registered; /* registered segments */
- struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];
- struct rpcrdma_mr_seg *rl_nextseg;

struct ib_cqe rl_cqe;
struct list_head rl_all;
bool rl_backchannel;
+
+ struct list_head rl_registered; /* registered segments */
+ struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];
};

static inline struct rpcrdma_req *


2016-06-15 03:17:44

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 18/24] xprtrdma: rpcrdma_inline_fixup() overruns the receive page list

When the remaining length of an incoming reply is longer than the
XDR buf's page_len, switch over to the tail iovec instead of
copying more than page_len bytes into the page list.
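
For example (hypothetical numbers): if the reply's page_len is 4096
bytes but 4200 bytes of payload remain, only the first 4096 bytes are
copied into the page list; the remaining 104 bytes are left for the
existing tail iovec handling.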

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index f60d229..e3560c2 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -773,12 +773,17 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
page_base &= ~PAGE_MASK;

if (copy_len && rqst->rq_rcv_buf.page_len) {
- npages = PAGE_ALIGN(page_base +
- rqst->rq_rcv_buf.page_len) >> PAGE_SHIFT;
+ int pagelist_len;
+
+ pagelist_len = rqst->rq_rcv_buf.page_len;
+ if (pagelist_len > copy_len)
+ pagelist_len = copy_len;
+ npages = PAGE_ALIGN(page_base + pagelist_len) >> PAGE_SHIFT;
for (; i < npages; i++) {
curlen = PAGE_SIZE - page_base;
- if (curlen > copy_len)
- curlen = copy_len;
+ if (curlen > pagelist_len)
+ curlen = pagelist_len;
+
dprintk("RPC: %s: page %d"
" srcp 0x%p len %d curlen %d\n",
__func__, i, srcp, copy_len, curlen);
@@ -788,7 +793,8 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
kunmap_atomic(destp);
srcp += curlen;
copy_len -= curlen;
- if (copy_len == 0)
+ pagelist_len -= curlen;
+ if (!pagelist_len)
break;
page_base = 0;
}


2016-06-15 03:17:58

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 19/24] xprtrdma: Do not update {head, tail}.iov_len in rpcrdma_inline_fixup()

While trying NFSv4.0/RDMA with sec=krb5p, I noticed small NFS READ
operations failed. After the client unwrapped the NFS READ reply
message, the NFS READ XDR decoder was not able to decode the reply.
The message was "Server cheating in reply", with the reported
number of received payload bytes being zero. Applications reported
a read(2) that returned -1/EIO.

The problem is rpcrdma_inline_fixup() sets the tail.iov_len to zero
when the incoming reply fits entirely in the head iovec. The zero
tail.iov_len confused xdr_buf_trim(), which then mangled the actual
reply data instead of simply removing the trailing GSS checksum.

As near as I can tell, RPC transports are not supposed to update the
head.iov_len, page_len, or tail.iov_len fields in the receive XDR
buffer when handling an incoming RPC reply message. These fields
contain the length of each component of the XDR buffer, and hence
the maximum number of bytes of reply data that can be stored in each
XDR buffer component. I've concluded this because:

- This is how xdr_partial_copy_from_skb() appears to behave
- rpcrdma_inline_fixup() already does not alter page_len
- call_decode() compares rq_private_buf and rq_rcv_buf and WARNs
if they are not exactly the same

Unfortunately, as soon as I tried the simple fix to just remove the
line that sets tail.iov_len to zero, I saw that the logic that
appends the implicit Write chunk pad inline depends on inline_fixup
setting tail.iov_len to zero.

To address this, re-organize the tail iovec handling logic to use
the same approach as with the head iovec: simply point tail.iov_base
to the correct bytes in the receive buffer.

While I remember all this, write down the conclusion in documenting
comments.
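
For reference, the receive buffer described above is the generic
struct xdr_buf. A rough, abbreviated sketch of its fields (my
paraphrase of include/linux/sunrpc/xdr.h, not a verbatim copy;
comments are mine):

struct xdr_buf {
	struct kvec	head[1];	/* RPC header and inline data */
	struct kvec	tail[1];	/* data following the page list */
	struct page	**pages;	/* page list for bulk payload */
	unsigned int	page_base;	/* offset of payload in first page */
	unsigned int	page_len;	/* length of payload in page list */
	unsigned int	flags;		/* e.g. XDRBUF_READ, XDRBUF_WRITE */
	unsigned int	buflen;		/* total capacity of the buffer */
	unsigned int	len;		/* length of XDR-encoded message */
};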

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 61 ++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e3560c2..d018eb7 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -740,8 +740,16 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, int wrchunk, __be32 **iptrp)
return total_len;
}

-/*
- * Scatter inline received data back into provided iov's.
+/**
+ * rpcrdma_inline_fixup - Scatter inline received data into rqst's iovecs
+ * @rqst: controlling RPC request
+ * @srcp: points to RPC message payload in receive buffer
+ * @copy_len: remaining length of receive buffer content
+ * @pad: Write chunk pad bytes needed (zero for pure inline)
+ *
+ * The upper layer has set the maximum number of bytes it can
+ * receive in each component of rq_rcv_buf. These values are set in
+ * the head.iov_len, page_len, tail.iov_len, and buflen fields.
*/
static void
rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
@@ -751,17 +759,19 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
struct page **ppages;
int page_base;

+ /* The head iovec is redirected to the RPC reply message
+ * in the receive buffer, to avoid a memcopy.
+ */
+ rqst->rq_rcv_buf.head[0].iov_base = srcp;
+
+ /* The contents of the receive buffer that follow
+ * head.iov_len bytes are copied into the page list.
+ */
curlen = rqst->rq_rcv_buf.head[0].iov_len;
- if (curlen > copy_len) { /* write chunk header fixup */
+ if (curlen > copy_len)
curlen = copy_len;
- rqst->rq_rcv_buf.head[0].iov_len = curlen;
- }
-
dprintk("RPC: %s: srcp 0x%p len %d hdrlen %d\n",
__func__, srcp, copy_len, curlen);
-
- /* Shift pointer for first receive segment only */
- rqst->rq_rcv_buf.head[0].iov_base = srcp;
srcp += curlen;
copy_len -= curlen;

@@ -798,28 +808,23 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
break;
page_base = 0;
}
- }

- if (copy_len && rqst->rq_rcv_buf.tail[0].iov_len) {
- curlen = copy_len;
- if (curlen > rqst->rq_rcv_buf.tail[0].iov_len)
- curlen = rqst->rq_rcv_buf.tail[0].iov_len;
- if (rqst->rq_rcv_buf.tail[0].iov_base != srcp)
- memmove(rqst->rq_rcv_buf.tail[0].iov_base, srcp, curlen);
- dprintk("RPC: %s: tail srcp 0x%p len %d curlen %d\n",
- __func__, srcp, copy_len, curlen);
- rqst->rq_rcv_buf.tail[0].iov_len = curlen;
- copy_len -= curlen; ++i;
- } else
- rqst->rq_rcv_buf.tail[0].iov_len = 0;
-
- if (pad) {
- /* implicit padding on terminal chunk */
- unsigned char *p = rqst->rq_rcv_buf.tail[0].iov_base;
- while (pad--)
- p[rqst->rq_rcv_buf.tail[0].iov_len++] = 0;
+ /* Implicit padding for the last segment in a Write
+ * chunk is inserted inline at the front of the tail
+ * iovec. The upper layer ignores the content of
+ * the pad. Simply ensure inline content in the tail
+ * that follows the Write chunk is properly aligned.
+ */
+ if (pad)
+ srcp -= pad;
}

+ /* The tail iovec is redirected to the remaining data
+ * in the receive buffer, to avoid a memcopy.
+ */
+ if (copy_len || pad)
+ rqst->rq_rcv_buf.tail[0].iov_base = srcp;
+
if (copy_len)
dprintk("RPC: %s: %d bytes in"
" %d extra segments (%d lost)\n",


2016-06-15 03:18:02

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 20/24] xprtrdma: Update only specific fields in private receive buffer

Now that rpcrdma_inline_fixup() updates only two fields in
rq_rcv_buf, a full memcpy of that structure to rq_private_buf is
unwarranted. Updating rq_private_buf fields only where needed also
better documents what is going on.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d018eb7..a0e811d 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -750,6 +750,11 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, int wrchunk, __be32 **iptrp)
* The upper layer has set the maximum number of bytes it can
* receive in each component of rq_rcv_buf. These values are set in
* the head.iov_len, page_len, tail.iov_len, and buflen fields.
+ *
+ * Unlike the TCP equivalent (xdr_partial_copy_from_skb), in
+ * many cases this function simply updates iov_base pointers in
+ * rq_rcv_buf to point directly to the received reply data, to
+ * avoid copying reply data.
*/
static void
rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
@@ -763,6 +768,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
* in the receive buffer, to avoid a memcopy.
*/
rqst->rq_rcv_buf.head[0].iov_base = srcp;
+ rqst->rq_private_buf.head[0].iov_base = srcp;

/* The contents of the receive buffer that follow
* head.iov_len bytes are copied into the page list.
@@ -822,16 +828,15 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
/* The tail iovec is redirected to the remaining data
* in the receive buffer, to avoid a memcopy.
*/
- if (copy_len || pad)
+ if (copy_len || pad) {
rqst->rq_rcv_buf.tail[0].iov_base = srcp;
+ rqst->rq_private_buf.tail[0].iov_base = srcp;
+ }

if (copy_len)
dprintk("RPC: %s: %d bytes in"
" %d extra segments (%d lost)\n",
__func__, olen, i, copy_len);
-
- /* TBD avoid a warning from call_decode() */
- rqst->rq_private_buf = rqst->rq_rcv_buf;
}

void


2016-06-15 03:18:10

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 21/24] xprtrdma: Clean up fixup_copy_count accounting

fixup_copy_count should count only the number of bytes copied to the
page list. The head and tail are now always handled without a data
copy.

And the debugging at the end of rpcrdma_inline_fixup() is also no
longer necessary, since copy_len will be non-zero when there is reply
data in the tail (a normal and valid case).

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a0e811d..dac2990 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -755,11 +755,14 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, int wrchunk, __be32 **iptrp)
* many cases this function simply updates iov_base pointers in
* rq_rcv_buf to point directly to the received reply data, to
* avoid copying reply data.
+ *
+ * Returns the count of bytes which had to be memcopied.
*/
-static void
+static unsigned long
rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
{
- int i, npages, curlen, olen;
+ unsigned long fixup_copy_count;
+ int i, npages, curlen;
char *destp;
struct page **ppages;
int page_base;
@@ -781,13 +784,10 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
srcp += curlen;
copy_len -= curlen;

- olen = copy_len;
- i = 0;
- rpcx_to_rdmax(rqst->rq_xprt)->rx_stats.fixup_copy_count += olen;
page_base = rqst->rq_rcv_buf.page_base;
ppages = rqst->rq_rcv_buf.pages + (page_base >> PAGE_SHIFT);
page_base &= ~PAGE_MASK;
-
+ fixup_copy_count = 0;
if (copy_len && rqst->rq_rcv_buf.page_len) {
int pagelist_len;

@@ -795,7 +795,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
if (pagelist_len > copy_len)
pagelist_len = copy_len;
npages = PAGE_ALIGN(page_base + pagelist_len) >> PAGE_SHIFT;
- for (; i < npages; i++) {
+ for (i = 0; i < npages; i++) {
curlen = PAGE_SIZE - page_base;
if (curlen > pagelist_len)
curlen = pagelist_len;
@@ -809,6 +809,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
kunmap_atomic(destp);
srcp += curlen;
copy_len -= curlen;
+ fixup_copy_count += curlen;
pagelist_len -= curlen;
if (!pagelist_len)
break;
@@ -833,10 +834,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
rqst->rq_private_buf.tail[0].iov_base = srcp;
}

- if (copy_len)
- dprintk("RPC: %s: %d bytes in"
- " %d extra segments (%d lost)\n",
- __func__, olen, i, copy_len);
+ return fixup_copy_count;
}

void
@@ -999,8 +997,10 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
rep->rr_len -= RPCRDMA_HDRLEN_MIN;
status = rep->rr_len;
}
- /* Fix up the rpc results for upper layer */
- rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len, rdmalen);
+
+ r_xprt->rx_stats.fixup_copy_count +=
+ rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len,
+ rdmalen);
break;

case rdma_nomsg:


2016-06-15 03:18:18

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 22/24] xprtrdma: No direct data placement with krb5i and krb5p

Direct data placement is not allowed when using flavors that
guarantee integrity or privacy. When such security flavors are in
effect, don't allow the use of Read and Write chunks for moving
individual data items. All messages larger than the inline threshold
are sent via Long Call or Long Reply.

On my systems (CX-3 Pro on FDR), for small I/O operations, the use
of Long messages adds only around 5 usecs of latency in each
direction.

Note that when integrity or encryption is used, the host CPU touches
every byte in these messages. Even if it could be used, data
movement offload doesn't buy much in this case.

Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/sunrpc/auth.h | 3 +++
include/linux/sunrpc/gss_api.h | 2 ++
net/sunrpc/auth_gss/auth_gss.c | 2 ++
net/sunrpc/auth_gss/gss_krb5_mech.c | 2 ++
net/sunrpc/auth_gss/gss_mech_switch.c | 12 ++++++++++++
net/sunrpc/xprtrdma/rpc_rdma.c | 12 ++++++++++--
6 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 8997915..3a40287 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -107,6 +107,9 @@ struct rpc_auth {
/* per-flavor data */
};

+/* rpc_auth au_flags */
+#define RPCAUTH_AUTH_DATATOUCH 0x00000002
+
struct rpc_auth_create_args {
rpc_authflavor_t pseudoflavor;
const char *target_name;
diff --git a/include/linux/sunrpc/gss_api.h b/include/linux/sunrpc/gss_api.h
index 1f911cc..68ec78c 100644
--- a/include/linux/sunrpc/gss_api.h
+++ b/include/linux/sunrpc/gss_api.h
@@ -73,6 +73,7 @@ u32 gss_delete_sec_context(
rpc_authflavor_t gss_svc_to_pseudoflavor(struct gss_api_mech *, u32 qop,
u32 service);
u32 gss_pseudoflavor_to_service(struct gss_api_mech *, u32 pseudoflavor);
+bool gss_pseudoflavor_to_datatouch(struct gss_api_mech *, u32 pseudoflavor);
char *gss_service_to_auth_domain_name(struct gss_api_mech *, u32 service);

struct pf_desc {
@@ -81,6 +82,7 @@ struct pf_desc {
u32 service;
char *name;
char *auth_domain_name;
+ bool datatouch;
};

/* Different mechanisms (e.g., krb5 or spkm3) may implement gss-api, and
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index e64ae93..bca3537 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1017,6 +1017,8 @@ gss_create_new(struct rpc_auth_create_args *args, struct rpc_clnt *clnt)
auth->au_rslack = GSS_VERF_SLACK >> 2;
auth->au_ops = &authgss_ops;
auth->au_flavor = flavor;
+ if (gss_pseudoflavor_to_datatouch(gss_auth->mech, flavor))
+ auth->au_flags |= RPCAUTH_AUTH_DATATOUCH;
atomic_set(&auth->au_count, 1);
kref_init(&gss_auth->kref);

diff --git a/net/sunrpc/auth_gss/gss_krb5_mech.c b/net/sunrpc/auth_gss/gss_krb5_mech.c
index 6542749..6059583 100644
--- a/net/sunrpc/auth_gss/gss_krb5_mech.c
+++ b/net/sunrpc/auth_gss/gss_krb5_mech.c
@@ -745,12 +745,14 @@ static struct pf_desc gss_kerberos_pfs[] = {
.qop = GSS_C_QOP_DEFAULT,
.service = RPC_GSS_SVC_INTEGRITY,
.name = "krb5i",
+ .datatouch = true,
},
[2] = {
.pseudoflavor = RPC_AUTH_GSS_KRB5P,
.qop = GSS_C_QOP_DEFAULT,
.service = RPC_GSS_SVC_PRIVACY,
.name = "krb5p",
+ .datatouch = true,
},
};

diff --git a/net/sunrpc/auth_gss/gss_mech_switch.c b/net/sunrpc/auth_gss/gss_mech_switch.c
index 7063d85..5fec3ab 100644
--- a/net/sunrpc/auth_gss/gss_mech_switch.c
+++ b/net/sunrpc/auth_gss/gss_mech_switch.c
@@ -361,6 +361,18 @@ gss_pseudoflavor_to_service(struct gss_api_mech *gm, u32 pseudoflavor)
}
EXPORT_SYMBOL(gss_pseudoflavor_to_service);

+bool
+gss_pseudoflavor_to_datatouch(struct gss_api_mech *gm, u32 pseudoflavor)
+{
+ int i;
+
+ for (i = 0; i < gm->gm_pf_num; i++) {
+ if (gm->gm_pfs[i].pseudoflavor == pseudoflavor)
+ return gm->gm_pfs[i].datatouch;
+ }
+ return false;
+}
+
char *
gss_service_to_auth_domain_name(struct gss_api_mech *gm, u32 service)
{
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index dac2990..a47f170 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -570,6 +570,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
enum rpcrdma_chunktype rtype, wtype;
struct rpcrdma_msg *headerp;
+ bool ddp_allowed;
ssize_t hdrlen;
size_t rpclen;
__be32 *iptr;
@@ -586,6 +587,13 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
headerp->rm_credit = cpu_to_be32(r_xprt->rx_buf.rb_max_requests);
headerp->rm_type = rdma_msg;

+ /* When the ULP employs a GSS flavor that guarantees integrity
+ * or privacy, direct data placement of individual data items
+ * is not allowed.
+ */
+ ddp_allowed = !(rqst->rq_cred->cr_auth->au_flags &
+ RPCAUTH_AUTH_DATATOUCH);
+
/*
* Chunks needed for results?
*
@@ -597,7 +605,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
*/
if (rpcrdma_results_inline(r_xprt, rqst))
wtype = rpcrdma_noch;
- else if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
+ else if (ddp_allowed && rqst->rq_rcv_buf.flags & XDRBUF_READ)
wtype = rpcrdma_writech;
else
wtype = rpcrdma_replych;
@@ -620,7 +628,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
rtype = rpcrdma_noch;
rpcrdma_inline_pullup(rqst);
rpclen = rqst->rq_svec[0].iov_len;
- } else if (rqst->rq_snd_buf.flags & XDRBUF_WRITE) {
+ } else if (ddp_allowed && rqst->rq_snd_buf.flags & XDRBUF_WRITE) {
rtype = rpcrdma_readch;
rpclen = rqst->rq_svec[0].iov_len;
rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);


2016-06-15 03:18:29

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 23/24] svc: Avoid garbage replies when pc_func() returns rpc_drop_reply

If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway, consisting of a
single word that contains the value RPC_DROP_REPLY (in network
byte-order, of course). This is a nonsense RPC message.

Fixes: 9e701c610923 ("svcrpc: simpler request dropping")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/svc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index cc98528..87290a5 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1188,7 +1188,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
*statp = procp->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);

/* Encode reply */
- if (test_bit(RQ_DROPME, &rqstp->rq_flags)) {
+ if (*statp == rpc_drop_reply ||
+ test_bit(RQ_DROPME, &rqstp->rq_flags)) {
if (procp->pc_release)
procp->pc_release(rqstp, NULL, rqstp->rq_resp);
goto dropit;


2016-06-15 03:18:34

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 24/24] NFS: Don't drop CB requests with invalid principals

Before commit 778be232a207 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232a207 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/callback_xdr.c | 6 +++++-
net/sunrpc/svc.c | 5 +++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index d81f96a..656f68f 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -925,7 +925,7 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
if (hdr_arg.minorversion == 0) {
cps.clp = nfs4_find_client_ident(SVC_NET(rqstp), hdr_arg.cb_ident);
if (!cps.clp || !check_gss_callback_principal(cps.clp, rqstp))
- return rpc_drop_reply;
+ goto out_invalidcred;
}

cps.minorversion = hdr_arg.minorversion;
@@ -953,6 +953,10 @@ static __be32 nfs4_callback_compound(struct svc_rqst *rqstp, void *argp, void *r
nfs_put_client(cps.clp);
dprintk("%s: done, status = %u\n", __func__, ntohl(status));
return rpc_success;
+
+out_invalidcred:
+ pr_warn_ratelimited("NFS: NFSv4 callback contains invalid cred\n");
+ return rpc_autherr_badcred;
}

/*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 87290a5..c5b0cb4 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1194,6 +1194,11 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
procp->pc_release(rqstp, NULL, rqstp->rq_resp);
goto dropit;
}
+ if (*statp == rpc_autherr_badcred) {
+ if (procp->pc_release)
+ procp->pc_release(rqstp, NULL, rqstp->rq_resp);
+ goto err_bad_auth;
+ }
if (*statp == rpc_success &&
(xdr = procp->pc_encode) &&
!xdr(rqstp, resv->iov_base+resv->iov_len, rqstp->rq_resp)) {


2016-06-15 04:28:55

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
> From: Sagi Grimberg <[email protected]>
>
> kmalloc doesn't guarantee the returned memory is all on one page.

IMHO, the patch posted by Christoph in that thread is the best way to go,
because you changed streaming DMA mappings to be coherent DMA mappings [1].

"The kernel developers recommend the use of streaming mappings over
coherent mappings whenever possible" [1].

[1] http://www.makelinux.net/ldd3/chp-15-sect-4



2016-06-15 16:40:03

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
>
> On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
>> From: Sagi Grimberg <[email protected]>
>>
>> kmalloc doesn't guarantee the returned memory is all on one page.
>
> IMHO, the patch posted by Christoph at that thread is best way to go,
> because you changed streaming DMA mappings to be coherent DMA mappings [1].
>
> "The kernel developers recommend the use of streaming mappings over
> coherent mappings whenever possible" [1].
>
> [1] http://www.makelinux.net/ldd3/chp-15-sect-4

Hi Leon-

I'll happily drop this patch from my 4.8 series as soon
as an official mlx4/mlx5 fix is merged.

Meanwhile, I notice some unexplained instability (driver
resets, list corruption, and so on) when I test NFS/RDMA
without this patch included. So it is attached to the
series for anyone with mlx4 who wants to pull my topic
branch and try it out.


--
Chuck Lever




2016-06-16 14:35:48

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Wed, Jun 15, 2016 at 12:40:07PM -0400, Chuck Lever wrote:
>
> > On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
> >
> > On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
> >> From: Sagi Grimberg <[email protected]>
> >>
> >> kmalloc doesn't guarantee the returned memory is all on one page.
> >
> > IMHO, the patch posted by Christoph at that thread is best way to go,
> > because you changed streaming DMA mappings to be coherent DMA mappings [1].
> >
> > "The kernel developers recommend the use of streaming mappings over
> > coherent mappings whenever possible" [1].
> >
> > [1] http://www.makelinux.net/ldd3/chp-15-sect-4
>
> Hi Leon-
>
> I'll happily drop this patch from my 4.8 series as soon
> as an official mlx4/mlx5 fix is merged.
>
> Meanwhile, I notice some unexplained instability (driver
> resets, list corruption, and so on) when I test NFS/RDMA
> without this patch included. So it is attached to the
> series for anyone with mlx4 who wants to pull my topic
> branch and try it out.

hi Chuck,

We plan to send the attached patch during our second round of fixes for
mlx4/mlx5, and would be grateful if you could provide your Tested-by
tag beforehand.

Thanks



2016-06-16 21:10:37

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


>>> On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
>>>
>>> On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
>>>> From: Sagi Grimberg <[email protected]>
>>>>
>>>> kmalloc doesn't guarantee the returned memory is all on one page.
>>>
>>> IMHO, the patch posted by Christoph at that thread is best way to go,
>>> because you changed streaming DMA mappings to be coherent DMA mappings [1].
>>>
>>> "The kernel developers recommend the use of streaming mappings over
>>> coherent mappings whenever possible" [1].
>>>
>>> [1] http://www.makelinux.net/ldd3/chp-15-sect-4
>>
>> Hi Leon-
>>
>> I'll happily drop this patch from my 4.8 series as soon
>> as an official mlx4/mlx5 fix is merged.
>>
>> Meanwhile, I notice some unexplained instability (driver
>> resets, list corruption, and so on) when I test NFS/RDMA
>> without this patch included. So it is attached to the
>> series for anyone with mlx4 who wants to pull my topic
>> branch and try it out.
>
> hi Chuck,
>
> We plan to send attached patch during our second round of fixes for
> mlx4/mlx5 and would be grateful to you if you could provide your
> Tested-by tag before.

First of all, IIRC the patch author was Christoph, wasn't he?

Plus, you do realize that this patch makes the pages allocation
page-granular. On systems with a large page size this is completely
wasteful, and it might even be harmful, as the storage ULPs need
lots of MRs.

Also, I don't see how that solves the issue; I'm not sure I even
understand the issue. Do you? Were you able to reproduce it?

IFF the pages buffer end not being aligned to a cacheline is problematic,
then why not extend it to end on a cacheline? Why extend to the next full page?

2016-06-16 21:58:37

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> On Jun 16, 2016, at 5:10 PM, Sagi Grimberg <[email protected]> wrote:
>
>
>>>> On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
>>>>
>>>> On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
>>>>> From: Sagi Grimberg <[email protected]>
>>>>>
>>>>> kmalloc doesn't guarantee the returned memory is all on one page.
>>>>
>>>> IMHO, the patch posted by Christoph at that thread is best way to go,
>>>> because you changed streaming DMA mappings to be coherent DMA mappings [1].
>>>>
>>>> "The kernel developers recommend the use of streaming mappings over
>>>> coherent mappings whenever possible" [1].
>>>>
>>>> [1] http://www.makelinux.net/ldd3/chp-15-sect-4
>>>
>>> Hi Leon-
>>>
>>> I'll happily drop this patch from my 4.8 series as soon
>>> as an official mlx4/mlx5 fix is merged.
>>>
>>> Meanwhile, I notice some unexplained instability (driver
>>> resets, list corruption, and so on) when I test NFS/RDMA
>>> without this patch included. So it is attached to the
>>> series for anyone with mlx4 who wants to pull my topic
>>> branch and try it out.
>>
>> hi Chuck,
>>
>> We plan to send attached patch during our second round of fixes for
>> mlx4/mlx5 and would be grateful to you if you could provide your
>> Tested-by tag before.

Fwiw, Tested-by: Chuck Lever <[email protected]>


> First of all, IIRC the patch author was Christoph wasn't he.
>
> Plus, you do realize that this patch makes the pages allocation
> in granularity of pages. In systems with a large page size this
> is completely redundant, it might even be harmful as the storage
> ULPs need lots of MRs.

I agree that the official fix should take a conservative
approach to allocating this resource; there will be lots
of MRs in an active system. This fix doesn't seem too
careful.


> Also, I don't see how that solves the issue, I'm not sure I even
> understand the issue. Do you? Were you able to reproduce it?

The issue is that dma_map_single() does not seem to DMA map
portions of a memory region that are past the end of the first
page of that region. Maybe that's a bug?

This patch works around that behavior by guaranteeing that

a) the memory region starts at the beginning of a page, and
b) the memory region is never larger than a page

This patch is not sufficient to repair mlx5, because b)
cannot be satisfied in that case; the array of __be64's can
be larger than 512 entries.
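
To make a) and b) concrete, here is a rough sketch only (not the
attached patch; the function name, field names, and error handling
are illustrative assumptions) of an allocation that satisfies both
constraints before DMA-mapping the translation array:

/* Sketch: confine the whole translation array to a single page */
static int alloc_priv_pages_sketch(struct ib_device *device,
				   struct mlx4_ib_mr *mr, int max_pages)
{
	int size = max_pages * sizeof(u64);

	if (size > PAGE_SIZE)		/* b) never larger than a page */
		return -EINVAL;

	/* a) starts at the beginning of a page */
	mr->pages = (__be64 *)get_zeroed_page(GFP_KERNEL);
	if (!mr->pages)
		return -ENOMEM;

	mr->page_map = dma_map_single(device->dma_device, mr->pages,
				      size, DMA_TO_DEVICE);
	if (dma_mapping_error(device->dma_device, mr->page_map)) {
		free_page((unsigned long)mr->pages);
		return -ENOMEM;
	}
	return 0;
}

With 4KB pages, 512 eight-byte entries fill a page exactly; anything
larger necessarily violates b).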


> IFF the pages buffer end not being aligned to a cacheline is problematic
> then why not extent it to end in a cacheline? Why in the next full page?

I think the patch description justifies the choice of
solution, but does not describe the original issue at
all. The original issue had nothing to do with cacheline
alignment.

Lastly, this patch should remove the definition of
MLX4_MR_PAGES_ALIGN.


--
Chuck Lever




2016-06-17 09:05:55

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Fri, Jun 17, 2016 at 12:10:33AM +0300, Sagi Grimberg wrote:
>
> >>>On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
> >>>
> >>>On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
> >>>>From: Sagi Grimberg <[email protected]>
> >>>>
> >>>>kmalloc doesn't guarantee the returned memory is all on one page.
> >>>
> >>>IMHO, the patch posted by Christoph at that thread is best way to go,
> >>>because you changed streaming DMA mappings to be coherent DMA mappings [1].
> >>>
> >>>"The kernel developers recommend the use of streaming mappings over
> >>>coherent mappings whenever possible" [1].
> >>>
> >>>[1] http://www.makelinux.net/ldd3/chp-15-sect-4
> >>
> >>Hi Leon-
> >>
> >>I'll happily drop this patch from my 4.8 series as soon
> >>as an official mlx4/mlx5 fix is merged.
> >>
> >>Meanwhile, I notice some unexplained instability (driver
> >>resets, list corruption, and so on) when I test NFS/RDMA
> >>without this patch included. So it is attached to the
> >>series for anyone with mlx4 who wants to pull my topic
> >>branch and try it out.
> >
> >hi Chuck,
> >
> >We plan to send attached patch during our second round of fixes for
> >mlx4/mlx5 and would be grateful to you if you could provide your
> >Tested-by tag before.
>
> First of all, IIRC the patch author was Christoph wasn't he.

Do you think that the author's name can produce different results in
verification/bug reproduction? We haven't sent this patch officially
yet, and all relevant authors will be acknowledged and honored when
the official version (second round of IB fixes) comes.

>
> Plus, you do realize that this patch makes the pages allocation
> in granularity of pages. In systems with a large page size this
> is completely redundant, it might even be harmful as the storage
> ULPs need lots of MRs.

I see proper execution of the driver as an important goal, one that takes
precedence over the various micro-optimizations, which can come afterwards.

Did you ask yourself why so few users actually use ARCH_KMALLOC_MINALIGN?

➜ linux-rdma git:(master) grep -rI ARCH_KMALLOC_MINALIGN drivers/*
drivers/infiniband/hw/mlx4/mr.c: add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
drivers/infiniband/hw/mlx5/mr.c: add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
drivers/md/dm-crypt.c: ARCH_KMALLOC_MINALIGN);
drivers/usb/core/buffer.c: * ARCH_KMALLOC_MINALIGN.
drivers/usb/core/buffer.c: if (ARCH_KMALLOC_MINALIGN <= 32)
drivers/usb/core/buffer.c: else if (ARCH_KMALLOC_MINALIGN <= 64)
drivers/usb/core/buffer.c: else if (ARCH_KMALLOC_MINALIGN <= 128)
drivers/usb/misc/usbtest.c: return (unsigned long)buf & (ARCH_KMALLOC_MINALIGN - 1);

>
> Also, I don't see how that solves the issue, I'm not sure I even
> understand the issue. Do you? Were you able to reproduce it?

Yes, the issue is that the address supplied to dma_map_single wasn't
aligned to the DMA cacheline size.

And as the one who wrote this code for mlx5, it would be great if you
could give me a pointer. Why did you choose to use MLX5_UMR_ALIGN in that
function? This will add 2048 bytes instead of 64 for mlx4.

>
> IFF the pages buffer end not being aligned to a cacheline is problematic
> then why not extent it to end in a cacheline? Why in the next full page?

It will fit in one page.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



2016-06-17 09:20:34

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Thu, Jun 16, 2016 at 05:58:29PM -0400, Chuck Lever wrote:
>
> > On Jun 16, 2016, at 5:10 PM, Sagi Grimberg <[email protected]> wrote:
> >
> >
> >>>> On Jun 15, 2016, at 12:28 AM, Leon Romanovsky <[email protected]> wrote:
> >>>>
> >>>> On Tue, Jun 14, 2016 at 11:15:25PM -0400, Chuck Lever wrote:
> >>>>> From: Sagi Grimberg <[email protected]>
> >>>>>
> >>>>> kmalloc doesn't guarantee the returned memory is all on one page.
> >>>>
> >>>> IMHO, the patch posted by Christoph at that thread is best way to go,
> >>>> because you changed streaming DMA mappings to be coherent DMA mappings [1].
> >>>>
> >>>> "The kernel developers recommend the use of streaming mappings over
> >>>> coherent mappings whenever possible" [1].
> >>>>
> >>>> [1] http://www.makelinux.net/ldd3/chp-15-sect-4
> >>>
> >>> Hi Leon-
> >>>
> >>> I'll happily drop this patch from my 4.8 series as soon
> >>> as an official mlx4/mlx5 fix is merged.
> >>>
> >>> Meanwhile, I notice some unexplained instability (driver
> >>> resets, list corruption, and so on) when I test NFS/RDMA
> >>> without this patch included. So it is attached to the
> >>> series for anyone with mlx4 who wants to pull my topic
> >>> branch and try it out.
> >>
> >> hi Chuck,
> >>
> >> We plan to send attached patch during our second round of fixes for
> >> mlx4/mlx5 and would be grateful to you if you could provide your
> >> Tested-by tag before.
>
> Fwiw, Tested-by: Chuck Lever <[email protected]>

Thanks, I appreciate it.

>
>
> > First of all, IIRC the patch author was Christoph wasn't he.
> >
> > Plus, you do realize that this patch makes the pages allocation
> > in granularity of pages. In systems with a large page size this
> > is completely redundant, it might even be harmful as the storage
> > ULPs need lots of MRs.
>
> I agree that the official fix should take a conservative
> approach to allocating this resource; there will be lots
> of MRs in an active system. This fix doesn't seem too
> careful.

In the mlx5 system, we always added 2048 bytes to such allocations, for
reasons unknown to me. And it doesn't seem like a conservative approach
either.

>
>
> > Also, I don't see how that solves the issue, I'm not sure I even
> > understand the issue. Do you? Were you able to reproduce it?
>
> The issue is that dma_map_single() does not seem to DMA map
> portions of a memory region that are past the end of the first
> page of that region. Maybe that's a bug?

No, I didn't find support for that. The function dma_map_single expects
contiguous memory aligned to a cache line; there is no requirement that
it be bounded by a single page.

>
> This patch works around that behavior by guaranteeing that
>
> a) the memory region starts at the beginning of a page, and
> b) the memory region is never larger than a page

b) the memory region ends on a cache line.

>
> This patch is not sufficient to repair mlx5, because b)
> cannot be satisfied in that case; the array of __be64's can
> be larger than 512 entries.
>
>
> > IFF the pages buffer end not being aligned to a cacheline is problematic
> > then why not extent it to end in a cacheline? Why in the next full page?
>
> I think the patch description justifies the choice of
> solution, but does not describe the original issue at
> all. The original issue had nothing to do with cacheline
> alignment.

I disagree. kmalloc with the supplied flags will return contiguous memory,
which is enough for dma_map_single. The issue is cache line alignment.

>
> Lastly, this patch should remove the definition of
> MLX4_MR_PAGES_ALIGN.

Thanks, I missed it.

>
>
> --
> Chuck Lever
>
>
>



2016-06-17 19:56:07

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> On Jun 17, 2016, at 5:20 AM, Leon Romanovsky <[email protected]> wrote:
>
> On Thu, Jun 16, 2016 at 05:58:29PM -0400, Chuck Lever wrote:
>>
>>> On Jun 16, 2016, at 5:10 PM, Sagi Grimberg <[email protected]> wrote:
>>
>>> First of all, IIRC the patch author was Christoph wasn't he.
>>>
>>> Plus, you do realize that this patch makes the pages allocation
>>> in granularity of pages. In systems with a large page size this
>>> is completely redundant, it might even be harmful as the storage
>>> ULPs need lots of MRs.
>>
>> I agree that the official fix should take a conservative
>> approach to allocating this resource; there will be lots
>> of MRs in an active system. This fix doesn't seem too
>> careful.
>
> In mlx5 system, we always added 2048 bytes to such allocations, for
> reasons unknown to me. And it doesn't seem as a conservative approach
> either.

The mlx5 approach is much better than allocating a whole
page, when you consider platforms with 64KB pages.

A 1MB payload (for NFS) on such a platform comprises just
16 pages. So xprtrdma will allocate MRs with support for
16 pages. That's a priv pages array of 128 bytes, and you
just put it in a 64KB page all by itself.

So maybe adding 2048 bytes is not optimal either. But I
think sticking with kmalloc here is a more optimal choice.


>>> Also, I don't see how that solves the issue, I'm not sure I even
>>> understand the issue. Do you? Were you able to reproduce it?
>>
>> The issue is that dma_map_single() does not seem to DMA map
>> portions of a memory region that are past the end of the first
>> page of that region. Maybe that's a bug?
>
> No, I didn't find support for that. Function dma_map_single expects
> contiguous memory aligned to cache line, there is no limitation to be
> page bounded.

There certainly isn't, but that doesn't mean there can't
be a bug somewhere ;-) and maybe not in dma_map_single.
It could be that the "array on one page only" limitation
is somewhere else in the mlx4 driver, or even in the HCA
firmware.


>> This patch works around that behavior by guaranteeing that
>>
>> a) the memory region starts at the beginning of a page, and
>> b) the memory region is never larger than a page
>
> b) the memory region ends on cache line.

I think we demonstrated pretty clearly that the issue
occurs only when the end of the priv pages array crosses
into a new page.

We didn't see any problem otherwise.


>> This patch is not sufficient to repair mlx5, because b)
>> cannot be satisfied in that case; the array of __be64's can
>> be larger than 512 entries.
>>
>>
>>> IFF the pages buffer end not being aligned to a cacheline is problematic
>>> then why not extent it to end in a cacheline? Why in the next full page?
>>
>> I think the patch description justifies the choice of
>> solution, but does not describe the original issue at
>> all. The original issue had nothing to do with cacheline
>> alignment.
>
> I disagree, kmalloc with supplied flags will return contiguous memory
> which is enough for dma_map_single. It is cache line alignment.

The reason I find this hard to believe is that there is
no end alignment guarantee at all in this code, but it
works without issue when SLUB debugging is not enabled.

xprtrdma allocates 256 elements in this array on x86.
The code makes the array start on an 0x40 byte boundary.
I'm pretty sure that means the end of that array will
also be on at least an 0x40 byte boundary, and thus
aligned to the DMA cacheline, whether or not SLUB
debugging is enabled.

Notice that in the current code, if the consumer requests
an odd number of SGs, that array can't possibly end on
an alignment boundary. But we've never had a complaint.
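
(Spelling out the arithmetic, assuming 8-byte entries and a 64-byte
DMA cacheline: 256 x 8 = 2048 bytes, a multiple of 64, so an array
that starts on a 64-byte boundary also ends on one; an odd count such
as 255 gives 2040 bytes, which ends mid-cacheline.)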

SLUB debugging changes the alignment of lots of things,
but mlx4_alloc_priv_pages is the only breakage that has
been reported.

DMA-API.txt says:

> [T]he mapped region must begin exactly on a cache line
> boundary and end exactly on one (to prevent two separately
> mapped regions from sharing a single cache line)

The way I read this, cacheline alignment shouldn't be
an issue at all, as long as DMA cachelines aren't
shared between mappings.

If I simply increase the memory allocation size a little
and ensure the end of the mapping is aligned, that should
be enough to prevent DMA cacheline sharing with another
memory allocation on the same page. But I still see Local
Protection Errors when SLUB debugging is enabled, on my
system (with patches to allocate more pages per MR).

I'm not convinced this has anything to do with DMA
cacheline alignment. The reason your patch fixes this
issue is because it keeps the entire array on one page.


--
Chuck Lever




2016-06-18 10:56:56

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Fri, Jun 17, 2016 at 03:55:56PM -0400, Chuck Lever wrote:
>
> > On Jun 17, 2016, at 5:20 AM, Leon Romanovsky <[email protected]> wrote:
> >
> > On Thu, Jun 16, 2016 at 05:58:29PM -0400, Chuck Lever wrote:
> >>
> >>> On Jun 16, 2016, at 5:10 PM, Sagi Grimberg <[email protected]> wrote:
> >>
> >>> First of all, IIRC the patch author was Christoph wasn't he.
> >>>
> >>> Plus, you do realize that this patch makes the pages allocation
> >>> in granularity of pages. In systems with a large page size this
> >>> is completely redundant, it might even be harmful as the storage
> >>> ULPs need lots of MRs.
> >>
> >> I agree that the official fix should take a conservative
> >> approach to allocating this resource; there will be lots
> >> of MRs in an active system. This fix doesn't seem too
> >> careful.
> >
> > In mlx5 system, we always added 2048 bytes to such allocations, for
> > reasons unknown to me. And it doesn't seem as a conservative approach
> > either.
>
> The mlx5 approach is much better than allocating a whole
> page, when you consider platforms with 64KB pages.
>
> A 1MB payload (for NFS) on such a platform comprises just
> 16 pages. So xprtrdma will allocate MRs with support for
> 16 pages. That's a priv pages array of 128 bytes, and you
> just put it in a 64KB page all by itself.
>
> So maybe adding 2048 bytes is not optimal either. But I
> think sticking with kmalloc here is a more optimal choice.

I agree with yours and Sagi's points; I just preferred a working solution
over an optimal one. I'll send an optimal version.

>
>
> >>> Also, I don't see how that solves the issue, I'm not sure I even
> >>> understand the issue. Do you? Were you able to reproduce it?
> >>
> >> The issue is that dma_map_single() does not seem to DMA map
> >> portions of a memory region that are past the end of the first
> >> page of that region. Maybe that's a bug?
> >
> > No, I didn't find support for that. Function dma_map_single expects
> > contiguous memory aligned to cache line, there is no limitation to be
> > page bounded.
>
> There certainly isn't, but that doesn't mean there can't
> be a bug somewhere ;-) and maybe not in dma_map_single.
> It could be that the "array on one page only" limitation
> is somewhere else in the mlx4 driver, or even in the HCA
> firmware.

We checked with the HW/FW/arch teams before responding.

>
>
> >> This patch works around that behavior by guaranteeing that
> >>
> >> a) the memory region starts at the beginning of a page, and
> >> b) the memory region is never larger than a page
> >
> > b) the memory region ends on cache line.
>
> I think we demonstrated pretty clearly that the issue
> occurs only when the end of the priv pages array crosses
> into a new page.
>
> We didn't see any problem otherwise.

SLUB debug does exactly one thing: it changes alignment. That is why no
issue was observed there before.

>
> >> This patch is not sufficient to repair mlx5, because b)
> >> cannot be satisfied in that case; the array of __be64's can
> >> be larger than 512 entries.
> >>
> >>
> >>> IFF the pages buffer end not being aligned to a cacheline is problematic
> >>> then why not extent it to end in a cacheline? Why in the next full page?
> >>
> >> I think the patch description justifies the choice of
> >> solution, but does not describe the original issue at
> >> all. The original issue had nothing to do with cacheline
> >> alignment.
> >
> > I disagree, kmalloc with supplied flags will return contiguous memory
> > which is enough for dma_map_single. It is cache line alignment.
>
> The reason I find this hard to believe is that there is
> no end alignment guarantee at all in this code, but it
> works without issue when SLUB debugging is not enabled.
>
> xprtrdma allocates 256 elements in this array on x86.
> The code makes the array start on an 0x40 byte boundary.
> I'm pretty sure that means the end of that array will
> also be on at least an 0x40 byte boundary, and thus
> aligned to the DMA cacheline, whether or not SLUB
> debugging is enabled.
>
> Notice that in the current code, if the consumer requests
> an odd number of SGs, that array can't possibly end on
> an alignment boundary. But we've never had a complaint.
>
> SLUB debugging changes the alignment of lots of things,
> but mlx4_alloc_priv_pages is the only breakage that has
> been reported.

I think it is related to the custom logic that exists only in this function.
I posted the grep output earlier to emphasize that.

For example, adding a 2K region after the actual data will ensure
alignment.

>
> DMA-API.txt says:
>
> > [T]he mapped region must begin exactly on a cache line
> > boundary and end exactly on one (to prevent two separately
> > mapped regions from sharing a single cache line)
>
> The way I read this, cacheline alignment shouldn't be
> an issue at all, as long as DMA cachelines aren't
> shared between mappings.
>
> If I simply increase the memory allocation size a little
> and ensure the end of the mapping is aligned, that should
> be enough to prevent DMA cacheline sharing with another
> memory allocation on the same page. But I still see Local
> Protection Errors when SLUB debugging is enabled, on my
> system (with patches to allocate more pages per MR).
>
> I'm not convinced this has anything to do with DMA
> cacheline alignment. The reason your patch fixes this
> issue is because it keeps the entire array on one page.

If you don't mind, we can do an experiment.
Let's add padding which will prevent the alignment issue and
will definitely cross the page boundary.

diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 6312721..41e277e 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -280,8 +280,10 @@ mlx4_alloc_priv_pages(struct ib_device *device,
int size = max_pages * sizeof(u64);
int add_size;
int ret;
-
+/*
add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+*/
+ add_size = 2048;

mr->pages_alloc = kzalloc(size + add_size, GFP_KERNEL);
if (!mr->pages_alloc)



2016-06-19 07:05:27

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


>> First of all, IIRC the patch author was Christoph wasn't he.
>
> Do you think that author's name can provide different results in
> verification/bug reproduction? We didn't send this patch officially
> yet and all relevant authors will be acknowledged and honored when the
> official (second round of IB fixes) will come.

Not sure what you mean here. What different results?

>> Plus, you do realize that this patch makes the pages allocation
>> in granularity of pages. In systems with a large page size this
>> is completely redundant, it might even be harmful as the storage
>> ULPs need lots of MRs.
>
> I see proper execution of the driver as an important goal which goes
> before various micro optimizations, which will come after.

I still don't understand how this fixes the original bug report from
Chuck. I sent a patch to make the pages allocation DMA-coherent,
which fixes the issue. Yishai is the driver maintainer; he should
decide how this issue should be addressed.

In any event, if we end up aligning to page size, I would expect to
see a FIXME comment saying we can do better...

> id you ask yourself, why are not so many users use that ARCH_KMALLOC_MINALIGN?
>
> ➜ linux-rdma git:(master) grep -rI ARCH_KMALLOC_MINALIGN drivers/*
> drivers/infiniband/hw/mlx4/mr.c: add_size = max_t(int, MLX4_MR_PAGES_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> drivers/infiniband/hw/mlx5/mr.c: add_size = max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> drivers/md/dm-crypt.c: ARCH_KMALLOC_MINALIGN);
> drivers/usb/core/buffer.c: * ARCH_KMALLOC_MINALIGN.
> drivers/usb/core/buffer.c: if (ARCH_KMALLOC_MINALIGN <= 32)
> drivers/usb/core/buffer.c: else if (ARCH_KMALLOC_MINALIGN <= 64)
> drivers/usb/core/buffer.c: else if (ARCH_KMALLOC_MINALIGN <= 128)
> drivers/usb/misc/usbtest.c: return (unsigned long)buf & (ARCH_KMALLOC_MINALIGN - 1);
>

Not sure how to answer that. It is used to make the allocation padding
just as large as we need it to be in order to align the pointer.

>>
>> Also, I don't see how that solves the issue, I'm not sure I even
>> understand the issue. Do you? Were you able to reproduce it?
>
> Yes, the issue is that address supplied to dma_map_single wasn't aligned
> to DMA cacheline size.

That's not true, Leon; the address is always aligned to
MLX4_MR_PAGES_ALIGN, which is 64B.

>
> And as the one, who wrote this code for mlx5, it will be great if you
> can give me a pointer. Why did you chose to use MLX5_UMR_ALIGN in that
> function? This will add 2048 bytes instead of 64 for mlx4.

The PRM states that the data descriptors array (MTT/KLM) pointer
must be aligned to 2K. I wish it didn't but it does. You can see in the
code that each registration that goes via UMR does the exact same thing.

>> IFF the pages buffer end not being aligned to a cacheline is problematic
>> then why not extent it to end in a cacheline? Why in the next full page?
>
> It will fit in one page.

Yeah, but that single page can be 64K for a 128-pointer array...

2016-06-19 09:48:54

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


>> First of all, IIRC the patch author was Christoph wasn't he.
>>
>> Plus, you do realize that this patch makes the pages allocation
>> in granularity of pages. In systems with a large page size this
>> is completely redundant, it might even be harmful as the storage
>> ULPs need lots of MRs.
>
> I agree that the official fix should take a conservative
> approach to allocating this resource; there will be lots
> of MRs in an active system. This fix doesn't seem too
> careful.
>
>
>> Also, I don't see how that solves the issue, I'm not sure I even
>> understand the issue. Do you? Were you able to reproduce it?
>
> The issue is that dma_map_single() does not seem to DMA map
> portions of a memory region that are past the end of the first
> page of that region. Maybe that's a bug?

That seems weird to me; from looking at the code I didn't see
any indication that such a mapping would fail. Maybe we are seeing
an mlx4-specific issue? If this is some kind of generic dma-mapping
bug, mlx5 would suffer from the same problem, right? Does it?

> This patch works around that behavior by guaranteeing that
>
> a) the memory region starts at the beginning of a page, and
> b) the memory region is never larger than a page
>
> This patch is not sufficient to repair mlx5, because b)
> cannot be satisfied in that case; the array of __be64's can
> be larger than 512 entries.

If a single page boundary is indeed the root-cause then I agree
this would not solve the problem for mlx5.

>> IFF the pages buffer end not being aligned to a cacheline is problematic
>> then why not extent it to end in a cacheline? Why in the next full page?
>
> I think the patch description justifies the choice of
> solution, but does not describe the original issue at
> all. The original issue had nothing to do with cacheline
> alignment.
>
> Lastly, this patch should remove the definition of
> MLX4_MR_PAGES_ALIGN.

The mlx4 PRM explicitly states that the translation (pages) vector
should align to 64 bytes and this is where this define comes from,
hence I don't think it should be removed from the code.

2016-06-19 09:59:05

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


>> In mlx5 system, we always added 2048 bytes to such allocations, for
>> reasons unknown to me. And it doesn't seem as a conservative approach
>> either.
>
> The mlx5 approach is much better than allocating a whole
> page, when you consider platforms with 64KB pages.
>
> A 1MB payload (for NFS) on such a platform comprises just
> 16 pages. So xprtrdma will allocate MRs with support for
> 16 pages. That's a priv pages array of 128 bytes, and you
> just put it in a 64KB page all by itself.
>
> So maybe adding 2048 bytes is not optimal either. But I
> think sticking with kmalloc here is a more optimal choice.

Again, the 2K constraint does not come from any sort of dma mapping
alignment consideration; it comes from the _device_ limitation requiring
the translation vector to be aligned to 2K.

>>>> Also, I don't see how that solves the issue, I'm not sure I even
>>>> understand the issue. Do you? Were you able to reproduce it?
>>>
>>> The issue is that dma_map_single() does not seem to DMA map
>>> portions of a memory region that are past the end of the first
>>> page of that region. Maybe that's a bug?
>>
>> No, I didn't find support for that. dma_map_single() expects
>> contiguous memory aligned to a cache line; there is no requirement
>> that it be bounded to a single page.
>
> There certainly isn't, but that doesn't mean there can't
> be a bug somewhere ;-) and maybe not in dma_map_single.
> It could be that the "array on one page only" limitation
> is somewhere else in the mlx4 driver, or even in the HCA
> firmware.

I'm starting to think this is the case. Leon, I think it's time
to get the FW/HW guys involved...

>> I disagree; kmalloc with the supplied flags will return contiguous memory,
>> which is enough for dma_map_single. It is cache-line aligned.
>
> The reason I find this hard to believe is that there is
> no end alignment guarantee at all in this code, but it
> works without issue when SLUB debugging is not enabled.
>
> xprtrdma allocates 256 elements in this array on x86.
> The code makes the array start on an 0x40 byte boundary.
> I'm pretty sure that means the end of that array will
> also be on at least an 0x40 byte boundary, and thus
> aligned to the DMA cacheline, whether or not SLUB
> debugging is enabled.
>
> Notice that in the current code, if the consumer requests
> an odd number of SGs, that array can't possibly end on
> an alignment boundary. But we've never had a complaint.
>
> SLUB debugging changes the alignment of lots of things,
> but mlx4_alloc_priv_pages is the only breakage that has
> been reported.

I tend to agree; I even have a feeling that this won't happen
on mlx5.

> DMA-API.txt says:
>
>> [T]he mapped region must begin exactly on a cache line
>> boundary and end exactly on one (to prevent two separately
>> mapped regions from sharing a single cache line)
>
> The way I read this, cacheline alignment shouldn't be
> an issue at all, as long as DMA cachelines aren't
> shared between mappings.
>
> If I simply increase the memory allocation size a little
> and ensure the end of the mapping is aligned, that should
> be enough to prevent DMA cacheline sharing with another
> memory allocation on the same page. But I still see Local
> Protection Errors when SLUB debugging is enabled, on my
> system (with patches to allocate more pages per MR).
>
> I'm not convinced this has anything to do with DMA
> cacheline alignment. The reason your patch fixes this
> issue is because it keeps the entire array on one page.

I share this feeling. I have written several times that I don't understand
how this patch solves the issue, and I would appreciate it if someone
could explain it to me (preferably with evidence).
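
To make the end-alignment arithmetic quoted above concrete, here is a
throwaway userspace check (purely illustrative, not part of any patch):
the array occupies nentries * 8 bytes, so with a 64-byte-aligned start
the end is also 64-byte aligned exactly when nentries is a multiple of 8,
true for the 256-entry case and false for an odd count such as 255.

#include <stdio.h>

int main(void)
{
	unsigned int cases[] = { 256, 255, 16 };

	for (unsigned int i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		unsigned int bytes = cases[i] * 8;	/* sizeof(__be64) */

		printf("%3u entries -> %4u bytes; end %saligned to 64\n",
		       cases[i], bytes, bytes % 64 ? "NOT " : "");
	}
	return 0;
}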

2016-06-19 10:04:08

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> Hi Leon-
>
> I created a different patch, see attachment. It aligns
> the start _and_ end of the DMA mapped region, places
> large arrays so they encounter a page boundary, and
> leaves slack space around each array so there is no
> possibility of a shared DMA cacheline or other activity
> in that memory.
>
> I am able to reproduce the Local Protection Errors with
> this patch applied and SLUB debugging disabled.

Thanks Chuck for proving that the dma alignment is not the issue here.

I suggest that we go with my dma coherent patch for now until Leon and
the Mellanox team can debug this one with the HW/FW folks and find out
what is going on.

Leon, I have had my share of debugging this area on mlx4/mlx5. If you
want, I can help with debugging this one.
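
For context, a rough sketch of the kind of isolated allocation Chuck's
test patch describes: align both ends of the mapped region and surround
it with slack. The helper name is hypothetical, and the part that
deliberately forces large arrays across a page boundary is omitted:

/* Hypothetical helper, not the actual test patch: return a region of
 * 'size' bytes whose start sits on a DMA-cacheline boundary and whose
 * mapped length is padded so the end does too, with at least one
 * cacheline of slack on either side so no other allocation can share
 * a cacheline with it.  The raw pointer is kept for kfree(). */
static void *alloc_isolated_sketch(size_t size, size_t cacheline, void **raw)
{
	size_t padded = ALIGN(size, cacheline);
	char *buf = kzalloc(padded + 3 * cacheline, GFP_KERNEL);

	if (!buf)
		return NULL;
	*raw = buf;
	return PTR_ALIGN(buf + cacheline, cacheline);
}

A dma_map_single() over 'padded' bytes of such a region satisfies the
DMA-API.txt rule quoted earlier in the thread, yet the Local Protection
Errors still reproduce, which is what rules out cacheline sharing as the
culprit.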

2016-06-19 19:39:02

by Or Gerlitz

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Sun, Jun 19, 2016 at 1:04 PM, Sagi Grimberg <[email protected]> wrote:

>> I am able to reproduce the Local Protection Errors with
>> this patch applied and SLUB debugging disabled.

> Thanks Chuck for proving that the dma alignment is not the issue here.
>
> I suggest that we go with my dma coherent patch for now until Leon and
> the Mellanox team can debug this one with the HW/FW folks and find out
> what is going on.
>
> Leon, I had my share of debugging this area on mlx4/mlx5 areas. If you
> want I can help with debugging this one.

Hi Sagi, Leon and Co,

From quick reading of the patch I got the impression that some scheme
which used to work is now broken; did we get a bisection result
pointing to the upstream commit which introduced the regression? I
didn't see such a note along the thread. Basically, I think this is
where we should be starting, thoughts? I also added the mlx4 core/IB
maintainer.

2016-06-19 19:43:53

by Or Gerlitz

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Sun, Jun 19, 2016 at 10:38 PM, Or Gerlitz <[email protected]> wrote:

> From quick reading of the patch I got the impression that some scheme
> which used to work is now broken; did we get a bisection result
> pointing to the upstream commit which introduced the regression? I
> didn't see such a note along the thread. Basically, I think this is
> where we should be starting, thoughts? I also added the mlx4 core/IB
> maintainer.

Oh, I missed the 1st post of the thread pointing to commit
1b2cd0fc673c ('IB/mlx4: Support the new memory [...]') -- looking at
the patch, the only thing which is explicitly visible to upper layers
is the setting of the ib_dev.map_mr_sg API call. So is there NFS code which
depends on this verb being exported, and if yes, does X, else does Y?

2016-06-19 20:02:37

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> On Jun 19, 2016, at 3:38 PM, Or Gerlitz <[email protected]> wrote:
>
> On Sun, Jun 19, 2016 at 1:04 PM, Sagi Grimberg <[email protected]> wrote:
>
>>> I am able to reproduce the Local Protection Errors with
>>> this patch applied and SLUB debugging disabled.
>
>> Thanks Chuck for proving that the dma alignment is not the issue here.
>>
>> I suggest that we go with my dma coherent patch for now until Leon and
>> the Mellanox team can debug this one with the HW/FW folks and find out
>> what is going on.
>>
>> Leon, I had my share of debugging this area on mlx4/mlx5 areas. If you
>> want I can help with debugging this one.
>
> Hi Sagi, Leon and Co,
>
> From quick reading of the patch I got the impression that some scheme
> which used to work is now broken; did we get a bisection result
> pointing to the upstream commit which introduced the regression?

Fixes: 1b2cd0fc673c ('IB/mlx4: Support the new memory registration API')

The problem was introduced by the new FR API. I reported
this issue back in late April:

http://marc.info/?l=linux-rdma&m=146194706501705&w=2

and bisected the appearance of symptoms to:

commit d86bd1bece6fc41d59253002db5441fe960a37f6
Author: Joonsoo Kim <[email protected]>
Date: Tue Mar 15 14:55:12 2016 -0700

mm/slub: support left redzone

The left redzone changes the alignment characteristics
of regions returned by kmalloc. Further diagnosis showed
the problem was with mlx4_alloc_priv_pages(), and the
WR flush occurred only when mr->pages happened to contain
a page boundary.

What we don't understand is why a page boundary in that
array is a problem.
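
A page-boundary check like the following (a hypothetical diagnostic, not
driver code) captures the condition the failures were correlated with:

static bool pages_span_page_boundary(const __be64 *pages, int npages)
{
	unsigned long start = (unsigned long)pages;
	unsigned long end = start + npages * sizeof(__be64) - 1;

	/* true when the first and last bytes of the translation array
	 * fall in different pages */
	return (start >> PAGE_SHIFT) != (end >> PAGE_SHIFT);
}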


> I
> didn't see such a note along the thread. Basically, I think this is
> where we should be starting, thoughts? I also added the mlx4 core/IB
> maintainer.

Yishai was notified about this issue on May 25:

http://marc.info/?l=linux-rdma&m=146419192913960&w=2


--
Chuck Lever




2016-06-20 05:44:59

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Sun, Jun 19, 2016 at 04:02:23PM -0400, Chuck Lever wrote:
>

Thanks Chuck and Sagi for the help.

<...>

>
> > I
> > didn't see such note along the thread, basically, I think this is
> > where we should be starting, thoughts? I also added the mlx4 core/IB
> > maintainer.
>
> Yishai was notified about this issue on May 25:
>
> http://marc.info/?l=linux-rdma&m=146419192913960&w=2

Yishai and I are following this thread closely and we are working on
finding the root cause of this issue.

Thanks

>
>
> --
> Chuck Lever
>
>
>



2016-06-20 06:35:11

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> Yishai and I are following this thread closely and we are working on
> finding the root cause of this issue.

Thanks Leon and Yishai, let me know if you need any help with this.

Do you agree we should move forward with the original patch until
we get this resolved?

Also, did anyone find out if this is happening in mlx5 as well?
(Chuck?) If not, that would narrow the root cause down to an
mlx4-specific issue.

2016-06-20 07:02:03

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On Mon, Jun 20, 2016 at 09:34:28AM +0300, Sagi Grimberg wrote:
>
> >Yishai and me follow this thread closely and we work on finding the
> >root cause of this issue.
>
> Thanks Leon and Yishai, let me know if you need any help with this.
>
> Do you agree we should move forward with the original patch until
> we get this resolved?

We will do our best to meet our fixes submission target, which is
planned to be this week.

And for sure we will submit one of the two: your proposed patch
or Yishai's fix, if any.

Right now, your patch is running in our verification.

Thanks



2016-06-20 08:35:53

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> We will do our best to meet our fixes submission target, which is
> planned to be this week.
>
> And for sure we will submit one of the two: your proposed patch
> or Yishai's fix, if any.
>
> Right now, your patch is running in our verification.

Sure, thanks Leon.

2016-06-20 13:42:18

by Yishai Hadas

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages

On 6/20/2016 8:44 AM, Leon Romanovsky wrote:
> On Sun, Jun 19, 2016 at 04:02:23PM -0400, Chuck Lever wrote:
>>
>
> Thanks Chuck and Sagi for the help.
>
> <...>
>
>>
>>> I
>>> didn't see such note along the thread, basically, I think this is
>>> where we should be starting, thoughts? I also added the mlx4 core/IB
>>> maintainer.
>>
>> Yishai was notified about this issue on May 25:
>>
>> http://marc.info/?l=linux-rdma&m=146419192913960&w=2
>
> Yishai and I are following this thread closely and we are working on
> finding the root cause of this issue.
>

Just found the root cause of the problem; it turns out to be a hardware
limitation that is described as part of the PRM. The driver code had to
be written accordingly; I confirmed that internally with the relevant people.

From PRM:
"The PBL should be physically contiguous, must reside in a
64-byte-aligned address, and must not include the last 8 bytes of a page."

The last sentence means that only one page can be used, since the last 8
bytes must not be included. That's why there is a hard limit of 511
entries in the code.

Re the candidate fix that you sent: from an initial review it makes sense;
we'll formally confirm it soon after finalizing the regression testing
on our side.

Thanks Chuck and Sagi for evaluating and working on a solution.
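
For reference, the 511-entry limit follows directly from the PRM rule
quoted above. A small sketch of the arithmetic and of a placement check
(the names are illustrative, not taken from the driver):

#define PBL_ENTRY_SIZE		sizeof(__be64)	/* 8 bytes per translation entry */

/* With the last 8 bytes of a page off-limits, a PBL confined to one
 * 4 KB page holds at most (4096 / 8) - 1 = 511 entries, matching the
 * hard limit in the driver. */
#define PBL_MAX_PER_PAGE	((PAGE_SIZE / PBL_ENTRY_SIZE) - 1)

/* Hypothetical check of a candidate PBL placement against the PRM
 * rule: 64-byte-aligned start, and an end that stays clear of the
 * last 8 bytes of the page (which also keeps the whole vector on a
 * single page). */
static bool pbl_placement_ok(const void *pbl, int npages)
{
	unsigned long start = (unsigned long)pbl;
	unsigned long end = start + npages * PBL_ENTRY_SIZE;
	unsigned long page_end = round_down(start, PAGE_SIZE) + PAGE_SIZE;

	return IS_ALIGNED(start, 64) && end <= page_end - PBL_ENTRY_SIZE;
}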

2016-06-21 13:56:49

by Sagi Grimberg

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages


> Just found the root cause of the problem, it was found to be a hardware
> limitation that is described as part of the PRM. The driver code had to
> be written accordingly, confirmed that internally with the relevant people.
>
> From PRM:
> "The PBL should be physically contiguous, must reside in a
> 64-byte-aligned address, and must not include the last 8 bytes of a page."
>
> The last sentence pointed that only one page can be used as the last 8
> bytes should not be included. That's why there is a hard limit in the
> code for 511 entries.
>
> Re the candidate fix that you sent, from initial review it makes sense,
> we'll formally confirm it soon after finalizing the regression testing
> in our side.
>
> Thanks Chuck and Sagi for evaluating and working on a solution.

Thanks Yishai,

That clears up the root cause.

Does the same hold for mlx5? Or can we leave it alone?

2016-06-21 14:35:17

by Laurence Oberman

[permalink] [raw]
Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages



----- Original Message -----
> From: "Sagi Grimberg" <[email protected]>
> To: "Yishai Hadas" <[email protected]>, "Chuck Lever" <[email protected]>
> Cc: [email protected], "Or Gerlitz" <[email protected]>, "Yishai Hadas" <[email protected]>, "linux-rdma"
> <[email protected]>, "Linux NFS Mailing List" <[email protected]>, "Majd Dibbiny"
> <[email protected]>
> Sent: Tuesday, June 21, 2016 9:56:44 AM
> Subject: Re: [PATCH v2 01/24] mlx4-ib: Use coherent memory for priv pages
>
>
> > Just found the root cause of the problem, it was found to be a hardware
> > limitation that is described as part of the PRM. The driver code had to
> > be written accordingly, confirmed that internally with the relevant people.
> >
> > From PRM:
> > "The PBL should be physically contiguous, must reside in a
> > 64-byte-aligned address, and must not include the last 8 bytes of a page."
> >
> > The last sentence pointed that only one page can be used as the last 8
> > bytes should not be included. That's why there is a hard limit in the
> > code for 511 entries.
> >
> > Re the candidate fix that you sent, from initial review it makes sense,
> > we'll formally confirm it soon after finalizing the regression testing
> > in our side.
> >
> > Thanks Chuck and Sagi for evaluating and working on a solution.
>
> Thanks Yishai,
>
> That clears up the root-cause.
>
> Does the same holds for mlx5? or we can leave it alone?

Also wondering about mlx5, because the default there is coherent and increasing the allowed queue depth got me into the swiotlb error situation.
Backing the queue depth down to 32, per Bart's suggestion, avoids the swiotlb errors.
Likely 128 is too high anyway, but the weird part of my testing, as already mentioned, is that it's only seen during reconnect activity.

Thanks
Laurence