2021-09-08 06:19:21

by Shunsuke Mie

Subject: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

To share memory space using dma-buf, the dma-buf API requires a dma
device, but devices such as rxe do not have one. For those cases, pass
the ib_device's embedded struct device instead of the dma device.

Signed-off-by: Shunsuke Mie <[email protected]>
---
drivers/infiniband/core/umem_dmabuf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index c6e875619fac..58d6ac9cb51a 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -146,7 +146,7 @@ struct ib_umem_dmabuf *ib_umem_dmabuf_get(struct ib_device *device,

umem_dmabuf->attach = dma_buf_dynamic_attach(
dmabuf,
- device->dma_device,
+ device->dma_device ? device->dma_device : &device->dev,
ops,
umem_dmabuf);
if (IS_ERR(umem_dmabuf->attach)) {
--
2.17.1


2021-09-08 07:05:53

by Shunsuke Mie

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

Thank you for your comment.
>
> On Wed, Sep 08, 2021 at 03:16:09PM +0900, Shunsuke Mie wrote:
> > To share memory space using dma-buf, the dma-buf API requires a dma
> > device, but devices such as rxe do not have one. For those cases, pass
> > the ib_device's embedded struct device instead of the dma device.
>
> So if dma-buf doesn't actually need a device to dma map why do we ever
> pass the dma_device here? Something does not add up.
As described in the dma-buf API guide [1], the dma_device is used by the
dma-buf exporter to learn the importer's device buffer constraints.
[1] https://lwn.net/Articles/489703/
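
For example, an exporter's attach callback can look at the importer's
device to learn such constraints. A minimal illustrative sketch (the
callback name and the mask check are hypothetical, not code from this
series):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>

/* Hypothetical exporter attach callback: attach->dev is the device the
 * importer passed to dma_buf_dynamic_attach(). */
static int example_attach(struct dma_buf *dmabuf,
			  struct dma_buf_attachment *attach)
{
	/* e.g. refuse importers that cannot address the whole buffer */
	if (dma_get_mask(attach->dev) < DMA_BIT_MASK(64))
		return -EINVAL;

	return 0;
}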

2021-09-08 07:47:39

by Christoph Hellwig

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 08, 2021 at 03:16:09PM +0900, Shunsuke Mie wrote:
> To share memory space using dma-buf, the dma-buf API requires a dma
> device, but devices such as rxe do not have one. For those cases, pass
> the ib_device's embedded struct device instead of the dma device.

So if dma-buf doesn't actually need a device to dma map why do we ever
pass the dma_device here? Something does not add up.

2021-09-08 08:42:58

by Shunsuke Mie

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 8, 2021 at 16:20, Christoph Hellwig <[email protected]> wrote:
> [...]
> Which means for rxe you'd also have to pass the one for the underlying
> net device.
I considered that approach too. In that case, the memory region would be
constrained by the net device, but the rxe driver copies data using the
CPU. To avoid those constraints, I decided to use the ib device.

Thanks,

2021-09-08 10:04:34

by Christoph Hellwig

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 08, 2021 at 04:01:14PM +0900, Shunsuke Mie wrote:
> [...]
> As described in the dma-buf API guide [1], the dma_device is used by the
> dma-buf exporter to learn the importer's device buffer constraints.
> [1] https://lwn.net/Articles/489703/

Which means for rxe you'd also have to pass the one for the underlying
net device.

2021-09-08 11:20:56

by Jason Gunthorpe

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 08, 2021 at 05:41:39PM +0900, Shunsuke Mie wrote:
> [...]
> > Which means for rxe you'd also have to pass the one for the underlying
> > net device.
> I considered that approach too. In that case, the memory region would be
> constrained by the net device, but the rxe driver copies data using the
> CPU. To avoid those constraints, I decided to use the ib device.

Well, that is the whole problem.

We can't mix the dmabuf stuff people are doing that doesn't fill in
the CPU pages in the SGL with RXE - it is simply impossible, as things
currently are, for RXE to access this non-struct page memory.

Jason

2021-09-08 13:42:45

by Christian König

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On 08.09.21 at 13:18, Jason Gunthorpe wrote:
> [...]
> Well, that is the whole problem.
>
> We can't mix the dmabuf stuff people are doing that doesn't fill in
> the CPU pages in the SGL with RXE - it is simply impossible, as things
> currently are, for RXE to access this non-struct page memory.

Yeah, agree that doesn't make much sense.

When you want to access the data with the CPU, why do you want to
use DMA-buf in the first place?

Please keep in mind that there is work ongoing to replace the sg table
with a DMA address array, which will make the underlying struct page
inaccessible to importers.

Regards,
Christian.


2021-09-08 20:09:44

by Daniel Vetter

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 8, 2021 at 3:33 PM Christian König <[email protected]> wrote:
> On 08.09.21 at 13:18, Jason Gunthorpe wrote:
> > [...]
> > Well, that is the whole problem.
> >
> > We can't mix the dmabuf stuff people are doing that doesn't fill in
> > the CPU pages in the SGL with RXE - it is simply impossible, as things
> > currently are, for RXE to access this non-struct page memory.
>
> Yeah, agree that doesn't make much sense.
>
> When you want to access the data with the CPU, why do you want to
> use DMA-buf in the first place?
>
> Please keep in mind that there is work ongoing to replace the sg table
> with a DMA address array, which will make the underlying struct page
> inaccessible to importers.

Also if you do have a dma-buf, you can just dma_buf_vmap() the buffer
for CPU access, which intentionally does not require any device - no
idea why there's a dma_buf_attach involved. Not all exporters support
this yet, but that's fixable, and you must call
dma_buf_begin/end_cpu_access for cache management if the allocation
isn't CPU coherent. But it's all there; there is no need for hacks like
allowing a wrong device or other fun things.
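
Roughly, the pattern looks like this (a minimal sketch, abbreviated
error handling, assuming the dma_buf_map based vmap interface; not
taken from any particular driver):

#include <linux/dma-buf.h>
#include <linux/dma-buf-map.h>
#include <linux/dma-direction.h>
#include <linux/string.h>

/* Sketch: CPU access to a dma-buf without any device involved. */
static int cpu_read_dmabuf(struct dma_buf *dmabuf, void *dst, size_t len)
{
	struct dma_buf_map map;
	int ret;

	ret = dma_buf_vmap(dmabuf, &map);	/* no device required */
	if (ret)
		return ret;

	/* cache management for non-coherent allocations */
	ret = dma_buf_begin_cpu_access(dmabuf, DMA_FROM_DEVICE);
	if (!ret) {
		memcpy(dst, map.vaddr, len);	/* assumes !map.is_iomem */
		dma_buf_end_cpu_access(dmabuf, DMA_FROM_DEVICE);
	}

	dma_buf_vunmap(dmabuf, &map);
	return ret;
}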
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-09-08 23:35:17

by Jason Gunthorpe

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Wed, Sep 08, 2021 at 09:22:37PM +0200, Daniel Vetter wrote:
> [...]
> Also if you do have a dma-buf, you can just dma_buf_vmap() the buffer
> for CPU access, which intentionally does not require any device - no
> idea why there's a dma_buf_attach involved. Not all exporters support
> this yet, but that's fixable, and you must call
> dma_buf_begin/end_cpu_access for cache management if the allocation
> isn't CPU coherent. But it's all there; there is no need for hacks like
> allowing a wrong device or other fun things.

Can rxe leave the vmap in place potentially forever?

Jason

2021-09-09 09:28:51

by Daniel Vetter

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Thu, Sep 9, 2021 at 1:33 AM Jason Gunthorpe <[email protected]> wrote:
> [...]
> Can rxe leave the vmap in place potentially forever?

Yeah, it's like perma-pinning the buffer into system memory for
non-p2p dma-buf sharing. We just squint and pretend that can't be
abused too badly :-) On 32-bit you'll run out of vmap space rather
quickly, but that's not something anyone cares about here either. We
have a bunch more software-modesetting drivers in drm that use
dma_buf_vmap() like this, so it's all fine.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-09-10 01:48:40

by Shunsuke Mie

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Thu, Sep 9, 2021 at 18:26, Daniel Vetter <[email protected]> wrote:
> [...]
> Yeah, it's like perma-pinning the buffer into system memory for
> non-p2p dma-buf sharing. We just squint and pretend that can't be
> abused too badly :-) On 32-bit you'll run out of vmap space rather
> quickly, but that's not something anyone cares about here either. We
> have a bunch more software-modesetting drivers in drm that use
> dma_buf_vmap() like this, so it's all fine.

Thanks for your comments.

To begin with, a CMA region cannot be used for RDMA because the region
has no struct page. In addition, some GPU drivers use CMA and share such
regions as dma-bufs, so RDMA cannot transfer to or from those regions. I
thought rxe dma-buf support was the better way to solve this problem.

I'll redesign the rxe dma-buf support to use dma_buf_vmap() instead of
dma_buf_dynamic_attach().
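
A rough shape of what I have in mind (a hypothetical sketch; the struct
and function names are placeholders, not a posted patch):

#include <linux/dma-buf.h>
#include <linux/dma-buf-map.h>
#include <linux/err.h>

/* Hypothetical rxe-side state: keep a long-lived vmap of the dma-buf
 * instead of a device attachment. */
struct rxe_dmabuf_mr {
	struct dma_buf *dmabuf;
	struct dma_buf_map map;
};

static int rxe_mr_init_dmabuf(int fd, struct rxe_dmabuf_mr *mr)
{
	int ret;

	mr->dmabuf = dma_buf_get(fd);	/* takes a reference on the fd's buf */
	if (IS_ERR(mr->dmabuf))
		return PTR_ERR(mr->dmabuf);

	ret = dma_buf_vmap(mr->dmabuf, &mr->map);
	if (ret)
		dma_buf_put(mr->dmabuf);

	return ret;
}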

Regards,
Shunsuke

2021-09-14 00:57:46

by Daniel Vetter

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Fri, Sep 10, 2021 at 3:46 AM Shunsuke Mie <[email protected]> wrote:
> [...]
> Thanks for your comments.
>
> To begin with, a CMA region cannot be used for RDMA because the region
> has no struct page. In addition, some GPU drivers use CMA and share such
> regions as dma-bufs, so RDMA cannot transfer to or from those regions. I
> thought rxe dma-buf support was the better way to solve this problem.
>
> I'll redesign the rxe dma-buf support to use dma_buf_vmap() instead of
> dma_buf_dynamic_attach().

btw for the next version please cc dri-devel. get_maintainer.pl should
pick it up for these patches.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-09-14 07:13:51

by Shunsuke Mie

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Tue, Sep 14, 2021 at 4:23, Daniel Vetter <[email protected]> wrote:
> [...]
> btw for the next version please cc dri-devel. get_maintainer.pl should
> pick it up for these patches.
The CC list for these patches was generated by get_maintainer.pl, but it
didn't pick up dri-devel. Should I add dri-devel to the CC manually?

Regards,
Shunsuke

2021-09-14 09:42:34

by Daniel Vetter

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Tue, Sep 14, 2021 at 9:11 AM Shunsuke Mie <[email protected]> wrote:
> [...]
> > btw for the next version please cc dri-devel. get_maintainer.pl should
> > pick it up for these patches.
> The CC list for these patches was generated by get_maintainer.pl, but it
> didn't pick up dri-devel. Should I add dri-devel to the CC manually?

Hm yes, on rechecking, the regex doesn't match since you're not
touching any dma-buf code directly - or not directly enough for
get_maintainer.pl to pick it up.

DMA BUFFER SHARING FRAMEWORK
M: Sumit Semwal <[email protected]>
M: Christian König <[email protected]>
L: [email protected]
L: [email protected]
L: [email protected] (moderated for non-subscribers)
S: Maintained
T: git git://anongit.freedesktop.org/drm/drm-misc
F: Documentation/driver-api/dma-buf.rst
F: drivers/dma-buf/
F: include/linux/*fence.h
F: include/linux/dma-buf*
F: include/linux/dma-resv.h
K: \bdma_(?:buf|fence|resv)\b

Above is the MAINTAINERS entry that's always good to cc for anything
related to dma_buf/fence/resv.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-09-14 10:15:02

by Shunsuke Mie

Subject: Re: [RFC PATCH 1/3] RDMA/umem: Handle rdma devices that have no dma device

On Tue, Sep 14, 2021 at 18:38, Daniel Vetter <[email protected]> wrote:
> [...]
> Hm yes, on rechecking, the regex doesn't match since you're not
> touching any dma-buf code directly - or not directly enough for
> get_maintainer.pl to pick it up.
>
> DMA BUFFER SHARING FRAMEWORK
> M: Sumit Semwal <[email protected]>
> M: Christian König <[email protected]>
> L: [email protected]
> L: [email protected]
> L: [email protected] (moderated for non-subscribers)
> S: Maintained
> T: git git://anongit.freedesktop.org/drm/drm-misc
> F: Documentation/driver-api/dma-buf.rst
> F: drivers/dma-buf/
> F: include/linux/*fence.h
> F: include/linux/dma-buf*
> F: include/linux/dma-resv.h
> K: \bdma_(?:buf|fence|resv)\b
>
> Above is the MAINTAINERS entry that's always good to cc for anything
> related to dma_buf/fence/resv.
Yes, dma-buf code is not directly touched by my changes, but the series
is related to dma-buf, so I'll add the dma-buf mailing lists and
maintainers to CC using
./scripts/get_maintainer.pl -f drivers/infiniband/core/umem_dmabuf.c
I think that is enough to build the address list.
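
For example, one way to wire that up when sending (an illustration only;
the list address is elided here like the others above):

git send-email \
	--cc-cmd='./scripts/get_maintainer.pl --norolestats' \
	--cc='<dri-devel list address from the MAINTAINERS entry>' \
	*.patch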

Thank you for letting me know that.

Regards,
Shunsuke