2023-11-15 09:21:22

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 2023/11/14 21:16, Jason Gunthorpe wrote:
> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>
>> Actually because you put the 'strtuct page for devmem' in
>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>> using skb_frag_page() then call things like page_address(), kmap,
>> get_page, put_page, etc, etc, etc.
>
> Yikes, please no. If net has its own struct page look alike it has to
> stay entirely inside net. A non-mm owned struct page should not be
> passed into mm calls. It is just way too hacky to be seriously
> considered :(

Yes, that is something this patchset is trying to do, defining its own
struct page look alike for page pool to support devmem.

struct page for devmem will not be called into the mm subsystem, so most
of the mm calls is avoided by calling into the devmem memory provider'
ops instead of calling mm calls.

As far as I see for now, only page_ref_count(), page_is_pfmemalloc() and
PageTail() is called for devmem page, which should be easy to ensure that
those call for devmem page is consistent with the struct page owned by mm.
I am not sure if we can use some kind of compile/runtime checking to ensure
those kinds of consistency?

>
>>> I would expect net stack, page pool, driver still see the 'struct page',
>>> only memory provider see the specific struct for itself, for the above,
>>> devmem memory provider sees the 'struct page_pool_iov'.
>>>
>>> The reason I still expect driver to see the 'struct page' is that driver
>>> will still need to support normal memory besides devmem.
>
> I wouldn't say this approach is unreasonable, but it does have to be
> done carefully to isolate the mm. Keeping the struct page in the API
> is going to make this very hard.

I would expect that most of the isolation is done in page pool, as far as
I can see:

1. For control part: the driver may need to tell the page pool which memory
provider it want to use. Or the administrator specifies
which memory provider to use by some netlink-based cmd.

2. For data part: I am thinking that driver should only call page_pool_alloc(),
page_pool_free() and page_pool_get_dma_addr related function.

Of course the driver may need to be aware of that if it can call kmap() or
page_address() on the page returned from page_pool_alloc(), and maybe tell
net stack that those pages is not kmap()'able and page_address()'able.

>
> Jason
> .
>


2023-11-15 13:38:42

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On Wed, Nov 15, 2023 at 05:21:02PM +0800, Yunsheng Lin wrote:

> >>> I would expect net stack, page pool, driver still see the 'struct page',
> >>> only memory provider see the specific struct for itself, for the above,
> >>> devmem memory provider sees the 'struct page_pool_iov'.
> >>>
> >>> The reason I still expect driver to see the 'struct page' is that driver
> >>> will still need to support normal memory besides devmem.
> >
> > I wouldn't say this approach is unreasonable, but it does have to be
> > done carefully to isolate the mm. Keeping the struct page in the API
> > is going to make this very hard.
>
> I would expect that most of the isolation is done in page pool, as far as
> I can see:

It is the sort of thing that is important enough it should have
compiler help via types to prove that it is being done
properly. Otherwise it will be full of mistakes over time.

Jason

2023-11-15 17:45:43

by Mina Almasry

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On Wed, Nov 15, 2023 at 1:21 AM Yunsheng Lin <[email protected]> wrote:
>
> On 2023/11/14 21:16, Jason Gunthorpe wrote:
> > On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
> >
> >> Actually because you put the 'strtuct page for devmem' in
> >> skb->bv_frag, the net stack will grab the 'struct page' for devmem
> >> using skb_frag_page() then call things like page_address(), kmap,
> >> get_page, put_page, etc, etc, etc.
> >
> > Yikes, please no. If net has its own struct page look alike it has to
> > stay entirely inside net. A non-mm owned struct page should not be
> > passed into mm calls. It is just way too hacky to be seriously
> > considered :(
>
> Yes, that is something this patchset is trying to do, defining its own
> struct page look alike for page pool to support devmem.
>
> struct page for devmem will not be called into the mm subsystem, so most
> of the mm calls is avoided by calling into the devmem memory provider'
> ops instead of calling mm calls.
>
> As far as I see for now, only page_ref_count(), page_is_pfmemalloc() and
> PageTail() is called for devmem page, which should be easy to ensure that
> those call for devmem page is consistent with the struct page owned by mm.

I'm not sure this is true. These 3 calls are just the calls you're
aware of. In your proposal you're casting mirror pages into page* and
releasing them into the net stack. You need to scrub the entire net
stack for mm calls, i.e. all driver code and all skb_frag_page() call
sites. Of the top of my head, the driver is probably calling
page_address() and illegal_highdma() is calling PageHighMem(). TCP
zerocopy receive is calling vm_insert_pages().

> I am not sure if we can use some kind of compile/runtime checking to ensure
> those kinds of consistency?
>
> >
> >>> I would expect net stack, page pool, driver still see the 'struct page',
> >>> only memory provider see the specific struct for itself, for the above,
> >>> devmem memory provider sees the 'struct page_pool_iov'.
> >>>
> >>> The reason I still expect driver to see the 'struct page' is that driver
> >>> will still need to support normal memory besides devmem.
> >
> > I wouldn't say this approach is unreasonable, but it does have to be
> > done carefully to isolate the mm. Keeping the struct page in the API
> > is going to make this very hard.
>
> I would expect that most of the isolation is done in page pool, as far as
> I can see:
>
> 1. For control part: the driver may need to tell the page pool which memory
> provider it want to use. Or the administrator specifies
> which memory provider to use by some netlink-based cmd.
>
> 2. For data part: I am thinking that driver should only call page_pool_alloc(),
> page_pool_free() and page_pool_get_dma_addr related function.
>
> Of course the driver may need to be aware of that if it can call kmap() or
> page_address() on the page returned from page_pool_alloc(), and maybe tell
> net stack that those pages is not kmap()'able and page_address()'able.
>
> >
> > Jason
> > .
> >



--
Thanks,
Mina

2023-11-15 17:57:48

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 11/15/23 2:21 AM, Yunsheng Lin wrote:
> On 2023/11/14 21:16, Jason Gunthorpe wrote:
>> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>>
>>> Actually because you put the 'strtuct page for devmem' in
>>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>>> using skb_frag_page() then call things like page_address(), kmap,
>>> get_page, put_page, etc, etc, etc.
>>
>> Yikes, please no. If net has its own struct page look alike it has to
>> stay entirely inside net. A non-mm owned struct page should not be
>> passed into mm calls. It is just way too hacky to be seriously
>> considered :(
>
> Yes, that is something this patchset is trying to do, defining its own
> struct page look alike for page pool to support devmem.
>

Networking needs to be able to move away from struct page references.
The devmem and host memory for Rx use cases do not need to be page based.

2023-11-16 11:11:34

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 2023/11/15 21:38, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 05:21:02PM +0800, Yunsheng Lin wrote:
>
>>>>> I would expect net stack, page pool, driver still see the 'struct page',
>>>>> only memory provider see the specific struct for itself, for the above,
>>>>> devmem memory provider sees the 'struct page_pool_iov'.
>>>>>
>>>>> The reason I still expect driver to see the 'struct page' is that driver
>>>>> will still need to support normal memory besides devmem.
>>>
>>> I wouldn't say this approach is unreasonable, but it does have to be
>>> done carefully to isolate the mm. Keeping the struct page in the API
>>> is going to make this very hard.
>>
>> I would expect that most of the isolation is done in page pool, as far as
>> I can see:
>
> It is the sort of thing that is important enough it should have
> compiler help via types to prove that it is being done
> properly. Otherwise it will be full of mistakes over time.

Yes, agreed.

I have done something similar as willy has done for some of
folio conversion as below:

+#define PAGE_POOL_MATCH(pg, iov) \
+ static_assert(offsetof(struct page, pg) == \
+ offsetof(struct page_pool_iov, iov))
+PAGE_POOL_MATCH(flags, res0);
+PAGE_POOL_MATCH(pp_magic, pp_magic);
+PAGE_POOL_MATCH(pp, pp);
...

Not sure if we need to add new API for driver to use when the
driver need the devmem support yet.


>
> Jason
> .
>

2023-11-16 11:11:52

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 2023/11/16 1:44, Mina Almasry wrote:
> On Wed, Nov 15, 2023 at 1:21 AM Yunsheng Lin <[email protected]> wrote:
>>
>> On 2023/11/14 21:16, Jason Gunthorpe wrote:
>>> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>>>
>>>> Actually because you put the 'strtuct page for devmem' in
>>>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>>>> using skb_frag_page() then call things like page_address(), kmap,
>>>> get_page, put_page, etc, etc, etc.
>>>
>>> Yikes, please no. If net has its own struct page look alike it has to
>>> stay entirely inside net. A non-mm owned struct page should not be
>>> passed into mm calls. It is just way too hacky to be seriously
>>> considered :(
>>
>> Yes, that is something this patchset is trying to do, defining its own
>> struct page look alike for page pool to support devmem.
>>
>> struct page for devmem will not be called into the mm subsystem, so most
>> of the mm calls is avoided by calling into the devmem memory provider'
>> ops instead of calling mm calls.
>>
>> As far as I see for now, only page_ref_count(), page_is_pfmemalloc() and
>> PageTail() is called for devmem page, which should be easy to ensure that
>> those call for devmem page is consistent with the struct page owned by mm.
>
> I'm not sure this is true. These 3 calls are just the calls you're
> aware of. In your proposal you're casting mirror pages into page* and
> releasing them into the net stack. You need to scrub the entire net
> stack for mm calls, i.e. all driver code and all skb_frag_page() call
> sites. Of the top of my head, the driver is probably calling
> page_address() and illegal_highdma() is calling PageHighMem(). TCP
> zerocopy receive is calling vm_insert_pages().

For net stack part, I believe your patch below is handling to aovid those
mm calls? I don't include it in this patchset as I thought it is obvious
that whatever the proposal is, we always need those checking.
Maybe we should have included it to avoid this kind of confusion.
https://lore.kernel.org/all/[email protected]/

For driver part, I was thinking if the driver supports devmem, it should check
that if it can call page_address() related call on a specific 'stuct page', or
maybe we should introduce a new helper to make it obvious?

2023-11-16 11:13:06

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 2023/11/16 1:57, David Ahern wrote:
> On 11/15/23 2:21 AM, Yunsheng Lin wrote:
>> On 2023/11/14 21:16, Jason Gunthorpe wrote:
>>> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>>>
>>>> Actually because you put the 'strtuct page for devmem' in
>>>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>>>> using skb_frag_page() then call things like page_address(), kmap,
>>>> get_page, put_page, etc, etc, etc.
>>>
>>> Yikes, please no. If net has its own struct page look alike it has to
>>> stay entirely inside net. A non-mm owned struct page should not be
>>> passed into mm calls. It is just way too hacky to be seriously
>>> considered :(
>>
>> Yes, that is something this patchset is trying to do, defining its own
>> struct page look alike for page pool to support devmem.
>>
>
> Networking needs to be able to move away from struct page references.
> The devmem and host memory for Rx use cases do not need to be page based.

Yes, I am agreed the ultimate goal is to move away from struct page
references. But I am not sure if we can do it right away as there
still are different types of existing 'struct page' in the netstack,
see:

https://lore.kernel.org/all/[email protected]/

>
>
> .
>

2023-11-16 15:33:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On Thu, Nov 16, 2023 at 07:10:01PM +0800, Yunsheng Lin wrote:
> On 2023/11/15 21:38, Jason Gunthorpe wrote:
> > On Wed, Nov 15, 2023 at 05:21:02PM +0800, Yunsheng Lin wrote:
> >
> >>>>> I would expect net stack, page pool, driver still see the 'struct page',
> >>>>> only memory provider see the specific struct for itself, for the above,
> >>>>> devmem memory provider sees the 'struct page_pool_iov'.
> >>>>>
> >>>>> The reason I still expect driver to see the 'struct page' is that driver
> >>>>> will still need to support normal memory besides devmem.
> >>>
> >>> I wouldn't say this approach is unreasonable, but it does have to be
> >>> done carefully to isolate the mm. Keeping the struct page in the API
> >>> is going to make this very hard.
> >>
> >> I would expect that most of the isolation is done in page pool, as far as
> >> I can see:
> >
> > It is the sort of thing that is important enough it should have
> > compiler help via types to prove that it is being done
> > properly. Otherwise it will be full of mistakes over time.
>
> Yes, agreed.
>
> I have done something similar as willy has done for some of
> folio conversion as below:

That is not at all what I mean, I mean you should not use
struct page * types at all in code that flows from the _iov version
except via limited accessors that can be audited and have appropriate
assertions.

Just releasing struct page * that is not a struct page * everywhere
without type safety will never be correct long term.

Jason

2023-11-16 15:58:39

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 11/16/23 4:12 AM, Yunsheng Lin wrote:
> On 2023/11/16 1:57, David Ahern wrote:
>> On 11/15/23 2:21 AM, Yunsheng Lin wrote:
>>> On 2023/11/14 21:16, Jason Gunthorpe wrote:
>>>> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>>>>
>>>>> Actually because you put the 'strtuct page for devmem' in
>>>>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>>>>> using skb_frag_page() then call things like page_address(), kmap,
>>>>> get_page, put_page, etc, etc, etc.
>>>>
>>>> Yikes, please no. If net has its own struct page look alike it has to
>>>> stay entirely inside net. A non-mm owned struct page should not be
>>>> passed into mm calls. It is just way too hacky to be seriously
>>>> considered :(
>>>
>>> Yes, that is something this patchset is trying to do, defining its own
>>> struct page look alike for page pool to support devmem.
>>>
>>
>> Networking needs to be able to move away from struct page references.
>> The devmem and host memory for Rx use cases do not need to be page based.
>
> Yes, I am agreed the ultimate goal is to move away from struct page
> references. But I am not sure if we can do it right away as there
> still are different types of existing 'struct page' in the netstack,
> see:
>
> https://lore.kernel.org/all/[email protected]/

yes, that is the point of a blended approach -- pages and buffers (or
iov) -- leveraging the LSB of the address. That proposal is the right
direction to be moving for co-existence. Adding fake struct page
instances is the wrong direction.

2023-11-17 11:28:09

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH RFC 3/8] memory-provider: dmabuf devmem memory provider

On 2023/11/16 23:58, David Ahern wrote:
> On 11/16/23 4:12 AM, Yunsheng Lin wrote:
>> On 2023/11/16 1:57, David Ahern wrote:
>>> On 11/15/23 2:21 AM, Yunsheng Lin wrote:
>>>> On 2023/11/14 21:16, Jason Gunthorpe wrote:
>>>>> On Tue, Nov 14, 2023 at 04:21:26AM -0800, Mina Almasry wrote:
>>>>>
>>>>>> Actually because you put the 'strtuct page for devmem' in
>>>>>> skb->bv_frag, the net stack will grab the 'struct page' for devmem
>>>>>> using skb_frag_page() then call things like page_address(), kmap,
>>>>>> get_page, put_page, etc, etc, etc.
>>>>>
>>>>> Yikes, please no. If net has its own struct page look alike it has to
>>>>> stay entirely inside net. A non-mm owned struct page should not be
>>>>> passed into mm calls. It is just way too hacky to be seriously
>>>>> considered :(
>>>>
>>>> Yes, that is something this patchset is trying to do, defining its own
>>>> struct page look alike for page pool to support devmem.
>>>>
>>>
>>> Networking needs to be able to move away from struct page references.
>>> The devmem and host memory for Rx use cases do not need to be page based.
>>
>> Yes, I am agreed the ultimate goal is to move away from struct page
>> references. But I am not sure if we can do it right away as there
>> still are different types of existing 'struct page' in the netstack,
>> see:
>>
>> https://lore.kernel.org/all/[email protected]/
>
> yes, that is the point of a blended approach -- pages and buffers (or
> iov) -- leveraging the LSB of the address. That proposal is the right

I am not sure leveraging the LSB of the address is necessary yet, as it
does not seems to provide the type check protection, it seems to just
provide a way to demux between pages(including page pool owned page and
non-page pool owned page) and page pool owned buffer.
That info is avaliable through the page->pp_magic and page->pp->mp_*
too if we mirror the page pool specific union in 'struct page'.

> direction to be moving for co-existence. Adding fake struct page
> instances is the wrong direction.

Perhaps a fake struct page with type check protection is the right
direction?

Intergrating devmem to page pool without a unified metadata between
pages and buffers or without a proper abstract layer does not seems
like the good direction either.

> .
>