Subject: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

FOLL_LONGTERM will avoid share memory alloc from CMA region,
which may be used in secure playback case.
if part of CMA region taken by share memory for long term usage,
CMA will failed to get whole buffer back.

Signed-off-by: Xiaoming Ding <[email protected]>
---
drivers/tee/tee_shm.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 673cf0359494..ddd3947e2229 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -223,6 +223,7 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
size_t num_pages;
void *ret;
int rc;
+ u32 page_flag = FOLL_WRITE;

if (!tee_device_get(teedev))
return ERR_PTR(-EINVAL);
@@ -255,9 +256,11 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
ret = ERR_PTR(-ENOMEM);
goto err_free_shm;
}
-
+#if IS_ENABLED(CONFIG_CMA)
+ page_flag |= FOLL_LONGTERM;
+#endif
if (flags & TEE_SHM_USER_MAPPED)
- rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE,
+ rc = pin_user_pages_fast(start, num_pages, page_flag,
shm->pages);
else
rc = shm_get_kernel_pages(start, num_pages, shm->pages);
--
2.18.0



2023-05-17 07:47:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

> + u32 page_flag = FOLL_WRITE;
>
> if (!tee_device_get(teedev))
> return ERR_PTR(-EINVAL);
> @@ -255,9 +256,11 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
> ret = ERR_PTR(-ENOMEM);
> goto err_free_shm;
> }
> -
> +#if IS_ENABLED(CONFIG_CMA)
> + page_flag |= FOLL_LONGTERM;
> +#endif
> if (flags & TEE_SHM_USER_MAPPED)

If this mapping is long live it should always use FOLL_LONGTERM.

The ifdef does not make sense.

2023-05-17 08:14:22

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, 17 May 2023 at 13:04, Christoph Hellwig <[email protected]> wrote:
>
> > + u32 page_flag = FOLL_WRITE;
> >
> > if (!tee_device_get(teedev))
> > return ERR_PTR(-EINVAL);
> > @@ -255,9 +256,11 @@ register_shm_helper(struct tee_context *ctx, unsigned long addr,
> > ret = ERR_PTR(-ENOMEM);
> > goto err_free_shm;
> > }
> > -
> > +#if IS_ENABLED(CONFIG_CMA)
> > + page_flag |= FOLL_LONGTERM;
> > +#endif
> > if (flags & TEE_SHM_USER_MAPPED)
>
> If this mapping is long live it should always use FOLL_LONGTERM.

It depends on the userspace application needs. However, I think it
should be safe to use FOLL_LONGTERM by default to serve cases like
secure media playback.

>
> The ifdef does not make sense.

Agree.

-Sumit

2023-05-17 08:35:30

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, May 17, 2023 at 01:22:51PM +0530, Sumit Garg wrote:
> > > +#if IS_ENABLED(CONFIG_CMA)
> > > + page_flag |= FOLL_LONGTERM;
> > > +#endif
> > > if (flags & TEE_SHM_USER_MAPPED)
> >
> > If this mapping is long live it should always use FOLL_LONGTERM.
>
> It depends on the userspace application needs. However, I think it
> should be safe to use FOLL_LONGTERM by default to serve cases like
> secure media playback.

long term is defined as won't automatically go away during the same
syscall.

Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, 2023-05-17 at 01:08 -0700, Christoph Hellwig wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
>
> On Wed, May 17, 2023 at 01:22:51PM +0530, Sumit Garg wrote:
> > > > +#if IS_ENABLED(CONFIG_CMA)
> > > > + page_flag |= FOLL_LONGTERM;
> > > > +#endif
> > > > if (flags & TEE_SHM_USER_MAPPED)
> > >
> > > If this mapping is long live it should always use FOLL_LONGTERM.
> >
> > It depends on the userspace application needs. However, I think it
> > should be safe to use FOLL_LONGTERM by default to serve cases like
> > secure media playback.
>
> long term is defined as won't automatically go away during the same
> syscall.

thanks for the suggestion.
I will update the patch with using FOLL_LONGTERM by default.


2023-05-17 09:31:07

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, 17 May 2023 at 13:38, Christoph Hellwig <[email protected]> wrote:
>
> On Wed, May 17, 2023 at 01:22:51PM +0530, Sumit Garg wrote:
> > > > +#if IS_ENABLED(CONFIG_CMA)
> > > > + page_flag |= FOLL_LONGTERM;
> > > > +#endif
> > > > if (flags & TEE_SHM_USER_MAPPED)
> > >
> > > If this mapping is long live it should always use FOLL_LONGTERM.
> >
> > It depends on the userspace application needs. However, I think it
> > should be safe to use FOLL_LONGTERM by default to serve cases like
> > secure media playback.
>
> long term is defined as won't automatically go away during the same
> syscall.

Do you mean a pinned user-space page can be paged out automatically?
The documentation [1] isn't very helpful here either since it talks
about "short term" vs "long term" in abstract terms.

Just FYI, the underlying use-case for TEE registered shared memory is
that the references to pinned pages are provided to TEE implementation
to operate upon. This can happen over multiple syscalls and we want
the pinned pages to be always in RAM as otherwise the physical
addresses may change if they are paged out in between. If this is only
supported reliably with a long term flag then this patch should be
tagged as a fix and requires stable backports.

[1] Documentation/core-api/pin_user_pages.rst

-Sumit

2023-05-17 09:43:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, May 17, 2023 at 02:56:13PM +0530, Sumit Garg wrote:
> Do you mean a pinned user-space page can be paged out automatically?

No, pinned pages can't be paged out.

But a short term pin implies it will be release after a short delay,
and it is feasible for wait for the pin to go away.

For a long term pin waiting is not an option, and anyone wanting to
do something with the pinned page that requires it to not be pinned
must simply give up.

> Just FYI, the underlying use-case for TEE registered shared memory is
> that the references to pinned pages are provided to TEE implementation
> to operate upon. This can happen over multiple syscalls and we want
> the pinned pages to be always in RAM as otherwise the physical
> addresses may change if they are paged out in between.

That's a very use clear case for a long term pin.

2023-05-17 10:35:52

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, 17 May 2023 at 15:06, Christoph Hellwig <[email protected]> wrote:
>
> On Wed, May 17, 2023 at 02:56:13PM +0530, Sumit Garg wrote:
> > Do you mean a pinned user-space page can be paged out automatically?
>
> No, pinned pages can't be paged out.
>
> But a short term pin implies it will be release after a short delay,
> and it is feasible for wait for the pin to go away.

Okay, I see. I would be interested to know the ranges for that short
delay. I guess it may depend on how much memory pressure there is...

>
> For a long term pin waiting is not an option, and anyone wanting to
> do something with the pinned page that requires it to not be pinned
> must simply give up.
>
> > Just FYI, the underlying use-case for TEE registered shared memory is
> > that the references to pinned pages are provided to TEE implementation
> > to operate upon. This can happen over multiple syscalls and we want
> > the pinned pages to be always in RAM as otherwise the physical
> > addresses may change if they are paged out in between.
>
> That's a very use clear case for a long term pin.

...however, thanks for the insights.

@Xiaoming,

Please use the following fixes tag for the v2 along with extending the
commit description regarding the reliability provided by the long term
flag.

Fixes: 033ddf12bcf5 ("tee: add register user memory")

-Sumit

2023-05-17 18:38:14

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On 17.05.23 12:19, Sumit Garg wrote:
> On Wed, 17 May 2023 at 15:06, Christoph Hellwig <[email protected]> wrote:
>>
>> On Wed, May 17, 2023 at 02:56:13PM +0530, Sumit Garg wrote:
>>> Do you mean a pinned user-space page can be paged out automatically?
>>
>> No, pinned pages can't be paged out.
>>
>> But a short term pin implies it will be release after a short delay,
>> and it is feasible for wait for the pin to go away.
>
> Okay, I see. I would be interested to know the ranges for that short
> delay. I guess it may depend on how much memory pressure there is...
>

In general: if user space controls it -> possibly forever -> long-term.
Even if in most cases it's a short delay: there is no trusting on user
space.

For example, iouring fixed buffers keep pages pinned until user space
decides to unregistered the buffers -> long-term.

Short-term is, for example, something like O_DIRECT where we pin -> DMA
-> unpin in essentially one operation.

--
Thanks,

David / dhildenb


2023-05-18 04:24:48

by Christoph Hellwig

[permalink] [raw]
Subject: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand wrote:
> In general: if user space controls it -> possibly forever -> long-term. Even
> if in most cases it's a short delay: there is no trusting on user space.
>
> For example, iouring fixed buffers keep pages pinned until user space
> decides to unregistered the buffers -> long-term.
>
> Short-term is, for example, something like O_DIRECT where we pin -> DMA ->
> unpin in essentially one operation.

Btw, one thing that's been on my mind is that I think we got the
polarity on FOLL_LONGTERM wrong. Instead of opting into the long term
behavior it really should be the default, with a FOLL_EPHEMERAL flag
to opt out of it. And every users of this flag is required to have
a comment explaining the life time rules for the pin..

2023-05-18 06:15:06

by Sumit Garg

[permalink] [raw]
Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Thu, 18 May 2023 at 09:51, Christoph Hellwig <[email protected]> wrote:
>
> On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand wrote:
> > In general: if user space controls it -> possibly forever -> long-term. Even
> > if in most cases it's a short delay: there is no trusting on user space.
> >
> > For example, iouring fixed buffers keep pages pinned until user space
> > decides to unregistered the buffers -> long-term.
> >
> > Short-term is, for example, something like O_DIRECT where we pin -> DMA ->
> > unpin in essentially one operation.
>
> Btw, one thing that's been on my mind is that I think we got the
> polarity on FOLL_LONGTERM wrong. Instead of opting into the long term
> behavior it really should be the default, with a FOLL_EPHEMERAL flag
> to opt out of it. And every users of this flag is required to have
> a comment explaining the life time rules for the pin..

It does look like a better approach to me given the very nature of
user space pages.

-Sumit

Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

From 35fd062d5cbc4d182eee0183843cd6350d126788 Mon Sep 17 00:00:00 2001
From: Xiaoming Ding <[email protected]>
Date: Wed, 10 May 2023 10:15:23 +0800
Subject: [PATCH v2] tee: add FOLL_LONGTERM for CMA case when alloc shm

CMA is widely used on insufficient memory platform for
secure media playback case, and FOLL_LONGTERM will
avoid tee_shm alloc pages from CMA region.
without FOLL_LONGTERM, CMA region may alloc failed since
tee_shm has a chance to use it in advance.

modify is verified on OPTEE XTEST and kinds of secure + clear playback


Fixes: 033ddf12bcf5 ("tee: add register user memory")
Signed-off-by: Xiaoming Ding <[email protected]>
---
v1 -> v2: take off the ifdef and apply FOLL_LONGTERM by default

drivers/tee/tee_shm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
index 673cf0359494..38878e549ca4 100644
--- a/drivers/tee/tee_shm.c
+++ b/drivers/tee/tee_shm.c
@@ -257,7 +257,7 @@ register_shm_helper(struct tee_context *ctx,
unsigned long addr,
}

if (flags & TEE_SHM_USER_MAPPED)
- rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE,
+ rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE |
FOLL_LONGTERM,
shm->pages);
else
rc = shm_get_kernel_pages(start, num_pages, shm-
>pages);
--
2.18.0

On Wed, 2023-05-17 at 15:49 +0530, Sumit Garg wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
>
> On Wed, 17 May 2023 at 15:06, Christoph Hellwig <[email protected]>
> wrote:
> >
> > On Wed, May 17, 2023 at 02:56:13PM +0530, Sumit Garg wrote:
> > > Do you mean a pinned user-space page can be paged out
> > > automatically?
> >
> > No, pinned pages can't be paged out.
> >
> > But a short term pin implies it will be release after a short
> > delay,
> > and it is feasible for wait for the pin to go away.
>
> Okay, I see. I would be interested to know the ranges for that short
> delay. I guess it may depend on how much memory pressure there is...
>
> >
> > For a long term pin waiting is not an option, and anyone wanting to
> > do something with the pinned page that requires it to not be pinned
> > must simply give up.
> >
> > > Just FYI, the underlying use-case for TEE registered shared
> > > memory is
> > > that the references to pinned pages are provided to TEE
> > > implementation
> > > to operate upon. This can happen over multiple syscalls and we
> > > want
> > > the pinned pages to be always in RAM as otherwise the physical
> > > addresses may change if they are paged out in between.
> >
> > That's a very use clear case for a long term pin.
>
> ...however, thanks for the insights.
>
> @Xiaoming,
>
> Please use the following fixes tag for the v2 along with extending
> the
> commit description regarding the reliability provided by the long
> term
> flag.
>
> Fixes: 033ddf12bcf5 ("tee: add register user memory")
>
> -Sumit

2023-05-18 14:13:02

by David Hildenbrand

[permalink] [raw]
Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On 18.05.23 08:08, Sumit Garg wrote:
> On Thu, 18 May 2023 at 09:51, Christoph Hellwig <[email protected]> wrote:
>>
>> On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand wrote:
>>> In general: if user space controls it -> possibly forever -> long-term. Even
>>> if in most cases it's a short delay: there is no trusting on user space.
>>>
>>> For example, iouring fixed buffers keep pages pinned until user space
>>> decides to unregistered the buffers -> long-term.
>>>
>>> Short-term is, for example, something like O_DIRECT where we pin -> DMA ->
>>> unpin in essentially one operation.
>>
>> Btw, one thing that's been on my mind is that I think we got the
>> polarity on FOLL_LONGTERM wrong. Instead of opting into the long term
>> behavior it really should be the default, with a FOLL_EPHEMERAL flag
>> to opt out of it. And every users of this flag is required to have
>> a comment explaining the life time rules for the pin..
>
> It does look like a better approach to me given the very nature of
> user space pages.

Yeah, there is a lot of historical baggage. For example, FOLL_GET should
be inaccessible to kernel modules completely at one point, to be only
used by selected core-mm pieces.

Maybe we should even disallow passing in FOLL_LONGTERM as a flag and
only provide functions like pin_user_pages() vs.
pin_user_pages_longterm(). Then, discussions about conditional
flag-setting are no more :)

... or even use pin_user_pages_shortterm() vs. pin_user_pages() ... to
make the default be longterm.

--
Thanks,

David / dhildenb


2023-05-19 10:17:48

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On 18.05.23 08:40, Xiaoming Ding (丁晓明) wrote:
> From 35fd062d5cbc4d182eee0183843cd6350d126788 Mon Sep 17 00:00:00 2001
> From: Xiaoming Ding <[email protected]>
> Date: Wed, 10 May 2023 10:15:23 +0800
> Subject: [PATCH v2] tee: add FOLL_LONGTERM for CMA case when alloc shm
>
> CMA is widely used on insufficient memory platform for
> secure media playback case, and FOLL_LONGTERM will
> avoid tee_shm alloc pages from CMA region.
> without FOLL_LONGTERM, CMA region may alloc failed since
> tee_shm has a chance to use it in advance.
>
> modify is verified on OPTEE XTEST and kinds of secure + clear playback
>
>
> Fixes: 033ddf12bcf5 ("tee: add register user memory")
> Signed-off-by: Xiaoming Ding <[email protected]>
> ---
> v1 -> v2: take off the ifdef and apply FOLL_LONGTERM by default
>
> drivers/tee/tee_shm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> index 673cf0359494..38878e549ca4 100644
> --- a/drivers/tee/tee_shm.c
> +++ b/drivers/tee/tee_shm.c
> @@ -257,7 +257,7 @@ register_shm_helper(struct tee_context *ctx,
> unsigned long addr,
> }
>
> if (flags & TEE_SHM_USER_MAPPED)
> - rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE,
> + rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE |
> FOLL_LONGTERM,
> shm->pages);
> else
> rc = shm_get_kernel_pages(start, num_pages, shm-
>> pages);

I didn't dive deeply into that code, but I can spot that we can end up
long-term pinning multiple pages -- possibly unbound or is there any
sane limit on the number of pages?

Take a look at io_uring/rsrc.c and how we account long-term pinned pages
against user->locked_vm/ctx->mm_account->pinned_vm in io_account_mem().

If user space could only end up pinning one or two pages via that
interface, ok. But it looks like this interface could be abused to
create real real trouble by unprivileged users that should be able to
long-term pin that many pages.

Am I missing something important (i.e., interface is only accessible by
privileged users) or should there be proper accounting and
RLIMIT_MEMLOCK checks?

--
Thanks,

David / dhildenb


2023-05-19 11:18:28

by Sumit Garg

[permalink] [raw]
Subject: Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Fri, 19 May 2023 at 15:31, David Hildenbrand <[email protected]> wrote:
>
> On 18.05.23 08:40, Xiaoming Ding (丁晓明) wrote:
> > From 35fd062d5cbc4d182eee0183843cd6350d126788 Mon Sep 17 00:00:00 2001
> > From: Xiaoming Ding <[email protected]>
> > Date: Wed, 10 May 2023 10:15:23 +0800
> > Subject: [PATCH v2] tee: add FOLL_LONGTERM for CMA case when alloc shm
> >
> > CMA is widely used on insufficient memory platform for
> > secure media playback case, and FOLL_LONGTERM will
> > avoid tee_shm alloc pages from CMA region.
> > without FOLL_LONGTERM, CMA region may alloc failed since
> > tee_shm has a chance to use it in advance.
> >
> > modify is verified on OPTEE XTEST and kinds of secure + clear playback
> >
> >
> > Fixes: 033ddf12bcf5 ("tee: add register user memory")
> > Signed-off-by: Xiaoming Ding <[email protected]>
> > ---
> > v1 -> v2: take off the ifdef and apply FOLL_LONGTERM by default
> >
> > drivers/tee/tee_shm.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/tee/tee_shm.c b/drivers/tee/tee_shm.c
> > index 673cf0359494..38878e549ca4 100644
> > --- a/drivers/tee/tee_shm.c
> > +++ b/drivers/tee/tee_shm.c
> > @@ -257,7 +257,7 @@ register_shm_helper(struct tee_context *ctx,
> > unsigned long addr,
> > }
> >
> > if (flags & TEE_SHM_USER_MAPPED)
> > - rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE,
> > + rc = pin_user_pages_fast(start, num_pages, FOLL_WRITE |
> > FOLL_LONGTERM,
> > shm->pages);
> > else
> > rc = shm_get_kernel_pages(start, num_pages, shm-
> >> pages);
>
> I didn't dive deeply into that code, but I can spot that we can end up
> long-term pinning multiple pages -- possibly unbound or is there any
> sane limit on the number of pages?

I am not aware of any limit that we put on pinning user-space pages.

>
> Take a look at io_uring/rsrc.c and how we account long-term pinned pages
> against user->locked_vm/ctx->mm_account->pinned_vm in io_account_mem().
>
> If user space could only end up pinning one or two pages via that
> interface, ok. But it looks like this interface could be abused to
> create real real trouble by unprivileged users that should be able to
> long-term pin that many pages.
>
> Am I missing something important (i.e., interface is only accessible by
> privileged users) or should there be proper accounting and
> RLIMIT_MEMLOCK checks?

So your observation is correct. With long term pinning we have to
implement similar RLIMIT_MEMLOCK checks. Thanks for your insights
here.

-Sumit

>
> --
> Thanks,
>
> David / dhildenb
>

2023-05-23 02:12:49

by John Hubbard

[permalink] [raw]
Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On 5/18/23 06:56, David Hildenbrand wrote:
> On 18.05.23 08:08, Sumit Garg wrote:
>> On Thu, 18 May 2023 at 09:51, Christoph Hellwig <[email protected]> wrote:
>>>
>>> On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand wrote:
>>>> In general: if user space controls it -> possibly forever -> long-term. Even
>>>> if in most cases it's a short delay: there is no trusting on user space.
>>>>
>>>> For example, iouring fixed buffers keep pages pinned until user space
>>>> decides to unregistered the buffers -> long-term.
>>>>
>>>> Short-term is, for example, something like O_DIRECT where we pin -> DMA ->
>>>> unpin in essentially one operation.
>>>
>>> Btw, one thing that's been on my mind is that I think we got the
>>> polarity on FOLL_LONGTERM wrong.  Instead of opting into the long term
>>> behavior it really should be the default, with a FOLL_EPHEMERAL flag
>>> to opt out of it.  And every users of this flag is required to have
>>> a comment explaining the life time rules for the pin..

I see maybe 10 or 20 call sites today. So it is definitely feasible to add
documentation at each, explaining the why it wants a long term pin.

>>
>> It does look like a better approach to me given the very nature of
>> user space pages.
>
> Yeah, there is a lot of historical baggage. For example, FOLL_GET should be inaccessible to kernel modules completely at one point, to be only used by selected core-mm pieces.

Yes. When I first mass-converted call sites from gup to pup, I just
preserved FOLL_GET behavior in order to keep from changing too much at
once. But I agree that that it would be nice to make FOLL_GET an
mm internal-only flag like FOLL_PIN.

>
> Maybe we should even disallow passing in FOLL_LONGTERM as a flag and only provide functions like pin_user_pages() vs. pin_user_pages_longterm(). Then, discussions about conditional flag-setting are no more :)
>
> ... or even use pin_user_pages_shortterm() vs. pin_user_pages() ... to make the default be longterm.
>

Yes, it is true that having most gup flags be internal to mm does tend
to avoid some bugs. But it's also a lot of churn. I'm still on the fence
as to whether it's really a good move to do this for FOLL_LONGTERM or
not. But it's really easy to push me off of fences. :)

thanks,
--
John Hubbard
NVIDIA


2023-05-23 07:34:27

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Mon, May 22, 2023 at 06:54:29PM -0700, John Hubbard wrote:
> On 5/18/23 06:56, David Hildenbrand wrote:
> > On 18.05.23 08:08, Sumit Garg wrote:
> > > On Thu, 18 May 2023 at 09:51, Christoph Hellwig <[email protected]> wrote:
> > > >
> > > > On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand wrote:
> > > > > In general: if user space controls it -> possibly forever -> long-term. Even
> > > > > if in most cases it's a short delay: there is no trusting on user space.
> > > > >
> > > > > For example, iouring fixed buffers keep pages pinned until user space
> > > > > decides to unregistered the buffers -> long-term.
> > > > >
> > > > > Short-term is, for example, something like O_DIRECT where we pin -> DMA ->
> > > > > unpin in essentially one operation.
> > > >
> > > > Btw, one thing that's been on my mind is that I think we got the
> > > > polarity on FOLL_LONGTERM wrong.? Instead of opting into the long term
> > > > behavior it really should be the default, with a FOLL_EPHEMERAL flag
> > > > to opt out of it.? And every users of this flag is required to have
> > > > a comment explaining the life time rules for the pin..

I couldn't agree more, based on my recent forays into GUP the interface
continues to strike me as odd:-

- FOLL_GET is a wing and a prayer that nothing that
[folio|page]_maybe_dma_pinned() prevents happens in the brief period the
page is pinned/manipulated. So agree completely with David's concept of
unexporting that and perhaps carefully considering our use of
it. Obviously the comments around functions like gup_remote() make clear
that 'this page not be what you think it is' but I wonder whether many
callers of GUP _truly_ take that on board.

- FOLL_LONGTERM is entirely optional for PUP and you can just go ahead and
fragment page blocks to your heart's content. Of course this would be an
abuse, but abuses happen.

- With the recent change to PUP/FOLL_LONGTERM disallowing dirty tracked
file-backed mappings we're now really relying on this flag indicating a
_long term_ pin semantically. By defaulting to this being switched on, we
avoid cases of callers who might end up treating the won't
reclaim/etc. aspect of PUP as all they care about while ignoring the
MIGRATE_MOVABLE aspect.

>
> I see maybe 10 or 20 call sites today. So it is definitely feasible to add
> documentation at each, explaining the why it wants a long term pin.
>

Yeah, my efforts at e.g. dropping vmas has been eye-opening in actually
quite how often a refactoring like this often ends up being more
straightforward than you might imagine.

> > >
> > > It does look like a better approach to me given the very nature of
> > > user space pages.
> >
> > Yeah, there is a lot of historical baggage. For example, FOLL_GET should be inaccessible to kernel modules completely at one point, to be only used by selected core-mm pieces.
>
> Yes. When I first mass-converted call sites from gup to pup, I just
> preserved FOLL_GET behavior in order to keep from changing too much at
> once. But I agree that that it would be nice to make FOLL_GET an
> mm internal-only flag like FOLL_PIN.

Very glad you did that work! And totally understandable as to you being
conservative with that, but I think we're at a point where there's more
acceptance of incremental improvements to GUP as a whole.

I have another patch series saved up for _yet more_ changes on this. But
mindful of churn I am trying to space them out... until Jason nudges me of
course :)

>
> >
> > Maybe we should even disallow passing in FOLL_LONGTERM as a flag and only provide functions like pin_user_pages() vs. pin_user_pages_longterm(). Then, discussions about conditional flag-setting are no more :)
> >
> > ... or even use pin_user_pages_shortterm() vs. pin_user_pages() ... to make the default be longterm.
> >
>
> Yes, it is true that having most gup flags be internal to mm does tend
> to avoid some bugs. But it's also a lot of churn. I'm still on the fence
> as to whether it's really a good move to do this for FOLL_LONGTERM or
> not. But it's really easy to push me off of fences. :)

*nudge* ;)

>
> thanks,
> --
> John Hubbard
> NVIDIA
>

Looking at non-fast, non-FOLL_LONGTERM PUP callers (forgive me if I missed any):-

- pin_user_pages_remote() in process_vm_rw_single_vec() for the
process_vm_access functionality.

- pin_user_pages_remote() in user_event_enabler_write() in
kernel/trace/trace_events_user.c.

- pin_user_pages_unlocked() in ivtv_udma_setup() in
drivers/media/pci/ivtv/ivtv-udma.c and ivtv_yuv_prep_user_dma() in ivtv-yuv.c.

And none that actually directly invoke PUP without FOLL_LOGNTERM... That
suggests that we could simply disallow non-FOLL_LONGTERM non-fast PUP calls
altogether and move to pin_user_pages_longterm() [I'm happy to write a
patch series doing this].

The ivtv callers look like they really actually want FOLL_LONGTERM unless
I'm missing something so we should probably change that too?

I haven't surveyed the fast versions, but I think defaulting to
FOLL_LONGTERM on them also makes sense.

Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

So do we have a conclution about this patch? or need more time to
study the possible risks

On Tue, 2023-05-23 at 08:25 +0100, Lorenzo Stoakes wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
>
> On Mon, May 22, 2023 at 06:54:29PM -0700, John Hubbard wrote:
> > On 5/18/23 06:56, David Hildenbrand wrote:
> > > On 18.05.23 08:08, Sumit Garg wrote:
> > > > On Thu, 18 May 2023 at 09:51, Christoph Hellwig <
> > > > [email protected]> wrote:
> > > > >
> > > > > On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand
> > > > > wrote:
> > > > > > In general: if user space controls it -> possibly forever
> > > > > > -> long-term. Even
> > > > > > if in most cases it's a short delay: there is no trusting
> > > > > > on user space.
> > > > > >
> > > > > > For example, iouring fixed buffers keep pages pinned until
> > > > > > user space
> > > > > > decides to unregistered the buffers -> long-term.
> > > > > >
> > > > > > Short-term is, for example, something like O_DIRECT where
> > > > > > we pin -> DMA ->
> > > > > > unpin in essentially one operation.
> > > > >
> > > > > Btw, one thing that's been on my mind is that I think we got
> > > > > the
> > > > > polarity on FOLL_LONGTERM wrong. Instead of opting into the
> > > > > long term
> > > > > behavior it really should be the default, with a
> > > > > FOLL_EPHEMERAL flag
> > > > > to opt out of it. And every users of this flag is required
> > > > > to have
> > > > > a comment explaining the life time rules for the pin..
>
> I couldn't agree more, based on my recent forays into GUP the
> interface
> continues to strike me as odd:-
>
> - FOLL_GET is a wing and a prayer that nothing that
> [folio|page]_maybe_dma_pinned() prevents happens in the brief
> period the
> page is pinned/manipulated. So agree completely with David's
> concept of
> unexporting that and perhaps carefully considering our use of
> it. Obviously the comments around functions like gup_remote() make
> clear
> that 'this page not be what you think it is' but I wonder whether
> many
> callers of GUP _truly_ take that on board.
>
> - FOLL_LONGTERM is entirely optional for PUP and you can just go
> ahead and
> fragment page blocks to your heart's content. Of course this would
> be an
> abuse, but abuses happen.
>
> - With the recent change to PUP/FOLL_LONGTERM disallowing dirty
> tracked
> file-backed mappings we're now really relying on this flag
> indicating a
> _long term_ pin semantically. By defaulting to this being switched
> on, we
> avoid cases of callers who might end up treating the won't
> reclaim/etc. aspect of PUP as all they care about while ignoring
> the
> MIGRATE_MOVABLE aspect.
>
> >
> > I see maybe 10 or 20 call sites today. So it is definitely feasible
> > to add
> > documentation at each, explaining the why it wants a long term pin.
> >
>
> Yeah, my efforts at e.g. dropping vmas has been eye-opening in
> actually
> quite how often a refactoring like this often ends up being more
> straightforward than you might imagine.
>
> > > >
> > > > It does look like a better approach to me given the very nature
> > > > of
> > > > user space pages.
> > >
> > > Yeah, there is a lot of historical baggage. For example, FOLL_GET
> > > should be inaccessible to kernel modules completely at one point,
> > > to be only used by selected core-mm pieces.
> >
> > Yes. When I first mass-converted call sites from gup to pup, I just
> > preserved FOLL_GET behavior in order to keep from changing too much
> > at
> > once. But I agree that that it would be nice to make FOLL_GET an
> > mm internal-only flag like FOLL_PIN.
>
> Very glad you did that work! And totally understandable as to you
> being
> conservative with that, but I think we're at a point where there's
> more
> acceptance of incremental improvements to GUP as a whole.
>
> I have another patch series saved up for _yet more_ changes on this.
> But
> mindful of churn I am trying to space them out... until Jason nudges
> me of
> course :)
>
> >
> > >
> > > Maybe we should even disallow passing in FOLL_LONGTERM as a flag
> > > and only provide functions like pin_user_pages() vs.
> > > pin_user_pages_longterm(). Then, discussions about conditional
> > > flag-setting are no more :)
> > >
> > > ... or even use pin_user_pages_shortterm() vs. pin_user_pages()
> > > ... to make the default be longterm.
> > >
> >
> > Yes, it is true that having most gup flags be internal to mm does
> > tend
> > to avoid some bugs. But it's also a lot of churn. I'm still on the
> > fence
> > as to whether it's really a good move to do this for FOLL_LONGTERM
> > or
> > not. But it's really easy to push me off of fences. :)
>
> *nudge* ;)
>
> >
> > thanks,
> > --
> > John Hubbard
> > NVIDIA
> >
>
> Looking at non-fast, non-FOLL_LONGTERM PUP callers (forgive me if I
> missed any):-
>
> - pin_user_pages_remote() in process_vm_rw_single_vec() for the
> process_vm_access functionality.
>
> - pin_user_pages_remote() in user_event_enabler_write() in
> kernel/trace/trace_events_user.c.
>
> - pin_user_pages_unlocked() in ivtv_udma_setup() in
> drivers/media/pci/ivtv/ivtv-udma.c and ivtv_yuv_prep_user_dma() in
> ivtv-yuv.c.
>
> And none that actually directly invoke PUP without FOLL_LOGNTERM...
> That
> suggests that we could simply disallow non-FOLL_LONGTERM non-fast PUP
> calls
> altogether and move to pin_user_pages_longterm() [I'm happy to write
> a
> patch series doing this].
>
> The ivtv callers look like they really actually want FOLL_LONGTERM
> unless
> I'm missing something so we should probably change that too?
>
> I haven't surveyed the fast versions, but I think defaulting to
> FOLL_LONGTERM on them also makes sense.

2023-06-13 09:30:20

by Sumit Garg

[permalink] [raw]
Subject: Re: FOLL_LONGTERM vs FOLL_EPHEMERAL Re: [PATCH] tee: add FOLL_LONGTERM for CMA case when alloc shm

On Tue, 13 Jun 2023 at 11:00, Xiaoming Ding (丁晓明)
<[email protected]> wrote:
>
> So do we have a conclution about this patch? or need more time to
> study the possible risks

Please avoid top posting. As already discussed here [1],
RLIMIT_MEMLOCK checks have to be implemented if we switch to
FOLL_LONGTERM.

[1] https://lists.trustedfirmware.org/archives/list/[email protected]/message/UEOMNYLDFHDFQNLODGCJVFDOQBR723EQ/

-Sumit

>
> On Tue, 2023-05-23 at 08:25 +0100, Lorenzo Stoakes wrote:
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >
> >
> > On Mon, May 22, 2023 at 06:54:29PM -0700, John Hubbard wrote:
> > > On 5/18/23 06:56, David Hildenbrand wrote:
> > > > On 18.05.23 08:08, Sumit Garg wrote:
> > > > > On Thu, 18 May 2023 at 09:51, Christoph Hellwig <
> > > > > [email protected]> wrote:
> > > > > >
> > > > > > On Wed, May 17, 2023 at 08:23:33PM +0200, David Hildenbrand
> > > > > > wrote:
> > > > > > > In general: if user space controls it -> possibly forever
> > > > > > > -> long-term. Even
> > > > > > > if in most cases it's a short delay: there is no trusting
> > > > > > > on user space.
> > > > > > >
> > > > > > > For example, iouring fixed buffers keep pages pinned until
> > > > > > > user space
> > > > > > > decides to unregistered the buffers -> long-term.
> > > > > > >
> > > > > > > Short-term is, for example, something like O_DIRECT where
> > > > > > > we pin -> DMA ->
> > > > > > > unpin in essentially one operation.
> > > > > >
> > > > > > Btw, one thing that's been on my mind is that I think we got
> > > > > > the
> > > > > > polarity on FOLL_LONGTERM wrong. Instead of opting into the
> > > > > > long term
> > > > > > behavior it really should be the default, with a
> > > > > > FOLL_EPHEMERAL flag
> > > > > > to opt out of it. And every users of this flag is required
> > > > > > to have
> > > > > > a comment explaining the life time rules for the pin..
> >
> > I couldn't agree more, based on my recent forays into GUP the
> > interface
> > continues to strike me as odd:-
> >
> > - FOLL_GET is a wing and a prayer that nothing that
> > [folio|page]_maybe_dma_pinned() prevents happens in the brief
> > period the
> > page is pinned/manipulated. So agree completely with David's
> > concept of
> > unexporting that and perhaps carefully considering our use of
> > it. Obviously the comments around functions like gup_remote() make
> > clear
> > that 'this page not be what you think it is' but I wonder whether
> > many
> > callers of GUP _truly_ take that on board.
> >
> > - FOLL_LONGTERM is entirely optional for PUP and you can just go
> > ahead and
> > fragment page blocks to your heart's content. Of course this would
> > be an
> > abuse, but abuses happen.
> >
> > - With the recent change to PUP/FOLL_LONGTERM disallowing dirty
> > tracked
> > file-backed mappings we're now really relying on this flag
> > indicating a
> > _long term_ pin semantically. By defaulting to this being switched
> > on, we
> > avoid cases of callers who might end up treating the won't
> > reclaim/etc. aspect of PUP as all they care about while ignoring
> > the
> > MIGRATE_MOVABLE aspect.
> >
> > >
> > > I see maybe 10 or 20 call sites today. So it is definitely feasible
> > > to add
> > > documentation at each, explaining the why it wants a long term pin.
> > >
> >
> > Yeah, my efforts at e.g. dropping vmas has been eye-opening in
> > actually
> > quite how often a refactoring like this often ends up being more
> > straightforward than you might imagine.
> >
> > > > >
> > > > > It does look like a better approach to me given the very nature
> > > > > of
> > > > > user space pages.
> > > >
> > > > Yeah, there is a lot of historical baggage. For example, FOLL_GET
> > > > should be inaccessible to kernel modules completely at one point,
> > > > to be only used by selected core-mm pieces.
> > >
> > > Yes. When I first mass-converted call sites from gup to pup, I just
> > > preserved FOLL_GET behavior in order to keep from changing too much
> > > at
> > > once. But I agree that that it would be nice to make FOLL_GET an
> > > mm internal-only flag like FOLL_PIN.
> >
> > Very glad you did that work! And totally understandable as to you
> > being
> > conservative with that, but I think we're at a point where there's
> > more
> > acceptance of incremental improvements to GUP as a whole.
> >
> > I have another patch series saved up for _yet more_ changes on this.
> > But
> > mindful of churn I am trying to space them out... until Jason nudges
> > me of
> > course :)
> >
> > >
> > > >
> > > > Maybe we should even disallow passing in FOLL_LONGTERM as a flag
> > > > and only provide functions like pin_user_pages() vs.
> > > > pin_user_pages_longterm(). Then, discussions about conditional
> > > > flag-setting are no more :)
> > > >
> > > > ... or even use pin_user_pages_shortterm() vs. pin_user_pages()
> > > > ... to make the default be longterm.
> > > >
> > >
> > > Yes, it is true that having most gup flags be internal to mm does
> > > tend
> > > to avoid some bugs. But it's also a lot of churn. I'm still on the
> > > fence
> > > as to whether it's really a good move to do this for FOLL_LONGTERM
> > > or
> > > not. But it's really easy to push me off of fences. :)
> >
> > *nudge* ;)
> >
> > >
> > > thanks,
> > > --
> > > John Hubbard
> > > NVIDIA
> > >
> >
> > Looking at non-fast, non-FOLL_LONGTERM PUP callers (forgive me if I
> > missed any):-
> >
> > - pin_user_pages_remote() in process_vm_rw_single_vec() for the
> > process_vm_access functionality.
> >
> > - pin_user_pages_remote() in user_event_enabler_write() in
> > kernel/trace/trace_events_user.c.
> >
> > - pin_user_pages_unlocked() in ivtv_udma_setup() in
> > drivers/media/pci/ivtv/ivtv-udma.c and ivtv_yuv_prep_user_dma() in
> > ivtv-yuv.c.
> >
> > And none that actually directly invoke PUP without FOLL_LOGNTERM...
> > That
> > suggests that we could simply disallow non-FOLL_LONGTERM non-fast PUP
> > calls
> > altogether and move to pin_user_pages_longterm() [I'm happy to write
> > a
> > patch series doing this].
> >
> > The ivtv callers look like they really actually want FOLL_LONGTERM
> > unless
> > I'm missing something so we should probably change that too?
> >
> > I haven't surveyed the fast versions, but I think defaulting to
> > FOLL_LONGTERM on them also makes sense.
>