2021-07-31 17:55:49

by Luigi Rizzo

[permalink] [raw]
Subject: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

find_vma() and variants need protection when used.
This patch adds mmap_assert_lock() calls in the functions.

To make sure the invariant is satisfied, we also need to add a
mmap_read_loc() around the get_user_pages_remote() call in
get_arg_page(). The lock is not strictly necessary because the mm
has been newly created, but the extra cost is limited because
the same mutex was also acquired shortly before in __bprm_mm_init(),
so it is hot and uncontended.

Signed-off-by: Luigi Rizzo <[email protected]>
---
fs/exec.c | 2 ++
mm/mmap.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 38f63451b928..ac7603e985b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
* We are doing an exec(). 'current' is the process
* doing the exec and bprm->mm is the new process's mm.
*/
+ mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
&page, NULL, NULL);
+ mmap_read_unlock(bprm->mm);
if (ret <= 0)
return NULL;

diff --git a/mm/mmap.c b/mm/mmap.c
index ca54d36d203a..79f4f8ae43ec 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -534,6 +534,7 @@ static int find_vma_links(struct mm_struct *mm, unsigned long addr,
{
struct rb_node **__rb_link, *__rb_parent, *rb_prev;

+ mmap_assert_locked(mm);
__rb_link = &mm->mm_rb.rb_node;
rb_prev = __rb_parent = NULL;

@@ -2303,6 +2304,7 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
struct rb_node *rb_node;
struct vm_area_struct *vma;

+ mmap_assert_locked(mm);
/* Check the cache first. */
vma = vmacache_find(mm, addr);
if (likely(vma))
--
2.32.0.554.ge1b32706d8-goog



2021-08-01 19:34:29

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Sat, 31 Jul 2021 10:53:41 -0700 Luigi Rizzo <[email protected]> wrote:

> find_vma() and variants need protection when used.
> This patch adds mmap_assert_lock() calls in the functions.
>
> To make sure the invariant is satisfied, we also need to add a
> mmap_read_loc() around the get_user_pages_remote() call in
> get_arg_page(). The lock is not strictly necessary because the mm
> has been newly created, but the extra cost is limited because
> the same mutex was also acquired shortly before in __bprm_mm_init(),
> so it is hot and uncontended.
>

Well, it isn't cost-free. find_vma() is called a lot and a surprising
number of systems apparently run with CONFIG_DEBUG_VM. Why do you
think this cost is justified?


2021-08-02 00:18:04

by Luigi Rizzo

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

(repost in plain text)

On Sun, Aug 1, 2021 at 9:33 PM Andrew Morton <[email protected]> wrote:
>
> On Sat, 31 Jul 2021 10:53:41 -0700 Luigi Rizzo <[email protected]> wrote:
>
> > find_vma() and variants need protection when used.
> > This patch adds mmap_assert_lock() calls in the functions.
> >
> > To make sure the invariant is satisfied, we also need to add a
> > mmap_read_loc() around the get_user_pages_remote() call in
> > get_arg_page(). The lock is not strictly necessary because the mm
> > has been newly created, but the extra cost is limited because
> > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > so it is hot and uncontended.
> >
>
> Well, it isn't cost-free. find_vma() is called a lot and a surprising
> number of systems apparently run with CONFIG_DEBUG_VM. Why do you
> think this cost is justified?

I assume you are concerned with the cost of mmap_assert_locked() ?

I'd say the justification is the same as for all asserts:
at some point some code change may miss the required lock, and the
asserts are there to catch elusive race conditions,

There are in fact already instances of mmap_locked_assert()
right before find_vma() in walk_page_range(), and a couple before
calls to __get_user_pages().

As for the cost, I'd think that if CONFIG_DEBUG_VM is set,
one does it on purpose to catch errors and is prepared to pay
the cost (in this case the atomic_read(counter) in rwsem_is_locked(),
the counter should be hot).

FWIW I have instrumented find_vma() on a fast machine using kstats

https://github.com/luigirizzo/lr-cstats

(load the module then enable the trace with
echo "trace pcpu:find_vma bits 3" > /sys/kernel/debug/kstats/_control
and monitor the time with
watch "grep CPUS /sys/kernel/debug/kstats/find_vma"

I didn't run anything especially intensive except some network
benchmarks, but I have collected ~2M samples with the following
distribution of find_vma() time in nanoseconds in 3 configs:

CONFIGURATION p10 p50 p90 p95 p98

no-debug 89 109 214 332 605
debug 331 369 603 862 1338
debug+this patch 337 369 603 863 1339

As you can see, just compiling a debug kernel, even without this patch,
makes the function 3x more expensive. The effect of this patch is
not measurable (the differences are below measurement error).

cheers
luigi

2021-08-02 21:14:17

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Mon, 2 Aug 2021 02:16:14 +0200 Luigi Rizzo <[email protected]> wrote:

> > Well, it isn't cost-free. find_vma() is called a lot and a surprising
> > number of systems apparently run with CONFIG_DEBUG_VM. Why do you
> > think this cost is justified?
>
> I assume you are concerned with the cost of mmap_assert_locked() ?
>
> I'd say the justification is the same as for all asserts:
> at some point some code change may miss the required lock, and the
> asserts are there to catch elusive race conditions,
>
> There are in fact already instances of mmap_locked_assert()
> right before find_vma() in walk_page_range(), and a couple before
> calls to __get_user_pages().
>
> As for the cost, I'd think that if CONFIG_DEBUG_VM is set,
> one does it on purpose to catch errors and is prepared to pay
> the cost (in this case the atomic_read(counter) in rwsem_is_locked(),
> the counter should be hot).
>
> FWIW I have instrumented find_vma() on a fast machine using kstats
>
> https://github.com/luigirizzo/lr-cstats
>
> (load the module then enable the trace with
> echo "trace pcpu:find_vma bits 3" > /sys/kernel/debug/kstats/_control
> and monitor the time with
> watch "grep CPUS /sys/kernel/debug/kstats/find_vma"
>
> I didn't run anything especially intensive except some network
> benchmarks, but I have collected ~2M samples with the following
> distribution of find_vma() time in nanoseconds in 3 configs:
>
> CONFIGURATION p10 p50 p90 p95 p98
>
> no-debug 89 109 214 332 605
> debug 331 369 603 862 1338
> debug+this patch 337 369 603 863 1339
>
> As you can see, just compiling a debug kernel, even without this patch,
> makes the function 3x more expensive. The effect of this patch is
> not measurable (the differences are below measurement error).

Cool, thanks, that's convincing.

2021-08-03 16:10:22

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> find_vma() and variants need protection when used.
> This patch adds mmap_assert_lock() calls in the functions.
>
> To make sure the invariant is satisfied, we also need to add a
> mmap_read_loc() around the get_user_pages_remote() call in
> get_arg_page(). The lock is not strictly necessary because the mm
> has been newly created, but the extra cost is limited because
> the same mutex was also acquired shortly before in __bprm_mm_init(),
> so it is hot and uncontended.
>
> Signed-off-by: Luigi Rizzo <[email protected]>
> fs/exec.c | 2 ++
> mm/mmap.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 38f63451b928..ac7603e985b4 100644
> +++ b/fs/exec.c
> @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> * We are doing an exec(). 'current' is the process
> * doing the exec and bprm->mm is the new process's mm.
> */
> + mmap_read_lock(bprm->mm);
> ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> &page, NULL, NULL);
> + mmap_read_unlock(bprm->mm);
> if (ret <= 0)
> return NULL;

Wasn't Jann Horn working on something like this too?

https://lore.kernel.org/linux-mm/[email protected]/

IIRC it was very tricky here, are you sure it is OK to obtain this lock
here?

I would much rather see Jann's complete solution be merged then
hacking at the exec problem on the side..

Jason

2021-08-03 22:37:27

by Luigi Rizzo

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > find_vma() and variants need protection when used.
> > This patch adds mmap_assert_lock() calls in the functions.
> >
> > To make sure the invariant is satisfied, we also need to add a
> > mmap_read_loc() around the get_user_pages_remote() call in
> > get_arg_page(). The lock is not strictly necessary because the mm
> > has been newly created, but the extra cost is limited because
> > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > so it is hot and uncontended.
> >
> > Signed-off-by: Luigi Rizzo <[email protected]>
> > fs/exec.c | 2 ++
> > mm/mmap.c | 2 ++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 38f63451b928..ac7603e985b4 100644
> > +++ b/fs/exec.c
> > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > * We are doing an exec(). 'current' is the process
> > * doing the exec and bprm->mm is the new process's mm.
> > */
> > + mmap_read_lock(bprm->mm);
> > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > &page, NULL, NULL);
> > + mmap_read_unlock(bprm->mm);
> > if (ret <= 0)
> > return NULL;
>
> Wasn't Jann Horn working on something like this too?
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> IIRC it was very tricky here, are you sure it is OK to obtain this lock
> here?

I cannot comment on Jann's patch series but no other thread knows
about this mm at this point in the code so the lock is definitely
safe to acquire (shortly before there was also a write lock acquired
on the same mm, in the same conditions).

cheers
luigi

>
> I would much rather see Jann's complete solution be merged then
> hacking at the exec problem on the side..
>
> Jason

2021-08-03 23:09:33

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

* Luigi Rizzo <[email protected]> [210803 17:49]:
> On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > find_vma() and variants need protection when used.
> > > This patch adds mmap_assert_lock() calls in the functions.
> > >
> > > To make sure the invariant is satisfied, we also need to add a
> > > mmap_read_loc() around the get_user_pages_remote() call in
> > > get_arg_page(). The lock is not strictly necessary because the mm
> > > has been newly created, but the extra cost is limited because
> > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > so it is hot and uncontended.
> > >
> > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > fs/exec.c | 2 ++
> > > mm/mmap.c | 2 ++
> > > 2 files changed, 4 insertions(+)
> > >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index 38f63451b928..ac7603e985b4 100644
> > > +++ b/fs/exec.c
> > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > * We are doing an exec(). 'current' is the process
> > > * doing the exec and bprm->mm is the new process's mm.
> > > */
> > > + mmap_read_lock(bprm->mm);
> > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > &page, NULL, NULL);
> > > + mmap_read_unlock(bprm->mm);
> > > if (ret <= 0)
> > > return NULL;
> >
> > Wasn't Jann Horn working on something like this too?
> >
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > here?
>
> I cannot comment on Jann's patch series but no other thread knows
> about this mm at this point in the code so the lock is definitely
> safe to acquire (shortly before there was also a write lock acquired
> on the same mm, in the same conditions).

If there is no other code that knows about this mm, then does one need
the lock at all? Is this just to satisfy the new check you added?

If you want to make this change, I would suggest writing it in a way to
ensure the call to expand_downwards() in the same function also holds
the lock. I believe this is technically required as well? What do you
think?

Thanks,
Liam

>
> cheers
> luigi
>
> >
> > I would much rather see Jann's complete solution be merged then
> > hacking at the exec problem on the side..
> >
> > Jason
>

2021-08-04 00:17:40

by Luigi Rizzo

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Wed, Aug 4, 2021 at 1:35 AM Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> > * Luigi Rizzo <[email protected]> [210803 17:49]:
> > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> > > >
> > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > find_vma() and variants need protection when used.
> > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > >
> > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > has been newly created, but the extra cost is limited because
> > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > so it is hot and uncontended.
> > > > >
> > > > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > > > fs/exec.c | 2 ++
> > > > > mm/mmap.c | 2 ++
> > > > > 2 files changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > +++ b/fs/exec.c
> > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > > * We are doing an exec(). 'current' is the process
> > > > > * doing the exec and bprm->mm is the new process's mm.
> > > > > */
> > > > > + mmap_read_lock(bprm->mm);
> > > > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > > &page, NULL, NULL);
> > > > > + mmap_read_unlock(bprm->mm);
> > > > > if (ret <= 0)
> > > > > return NULL;
> > > >
> > > > Wasn't Jann Horn working on something like this too?
> > > >
> > > > https://lore.kernel.org/linux-mm/[email protected]/
> > > >
> > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > here?
> > >
> > > I cannot comment on Jann's patch series but no other thread knows
> > > about this mm at this point in the code so the lock is definitely
> > > safe to acquire (shortly before there was also a write lock acquired
> > > on the same mm, in the same conditions).
> >
> > If there is no other code that knows about this mm, then does one need
> > the lock at all? Is this just to satisfy the new check you added?
> >
> > If you want to make this change, I would suggest writing it in a way to
> > ensure the call to expand_downwards() in the same function also holds
> > the lock. I believe this is technically required as well? What do you
> > think?
>
> This is essentially what Jann was doing. Since the mm is newly created
> we can create it write locked and then we can add proper locking tests
> to many of the functions called along this path.
>
> Adding useless locks around each troublesome callsite just seems
> really confusing to me.

Uhm... by that reasoning, even creating the mm locked (and unlocking
at the end) is equally unnecessary.

My goal was to add asserts and invariants that are easy
to understand and get right, rather than optimize a path
that does not appear to be critical.

Adding one read lock pair around the one function we annotate
is easy to understand and clearly a leaf lock.

Having alloc_bprm return a locked object is a bit unconventional,
and also passing it to other methods raises the question of whether
they take other lock possibly causing lock order reversals
in the future.

cheers
luigi

2021-08-04 01:09:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> * Luigi Rizzo <[email protected]> [210803 17:49]:
> > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > find_vma() and variants need protection when used.
> > > > This patch adds mmap_assert_lock() calls in the functions.
> > > >
> > > > To make sure the invariant is satisfied, we also need to add a
> > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > has been newly created, but the extra cost is limited because
> > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > so it is hot and uncontended.
> > > >
> > > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > > fs/exec.c | 2 ++
> > > > mm/mmap.c | 2 ++
> > > > 2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 38f63451b928..ac7603e985b4 100644
> > > > +++ b/fs/exec.c
> > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > * We are doing an exec(). 'current' is the process
> > > > * doing the exec and bprm->mm is the new process's mm.
> > > > */
> > > > + mmap_read_lock(bprm->mm);
> > > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > &page, NULL, NULL);
> > > > + mmap_read_unlock(bprm->mm);
> > > > if (ret <= 0)
> > > > return NULL;
> > >
> > > Wasn't Jann Horn working on something like this too?
> > >
> > > https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > here?
> >
> > I cannot comment on Jann's patch series but no other thread knows
> > about this mm at this point in the code so the lock is definitely
> > safe to acquire (shortly before there was also a write lock acquired
> > on the same mm, in the same conditions).
>
> If there is no other code that knows about this mm, then does one need
> the lock at all? Is this just to satisfy the new check you added?
>
> If you want to make this change, I would suggest writing it in a way to
> ensure the call to expand_downwards() in the same function also holds
> the lock. I believe this is technically required as well? What do you
> think?

This is essentially what Jann was doing. Since the mm is newly created
we can create it write locked and then we can add proper locking tests
to many of the functions called along this path.

Adding useless locks around each troublesome callsite just seems
really confusing to me.

Jason

2021-08-04 05:32:01

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

* Luigi Rizzo <[email protected]> [210803 19:58]:
> On Wed, Aug 4, 2021 at 1:35 AM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Tue, Aug 03, 2021 at 11:07:35PM +0000, Liam Howlett wrote:
> > > * Luigi Rizzo <[email protected]> [210803 17:49]:
> > > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> > > > >
> > > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > > find_vma() and variants need protection when used.
> > > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > > >
> > > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > > has been newly created, but the extra cost is limited because
> > > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > > so it is hot and uncontended.
> > > > > >
> > > > > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > > > > fs/exec.c | 2 ++
> > > > > > mm/mmap.c | 2 ++
> > > > > > 2 files changed, 4 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > > +++ b/fs/exec.c
> > > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > > > * We are doing an exec(). 'current' is the process
> > > > > > * doing the exec and bprm->mm is the new process's mm.
> > > > > > */
> > > > > > + mmap_read_lock(bprm->mm);
> > > > > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > > > &page, NULL, NULL);
> > > > > > + mmap_read_unlock(bprm->mm);
> > > > > > if (ret <= 0)
> > > > > > return NULL;
> > > > >
> > > > > Wasn't Jann Horn working on something like this too?
> > > > >
> > > > > https://lore.kernel.org/linux-mm/[email protected]/
> > > > >
> > > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > > here?
> > > >
> > > > I cannot comment on Jann's patch series but no other thread knows
> > > > about this mm at this point in the code so the lock is definitely
> > > > safe to acquire (shortly before there was also a write lock acquired
> > > > on the same mm, in the same conditions).
> > >
> > > If there is no other code that knows about this mm, then does one need
> > > the lock at all? Is this just to satisfy the new check you added?
> > >
> > > If you want to make this change, I would suggest writing it in a way to
> > > ensure the call to expand_downwards() in the same function also holds
> > > the lock. I believe this is technically required as well? What do you
> > > think?
> >
> > This is essentially what Jann was doing. Since the mm is newly created
> > we can create it write locked and then we can add proper locking tests
> > to many of the functions called along this path.

That sounds good. Jann has left the patch as pending a fix since
November 2020. Can't the removal of the lock/unlock be added to the
next iteration of the patch? Was there a v4 of that patch?

> >
> > Adding useless locks around each troublesome callsite just seems
> > really confusing to me.
>
> Uhm... by that reasoning, even creating the mm locked (and unlocking
> at the end) is equally unnecessary.

I think taking the lock is more clear than leaving it the way it's
currently written. It is actually confusing to see the lock taken after
calling expand_downwards() which explicitly mentions the lock as
required in the comments though. This should at least have a comment
about early creation not requiring the lock.

>
> My goal was to add asserts and invariants that are easy
> to understand and get right, rather than optimize a path
> that does not appear to be critical.
>
> Adding one read lock pair around the one function we annotate
> is easy to understand and clearly a leaf lock.
>
> Having alloc_bprm return a locked object is a bit unconventional,
> and also passing it to other methods raises the question of whether
> they take other lock possibly causing lock order reversals
> in the future.

We are (probably?) okay as the usual order right now is to take the mmap
sem before the pte and interval tree. It's also just for the set up, so
unless there is a special case that could cause trouble... or maybe I
should ask which cases will cause trouble?

Thanks,
Liam

2021-08-04 15:06:17

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Wed, Aug 4, 2021 at 1:07 AM Liam Howlett <[email protected]> wrote:
> * Luigi Rizzo <[email protected]> [210803 17:49]:
> > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> > >
> > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > find_vma() and variants need protection when used.
> > > > This patch adds mmap_assert_lock() calls in the functions.
> > > >
> > > > To make sure the invariant is satisfied, we also need to add a
> > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > has been newly created, but the extra cost is limited because
> > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > so it is hot and uncontended.
> > > >
> > > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > > fs/exec.c | 2 ++
> > > > mm/mmap.c | 2 ++
> > > > 2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 38f63451b928..ac7603e985b4 100644
> > > > +++ b/fs/exec.c
> > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > * We are doing an exec(). 'current' is the process
> > > > * doing the exec and bprm->mm is the new process's mm.
> > > > */
> > > > + mmap_read_lock(bprm->mm);
> > > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > &page, NULL, NULL);
> > > > + mmap_read_unlock(bprm->mm);
> > > > if (ret <= 0)
> > > > return NULL;
> > >
> > > Wasn't Jann Horn working on something like this too?
> > >
> > > https://lore.kernel.org/linux-mm/[email protected]/
> > >
> > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > here?
> >
> > I cannot comment on Jann's patch series but no other thread knows
> > about this mm at this point in the code so the lock is definitely
> > safe to acquire (shortly before there was also a write lock acquired
> > on the same mm, in the same conditions).
>
> If there is no other code that knows about this mm, then does one need
> the lock at all? Is this just to satisfy the new check you added?
>
> If you want to make this change, I would suggest writing it in a way to
> ensure the call to expand_downwards() in the same function also holds
> the lock. I believe this is technically required as well? What do you
> think?

The call to expand_downwards() takes a VMA pointer as argument, and
the mmap lock is the only thing that normally prevents concurrent
freeing of VMA structs. Taking a lock there would be of limited utility - either
the lock is not necessary because nobody else can access the MM, or
the lock is insufficient because someone could have freed the VMA
pointer before the lock was taken. So I think that taking a lock
around the expand_downwards() call would just be obfuscating things,
unless you specifically want to prevent concurrent *reads* while
concurrent *writes* are impossible.

Since I haven't sent a new version of my old series for almost a year,
I think it'd be fine to take Luigi's patch for now, and undo it at a
later point when/if we want to actually use proper locking here
because we're worried about concurrent access to the MM.

2021-08-04 18:46:18

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Wed, Aug 04, 2021 at 04:42:23PM +0200, Jann Horn wrote:
> Since I haven't sent a new version of my old series for almost a year,
> I think it'd be fine to take Luigi's patch for now, and undo it at a
> later point when/if we want to actually use proper locking here
> because we're worried about concurrent access to the MM.

IIRC one of the major points of that work was not "proper locking" but
to have enough locking to be complatible with lockdep so we could add
assertions like in get_user_pages and find_vma.

Jason

2021-08-04 20:42:45

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

* Jann Horn <[email protected]> [210804 10:42]:
> On Wed, Aug 4, 2021 at 1:07 AM Liam Howlett <[email protected]> wrote:
> > * Luigi Rizzo <[email protected]> [210803 17:49]:
> > > On Tue, Aug 3, 2021 at 6:08 PM Jason Gunthorpe <[email protected]> wrote:
> > > >
> > > > On Sat, Jul 31, 2021 at 10:53:41AM -0700, Luigi Rizzo wrote:
> > > > > find_vma() and variants need protection when used.
> > > > > This patch adds mmap_assert_lock() calls in the functions.
> > > > >
> > > > > To make sure the invariant is satisfied, we also need to add a
> > > > > mmap_read_loc() around the get_user_pages_remote() call in
> > > > > get_arg_page(). The lock is not strictly necessary because the mm
> > > > > has been newly created, but the extra cost is limited because
> > > > > the same mutex was also acquired shortly before in __bprm_mm_init(),
> > > > > so it is hot and uncontended.
> > > > >
> > > > > Signed-off-by: Luigi Rizzo <[email protected]>
> > > > > fs/exec.c | 2 ++
> > > > > mm/mmap.c | 2 ++
> > > > > 2 files changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 38f63451b928..ac7603e985b4 100644
> > > > > +++ b/fs/exec.c
> > > > > @@ -217,8 +217,10 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
> > > > > * We are doing an exec(). 'current' is the process
> > > > > * doing the exec and bprm->mm is the new process's mm.
> > > > > */
> > > > > + mmap_read_lock(bprm->mm);
> > > > > ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
> > > > > &page, NULL, NULL);
> > > > > + mmap_read_unlock(bprm->mm);
> > > > > if (ret <= 0)
> > > > > return NULL;
> > > >
> > > > Wasn't Jann Horn working on something like this too?
> > > >
> > > > https://lore.kernel.org/linux-mm/[email protected]/
> > > >
> > > > IIRC it was very tricky here, are you sure it is OK to obtain this lock
> > > > here?
> > >
> > > I cannot comment on Jann's patch series but no other thread knows
> > > about this mm at this point in the code so the lock is definitely
> > > safe to acquire (shortly before there was also a write lock acquired
> > > on the same mm, in the same conditions).
> >
> > If there is no other code that knows about this mm, then does one need
> > the lock at all? Is this just to satisfy the new check you added?
> >
> > If you want to make this change, I would suggest writing it in a way to
> > ensure the call to expand_downwards() in the same function also holds
> > the lock. I believe this is technically required as well? What do you
> > think?
>
> The call to expand_downwards() takes a VMA pointer as argument, and
> the mmap lock is the only thing that normally prevents concurrent
> freeing of VMA structs. Taking a lock there would be of limited utility - either
> the lock is not necessary because nobody else can access the MM, or
> the lock is insufficient because someone could have freed the VMA
> pointer before the lock was taken. So I think that taking a lock
> around the expand_downwards() call would just be obfuscating things,
> unless you specifically want to prevent concurrent *reads* while
> concurrent *writes* are impossible.

Good point on the VMA being passed in, that certainly points to your
previous patch being a better approach. That resolves my questions
around the patch.

>
> Since I haven't sent a new version of my old series for almost a year,
> I think it'd be fine to take Luigi's patch for now, and undo it at a
> later point when/if we want to actually use proper locking here
> because we're worried about concurrent access to the MM.

2021-08-04 22:36:03

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH] Add mmap_assert_locked() annotations to find_vma*()

On Wed, Aug 4, 2021 at 5:21 PM Jason Gunthorpe <[email protected]> wrote:
> On Wed, Aug 04, 2021 at 04:42:23PM +0200, Jann Horn wrote:
> > Since I haven't sent a new version of my old series for almost a year,
> > I think it'd be fine to take Luigi's patch for now, and undo it at a
> > later point when/if we want to actually use proper locking here
> > because we're worried about concurrent access to the MM.
>
> IIRC one of the major points of that work was not "proper locking" but
> to have enough locking to be complatible with lockdep so we could add
> assertions like in get_user_pages and find_vma.

That's part of it; but it's also for making the code more clearly
correct and future-proofing it. Looking at it now, I think
process_madvise() might actually already be able to race with execve()
to some degree; and if you made a change like this to the current
kernel:

diff --git a/mm/madvise.c b/mm/madvise.c
index 6d3d348b17f4..3648c198673c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1043,12 +1043,14 @@ madvise_behavior_valid(int behavior)
static bool
process_madvise_behavior_valid(int behavior)
{
switch (behavior) {
case MADV_COLD:
case MADV_PAGEOUT:
+ case MADV_DOFORK:
+ case MADV_DONTFORK:
return true;
default:
return false;
}
}

it would probably introduce a memory corruption bug, because then
someone might be able to destroy the stack VMA between
setup_new_exec() and setup_arg_pages() by using process_madvise() to
trigger VMA splitting/merging in the right pattern.