2022-02-16 07:19:25

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
otherwise it points to a vma that was freed and when reused leads to
a use-after-free bug.

Reported-by: [email protected]
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
---
mm/mmap.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..d445c1b9d606 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
vma = remove_vma(vma);
cond_resched();
}
+ mm->mmap = NULL;
mmap_write_unlock(mm);
vm_unacct_memory(nr_accounted);
}
--
2.35.1.265.g69c8d7142f-goog


2022-02-16 07:38:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:

> After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> otherwise it points to a vma that was freed and when reused leads to
> a use-after-free bug.
>
> ...
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> vma = remove_vma(vma);
> cond_resched();
> }
> + mm->mmap = NULL;
> mmap_write_unlock(mm);
> vm_unacct_memory(nr_accounted);
> }

https://lore.kernel.org/all/[email protected]/

It would be nice to have a Fixes: for this.

Is it specific to process_mrelease(), or should we backport further?

2022-02-16 08:15:53

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Tue 15-02-22 12:19:22, Suren Baghdasaryan wrote:
> After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> otherwise it points to a vma that was freed and when reused leads to
> a use-after-free bug.

OK, so I have dived into this again. exit_mmap doesn't reset mmap
indeed. That doesn't really matter for _oom victims_. Both the oom reaper and
mrelease do check for MMF_OOM_SKIP before calling __oom_reap_task_mm.
exit_mmap still sets MMF_OOM_SKIP before taking the mmap_lock for oom
victims so those paths should be still properly synchronized. I have
proposed to get rid of this
http://lkml.kernel.org/r/[email protected] but we haven't
agreed on that.

mrelease path is broken because it doesn't mark the process oom_victim
and so the MMF_OOM_SKIP synchronization doesn't work. So we really need
this.

I would propose to rephrase the changelog to be more specific because I
do not want to remember all those details later on.
What about
"
oom reaping (__oom_reap_task_mm) relies on a 2 way synchronization with
exit_mmap. First it relies on the mmap_lock to exclude from unlock
path[1], page tables tear down (free_pgtables) and vma destruction.
This alone is not sufficient because mm->mmap is never reset. For
historical reasons[2] the lock is taken there is also MMF_OOM_SKIP set
for oom victims before.

The oom reaper only ever looks at oom victims so the whole scheme works
properly but process_mrelease can opearate on any task (with fatal
signals pending) which doesn't really imply oom victims. That means that
the MMF_OOM_SKIP part of the synchronization doesn't work and it can
see a task after the whole address space has been demolished and
traverse an already released mm->mmap list. This leads to use after free
as properly caught up by KASAN report.

Fix the issue by reseting mm->mmap so that MMF_OOM_SKIP synchronization
is not needed anymore. The MMF_OOM_SKIP is not removed from exit_mmap
yet but it acts mostly as an optimization now.

[1] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap,
v3")
[2] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run
concurrently")
"

I really have to say that I hate how complex this has grown in the name
of optimizations. This has backfired several times already resulting in
2 security issues. I really hope to get read any note of the oom reaper
from exit_mmap.

> Reported-by: [email protected]
> Suggested-by: Michal Hocko <[email protected]>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Acked-by: Michal Hocko <[email protected]>

Thanks!
> ---
> mm/mmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1e8fdb0b51ed..d445c1b9d606 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> vma = remove_vma(vma);
> cond_resched();
> }
> + mm->mmap = NULL;
> mmap_write_unlock(mm);
> vm_unacct_memory(nr_accounted);
> }
> --
> 2.35.1.265.g69c8d7142f-goog

--
Michal Hocko
SUSE Labs

2022-02-17 21:06:12

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Tue, Feb 15, 2022 at 11:54 PM Michal Hocko <[email protected]> wrote:
>
> On Tue 15-02-22 12:19:22, Suren Baghdasaryan wrote:
> > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > otherwise it points to a vma that was freed and when reused leads to
> > a use-after-free bug.
>
> OK, so I have dived into this again. exit_mmap doesn't reset mmap
> indeed. That doesn't really matter for _oom victims_. Both the oom reaper and
> mrelease do check for MMF_OOM_SKIP before calling __oom_reap_task_mm.
> exit_mmap still sets MMF_OOM_SKIP before taking the mmap_lock for oom
> victims so those paths should be still properly synchronized. I have
> proposed to get rid of this
> http://lkml.kernel.org/r/[email protected] but we haven't
> agreed on that.
>
> mrelease path is broken because it doesn't mark the process oom_victim
> and so the MMF_OOM_SKIP synchronization doesn't work. So we really need
> this.
>
> I would propose to rephrase the changelog to be more specific because I
> do not want to remember all those details later on.
> What about
> "
> oom reaping (__oom_reap_task_mm) relies on a 2 way synchronization with
> exit_mmap. First it relies on the mmap_lock to exclude from unlock
> path[1], page tables tear down (free_pgtables) and vma destruction.
> This alone is not sufficient because mm->mmap is never reset. For
> historical reasons[2] the lock is taken there is also MMF_OOM_SKIP set
> for oom victims before.
>
> The oom reaper only ever looks at oom victims so the whole scheme works
> properly but process_mrelease can opearate on any task (with fatal
> signals pending) which doesn't really imply oom victims. That means that
> the MMF_OOM_SKIP part of the synchronization doesn't work and it can
> see a task after the whole address space has been demolished and
> traverse an already released mm->mmap list. This leads to use after free
> as properly caught up by KASAN report.
>
> Fix the issue by reseting mm->mmap so that MMF_OOM_SKIP synchronization
> is not needed anymore. The MMF_OOM_SKIP is not removed from exit_mmap
> yet but it acts mostly as an optimization now.
>
> [1] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap,
> v3")
> [2] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run
> concurrently")
> "

This changelog is very detailed and I support switching to it. Andrew,
please let me know if I should re-post the patch with this description
or you will just amend the one in your tree.

>
> I really have to say that I hate how complex this has grown in the name
> of optimizations. This has backfired several times already resulting in
> 2 security issues. I really hope to get read any note of the oom reaper
> from exit_mmap.

Agree. I want to take another stab at removing __oom_reap_task_mm from
exit_mmap. Now that Hugh made changes to mlock mechanisms and
__oom_reap_task_mm does not skip locked vmas I think that should be
possible. Planning to look into that sometimes next week.

>
> > Reported-by: [email protected]
> > Suggested-by: Michal Hocko <[email protected]>
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>

Thanks!

>
> Thanks!
> > ---
> > mm/mmap.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 1e8fdb0b51ed..d445c1b9d606 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > vma = remove_vma(vma);
> > cond_resched();
> > }
> > + mm->mmap = NULL;
> > mmap_write_unlock(mm);
> > vm_unacct_memory(nr_accounted);
> > }
> > --
> > 2.35.1.265.g69c8d7142f-goog
>
> --
> Michal Hocko
> SUSE Labs

2022-02-18 00:21:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu, 17 Feb 2022 11:51:13 -0800 Suren Baghdasaryan <[email protected]> wrote:

> This changelog is very detailed and I support switching to it. Andrew,
> please let me know if I should re-post the patch with this description
> or you will just amend the one in your tree.

I have made that change.


From: Suren Baghdasaryan <[email protected]>
Subject: mm: fix use-after-free bug when mm->mmap is reused after being freed

oom reaping (__oom_reap_task_mm) relies on a 2 way synchronization with
exit_mmap. First it relies on the mmap_lock to exclude from unlock
path[1], page tables tear down (free_pgtables) and vma destruction.
This alone is not sufficient because mm->mmap is never reset. For
historical reasons[2] the lock is taken there is also MMF_OOM_SKIP set
for oom victims before.

The oom reaper only ever looks at oom victims so the whole scheme works
properly but process_mrelease can opearate on any task (with fatal
signals pending) which doesn't really imply oom victims. That means
that the MMF_OOM_SKIP part of the synchronization doesn't work and it
can see a task after the whole address space has been demolished and
traverse an already released mm->mmap list. This leads to use after
free as properly caught up by KASAN report.

Fix the issue by reseting mm->mmap so that MMF_OOM_SKIP synchronization
is not needed anymore. The MMF_OOM_SKIP is not removed from exit_mmap
yet but it acts mostly as an optimization now.

[1] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[2] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")

[[email protected]: changelog rewrite]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 64591e8605d6 ("mm: protect free_pgtables with mmap_lock write lock in exit_mmap")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: [email protected]
Suggested-by: Michal Hocko <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jan Engelhardt <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/mmap.c | 1 +
1 file changed, 1 insertion(+)

--- a/mm/mmap.c~mm-fix-use-after-free-bug-when-mm-mmap-is-reused-after-being-freed
+++ a/mm/mmap.c
@@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
vma = remove_vma(vma);
cond_resched();
}
+ mm->mmap = NULL;
mmap_write_unlock(mm);
vm_unacct_memory(nr_accounted);
}
_

2022-02-18 08:34:20

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu 17-02-22 12:50:56, Andrew Morton wrote:
> On Thu, 17 Feb 2022 11:51:13 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > This changelog is very detailed and I support switching to it. Andrew,
> > please let me know if I should re-post the patch with this description
> > or you will just amend the one in your tree.
>
> I have made that change.

LGTM. Thanks!
--
Michal Hocko
SUSE Labs

2022-02-18 09:01:20

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu 17-02-22 11:51:13, Suren Baghdasaryan wrote:
> On Tue, Feb 15, 2022 at 11:54 PM Michal Hocko <[email protected]> wrote:
[...]
> > I really have to say that I hate how complex this has grown in the name
> > of optimizations. This has backfired several times already resulting in
> > 2 security issues. I really hope to get read any note of the oom reaper
> > from exit_mmap.
>
> Agree. I want to take another stab at removing __oom_reap_task_mm from
> exit_mmap. Now that Hugh made changes to mlock mechanisms and
> __oom_reap_task_mm does not skip locked vmas I think that should be
> possible. Planning to look into that sometimes next week.

Thanks!
--
Michal Hocko
SUSE Labs

2022-02-25 07:37:24

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > otherwise it points to a vma that was freed and when reused leads to
> > a use-after-free bug.
> >
> > ...
> >
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > vma = remove_vma(vma);
> > cond_resched();
> > }
> > + mm->mmap = NULL;
> > mmap_write_unlock(mm);
> > vm_unacct_memory(nr_accounted);
> > }
>
> After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> revert this fix as part of merging the maple-tree parts of linux-next.
> I'll be sending this fix to Linus this week.
>
> All of which means that the thusly-resolved Maple tree patches might
> reintroduce this use-after-free bug.

I don't think so? The problem is that VMAs are (currently) part of
two data structures -- the rbtree and the linked list. remove_vma()
only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.

With maple tree, the linked list goes away. remove_vma() removes VMAs
from the maple tree. So anyone looking to iterate over all VMAs has to
go and look in the maple tree for them ... and there's nothing there.

But maybe I misunderstood the bug that's being solved here.

2022-02-25 08:51:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:

> After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> otherwise it points to a vma that was freed and when reused leads to
> a use-after-free bug.
>
> ...
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> vma = remove_vma(vma);
> cond_resched();
> }
> + mm->mmap = NULL;
> mmap_write_unlock(mm);
> vm_unacct_memory(nr_accounted);
> }

After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
revert this fix as part of merging the maple-tree parts of linux-next.
I'll be sending this fix to Linus this week.

All of which means that the thusly-resolved Maple tree patches might
reintroduce this use-after-free bug.

2022-02-25 09:24:40

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
>
> On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > otherwise it points to a vma that was freed and when reused leads to
> > > a use-after-free bug.
> > >
> > > ...
> > >
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > vma = remove_vma(vma);
> > > cond_resched();
> > > }
> > > + mm->mmap = NULL;
> > > mmap_write_unlock(mm);
> > > vm_unacct_memory(nr_accounted);
> > > }
> >
> > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > revert this fix as part of merging the maple-tree parts of linux-next.
> > I'll be sending this fix to Linus this week.
> >
> > All of which means that the thusly-resolved Maple tree patches might
> > reintroduce this use-after-free bug.
>
> I don't think so? The problem is that VMAs are (currently) part of
> two data structures -- the rbtree and the linked list. remove_vma()
> only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
>
> With maple tree, the linked list goes away. remove_vma() removes VMAs
> from the maple tree. So anyone looking to iterate over all VMAs has to
> go and look in the maple tree for them ... and there's nothing there.

Yes, I think you are right. With maple trees we don't need this fix.

>
> But maybe I misunderstood the bug that's being solved here.

2022-02-25 13:08:20

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu 24-02-22 20:18:59, Andrew Morton wrote:
> On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > otherwise it points to a vma that was freed and when reused leads to
> > a use-after-free bug.
> >
> > ...
> >
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > vma = remove_vma(vma);
> > cond_resched();
> > }
> > + mm->mmap = NULL;
> > mmap_write_unlock(mm);
> > vm_unacct_memory(nr_accounted);
> > }
>
> After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> revert this fix as part of merging the maple-tree parts of linux-next.
> I'll be sending this fix to Linus this week.

But this is a regression introduced in this release cycle so the patch
should be merged before Maple tree patches, no?
--
Michal Hocko
SUSE Labs

2022-02-26 01:41:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Fri, 25 Feb 2022 11:17:34 +0100 Michal Hocko <[email protected]> wrote:

> On Thu 24-02-22 20:18:59, Andrew Morton wrote:
> > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > otherwise it points to a vma that was freed and when reused leads to
> > > a use-after-free bug.
> > >
> > > ...
> > >
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > vma = remove_vma(vma);
> > > cond_resched();
> > > }
> > > + mm->mmap = NULL;
> > > mmap_write_unlock(mm);
> > > vm_unacct_memory(nr_accounted);
> > > }
> >
> > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > revert this fix as part of merging the maple-tree parts of linux-next.
> > I'll be sending this fix to Linus this week.
>
> But this is a regression introduced in this release cycle so the patch
> should be merged before Maple tree patches, no?

Yes, I'll be sending this one-liner upstream very soon and we'll then
undo it in the maple-tree patchset.

2022-03-10 17:24:10

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

* Suren Baghdasaryan <[email protected]> [220225 00:51]:
> On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > >
> > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > otherwise it points to a vma that was freed and when reused leads to
> > > > a use-after-free bug.
> > > >
> > > > ...
> > > >
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > vma = remove_vma(vma);
> > > > cond_resched();
> > > > }
> > > > + mm->mmap = NULL;
> > > > mmap_write_unlock(mm);
> > > > vm_unacct_memory(nr_accounted);
> > > > }
> > >
> > > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > I'll be sending this fix to Linus this week.
> > >
> > > All of which means that the thusly-resolved Maple tree patches might
> > > reintroduce this use-after-free bug.
> >
> > I don't think so? The problem is that VMAs are (currently) part of
> > two data structures -- the rbtree and the linked list. remove_vma()
> > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> >
> > With maple tree, the linked list goes away. remove_vma() removes VMAs
> > from the maple tree. So anyone looking to iterate over all VMAs has to
> > go and look in the maple tree for them ... and there's nothing there.
>
> Yes, I think you are right. With maple trees we don't need this fix.


Yes, this is correct. The maple tree removes the entire linked list...
but since the mm is unstable in the exit_mmap(), I had added the
destruction of the maple tree there. Maybe this is the wrong place to
be destroying the tree tracking the VMAs (althought this patch partially
destroys the VMA tracking linked list), but it brought my attention to
the race that this patch solves and the process_mrelease() function.
Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
in process_mrelease()? That would ensure we aren't stepping on an
exit_mmap() and potentially the locking change in exit_mmap() wouldn't
be needed either? Logically, I view this as process_mrelease() having
issue with the fact that the mmaps are no longer stable in tear down
regardless of the data structure that is used.

Thanks,
Liam

2022-03-11 01:34:12

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

* Suren Baghdasaryan <[email protected]> [220310 11:28]:
> On Thu, Mar 10, 2022 at 7:55 AM Liam Howlett <[email protected]> wrote:
> >
> > * Suren Baghdasaryan <[email protected]> [220225 00:51]:
> > > On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
> > > >
> > > > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > > >
> > > > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > > > otherwise it points to a vma that was freed and when reused leads to
> > > > > > a use-after-free bug.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > --- a/mm/mmap.c
> > > > > > +++ b/mm/mmap.c
> > > > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > > > vma = remove_vma(vma);
> > > > > > cond_resched();
> > > > > > }
> > > > > > + mm->mmap = NULL;
> > > > > > mmap_write_unlock(mm);
> > > > > > vm_unacct_memory(nr_accounted);
> > > > > > }
> > > > >
> > > > > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > > > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > > > I'll be sending this fix to Linus this week.
> > > > >
> > > > > All of which means that the thusly-resolved Maple tree patches might
> > > > > reintroduce this use-after-free bug.
> > > >
> > > > I don't think so? The problem is that VMAs are (currently) part of
> > > > two data structures -- the rbtree and the linked list. remove_vma()
> > > > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> > > >
> > > > With maple tree, the linked list goes away. remove_vma() removes VMAs
> > > > from the maple tree. So anyone looking to iterate over all VMAs has to
> > > > go and look in the maple tree for them ... and there's nothing there.
> > >
> > > Yes, I think you are right. With maple trees we don't need this fix.
> >
> >
> > Yes, this is correct. The maple tree removes the entire linked list...
> > but since the mm is unstable in the exit_mmap(), I had added the
> > destruction of the maple tree there. Maybe this is the wrong place to
> > be destroying the tree tracking the VMAs (althought this patch partially
> > destroys the VMA tracking linked list), but it brought my attention to
> > the race that this patch solves and the process_mrelease() function.
> > Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
> > in process_mrelease()?
>
> That's what we were doing before [1]. That unfortunately has a problem
> of process_mrelease possibly calling the last mmput and being blocked
> on IO completion in exit_aio.

Oh, I see. Thanks.


> The race between exit_mmap and
> process_mrelease is solved by using mmap_lock.

I think an important part of the race fix isn't just the lock holding
but the setting of the start of the linked list to NULL above. That
means the code in __oom_reap_task_mm() via process_mrelease() will
continue to execute but iterate for zero VMAs.

> I think by destroying the maple tree in exit_mmap before the
> mmap_write_unlock call, you keep things working and functionality
> intact. Is there any reason this can't be done?

Yes, unfortunately. If MMF_OOM_SKIP is not set, then process_mrelease()
will call __oom_reap_task_mm() which will get a null pointer dereference
or a use after free in the vma iterator as it tries to iterate the maple
tree. I think the best plan is to set MMF_OOM_SKIP unconditionally
when the mmap_write_lock() is acquired. Doing so will ensure nothing
will try to gain memory by reaping a task that no longer has memory to
yield - or at least won't shortly. If we do use MMF_OOM_SKIP in such a
way, then I think it is safe to quickly drop the lock?

Also, should process_mrelease() be setting MMF_OOM_VICTIM on this mm?
It would enable the fast path on a race with exit_mmap() - thought that
may not be desirable?

>
> [1] ba535c1caf3ee78a ("mm/oom_kill: allow process_mrelease to run
> under mmap_lock protection")
>
> > That would ensure we aren't stepping on an
> > exit_mmap() and potentially the locking change in exit_mmap() wouldn't
> > be needed either? Logically, I view this as process_mrelease() having
> > issue with the fact that the mmaps are no longer stable in tear down
> > regardless of the data structure that is used.
> >
> > Thanks,
> > Liam
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >

2022-03-11 16:28:21

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

* Suren Baghdasaryan <[email protected]> [220310 18:31]:
> On Thu, Mar 10, 2022 at 2:22 PM Liam Howlett <[email protected]> wrote:
> >
> > * Suren Baghdasaryan <[email protected]> [220310 11:28]:
> > > On Thu, Mar 10, 2022 at 7:55 AM Liam Howlett <[email protected]> wrote:
> > > >
> > > > * Suren Baghdasaryan <[email protected]> [220225 00:51]:
> > > > > On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
> > > > > >
> > > > > > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > > > > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > > > > >
> > > > > > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > > > > > otherwise it points to a vma that was freed and when reused leads to
> > > > > > > > a use-after-free bug.
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > --- a/mm/mmap.c
> > > > > > > > +++ b/mm/mmap.c
> > > > > > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > > > > > vma = remove_vma(vma);
> > > > > > > > cond_resched();
> > > > > > > > }
> > > > > > > > + mm->mmap = NULL;
> > > > > > > > mmap_write_unlock(mm);
> > > > > > > > vm_unacct_memory(nr_accounted);
> > > > > > > > }
> > > > > > >
> > > > > > > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > > > > > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > > > > > I'll be sending this fix to Linus this week.
> > > > > > >
> > > > > > > All of which means that the thusly-resolved Maple tree patches might
> > > > > > > reintroduce this use-after-free bug.
> > > > > >
> > > > > > I don't think so? The problem is that VMAs are (currently) part of
> > > > > > two data structures -- the rbtree and the linked list. remove_vma()
> > > > > > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> > > > > >
> > > > > > With maple tree, the linked list goes away. remove_vma() removes VMAs
> > > > > > from the maple tree. So anyone looking to iterate over all VMAs has to
> > > > > > go and look in the maple tree for them ... and there's nothing there.
> > > > >
> > > > > Yes, I think you are right. With maple trees we don't need this fix.
> > > >
> > > >
> > > > Yes, this is correct. The maple tree removes the entire linked list...
> > > > but since the mm is unstable in the exit_mmap(), I had added the
> > > > destruction of the maple tree there. Maybe this is the wrong place to
> > > > be destroying the tree tracking the VMAs (althought this patch partially
> > > > destroys the VMA tracking linked list), but it brought my attention to
> > > > the race that this patch solves and the process_mrelease() function.
> > > > Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
> > > > in process_mrelease()?
> > >
> > > That's what we were doing before [1]. That unfortunately has a problem
> > > of process_mrelease possibly calling the last mmput and being blocked
> > > on IO completion in exit_aio.
> >
> > Oh, I see. Thanks.
> >
> >
> > > The race between exit_mmap and
> > > process_mrelease is solved by using mmap_lock.
> >
> > I think an important part of the race fix isn't just the lock holding
> > but the setting of the start of the linked list to NULL above. That
> > means the code in __oom_reap_task_mm() via process_mrelease() will
> > continue to execute but iterate for zero VMAs.
> >
> > > I think by destroying the maple tree in exit_mmap before the
> > > mmap_write_unlock call, you keep things working and functionality
> > > intact. Is there any reason this can't be done?
> >
> > Yes, unfortunately. If MMF_OOM_SKIP is not set, then process_mrelease()
> > will call __oom_reap_task_mm() which will get a null pointer dereference
> > or a use after free in the vma iterator as it tries to iterate the maple
> > tree. I think the best plan is to set MMF_OOM_SKIP unconditionally
> > when the mmap_write_lock() is acquired. Doing so will ensure nothing
> > will try to gain memory by reaping a task that no longer has memory to
> > yield - or at least won't shortly. If we do use MMF_OOM_SKIP in such a
> > way, then I think it is safe to quickly drop the lock?
>
> That technically would work but it changes the semantics of
> MMF_OOM_SKIP flag from "mm is of no interest for the OOM killer" to
> something like "mm is empty" akin to mm->mmap == NULL.

Well, an empty mm is of no interest to the oom killer was my thought.

> So, there is no way for maple tree to indicate that it is empty?

On second look, the tree is part of the mm_struct. Destroying will
clear the flags and remove all VMAs, but that should be fine as long as
nothing tries to add anything back to the tree. I don't think there is
a dereference issue here and it will continue to run through the motions
on an empty set as it does right now.

>
> >
> > Also, should process_mrelease() be setting MMF_OOM_VICTIM on this mm?
> > It would enable the fast path on a race with exit_mmap() - thought that
> > may not be desirable?
>
> Michal does not like that approach because again, process_mrelease is
> not oom-killer to set MMF_OOM_VICTIM flag. Besides, we want to get rid
> of that special mm_is_oom_victim(mm) branch inside exit_mmap. Which
> reminds me to look into it again.
>
> >
> > >
> > > [1] ba535c1caf3ee78a ("mm/oom_kill: allow process_mrelease to run
> > > under mmap_lock protection")
> > >
> > > > That would ensure we aren't stepping on an
> > > > exit_mmap() and potentially the locking change in exit_mmap() wouldn't
> > > > be needed either? Logically, I view this as process_mrelease() having
> > > > issue with the fact that the mmaps are no longer stable in tear down
> > > > regardless of the data structure that is used.
> > > >
> > > > Thanks,
> > > > Liam
> > > >
> > > > --
> > > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > > >
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >

2022-03-11 22:52:01

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu, Mar 10, 2022 at 7:55 AM Liam Howlett <[email protected]> wrote:
>
> * Suren Baghdasaryan <[email protected]> [220225 00:51]:
> > On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > >
> > > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > > otherwise it points to a vma that was freed and when reused leads to
> > > > > a use-after-free bug.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > > vma = remove_vma(vma);
> > > > > cond_resched();
> > > > > }
> > > > > + mm->mmap = NULL;
> > > > > mmap_write_unlock(mm);
> > > > > vm_unacct_memory(nr_accounted);
> > > > > }
> > > >
> > > > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > > I'll be sending this fix to Linus this week.
> > > >
> > > > All of which means that the thusly-resolved Maple tree patches might
> > > > reintroduce this use-after-free bug.
> > >
> > > I don't think so? The problem is that VMAs are (currently) part of
> > > two data structures -- the rbtree and the linked list. remove_vma()
> > > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> > >
> > > With maple tree, the linked list goes away. remove_vma() removes VMAs
> > > from the maple tree. So anyone looking to iterate over all VMAs has to
> > > go and look in the maple tree for them ... and there's nothing there.
> >
> > Yes, I think you are right. With maple trees we don't need this fix.
>
>
> Yes, this is correct. The maple tree removes the entire linked list...
> but since the mm is unstable in the exit_mmap(), I had added the
> destruction of the maple tree there. Maybe this is the wrong place to
> be destroying the tree tracking the VMAs (althought this patch partially
> destroys the VMA tracking linked list), but it brought my attention to
> the race that this patch solves and the process_mrelease() function.
> Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
> in process_mrelease()?

That's what we were doing before [1]. That unfortunately has a problem
of process_mrelease possibly calling the last mmput and being blocked
on IO completion in exit_aio. The race between exit_mmap and
process_mrelease is solved by using mmap_lock.
I think by destroying the maple tree in exit_mmap before the
mmap_write_unlock call, you keep things working and functionality
intact. Is there any reason this can't be done?

[1] ba535c1caf3ee78a ("mm/oom_kill: allow process_mrelease to run
under mmap_lock protection")

> That would ensure we aren't stepping on an
> exit_mmap() and potentially the locking change in exit_mmap() wouldn't
> be needed either? Logically, I view this as process_mrelease() having
> issue with the fact that the mmaps are no longer stable in tear down
> regardless of the data structure that is used.
>
> Thanks,
> Liam
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2022-03-11 23:18:59

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH 1/1] mm: fix use-after-free bug when mm->mmap is reused after being freed

On Thu, Mar 10, 2022 at 2:22 PM Liam Howlett <[email protected]> wrote:
>
> * Suren Baghdasaryan <[email protected]> [220310 11:28]:
> > On Thu, Mar 10, 2022 at 7:55 AM Liam Howlett <[email protected]> wrote:
> > >
> > > * Suren Baghdasaryan <[email protected]> [220225 00:51]:
> > > > On Thu, Feb 24, 2022 at 8:23 PM Matthew Wilcox <[email protected]> wrote:
> > > > >
> > > > > On Thu, Feb 24, 2022 at 08:18:59PM -0800, Andrew Morton wrote:
> > > > > > On Tue, 15 Feb 2022 12:19:22 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > > > >
> > > > > > > After exit_mmap frees all vmas in the mm, mm->mmap needs to be reset,
> > > > > > > otherwise it points to a vma that was freed and when reused leads to
> > > > > > > a use-after-free bug.
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > --- a/mm/mmap.c
> > > > > > > +++ b/mm/mmap.c
> > > > > > > @@ -3186,6 +3186,7 @@ void exit_mmap(struct mm_struct *mm)
> > > > > > > vma = remove_vma(vma);
> > > > > > > cond_resched();
> > > > > > > }
> > > > > > > + mm->mmap = NULL;
> > > > > > > mmap_write_unlock(mm);
> > > > > > > vm_unacct_memory(nr_accounted);
> > > > > > > }
> > > > > >
> > > > > > After the Maple tree patches, mm_struct.mmap doesn't exist. So I'll
> > > > > > revert this fix as part of merging the maple-tree parts of linux-next.
> > > > > > I'll be sending this fix to Linus this week.
> > > > > >
> > > > > > All of which means that the thusly-resolved Maple tree patches might
> > > > > > reintroduce this use-after-free bug.
> > > > >
> > > > > I don't think so? The problem is that VMAs are (currently) part of
> > > > > two data structures -- the rbtree and the linked list. remove_vma()
> > > > > only removes VMAs from the rbtree; it doesn't set mm->mmap to NULL.
> > > > >
> > > > > With maple tree, the linked list goes away. remove_vma() removes VMAs
> > > > > from the maple tree. So anyone looking to iterate over all VMAs has to
> > > > > go and look in the maple tree for them ... and there's nothing there.
> > > >
> > > > Yes, I think you are right. With maple trees we don't need this fix.
> > >
> > >
> > > Yes, this is correct. The maple tree removes the entire linked list...
> > > but since the mm is unstable in the exit_mmap(), I had added the
> > > destruction of the maple tree there. Maybe this is the wrong place to
> > > be destroying the tree tracking the VMAs (althought this patch partially
> > > destroys the VMA tracking linked list), but it brought my attention to
> > > the race that this patch solves and the process_mrelease() function.
> > > Couldn't this be avoided by using mmget_not_zero() instead of mmgrab()
> > > in process_mrelease()?
> >
> > That's what we were doing before [1]. That unfortunately has a problem
> > of process_mrelease possibly calling the last mmput and being blocked
> > on IO completion in exit_aio.
>
> Oh, I see. Thanks.
>
>
> > The race between exit_mmap and
> > process_mrelease is solved by using mmap_lock.
>
> I think an important part of the race fix isn't just the lock holding
> but the setting of the start of the linked list to NULL above. That
> means the code in __oom_reap_task_mm() via process_mrelease() will
> continue to execute but iterate for zero VMAs.
>
> > I think by destroying the maple tree in exit_mmap before the
> > mmap_write_unlock call, you keep things working and functionality
> > intact. Is there any reason this can't be done?
>
> Yes, unfortunately. If MMF_OOM_SKIP is not set, then process_mrelease()
> will call __oom_reap_task_mm() which will get a null pointer dereference
> or a use after free in the vma iterator as it tries to iterate the maple
> tree. I think the best plan is to set MMF_OOM_SKIP unconditionally
> when the mmap_write_lock() is acquired. Doing so will ensure nothing
> will try to gain memory by reaping a task that no longer has memory to
> yield - or at least won't shortly. If we do use MMF_OOM_SKIP in such a
> way, then I think it is safe to quickly drop the lock?

That technically would work but it changes the semantics of
MMF_OOM_SKIP flag from "mm is of no interest for the OOM killer" to
something like "mm is empty" akin to mm->mmap == NULL.
So, there is no way for maple tree to indicate that it is empty?

>
> Also, should process_mrelease() be setting MMF_OOM_VICTIM on this mm?
> It would enable the fast path on a race with exit_mmap() - thought that
> may not be desirable?

Michal does not like that approach because again, process_mrelease is
not oom-killer to set MMF_OOM_VICTIM flag. Besides, we want to get rid
of that special mm_is_oom_victim(mm) branch inside exit_mmap. Which
reminds me to look into it again.

>
> >
> > [1] ba535c1caf3ee78a ("mm/oom_kill: allow process_mrelease to run
> > under mmap_lock protection")
> >
> > > That would ensure we aren't stepping on an
> > > exit_mmap() and potentially the locking change in exit_mmap() wouldn't
> > > be needed either? Logically, I view this as process_mrelease() having
> > > issue with the fact that the mmaps are no longer stable in tear down
> > > regardless of the data structure that is used.
> > >
> > > Thanks,
> > > Liam
> > >
> > > --
> > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>