This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
This patch fixed an issue reported by syzkaller in [1]. However, this
turned out to be only a band-aid in binder. The root cause, as bisected
by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap()
when mas_preallocate() fails"). We no longer need the patch for binder.
Reverting such patch allows us to have a lockless access to alloc->vma
in specific cases where the mmap_lock is not required. This approach
avoids the contention that caused a performance regression.
[1] https://lore.kernel.org/all/[email protected]
[cmllamas: resolved conflicts with rework of alloc->mm and removal of
binder_alloc_set_vma() also fixed comment section]
Cc: Liam Howlett <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 17 +++++++++--------
drivers/android/binder_alloc.h | 4 ++--
drivers/android/binder_alloc_selftest.c | 2 +-
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 92c814ec44fe..eb082b33115b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -213,7 +213,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
if (mm) {
mmap_read_lock(mm);
- vma = vma_lookup(mm, alloc->vma_addr);
+ vma = alloc->vma;
}
if (!vma && need_mm) {
@@ -314,9 +314,11 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
{
struct vm_area_struct *vma = NULL;
- if (alloc->vma_addr)
- vma = vma_lookup(alloc->mm, alloc->vma_addr);
-
+ if (alloc->vma) {
+ /* Look at description in binder_alloc_set_vma */
+ smp_rmb();
+ vma = alloc->vma;
+ }
return vma;
}
@@ -775,7 +777,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
buffer->free = 1;
binder_insert_free_buffer(alloc, buffer);
alloc->free_async_space = alloc->buffer_size / 2;
- alloc->vma_addr = vma->vm_start;
+ alloc->vma = vma;
return 0;
@@ -805,8 +807,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
buffers = 0;
mutex_lock(&alloc->mutex);
- BUG_ON(alloc->vma_addr &&
- vma_lookup(alloc->mm, alloc->vma_addr));
+ BUG_ON(alloc->vma);
while ((n = rb_first(&alloc->allocated_buffers))) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -958,7 +959,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
*/
void binder_alloc_vma_close(struct binder_alloc *alloc)
{
- alloc->vma_addr = 0;
+ alloc->vma = 0;
}
/**
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 0f811ac4bcff..138d1d5af9ce 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -75,7 +75,7 @@ struct binder_lru_page {
/**
* struct binder_alloc - per-binder proc state for binder allocator
* @mutex: protects binder_alloc fields
- * @vma_addr: vm_area_struct->vm_start passed to mmap_handler
+ * @vma: vm_area_struct passed to mmap_handler
* (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
* @buffer: base of per-proc address space mapped via mmap
@@ -99,7 +99,7 @@ struct binder_lru_page {
*/
struct binder_alloc {
struct mutex mutex;
- unsigned long vma_addr;
+ struct vm_area_struct *vma;
struct mm_struct *mm;
void __user *buffer;
struct list_head buffers;
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 43a881073a42..c2b323bc3b3a 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
if (!binder_selftest_run)
return;
mutex_lock(&binder_selftest_lock);
- if (!binder_selftest_run || !alloc->vma_addr)
+ if (!binder_selftest_run || !alloc->vma)
goto done;
pr_info("STARTED\n");
binder_selftest_alloc_offset(alloc, end_offset, 0);
--
2.40.0.634.g4ca3ef3211-goog
Cc linux-mm
* Carlos Llamas <[email protected]> [230424 16:56]:
> This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
>
> This patch fixed an issue reported by syzkaller in [1]. However, this
> turned out to be only a band-aid in binder. The root cause, as bisected
> by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap()
> when mas_preallocate() fails"). We no longer need the patch for binder.
>
> Reverting such patch allows us to have a lockless access to alloc->vma
> in specific cases where the mmap_lock is not required.
Can you elaborate on the situation where recording a VMA pointer and
later accessing it outside the mmap_lock is okay?
>This approach
> avoids the contention that caused a performance regression.
>
> [1] https://lore.kernel.org/all/[email protected]
>
> [cmllamas: resolved conflicts with rework of alloc->mm and removal of
> binder_alloc_set_vma() also fixed comment section]
>
> Cc: Liam Howlett <[email protected]>
> Cc: Suren Baghdasaryan <[email protected]>
> Signed-off-by: Carlos Llamas <[email protected]>
> ---
> drivers/android/binder_alloc.c | 17 +++++++++--------
> drivers/android/binder_alloc.h | 4 ++--
> drivers/android/binder_alloc_selftest.c | 2 +-
> 3 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index 92c814ec44fe..eb082b33115b 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -213,7 +213,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
>
> if (mm) {
> mmap_read_lock(mm);
> - vma = vma_lookup(mm, alloc->vma_addr);
> + vma = alloc->vma;
> }
>
> if (!vma && need_mm) {
> @@ -314,9 +314,11 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
> {
> struct vm_area_struct *vma = NULL;
>
> - if (alloc->vma_addr)
> - vma = vma_lookup(alloc->mm, alloc->vma_addr);
> -
> + if (alloc->vma) {
> + /* Look at description in binder_alloc_set_vma */
> + smp_rmb();
> + vma = alloc->vma;
> + }
> return vma;
> }
>
> @@ -775,7 +777,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
> buffer->free = 1;
> binder_insert_free_buffer(alloc, buffer);
> alloc->free_async_space = alloc->buffer_size / 2;
> - alloc->vma_addr = vma->vm_start;
> + alloc->vma = vma;
>
> return 0;
>
> @@ -805,8 +807,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
>
> buffers = 0;
> mutex_lock(&alloc->mutex);
> - BUG_ON(alloc->vma_addr &&
> - vma_lookup(alloc->mm, alloc->vma_addr));
> + BUG_ON(alloc->vma);
>
> while ((n = rb_first(&alloc->allocated_buffers))) {
> buffer = rb_entry(n, struct binder_buffer, rb_node);
> @@ -958,7 +959,7 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
> */
> void binder_alloc_vma_close(struct binder_alloc *alloc)
> {
> - alloc->vma_addr = 0;
> + alloc->vma = 0;
> }
>
> /**
> diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
> index 0f811ac4bcff..138d1d5af9ce 100644
> --- a/drivers/android/binder_alloc.h
> +++ b/drivers/android/binder_alloc.h
> @@ -75,7 +75,7 @@ struct binder_lru_page {
> /**
> * struct binder_alloc - per-binder proc state for binder allocator
> * @mutex: protects binder_alloc fields
> - * @vma_addr: vm_area_struct->vm_start passed to mmap_handler
> + * @vma: vm_area_struct passed to mmap_handler
> * (invariant after mmap)
> * @mm: copy of task->mm (invariant after open)
> * @buffer: base of per-proc address space mapped via mmap
> @@ -99,7 +99,7 @@ struct binder_lru_page {
> */
> struct binder_alloc {
> struct mutex mutex;
> - unsigned long vma_addr;
> + struct vm_area_struct *vma;
> struct mm_struct *mm;
> void __user *buffer;
> struct list_head buffers;
> diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
> index 43a881073a42..c2b323bc3b3a 100644
> --- a/drivers/android/binder_alloc_selftest.c
> +++ b/drivers/android/binder_alloc_selftest.c
> @@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder_alloc *alloc)
> if (!binder_selftest_run)
> return;
> mutex_lock(&binder_selftest_lock);
> - if (!binder_selftest_run || !alloc->vma_addr)
> + if (!binder_selftest_run || !alloc->vma)
> goto done;
> pr_info("STARTED\n");
> binder_selftest_alloc_offset(alloc, end_offset, 0);
> --
> 2.40.0.634.g4ca3ef3211-goog
>
O Mon, Apr 24, 2023 at 06:34:19PM -0400, Liam R. Howlett wrote:
> Cc linux-mm
>
> * Carlos Llamas <[email protected]> [230424 16:56]:
> > This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
> >
> > This patch fixed an issue reported by syzkaller in [1]. However, this
> > turned out to be only a band-aid in binder. The root cause, as bisected
> > by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap()
> > when mas_preallocate() fails"). We no longer need the patch for binder.
> >
> > Reverting such patch allows us to have a lockless access to alloc->vma
> > in specific cases where the mmap_lock is not required.
>
> Can you elaborate on the situation where recording a VMA pointer and
> later accessing it outside the mmap_lock is okay?
The specifics are in the third patch of this patchset but the gist of it
is that during ->mmap() handler, binder will complete the initialization
of the binder_alloc structure. With the last step of this process being
the caching of the vma pointer. Since the ordering is protected with a
barrier we can then check alloc->vma to determine if the initialization
has been completed.
Since this check is part of the critical path for every single binder
transaction, the performance plummeted when we started contending for
the mmap_lock. In this particular case, binder doesn't actually use the
vma. It only needs to know if the internal structure has been fully
initialized and it is safe to use it.
FWIW, this had been the design for ~15 years. The original patch is
this: https://git.kernel.org/torvalds/c/457b9a6f09f0
* Carlos Llamas <[email protected]> [230424 19:11]:
> O Mon, Apr 24, 2023 at 06:34:19PM -0400, Liam R. Howlett wrote:
> > Cc linux-mm
> >
> > * Carlos Llamas <[email protected]> [230424 16:56]:
> > > This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
> > >
> > > This patch fixed an issue reported by syzkaller in [1]. However, this
> > > turned out to be only a band-aid in binder. The root cause, as bisected
> > > by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap()
> > > when mas_preallocate() fails"). We no longer need the patch for binder.
> > >
> > > Reverting such patch allows us to have a lockless access to alloc->vma
> > > in specific cases where the mmap_lock is not required.
> >
> > Can you elaborate on the situation where recording a VMA pointer and
> > later accessing it outside the mmap_lock is okay?
>
> The specifics are in the third patch of this patchset but the gist of it
> is that during ->mmap() handler, binder will complete the initialization
> of the binder_alloc structure. With the last step of this process being
> the caching of the vma pointer. Since the ordering is protected with a
> barrier we can then check alloc->vma to determine if the initialization
> has been completed.
>
> Since this check is part of the critical path for every single binder
> transaction, the performance plummeted when we started contending for
> the mmap_lock. In this particular case, binder doesn't actually use the
> vma.
So why does binder_update_page_range() take the mmap_read_lock then use
the cached vma in the reverted patch?
If you want to use it as a flag to see if the driver is initialized, why
not use the cached address != 0?
Or better yet,
>It only needs to know if the internal structure has been fully
> initialized and it is safe to use it.
This seems like a good reason to use your own rwsem. This is,
essentially, rolling your own lock with
smp_store_release()/smp_load_acquire() and a pointer which should not be
cached.
>
> FWIW, this had been the design for ~15 years. The original patch is
> this: https://git.kernel.org/torvalds/c/457b9a6f09f0
On Mon, Apr 24, 2023 at 09:43:28PM -0400, Liam R. Howlett wrote:
> * Carlos Llamas <[email protected]> [230424 19:11]:
> >
> > The specifics are in the third patch of this patchset but the gist of it
> > is that during ->mmap() handler, binder will complete the initialization
> > of the binder_alloc structure. With the last step of this process being
> > the caching of the vma pointer. Since the ordering is protected with a
> > barrier we can then check alloc->vma to determine if the initialization
> > has been completed.
> >
> > Since this check is part of the critical path for every single binder
> > transaction, the performance plummeted when we started contending for
> > the mmap_lock. In this particular case, binder doesn't actually use the
> > vma.
>
> So why does binder_update_page_range() take the mmap_read_lock then use
> the cached vma in the reverted patch?
>
> If you want to use it as a flag to see if the driver is initialized, why
> not use the cached address != 0?
>
> Or better yet,
>
> >It only needs to know if the internal structure has been fully
> > initialized and it is safe to use it.
>
> This seems like a good reason to use your own rwsem. This is,
> essentially, rolling your own lock with
> smp_store_release()/smp_load_acquire() and a pointer which should not be
> cached.
We can't use an rwsem to protect the initialization. We already have an
alloc->mutex which would be an option. However, using it under ->mmap()
would only lead to dead-locks with the mmap_lock.
I agree with you that we could use some other flag instead of the vma
pointer to signal the initialization. I've actually tried several times
to come up with a scenario in which caching the vma pointer becomes an
issue to stop doing this altogether. However, I can't find anything
concrete.
I don't think the current solution in which we do all these unnecessary
vma lookups is correct. Instead, I'm currently working on a redesign of
this section in which binder stops to allocate/insert pages manually. We
should be making use of the page-fault handler and let the infra handle
all the work. The overall idea is here:
https://lore.kernel.org/all/[email protected]/
It's hard to make the case for just dropping the vma pointer after ~15
years and take the performance hit without having an actual issue to
support this idea. So I'll revert this for now and keep working on the
page-fault solution.
Thanks Liam, I'll keep you in the loop.
* Carlos Llamas <[email protected]> [230426 17:17]:
> On Mon, Apr 24, 2023 at 09:43:28PM -0400, Liam R. Howlett wrote:
> > * Carlos Llamas <[email protected]> [230424 19:11]:
> > >
> > > The specifics are in the third patch of this patchset but the gist of it
> > > is that during ->mmap() handler, binder will complete the initialization
> > > of the binder_alloc structure. With the last step of this process being
> > > the caching of the vma pointer. Since the ordering is protected with a
> > > barrier we can then check alloc->vma to determine if the initialization
> > > has been completed.
> > >
> > > Since this check is part of the critical path for every single binder
> > > transaction, the performance plummeted when we started contending for
> > > the mmap_lock. In this particular case, binder doesn't actually use the
> > > vma.
> >
> > So why does binder_update_page_range() take the mmap_read_lock then use
> > the cached vma in the reverted patch?
> >
> > If you want to use it as a flag to see if the driver is initialized, why
> > not use the cached address != 0?
> >
> > Or better yet,
> >
> > >It only needs to know if the internal structure has been fully
> > > initialized and it is safe to use it.
> >
> > This seems like a good reason to use your own rwsem. This is,
> > essentially, rolling your own lock with
> > smp_store_release()/smp_load_acquire() and a pointer which should not be
> > cached.
>
> We can't use an rwsem to protect the initialization. We already have an
> alloc->mutex which would be an option. However, using it under ->mmap()
> would only lead to dead-locks with the mmap_lock.
>
> I agree with you that we could use some other flag instead of the vma
> pointer to signal the initialization. I've actually tried several times
> to come up with a scenario in which caching the vma pointer becomes an
> issue to stop doing this altogether. However, I can't find anything
> concrete.
>
> I don't think the current solution in which we do all these unnecessary
> vma lookups is correct. Instead, I'm currently working on a redesign of
> this section in which binder stops to allocate/insert pages manually. We
> should be making use of the page-fault handler and let the infra handle
> all the work. The overall idea is here:
> https://lore.kernel.org/all/[email protected]/
>
> It's hard to make the case for just dropping the vma pointer after ~15
> years and take the performance hit without having an actual issue to
> support this idea. So I'll revert this for now and keep working on the
> page-fault solution.
>
I came across this [1] when I was looking into something else and
thought I'd double back and make sure your fix for this UAF is also
included, since your revert will restore this bug.
I do still see the mmap_read_lock() in binder_update_page_range() vs the
required mmap_write_lock(), at least in my branch.
[1] https://lore.kernel.org/all/[email protected]/
Thanks,
Liam
On Thu, May 18, 2023 at 10:40:52AM -0400, Liam R. Howlett wrote:
>
> I came across this [1] when I was looking into something else and
> thought I'd double back and make sure your fix for this UAF is also
> included, since your revert will restore this bug.
>
> I do still see the mmap_read_lock() in binder_update_page_range() vs the
> required mmap_write_lock(), at least in my branch.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
Thanks Liam, I believe you are correct. The UAF should trigger on newer
releases after the revert of your patch. I'll try to reproduce the issue
to confirm and will send the fix afterwards. This was a nice find!
Thanks,
--
Carlos Llamas