2023-11-02 19:01:45

by Carlos Llamas

Subject: [PATCH 00/21] binder: convert alloc->mutex to spinlock

This series refactors the binder buffer allocation code so that the
alloc->mutex can be converted into a spinlock. Doing so decreases the
latency of binder transactions seen under worst-case scenarios.

I attempted to organize and improve the readability of the code touched
during this process. I also include a couple of fixes (unrelated to the
lock conversion) for issues I found along the way.

Regards,
Carlos Llamas

Cc: Todd Kjos <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>

Carlos Llamas (21):
binder: use EPOLLERR from eventpoll.h
binder: fix use-after-free in shrinker's callback
binder: fix race between mmput() and do_exit()
binder: fix async space check for 0-sized buffers
binder: fix trivial typo of binder_free_buf_locked()
binder: fix comment on binder_alloc_new_buf() return value
binder: remove extern from function prototypes
binder: keep vma addresses type as unsigned long
binder: split up binder_update_page_range()
binder: do unlocked work in binder_alloc_new_buf()
binder: remove pid param in binder_alloc_new_buf()
binder: separate the no-space debugging logic
binder: relocate low space calculation
binder: do not add pages to LRU in release path
binder: relocate binder_alloc_clear_buf()
binder: refactor page range allocation
binder: malloc new_buffer outside of locks
binder: initialize lru pages in mmap callback
binder: perform page allocation outside of locks
binder: reverse locking order in shrinker callback
binder: switch alloc->mutex to spinlock_t

drivers/android/binder.c | 25 +-
drivers/android/binder_alloc.c | 746 ++++++++++++------------
drivers/android/binder_alloc.h | 57 +-
drivers/android/binder_alloc_selftest.c | 8 +-
drivers/android/binder_trace.h | 2 +-
5 files changed, 428 insertions(+), 410 deletions(-)


base-commit: 21e80f3841c01aeaf32d7aee7bbc87b3db1aa0c6
--
2.42.0.869.gea05f2083d-goog


2023-11-02 19:01:46

by Carlos Llamas

Subject: [PATCH 06/21] binder: fix comment on binder_alloc_new_buf() return value

Update the comment on binder_alloc_new_buf() to reflect that the return
value of the function is now ERR_PTR(-errno) on failure.

No functional changes in this patch.

Cc: [email protected]
Fixes: 57ada2fb2250 ("binder: add log information for binder transaction failures")
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index cd720bb5c9ce..0e8312f4b771 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -558,7 +558,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* is the sum of the three given sizes (each rounded up to
* pointer-sized boundary)
*
- * Return: The allocated buffer or %NULL if error
+ * Return: The allocated buffer or %ERR_PTR(-errno) if error
*/
struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:17

by Carlos Llamas

Subject: [PATCH 03/21] binder: fix race between mmput() and do_exit()

Task A calls binder_update_page_range() to allocate and insert pages
into the remote address space of Task B. For this, Task A pins the
remote mm via mmget_not_zero() first. This can race with Task B's
do_exit(), and the final mmput() refcount decrement will then come from
Task A.

Task A | Task B
------------------+------------------
mmget_not_zero() |
| do_exit()
| exit_mm()
| mmput()
mmput() |
exit_mmap() |
remove_vma() |
fput() |

In this case, the work of ____fput() from Task B is queued up in Task A
as TWA_RESUME. So in theory, Task A returns to userspace and the cleanup
work gets executed. However, Task A instead sleeps, waiting for a reply
from Task B that never comes (it's dead).

This means binder_deferred_release() is blocked until an unrelated
binder event forces Task A to go back to userspace. All the associated
death notifications are also delayed until then.

In order to fix this, use mmput_async(), which will schedule the work
on the corresponding mm->async_put_work workqueue instead of in Task A.
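
For illustration, a minimal sketch of the pin/unpin pattern this change
produces; the helper name is hypothetical, error paths are trimmed, and
the real code lives in binder_update_page_range() as shown in the diff
below:

	/* hypothetical condensation of binder_update_page_range() */
	static int binder_pin_and_update_remote(struct binder_alloc *alloc)
	{
		struct mm_struct *mm = alloc->mm;

		if (!mmget_not_zero(mm))	/* pin Task B's mm */
			return -ESRCH;

		mmap_write_lock(mm);
		/* allocate pages and vm_insert_page() into the remote vma */
		mmap_write_unlock(mm);

		/*
		 * If this put drops the final reference, __mmput() is
		 * deferred to the mm->async_put_work workqueue instead of
		 * running here, where binder may be about to sleep waiting
		 * for a reply.
		 */
		mmput_async(mm);
		return 0;
	}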

Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index c4d60d81221b..046c3ae9eb8f 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -271,7 +271,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
}
if (mm) {
mmap_write_unlock(mm);
- mmput(mm);
+ mmput_async(mm);
}
return 0;

@@ -304,7 +304,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
err_no_vma:
if (mm) {
mmap_write_unlock(mm);
- mmput(mm);
+ mmput_async(mm);
}
return vma ? -ENOMEM : -ESRCH;
}
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:25

by Carlos Llamas

Subject: [PATCH 12/21] binder: separate the no-space debugging logic

Move the no-space debugging logic into a separate function. Let's also
mark this branch as unlikely in binder_alloc_new_buf_locked(), as most
requests will fit without issue.

Also add a few cosmetic changes and apply suggestions from checkpatch.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 71 +++++++++++++++++++---------------
1 file changed, 40 insertions(+), 31 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index eacc5a3ce538..3fe5c734c3df 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -323,6 +323,43 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return smp_load_acquire(&alloc->vma);
}

+static void debug_no_space_locked(struct binder_alloc *alloc)
+{
+ size_t largest_alloc_size = 0;
+ struct binder_buffer *buffer;
+ size_t allocated_buffers = 0;
+ size_t largest_free_size = 0;
+ size_t total_alloc_size = 0;
+ size_t total_free_size = 0;
+ size_t free_buffers = 0;
+ size_t buffer_size;
+ struct rb_node *n;
+
+ for (n = rb_first(&alloc->allocated_buffers); n; n = rb_next(n)) {
+ buffer = rb_entry(n, struct binder_buffer, rb_node);
+ buffer_size = binder_alloc_buffer_size(alloc, buffer);
+ allocated_buffers++;
+ total_alloc_size += buffer_size;
+ if (buffer_size > largest_alloc_size)
+ largest_alloc_size = buffer_size;
+ }
+
+ for (n = rb_first(&alloc->free_buffers); n; n = rb_next(n)) {
+ buffer = rb_entry(n, struct binder_buffer, rb_node);
+ buffer_size = binder_alloc_buffer_size(alloc, buffer);
+ free_buffers++;
+ total_free_size += buffer_size;
+ if (buffer_size > largest_free_size)
+ largest_free_size = buffer_size;
+ }
+
+ binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+ "allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n",
+ total_alloc_size, allocated_buffers,
+ largest_alloc_size, total_free_size,
+ free_buffers, largest_free_size);
+}
+
static bool debug_low_async_space_locked(struct binder_alloc *alloc)
{
/*
@@ -404,42 +441,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
}
}

- if (best_fit == NULL) {
- size_t allocated_buffers = 0;
- size_t largest_alloc_size = 0;
- size_t total_alloc_size = 0;
- size_t free_buffers = 0;
- size_t largest_free_size = 0;
- size_t total_free_size = 0;
-
- for (n = rb_first(&alloc->allocated_buffers); n != NULL;
- n = rb_next(n)) {
- buffer = rb_entry(n, struct binder_buffer, rb_node);
- buffer_size = binder_alloc_buffer_size(alloc, buffer);
- allocated_buffers++;
- total_alloc_size += buffer_size;
- if (buffer_size > largest_alloc_size)
- largest_alloc_size = buffer_size;
- }
- for (n = rb_first(&alloc->free_buffers); n != NULL;
- n = rb_next(n)) {
- buffer = rb_entry(n, struct binder_buffer, rb_node);
- buffer_size = binder_alloc_buffer_size(alloc, buffer);
- free_buffers++;
- total_free_size += buffer_size;
- if (buffer_size > largest_free_size)
- largest_free_size = buffer_size;
- }
+ if (unlikely(!best_fit)) {
binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
"%d: binder_alloc_buf size %zd failed, no address space\n",
alloc->pid, size);
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "allocated: %zd (num: %zd largest: %zd), free: %zd (num: %zd largest: %zd)\n",
- total_alloc_size, allocated_buffers,
- largest_alloc_size, total_free_size,
- free_buffers, largest_free_size);
+ debug_no_space_locked(alloc);
return ERR_PTR(-ENOSPC);
}
+
if (n == NULL) {
buffer = rb_entry(best_fit, struct binder_buffer, rb_node);
buffer_size = binder_alloc_buffer_size(alloc, buffer);
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:34

by Carlos Llamas

Subject: [PATCH 08/21] binder: keep vma addresses type as unsigned long

The vma addresses in binder are currently stored as void __user *. This
requires casting back and forth to the mm/ API, which uses unsigned
long. Since we also do internal arithmetic on these addresses, we end
up having to cast them _again_ to an integer type.

Let's stop all the unnecessary casting, which hurts code readability,
and store the virtual addresses as the native unsigned long from mm/.
Note that this approach is preferred over uintptr_t, as Linus explains
in [1].

Opportunistically add a few cosmetic touchups.
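
For illustration, one of the conversions from the hunks below,
condensed to show the casts that go away:

	/* before: user_data is void __user *, arithmetic needs casts */
	end_page_addr =
		(void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size);

	/* after: user_data is unsigned long, plain integer arithmetic */
	end_page_addr = PAGE_ALIGN(buffer->user_data + size);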

Link: https://lore.kernel.org/all/CAHk-=wj2OHy-5e+srG1fy+ZU00TmZ1NFp6kFLbVLMXHe7A1d-g@mail.gmail.com/ [1]
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder.c | 21 +++---
drivers/android/binder_alloc.c | 91 +++++++++++--------------
drivers/android/binder_alloc.h | 6 +-
drivers/android/binder_alloc_selftest.c | 6 +-
drivers/android/binder_trace.h | 2 +-
5 files changed, 57 insertions(+), 69 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 71a40a4c546f..437d1097b118 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2077,9 +2077,8 @@ static void binder_transaction_buffer_release(struct binder_proc *proc,
* Convert the address to an offset relative to
* the base of the transaction buffer.
*/
- fda_offset =
- (parent->buffer - (uintptr_t)buffer->user_data) +
- fda->parent_offset;
+ fda_offset = parent->buffer - buffer->user_data +
+ fda->parent_offset;
for (fd_index = 0; fd_index < fda->num_fds;
fd_index++) {
u32 fd;
@@ -2597,7 +2596,7 @@ static int binder_translate_fd_array(struct list_head *pf_head,
* Convert the address to an offset relative to
* the base of the transaction buffer.
*/
- fda_offset = (parent->buffer - (uintptr_t)t->buffer->user_data) +
+ fda_offset = parent->buffer - t->buffer->user_data +
fda->parent_offset;
sender_ufda_base = (void __user *)(uintptr_t)sender_uparent->buffer +
fda->parent_offset;
@@ -2672,8 +2671,9 @@ static int binder_fixup_parent(struct list_head *pf_head,
proc->pid, thread->pid);
return -EINVAL;
}
- buffer_offset = bp->parent_offset +
- (uintptr_t)parent->buffer - (uintptr_t)b->user_data;
+
+ buffer_offset = bp->parent_offset + parent->buffer - b->user_data;
+
return binder_add_fixup(pf_head, buffer_offset, bp->buffer, 0);
}

@@ -3250,7 +3250,7 @@ static void binder_transaction(struct binder_proc *proc,
ALIGN(extra_buffers_size, sizeof(void *)) -
ALIGN(secctx_sz, sizeof(u64));

- t->security_ctx = (uintptr_t)t->buffer->user_data + buf_offset;
+ t->security_ctx = t->buffer->user_data + buf_offset;
err = binder_alloc_copy_to_buffer(&target_proc->alloc,
t->buffer, buf_offset,
secctx, secctx_sz);
@@ -3527,8 +3527,7 @@ static void binder_transaction(struct binder_proc *proc,
goto err_translate_failed;
}
/* Fixup buffer pointer to target proc address space */
- bp->buffer = (uintptr_t)
- t->buffer->user_data + sg_buf_offset;
+ bp->buffer = t->buffer->user_data + sg_buf_offset;
sg_buf_offset += ALIGN(bp->length, sizeof(u64));

num_valid = (buffer_offset - off_start_offset) /
@@ -4698,7 +4697,7 @@ static int binder_thread_read(struct binder_proc *proc,
}
trd->data_size = t->buffer->data_size;
trd->offsets_size = t->buffer->offsets_size;
- trd->data.ptr.buffer = (uintptr_t)t->buffer->user_data;
+ trd->data.ptr.buffer = t->buffer->user_data;
trd->data.ptr.offsets = trd->data.ptr.buffer +
ALIGN(t->buffer->data_size,
sizeof(void *));
@@ -5981,7 +5980,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m,
}
if (buffer->target_node)
seq_printf(m, " node %d", buffer->target_node->debug_id);
- seq_printf(m, " size %zd:%zd data %pK\n",
+ seq_printf(m, " size %zd:%zd data %lx\n",
buffer->data_size, buffer->offsets_size,
buffer->user_data);
}
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 0e8312f4b771..b99d9845d69a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -125,23 +125,20 @@ static void binder_insert_allocated_buffer_locked(

static struct binder_buffer *binder_alloc_prepare_to_free_locked(
struct binder_alloc *alloc,
- uintptr_t user_ptr)
+ unsigned long user_ptr)
{
struct rb_node *n = alloc->allocated_buffers.rb_node;
struct binder_buffer *buffer;
- void __user *uptr;
-
- uptr = (void __user *)user_ptr;

while (n) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
BUG_ON(buffer->free);

- if (uptr < buffer->user_data)
+ if (user_ptr < buffer->user_data) {
n = n->rb_left;
- else if (uptr > buffer->user_data)
+ } else if (user_ptr > buffer->user_data) {
n = n->rb_right;
- else {
+ } else {
/*
* Guard against user threads attempting to
* free the buffer when in use by kernel or
@@ -168,7 +165,7 @@ static struct binder_buffer *binder_alloc_prepare_to_free_locked(
* Return: Pointer to buffer or NULL
*/
struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
- uintptr_t user_ptr)
+ unsigned long user_ptr)
{
struct binder_buffer *buffer;

@@ -179,18 +176,17 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
}

static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
- void __user *start, void __user *end)
+ unsigned long start, unsigned long end)
{
- void __user *page_addr;
- unsigned long user_page_addr;
- struct binder_lru_page *page;
struct vm_area_struct *vma = NULL;
+ struct binder_lru_page *page;
struct mm_struct *mm = NULL;
+ unsigned long page_addr;
bool need_mm = false;

binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: %s pages %pK-%pK\n", alloc->pid,
- allocate ? "allocate" : "free", start, end);
+ "%d: %s allocate pages %lx-%lx\n", alloc->pid,
+ allocate ? "allocate" : "free", start, end);

if (end <= start)
return 0;
@@ -249,18 +245,17 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
__GFP_HIGHMEM |
__GFP_ZERO);
if (!page->page_ptr) {
- pr_err("%d: binder_alloc_buf failed for page at %pK\n",
- alloc->pid, page_addr);
+ pr_err("%d: binder_alloc_buf failed for page at %lx\n",
+ alloc->pid, page_addr);
goto err_alloc_page_failed;
}
page->alloc = alloc;
INIT_LIST_HEAD(&page->lru);

- user_page_addr = (uintptr_t)page_addr;
- ret = vm_insert_page(vma, user_page_addr, page[0].page_ptr);
+ ret = vm_insert_page(vma, page_addr, page->page_ptr);
if (ret) {
pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n",
- alloc->pid, user_page_addr);
+ alloc->pid, page_addr);
goto err_vm_insert_page_failed;
}

@@ -378,9 +373,9 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_buffer *buffer;
size_t buffer_size;
struct rb_node *best_fit = NULL;
- void __user *has_page_addr;
- void __user *end_page_addr;
size_t size, data_offsets_size;
+ unsigned long has_page_addr;
+ unsigned long end_page_addr;
int ret;

/* Check binder_alloc is fully initialized */
@@ -479,15 +474,13 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
"%d: binder_alloc_buf size %zd got buffer %pK size %zd\n",
alloc->pid, size, buffer, buffer_size);

- has_page_addr = (void __user *)
- (((uintptr_t)buffer->user_data + buffer_size) & PAGE_MASK);
+ has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
WARN_ON(n && buffer_size != size);
- end_page_addr =
- (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size);
+ end_page_addr = PAGE_ALIGN(buffer->user_data + size);
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
- ret = binder_update_page_range(alloc, 1, (void __user *)
- PAGE_ALIGN((uintptr_t)buffer->user_data), end_page_addr);
+ ret = binder_update_page_range(alloc, 1, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
if (ret)
return ERR_PTR(ret);

@@ -500,7 +493,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
__func__, alloc->pid);
goto err_alloc_buf_struct_failed;
}
- new_buffer->user_data = (u8 __user *)buffer->user_data + size;
+ new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
new_buffer->free = 1;
binder_insert_free_buffer(alloc, new_buffer);
@@ -538,8 +531,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
return buffer;

err_alloc_buf_struct_failed:
- binder_update_page_range(alloc, 0, (void __user *)
- PAGE_ALIGN((uintptr_t)buffer->user_data),
+ binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
end_page_addr);
return ERR_PTR(-ENOMEM);
}
@@ -576,15 +568,14 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
return buffer;
}

-static void __user *buffer_start_page(struct binder_buffer *buffer)
+static unsigned long buffer_start_page(struct binder_buffer *buffer)
{
- return (void __user *)((uintptr_t)buffer->user_data & PAGE_MASK);
+ return buffer->user_data & PAGE_MASK;
}

-static void __user *prev_buffer_end_page(struct binder_buffer *buffer)
+static unsigned long prev_buffer_end_page(struct binder_buffer *buffer)
{
- return (void __user *)
- (((uintptr_t)(buffer->user_data) - 1) & PAGE_MASK);
+ return (buffer->user_data - 1) & PAGE_MASK;
}

static void binder_delete_free_buffer(struct binder_alloc *alloc,
@@ -599,7 +590,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (prev_buffer_end_page(prev) == buffer_start_page(buffer)) {
to_free = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK share page with %pK\n",
+ "%d: merge free, buffer %lx share page with %lx\n",
alloc->pid, buffer->user_data,
prev->user_data);
}
@@ -609,7 +600,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (buffer_start_page(next) == buffer_start_page(buffer)) {
to_free = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK share page with %pK\n",
+ "%d: merge free, buffer %lx share page with %lx\n",
alloc->pid,
buffer->user_data,
next->user_data);
@@ -618,17 +609,17 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,

if (PAGE_ALIGNED(buffer->user_data)) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer start %pK is page aligned\n",
+ "%d: merge free, buffer start %lx is page aligned\n",
alloc->pid, buffer->user_data);
to_free = false;
}

if (to_free) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: merge free, buffer %pK do not share page with %pK or %pK\n",
+ "%d: merge free, buffer %lx do not share page with %lx or %lx\n",
alloc->pid, buffer->user_data,
prev->user_data,
- next ? next->user_data : NULL);
+ next ? next->user_data : 0);
binder_update_page_range(alloc, 0, buffer_start_page(buffer),
buffer_start_page(buffer) + PAGE_SIZE);
}
@@ -665,10 +656,8 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
alloc->pid, size, alloc->free_async_space);
}

- binder_update_page_range(alloc, 0,
- (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data),
- (void __user *)(((uintptr_t)
- buffer->user_data + buffer_size) & PAGE_MASK));
+ binder_update_page_range(alloc, 0, PAGE_ALIGN(buffer->user_data),
+ (buffer->user_data + buffer_size) & PAGE_MASK);

rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
@@ -757,7 +746,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
SZ_4M);
mutex_unlock(&binder_alloc_mmap_lock);

- alloc->buffer = (void __user *)vma->vm_start;
+ alloc->buffer = vma->vm_start;

alloc->pages = kcalloc(alloc->buffer_size / PAGE_SIZE,
sizeof(alloc->pages[0]),
@@ -790,7 +779,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
kfree(alloc->pages);
alloc->pages = NULL;
err_alloc_pages_failed:
- alloc->buffer = NULL;
+ alloc->buffer = 0;
mutex_lock(&binder_alloc_mmap_lock);
alloc->buffer_size = 0;
err_already_mapped:
@@ -843,7 +832,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
int i;

for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
- void __user *page_addr;
+ unsigned long page_addr;
bool on_lru;

if (!alloc->pages[i].page_ptr)
@@ -853,7 +842,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
&alloc->pages[i].lru);
page_addr = alloc->buffer + i * PAGE_SIZE;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%s: %d: page %d at %pK %s\n",
+ "%s: %d: page %d at %lx %s\n",
__func__, alloc->pid, i, page_addr,
on_lru ? "on lru" : "active");
__free_page(alloc->pages[i].page_ptr);
@@ -873,7 +862,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
static void print_binder_buffer(struct seq_file *m, const char *prefix,
struct binder_buffer *buffer)
{
- seq_printf(m, "%s %d: %pK size %zd:%zd:%zd %s\n",
+ seq_printf(m, "%s %d: %lx size %zd:%zd:%zd %s\n",
prefix, buffer->debug_id, buffer->user_data,
buffer->data_size, buffer->offsets_size,
buffer->extra_buffers_size,
@@ -987,7 +976,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
struct binder_lru_page,
lru);
struct binder_alloc *alloc;
- uintptr_t page_addr;
+ unsigned long page_addr;
size_t index;
struct vm_area_struct *vma;

@@ -999,7 +988,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_page_already_freed;

index = page - alloc->pages;
- page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
+ page_addr = alloc->buffer + index * PAGE_SIZE;

mm = alloc->mm;
if (!mmget_not_zero(mm))
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 82380febdd85..cb19677a5c15 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -56,7 +56,7 @@ struct binder_buffer {
size_t data_size;
size_t offsets_size;
size_t extra_buffers_size;
- void __user *user_data;
+ unsigned long user_data;
int pid;
};

@@ -101,7 +101,7 @@ struct binder_alloc {
struct mutex mutex;
struct vm_area_struct *vma;
struct mm_struct *mm;
- void __user *buffer;
+ unsigned long buffer;
struct list_head buffers;
struct rb_root free_buffers;
struct rb_root allocated_buffers;
@@ -133,7 +133,7 @@ void binder_alloc_shrinker_exit(void);
void binder_alloc_vma_close(struct binder_alloc *alloc);
struct binder_buffer *
binder_alloc_prepare_to_free(struct binder_alloc *alloc,
- uintptr_t user_ptr);
+ unsigned long user_ptr);
void binder_alloc_free_buf(struct binder_alloc *alloc,
struct binder_buffer *buffer);
int binder_alloc_mmap_handler(struct binder_alloc *alloc,
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index c2b323bc3b3a..341c73b4a807 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -93,11 +93,11 @@ static bool check_buffer_pages_allocated(struct binder_alloc *alloc,
struct binder_buffer *buffer,
size_t size)
{
- void __user *page_addr;
- void __user *end;
+ unsigned long page_addr;
+ unsigned long end;
int page_index;

- end = (void __user *)PAGE_ALIGN((uintptr_t)buffer->user_data + size);
+ end = PAGE_ALIGN(buffer->user_data + size);
page_addr = buffer->user_data;
for (; page_addr < end; page_addr += PAGE_SIZE) {
page_index = (page_addr - alloc->buffer) / PAGE_SIZE;
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index 8cc07e6a4273..fe38c6fc65d0 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -317,7 +317,7 @@ DEFINE_EVENT(binder_buffer_class, binder_transaction_update_buffer_release,

TRACE_EVENT(binder_update_page_range,
TP_PROTO(struct binder_alloc *alloc, bool allocate,
- void __user *start, void __user *end),
+ unsigned long start, unsigned long end),
TP_ARGS(alloc, allocate, start, end),
TP_STRUCT__entry(
__field(int, proc)
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:39

by Carlos Llamas

Subject: [PATCH 02/21] binder: fix use-after-free in shrinker's callback

The mmap read lock is used during the shrinker's callback, which means
that using the alloc->vma pointer isn't safe, as it can race with
munmap(). As of commit dd2283f2605e ("mm: mmap: zap pages with read
mmap_sem in munmap"), the mmap lock is downgraded after the vma has been
isolated.

I was able to reproduce this issue by manually adding some delays and
triggering page reclaiming through the shrinker's debug sysfs. The
following KASAN report confirms the UAF:

==================================================================
BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
Read of size 8 at addr ffff356ed50e50f0 by task bash/478

CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
Hardware name: linux,dummy-virt (DT)
Call trace:
zap_page_range_single+0x470/0x4b8
binder_alloc_free_page+0x608/0xadc
__list_lru_walk_one+0x130/0x3b0
list_lru_walk_node+0xc4/0x22c
binder_shrink_scan+0x108/0x1dc
shrinker_debugfs_scan_write+0x2b4/0x500
full_proxy_write+0xd4/0x140
vfs_write+0x1ac/0x758
ksys_write+0xf0/0x1dc
__arm64_sys_write+0x6c/0x9c

Allocated by task 492:
kmem_cache_alloc+0x130/0x368
vm_area_alloc+0x2c/0x190
mmap_region+0x258/0x18bc
do_mmap+0x694/0xa60
vm_mmap_pgoff+0x170/0x29c
ksys_mmap_pgoff+0x290/0x3a0
__arm64_sys_mmap+0xcc/0x144

Freed by task 491:
kmem_cache_free+0x17c/0x3c8
vm_area_free_rcu_cb+0x74/0x98
rcu_core+0xa38/0x26d4
rcu_core_si+0x10/0x1c
__do_softirq+0x2fc/0xd24

Last potentially related work creation:
__call_rcu_common.constprop.0+0x6c/0xba0
call_rcu+0x10/0x1c
vm_area_free+0x18/0x24
remove_vma+0xe4/0x118
do_vmi_align_munmap.isra.0+0x718/0xb5c
do_vmi_munmap+0xdc/0x1fc
__vm_munmap+0x10c/0x278
__arm64_sys_munmap+0x58/0x7c

Fix this issue by performing a vma_lookup() instead, which will fail to
find the vma that was isolated before the mmap lock downgrade. Note that
this option performs better than upgrading to a mmap write lock, which
would increase contention. Plus, mmap_write_trylock() has recently been
removed anyway.

Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: [email protected]
Cc: Liam Howlett <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e3db8297095a..c4d60d81221b 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- vma = binder_alloc_get_vma(alloc);
+ vma = vma_lookup(mm, page_addr);
+ if (vma && vma != binder_alloc_get_vma(alloc))
+ goto err_invalid_vma;

list_lru_isolate(lru, item);
spin_unlock(lock);
@@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;

+err_invalid_vma:
+ mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:39

by Carlos Llamas

Subject: [PATCH 01/21] binder: use EPOLLERR from eventpoll.h

Use EPOLLERR instead of POLLERR to make sure it is cast to the correct
__poll_t type. This fixes the following sparse issue:

drivers/android/binder.c:5030:24: warning: incorrect type in return expression (different base types)
drivers/android/binder.c:5030:24: expected restricted __poll_t
drivers/android/binder.c:5030:24: got int

Fixes: f88982679f54 ("binder: check for binder_thread allocation failure in binder_poll()")
Cc: [email protected]
Cc: Eric Biggers <[email protected]>
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 92128aae2d06..71a40a4c546f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -5030,7 +5030,7 @@ static __poll_t binder_poll(struct file *filp,

thread = binder_get_thread(proc);
if (!thread)
- return POLLERR;
+ return EPOLLERR;

binder_inner_proc_lock(thread->proc);
thread->looper |= BINDER_LOOPER_STATE_POLL;
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:02:49

by Carlos Llamas

Subject: [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked()

Fix a minor misspelling of the function name in the comment.

No functional changes in this patch.

Cc: [email protected]
Fixes: 0f966cba95c7 ("binder: add flag to clear buffer on txn complete")
Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 9b28d0f9666d..cd720bb5c9ce 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -707,7 +707,7 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
/*
* We could eliminate the call to binder_alloc_clear_buf()
* from binder_alloc_deferred_release() by moving this to
- * binder_alloc_free_buf_locked(). However, that could
+ * binder_free_buf_locked(). However, that could
* increase contention for the alloc mutex if clear_on_free
* is used frequently for large buffers. The mutex is not
* needed for correctness here.
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:02

by Carlos Llamas

Subject: [PATCH 17/21] binder: malloc new_buffer outside of locks

Preallocate new_buffer before acquiring the alloc->mutex and hand it
down to binder_alloc_new_buf_locked(). The new buffer will be used in
the vast majority of requests (measured at 98.2% in field data);
otherwise it is discarded. This change is required in preparation for
transitioning alloc->mutex into a spinlock in subsequent commits.
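
A condensed view of the resulting flow, taken from the hunks below: the
struct is allocated before taking the lock, consumed inside
binder_alloc_new_buf_locked() only when the best-fit buffer must be
split, and kfree()d at the end otherwise (the pointer is cleared when
consumed, and kfree(NULL) is a no-op):

	/* binder_alloc_new_buf(): preallocate outside the lock */
	next = kzalloc(sizeof(*next), GFP_KERNEL);
	if (!next)
		return ERR_PTR(-ENOMEM);

	mutex_lock(&alloc->mutex);
	buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
	mutex_unlock(&alloc->mutex);

	/* binder_alloc_new_buf_locked(): the split case consumes new_buffer */
	if (buffer_size != size) {
		new_buffer->user_data = buffer->user_data + size;
		list_add(&new_buffer->entry, &buffer->entry);
		new_buffer->free = 1;
		binder_insert_free_buffer(alloc, new_buffer);
		new_buffer = NULL;	/* consumed, don't free below */
	}

	kfree(new_buffer);		/* no-op when new_buffer was consumed */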

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 38 +++++++++++++++++-----------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 56936430954f..da6c62567ffb 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -400,17 +400,19 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc)
return false;
}

+/* Callers preallocate @new_buffer, it is freed by this function if unused */
static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
+ struct binder_buffer *new_buffer,
size_t size,
int is_async)
{
struct rb_node *n = alloc->free_buffers.rb_node;
- struct binder_buffer *buffer;
- size_t buffer_size;
struct rb_node *best_fit = NULL;
+ struct binder_buffer *buffer;
unsigned long has_page_addr;
unsigned long end_page_addr;
+ size_t buffer_size;
int ret;

if (is_async &&
@@ -461,22 +463,17 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
end_page_addr = has_page_addr;
ret = binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
end_page_addr);
- if (ret)
- return ERR_PTR(ret);
+ if (ret) {
+ buffer = ERR_PTR(ret);
+ goto out;
+ }

if (buffer_size != size) {
- struct binder_buffer *new_buffer;
-
- new_buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
- if (!new_buffer) {
- pr_err("%s: %d failed to alloc new buffer struct\n",
- __func__, alloc->pid);
- goto err_alloc_buf_struct_failed;
- }
new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
new_buffer->free = 1;
binder_insert_free_buffer(alloc, new_buffer);
+ new_buffer = NULL;
}

rb_erase(best_fit, &alloc->free_buffers);
@@ -497,12 +494,10 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->oneway_spam_suspect = true;
}

+out:
+ /* discard possibly unused new_buffer */
+ kfree(new_buffer);
return buffer;
-
-err_alloc_buf_struct_failed:
- binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
- return ERR_PTR(-ENOMEM);
}

/**
@@ -526,7 +521,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t extra_buffers_size,
int is_async)
{
- struct binder_buffer *buffer;
+ struct binder_buffer *buffer, *next;
size_t size;

/* Check binder_alloc is fully initialized */
@@ -551,11 +546,16 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
return ERR_PTR(-EINVAL);
}

+ /* preallocate the next buffer */
+ next = kzalloc(sizeof(*next), GFP_KERNEL);
+ if (!next)
+ return ERR_PTR(-ENOMEM);
+
/* Pad 0-size buffers so they get assigned unique addresses */
size = max(size, sizeof(void *));

mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, size, is_async);
+ buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
mutex_unlock(&alloc->mutex);

if (IS_ERR(buffer))
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:03

by Carlos Llamas

Subject: [PATCH 11/21] binder: remove pid param in binder_alloc_new_buf()

Binder attributes the buffer allocation to current->tgid every time.
There is no need to pass this as a parameter, so drop it.

Also add a few touchups to follow the coding guidelines. No functional
changes are introduced in this patch.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder.c | 2 +-
drivers/android/binder_alloc.c | 18 ++++++++----------
drivers/android/binder_alloc.h | 7 ++-----
drivers/android/binder_alloc_selftest.c | 2 +-
4 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 437d1097b118..45674af6310f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3225,7 +3225,7 @@ static void binder_transaction(struct binder_proc *proc,

t->buffer = binder_alloc_new_buf(&target_proc->alloc, tr->data_size,
tr->offsets_size, extra_buffers_size,
- !reply && (t->flags & TF_ONE_WAY), current->tgid);
+ !reply && (t->flags & TF_ONE_WAY));
if (IS_ERR(t->buffer)) {
char *s;

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index ed1f52f98b0d..eacc5a3ce538 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -323,7 +323,7 @@ static inline struct vm_area_struct *binder_alloc_get_vma(
return smp_load_acquire(&alloc->vma);
}

-static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
+static bool debug_low_async_space_locked(struct binder_alloc *alloc)
{
/*
* Find the amount and size of buffers allocated by the current caller;
@@ -332,10 +332,11 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
* and at some point we'll catch them in the act. This is more efficient
* than keeping a map per pid.
*/
- struct rb_node *n;
struct binder_buffer *buffer;
size_t total_alloc_size = 0;
+ int pid = current->tgid;
size_t num_buffers = 0;
+ struct rb_node *n;

for (n = rb_first(&alloc->allocated_buffers); n != NULL;
n = rb_next(n)) {
@@ -369,8 +370,7 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)
static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
size_t size,
- int is_async,
- int pid)
+ int is_async)
{
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -494,7 +494,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* of async space left (which is less than 10% of total
* buffer size).
*/
- buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc, pid);
+ buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);
} else {
alloc->oneway_spam_detected = false;
}
@@ -515,7 +515,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
* @offsets_size: user specified buffer offset
* @extra_buffers_size: size of extra space for meta-data (eg, security context)
* @is_async: buffer for async transaction
- * @pid: pid to attribute allocation to (used for debugging)
*
* Allocate a new buffer given the requested sizes. Returns
* the kernel version of the buffer pointer. The size allocated
@@ -528,8 +527,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
- int is_async,
- int pid)
+ int is_async)
{
struct binder_buffer *buffer;
size_t size;
@@ -560,7 +558,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size = max(size, sizeof(void *));

mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, size, is_async, pid);
+ buffer = binder_alloc_new_buf_locked(alloc, size, is_async);
mutex_unlock(&alloc->mutex);

if (IS_ERR(buffer))
@@ -570,7 +568,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
buffer->offsets_size = offsets_size;
buffer->async_transaction = is_async;
buffer->extra_buffers_size = extra_buffers_size;
- buffer->pid = pid;
+ buffer->pid = current->tgid;

out:
return buffer;
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index cb19677a5c15..bbc16bc6d5ac 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -49,15 +49,13 @@ struct binder_buffer {
unsigned async_transaction:1;
unsigned oneway_spam_suspect:1;
unsigned debug_id:27;
-
struct binder_transaction *transaction;
-
struct binder_node *target_node;
size_t data_size;
size_t offsets_size;
size_t extra_buffers_size;
unsigned long user_data;
- int pid;
+ int pid;
};

/**
@@ -125,8 +123,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
size_t data_size,
size_t offsets_size,
size_t extra_buffers_size,
- int is_async,
- int pid);
+ int is_async);
void binder_alloc_init(struct binder_alloc *alloc);
int binder_alloc_shrinker_init(void);
void binder_alloc_shrinker_exit(void);
diff --git a/drivers/android/binder_alloc_selftest.c b/drivers/android/binder_alloc_selftest.c
index 341c73b4a807..ed753747e54c 100644
--- a/drivers/android/binder_alloc_selftest.c
+++ b/drivers/android/binder_alloc_selftest.c
@@ -119,7 +119,7 @@ static void binder_selftest_alloc_buf(struct binder_alloc *alloc,
int i;

for (i = 0; i < BUFFER_NUM; i++) {
- buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0, 0);
+ buffers[i] = binder_alloc_new_buf(alloc, sizes[i], 0, 0, 0);
if (IS_ERR(buffers[i]) ||
!check_buffer_pages_allocated(alloc, buffers[i],
sizes[i])) {
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:04

by Carlos Llamas

Subject: [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf()

Extract non-critical sections from binder_alloc_new_buf_locked() that
don't require holding the alloc->mutex. While we are here, consolidate
the multiple checks for size overflow into a single statement.

Also add a few touchups to follow the coding guidelines.
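
The consolidated size computation and overflow check, as it appears in
the hunk below (condensed):

	size = ALIGN(data_size, sizeof(void *)) +
	       ALIGN(offsets_size, sizeof(void *)) +
	       ALIGN(extra_buffers_size, sizeof(void *));

	/* single overflow check replacing the two previous ones */
	if (size < data_size ||
	    size < offsets_size ||
	    size < extra_buffers_size)
		return ERR_PTR(-EINVAL);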

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 85 ++++++++++++++++++----------------
1 file changed, 44 insertions(+), 41 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 27c7163761c4..ed1f52f98b0d 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -368,9 +368,7 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid)

static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_alloc *alloc,
- size_t data_size,
- size_t offsets_size,
- size_t extra_buffers_size,
+ size_t size,
int is_async,
int pid)
{
@@ -378,39 +376,10 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
struct binder_buffer *buffer;
size_t buffer_size;
struct rb_node *best_fit = NULL;
- size_t size, data_offsets_size;
unsigned long has_page_addr;
unsigned long end_page_addr;
int ret;

- /* Check binder_alloc is fully initialized */
- if (!binder_alloc_get_vma(alloc)) {
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "%d: binder_alloc_buf, no vma\n",
- alloc->pid);
- return ERR_PTR(-ESRCH);
- }
-
- data_offsets_size = ALIGN(data_size, sizeof(void *)) +
- ALIGN(offsets_size, sizeof(void *));
-
- if (data_offsets_size < data_size || data_offsets_size < offsets_size) {
- binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: got transaction with invalid size %zd-%zd\n",
- alloc->pid, data_size, offsets_size);
- return ERR_PTR(-EINVAL);
- }
- size = data_offsets_size + ALIGN(extra_buffers_size, sizeof(void *));
- if (size < data_offsets_size || size < extra_buffers_size) {
- binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: got transaction with invalid extra_buffers_size %zd\n",
- alloc->pid, extra_buffers_size);
- return ERR_PTR(-EINVAL);
- }
-
- /* Pad 0-size buffers so they get assigned unique addresses */
- size = max(size, sizeof(void *));
-
if (is_async &&
alloc->free_async_space < size + sizeof(struct binder_buffer)) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
@@ -427,13 +396,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
if (size < buffer_size) {
best_fit = n;
n = n->rb_left;
- } else if (size > buffer_size)
+ } else if (size > buffer_size) {
n = n->rb_right;
- else {
+ } else {
best_fit = n;
break;
}
}
+
if (best_fit == NULL) {
size_t allocated_buffers = 0;
size_t largest_alloc_size = 0;
@@ -511,11 +481,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: binder_alloc_buf size %zd got %pK\n",
alloc->pid, size, buffer);
- buffer->data_size = data_size;
- buffer->offsets_size = offsets_size;
- buffer->async_transaction = is_async;
- buffer->extra_buffers_size = extra_buffers_size;
- buffer->pid = pid;
+
buffer->oneway_spam_suspect = false;
if (is_async) {
alloc->free_async_space -= size + sizeof(struct binder_buffer);
@@ -533,6 +499,7 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
alloc->oneway_spam_detected = false;
}
}
+
return buffer;

err_alloc_buf_struct_failed:
@@ -565,11 +532,47 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
int pid)
{
struct binder_buffer *buffer;
+ size_t size;
+
+ /* Check binder_alloc is fully initialized */
+ if (!binder_alloc_get_vma(alloc)) {
+ binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
+ "%d: binder_alloc_buf, no vma\n",
+ alloc->pid);
+ return ERR_PTR(-ESRCH);
+ }
+
+ size = ALIGN(data_size, sizeof(void *)) +
+ ALIGN(offsets_size, sizeof(void *)) +
+ ALIGN(extra_buffers_size, sizeof(void *));
+
+ if (size < data_size ||
+ size < offsets_size ||
+ size < extra_buffers_size) {
+ binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
+ "%d: got transaction with invalid size %zd-%zd-%zd\n",
+ alloc->pid, data_size, offsets_size,
+ extra_buffers_size);
+ return ERR_PTR(-EINVAL);
+ }
+
+ /* Pad 0-size buffers so they get assigned unique addresses */
+ size = max(size, sizeof(void *));

mutex_lock(&alloc->mutex);
- buffer = binder_alloc_new_buf_locked(alloc, data_size, offsets_size,
- extra_buffers_size, is_async, pid);
+ buffer = binder_alloc_new_buf_locked(alloc, size, is_async, pid);
mutex_unlock(&alloc->mutex);
+
+ if (IS_ERR(buffer))
+ goto out;
+
+ buffer->data_size = data_size;
+ buffer->offsets_size = offsets_size;
+ buffer->async_transaction = is_async;
+ buffer->extra_buffers_size = extra_buffers_size;
+ buffer->pid = pid;
+
+out:
return buffer;
}

--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:09

by Carlos Llamas

Subject: [PATCH 13/21] binder: relocate low space calculation

Move the low async space calculation to debug_low_async_space_locked().
This logic not only fits better there but also offloads some of the many
tasks currently done in binder_alloc_new_buf_locked().

No functional change in this patch.
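
For context, binder_alloc_mmap_handler() sets free_async_space to half
of the buffer size, so the buffer_size / 10 threshold in the hunk below
corresponds to 20% of the async space. With the usual 4 MiB mapping cap,
detection only starts once less than ~410 KiB of the 2 MiB async space
remains.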

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 3fe5c734c3df..cc627c106a01 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -375,6 +375,15 @@ static bool debug_low_async_space_locked(struct binder_alloc *alloc)
size_t num_buffers = 0;
struct rb_node *n;

+ /*
+ * Only start detecting spammers once we have less than 20% of async
+ * space left (which is less than 10% of total buffer size).
+ */
+ if (alloc->free_async_space >= alloc->buffer_size / 10) {
+ alloc->oneway_spam_detected = false;
+ return false;
+ }
+
for (n = rb_first(&alloc->allocated_buffers); n != NULL;
n = rb_next(n)) {
buffer = rb_entry(n, struct binder_buffer, rb_node);
@@ -497,16 +506,8 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC_ASYNC,
"%d: binder_alloc_buf size %zd async free %zd\n",
alloc->pid, size, alloc->free_async_space);
- if (alloc->free_async_space < alloc->buffer_size / 10) {
- /*
- * Start detecting spammers once we have less than 20%
- * of async space left (which is less than 10% of total
- * buffer size).
- */
- buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);
- } else {
- alloc->oneway_spam_detected = false;
- }
+ if (debug_low_async_space_locked(alloc))
+ buffer->oneway_spam_suspect = true;
}

return buffer;
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:22

by Carlos Llamas

Subject: [PATCH 16/21] binder: refactor page range allocation

Instead of looping through the page range twice, first to determine
whether the mmap lock is required, simply take the lock per page as
needed. Split all of this logic out into a separate
binder_get_user_page_remote() function.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 107 +++++++++++++++------------------
1 file changed, 47 insertions(+), 60 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 4821a29799c8..56936430954f 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -203,14 +203,51 @@ static void binder_free_page_range(struct binder_alloc *alloc,
}
}

+static int binder_get_user_page_remote(struct binder_alloc *alloc,
+ struct binder_lru_page *lru_page,
+ unsigned long addr)
+{
+ struct page *page;
+ int ret = 0;
+
+ if (!mmget_not_zero(alloc->mm))
+ return -ESRCH;
+
+ mmap_write_lock(alloc->mm);
+ if (!alloc->vma) {
+ pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
+ ret = -ESRCH;
+ goto out;
+ }
+
+ page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
+ if (!page) {
+ pr_err("%d: failed to allocate page\n", alloc->pid);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = vm_insert_page(alloc->vma, addr, page);
+ if (ret) {
+ pr_err("%d: %s failed to insert page at %lx with %d\n",
+ alloc->pid, __func__, addr, ret);
+ __free_page(page);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ lru_page->page_ptr = page;
+out:
+ mmap_write_unlock(alloc->mm);
+ mmput_async(alloc->mm);
+ return ret;
+}
+
static int binder_allocate_page_range(struct binder_alloc *alloc,
unsigned long start, unsigned long end)
{
- struct vm_area_struct *vma = NULL;
struct binder_lru_page *page;
- struct mm_struct *mm = NULL;
unsigned long page_addr;
- bool need_mm = false;

binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: allocate pages %lx-%lx\n",
@@ -222,32 +259,9 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
trace_binder_update_page_range(alloc, true, start, end);

for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
- page = &alloc->pages[(page_addr - alloc->buffer) / PAGE_SIZE];
- if (!page->page_ptr) {
- need_mm = true;
- break;
- }
- }
-
- if (need_mm && mmget_not_zero(alloc->mm))
- mm = alloc->mm;
-
- if (mm) {
- mmap_write_lock(mm);
- vma = alloc->vma;
- }
-
- if (!vma && need_mm) {
- binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
- "%d: binder_alloc_buf failed to map pages in userspace, no vma\n",
- alloc->pid);
- goto err_no_vma;
- }
-
- for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
- int ret;
+ unsigned long index;
bool on_lru;
- size_t index;
+ int ret;

index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];
@@ -262,26 +276,15 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
continue;
}

- if (WARN_ON(!vma))
- goto err_page_ptr_cleared;
-
trace_binder_alloc_page_start(alloc, index);
- page->page_ptr = alloc_page(GFP_KERNEL |
- __GFP_HIGHMEM |
- __GFP_ZERO);
- if (!page->page_ptr) {
- pr_err("%d: binder_alloc_buf failed for page at %lx\n",
- alloc->pid, page_addr);
- goto err_alloc_page_failed;
- }
+
page->alloc = alloc;
INIT_LIST_HEAD(&page->lru);

- ret = vm_insert_page(vma, page_addr, page->page_ptr);
+ ret = binder_get_user_page_remote(alloc, page, page_addr);
if (ret) {
- pr_err("%d: binder_alloc_buf failed to map page at %lx in userspace\n",
- alloc->pid, page_addr);
- goto err_vm_insert_page_failed;
+ binder_free_page_range(alloc, start, page_addr);
+ return ret;
}

if (index + 1 > alloc->pages_high)
@@ -289,24 +292,8 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,

trace_binder_alloc_page_end(alloc, index);
}
- if (mm) {
- mmap_write_unlock(mm);
- mmput_async(mm);
- }
- return 0;

-err_vm_insert_page_failed:
- __free_page(page->page_ptr);
- page->page_ptr = NULL;
-err_alloc_page_failed:
-err_page_ptr_cleared:
- binder_free_page_range(alloc, start, page_addr);
-err_no_vma:
- if (mm) {
- mmap_write_unlock(mm);
- mmput_async(mm);
- }
- return vma ? -ENOMEM : -ESRCH;
+ return 0;
}

static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:26

by Carlos Llamas

Subject: [PATCH 19/21] binder: perform page allocation outside of locks

Split the insertion of pages out into a separate binder_get_page_range()
routine so it can be done outside of the alloc->mutex. Since this is no
longer serialized with other requests, we need to make sure we look at
the full range of pages for this buffer, including those shared with
neighboring buffers. The insertion of pages into the vma is still
serialized with the mmap write lock.

Besides avoiding unnecessary nested locking, this helps in preparation
for switching the alloc->mutex into a spinlock_t in subsequent patches.
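
The coordination between the unlocked lookup and the remote insertion
relies on an acquire/release pair, condensed here from the hunks below:

	/* binder_get_user_page_remote(): publish only after vm_insert_page() */
	smp_store_release(&lru_page->page_ptr, page);

	/* binder_get_page_range(): skip pages already published */
	if (smp_load_acquire(&page->page_ptr))
		continue;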

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 85 ++++++++++++++++++++++------------
1 file changed, 56 insertions(+), 29 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 7e0af1786b20..e739be7f2dd4 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -194,6 +194,9 @@ static void binder_free_page_range(struct binder_alloc *alloc,
index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];

+ if (!page->page_ptr)
+ continue;
+
trace_binder_free_lru_start(alloc, index);

ret = list_lru_add(&binder_alloc_lru, &page->lru);
@@ -214,6 +217,9 @@ static int binder_get_user_page_remote(struct binder_alloc *alloc,
return -ESRCH;

mmap_write_lock(alloc->mm);
+ if (lru_page->page_ptr)
+ goto out;
+
if (!alloc->vma) {
pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
ret = -ESRCH;
@@ -236,32 +242,64 @@ static int binder_get_user_page_remote(struct binder_alloc *alloc,
goto out;
}

- lru_page->page_ptr = page;
+ /* mark page insertion complete and safe to acquire */
+ smp_store_release(&lru_page->page_ptr, page);
out:
mmap_write_unlock(alloc->mm);
mmput_async(alloc->mm);
return ret;
}

-static int binder_allocate_page_range(struct binder_alloc *alloc,
- unsigned long start, unsigned long end)
+/* The range of pages should include those shared with other buffers */
+static int binder_get_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
{
struct binder_lru_page *page;
unsigned long page_addr;

binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
- "%d: allocate pages %lx-%lx\n",
+ "%d: get pages %lx-%lx\n",
alloc->pid, start, end);

- if (end <= start)
- return 0;
+ for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
+ unsigned long index;
+ int ret;
+
+ index = (page_addr - alloc->buffer) / PAGE_SIZE;
+ page = &alloc->pages[index];
+
+ /* check if page insertion is marked complete by release */
+ if (smp_load_acquire(&page->page_ptr))
+ continue;
+
+ trace_binder_alloc_page_start(alloc, index);
+
+ ret = binder_get_user_page_remote(alloc, page, page_addr);
+ if (ret)
+ return ret;
+
+ trace_binder_alloc_page_end(alloc, index);
+ }
+
+ return 0;
+}
+
+/* The range of pages should exclude those shared with other buffers */
+static void binder_allocate_page_range(struct binder_alloc *alloc,
+ unsigned long start, unsigned long end)
+{
+ struct binder_lru_page *page;
+ unsigned long page_addr;
+
+ binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
+ "%d: allocate pages %lx-%lx\n",
+ alloc->pid, start, end);

trace_binder_update_page_range(alloc, true, start, end);

for (page_addr = start; page_addr < end; page_addr += PAGE_SIZE) {
unsigned long index;
bool on_lru;
- int ret;

index = (page_addr - alloc->buffer) / PAGE_SIZE;
page = &alloc->pages[index];
@@ -276,21 +314,9 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,
continue;
}

- trace_binder_alloc_page_start(alloc, index);
-
- ret = binder_get_user_page_remote(alloc, page, page_addr);
- if (ret) {
- binder_free_page_range(alloc, start, page_addr);
- return ret;
- }
-
if (index + 1 > alloc->pages_high)
alloc->pages_high = index + 1;
-
- trace_binder_alloc_page_end(alloc, index);
}
-
- return 0;
}

static inline void binder_alloc_set_vma(struct binder_alloc *alloc,
@@ -410,7 +436,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
unsigned long has_page_addr;
unsigned long end_page_addr;
size_t buffer_size;
- int ret;

if (is_async &&
alloc->free_async_space < size + sizeof(struct binder_buffer)) {
@@ -453,18 +478,14 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
"%d: binder_alloc_buf size %zd got buffer %pK size %zd\n",
alloc->pid, size, buffer, buffer_size);

- has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
WARN_ON(n && buffer_size != size);
+
+ has_page_addr = (buffer->user_data + buffer_size) & PAGE_MASK;
end_page_addr = PAGE_ALIGN(buffer->user_data + size);
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
- ret = binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- end_page_addr);
- if (ret) {
- buffer = ERR_PTR(ret);
- goto out;
- }
-
+ binder_allocate_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ end_page_addr);
if (buffer_size != size) {
new_buffer->user_data = buffer->user_data + size;
list_add(&new_buffer->entry, &buffer->entry);
@@ -491,7 +512,6 @@ static struct binder_buffer *binder_alloc_new_buf_locked(
buffer->oneway_spam_suspect = true;
}

-out:
/* discard possibly unused new_buffer */
kfree(new_buffer);
return buffer;
@@ -520,6 +540,7 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
{
struct binder_buffer *buffer, *next;
size_t size;
+ int ret;

/* Check binder_alloc is fully initialized */
if (!binder_alloc_get_vma(alloc)) {
@@ -564,6 +585,12 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
buffer->extra_buffers_size = extra_buffers_size;
buffer->pid = current->tgid;

+ ret = binder_get_page_range(alloc, buffer->user_data & PAGE_MASK,
+ PAGE_ALIGN(buffer->user_data + size));
+ if (ret) {
+ binder_alloc_free_buf(alloc, buffer);
+ buffer = ERR_PTR(ret);
+ }
out:
return buffer;
}
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:29

by Carlos Llamas

Subject: [PATCH 20/21] binder: reverse locking order in shrinker callback

The locking order currently requires the alloc->mutex to be acquired
first, followed by the mmap lock. However, the alloc->mutex is converted
into a spinlock in subsequent commits, so the order needs to be reversed
to avoid nesting the sleeping mmap lock under the spinlock.

The shrinker's callback binder_alloc_free_page() is the only place that
needs to be reordered since other functions have been refactored and no
longer nest these locks.

Some minor cosmetic changes are also included in this patch.
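
Condensed from the hunk below, the new acquisition order in
binder_alloc_free_page() takes the sleeping locks before the lock that
later becomes a spinlock:

	if (!mmget_not_zero(mm))
		goto err_mmget;
	if (!mmap_read_trylock(mm))		/* sleeping mmap lock first */
		goto err_mmap_read_lock_failed;
	if (!mutex_trylock(&alloc->mutex))	/* becomes a spinlock later */
		goto err_get_alloc_mutex_failed;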

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 44 ++++++++++++++++------------------
1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index e739be7f2dd4..0ba9f524e0ff 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1065,35 +1065,38 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
void *cb_arg)
__must_hold(lock)
{
- struct mm_struct *mm = NULL;
- struct binder_lru_page *page = container_of(item,
- struct binder_lru_page,
- lru);
- struct binder_alloc *alloc;
+ struct binder_lru_page *page = container_of(item, typeof(*page), lru);
+ struct binder_alloc *alloc = page->alloc;
+ struct mm_struct *mm = alloc->mm;
+ struct vm_area_struct *vma;
unsigned long page_addr;
size_t index;
- struct vm_area_struct *vma;

- alloc = page->alloc;
+ if (!mmget_not_zero(mm))
+ goto err_mmget;
+ if (!mmap_read_trylock(mm))
+ goto err_mmap_read_lock_failed;
if (!mutex_trylock(&alloc->mutex))
goto err_get_alloc_mutex_failed;
-
if (!page->page_ptr)
goto err_page_already_freed;

index = page - alloc->pages;
page_addr = alloc->buffer + index * PAGE_SIZE;

- mm = alloc->mm;
- if (!mmget_not_zero(mm))
- goto err_mmget;
- if (!mmap_read_trylock(mm))
- goto err_mmap_read_lock_failed;
vma = vma_lookup(mm, page_addr);
if (vma && vma != binder_alloc_get_vma(alloc))
goto err_invalid_vma;

+ trace_binder_unmap_kernel_start(alloc, index);
+
+ __free_page(page->page_ptr);
+ page->page_ptr = NULL;
+
+ trace_binder_unmap_kernel_end(alloc, index);
+
list_lru_isolate(lru, item);
+ mutex_unlock(&alloc->mutex);
spin_unlock(lock);

if (vma) {
@@ -1103,28 +1106,21 @@ enum lru_status binder_alloc_free_page(struct list_head *item,

trace_binder_unmap_user_end(alloc, index);
}
+
mmap_read_unlock(mm);
mmput_async(mm);

- trace_binder_unmap_kernel_start(alloc, index);
-
- __free_page(page->page_ptr);
- page->page_ptr = NULL;
-
- trace_binder_unmap_kernel_end(alloc, index);
-
spin_lock(lock);
- mutex_unlock(&alloc->mutex);
return LRU_REMOVED_RETRY;

err_invalid_vma:
+err_page_already_freed:
+ mutex_unlock(&alloc->mutex);
+err_get_alloc_mutex_failed:
mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
err_mmget:
-err_page_already_freed:
- mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
return LRU_SKIP;
}

--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:32

by Carlos Llamas

[permalink] [raw]
Subject: [PATCH 18/21] binder: initialize lru pages in mmap callback

Rather than repeatedly initializing some of the binder_lru_page members
during binder_alloc_new_buf(), perform this initialization just once in
binder_alloc_mmap_handler(), after the pages have been created.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index da6c62567ffb..7e0af1786b20 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -278,9 +278,6 @@ static int binder_allocate_page_range(struct binder_alloc *alloc,

trace_binder_alloc_page_start(alloc, index);

- page->alloc = alloc;
- INIT_LIST_HEAD(&page->lru);
-
ret = binder_get_user_page_remote(alloc, page, page_addr);
if (ret) {
binder_free_page_range(alloc, start, page_addr);
@@ -791,9 +788,9 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
int binder_alloc_mmap_handler(struct binder_alloc *alloc,
struct vm_area_struct *vma)
{
- int ret;
- const char *failure_string;
struct binder_buffer *buffer;
+ const char *failure_string;
+ int ret, i;

if (unlikely(vma->vm_mm != alloc->mm)) {
ret = -EINVAL;
@@ -822,6 +819,11 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
goto err_alloc_pages_failed;
}

+ for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) {
+ alloc->pages[i].alloc = alloc;
+ INIT_LIST_HEAD(&alloc->pages[i].lru);
+ }
+
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer) {
ret = -ENOMEM;
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:31

by Carlos Llamas

[permalink] [raw]
Subject: [PATCH 14/21] binder: do not add pages to LRU in release path

In binder_alloc_deferred_release() pages are added to the LRU list via
binder_free_buf_locked(). However, this is pointless because these pages
are kfree'd immediately afterwards. Add an option to skip the LRU list.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index cc627c106a01..d2a38dee12db 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -595,16 +595,16 @@ static unsigned long prev_buffer_end_page(struct binder_buffer *buffer)
}

static void binder_delete_free_buffer(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
+ struct binder_buffer *buffer,
+ bool free_pages)
{
struct binder_buffer *prev, *next = NULL;
- bool to_free = true;

BUG_ON(alloc->buffers.next == &buffer->entry);
prev = binder_buffer_prev(buffer);
BUG_ON(!prev->free);
if (prev_buffer_end_page(prev) == buffer_start_page(buffer)) {
- to_free = false;
+ free_pages = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx share page with %lx\n",
alloc->pid, buffer->user_data,
@@ -614,7 +614,7 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
if (!list_is_last(&buffer->entry, &alloc->buffers)) {
next = binder_buffer_next(buffer);
if (buffer_start_page(next) == buffer_start_page(buffer)) {
- to_free = false;
+ free_pages = false;
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx share page with %lx\n",
alloc->pid,
@@ -627,10 +627,10 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer start %lx is page aligned\n",
alloc->pid, buffer->user_data);
- to_free = false;
+ free_pages = false;
}

- if (to_free) {
+ if (free_pages) {
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
"%d: merge free, buffer %lx do not share page with %lx or %lx\n",
alloc->pid, buffer->user_data,
@@ -644,7 +644,8 @@ static void binder_delete_free_buffer(struct binder_alloc *alloc,
}

static void binder_free_buf_locked(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
+ struct binder_buffer *buffer,
+ bool free_pages)
{
size_t size, buffer_size;

@@ -672,8 +673,9 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
alloc->pid, size, alloc->free_async_space);
}

- binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
- (buffer->user_data + buffer_size) & PAGE_MASK);
+ if (free_pages)
+ binder_free_page_range(alloc, PAGE_ALIGN(buffer->user_data),
+ (buffer->user_data + buffer_size) & PAGE_MASK);

rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
@@ -682,14 +684,14 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,

if (next->free) {
rb_erase(&next->rb_node, &alloc->free_buffers);
- binder_delete_free_buffer(alloc, next);
+ binder_delete_free_buffer(alloc, next, free_pages);
}
}
if (alloc->buffers.next != &buffer->entry) {
struct binder_buffer *prev = binder_buffer_prev(buffer);

if (prev->free) {
- binder_delete_free_buffer(alloc, buffer);
+ binder_delete_free_buffer(alloc, buffer, free_pages);
rb_erase(&prev->rb_node, &alloc->free_buffers);
buffer = prev;
}
@@ -722,7 +724,7 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
buffer->clear_on_free = false;
}
mutex_lock(&alloc->mutex);
- binder_free_buf_locked(alloc, buffer);
+ binder_free_buf_locked(alloc, buffer, true);
mutex_unlock(&alloc->mutex);
}

@@ -829,7 +831,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
binder_alloc_clear_buf(alloc, buffer);
buffer->clear_on_free = false;
}
- binder_free_buf_locked(alloc, buffer);
+ binder_free_buf_locked(alloc, buffer, false);
buffers++;
}

--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:34

by Carlos Llamas

[permalink] [raw]
Subject: [PATCH 21/21] binder: switch alloc->mutex to spinlock_t

The alloc->mutex is a highly contended lock that causes performance
issues on Android devices. When a low-priority task is given this lock
and it sleeps, it becomes difficult for the task to wakeup and complete
its work. This delays other tasks that are also waiting on the mutex.

The problem gets worse when there is memory pressure in the system,
because this increases the contention on the alloc->mutex while the
shrinker reclaims binder pages.

Switching to a spinlock helps to keep the waiters running and avoids the
overhead of waking up tasks. This significantly improves the transaction
latency when the problematic scenario occurs.

The performance impact of this patchset was measured by stress-testing
the binder alloc contention. In this test, several clients of different
priorities send thousands of transactions of different sizes to a single
server. In parallel, pages are reclaimed via the shrinker's debugfs
interface.

The test was run on a Pixel 8, a Pixel 6 and a QEMU machine. The results
were similar on all three devices:

after:
| sched | prio | average | max | min |
|--------+------+---------+-----------+---------|
| fifo | 99 | 0.135ms | 1.197ms | 0.022ms |
| fifo | 01 | 0.136ms | 5.232ms | 0.018ms |
| other | -20 | 0.180ms | 7.403ms | 0.019ms |
| other | 19 | 0.241ms | 58.094ms | 0.018ms |

before:
| sched | prio | average | max | min |
|--------+------+---------+-----------+---------|
| fifo | 99 | 0.350ms | 248.730ms | 0.020ms |
| fifo | 01 | 0.357ms | 248.817ms | 0.024ms |
| other | -20 | 0.399ms | 249.906ms | 0.020ms |
| other | 19 | 0.477ms | 297.756ms | 0.022ms |

The key metrics above are the average and max latencies (wall time).
These improvements should roughly translate into better p95-p99
latencies on real workloads. Response times are up to ~200x faster in
these scenarios, and there is no penalty in the regular path.

Note that it is only possible to convert this lock after a series of
changes made by previous patches. These mainly include refactoring the
sections that might_sleep() and changing the locking order with the
mmap_lock amongst others.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 44 +++++++++++++++++-----------------
drivers/android/binder_alloc.h | 10 ++++----
2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 0ba9f524e0ff..a8a3e91ba308 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -169,9 +169,9 @@ struct binder_buffer *binder_alloc_prepare_to_free(struct binder_alloc *alloc,
{
struct binder_buffer *buffer;

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
buffer = binder_alloc_prepare_to_free_locked(alloc, user_ptr);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return buffer;
}

@@ -572,9 +572,9 @@ struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
/* Pad 0-size buffers so they get assigned unique addresses */
size = max(size, sizeof(void *));

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
buffer = binder_alloc_new_buf_locked(alloc, next, size, is_async);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);

if (IS_ERR(buffer))
goto out;
@@ -786,17 +786,17 @@ void binder_alloc_free_buf(struct binder_alloc *alloc,
* We could eliminate the call to binder_alloc_clear_buf()
* from binder_alloc_deferred_release() by moving this to
* binder_free_buf_locked(). However, that could
- * increase contention for the alloc mutex if clear_on_free
- * is used frequently for large buffers. The mutex is not
+ * increase contention for the alloc->lock if clear_on_free
+ * is used frequently for large buffers. This lock is not
* needed for correctness here.
*/
if (buffer->clear_on_free) {
binder_alloc_clear_buf(alloc, buffer);
buffer->clear_on_free = false;
}
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
binder_free_buf_locked(alloc, buffer, true);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
}

/**
@@ -894,7 +894,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
struct binder_buffer *buffer;

buffers = 0;
- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
BUG_ON(alloc->vma);

while ((n = rb_first(&alloc->allocated_buffers))) {
@@ -944,7 +944,7 @@ void binder_alloc_deferred_release(struct binder_alloc *alloc)
}
kfree(alloc->pages);
}
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
if (alloc->mm)
mmdrop(alloc->mm);

@@ -976,11 +976,11 @@ void binder_alloc_print_allocated(struct seq_file *m,
{
struct rb_node *n;

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n))
print_binder_buffer(m, " buffer",
rb_entry(n, struct binder_buffer, rb_node));
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
}

/**
@@ -997,7 +997,7 @@ void binder_alloc_print_pages(struct seq_file *m,
int lru = 0;
int free = 0;

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
/*
* Make sure the binder_alloc is fully initialized, otherwise we might
* read inconsistent state.
@@ -1013,7 +1013,7 @@ void binder_alloc_print_pages(struct seq_file *m,
lru++;
}
}
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
}
@@ -1029,10 +1029,10 @@ int binder_alloc_get_allocated_count(struct binder_alloc *alloc)
struct rb_node *n;
int count = 0;

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
for (n = rb_first(&alloc->allocated_buffers); n != NULL; n = rb_next(n))
count++;
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return count;
}

@@ -1076,8 +1076,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
goto err_mmget;
if (!mmap_read_trylock(mm))
goto err_mmap_read_lock_failed;
- if (!mutex_trylock(&alloc->mutex))
- goto err_get_alloc_mutex_failed;
+ if (!spin_trylock(&alloc->lock))
+ goto err_get_alloc_lock_failed;
if (!page->page_ptr)
goto err_page_already_freed;

@@ -1096,7 +1096,7 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
trace_binder_unmap_kernel_end(alloc, index);

list_lru_isolate(lru, item);
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
spin_unlock(lock);

if (vma) {
@@ -1115,8 +1115,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,

err_invalid_vma:
err_page_already_freed:
- mutex_unlock(&alloc->mutex);
-err_get_alloc_mutex_failed:
+ spin_unlock(&alloc->lock);
+err_get_alloc_lock_failed:
mmap_read_unlock(mm);
err_mmap_read_lock_failed:
mmput_async(mm);
@@ -1155,7 +1155,7 @@ void binder_alloc_init(struct binder_alloc *alloc)
alloc->pid = current->group_leader->pid;
alloc->mm = current->mm;
mmgrab(alloc->mm);
- mutex_init(&alloc->mutex);
+ spin_lock_init(&alloc->lock);
INIT_LIST_HEAD(&alloc->buffers);
}

diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index bbc16bc6d5ac..3602a60d9609 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -9,7 +9,7 @@
#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/mm.h>
-#include <linux/rtmutex.h>
+#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/list_lru.h>
@@ -72,7 +72,7 @@ struct binder_lru_page {

/**
* struct binder_alloc - per-binder proc state for binder allocator
- * @mutex: protects binder_alloc fields
+ * @lock: protects binder_alloc fields
* @vma: vm_area_struct passed to mmap_handler
* (invariant after mmap)
* @mm: copy of task->mm (invariant after open)
@@ -96,7 +96,7 @@ struct binder_lru_page {
* struct binder_buffer objects used to track the user buffers
*/
struct binder_alloc {
- struct mutex mutex;
+ spinlock_t lock;
struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long buffer;
@@ -153,9 +153,9 @@ binder_alloc_get_free_async_space(struct binder_alloc *alloc)
{
size_t free_async_space;

- mutex_lock(&alloc->mutex);
+ spin_lock(&alloc->lock);
free_async_space = alloc->free_async_space;
- mutex_unlock(&alloc->mutex);
+ spin_unlock(&alloc->lock);
return free_async_space;
}

--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:03:43

by Carlos Llamas

[permalink] [raw]
Subject: [PATCH 15/21] binder: relocate binder_alloc_clear_buf()

Move this function up along with binder_alloc_get_page() so that their
forward declarations are no longer necessary.

No functional change in this patch.

Signed-off-by: Carlos Llamas <[email protected]>
---
drivers/android/binder_alloc.c | 124 ++++++++++++++++-----------------
1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index d2a38dee12db..4821a29799c8 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -699,8 +699,68 @@ static void binder_free_buf_locked(struct binder_alloc *alloc,
binder_insert_free_buffer(alloc, buffer);
}

+/**
+ * binder_alloc_get_page() - get kernel pointer for given buffer offset
+ * @alloc: binder_alloc for this proc
+ * @buffer: binder buffer to be accessed
+ * @buffer_offset: offset into @buffer data
+ * @pgoffp: address to copy final page offset to
+ *
+ * Lookup the struct page corresponding to the address
+ * at @buffer_offset into @buffer->user_data. If @pgoffp is not
+ * NULL, the byte-offset into the page is written there.
+ *
+ * The caller is responsible to ensure that the offset points
+ * to a valid address within the @buffer and that @buffer is
+ * not freeable by the user. Since it can't be freed, we are
+ * guaranteed that the corresponding elements of @alloc->pages[]
+ * cannot change.
+ *
+ * Return: struct page
+ */
+static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
+ struct binder_buffer *buffer,
+ binder_size_t buffer_offset,
+ pgoff_t *pgoffp)
+{
+ binder_size_t buffer_space_offset = buffer_offset +
+ (buffer->user_data - alloc->buffer);
+ pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
+ size_t index = buffer_space_offset >> PAGE_SHIFT;
+ struct binder_lru_page *lru_page;
+
+ lru_page = &alloc->pages[index];
+ *pgoffp = pgoff;
+ return lru_page->page_ptr;
+}
+
+/**
+ * binder_alloc_clear_buf() - zero out buffer
+ * @alloc: binder_alloc for this proc
+ * @buffer: binder buffer to be cleared
+ *
+ * memset the given buffer to 0
+ */
static void binder_alloc_clear_buf(struct binder_alloc *alloc,
- struct binder_buffer *buffer);
+ struct binder_buffer *buffer)
+{
+ size_t bytes = binder_alloc_buffer_size(alloc, buffer);
+ binder_size_t buffer_offset = 0;
+
+ while (bytes) {
+ unsigned long size;
+ struct page *page;
+ pgoff_t pgoff;
+
+ page = binder_alloc_get_page(alloc, buffer,
+ buffer_offset, &pgoff);
+ size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
+ memset_page(page, pgoff, 0, size);
+ bytes -= size;
+ buffer_offset += size;
+ }
+}
+
/**
* binder_alloc_free_buf() - free a binder buffer
* @alloc: binder_alloc for this proc
@@ -1137,68 +1197,6 @@ static inline bool check_buffer(struct binder_alloc *alloc,
(!buffer->allow_user_free || !buffer->transaction);
}

-/**
- * binder_alloc_get_page() - get kernel pointer for given buffer offset
- * @alloc: binder_alloc for this proc
- * @buffer: binder buffer to be accessed
- * @buffer_offset: offset into @buffer data
- * @pgoffp: address to copy final page offset to
- *
- * Lookup the struct page corresponding to the address
- * at @buffer_offset into @buffer->user_data. If @pgoffp is not
- * NULL, the byte-offset into the page is written there.
- *
- * The caller is responsible to ensure that the offset points
- * to a valid address within the @buffer and that @buffer is
- * not freeable by the user. Since it can't be freed, we are
- * guaranteed that the corresponding elements of @alloc->pages[]
- * cannot change.
- *
- * Return: struct page
- */
-static struct page *binder_alloc_get_page(struct binder_alloc *alloc,
- struct binder_buffer *buffer,
- binder_size_t buffer_offset,
- pgoff_t *pgoffp)
-{
- binder_size_t buffer_space_offset = buffer_offset +
- (buffer->user_data - alloc->buffer);
- pgoff_t pgoff = buffer_space_offset & ~PAGE_MASK;
- size_t index = buffer_space_offset >> PAGE_SHIFT;
- struct binder_lru_page *lru_page;
-
- lru_page = &alloc->pages[index];
- *pgoffp = pgoff;
- return lru_page->page_ptr;
-}
-
-/**
- * binder_alloc_clear_buf() - zero out buffer
- * @alloc: binder_alloc for this proc
- * @buffer: binder buffer to be cleared
- *
- * memset the given buffer to 0
- */
-static void binder_alloc_clear_buf(struct binder_alloc *alloc,
- struct binder_buffer *buffer)
-{
- size_t bytes = binder_alloc_buffer_size(alloc, buffer);
- binder_size_t buffer_offset = 0;
-
- while (bytes) {
- unsigned long size;
- struct page *page;
- pgoff_t pgoff;
-
- page = binder_alloc_get_page(alloc, buffer,
- buffer_offset, &pgoff);
- size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
- memset_page(page, pgoff, 0, size);
- bytes -= size;
- buffer_offset += size;
- }
-}
-
/**
* binder_alloc_copy_user_to_buffer() - copy src user to tgt user
* @alloc: binder_alloc for this proc
--
2.42.0.869.gea05f2083d-goog

2023-11-02 19:21:54

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 02/21] binder: fix use-after-free in shinker's callback

* Carlos Llamas <[email protected]> [231102 15:00]:
> The mmap read lock is used during the shrinker's callback, which means
> that using alloc->vma pointer isn't safe as it can race with munmap().

I think you know my feelings about the safety of that pointer from
previous discussions.

> As of commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
> munmap") the mmap lock is downgraded after the vma has been isolated.
>
> I was able to reproduce this issue by manually adding some delays and
> triggering page reclaiming through the shrinker's debug sysfs. The
> following KASAN report confirms the UAF:
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in zap_page_range_single+0x470/0x4b8
> Read of size 8 at addr ffff356ed50e50f0 by task bash/478
>
> CPU: 1 PID: 478 Comm: bash Not tainted 6.6.0-rc5-00055-g1c8b86a3799f-dirty #70
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> zap_page_range_single+0x470/0x4b8
> binder_alloc_free_page+0x608/0xadc
> __list_lru_walk_one+0x130/0x3b0
> list_lru_walk_node+0xc4/0x22c
> binder_shrink_scan+0x108/0x1dc
> shrinker_debugfs_scan_write+0x2b4/0x500
> full_proxy_write+0xd4/0x140
> vfs_write+0x1ac/0x758
> ksys_write+0xf0/0x1dc
> __arm64_sys_write+0x6c/0x9c
>
> Allocated by task 492:
> kmem_cache_alloc+0x130/0x368
> vm_area_alloc+0x2c/0x190
> mmap_region+0x258/0x18bc
> do_mmap+0x694/0xa60
> vm_mmap_pgoff+0x170/0x29c
> ksys_mmap_pgoff+0x290/0x3a0
> __arm64_sys_mmap+0xcc/0x144
>
> Freed by task 491:
> kmem_cache_free+0x17c/0x3c8
> vm_area_free_rcu_cb+0x74/0x98
> rcu_core+0xa38/0x26d4
> rcu_core_si+0x10/0x1c
> __do_softirq+0x2fc/0xd24
>
> Last potentially related work creation:
> __call_rcu_common.constprop.0+0x6c/0xba0
> call_rcu+0x10/0x1c
> vm_area_free+0x18/0x24
> remove_vma+0xe4/0x118
> do_vmi_align_munmap.isra.0+0x718/0xb5c
> do_vmi_munmap+0xdc/0x1fc
> __vm_munmap+0x10c/0x278
> __arm64_sys_munmap+0x58/0x7c
>
> Fix this issue by performing instead a vma_lookup() which will fail to
> find the vma that was isolated before the mmap lock downgrade. Note that
> this option has better performance than upgrading to a mmap write lock
> which would increase contention. Plus, mmap_write_trylock() has been
> recently removed anyway.
>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Cc: [email protected]
> Cc: Liam Howlett <[email protected]>
> Cc: Minchan Kim <[email protected]>
> Signed-off-by: Carlos Llamas <[email protected]>
> ---
> drivers/android/binder_alloc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> index e3db8297095a..c4d60d81221b 100644
> --- a/drivers/android/binder_alloc.c
> +++ b/drivers/android/binder_alloc.c
> @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> goto err_mmget;
> if (!mmap_read_trylock(mm))
> goto err_mmap_read_lock_failed;
> - vma = binder_alloc_get_vma(alloc);
> + vma = vma_lookup(mm, page_addr);
> + if (vma && vma != binder_alloc_get_vma(alloc))
> + goto err_invalid_vma;

Doesn't this need to be:
if (!vma || vma != binder_alloc_get_vma(alloc))

This way, we catch a different vma and a NULL vma.

Or even, just:
if (vma != binder_alloc_get_vma(alloc))

if the alloc vma cannot be NULL?

>
> list_lru_isolate(lru, item);
> spin_unlock(lock);
> @@ -1031,6 +1033,8 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> mutex_unlock(&alloc->mutex);
> return LRU_REMOVED_RETRY;
>
> +err_invalid_vma:
> + mmap_read_unlock(mm);
> err_mmap_read_lock_failed:
> mmput_async(mm);
> err_mmget:
> --
> 2.42.0.869.gea05f2083d-goog
>

2023-11-02 20:10:40

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 02/21] binder: fix use-after-free in shinker's callback

On Thu, Nov 02, 2023 at 03:20:51PM -0400, Liam R. Howlett wrote:
> * Carlos Llamas <[email protected]> [231102 15:00]:
> > The mmap read lock is used during the shrinker's callback, which means
> > that using alloc->vma pointer isn't safe as it can race with munmap().
>
> I think you know my feelings about the safety of that pointer from
> previous discussions.
>

Yeah. The work here is not done. We actually already store the vm_start
address in alloc->buffer, so in theory we don't even need to swap the
alloc->vma pointer; we could just drop it. So, I agree with you.

I want to include this safety "fix" along with some other work that uses
the page fault handler and get_user_pages_remote(). I've tried a quick
prototype of this and it works fine.

> > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > index e3db8297095a..c4d60d81221b 100644
> > --- a/drivers/android/binder_alloc.c
> > +++ b/drivers/android/binder_alloc.c
> > @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> > goto err_mmget;
> > if (!mmap_read_trylock(mm))
> > goto err_mmap_read_lock_failed;
> > - vma = binder_alloc_get_vma(alloc);
> > + vma = vma_lookup(mm, page_addr);
> > + if (vma && vma != binder_alloc_get_vma(alloc))
> > + goto err_invalid_vma;
>
> Doesn't this need to be:
> if (!vma || vma != binder_alloc_get_vma(alloc))
>
> This way, we catch a different vma and a NULL vma.
>
> Or even, just:
> if (vma != binder_alloc_get_vma(alloc))
>
> if the alloc vma cannot be NULL?
>

If the vma_lookup() is NULL, then we still need to isolate and free the
given binder page; we simply skip the zap() in this case.

However, if we receive a random, unexpected vma because of a corrupted
address or similar, then the whole process is skipped.

That is why we use the check above.
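
A minimal annotated sketch of that check (same logic as the hunk above;
the comments are added here only for illustration):

	vma = vma_lookup(mm, page_addr);
	if (vma && vma != binder_alloc_get_vma(alloc))
		goto err_invalid_vma;	/* unexpected vma: skip the reclaim entirely */

	/*
	 * vma == NULL means the mapping is already gone: the page is still
	 * isolated and freed, and the later "if (vma)" guard simply skips
	 * zap_page_range_single().
	 */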

--
Carlos Llamas

2023-11-02 20:27:42

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 02/21] binder: fix use-after-free in shinker's callback

* Carlos Llamas <[email protected]> [231102 16:09]:

...
>
> > > diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
> > > index e3db8297095a..c4d60d81221b 100644
> > > --- a/drivers/android/binder_alloc.c
> > > +++ b/drivers/android/binder_alloc.c
> > > @@ -1005,7 +1005,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
> > > goto err_mmget;
> > > if (!mmap_read_trylock(mm))
> > > goto err_mmap_read_lock_failed;
> > > - vma = binder_alloc_get_vma(alloc);
> > > + vma = vma_lookup(mm, page_addr);
> > > + if (vma && vma != binder_alloc_get_vma(alloc))
> > > + goto err_invalid_vma;
> >
> > Doesn't this need to be:
> > if (!vma || vma != binder_alloc_get_vma(alloc))
> >
> > This way, we catch a different vma and a NULL vma.
> >
> > Or even, just:
> > if (vma != binder_alloc_get_vma(alloc))
> >
> > if the alloc vma cannot be NULL?
> >
>
> If the vma_lookup() is NULL then we still need to isolate and free the
> given binder page and we obviously skip the zap() in this case.

I would have thought that if there was no VMA, then the entire process
could be avoided. Thanks for clarifying.

>
> However, if we receive a random unexpected vma because of a corrupted
> address or similar, then the whole process is skipped.
>
> Thus, why we use the check above.
>
> --
> Carlos Llamas

2023-12-01 06:52:29

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 05/21] binder: fix trivial typo of binder_free_buf_locked()

On Tue, Nov 07, 2023 at 09:08:05AM +0000, Alice Ryhl wrote:
> It's a bit confusing that the pair of methods binder_alloc_free_buf and
> binder_free_buf_locked are inconsistent in whether they use the alloc_
> prefix. It might be worth it to change them to be consistent?

Right, the prefix is consistent across the APIs but not really for the
local function names. I wouldn't mind adding the "alloc" part; it just
seemed easier to fix the comment and backport that.

--
Carlos Llamas

2023-12-01 07:02:42

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 08/21] binder: keep vma addresses type as unsigned long

On Tue, Nov 07, 2023 at 09:08:13AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > - seq_printf(m, " size %zd:%zd data %pK\n",
> > + seq_printf(m, " size %zd:%zd data %lx\n",
> > buffer->data_size, buffer->offsets_size,
> > buffer->user_data);
>
> This changes all of the print operations to use %lx instead of %pK,
> which means that the addresses are no longer being hidden when using
> kptr_restrict.
>
> Since the pointers are all userspace pointers, it's not clear to me what
> the consequences of this are. However, I'd like to confirm whether this
> is an intentional change?

I confirm the change was intentional; we were giving the impression that
these are kernel pointers when they are not. However, I do think your
concern is valid. I've added a patch to v2 to deal with this.

I can tell you we are already logging the unhashed userspace addresses
in other places and we should probably fix those too.
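
One option, as a sketch (not necessarily the patch that went into v2),
is to print the offset into the binder mapping instead of the raw
userspace address:

	seq_printf(m, "  size %zd:%zd offset %lx\n",
		   buffer->data_size, buffer->offsets_size,
		   buffer->user_data - alloc->buffer);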

--
Carlos Llamas

2023-12-01 07:11:11

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 10/21] binder: do unlocked work in binder_alloc_new_buf()

On Tue, Nov 07, 2023 at 09:08:18AM +0000, Alice Ryhl wrote:
> I found a few issues in this patch:
>
> Consolidating the overflow check into one if statement like this doesn't
> catch all cases of integer overflow. For example, if all three sizes are
> 9223372036854775816, then the computed size will be 9223372036854775832,
> so this would not trigger the overflow check.

Thanks for pointing this out; you are right.

I don't understand the reasoning behind using size_t for the uapi; it
just makes things more complicated than needed, since these size fields
can hold values far larger than the maximum buffer size of SZ_4M.

Anyway, I've fixed this for v2.
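
For reference, one overflow-safe shape for that computation is sketched
below. This is only an illustration using check_add_overflow() from
<linux/overflow.h>, not necessarily what v2 does:

	size_t size, tmp;

	/* each addend may grow by at most sizeof(void *) - 1 when aligned */
	if (check_add_overflow(data_size, offsets_size, &tmp) ||
	    check_add_overflow(tmp, extra_buffers_size, &tmp) ||
	    check_add_overflow(tmp, 3 * (sizeof(void *) - 1), &tmp))
		return ERR_PTR(-EINVAL);

	size = ALIGN(data_size, sizeof(void *)) +
	       ALIGN(offsets_size, sizeof(void *)) +
	       ALIGN(extra_buffers_size, sizeof(void *));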

>
> Carlos Llamas <[email protected]> writes:
> > mutex_unlock(&alloc->mutex);
> > +
> > + if (IS_ERR(buffer))
> > + goto out;
> > +
> > + buffer->data_size = data_size;
> > + buffer->offsets_size = offsets_size;
> > + buffer->async_transaction = is_async;
> > + buffer->extra_buffers_size = extra_buffers_size;
> > + buffer->pid = pid;
>
> With this change, if there is a concurrent call to
> debug_low_async_space_locked, then there is a data race on the
> async_transaction field. Similarly for print_binder_buffer.
>
> Perhaps these writes should be moved before the mutex_unlock?

Also fixed, thanks!
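
A minimal sketch of that reordering, keeping the field writes under
alloc->mutex so that debug_low_async_space_locked() and
print_binder_buffer() see consistent values (illustrative only; the
exact placement in v2 may differ):

	/* buffer was just returned by binder_alloc_new_buf_locked() */
	if (!IS_ERR(buffer)) {
		buffer->data_size = data_size;
		buffer->offsets_size = offsets_size;
		buffer->async_transaction = is_async;
		buffer->extra_buffers_size = extra_buffers_size;
		buffer->pid = pid;
	}
	mutex_unlock(&alloc->mutex);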

--
Carlos Llamas

2023-12-01 07:12:31

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 13/21] binder: relocate low space calculation

On Tue, Nov 07, 2023 at 09:08:26AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > Move the low async space calculation to debug_low_async_space_locked().
> > This logic not only fits better here but also offloads some of the many
> > tasks currently done in binder_alloc_new_buf_locked().
> >
> > No functional change in this patch.
> >
> > Signed-off-by: Carlos Llamas <[email protected]>
>
> One suggestion below, but I'm fine either way.
>
> Reviewed-by: Alice Ryhl <[email protected]>
>
>
> Carlos Llamas <[email protected]> writes:
> > + if (debug_low_async_space_locked(alloc))
> > + buffer->oneway_spam_suspect = true;
>
> You could avoid a branch here like this:
>

Sure, sounds good to me.
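
Presumably a direct assignment along these lines (a sketch; this assumes
debug_low_async_space_locked() returns bool):

	buffer->oneway_spam_suspect = debug_low_async_space_locked(alloc);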

--
Carlos Llamas

2023-12-01 07:16:21

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 14/21] binder: do not add pages to LRU in release path

On Tue, Nov 07, 2023 at 09:08:29AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > In binder_alloc_deferred_release() pages are added to the LRU list via
> > binder_free_buf_locked(). However, this is pointless because these pages
> > are kfree'd immediately afterwards. Add an option to skip the LRU list.
>
> They aren't freed with kfree, but with __free_page.
>
> > Signed-off-by: Carlos Llamas <[email protected]>
>
> The change itself looks correct, so I'll give my tag for this:
>
> Reviewed-by: Alice Ryhl <[email protected]>
>
> But I'm wondering whether process cleanup really is so performance
> intensive to justify the added complexity of this?

So, this was needed in an earlier version of the patchset, and I was
hoping that it would also help with an issue reported here:
https://lore.kernel.org/all/[email protected]/

However, I do agree that it is unnecessary at this stage, so I've
decided to drop it from v2.

Thanks,
--
Carlos Llamas

2023-12-01 07:19:20

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 16/21] binder: refactor page range allocation

On Tue, Nov 07, 2023 at 09:08:35AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > Instead of looping through the page range twice to first determine if
> > the mmap lock is required, simply do it per-page as needed. Split out
> > all this logic into a separate binder_get_user_page_remote() function.
> >
> > Signed-off-by: Carlos Llamas <[email protected]>
>
> One nit, otherwise looks okay to me.
> Reviewed-by: Alice Ryhl <[email protected]>
>
> Carlos Llamas <[email protected]> writes:
> > + unsigned long index;
> > - size_t index;
>
> You're changing the type of index to unsigned long here, but it seems to
> me that size_t is the correct type.

Perhaps size_t made sense before the addresses were also moved to
unsigned long. I think this is a better fit now.

--
Carlos Llamas

2023-12-01 07:21:40

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 17/21] binder: malloc new_buffer outside of locks

On Tue, Nov 07, 2023 at 09:08:38AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > Preallocate new_buffer before acquiring the alloc->mutex and hand it
> > down to binder_alloc_new_buf_locked(). The new buffer will be used in
> > the vast majority of requests (measured at 98.2% in field data). The
> > buffer is discarded otherwise. This change is required in preparation
> > for transitioning alloc->mutex into a spinlock in subsequent commits.
> >
> > Signed-off-by: Carlos Llamas <[email protected]>
>
> You also need to free the new buffer here:
>
> if (unlikely(!best_fit)) {
> binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
> "%d: binder_alloc_buf size %zd failed, no address space\n",
> alloc->pid, size);
> debug_no_space_locked(alloc);
> return ERR_PTR(-ENOSPC);
> }

Ouch! This is true, and there is a second instance that needs the kfree
as well. Thanks for catching it.
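
For reference, one way to plug the leak in that path is sketched below;
the actual v2 may instead route this through the common exit that
already does the kfree():

	if (unlikely(!best_fit)) {
		binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
				   "%d: binder_alloc_buf size %zd failed, no address space\n",
				   alloc->pid, size);
		debug_no_space_locked(alloc);
		kfree(new_buffer);	/* discard the preallocated buffer */
		return ERR_PTR(-ENOSPC);
	}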

--
Carlos Llamas

2023-12-01 07:39:35

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 19/21] binder: perform page allocation outside of locks

On Tue, Nov 07, 2023 at 09:08:43AM +0000, Alice Ryhl wrote:
> I would really like a comment on each function explaining that:
>
> * The binder_allocate_page_range function ensures that existing pages
> will not be reclaimed by the shrinker.
> * The binder_get_page_range function ensures that missing pages are
> allocated and inserted.

Ok, I'd rather go for better naming than compensate through comments,
so I came up with the following names:
- binder_lru_freelist_{add,del}()
- binder_install_buffer_pages()

There will be more details in v2. The new names give a clear separation
of the scope of these functions.

> > mmap_write_lock(alloc->mm);
> > + if (lru_page->page_ptr)
> > + goto out;
>
> Another comment that I'd like to see somewhere is one that says
> something along these lines:
>
> Multiple processes may call `binder_get_user_page_remote` on the
> same page in parallel. When this happens, one of them will allocate
> the page and insert it, and the other process will use the mmap
> write lock to wait for the insertion to complete. This means that we
> can't use a mmap read lock here.
>

I've added a shorter version of this to v2, thanks.

> > + /* mark page insertion complete and safe to acquire */
> > + smp_store_release(&lru_page->page_ptr, page);
> > [snip]
> > + /* check if page insertion is marked complete by release */
> > + if (smp_load_acquire(&page->page_ptr))
> > + continue;
>
> We already discussed this when I asked you to make this an acquire /
> release operation so that it isn't racy, but it could use a comment
> explaining its purpose.

I've wrapped these calls into inline functions with better names in v2.
The purpose should now be evident.
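
For illustration, such wrappers could look roughly like this (the names
are hypothetical; this is a sketch of the intent, not the exact v2
code):

static inline void binder_set_installed_page(struct binder_lru_page *lru_page,
					     struct page *page)
{
	/* pairs with the acquire in binder_get_installed_page() */
	smp_store_release(&lru_page->page_ptr, page);
}

static inline struct page *
binder_get_installed_page(struct binder_lru_page *lru_page)
{
	/* pairs with the release in binder_set_installed_page() */
	return smp_load_acquire(&lru_page->page_ptr);
}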

>
> > mmap_write_lock(alloc->mm);
> > + if (lru_page->page_ptr)
> > + goto out;
> > +
> > if (!alloc->vma) {
> > pr_err("%d: %s failed, no vma\n", alloc->pid, __func__);
> > ret = -ESRCH;
> > goto out;
> > }
> >
> > page = alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO);
> > if (!page) {
> > pr_err("%d: failed to allocate page\n", alloc->pid);
> > ret = -ENOMEM;
> > goto out;
> > }
>
> Maybe it would be worth to allocate the page before taking the mmap
> write lock? It has the disadvantage that you may have to immediately
> deallocate it if we trigger the `if (lru_page->page_ptr) goto out`
> branch, but that shouldn't happen that often, and it would reduce the
> amount of time we spend holding the mmap write lock.

If we sleep on alloc_page(), then chances are that having other tasks
allocating more pages could create more memory pressure. In some cases
this would be unnecessary (e.g. if it's the same page). I do think this
could happen often, since buffer requests tend to be < PAGE_SIZE and
adjacent too. I'll look into this in more detail and send a follow-up
patch if needed. Thanks!

--
Carlos Llamas


2023-12-01 07:42:34

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 20/21] binder: reverse locking order in shrinker callback

On Tue, Nov 07, 2023 at 09:08:46AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> This reverses the order so that __free_page is called before
> zap_page_range_single. Is that okay?

It is not OK. Userspace would still have access to the page between the
__free_page() and zap_page_range_single() calls. This is bad, so thanks
for catching this.

> Another option would be to do something along these lines:
>
> page_to_free = page->page_ptr;
> page->page_ptr = NULL;
>
> [snip]
>
> mmap_read_unlock(mm);
> mmput_async(mm);
> __free_page(page_to_free);
>
> This way you wouldn't reverse the order. Also, you reduce the amount of
> time you spend holding the mmap read lock, since the page is freed
> without holding the lock.
>

Thanks, I've added this approach to v2.
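
Roughly, the tail of the callback then becomes something like the sketch
below (page_to_free is a new local; alloc->mutex is still used here
since the spinlock conversion only happens in the last patch, and the
details may differ in v2):

	page_to_free = page->page_ptr;
	page->page_ptr = NULL;

	list_lru_isolate(lru, item);
	mutex_unlock(&alloc->mutex);
	spin_unlock(lock);

	if (vma) {
		trace_binder_unmap_user_start(alloc, index);
		zap_page_range_single(vma, page_addr, PAGE_SIZE, NULL);
		trace_binder_unmap_user_end(alloc, index);
	}

	mmap_read_unlock(mm);
	mmput_async(mm);
	__free_page(page_to_free);	/* freed only after the mapping is zapped */

	spin_lock(lock);
	return LRU_REMOVED_RETRY;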

--
Carlos Llamas

2023-12-01 07:46:47

by Carlos Llamas

[permalink] [raw]
Subject: Re: [PATCH 21/21] binder: switch alloc->mutex to spinlock_t

On Tue, Nov 07, 2023 at 09:08:49AM +0000, Alice Ryhl wrote:
> Carlos Llamas <[email protected]> writes:
> > The alloc->mutex is a highly contended lock that causes performance
> > issues on Android devices. When a low-priority task is given this lock
> > and it sleeps, it becomes difficult for the task to wakeup and complete
> > its work. This delays other tasks that are also waiting on the mutex.
>
> Grammar nit: "to wake up"

OK.

>
> > The problem gets worse when there is memory pressure in the system,
> > because this increases the contention on the alloc->mutex while the
> > shrinker reclaims binder pages.
> >
> > Switching to a spinlock helps to keep the waiters running and avoids the
> > overhead of waking up tasks. This significantly improves the transaction
> > latency when the problematic scenario occurs.
> >
> > [snip]
> >
> > Note that it is only possible to convert this lock after a series of
> > changes made by previous patches. These mainly include refactoring the
> > sections that might_sleep() and changing the locking order with the
> > mmap_lock amongst others.
> >
> > Signed-off-by: Carlos Llamas <[email protected]>
>
> Nice!
>
> Reviewed-by: Alice Ryhl <[email protected]>
>
> > * binder_free_buf_locked(). However, that could
> > - * increase contention for the alloc mutex if clear_on_free
> > - * is used frequently for large buffers. The mutex is not
> > + * increase contention for the alloc->lock if clear_on_free
> > + * is used frequently for large buffers. This lock is not
>
> Grammar nit: Shouldn't this say "However, that could increase contention
> on alloc->lock if clear_on_free is used frequently for large buffers."?

Do you mean "contention for" vs "contention on"? They both seem correct
to me, but this was written by Todd, so I'd trust his English much more
than mine.

--
Carlos Llamas