Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
From: peter enderborg
To: Douglas Anderson
Date: Mon, 3 Apr 2017 15:47:59 +0200
In-Reply-To: <20170331175341.19889-1-dianders@chromium.org>
References: <20170331175341.19889-1-dianders@chromium.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/31/2017 07:53 PM, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read().  If we happen to be waiting
> for work we'll get the kill signal and wake up.  That's good.  ...but
> then we try to grab the binder lock before we return.  That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock.  If that one other process is blocked then we can't
> finish exiting.  In the worst case, the other process might be blocked
> waiting for memory.  In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
>   wait for the process we picked to die.  The reasoning is that we've
>   given the doomed process access to special memory pools so it can
>   quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
>   the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> We just attempted this OOM kill:
>   Killed process 4132 (Binder_1) total-vm:949904kB, anon-rss:4kB, file-rss:0kB
>
> The doomed task:
>   Stack traceback for pid 4132
>     4132    3380    0    0    D  Binder_1
>   Call trace:
>     __switch_to+0x9c/0xa8
>     __schedule+0x504/0x740
>     schedule+0x88/0xa8
>     schedule_preempt_disabled+0x28/0x44
>     __mutex_lock_slowpath+0xf8/0x1a4
>     mutex_lock+0x4c/0x68
>     binder_thread_read+0x68c/0x11bc
>     binder_ioctl_write_read.constprop.46+0x1e8/0x318
>     binder_ioctl+0x370/0x778
>     compat_SyS_ioctl+0x134/0x10ac
>     el0_svc_naked+0x24/0x28
>
> The binder lock holder:
>     4001    3380    0    4    R  Binder_7
>   Call trace:
>     __switch_to+0x9c/0xa8
>     __schedule+0x504/0x740
>     preempt_schedule_common+0x28/0x48
>     preempt_schedule+0x24/0x2c
>     _raw_spin_unlock+0x44/0x50
>     list_lru_count_one+0x40/0x50
>     super_cache_count+0x68/0xa0
>     shrink_slab.part.54+0xfc/0x4a4
>     shrink_zone+0xa8/0x198
>     try_to_free_pages+0x408/0x590
>     __alloc_pages_nodemask+0x60c/0x95c
>     __read_swap_cache_async+0x98/0x214
>     read_swap_cache_async+0x48/0x7c
>     swapin_readahead+0x188/0x1b8
>     handle_mm_fault+0xbf8/0x1050
>     do_page_fault+0x140/0x2dc
>     do_mem_abort+0x64/0xd8
>   Exception stack: < ... cut ... >
>     el1_da+0x18/0x78
>     binder_ioctl+0x370/0x778
>     compat_SyS_ioctl+0x134/0x10ac
>     el0_svc_naked+0x24/0x28
>
> There's really not a lot of reason to grab the binder lock when we're
> just going to exit anyway.  Add a little bit of complexity to the code
> to allow binder_thread_read() to sometimes return without the lock
> held.
>
> NOTE: to do this safely we need to make sure that we can safely do
> these things _without_ the global binder lock:
> * Muck with proc->ready_threads
> * Clear a bitfield in thread->looper
>
> It appears that those two operations don't need to be done together
> and it's OK to have different locking solutions for the two.  Thus:
>
> 1. We change thread->looper to atomic_t and use atomic ops to muck
>    with it.  This means we're not clobbering someone else's work with
>    our read/modify/write.
>
>    Note that I haven't confirmed that every modify of "thread->looper"
>    can be done without the binder mutex or without some
>    coarser-grained lock.  ...but clearing the
>    BINDER_LOOPER_STATE_WAITING bit should be fine.  The only place
>    looking at it is binder_deferred_flush().  Also: while erroring out
>    we also may clear BINDER_LOOPER_STATE_NEED_RETURN.  This looks to
>    be OK, though the code isn't trivial to follow.
>
> 2. We add a new fine-grained mutex (per "proc" structure) to guard all
>    of the "_threads" counters.  Technically the only value we're
>    modifying is "proc->ready_threads" and we're just decrementing it
>    (so maybe we could use atomic_t here, too?).  ...but it appears
>    that all of the "_threads" counters work together, so protecting
>    them together seems to make the most sense.
>
>    Note that to avoid deadlock:
>    * We never first grab the fine-grained mutex and _then_ the binder
>      mutex.
>    * We always hold the fine-grained mutex for short periods and never
>      do anything blocking while holding it.
>
> To sum it all up:
>
> 1. This patch does improve the chances that we will be able to kill a
>    task quickly when we want to.  That means this patch has some
>    merit.
> 2. Though this patch has merit, it isn't actually all that critical
>    because:
>    2a) On newer kernels the OOM reaper should keep us from getting
>        deadlocked.
>    2b) Even on old kernels there are other deadlock cases we hit even
>        if we fix binder in this way.
>
> Signed-off-by: Douglas Anderson
> ---
> It's unclear to me if we really want this patch since, as per the
> description, it's not all that critical.  I also don't personally have
> enough context with Binder to prove that every change it makes is
> guaranteed not to screw something up.
>
> I post it in case someone is interested or someone who knows the code
> well can say "yes, this is right and good".  :)
>
> This patch was tested against the Chrome OS 4.4 kernel.  It was only
> compile-tested against upstream since I don't have binder up and
> running there.
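
To make the two schemes in the commit message concrete before diving into
the diff: below is a minimal stand-alone sketch of the same idioms.  All
example_* names and the flag value are invented for illustration; only
atomic_or()/atomic_and() and the mutex API are real kernel interfaces.

/*
 * Sketch of the two locking schemes described above (illustrative
 * only; this is not binder code).
 */
#include <linux/atomic.h>
#include <linux/mutex.h>

#define EXAMPLE_STATE_WAITING 0x10  /* plays the role of BINDER_LOOPER_STATE_WAITING */

struct example_thread {
        atomic_t looper;            /* scheme 1: bitflags updated with atomic ops */
};

struct example_proc {
        struct mutex threads_lock;  /* scheme 2: guards the *_threads counters */
        int ready_threads;
};

/* Scheme 1: set or clear one flag bit with no lock held; the atomic op
 * makes the whole read-modify-write a single indivisible step. */
static void example_set_waiting(struct example_thread *t)
{
        atomic_or(EXAMPLE_STATE_WAITING, &t->looper);
}

static void example_clear_waiting(struct example_thread *t)
{
        atomic_and(~EXAMPLE_STATE_WAITING, &t->looper);
}

/* Scheme 2: hold the fine-grained mutex only across the counter update;
 * never block while holding it, and never take it before a coarser lock. */
static void example_dec_ready_threads(struct example_proc *p)
{
        mutex_lock(&p->threads_lock);
        p->ready_threads--;
        mutex_unlock(&p->threads_lock);
}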
>
>  drivers/android/binder.c | 116 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 81 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index aae4d8d4be36..220b74b7100d 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -18,6 +18,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
>  #include <asm/cacheflush.h>
> +#include <linux/atomic.h>
>  #include <linux/fdtable.h>
>  #include <linux/file.h>
>  #include <linux/freezer.h>
> @@ -347,10 +348,13 @@ struct binder_proc {
>  	wait_queue_head_t wait;
>  	struct binder_stats stats;
>  	struct list_head delivered_death;
> +
>  	int max_threads;
>  	int requested_threads;
>  	int requested_threads_started;
>  	int ready_threads;
> +	struct mutex threads_lock;
> +
>  	long default_priority;
>  	struct dentry *debugfs_entry;
>  	struct binder_context *context;
> @@ -369,7 +373,7 @@ struct binder_thread {
>  	struct binder_proc *proc;
>  	struct rb_node rb_node;
>  	int pid;
> -	int looper;
> +	atomic_t looper;
>  	struct binder_transaction *transaction_stack;
>  	struct list_head todo;
>  	uint32_t return_error; /* Write failed, return error code in read buf */
> @@ -458,6 +462,15 @@ static inline void binder_lock(const char *tag)
>  	trace_binder_locked(tag);
>  }
>
> +static inline int __must_check binder_lock_interruptible(const char *tag)
> +{
> +	trace_binder_lock(tag);
> +	if (mutex_lock_interruptible(&binder_main_lock))
> +		return -ERESTARTSYS;
> +	trace_binder_locked(tag);
> +	return 0;
> +}
> +
>  static inline void binder_unlock(const char *tag)
>  {
>  	trace_binder_unlock(tag);
> @@ -2450,39 +2463,41 @@ static int binder_thread_write(struct binder_proc *proc,
>  		}
>
>  		case BC_REGISTER_LOOPER:
> +			mutex_lock(&proc->threads_lock);
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_REGISTER_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			if (thread->looper & BINDER_LOOPER_STATE_ENTERED) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +			if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_ENTERED) {
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n",
>  					proc->pid, thread->pid);
>  			} else if (proc->requested_threads == 0) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called without request\n",
>  					proc->pid, thread->pid);
>  			} else {
>  				proc->requested_threads--;
>  				proc->requested_threads_started++;
>  			}
> -			thread->looper |= BINDER_LOOPER_STATE_REGISTERED;
> +			atomic_or(BINDER_LOOPER_STATE_REGISTERED, &thread->looper);
> +			mutex_unlock(&proc->threads_lock);
>  			break;
>  		case BC_ENTER_LOOPER:
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_ENTER_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			if (thread->looper & BINDER_LOOPER_STATE_REGISTERED) {
> -				thread->looper |= BINDER_LOOPER_STATE_INVALID;
> +			if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_REGISTERED) {
> +				atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
>  				binder_user_error("%d:%d ERROR: BC_ENTER_LOOPER called after BC_REGISTER_LOOPER\n",
>  					proc->pid, thread->pid);
>  			}
> -			thread->looper |= BINDER_LOOPER_STATE_ENTERED;
> +			atomic_or(BINDER_LOOPER_STATE_ENTERED, &thread->looper);
>  			break;
>  		case BC_EXIT_LOOPER:
>  			binder_debug(BINDER_DEBUG_THREADS,
>  				     "%d:%d BC_EXIT_LOOPER\n",
>  				     proc->pid, thread->pid);
> -			thread->looper |= BINDER_LOOPER_STATE_EXITED;
> +			atomic_or(BINDER_LOOPER_STATE_EXITED, &thread->looper);
>  			break;
>
>  		case BC_REQUEST_DEATH_NOTIFICATION:
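
A side note on the hunk above, for anyone wondering why the plain int had
to become atomic_t once these updates can happen outside the big lock:
"looper |= flag" compiles to a load, an OR and a store, so two CPUs can
each read the same old value and one flag is silently lost.  An
illustration (invented flag names, not binder's):

#include <linux/atomic.h>

#define EXAMPLE_FLAG_A 0x1
#define EXAMPLE_FLAG_B 0x2

/* Racy: if two CPUs run this concurrently on the same word, both can
 * load the same old value and one of the two flags is lost. */
static void racy_set_flag(int *looper, int flag)
{
        *looper |= flag;
}

/* Safe: the read-modify-write happens as one atomic step, so
 * concurrent callers cannot clobber each other's bits. */
static void atomic_set_flag(atomic_t *looper, int flag)
{
        atomic_or(flag, looper);
}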
> @@ -2538,7 +2553,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  				ref->death = death;
>  				if (ref->node->proc == NULL) {
>  					ref->death->work.type = BINDER_WORK_DEAD_BINDER;
> -					if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +					if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  						list_add_tail(&ref->death->work.entry, &thread->todo);
>  					} else {
>  						list_add_tail(&ref->death->work.entry, &proc->todo);
> @@ -2562,7 +2577,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  				ref->death = NULL;
>  				if (list_empty(&death->work.entry)) {
>  					death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> -					if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +					if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  						list_add_tail(&death->work.entry, &thread->todo);
>  					} else {
>  						list_add_tail(&death->work.entry, &proc->todo);
> @@ -2604,7 +2619,7 @@ static int binder_thread_write(struct binder_proc *proc,
>  			list_del_init(&death->work.entry);
>  			if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) {
>  				death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> -				if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> +				if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
>  					list_add_tail(&death->work.entry, &thread->todo);
>  				} else {
>  					list_add_tail(&death->work.entry, &proc->todo);
> @@ -2638,19 +2653,20 @@ static int binder_has_proc_work(struct binder_proc *proc,
>  				struct binder_thread *thread)
>  {
>  	return !list_empty(&proc->todo) ||
> -		(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> +		(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
>  }
>
>  static int binder_has_thread_work(struct binder_thread *thread)
>  {
>  	return !list_empty(&thread->todo) || thread->return_error != BR_OK ||
> -		(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> +		(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
>  }
>
>  static int binder_thread_read(struct binder_proc *proc,
>  			      struct binder_thread *thread,
>  			      binder_uintptr_t binder_buffer, size_t size,
> -			      binder_size_t *consumed, int non_block)
> +			      binder_size_t *consumed, int non_block,
> +			      bool *locked)
>  {
>  	void __user *buffer = (void __user *)(uintptr_t)binder_buffer;
>  	void __user *ptr = buffer + *consumed;
> @@ -2687,21 +2703,24 @@ static int binder_thread_read(struct binder_proc *proc,
>  		goto done;
>  	}
>
> -
> -	thread->looper |= BINDER_LOOPER_STATE_WAITING;
> +	atomic_or(BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +	mutex_lock(&proc->threads_lock);
>  	if (wait_for_proc_work)
>  		proc->ready_threads++;
> +	mutex_unlock(&proc->threads_lock);
>
>  	binder_unlock(__func__);
> +	*locked = false;
>
>  	trace_binder_wait_for_work(wait_for_proc_work,
>  				   !!thread->transaction_stack,
>  				   !list_empty(&thread->todo));
>  	if (wait_for_proc_work) {
> -		if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> -					BINDER_LOOPER_STATE_ENTERED))) {
> +		if (!(atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> +					BINDER_LOOPER_STATE_ENTERED))) {
>  			binder_user_error("%d:%d ERROR: Thread waiting for process work before calling BC_REGISTER_LOOPER or BC_ENTER_LOOPER (state %x)\n",
> -				proc->pid, thread->pid, thread->looper);
> +				proc->pid, thread->pid,
> +				atomic_read(&thread->looper));
>  			wait_event_interruptible(binder_user_error_wait,
>  						 binder_stop_on_user_error < 2);
>  		}
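
Stripped of binder detail, the new control flow that this hunk and the
next one build up in binder_thread_read() is roughly the following
schematic.  The example_* names are invented, and the real function of
course returns many more statuses:

#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/wait.h>

struct example_state {
        struct mutex main_lock;     /* stand-in for the global binder lock */
        wait_queue_head_t wait;
        bool have_work;
};

static bool example_has_work(struct example_state *s)
{
        return READ_ONCE(s->have_work);
}

/* Drop the big lock, sleep, and on a signal return WITHOUT retaking
 * the big lock; *locked tells the caller whether it must unlock. */
static int example_wait_for_work(struct example_state *s, bool *locked)
{
        int ret;

        mutex_unlock(&s->main_lock);
        *locked = false;

        ret = wait_event_interruptible(s->wait, example_has_work(s));
        if (ret)
                return ret;     /* e.g. woken by the OOM killer's SIGKILL */

        if (mutex_lock_interruptible(&s->main_lock))
                return -ERESTARTSYS;
        *locked = true;         /* caller must unlock on the way out */
        return 0;
}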
> @@ -2719,14 +2738,23 @@ static int binder_thread_read(struct binder_proc *proc,
>  			ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
>  	}
>
> -	binder_lock(__func__);
> -
> +	/*
> +	 * Update these _without_ grabbing the binder lock since we might be
> +	 * about to return an error.
> +	 */
> +	mutex_lock(&proc->threads_lock);
>  	if (wait_for_proc_work)
>  		proc->ready_threads--;
> -	thread->looper &= ~BINDER_LOOPER_STATE_WAITING;
> +	mutex_unlock(&proc->threads_lock);
> +	atomic_and(~BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +
> +	if (ret)
> +		return ret;
>
> +	ret = binder_lock_interruptible(__func__);
>  	if (ret)
>  		return ret;
> +	*locked = true;
>
>  	while (1) {
>  		uint32_t cmd;
> @@ -2743,7 +2771,7 @@ static int binder_thread_read(struct binder_proc *proc,
>  		} else {
>  			/* no data added */
>  			if (ptr - buffer == 4 &&
> -			    !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN))
> +			    !(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN))
>  				goto retry;
>  			break;
>  		}
> @@ -2957,19 +2985,23 @@ static int binder_thread_read(struct binder_proc *proc,
>  done:
>
>  	*consumed = ptr - buffer;
> +	mutex_lock(&proc->threads_lock);
>  	if (proc->requested_threads + proc->ready_threads == 0 &&
>  	    proc->requested_threads_started < proc->max_threads &&
> -	    (thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> +	    (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
>  	     BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */
>  	     /*spawn a new thread if we leave this out */) {
>  		proc->requested_threads++;
>  		binder_debug(BINDER_DEBUG_THREADS,
>  			     "%d:%d BR_SPAWN_LOOPER\n",
>  			     proc->pid, thread->pid);
> -		if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer))
> +		if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) {
> +			mutex_unlock(&proc->threads_lock);
>  			return -EFAULT;
> +		}
>  		binder_stat_br(proc, thread, BR_SPAWN_LOOPER);
>  	}
> +	mutex_unlock(&proc->threads_lock);
>  	return 0;
>  }
>
> @@ -3051,7 +3083,7 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
>  		INIT_LIST_HEAD(&thread->todo);
>  		rb_link_node(&thread->rb_node, parent, p);
>  		rb_insert_color(&thread->rb_node, &proc->threads);
> -		thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> +		atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
>  		thread->return_error = BR_OK;
>  		thread->return_error2 = BR_OK;
>  	}
> @@ -3133,7 +3165,8 @@ static unsigned int binder_poll(struct file *filp,
>
>  static int binder_ioctl_write_read(struct file *filp,
>  				unsigned int cmd, unsigned long arg,
> -				struct binder_thread *thread)
> +				struct binder_thread *thread,
> +				bool *locked)
>  {
>  	int ret = 0;
>  	struct binder_proc *proc = filp->private_data;
> @@ -3172,7 +3205,7 @@ static int binder_ioctl_write_read(struct file *filp,
>  		ret = binder_thread_read(proc, thread, bwr.read_buffer,
>  					 bwr.read_size,
>  					 &bwr.read_consumed,
> -					 filp->f_flags & O_NONBLOCK);
> +					 filp->f_flags & O_NONBLOCK, locked);
>  		trace_binder_read_done(ret);
>  		if (!list_empty(&proc->todo))
>  			wake_up_interruptible(&proc->wait);
> @@ -3243,6 +3276,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	struct binder_thread *thread;
>  	unsigned int size = _IOC_SIZE(cmd);
>  	void __user *ubuf = (void __user *)arg;
> +	bool locked;
>
>  	/*pr_info("binder_ioctl: %d:%d %x %lx\n",
>  			proc->pid, current->pid, cmd, arg);*/
> @@ -3258,6 +3292,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		goto err_unlocked;
>
>  	binder_lock(__func__);
> +	locked = true;
>  	thread = binder_get_thread(proc);
>  	if (thread == NULL) {
>  		ret = -ENOMEM;
>  		goto err;
> @@ -3266,15 +3301,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>
>  	switch (cmd) {
>  	case BINDER_WRITE_READ:
> -		ret = binder_ioctl_write_read(filp, cmd, arg, thread);
> +		ret = binder_ioctl_write_read(filp, cmd, arg, thread, &locked);
>  		if (ret)
>  			goto err;
>  		break;
>  	case BINDER_SET_MAX_THREADS:
> +		mutex_lock(&proc->threads_lock);
>  		if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) {
>  			ret = -EINVAL;
> +			mutex_unlock(&proc->threads_lock);
>  			goto err;
>  		}
> +		mutex_unlock(&proc->threads_lock);
>  		break;
>  	case BINDER_SET_CONTEXT_MGR:
>  		ret = binder_ioctl_set_ctx_mgr(filp);
> @@ -3308,8 +3346,9 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  	ret = 0;
>  err:
>  	if (thread)
> -		thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN;
> -	binder_unlock(__func__);
> +		atomic_and(~BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> +	if (locked)
> +		binder_unlock(__func__);
>  	wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
>  	if (ret && ret != -ERESTARTSYS)
>  		pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret);
> @@ -3473,6 +3512,7 @@ static int binder_open(struct inode *nodp, struct file *filp)
>  	binder_dev = container_of(filp->private_data, struct binder_device,
>  				  miscdev);
>  	proc->context = &binder_dev->context;
> +	mutex_init(&proc->threads_lock);
>
>  	binder_lock(__func__);
>
> @@ -3521,8 +3561,9 @@ static void binder_deferred_flush(struct binder_proc *proc)
>  	for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) {
>  		struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node);
>
> -		thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> -		if (thread->looper & BINDER_LOOPER_STATE_WAITING) {
> +		atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> +		if (atomic_read(&thread->looper) &
> +		    BINDER_LOOPER_STATE_WAITING) {
>  			wake_up_interruptible(&thread->wait);
>  			wake_count++;
>  		}
> @@ -3827,7 +3868,8 @@ static void print_binder_thread(struct seq_file *m,
>  	size_t start_pos = m->count;
>  	size_t header_pos;
>
> -	seq_printf(m, "  thread %d: l %02x\n", thread->pid, thread->looper);
> +	seq_printf(m, "  thread %d: l %02x\n", thread->pid,
> +		   atomic_read(&thread->looper));
>  	header_pos = m->count;
>  	t = thread->transaction_stack;
>  	while (t) {
> @@ -4019,6 +4061,8 @@ static void print_binder_proc_stats(struct seq_file *m,
>  	struct rb_node *n;
>  	int count, strong, weak;
>
> +	mutex_lock(&proc->threads_lock);
> +
>  	seq_printf(m, "proc %d\n", proc->pid);
>  	seq_printf(m, "context %s\n", proc->context->name);
>  	count = 0;
> @@ -4064,6 +4108,8 @@ static void print_binder_proc_stats(struct seq_file *m,
>  	seq_printf(m, "  pending transactions: %d\n", count);
>
>  	print_binder_stats(m, "  ", &proc->stats);
> +
> +	mutex_unlock(&proc->threads_lock);
>  }
>

I think the real problem is that we get stuck in kernel space waiting for a mutex lock. It should use mutex_lock_interruptible.
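
For reference, the shape of that suggestion (a sketch only; the mutex name
is an invented stand-in for binder_main_lock, and this mirrors the
binder_lock_interruptible() helper the patch itself adds):

#include <linux/mutex.h>

static DEFINE_MUTEX(example_main_lock); /* stand-in for binder_main_lock */

/* Fail the lock attempt when a signal (such as an OOM kill) is pending
 * instead of sleeping uninterruptibly, so the task can back out of the
 * syscall and exit. */
static int __must_check example_lock_interruptible(void)
{
        if (mutex_lock_interruptible(&example_main_lock))
                return -ERESTARTSYS;    /* interrupted while waiting */
        return 0;
}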