2017-03-31 17:54:31

by Doug Anderson

Subject: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Sometimes when we're out of memory the OOM killer decides to kill a
process that's in binder_thread_read(). If we happen to be waiting
for work we'll get the kill signal and wake up. That's good. ...but
then we try to grab the binder lock before we return. That's bad.

The problem is that someone else might be holding the one true global
binder lock. If that one other process is blocked then we can't
finish exiting. In the worst case, the other process might be blocked
waiting for memory. In that case we'll have a really hard time
exiting.

On older kernels that don't have the OOM reaper (or something
similar), like kernel 4.4, this is a really big problem and we end up
with a simple deadlock because:
* Once we pick a process to OOM kill we won't pick another--we first
wait for the process we picked to die. The reasoning is that we've
given the doomed process access to special memory pools so it can
quit quickly and we don't have special pool memory to go around.
* We don't have any type of "special access donation" that would give
the mutex holder our special access.

On kernel 4.4 w/ binder patches, we easily see this happen:

We just attempted this OOM kill:
Killed process 4132 (Binder_1) total-vm:949904kB, anon-rss:4kB, file-rss:0kB

The doomed task:
Stack traceback for pid 4132
4132 3380 0 0 D Binder_1
Call trace:
__switch_to+0x9c/0xa8
__schedule+0x504/0x740
schedule+0x88/0xa8
schedule_preempt_disabled+0x28/0x44
__mutex_lock_slowpath+0xf8/0x1a4
mutex_lock+0x4c/0x68
binder_thread_read+0x68c/0x11bc
binder_ioctl_write_read.constprop.46+0x1e8/0x318
binder_ioctl+0x370/0x778
compat_SyS_ioctl+0x134/0x10ac
el0_svc_naked+0x24/0x28

The binder lock holder:
4001 3380 0 4 R Binder_7
Call trace:
__switch_to+0x9c/0xa8
__schedule+0x504/0x740
preempt_schedule_common+0x28/0x48
preempt_schedule+0x24/0x2c
_raw_spin_unlock+0x44/0x50
list_lru_count_one+0x40/0x50
super_cache_count+0x68/0xa0
shrink_slab.part.54+0xfc/0x4a4
shrink_zone+0xa8/0x198
try_to_free_pages+0x408/0x590
__alloc_pages_nodemask+0x60c/0x95c
__read_swap_cache_async+0x98/0x214
read_swap_cache_async+0x48/0x7c
swapin_readahead+0x188/0x1b8
handle_mm_fault+0xbf8/0x1050
do_page_fault+0x140/0x2dc
do_mem_abort+0x64/0xd8
Exception stack: < ... cut ... >
el1_da+0x18/0x78
binder_ioctl+0x370/0x778
compat_SyS_ioctl+0x134/0x10ac
el0_svc_naked+0x24/0x28

There's really not a lot of reason to grab the binder lock when we're
just going to exit anyway. Add a little bit of complexity to the code
to allow binder_thread_read() to sometimes return without the lock
held.

NOTE: to do this safely we need to make sure that we can safely do
these things _without_ the global binder lock:
* Muck with proc->ready_threads
* Clear a bitfield in thread->looper

It appears that those two operations don't need to be done together
and it's OK to have different locking solutions for the two. Thus:

1. We change thread->looper to atomic_t and use atomic ops to muck
with it. This means we're not clobbering someone else's work with
our read/modify/write.

Note that I haven't confirmed that every modify of "thread->looper"
can be done without the binder mutex or without some
coarser-grained lock. ...but clearing the
BINDER_LOOPER_STATE_WAITING bit should be fine. The only place
looking at it is binder_deferred_flush(). Also: while erroring out
we also may clear BINDER_LOOPER_STATE_NEED_RETURN. This looks to
be OK, though the code isn't trivial to follow.

2. We add a new fine-grained mutex (per "proc" structure) to guard all
of the "_threads" counters. Technically the only value we're
modifying is "proc->ready_threads" and we're just decrementing it
(so maybe we could use atomic_t here, too?). ...but it appears
that all of the "_threads" work together so protecting them
together seems to make the most sense.

Note that to avoid deadlock:
* We never first grab the fine grained mutex and _then_ the binder
mutex.
* We always grab the fine grained mutex for short periods and never
do anything blocking while holding it.

To sum it all up:

1. This patch does improve the chances that we will be able to kill a
task quickly when we want to. That means this patch has some
merit.
2. Though this patch has merit, it isn't actually all that
critical because:
2a) On newer kernels the OOM reaper should keep us from getting
deadlocked.
2b) Even on old kernels there are other deadlock cases we hit even
if we fix binder in this way.

Signed-off-by: Douglas Anderson <[email protected]>
---
It's unclear to me if we really want this patch since, as per the
description, it's not all that critical. I also don't personally have
enough context with Binder to prove that every change it makes is
guaranteed not to screw something up.

I post it in case someone is interested or someone who knows the code
well can say "yes, this is right and good". :)

This patch was tested against the Chrome OS 4.4 kernel. It was only
compile-tested against upstream since I don't have binder up and
running there.

drivers/android/binder.c | 116 +++++++++++++++++++++++++++++++++--------------
1 file changed, 81 insertions(+), 35 deletions(-)

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index aae4d8d4be36..220b74b7100d 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -18,6 +18,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <asm/cacheflush.h>
+#include <linux/atomic.h>
#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/freezer.h>
@@ -347,10 +348,13 @@ struct binder_proc {
wait_queue_head_t wait;
struct binder_stats stats;
struct list_head delivered_death;
+
int max_threads;
int requested_threads;
int requested_threads_started;
int ready_threads;
+ struct mutex threads_lock;
+
long default_priority;
struct dentry *debugfs_entry;
struct binder_context *context;
@@ -369,7 +373,7 @@ struct binder_thread {
struct binder_proc *proc;
struct rb_node rb_node;
int pid;
- int looper;
+ atomic_t looper;
struct binder_transaction *transaction_stack;
struct list_head todo;
uint32_t return_error; /* Write failed, return error code in read buf */
@@ -458,6 +462,15 @@ static inline void binder_lock(const char *tag)
trace_binder_locked(tag);
}

+static inline int __must_check binder_lock_interruptible(const char *tag)
+{
+ trace_binder_lock(tag);
+ if (mutex_lock_interruptible(&binder_main_lock))
+ return -ERESTARTSYS;
+ trace_binder_locked(tag);
+ return 0;
+}
+
static inline void binder_unlock(const char *tag)
{
trace_binder_unlock(tag);
@@ -2450,39 +2463,41 @@ static int binder_thread_write(struct binder_proc *proc,
}

case BC_REGISTER_LOOPER:
+ mutex_lock(&proc->threads_lock);
binder_debug(BINDER_DEBUG_THREADS,
"%d:%d BC_REGISTER_LOOPER\n",
proc->pid, thread->pid);
- if (thread->looper & BINDER_LOOPER_STATE_ENTERED) {
- thread->looper |= BINDER_LOOPER_STATE_INVALID;
+ if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_ENTERED) {
+ atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n",
proc->pid, thread->pid);
} else if (proc->requested_threads == 0) {
- thread->looper |= BINDER_LOOPER_STATE_INVALID;
+ atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called without request\n",
proc->pid, thread->pid);
} else {
proc->requested_threads--;
proc->requested_threads_started++;
}
- thread->looper |= BINDER_LOOPER_STATE_REGISTERED;
+ atomic_or(BINDER_LOOPER_STATE_REGISTERED, &thread->looper);
+ mutex_unlock(&proc->threads_lock);
break;
case BC_ENTER_LOOPER:
binder_debug(BINDER_DEBUG_THREADS,
"%d:%d BC_ENTER_LOOPER\n",
proc->pid, thread->pid);
- if (thread->looper & BINDER_LOOPER_STATE_REGISTERED) {
- thread->looper |= BINDER_LOOPER_STATE_INVALID;
+ if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_REGISTERED) {
+ atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
binder_user_error("%d:%d ERROR: BC_ENTER_LOOPER called after BC_REGISTER_LOOPER\n",
proc->pid, thread->pid);
}
- thread->looper |= BINDER_LOOPER_STATE_ENTERED;
+ atomic_or(BINDER_LOOPER_STATE_ENTERED, &thread->looper);
break;
case BC_EXIT_LOOPER:
binder_debug(BINDER_DEBUG_THREADS,
"%d:%d BC_EXIT_LOOPER\n",
proc->pid, thread->pid);
- thread->looper |= BINDER_LOOPER_STATE_EXITED;
+ atomic_or(BINDER_LOOPER_STATE_EXITED, &thread->looper);
break;

case BC_REQUEST_DEATH_NOTIFICATION:
@@ -2538,7 +2553,7 @@ static int binder_thread_write(struct binder_proc *proc,
ref->death = death;
if (ref->node->proc == NULL) {
ref->death->work.type = BINDER_WORK_DEAD_BINDER;
- if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
+ if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
list_add_tail(&ref->death->work.entry, &thread->todo);
} else {
list_add_tail(&ref->death->work.entry, &proc->todo);
@@ -2562,7 +2577,7 @@ static int binder_thread_write(struct binder_proc *proc,
ref->death = NULL;
if (list_empty(&death->work.entry)) {
death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
- if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
+ if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
list_add_tail(&death->work.entry, &thread->todo);
} else {
list_add_tail(&death->work.entry, &proc->todo);
@@ -2604,7 +2619,7 @@ static int binder_thread_write(struct binder_proc *proc,
list_del_init(&death->work.entry);
if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) {
death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
- if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
+ if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
list_add_tail(&death->work.entry, &thread->todo);
} else {
list_add_tail(&death->work.entry, &proc->todo);
@@ -2638,19 +2653,20 @@ static int binder_has_proc_work(struct binder_proc *proc,
struct binder_thread *thread)
{
return !list_empty(&proc->todo) ||
- (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
+ (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
}

static int binder_has_thread_work(struct binder_thread *thread)
{
return !list_empty(&thread->todo) || thread->return_error != BR_OK ||
- (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
+ (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
}

static int binder_thread_read(struct binder_proc *proc,
struct binder_thread *thread,
binder_uintptr_t binder_buffer, size_t size,
- binder_size_t *consumed, int non_block)
+ binder_size_t *consumed, int non_block,
+ bool *locked)
{
void __user *buffer = (void __user *)(uintptr_t)binder_buffer;
void __user *ptr = buffer + *consumed;
@@ -2687,21 +2703,24 @@ static int binder_thread_read(struct binder_proc *proc,
goto done;
}

-
- thread->looper |= BINDER_LOOPER_STATE_WAITING;
+ atomic_or(BINDER_LOOPER_STATE_WAITING, &thread->looper);
+ mutex_lock(&proc->threads_lock);
if (wait_for_proc_work)
proc->ready_threads++;
+ mutex_unlock(&proc->threads_lock);

binder_unlock(__func__);
+ *locked = false;

trace_binder_wait_for_work(wait_for_proc_work,
!!thread->transaction_stack,
!list_empty(&thread->todo));
if (wait_for_proc_work) {
- if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
- BINDER_LOOPER_STATE_ENTERED))) {
+ if (!(atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
+ BINDER_LOOPER_STATE_ENTERED))) {
binder_user_error("%d:%d ERROR: Thread waiting for process work before calling BC_REGISTER_LOOPER or BC_ENTER_LOOPER (state %x)\n",
- proc->pid, thread->pid, thread->looper);
+ proc->pid, thread->pid,
+ atomic_read(&thread->looper));
wait_event_interruptible(binder_user_error_wait,
binder_stop_on_user_error < 2);
}
@@ -2719,14 +2738,23 @@ static int binder_thread_read(struct binder_proc *proc,
ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
}

- binder_lock(__func__);
-
+ /*
+ * Update these _without_ grabbing the binder lock since we might be
+ * about to return an error.
+ */
+ mutex_lock(&proc->threads_lock);
if (wait_for_proc_work)
proc->ready_threads--;
- thread->looper &= ~BINDER_LOOPER_STATE_WAITING;
+ mutex_unlock(&proc->threads_lock);
+ atomic_and(~BINDER_LOOPER_STATE_WAITING, &thread->looper);
+
+ if (ret)
+ return ret;

+ ret = binder_lock_interruptible(__func__);
if (ret)
return ret;
+ *locked = true;

while (1) {
uint32_t cmd;
@@ -2743,7 +2771,7 @@ static int binder_thread_read(struct binder_proc *proc,
} else {
/* no data added */
if (ptr - buffer == 4 &&
- !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN))
+ !(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN))
goto retry;
break;
}
@@ -2957,19 +2985,23 @@ static int binder_thread_read(struct binder_proc *proc,
done:

*consumed = ptr - buffer;
+ mutex_lock(&proc->threads_lock);
if (proc->requested_threads + proc->ready_threads == 0 &&
proc->requested_threads_started < proc->max_threads &&
- (thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
+ (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */
/*spawn a new thread if we leave this out */) {
proc->requested_threads++;
binder_debug(BINDER_DEBUG_THREADS,
"%d:%d BR_SPAWN_LOOPER\n",
proc->pid, thread->pid);
- if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer))
+ if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) {
+ mutex_unlock(&proc->threads_lock);
return -EFAULT;
+ }
binder_stat_br(proc, thread, BR_SPAWN_LOOPER);
}
+ mutex_unlock(&proc->threads_lock);
return 0;
}

@@ -3051,7 +3083,7 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
INIT_LIST_HEAD(&thread->todo);
rb_link_node(&thread->rb_node, parent, p);
rb_insert_color(&thread->rb_node, &proc->threads);
- thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
+ atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
thread->return_error = BR_OK;
thread->return_error2 = BR_OK;
}
@@ -3133,7 +3165,8 @@ static unsigned int binder_poll(struct file *filp,

static int binder_ioctl_write_read(struct file *filp,
unsigned int cmd, unsigned long arg,
- struct binder_thread *thread)
+ struct binder_thread *thread,
+ bool *locked)
{
int ret = 0;
struct binder_proc *proc = filp->private_data;
@@ -3172,7 +3205,7 @@ static int binder_ioctl_write_read(struct file *filp,
ret = binder_thread_read(proc, thread, bwr.read_buffer,
bwr.read_size,
&bwr.read_consumed,
- filp->f_flags & O_NONBLOCK);
+ filp->f_flags & O_NONBLOCK, locked);
trace_binder_read_done(ret);
if (!list_empty(&proc->todo))
wake_up_interruptible(&proc->wait);
@@ -3243,6 +3276,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
struct binder_thread *thread;
unsigned int size = _IOC_SIZE(cmd);
void __user *ubuf = (void __user *)arg;
+ bool locked;

/*pr_info("binder_ioctl: %d:%d %x %lx\n",
proc->pid, current->pid, cmd, arg);*/
@@ -3258,6 +3292,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
goto err_unlocked;

binder_lock(__func__);
+ locked = true;
thread = binder_get_thread(proc);
if (thread == NULL) {
ret = -ENOMEM;
@@ -3266,15 +3301,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)

switch (cmd) {
case BINDER_WRITE_READ:
- ret = binder_ioctl_write_read(filp, cmd, arg, thread);
+ ret = binder_ioctl_write_read(filp, cmd, arg, thread, &locked);
if (ret)
goto err;
break;
case BINDER_SET_MAX_THREADS:
+ mutex_lock(&proc->threads_lock);
if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) {
ret = -EINVAL;
+ mutex_unlock(&proc->threads_lock);
goto err;
}
+ mutex_unlock(&proc->threads_lock);
break;
case BINDER_SET_CONTEXT_MGR:
ret = binder_ioctl_set_ctx_mgr(filp);
@@ -3308,8 +3346,9 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
ret = 0;
err:
if (thread)
- thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN;
- binder_unlock(__func__);
+ atomic_and(~BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
+ if (locked)
+ binder_unlock(__func__);
wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
if (ret && ret != -ERESTARTSYS)
pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret);
@@ -3473,6 +3512,7 @@ static int binder_open(struct inode *nodp, struct file *filp)
binder_dev = container_of(filp->private_data, struct binder_device,
miscdev);
proc->context = &binder_dev->context;
+ mutex_init(&proc->threads_lock);

binder_lock(__func__);

@@ -3521,8 +3561,9 @@ static void binder_deferred_flush(struct binder_proc *proc)
for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) {
struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node);

- thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
- if (thread->looper & BINDER_LOOPER_STATE_WAITING) {
+ atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
+ if (atomic_read(&thread->looper) &
+ BINDER_LOOPER_STATE_WAITING) {
wake_up_interruptible(&thread->wait);
wake_count++;
}
@@ -3827,7 +3868,8 @@ static void print_binder_thread(struct seq_file *m,
size_t start_pos = m->count;
size_t header_pos;

- seq_printf(m, " thread %d: l %02x\n", thread->pid, thread->looper);
+ seq_printf(m, " thread %d: l %02x\n", thread->pid,
+ atomic_read(&thread->looper));
header_pos = m->count;
t = thread->transaction_stack;
while (t) {
@@ -4019,6 +4061,8 @@ static void print_binder_proc_stats(struct seq_file *m,
struct rb_node *n;
int count, strong, weak;

+ mutex_lock(&proc->threads_lock);
+
seq_printf(m, "proc %d\n", proc->pid);
seq_printf(m, "context %s\n", proc->context->name);
count = 0;
@@ -4064,6 +4108,8 @@ static void print_binder_proc_stats(struct seq_file *m,
seq_printf(m, " pending transactions: %d\n", count);

print_binder_stats(m, " ", &proc->stats);
+
+ mutex_unlock(&proc->threads_lock);
}


--
2.12.2.564.g063fe858b8-goog


2017-03-31 19:30:01

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On Fri, Mar 31, 2017 at 10:53:41AM -0700, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read(). If we happen to be waiting
> for work we'll get the kill signal and wake up. That's good. ...but
> then we try to grab the binder lock before we return. That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock. If that one other process is blocked then we can't
> finish exiting. In the worst case, the other process might be blocked
> waiting for memory. In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
> wait for the process we picked to die. The reasoning is that we've
> given the doomed process access to special memory pools so it can
> quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
> the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:

<snip>

How does your change interact with the recent "break up the binder big
lock" patchset:
https://android-review.googlesource.com/#/c/354698/

Have you tried that series out to see if it helps out any?

thanks,

greg k-h

2017-03-31 21:00:17

by Doug Anderson

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Hi,

On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <[email protected]> wrote:
> On Fri, Mar 31, 2017 at 10:53:41AM -0700, Douglas Anderson wrote:
>> Sometimes when we're out of memory the OOM killer decides to kill a
>> process that's in binder_thread_read(). If we happen to be waiting
>> for work we'll get the kill signal and wake up. That's good. ...but
>> then we try to grab the binder lock before we return. That's bad.
>>
>> The problem is that someone else might be holding the one true global
>> binder lock. If that one other process is blocked then we can't
>> finish exiting. In the worst case, the other process might be blocked
>> waiting for memory. In that case we'll have a really hard time
>> exiting.
>>
>> On older kernels that don't have the OOM reaper (or something
>> similar), like kernel 4.4, this is a really big problem and we end up
>> with a simple deadlock because:
>> * Once we pick a process to OOM kill we won't pick another--we first
>> wait for the process we picked to die. The reasoning is that we've
>> given the doomed process access to special memory pools so it can
>> quit quickly and we don't have special pool memory to go around.
>> * We don't have any type of "special access donation" that would give
>> the mutex holder our special access.
>>
>> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> <snip>
>
> How does your change interact with the recent "break up the binder big
> lock" patchset:
> https://android-review.googlesource.com/#/c/354698/
>
> Have you tried that series out to see if it helps out any?

I wasn't aware of that patchset. Someone else on my team mentioned
that fine-grained locking was being worked on but I didn't know
patches were actually posted... Probably it makes sense to just drop
my patch, then. It was only making things marginally better even on
kernel 4.4 because I would just hit the next task that would refuse to
quit for a non-binder related reason. :(

BTW: I presume that nobody has decided that it would be a wise idea to
pick the OOM reaper code back to any stable trees? It seemed a bit
too scary to me, so I wrote a dumber (but easier to backport) solution
that avoided the deadlocks I was seeing. http://crosreview.com/465189
and the 3 patches above it in case anyone else stumbles on this thread
and is curious.


-Doug

2017-04-01 06:49:12

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <[email protected]> wrote:
> BTW: I presume that nobody has decided that it would be a wise idea to
> pick the OOM reaper code back to any stable trees? It seemed a bit
> too scary to me, so I wrote a dumber (but easier to backport) solution
> that avoided the deadlocks I was seeing. http://crosreview.com/465189
> and the 3 patches above it in case anyone else stumbles on this thread
> and is curious.

What specific upstream OOM patches are you referring to? I'm always
glad to review patches for stable kernels, just email
[email protected] the git commit ids and we can take it from there.

thanks,

greg k-h

2017-04-02 02:34:58

by Doug Anderson

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Hi,

On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <[email protected]> wrote:
> On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
>> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <[email protected]> wrote:
>> BTW: I presume that nobody has decided that it would be a wise idea to
>> pick the OOM reaper code back to any stable trees? It seemed a bit
>> too scary to me, so I wrote a dumber (but easier to backport) solution
>> that avoided the deadlocks I was seeing. http://crosreview.com/465189
>> and the 3 patches above it in case anyone else stumbles on this thread
>> and is curious.
>
> What specific upstream OOM patches are you referring to? I'm always
> glad to review patches for stable kernels, just email
> [email protected] the git commit ids and we can take it from there.

+stable

I was wondering about the concept of porting the OOM Reaper back to
older kernels. The OOM reaper was originally introduced in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855

Basically the problem described in that patch exists in many older
kernels and I've certainly seen crashes related to this in 3.10, but I
believe older kernels see the same problems too.

Personally I wouldn't know exactly which patches were important to
backport and how far to go. One could arbitrarily try to backport up
to 4.6.7 (since 4.6 was the first kernel to really have the OOM
reaper) and ignore all the reaper fixes that landed since then. This
would probably be doable for kernel 4.4, though if anyone was trying
to support older kernels it might get harder.


-Doug

2017-04-03 13:25:19

by Greg Kroah-Hartman

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On Sat, Apr 01, 2017 at 07:34:53PM -0700, Doug Anderson wrote:
> Hi,
>
> On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <[email protected]> wrote:
> > On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
> >> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <[email protected]> wrote:
> >> BTW: I presume that nobody has decided that it would be a wise idea to
> >> pick the OOM reaper code back to any stable trees? It seemed a bit
> >> too scary to me, so I wrote a dumber (but easier to backport) solution
> >> that avoided the deadlocks I was seeing. http://crosreview.com/465189
> >> and the 3 patches above it in case anyone else stumbles on this thread
> >> and is curious.
> >
> > What specific upstream OOM patches are you referring to? I'm always
> > glad to review patches for stable kernels, just email
> > [email protected] the git commit ids and we can take it from there.
>
> +stable
>
> I was wondering about the concept of porting the OOM Reaper back to
> older kernels. The OOM reaper was originally introduced in:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855
>
> Basically the problem described in that patch exists in many older
> kernels and I've certainly seen crashes related to this in 3.10, but I
> believe older kernels see the same problems too.
>
> Personally I wouldn't know exactly which patches were important to
> backport and how far to go. One could arbitrarily try to backport up
> to 4.6.7 (since 4.6 was the first kernel to really have the OOM
> reaper) and ignore all the reaper fixes that landed since then. This
> would probably be doable for kernel 4.4, though if anyone was trying
> to support older kernels it might get harder.

Well, I would need someone to give me a list of commits, and actually
test it to see if it is something that people use/want before I can
queue anything up for a stable release...

{hint}

thanks,

greg k-h

2017-04-03 13:48:11

by peter enderborg

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On 03/31/2017 07:53 PM, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read(). If we happen to be waiting
> for work we'll get the kill signal and wake up. That's good. ...but
> then we try to grab the binder lock before we return. That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock. If that one other process is blocked then we can't
> finish exiting. In the worst case, the other process might be blocked
> waiting for memory. In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
> wait for the process we picked to die. The reasoning is that we've
> given the doomed process access to special memory pools so it can
> quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
> the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:

<snip>
> }
>
> case BC_REGISTER_LOOPER:
> + mutex_lock(&proc->threads_lock);
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_REGISTER_LOOPER\n",
> proc->pid, thread->pid);
> - if (thread->looper & BINDER_LOOPER_STATE_ENTERED) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_ENTERED) {
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n",
> proc->pid, thread->pid);
> } else if (proc->requested_threads == 0) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called without request\n",
> proc->pid, thread->pid);
> } else {
> proc->requested_threads--;
> proc->requested_threads_started++;
> }
> - thread->looper |= BINDER_LOOPER_STATE_REGISTERED;
> + atomic_or(BINDER_LOOPER_STATE_REGISTERED, &thread->looper);
> + mutex_unlock(&proc->threads_lock);
> break;
> case BC_ENTER_LOOPER:
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_ENTER_LOOPER\n",
> proc->pid, thread->pid);
> - if (thread->looper & BINDER_LOOPER_STATE_REGISTERED) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_REGISTERED) {
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_ENTER_LOOPER called after BC_REGISTER_LOOPER\n",
> proc->pid, thread->pid);
> }
> - thread->looper |= BINDER_LOOPER_STATE_ENTERED;
> + atomic_or(BINDER_LOOPER_STATE_ENTERED, &thread->looper);
> break;
> case BC_EXIT_LOOPER:
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_EXIT_LOOPER\n",
> proc->pid, thread->pid);
> - thread->looper |= BINDER_LOOPER_STATE_EXITED;
> + atomic_or(BINDER_LOOPER_STATE_EXITED, &thread->looper);
> break;
>
> case BC_REQUEST_DEATH_NOTIFICATION:
> @@ -2538,7 +2553,7 @@ static int binder_thread_write(struct binder_proc *proc,
> ref->death = death;
> if (ref->node->proc == NULL) {
> ref->death->work.type = BINDER_WORK_DEAD_BINDER;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&ref->death->work.entry, &thread->todo);
> } else {
> list_add_tail(&ref->death->work.entry, &proc->todo);
> @@ -2562,7 +2577,7 @@ static int binder_thread_write(struct binder_proc *proc,
> ref->death = NULL;
> if (list_empty(&death->work.entry)) {
> death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&death->work.entry, &thread->todo);
> } else {
> list_add_tail(&death->work.entry, &proc->todo);
> @@ -2604,7 +2619,7 @@ static int binder_thread_write(struct binder_proc *proc,
> list_del_init(&death->work.entry);
> if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) {
> death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&death->work.entry, &thread->todo);
> } else {
> list_add_tail(&death->work.entry, &proc->todo);
> @@ -2638,19 +2653,20 @@ static int binder_has_proc_work(struct binder_proc *proc,
> struct binder_thread *thread)
> {
> return !list_empty(&proc->todo) ||
> - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> + (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
> }
>
> static int binder_has_thread_work(struct binder_thread *thread)
> {
> return !list_empty(&thread->todo) || thread->return_error != BR_OK ||
> - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> + (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
> }
>
> static int binder_thread_read(struct binder_proc *proc,
> struct binder_thread *thread,
> binder_uintptr_t binder_buffer, size_t size,
> - binder_size_t *consumed, int non_block)
> + binder_size_t *consumed, int non_block,
> + bool *locked)
> {
> void __user *buffer = (void __user *)(uintptr_t)binder_buffer;
> void __user *ptr = buffer + *consumed;
> @@ -2687,21 +2703,24 @@ static int binder_thread_read(struct binder_proc *proc,
> goto done;
> }
>
> -
> - thread->looper |= BINDER_LOOPER_STATE_WAITING;
> + atomic_or(BINDER_LOOPER_STATE_WAITING, &thread->looper);
> + mutex_lock(&proc->threads_lock);
> if (wait_for_proc_work)
> proc->ready_threads++;
> + mutex_unlock(&proc->threads_lock);
>
> binder_unlock(__func__);
> + *locked = false;
>
> trace_binder_wait_for_work(wait_for_proc_work,
> !!thread->transaction_stack,
> !list_empty(&thread->todo));
> if (wait_for_proc_work) {
> - if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> - BINDER_LOOPER_STATE_ENTERED))) {
> + if (!(atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> + BINDER_LOOPER_STATE_ENTERED))) {
> binder_user_error("%d:%d ERROR: Thread waiting for process work before calling BC_REGISTER_LOOPER or BC_ENTER_LOOPER (state %x)\n",
> - proc->pid, thread->pid, thread->looper);
> + proc->pid, thread->pid,
> + atomic_read(&thread->looper));
> wait_event_interruptible(binder_user_error_wait,
> binder_stop_on_user_error < 2);
> }
> @@ -2719,14 +2738,23 @@ static int binder_thread_read(struct binder_proc *proc,
> ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
> }
>
> - binder_lock(__func__);
> -
> + /*
> + * Update these _without_ grabbing the binder lock since we might be
> + * about to return an error.
> + */
> + mutex_lock(&proc->threads_lock);
> if (wait_for_proc_work)
> proc->ready_threads--;
> - thread->looper &= ~BINDER_LOOPER_STATE_WAITING;
> + mutex_unlock(&proc->threads_lock);
> + atomic_and(~BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +
> + if (ret)
> + return ret;
>
> + ret = binder_lock_interruptible(__func__);
> if (ret)
> return ret;
> + *locked = true;
>
> while (1) {
> uint32_t cmd;
> @@ -2743,7 +2771,7 @@ static int binder_thread_read(struct binder_proc *proc,
> } else {
> /* no data added */
> if (ptr - buffer == 4 &&
> - !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN))
> + !(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN))
> goto retry;
> break;
> }
> @@ -2957,19 +2985,23 @@ static int binder_thread_read(struct binder_proc *proc,
> done:
>
> *consumed = ptr - buffer;
> + mutex_lock(&proc->threads_lock);
> if (proc->requested_threads + proc->ready_threads == 0 &&
> proc->requested_threads_started < proc->max_threads &&
> - (thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> + (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */
> /*spawn a new thread if we leave this out */) {
> proc->requested_threads++;
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BR_SPAWN_LOOPER\n",
> proc->pid, thread->pid);
> - if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer))
> + if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) {
> + mutex_unlock(&proc->threads_lock);
> return -EFAULT;
> + }
> binder_stat_br(proc, thread, BR_SPAWN_LOOPER);
> }
> + mutex_unlock(&proc->threads_lock);
> return 0;
> }
>
> @@ -3051,7 +3083,7 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
> INIT_LIST_HEAD(&thread->todo);
> rb_link_node(&thread->rb_node, parent, p);
> rb_insert_color(&thread->rb_node, &proc->threads);
> - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> + atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> thread->return_error = BR_OK;
> thread->return_error2 = BR_OK;
> }
> @@ -3133,7 +3165,8 @@ static unsigned int binder_poll(struct file *filp,
>
> static int binder_ioctl_write_read(struct file *filp,
> unsigned int cmd, unsigned long arg,
> - struct binder_thread *thread)
> + struct binder_thread *thread,
> + bool *locked)
> {
> int ret = 0;
> struct binder_proc *proc = filp->private_data;
> @@ -3172,7 +3205,7 @@ static int binder_ioctl_write_read(struct file *filp,
> ret = binder_thread_read(proc, thread, bwr.read_buffer,
> bwr.read_size,
> &bwr.read_consumed,
> - filp->f_flags & O_NONBLOCK);
> + filp->f_flags & O_NONBLOCK, locked);
> trace_binder_read_done(ret);
> if (!list_empty(&proc->todo))
> wake_up_interruptible(&proc->wait);
> @@ -3243,6 +3276,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> struct binder_thread *thread;
> unsigned int size = _IOC_SIZE(cmd);
> void __user *ubuf = (void __user *)arg;
> + bool locked;
>
> /*pr_info("binder_ioctl: %d:%d %x %lx\n",
> proc->pid, current->pid, cmd, arg);*/
> @@ -3258,6 +3292,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> goto err_unlocked;
>
> binder_lock(__func__);
> + locked = true;
> thread = binder_get_thread(proc);
> if (thread == NULL) {
> ret = -ENOMEM;
> @@ -3266,15 +3301,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>
> switch (cmd) {
> case BINDER_WRITE_READ:
> - ret = binder_ioctl_write_read(filp, cmd, arg, thread);
> + ret = binder_ioctl_write_read(filp, cmd, arg, thread, &locked);
> if (ret)
> goto err;
> break;
> case BINDER_SET_MAX_THREADS:
> + mutex_lock(&proc->threads_lock);
> if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) {
> ret = -EINVAL;
> + mutex_unlock(&proc->threads_lock);
> goto err;
> }
> + mutex_unlock(&proc->threads_lock);
> break;
> case BINDER_SET_CONTEXT_MGR:
> ret = binder_ioctl_set_ctx_mgr(filp);
> @@ -3308,8 +3346,9 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> ret = 0;
> err:
> if (thread)
> - thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN;
> - binder_unlock(__func__);
> + atomic_and(~BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> + if (locked)
> + binder_unlock(__func__);
> wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
> if (ret && ret != -ERESTARTSYS)
> pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret);
> @@ -3473,6 +3512,7 @@ static int binder_open(struct inode *nodp, struct file *filp)
> binder_dev = container_of(filp->private_data, struct binder_device,
> miscdev);
> proc->context = &binder_dev->context;
> + mutex_init(&proc->threads_lock);
>
> binder_lock(__func__);
>
> @@ -3521,8 +3561,9 @@ static void binder_deferred_flush(struct binder_proc *proc)
> for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) {
> struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node);
>
> - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> - if (thread->looper & BINDER_LOOPER_STATE_WAITING) {
> + atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> + if (atomic_read(&thread->looper) &
> + BINDER_LOOPER_STATE_WAITING) {
> wake_up_interruptible(&thread->wait);
> wake_count++;
> }
> @@ -3827,7 +3868,8 @@ static void print_binder_thread(struct seq_file *m,
> size_t start_pos = m->count;
> size_t header_pos;
>
> - seq_printf(m, " thread %d: l %02x\n", thread->pid, thread->looper);
> + seq_printf(m, " thread %d: l %02x\n", thread->pid,
> + atomic_read(&thread->looper));
> header_pos = m->count;
> t = thread->transaction_stack;
> while (t) {
> @@ -4019,6 +4061,8 @@ static void print_binder_proc_stats(struct seq_file *m,
> struct rb_node *n;
> int count, strong, weak;
>
> + mutex_lock(&proc->threads_lock);
> +
> seq_printf(m, "proc %d\n", proc->pid);
> seq_printf(m, "context %s\n", proc->context->name);
> count = 0;
> @@ -4064,6 +4108,8 @@ static void print_binder_proc_stats(struct seq_file *m,
> seq_printf(m, " pending transactions: %d\n", count);
>
> print_binder_stats(m, " ", &proc->stats);
> +
> + mutex_unlock(&proc->threads_lock);
> }
>
>

I think the real problem is that we get stuck in kernel space waiting on a mutex. It should use mutex_lock_interruptible() instead.

2017-04-03 18:09:39

by Doug Anderson

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Hi,

On Mon, Apr 3, 2017 at 6:25 AM, Greg KH <[email protected]> wrote:
> On Sat, Apr 01, 2017 at 07:34:53PM -0700, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, Mar 31, 2017 at 11:48 PM, Greg KH <[email protected]> wrote:
>> > On Fri, Mar 31, 2017 at 02:00:13PM -0700, Doug Anderson wrote:
>> >> On Fri, Mar 31, 2017 at 12:29 PM, Greg KH <[email protected]> wrote:
>> >> BTW: I presume that nobody has decided that it would be a wise idea to
>> >> pick the OOM reaper code back to any stable trees? It seemed a bit
>> >> too scary to me, so I wrote a dumber (but easier to backport) solution
>> >> that avoided the deadlocks I was seeing. http://crosreview.com/465189
>> >> and the 3 patches above it in case anyone else stumbles on this thread
>> >> and is curious.
>> >
>> > What specific upstream OOM patches are you referring to? I'm always
>> > glad to review patches for stable kernels, just email
>> > [email protected] the git commit ids and we can take it from there.
>>
>> +stable
>>
>> I was wondering about the concept of porting the OOM Reaper back to
>> older kernels. The OOM reaper was originally introduced in:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/oom_kill.c?id=aac453635549699c13a84ea1456d5b0e574ef855
>>
>> Basically the problem described in that patch exists in many older
>> kernels and I've certainly seen crashes related to this in 3.10, but I
>> believe older kernels see the same problems too.
>>
>> Personally I wouldn't know exactly which patches were important to
>> backport and how far to go. One could arbitrarily try to backport up
>> to 4.6.7 (since 4.6 was the first kernel to really have the OOM
>> reaper) and ignore all the reaper fixes that landed since then. This
>> would probably be doable for kernel 4.4, though if anyone was trying
>> to support older kernels it might get harder.
>
> Well, I would need someone to give me a list of commits, and actually
> test it to see if it is something that people use/want before I can
> queue anything up for a stable release...
>
> {hint}

Here's a list of the patches between 4.5 and 4.6.7 that touch the oom_killer:

af8e15cc85a2 oom, oom_reaper: do not enqueue task if it is on the
oom_reaper_list head
bb29902a7515 oom, oom_reaper: protect oom_reaper_list using simpler way
e26796066fdf oom: make oom_reaper freezable
29c696e1c6ec oom: make oom_reaper_list single linked
855b01832573 oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task
03049269de43 mm, oom_reaper: implement OOM victims queuing
bc448e897b6d mm, oom_reaper: report success/failure
36324a990cf5 oom: clear TIF_MEMDIE after oom_reaper managed to unmap
the address space
aac453635549 mm, oom: introduce oom reaper
6a618957ad17 mm: oom_kill: don't ignore oom score on exiting tasks
6afcf2895e6f mm,oom: make oom_killer_disable() killable

69b27baf00fa sched: add schedule_timeout_idle()


A few of those are code cleanups or are fixing OOM killer bugs
unrelated to the problem at hand (OOM killer getting stuck forever
because a task won't die), but it may still make sense to take them
just to get the oom killer into a consistent / known state (AKA
4.6.7).


The problem (and the reason I was so hesitant to provide a list of
patches) is that I personally am not an expert on mm. Thus:

1. I don't know if there are any subtle dependencies on other "mm"
patches. I can pick the patches above onto a 4.4 kernel with only minor
conflicts (plus a fix not to look at MM_SHMEMPAGES, which didn't exist
in 4.4) and they seem to work OK, but with "mm" I am always worried
about minor changes in some other bit of mm code that might be needed
to make all the corner cases work right.

2. There are plenty of other fixes to the OOM reaper in 4.7+. For
instance this patch:

e2fe14564d33 oom_reaper: close race with exiting task

References a "Fixes" for the original patch that introduced the OOM
reaper, but there are 11 other patches between 4.6 and 4.7 that also
sound like they fix some important bugs and I just don't know if they
are important things to bring back to linux stable or not... Then
another 13 patches between 4.7 and 4.8.

Maybe +Michal Hocko would have some opinions of which OOM Reaper
patches would be good for picking into linux stable?


-Doug

2017-04-04 07:40:34

by Michal Hocko

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On Mon 03-04-17 11:09:32, Doug Anderson wrote:
[...]
> Maybe +Michal Hocko would have some opinions of which OOM Reaper
> patches would be good for picking into linux stable?

I am lacking context here but please note that the OOM reaper patches
alone are not sufficient to make the OOM handling lockup free. There
were quite some changes in the core OOM killer handling to make this
possible. This has been work throughout the last year and it will be
really non-trivial to backport to stable trees.

So the primary question is what are you trying to achieve?
--
Michal Hocko
SUSE Labs

2017-04-04 15:10:08

by Doug Anderson

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

Hi,

On Tue, Apr 4, 2017 at 12:40 AM, Michal Hocko <[email protected]> wrote:
> On Mon 03-04-17 11:09:32, Doug Anderson wrote:
> [...]
>> Maybe +Michal Hocko would have some opinions of which OOM Reaper
>> patches would be good for picking into linux stable?
>
> I am lacking context here but please note that the OOM reaper patches
> alone are not sufficient to make the OOM handling lockup free. There
> were quite some changes in the core OOM killer handling to make this
> possible. This has been work throughout the last year and it will be
> really non-trivial to backport to stable trees.

Yeah, that was the impression I got.


> So the primary question is what are you trying to achieve?

Ideally it would be nice to bring back patches to make the OOM
handling lockup free. However, presuming that's not feasible then at
least it would be nice if we could get some minimal set of patches
that:

- Isn't too scary to backport.
- Handles the low hanging fruit.
- Is fairly self contained.

For Chrome OS we've got devices on a number of different kernels
ranging from 3.8 - 4.4. Due to changes in userspace, it appears much
more likely that we wedge the OOM killer these days, so my main goal
is to avoid this in most cases.

Right now for Chrome OS I have patches that look like this:

* https://chromium-review.googlesource.com/465186
UPSTREAM: mm: oom_kill: don't ignore oom score on exiting tasks

* https://chromium-review.googlesource.com/465187
CHROMIUM: DROP: mm/oom_kill: Don't kill a subthread in our place if
we still ...

* https://chromium-review.googlesource.com/465188
CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place

* https://chromium-review.googlesource.com/465189
CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

...and those seem to fit the bill for us, but:

1. It would be nice if other users of linux-stable could benefit.

2. I know the above patches are not as ideal as the work that has
happened upstream, so of course I'd prefer to get the upstream
solution.

3. I always appreciate being closer to the upstream solution which
means we get more people looking at the code and more people testing
the code.


-Doug

2017-04-04 15:29:38

by Michal Hocko

Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

On Tue 04-04-17 08:10:03, Doug Anderson wrote:
> Hi,
>
> On Tue, Apr 4, 2017 at 12:40 AM, Michal Hocko <[email protected]> wrote:
[...]
> > So the primary question is what are you trying to achieve?
>
> Ideally it would be nice to bring back patches to make the OOM
> handling lockup free. However, presuming that's not feasible then at
> least it would be nice if we could get some minimal set of patches
> that:
>
> - Isn't too scary to backport.
> - Handles the low hanging fruit.
> - Is fairly self contained.

oom_reaper as introduced by aac453635549 and 4 follow-up patches should
be a good, yet not too scary, place to start.

> 1. It would be nice if other users of linuxstable could benefit.

Well, most of the OOM fixes are rather subtle and it took quite some time
to do them properly, so I am not really sure we want to take the risk. I
would rather handle those issues individually and backport specific fixes.

> 2. I know the above patches are not as ideal as the work that has
> happened upstream, so of course I'd prefer to get the upstream
> solution.
>
> 3. I always appreciate being closer to the upstream solution which
> means we get more people looking at the code and more people testing
> the code.

I can hardly help you more than offer to use the upstream code. I fully
realize that this is hard and I am facing similar issues with enterprise
kernel @Suse, but so far I have seen OOMs behaving mostly OK except for
extreme cases, which usually do not happen often enough to justify the
risk of backporting non-trivial changes to the stable trees.
--
Michal Hocko
SUSE Labs