In-Reply-To: <20170331192944.GB9744@kroah.com>
References: <20170331175341.19889-1-dianders@chromium.org> <20170331192944.GB9744@kroah.com>
From: Doug Anderson
Date: Fri, 31 Mar 2017 14:00:13 -0700
Subject: Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()
To: Greg KH
Cc: arve@android.com, riandrews@android.com, tkjos@google.com, devel@driverdev.osuosl.org, "linux-kernel@vger.kernel.org"

Hi,

On Fri, Mar 31, 2017 at 12:29 PM, Greg KH wrote:
> On Fri, Mar 31, 2017 at 10:53:41AM -0700, Douglas Anderson wrote:
>> Sometimes when we're out of memory the OOM killer decides to kill a
>> process that's in binder_thread_read().  If we happen to be waiting
>> for work we'll get the kill signal and wake up.  That's good.  ...but
>> then we try to grab the binder lock before we return.  That's bad.
>>
>> The problem is that someone else might be holding the one true global
>> binder lock.  If that one other process is blocked then we can't
>> finish exiting.  In the worst case, the other process might be blocked
>> waiting for memory.  In that case we'll have a really hard time
>> exiting.
>>
>> On older kernels that don't have the OOM reaper (or something
>> similar), like kernel 4.4, this is a really big problem and we end up
>> with a simple deadlock because:
>> * Once we pick a process to OOM kill we won't pick another--we first
>>   wait for the process we picked to die.  The reasoning is that we've
>>   given the doomed process access to special memory pools so it can
>>   quit quickly and we don't have special pool memory to go around.
>> * We don't have any type of "special access donation" that would give
>>   the mutex holder our special access.
>>
>> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> How does your change interact with the recent "break up the binder big
> lock" patchset:
>     https://android-review.googlesource.com/#/c/354698/
>
> Have you tried that series out to see if it helps out any?

I wasn't aware of that patchset.  Someone else on my team mentioned
that fine-grained locking was being worked on but I didn't know
patches were actually posted...

Probably it makes sense to just drop my patch, then.  It was only
making things marginally better even on kernel 4.4 because I would
just hit the next task that would refuse to quit for a non-binder
related reason.  :(

BTW: I presume that nobody has decided that it would be a wise idea to
pick the OOM reaper code back to any stable trees?  It seemed a bit
too scary to me, so I wrote a dumber (but easier to backport) solution
that avoided the deadlocks I was seeing.  http://crosreview.com/465189
and the 3 patches above it in case anyone else stumbles on this thread
and is curious.

-Doug