Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751454AbdIKT7R (ORCPT ); Mon, 11 Sep 2017 15:59:17 -0400 Received: from mail-it0-f42.google.com ([209.85.214.42]:37903 "EHLO mail-it0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751031AbdIKT7P (ORCPT ); Mon, 11 Sep 2017 15:59:15 -0400 X-Google-Smtp-Source: AOwi7QBm0RnJJrx1MPr+eB4hVcPsGxaOkddb4qSkTDVH+IEpeffST0nn/veS+pcyEBsC0udQcX7jNPfkwsqsHhry/GY= MIME-Version: 1.0 In-Reply-To: References: <20170905172152.36227-1-tkjos@google.com> From: Todd Kjos Date: Mon, 11 Sep 2017 12:59:13 -0700 Message-ID: Subject: Re: [PATCH] binder: fix memory corruption in binder_transaction binder To: Amit Pundir Cc: Todd Kjos , Greg Kroah-Hartman , =?UTF-8?B?QXJ2ZSBIasO4bm5ldsOlZw==?= , "open list:ANDROID DRIVERS" , lkml , Martijn Coenen , Xu Yiping , gengyanping@hisilicon.com, shiwanglai@hisilicon.com, John Stultz Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4116 Lines: 106 Amit, I tested with https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct. I added a pr_info() above the patch's single line change and in binder_init (so I could easily prove that I was running the correct kernel). First I did 10 reboots with the patch. I saw one failure to reach the Android home screen in boot #7 (but the new line of code was never reached, so the patch cannot be the cause)... so 9 out of 10 reboots were fine and the failure does not point to this patch. Then I did 10 reboots without the patch. No failures. Then 10 more with the patch. No failures. Then with the patch: power-on, reboot twice, no failures (repeat, no failures). I think the issue you are seeing cannot be caused by this patch -- take a look at it and see if you think its really possible... -Todd On Mon, Sep 11, 2017 at 9:55 AM, Amit Pundir wrote: > Hi Todd, > > On 11 September 2017 at 21:10, Todd Kjos wrote: >> (resend in plain-text mode -- sorry about that) >> >> Amit, >> >> Are you sure this patch is the culprit? That is pretty surprising >> since this change can only be hit in a uncommon case (the target node >> is valid when we start creating the transaction, but dead when we >> check right before sending it) so it is unlikely to be hit during a >> normal boot. It also fixes a corruption -- so if you were actually >> hitting the case, it would likely have caused issues before and not >> now. Take a look at it and see if you think it is really possible. >> >> I just booted hikey to Android with this patch 10 times in a row with >> no issues (used hikey-linaro 4.9 kernel which has this patch). > > Sorry for not being clear enough in the bug report. android-4.9 is > fine, I see this issue on linux mainline tree with this patch. > > I can reproduce it on John's minimal Android tree for hikey hosted > here https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/hikey-mainline-WIP > and hikey-llct (android-4.9 patchset rebased to mainline) tree hosted > here https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct. > I have already reverted this patch in hikey-llct so you have to revert > that revert to reproduce this issue on hikey-llct tree. > > Regards, > Amit Pundir > >> >> -Todd >> >>> On Mon, Sep 11, 2017 at 5:18 AM, Amit Pundir wrote: >>>> >>>> On 5 September 2017 at 22:51, Todd Kjos wrote: >>>> > From: Xu YiPing >>>> > >>>> > commit 7a4408c6bd3e ("binder: make sure accesses to proc/thread are >>>> > safe") made a change to enqueue tcomplete to thread->todo before >>>> > enqueuing the transaction. However, in err_dead_proc_or_thread case, >>>> > the tcomplete is directly freed, without dequeued. It may cause the >>>> > thread->todo list to be corrupted. >>>> > >>>> > So, dequeue it before freeing. >>>> >>>> I see Android boot loops with this patch on hikey tracking >>>> linux/master branch. 1st boot is fine but hikey runs into an >>>> unexpected short boot loops on 2nd and successive boots. >>>> >>>> It takes about 3-4 iterations to finally come to sane state and boot >>>> to UI. I don't see this behaviour if I revert this patch. >>>> >>>> Regards, >>>> Amit Pundir >>>> >>>> > >>>> > Signed-off-by: Xu YiPing >>>> > Signed-off-by: Todd Kjos >>>> > --- >>>> > drivers/android/binder.c | 1 + >>>> > 1 file changed, 1 insertion(+) >>>> > >>>> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c >>>> > index d055b3f2a207..96cc28afa383 100644 >>>> > --- a/drivers/android/binder.c >>>> > +++ b/drivers/android/binder.c >>>> > @@ -3083,6 +3083,7 @@ static void binder_transaction(struct binder_proc >>>> > *proc, >>>> > err_dead_proc_or_thread: >>>> > return_error = BR_DEAD_REPLY; >>>> > return_error_line = __LINE__; >>>> > + binder_dequeue_work(proc, tcomplete); >>>> > err_translate_failed: >>>> > err_bad_object_type: >>>> > err_bad_offset: >>>> > -- >>>> > 2.14.1.581.gf28d330327-goog >>>> > >>> >>>