Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751335AbbL1B1J (ORCPT ); Sun, 27 Dec 2015 20:27:09 -0500 Received: from mail-pf0-f178.google.com ([209.85.192.178]:34445 "EHLO mail-pf0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750803AbbL1B1F (ORCPT ); Sun, 27 Dec 2015 20:27:05 -0500 Date: Sun, 27 Dec 2015 17:27:03 -0800 (PST) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Figo Zhang cc: gregkh@linuxfoundation.org, mhocko@suse.com, linux-kernel@vger.kernel.org, arve@android.com, anton.vorontsov@linaro.org, kirill.shutemov@linux.intel.com, riandrews@android.com, devel@driverdev.osuosl.org Subject: Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory In-Reply-To: <1451207691-10313-1-git-send-email-tianfei.zhang@intel.com> Message-ID: References: <1451207691-10313-1-git-send-email-tianfei.zhang@intel.com> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5408 Lines: 117 On Sun, 27 Dec 2015, Figo Zhang wrote: > Android System UI hang when run heavy monkey stress test. > > V2: add more detail about how to re-produce this issue, the > important is install more than 100 apps/games. > > Re-produce step: > Run this monkey stress test script with more than 100 > apps/games installed: > > adb shell "monkey --ignore-crashes --ignore-timeouts > --kill-process-after-error --ignore-security-exceptions > --throttle 200 -v 20000000" > > kernel log: > [ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529 > [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill > [ 1526.272299] lowmem_d_timeout=4296194081 > [ 1526.272303] Killing 'dTi-lm' (27289), adj 647, > [ 1526.272303] to free 42520kB on behalf of 'servicemanager' (2365) because > [ 1526.272303] cache 137676kB is below limit 221184kB for oom_score_adj 529 > [ 1526.272303] Free memory is -39396kB above reserved > [ 1526.272304] lowmem_scan end: 128, 213da, return 10630 > [ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529 > [ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081 > [ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529 > [ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081 > [ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529 > [ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081 > [ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529 > [ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081 > [ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529 > [ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081 > [ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529 > [ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081 > [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529 > [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529 > [ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109 > [ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529 > [ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109 > [ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109 > > when this happened, the android system UI will hang, no process can be > select to kill. > > i found the the value of "lowmem_deathpending_timeout" will be modified > strangely, like in last killing, the value is 4296194081, but why not it > had changed to 4296195109? so it will cause the deadloop in low memory > state which will cause the android system UI hang, because no process will > be kill. > I'm assuming that you are loading the lowmem killer as a module since that's how you would modify lowmem_debug_level. It appears that lowmem_debug_level is 2 from your kernel log, otherwise part of the log is missing. I can tell this since you have a [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill line but not a line matching "send sigkill to %d (%s), adj %hd, size %d\n" with loglevel 1. I think changing lowmem_debug_level to 1 would help to understand this issue better. I think lowmem_deathpending_timeout is getting changed to 4296195109 at [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529 > HERE < [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529 However, it appears that the same process, dTi-lm, is still chosen for oom kill because lowmem_deathpending_timeout has expired. So this looks like a problem if the constantly chosen process cannot exit. It would have been helpful to have the stack of pid 27289 in the log to see where it was stuck. But I think it may be unrelated to lowmem_deathpending_timeout itself. We'd be better off selecting a different process to kill with something like this: diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c --- a/drivers/staging/android/lowmemorykiller.c +++ b/drivers/staging/android/lowmemorykiller.c @@ -128,11 +128,15 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc) if (!p) continue; - if (test_tsk_thread_flag(p, TIF_MEMDIE) && - time_before_eq(jiffies, lowmem_deathpending_timeout)) { - task_unlock(p); - rcu_read_unlock(); - return 0; + if (test_tsk_thread_flag(p, TIF_MEMDIE)) { + if (time_before_eq(jiffies, + lowmem_deathpending_timeout)) { + task_unlock(p); + rcu_read_unlock(); + return 0; + } + /* Need to select a different process to kill */ + continue; } oom_score_adj = p->signal->oom_score_adj; if (oom_score_adj < min_score_adj) { But we need more information. Please make sure that lowmem_debug_level is 1, try to get a complete kernel log, and if possible please try to capture the stack of the process that can't exit (use /proc//stack) before trying the above patch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/