Date: Mon, 6 Aug 2018 11:15:52 +0200
From: Michal Hocko
To: syzbot
Cc: cgroups@vger.kernel.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, vdavydov.dev@gmail.com
Subject: Re: WARNING in try_charge
Message-ID: <20180806091552.GE19540@dhcp22.suse.cz>
References: <0000000000005e979605729c1564@google.com>
In-Reply-To: <0000000000005e979605729c1564@google.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat 04-08-18 06:33:02, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    d1e0b8e0cb7a Add linux-next specific files for 20180725
> git tree:       linux-next
> console output:
> https://syzkaller.appspot.com/x/log.txt?x=15a1c770400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com
>
> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
> file-rss:32768kB, shmem-rss:0kB
> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
> file-rss:32000kB, shmem-rss:0kB

More interesting stuff is higher in the kernel log:
: [ 366.435015] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
: [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605

Are you sure you want the hard limit set to 0?

: [ 366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [ 366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [ 366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

There are only 3 pages charged to this memcg!
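[For readers following along: a hard limit of 0 like the one above takes only a couple of writes to cgroupfs. This is a sketch of the suspected misconfiguration, not something taken from the report; the group name "ile0" matches the log, but the cgroup v1 mount path is assumed.]

```shell
# Sketch of the suspected misconfiguration (cgroup v1; mount path assumed).
# Writing 0 to memory.limit_in_bytes makes every subsequent charge fail,
# which is what drives the failcnt and the memcg OOM killer in the log.
mkdir /sys/fs/cgroup/memory/ile0
echo 0 > /sys/fs/cgroup/memory/ile0/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/ile0/cgroup.procs
```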
: [ 366.487490] Tasks state (memory values in pages):
: [ 366.492349] [  pid  ]   uid  tgid total_vm    rss pgtables_bytes swapents oom_score_adj name
: [ 366.501237] [  23766]     0 23766    17620   8221         126976        0             0 syz-executor3
: [ 366.510367] [  23767]     0 23767    17618   8218         126976        0             0 syz-executor2
: [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child
: [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
: [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB

The oom reaper cannot reclaim most of the file-backed memory. I assume these are shared mappings which live outside of this memcg, judging by the counter.

: [...]
: [ 367.085870] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
: [ 367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
: [ 367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [ 367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [ 367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [ 367.138136] Tasks state (memory values in pages):
: [ 367.142986] [  pid  ]   uid  tgid total_vm    rss pgtables_bytes swapents oom_score_adj name
: [ 367.151889] [  23766]     0 23766    17620   8002         126976        0             0 syz-executor3
: [ 367.160946] [  23767]     0 23767    17618   8218         126976        0             0 syz-executor2
: [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child
: [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
: [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
: [ 367.202986] ------------[ cut here ]------------
: [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug.
: [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680
: [ 367.227540] Kernel panic - not syncing: panic_on_warn set ...

This is unexpected though. We have killed a task (23767) which is trying to
charge the memory, which means it should trigger the charge retry, and that
one should force the charge:

	/*
	 * Unlike in global OOM situations, memcg is not in a physical
	 * memory shortage.  Allow dying and OOM-killed tasks to
	 * bypass the last charges so that they can exit quickly and
	 * free their memory.
	 */
	if (unlikely(tsk_is_oom_victim(current) ||
		     fatal_signal_pending(current) ||
		     current->flags & PF_EXITING))
		goto force;

There doesn't seem to be any other sign of an OOM killer invocation which
could lead to the warning, as there is no other task to kill (both
syz-executor[23] have been killed and oom-reaped already). So I would be
curious what happened between 367.180119, which was the last successful OOM
invocation, and 367.207845. An additional printk in mem_cgroup_out_of_memory
might tell us more:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool ret;
 
 	mutex_lock(&oom_lock);
+	pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+		current->comm, current->pid, tsk_is_oom_victim(current));
 	ret = out_of_memory(&oc);
 	mutex_unlock(&oom_lock);
 	return ret;

Anyway, your memcg setup is indeed misconfigured. A memcg with a 0 hard
limit and basically no memory charged by existing tasks is not going to fly,
and the warning is there exactly to call that out.
Whether this is some sort of race and we warn too eagerly still needs to be
checked and potentially fixed, of course.
-- 
Michal Hocko
SUSE Labs