Date: Mon, 6 Aug 2018 19:56:27 +0200
From: Michal Hocko
To: syzbot
Cc: cgroups@vger.kernel.org, dvyukov@google.com, hannes@cmpxchg.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    penguin-kernel@I-love.SAKURA.ne.jp, syzkaller-bugs@googlegroups.com,
    vdavydov.dev@gmail.com
Subject: Re: WARNING in try_charge
Message-ID: <20180806175627.GC10003@dhcp22.suse.cz>
References: <0000000000006350880572c61e62@google.com>
    <20180806174410.GB10003@dhcp22.suse.cz>
In-Reply-To: <20180806174410.GB10003@dhcp22.suse.cz>

On Mon 06-08-18 19:44:10, Michal Hocko wrote:
> On Mon 06-08-18 08:42:02, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the
> > proposed patch but the reproducer still triggered
> > crash:
> > WARNING in try_charge
> >
> > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
> > file-rss:0kB, shmem-rss:0kB
> > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
> > file-rss:0kB, shmem-rss:0kB
> > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
>
> Thank you. This is useful. The full oom picture is this
> : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. oom_victim=0
> [...]
> : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0
> : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20
> : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> : [ 65.963878] Tasks state (memory values in pages):
> : [ 65.968743] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> : [ 65.977615] [   6410]     0  6410     9427      532          61440        0             0 syz-executor5
> : [ 65.986647] Memory cgroup out of memory: Kill process 6410 (syz-executor5) score 547000 or sacrifice child
> : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
> : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
> : [ 66.025137] ------------[ cut here ]------------
> : [ 66.029927] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug.
> : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680
>
> So we have only a single task in the memcg and it is this task which
> triggers the OOM. It gets killed and oom_reaped. This means that
> out_of_memory should return with true and so we should retry and force
> the charge as I've already mentioned. For some reason this task has
> triggered the oom killer path again and then we haven't found any
> eligible task and resulted in the warning. This shouldn't happen.
>
> I will stare to the code some more to see how the heck we get there
> without passing
> 	if (unlikely(tsk_is_oom_victim(current) ||
> 		     fatal_signal_pending(current) ||
> 		     current->flags & PF_EXITING))
> 		goto force;

Hmm, so while the OOM killer was invoked from
[ 65.405905] Call Trace:
[ 65.408498] dump_stack+0x1c9/0x2b4
[ 65.421606] dump_header+0x27b/0xf70
[ 65.545094] oom_kill_process.cold.28+0x10/0x95a
[ 65.605696] out_of_memory+0xa8a/0x14d0
[ 65.627227] mem_cgroup_out_of_memory+0x213/0x300
[ 65.641293] try_charge+0x720/0x1680
[ 65.674806] memcg_kmem_charge_memcg+0x7c/0x120
[ 65.687939] cache_grow_begin+0x207/0x710
[ 65.696553] fallback_alloc+0x203/0x2c0
[ 65.700519] ____cache_alloc_node+0x1c7/0x1e0
[ 65.704999] kmem_cache_alloc+0x1e5/0x760
[ 65.717947] shmem_alloc_inode+0x1b/0x40
[ 65.722003] alloc_inode+0x63/0x190
[ 65.725642] new_inode_pseudo+0x71/0x1a0
[ 65.738077] new_inode+0x1c/0x40
[ 65.741432] shmem_get_inode+0xf1/0x910
[ 65.771550] __shmem_file_setup.part.48+0x83/0x2a0
[ 65.776482] shmem_file_setup+0x65/0x90
[ 65.780444] __x64_sys_memfd_create+0x2af/0x4f0

The warning happened from a different path
[ 66.151455] RIP: 0010:try_charge+0x734/0x1680
[...]
[ 66.270886] mem_cgroup_try_charge+0x4ff/0xa70
[ 66.305602] mem_cgroup_try_charge_delay+0x1d/0x90
[ 66.310514] __handle_mm_fault+0x25be/0x4470
[ 66.366608] handle_mm_fault+0x53e/0xc80
[ 66.402384] do_page_fault+0xf6/0x8c0
[ 66.451629] page_fault+0x1e/0x30

So the oom victim indeed passed the above force path after the oom
invocation. But later on it hit the page fault path, which behaved
differently, and for some reason the force path wasn't triggered. I am
wondering how we could hit the page fault path in the first place. The
task is already killed! So what the hell is going on here? I must be
missing something obvious here.
-- 
Michal Hocko
SUSE Labs
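
For readers following the thread, here is a minimal user-space sketch of the
bail-out check quoted above. It is not the kernel code: struct task_model,
enum charge_outcome, charge_bailout_model() and model_demo() are hypothetical
names made up for illustration, and the three booleans merely stand in for
tsk_is_oom_victim(current), fatal_signal_pending(current) and
current->flags & PF_EXITING. It only shows the intended decision: any one of
the three conditions should be enough to skip the rest of the charge path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state that the quoted check reads. */
struct task_model {
	bool oom_victim;           /* models tsk_is_oom_victim(current) */
	bool fatal_signal_pending; /* models fatal_signal_pending(current) */
	bool exiting;              /* models current->flags & PF_EXITING */
};

enum charge_outcome {
	CHARGE_FORCED,    /* models "goto force" */
	CHARGE_CONTINUES  /* falls through to the rest of the charge path */
};

/* Models only the quoted condition, nothing else from try_charge(). */
static enum charge_outcome charge_bailout_model(const struct task_model *t)
{
	if (t->oom_victim || t->fatal_signal_pending || t->exiting)
		return CHARGE_FORCED;
	return CHARGE_CONTINUES;
}

/* Small self-check covering the two cases discussed in the thread. */
static void model_demo(void)
{
	/* A task already marked as an OOM victim should be forced. */
	struct task_model victim = { .oom_victim = true };
	assert(charge_bailout_model(&victim) == CHARGE_FORCED);

	/* A task with none of the three conditions set keeps going. */
	struct task_model plain = { 0 };
	assert(charge_bailout_model(&plain) == CHARGE_CONTINUES);
}
```

Under this model a killed and reaped task never reaches the OOM killer a
second time, which is why the second trace above is surprising. The sketch
says nothing about which of the three conditions, if any, actually held for
pid 6410 at the time of the page fault.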