Date: Fri, 12 Oct 2018 14:08:58 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, guro@fb.com,
	kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
	penguin-kernel@i-love.sakura.ne.jp, rientjes@google.com,
	yang.s@alibaba-inc.com
Subject: Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms without eligible tasks
Message-ID: <20181012120858.GX5873@dhcp22.suse.cz>
References: <000000000000dc48d40577d4a587@google.com>
 <20181010151135.25766-1-mhocko@kernel.org>
 <20181012112008.GA27955@cmpxchg.org>
In-Reply-To: <20181012112008.GA27955@cmpxchg.org>
On Fri 12-10-18 07:20:08, Johannes Weiner wrote:
> On Wed, Oct 10, 2018 at 05:11:35PM +0200, Michal Hocko wrote:
> > From: Michal Hocko
> >
> > syzbot has noticed that it can trigger RCU stalls from the memcg oom
> > path:
> > RIP: 0010:dump_stack+0x358/0x3ab lib/dump_stack.c:118
> > Code: 74 0c 48 c7 c7 f0 f5 31 89 e8 9f 0e 0e fa 48 83 3d 07 15 7d 01 00 0f
> > 84 63 fe ff ff e8 1c 89 c9 f9 48 8b bd 70 ff ff ff 57 9d <0f> 1f 44 00 00
> > e8 09 89 c9 f9 48 8b 8d 68 ff ff ff b8 ff ff 37 00
> > RSP: 0018:ffff88017d3a5c70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > RAX: 0000000000040000 RBX: 1ffffffff1263ebe RCX: ffffc90001e5a000
> > RDX: 0000000000040000 RSI: ffffffff87b4e0f4 RDI: 0000000000000246
> > RBP: ffff88017d3a5d18 R08: ffff8801d7e02480 R09: fffffbfff13da030
> > R10: fffffbfff13da030 R11: 0000000000000003 R12: 1ffff1002fa74b96
> > R13: 00000000ffffffff R14: 0000000000000200 R15: 0000000000000000
> > dump_header+0x27b/0xf72 mm/oom_kill.c:441
> > out_of_memory.cold.30+0xf/0x184 mm/oom_kill.c:1109
> > mem_cgroup_out_of_memory+0x15e/0x210 mm/memcontrol.c:1386
> > mem_cgroup_oom mm/memcontrol.c:1701 [inline]
> > try_charge+0xb7c/0x1710 mm/memcontrol.c:2260
> > mem_cgroup_try_charge+0x627/0xe20 mm/memcontrol.c:5892
> > mem_cgroup_try_charge_delay+0x1d/0xa0 mm/memcontrol.c:5907
> > shmem_getpage_gfp+0x186b/0x4840 mm/shmem.c:1784
> > shmem_fault+0x25f/0x960 mm/shmem.c:1982
> > __do_fault+0x100/0x6b0 mm/memory.c:2996
> > do_read_fault mm/memory.c:3408 [inline]
> > do_fault mm/memory.c:3531 [inline]
> >
> > The primary reason for the stall lies in the expensive printk handling
> > of the flood of oom reports: a misconfiguration on the syzbot side
> > meant that there was simply no eligible task, because every candidate
> > had OOM_SCORE_ADJ_MIN set. This generates an oom report for each
> > allocation from the memcg context.
> >
> > While normal workloads should be much more careful about potential
> > heavy memory consumers that are OOM disabled, it makes some sense to
> > rate limit potentially expensive oom reports for cases when no
> > eligible victim is found. Do that by moving the rate limit logic
> > inside dump_header. We no longer rely on the caller to do that; only
> > oom_kill_process has been throttling so far. The other two call sites
> > simply didn't have to care, because one just panics on the OOM when
> > configured that way, and the global case panics as well when there is
> > no eligible task. Memcg changed the picture because we do not panic
> > there and we might have multiple sources of the same event.
> >
> > Once we are here, make sure that the reason to trigger the OOM is
> > printed without ratelimiting, because this is really valuable for
> > debugging what happened.
> >
> > Reported-by: syzbot+77e6b28a7a7106ad0def@syzkaller.appspotmail.com
> > Cc: guro@fb.com
> > Cc: hannes@cmpxchg.org
> > Cc: kirill.shutemov@linux.intel.com
> > Cc: linux-kernel@vger.kernel.org
> > Cc: penguin-kernel@i-love.sakura.ne.jp
> > Cc: rientjes@google.com
> > Cc: yang.s@alibaba-inc.com
> > Signed-off-by: Michal Hocko
>
> So not more than 10 dumps in each 5s interval. That looks reasonable
> to me. By the time it starts dropping data you have more than enough
> information to go on already.

Yeah. Unless we have a storm coming from many different cgroups in
parallel. But even then we have the allocation context for each OOM so
we are not losing everything. Should we ever tune this, it can be done
later with some explicit examples.

> Acked-by: Johannes Weiner

Thanks! I will post the patch to Andrew early next week.
-- 
Michal Hocko
SUSE Labs
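
As a rough illustration of the throttling scheme discussed in this
thread -- the reason for the OOM always printed, the expensive body of
the report limited to DEFAULT_RATELIMIT_BURST (10) dumps per
DEFAULT_RATELIMIT_INTERVAL (5 seconds) -- a minimal sketch could look
like the code below. This is not the submitted patch: the function body
is heavily simplified and the format string and helper calls are only
indicative of the 4.19-era mm/oom_kill.c.

/*
 * Illustrative sketch only, not the actual patch. Assumes the
 * 4.19-era helpers dump_stack(), show_mem() and dump_tasks(); the
 * real dump_header() prints considerably more state.
 */
#include <linux/ratelimit.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <linux/oom.h>

static void dump_header(struct oom_control *oc, struct task_struct *p)
{
	/* at most 10 full reports in any 5 second window */
	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
				      DEFAULT_RATELIMIT_BURST);

	/*
	 * Print the reason for the OOM unconditionally so the trigger
	 * is never lost, even when the report below gets dropped.
	 */
	pr_warn("%s invoked oom-killer: gfp_mask=%#x, order=%d, oom_score_adj=%hd\n",
		current->comm, oc->gfp_mask, oc->order,
		current->signal->oom_score_adj);

	/* Everything expensive (stack, meminfo, task list) is throttled. */
	if (!__ratelimit(&oom_rs))
		return;

	dump_stack();
	show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
	dump_tasks(oc->memcg, oc->nodemask);
}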