Date: Wed, 10 Oct 2018 11:13:09 +0200
From: Michal Hocko
To: Dmitry Vyukov
Cc: David Rientjes, Tetsuo Handa, syzbot, Johannes Weiner, Andrew Morton,
    guro@fb.com, "Kirill A. Shutemov", LKML, Linux-MM, syzkaller-bugs,
    Yang Shi
Subject: Re: INFO: rcu detected stall in shmem_fault
Message-ID: <20181010091309.GE5873@dhcp22.suse.cz>
References: <000000000000dc48d40577d4a587@google.com>
    <201810100012.w9A0Cjtn047782@www262.sakura.ne.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 10-10-18 09:55:57, Dmitry Vyukov wrote:
> On Wed, Oct 10, 2018 at 6:11 AM, 'David Rientjes' via syzkaller-bugs
> wrote:
> > On Wed, 10 Oct 2018, Tetsuo Handa wrote:
> >
> >> syzbot is hitting RCU stall due to memcg-OOM event.
> >> https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
> >>
> >> What should we do if memcg-OOM found no killable task because the allocating task
> >> was oom_score_adj == -1000 ? Flooding printk() until RCU stall watchdog fires
> >> (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
> >> OOM header when no eligible victim left") because syzbot was terminating the test
> >> upon WARN(1) removed by that commit) is not a good behavior.
> >
>
> You want to say that most of the recent hangs and stalls are actually
> caused by our attempt to sandbox test processes with memory cgroup?
> The process with oom_score_adj == -1000 is not supposed to consume any
> significant memory; we have another (test) process with oom_score_adj
> == 0 that's actually consuming memory.
> But should we refrain from using -1000? Perhaps it would be better to
> use -500/500 for control/test process, or -999/1000?

oom disable on a task (especially when this is the only task in the
memcg) is tricky.
Look at the memcg report
[  935.562389] Memory limit reached of cgroup /syz0
[  935.567398] memory: usage 204808kB, limit 204800kB, failcnt 6081
[  935.573768] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[  935.580650] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[  935.586923] Memory cgroup stats for /syz0: cache:152KB rss:176336KB rss_huge:163840KB shmem:344KB mapped_file:264KB dirty:0KB writeback:0KB swap:0KB inactive_anon:260KB active_anon:176448KB inactive_file:4KB active_file:0KB

There is still somebody holding anonymous (THP) memory. If there is no
other eligible oom victim then it must be some of the oom disabled ones.
You have suppressed the task list information so we do not know who that
might be though.

So it looks like there is some misconfiguration or a bug in the oom
victim selection.
-- 
Michal Hocko
SUSE Labs
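
For readers who want to check the arithmetic behind the "anonymous (THP) memory" observation, here is a minimal sketch that parses the stats line from the report above. The helper name parse_memcg_stats and the variable names are illustrative only and are not part of any kernel tooling; the numbers are copied from the quoted log.

```python
import re

# Stats line copied from the memcg report quoted in this message.
STATS = ("cache:152KB rss:176336KB rss_huge:163840KB shmem:344KB "
         "mapped_file:264KB dirty:0KB writeback:0KB swap:0KB "
         "inactive_anon:260KB active_anon:176448KB "
         "inactive_file:4KB active_file:0KB")

def parse_memcg_stats(line):
    """Hypothetical helper: map each 'name:123KB' token to an int (kB)."""
    return {name: int(kb) for name, kb in re.findall(r"(\w+):(\d+)KB", line)}

stats = parse_memcg_stats(STATS)

# Values from the "memory: usage ..., limit ..." line of the same report.
usage_kb, limit_kb = 204808, 204800

over_limit_kb = usage_kb - limit_kb
anon_kb = stats["inactive_anon"] + stats["active_anon"]
file_kb = stats["inactive_file"] + stats["active_file"]
thp_share = stats["rss_huge"] / stats["rss"]

print(f"over limit by {over_limit_kb} kB")
print(f"anon LRU: {anon_kb} kB, file LRU: {file_kb} kB")
print(f"THP share of rss: {thp_share:.1%}")
```

Almost all of the charged memory sits on the anon LRU lists and most of rss is huge pages, so there is next to no page cache left to reclaim. That is consistent with the reading above that some task in the memcg still holds the anonymous memory.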