Date: Wed, 10 Oct 2018 10:59:45 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: syzbot, hannes@cmpxchg.org, akpm@linux-foundation.org, guro@fb.com, kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, rientjes@google.com, syzkaller-bugs@googlegroups.com, yang.s@alibaba-inc.com
Subject: Re: INFO: rcu detected stall in shmem_fault
Message-ID: <20181010085945.GC5873@dhcp22.suse.cz>
In-Reply-To: <201810100012.w9A0Cjtn047782@www262.sakura.ne.jp>
References: <000000000000dc48d40577d4a587@google.com> <201810100012.w9A0Cjtn047782@www262.sakura.ne.jp>

On Wed 10-10-18 09:12:45, Tetsuo Handa wrote:
> syzbot is
> hitting RCU stall due to memcg-OOM event.
> https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64

This is really interesting. If we do not have any eligible oom victim, we
simply force the charge (allow it to proceed and go over the hard limit)
and break the isolation. That means that the caller gets back to running
and releases all locks taken on the way. I am wondering how come we are
seeing the RCU stall. Who is holding the RCU lock? Certainly not the
charge path, and neither should the caller, because you have to be in a
sleepable context to trigger the OOM killer. So there must be something
more going on.

> What should we do if memcg-OOM found no killable task because the allocating task
> was oom_score_adj == -1000 ? Flooding printk() until RCU stall watchdog fires
> (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
> OOM header when no eligible victim left") because syzbot was terminating the test
> upon WARN(1) removed by that commit) is not a good behavior.

We definitely want to report that there is no eligible oom victim. We
might consider some rate limiting for the memcg state, but that is
valuable information to see under normal circumstances (when you do not
get floods of these events).

-- 
Michal Hocko
SUSE Labs
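The behaviour described in the reply (no eligible victim, so the charge is forced past the hard limit and the caller proceeds) together with the suggested rate limiting of the report can be modelled in a small user-space C sketch. Everything here is hypothetical: `try_charge`, `select_victim`, `struct ratelimit`, `simulate_flood` and the warning text are illustrative names, not the kernel's mm/memcontrol.c or mm/oom_kill.c code. Victim selection is simplified to "largest charge wins" (the kernel uses its own badness heuristic), and the limiter only mimics the interval/burst idea behind the kernel's `struct ratelimit_state`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; NOT code from mm/memcontrol.c or mm/oom_kill.c. */
#define OOM_SCORE_ADJ_MIN (-1000) /* "never OOM-kill this task" */

struct task {
	int oom_score_adj;
	unsigned long pages; /* pages charged by this task */
};

struct memcg {
	unsigned long limit; /* hard limit, in pages */
	unsigned long usage; /* pages currently charged */
	struct task *tasks;
	int nr_tasks;
};

/* Interval/burst limiter in the spirit of the kernel's struct ratelimit_state. */
struct ratelimit {
	long interval; /* window length, arbitrary time units */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages emitted in this window */
	int missed;    /* messages suppressed in this window */
};

static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			printf("%d warnings suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Largest consumer wins; OOM_SCORE_ADJ_MIN tasks are never eligible. */
static struct task *select_victim(struct memcg *cg)
{
	struct task *victim = NULL;

	for (int i = 0; i < cg->nr_tasks; i++) {
		struct task *t = &cg->tasks[i];

		if (t->oom_score_adj == OOM_SCORE_ADJ_MIN)
			continue;
		if (!victim || t->pages > victim->pages)
			victim = t;
	}
	return victim;
}

enum charge_result { CHARGE_OK, CHARGE_NEED_KILL, CHARGE_FORCED };

enum charge_result try_charge(struct memcg *cg, struct task *cur, unsigned long nr,
			      struct ratelimit *rs, long now, int *warnings)
{
	enum charge_result ret = CHARGE_OK;

	if (cg->usage + nr > cg->limit) {
		if (select_victim(cg))
			return CHARGE_NEED_KILL; /* kill-and-retry path elided */
		/* No eligible victim: report it, but only within the rate limit... */
		if (ratelimit_ok(rs, now)) {
			printf("memcg: no eligible OOM victim, forcing charge\n");
			(*warnings)++;
		}
		/* ...then force the charge past the hard limit so the caller proceeds. */
		ret = CHARGE_FORCED;
	}
	cg->usage += nr;
	cur->pages += nr;
	return ret;
}

/* One unkillable task charges one page per time unit; returns warnings printed. */
int simulate_flood(int nr_charges, long interval, int burst)
{
	struct task unkillable = { .oom_score_adj = OOM_SCORE_ADJ_MIN };
	struct memcg cg = { .limit = 0, .tasks = &unkillable, .nr_tasks = 1 };
	struct ratelimit rs = { .interval = interval, .burst = burst };
	int warnings = 0;

	for (long now = 0; now < nr_charges; now++)
		try_charge(&cg, &unkillable, 1, &rs, now, &warnings);
	return warnings;
}
```

In this model, `simulate_flood(100, 10, 2)` forces all 100 charges but prints only 20 warnings (two per 10-unit window), so the "no eligible victim" report stays visible without flooding the console. Which interval and burst would suit the real memcg path is left open in the thread.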