Date: Wed, 10 Oct 2018 11:02:38 +0200
From: Michal Hocko
To: David Rientjes
Cc: Tetsuo Handa, syzbot, hannes@cmpxchg.org, akpm@linux-foundation.org,
    guro@fb.com, kirill.shutemov@linux.intel.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    syzkaller-bugs@googlegroups.com, yang.s@alibaba-inc.com
Subject: Re: INFO: rcu detected stall in shmem_fault
Message-ID: <20181010090238.GD5873@dhcp22.suse.cz>
References: <000000000000dc48d40577d4a587@google.com>
    <201810100012.w9A0Cjtn047782@www262.sakura.ne.jp>

On Tue 09-10-18 21:11:48, David Rientjes wrote:
> On Wed, 10 Oct 2018, Tetsuo Handa wrote:
>
> > syzbot is hitting RCU stall due to memcg-OOM event.
> > https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
> >
> > What should we do if memcg-OOM found no killable task because the allocating task
> > was oom_score_adj == -1000 ? Flooding printk() until RCU stall watchdog fires
> > (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
> > OOM header when no eligible victim left") because syzbot was terminating the test
> > upon WARN(1) removed by that commit) is not a good behavior.
> >
>
> Not printing anything would be the obvious solution but the ideal solution
> would probably involve
>
>  - adding feedback to the memcg oom killer that there are no killable
>    processes,

We already have that - out_of_memory() == false

>  - adding complete coverage for memcg_oom_recover() in all uncharge paths
>    where the oom memcg's page_counter is decremented, and

Could you elaborate?

>  - having all processes stall until memcg_oom_recover() is called so
>    looping back into try_charge() has a reasonable expectation to succeed.

You cannot stall in the charge path waiting for others to make forward
progress because we would be back to oom deadlocks when nobody can make
forward progress due to lock dependencies. Right now we simply force the
charge and allow for further progress when a situation like this happens
because this shouldn't happen unless the memcg is badly misconfigured.
-- 
Michal Hocko
SUSE Labs
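
The "force the charge" fallback described in the reply can be pictured with a toy user-space model. This is only a sketch under stated assumptions, not the kernel's try_charge(): every name below (toy_memcg, toy_out_of_memory, toy_try_charge, killable_pages) is hypothetical and invented for illustration, and the model assumes a killed task's memory is freed immediately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_memcg {
    size_t usage;          /* pages currently charged */
    size_t limit;          /* hard limit in pages */
    size_t killable_pages; /* pages held by killable tasks; 0 means
                              no eligible victim (e.g. every task has
                              oom_score_adj == -1000) */
};

enum toy_charge_result {
    TOY_CHARGED,            /* fit under the limit right away */
    TOY_CHARGED_AFTER_KILL, /* fit after the toy OOM killer freed memory */
    TOY_FORCED              /* charged over the limit to keep making progress */
};

/* True when nr_pages more pages still fit under the limit. */
static bool toy_fits(const struct toy_memcg *cg, size_t nr_pages)
{
    return cg->usage <= cg->limit && nr_pages <= cg->limit - cg->usage;
}

/* Stand-in for the OOM killer: returns false when there is no victim. */
static bool toy_out_of_memory(struct toy_memcg *cg)
{
    if (cg->killable_pages == 0)
        return false;
    /* Assumption: the victim's pages are uncharged at once. */
    cg->usage = cg->usage > cg->killable_pages
                    ? cg->usage - cg->killable_pages : 0;
    cg->killable_pages = 0;
    return true;
}

/*
 * Charge nr_pages. Instead of stalling (and risking a deadlock) when
 * nothing can be killed, the charge is forced past the limit.
 */
static enum toy_charge_result toy_try_charge(struct toy_memcg *cg,
                                             size_t nr_pages)
{
    if (toy_fits(cg, nr_pages)) {
        cg->usage += nr_pages;
        return TOY_CHARGED;
    }
    if (toy_out_of_memory(cg) && toy_fits(cg, nr_pages)) {
        cg->usage += nr_pages;
        return TOY_CHARGED_AFTER_KILL;
    }
    /* No victim, or the kill did not free enough: force the charge. */
    cg->usage += nr_pages;
    return TOY_FORCED;
}
```

In this model a group at 90 of 100 pages with no killable task ends up at 110 pages after a 20-page charge: the limit is exceeded, but the caller never waits on another task.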