From: Dmitry Vyukov
Date: Wed, 10 Oct 2018 11:33:14 +0200
Subject: Re: INFO: rcu detected stall in shmem_fault
To: Michal Hocko
Cc: David Rientjes, Tetsuo Handa, syzbot, Johannes Weiner, Andrew Morton, guro@fb.com, "Kirill A. Shutemov", LKML, Linux-MM, syzkaller-bugs, Yang Shi
In-Reply-To: <20181010091309.GE5873@dhcp22.suse.cz>

On Wed, Oct 10, 2018 at 11:13 AM, Michal Hocko wrote:
> On Wed 10-10-18 09:55:57, Dmitry Vyukov wrote:
>> On Wed, Oct 10, 2018 at 6:11 AM, 'David Rientjes' via syzkaller-bugs
>> wrote:
>> > On Wed, 10 Oct 2018, Tetsuo Handa wrote:
>> >
>> >> syzbot is hitting RCU stall due to memcg-OOM event.
>> >> https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
>> >>
>> >> What should we do if memcg-OOM found no killable task because the allocating task
>> >> was oom_score_adj == -1000 ?
>> >> Flooding printk() until RCU stall watchdog fires
>> >> (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
>> >> OOM header when no eligible victim left") because syzbot was terminating the test
>> >> upon WARN(1) removed by that commit) is not a good behavior.
>> >>
>>
>> You want to say that most of the recent hangs and stalls are actually
>> caused by our attempt to sandbox test processes with memory cgroup?
>> The process with oom_score_adj == -1000 is not supposed to consume any
>> significant memory; we have another (test) process with oom_score_adj
>> == 0 that's actually consuming memory.
>> But should we refrain from using -1000? Perhaps it would be better to
>> use -500/500 for control/test process, or -999/1000?
>
> oom disable on a task (especially when this is the only task in the
> memcg) is tricky. Look at the memcg report
> [ 935.562389] Memory limit reached of cgroup /syz0
> [ 935.567398] memory: usage 204808kB, limit 204800kB, failcnt 6081
> [ 935.573768] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> [ 935.580650] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> [ 935.586923] Memory cgroup stats for /syz0: cache:152KB rss:176336KB rss_huge:163840KB shmem:344KB mapped_file:264KB dirty:0KB writeback:0KB swap:0KB inactive_anon:260KB active_anon:176448KB inactive_file:4KB active_file:0KB
>
> There is still somebody holding anonymous (THP) memory. If there is no
> other eligible oom victim then it must be some of the oom disabled ones.
> You have suppressed the task list information so we do not know who that
> might be though.
>
> So it looks like there is some misconfiguration or a bug in the oom
> victim selection.

I'm afraid KASAN can interfere with memory accounting/OOM killing too.
KASAN quarantines up to 1/32 of physical memory (in our case
7.5GB/32 ≈ 234MB) that has already been freed by the task but, as far
as I understand, is still accounted against the memcg.
So maybe making the cgroup limit much larger than the quarantine size
will help resolve this too. But of course there can be a plain memory
leak as well.
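A rough sketch of the arithmetic behind the quarantine concern, in Python. It takes the mail's two premises as assumptions rather than verified kernel behavior: the KASAN quarantine can hold up to 1/32 of physical memory, and quarantined (already freed) objects stay charged to the memcg. It also assumes the 7.5GB machine means 7.5 GiB and that the report's kB are KiB. The helper names are hypothetical, for illustration only.

```python
# Rough comparison of a KASAN quarantine budget with a memcg limit.
# Assumptions (taken from the mail, not checked against kernel source):
#   * the quarantine may hold up to 1/32 of physical memory;
#   * objects sitting in the quarantine are still charged to the memcg.
# quarantine_budget / limit_to_quarantine_ratio are hypothetical helpers.

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30
QUARANTINE_FRACTION = 32  # "up to 1/32 of physical memory"


def quarantine_budget(total_ram_bytes: int) -> int:
    """Upper bound, in bytes, on freed memory KASAN may keep quarantined."""
    return total_ram_bytes // QUARANTINE_FRACTION


def limit_to_quarantine_ratio(total_ram_bytes: int, memcg_limit_bytes: int) -> float:
    """Ratio of the memcg limit to the quarantine budget.

    At or below 1.0, the quarantine alone could fill the cgroup
    (under the charging assumption above).
    """
    return memcg_limit_bytes / quarantine_budget(total_ram_bytes)


ram = int(7.5 * GIB)    # 7.5GB test machine, treated as GiB (assumption)
limit = 204800 * KIB    # "limit 204800kB" from the memcg report
budget = quarantine_budget(ram)
ratio = limit_to_quarantine_ratio(ram, limit)

print(f"quarantine budget: {budget // MIB} MiB")
print(f"memcg limit / quarantine budget: {ratio:.2f}")
```

With these inputs the 200 MiB limit is only about 0.83 of the 240 MiB quarantine budget, so under the charging assumption the quarantine alone could exceed the cgroup limit. A limit several times larger than the budget would leave room for the test's actual usage.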