Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp244934ybx; Wed, 6 Nov 2019 16:23:49 -0800 (PST) X-Google-Smtp-Source: APXvYqwaxXM9okQ+Ku47zVLmcG6nJvl9ZRPoKcoOo7/N8N1rpQAjM+S9Y0aKPvzaqdW5BTW5uSKj X-Received: by 2002:a05:6402:602:: with SMTP id n2mr597417edv.23.1573086229675; Wed, 06 Nov 2019 16:23:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1573086229; cv=none; d=google.com; s=arc-20160816; b=knJ8S/sZ6QlgM3ZbLl5rtvTGI6gQ5kOh8zejOgPAFCeJR0XdjaLXKTm46k0nIM5IzE Q+PHVDaZtyO+wJOUjks6neZV2W1VV2oIgQtSStpFB0md3rgFXwBMG2LrXwTx0UEXS1im zs0TcketIWw/ysCrBmU3HbZit4NDpBvHaKQjXLYV0SmOoH2plCHty7VdBhYMpcq0hLgu vHqOSzyh+Dsz3UpUAot20QT2z9J4pOJXABGmhF16JB46pm0vtNNhg40U4Gsp74F5mmta xm/MTMxuBLTm86o/5YB0VGuXOtdNJkLhGSH+qNtovnxW8PZE/kN5GtnBMdBMyN55saA6 kEMw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=krnlyvjs+KeSr0Hw/FRyxkzA1g9k1qVle5bXbv5R+yw=; b=soUhUDzgqRB1rID+XZrLuCyzI1BWeXD/+BCWIDXnKohcDbICKGmfpv7Ar64uTy3qZ3 wsv/BFgkMWJZf4b9neF8iKhJ8BPzs0WCtqVgjK88xhbzVw7UHsy5NF8fVc2eqYugRNLo gxpvtwxLAehjymcL3+eRYUVIDvbnNRDdus7wOWIMnguY/3dPyV/z1U6HMBAVErTvh/4R 8R2FVH4S3DBX0HdHD0sPMGPy9UKAW8D3YBGf0Cz6qqxteYJy2CWEWEMONp/PwNNQ4/hC fCVGJ7+hVAQ7aRv7IzvUith73kmXO8xayCOZ4P+yx6hzy3AYgs184PWWJbcqmXWvdMU0 nZmw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=0cc8pNLT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id gr10si356266ejb.414.2019.11.06.16.23.26; Wed, 06 Nov 2019 16:23:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=0cc8pNLT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728363AbfKGAWI (ORCPT + 99 others); Wed, 6 Nov 2019 19:22:08 -0500 Received: from mail-pg1-f196.google.com ([209.85.215.196]:36407 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727228AbfKGAWH (ORCPT ); Wed, 6 Nov 2019 19:22:07 -0500 Received: by mail-pg1-f196.google.com with SMTP id k13so421552pgh.3 for ; Wed, 06 Nov 2019 16:22:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=krnlyvjs+KeSr0Hw/FRyxkzA1g9k1qVle5bXbv5R+yw=; b=0cc8pNLTRY4uYn+HWKMYisOJ/XkVQI6HNMfotIlKRVv9t1mag0CmVviq468ziCwLou yG/6iyhWwop7qXtxNBa3R2bkVrC5ya4ByPrnnQO82kosOkwh0aCb/kDYyCQttFGHyR66 JNgcpdVEWlPTpyc9tVpRmfmc3p03TYtdGths7wywJ743cw9M0GNvhMUrvOnGakSdOujC bMdsT6XUMaTkLsdgdZh9CwP/N1MKXeugZXggeNwXV7NJp2fbd4x7G27Cz76UqAUBqycW JliOIOAX2QJPXR06sTd4LGTRY3H7vWudrkFCQ633AwBThq4FB59u+7dDOEbPAwhnM8W4 lOyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=krnlyvjs+KeSr0Hw/FRyxkzA1g9k1qVle5bXbv5R+yw=; b=MGa8d49Jo+wnKjC1aw4IofZgwim7f+VYCusnI2WEvtCHMibWLzjKQapQqituQxR0dE UHctTMWVB2E8eeM82Y/DXXR3EC4Wb8xfZHbPvoa53TPLhJRR/9+xpHYpilslyVQxz/iR ksT4CB+VlxUUDvHd6Dc4YRdABYCeGVM1iC8lptudN4lFn8gxnK/k0qUkQOSUsiPBR2N3 Neoj0D2xYj1+/qeK2wzGE3DYmFj1bib+jeMkmJjFJ1Tzc6BfzVMC9o+xsFKwdsi9jLbQ ATWVHduq3R8rs3J0EAL9GzGdb37oRUVvlpEmvWiCC3bpjOgOJMYyqBY5ECG6hUtd0P5m gswA== X-Gm-Message-State: APjAAAUi1YSVLSAvyKV6O9ggUpm14SLO3e2Yg624+25MyV1/rFYbOcnN s/MbQtKyglhAQG1t+SH+PDQxBQ== X-Received: by 2002:a62:1ac6:: with SMTP id a189mr185546pfa.96.1573086126591; Wed, 06 Nov 2019 16:22:06 -0800 (PST) Received: from localhost ([2620:10d:c090:200::2:deb0]) by smtp.gmail.com with ESMTPSA id z25sm145364pfa.88.2019.11.06.16.22.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Nov 2019 16:22:05 -0800 (PST) Date: Wed, 6 Nov 2019 16:22:04 -0800 From: Johannes Weiner To: Roman Gushchin Cc: linux-mm@kvack.org, Andrew Morton , Michal Hocko , linux-kernel@vger.kernel.org, kernel-team@fb.com, stable@vger.kernel.org, Tejun Heo Subject: Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Message-ID: <20191107002204.GA96548@cmpxchg.org> References: <20191106225131.3543616-1-guro@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191106225131.3543616-1-guro@fb.com> User-Agent: Mutt/1.12.2 (2019-09-21) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote: > We've encountered a rcu stall in get_mem_cgroup_from_mm(): > > rcu: INFO: rcu_sched self-detected stall on CPU > rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017 > (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33 > <...> > RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90 > <...> > __memcg_kmem_charge+0x55/0x140 > __alloc_pages_nodemask+0x267/0x320 > pipe_write+0x1ad/0x400 > new_sync_write+0x127/0x1c0 > __kernel_write+0x4f/0xf0 > dump_emit+0x91/0xc0 > writenote+0xa0/0xc0 > elf_core_dump+0x11af/0x1430 > do_coredump+0xc65/0xee0 > ? unix_stream_sendmsg+0x37d/0x3b0 > get_signal+0x132/0x7c0 > do_signal+0x36/0x640 > ? recalc_sigpending+0x17/0x50 > exit_to_usermode_loop+0x61/0xd0 > do_syscall_64+0xd4/0x100 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > The problem is caused by an exiting task which is associated with > an offline memcg. We're iterating over and over in the > do {} while (!css_tryget_online()) loop, but obviously the memcg won't > become online and the exiting task won't be migrated to a live memcg. > > Let's fix it by switching from css_tryget_online() to css_tryget(). > > As css_tryget_online() cannot guarantee that the memcg won't go > offline, the check is usually useless, except some rare cases > when for example it determines if something should be presented > to a user. > > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use > css_tryget() instead of css_tryget_online() in task_get_css()"). > > Signed-off-by: Roman Gushchin > Cc: stable@vger.kernel.org > Cc: Tejun Heo Acked-by: Johannes Weiner The bug aside, it doesn't matter whether the cgroup is online for the callers. It used to matter when offlining needed to evacuate all charges from the memcg, and so needed to prevent new ones from showing up, but we don't care now.