From: Shakeel Butt
Date: Fri, 16 Jul 2021 08:14:23 -0700
Subject: Re: [PATCH v4 2/2] memcg: infrastructure to flush memcg stats
To: Marek Szyprowski
Cc: Tejun Heo, Johannes Weiner, Muchun Song, Michal Hocko, Roman Gushchin,
    Michal Koutný, Huang Ying, Hillf Danton, Andrew Morton, Cgroups,
    Linux MM, LKML
In-Reply-To: <78005c4c-9233-7bc8-d50e-e3fe11f30b5d@samsung.com>
References: <20210714013948.270662-1-shakeelb@google.com>
    <20210714013948.270662-2-shakeelb@google.com>
    <78005c4c-9233-7bc8-d50e-e3fe11f30b5d@samsung.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Marek,

On Fri, Jul 16, 2021 at 8:03 AM Marek Szyprowski wrote:
>
> Hi,
>
> On 14.07.2021 03:39, Shakeel Butt wrote:
> > At the moment memcg stats are read in four contexts:
> >
> > 1. memcg stat user interfaces
> > 2. dirty throttling
> > 3. page fault
> > 4. memory reclaim
> >
> > Currently the kernel flushes the stats for the first two cases. Flushing
> > the stats for the remaining two cases may have a performance impact.
> > Always flushing the memcg stats on the page fault code path may
> > negatively impact the performance of applications. In addition, flushing
> > in the memory reclaim code path, though treated as a slowpath, can
> > become a source of contention for the global lock taken for stat
> > flushing, because when the system or a memcg is under memory pressure,
> > many tasks may enter the reclaim path.
> >
> > This patch uses the following mechanisms to solve these challenges:
> >
> > 1. Periodically flush the stats from the root memcg every 2 seconds.
> > This puts a time limit on how far out of sync the stats can get.
> >
> > 2. Asynchronously flush the stats after a fixed number of stat updates.
> > In the worst case the stats can be out of sync by O(nr_cpus * BATCH)
> > for 2 seconds.
> >
> > 3. To avoid a thundering herd flushing the stats, particularly from the
> > memory reclaim context, introduce a memcg-local spinlock and let only
> > one flusher be active at a time. This could have been done with
> > cgroup_rstat_lock, but that lock is used by other subsystems and for
> > userspace reads of memcg stats, so it is better to keep the flushers
> > introduced by this patch decoupled from cgroup_rstat_lock.
> >
> > Signed-off-by: Shakeel Butt
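
[Editorial aside: to make the three mechanisms quoted above concrete, here is
a minimal C sketch of what such flushing machinery could look like. The names
stats_flush_lock, stats_flush_dwork, mem_cgroup_flush_stats() and
flush_memcg_stats_dwork() are taken from the lock list and call trace quoted
further below; the update counter, the batch size and the init hook are
illustrative assumptions, not the actual patch.]

/*
 * Illustrative sketch only -- not the patch that landed in linux-next.
 * Identifiers marked "assumed" are invented for clarity.
 */
#include <linux/atomic.h>
#include <linux/cgroup.h>
#include <linux/cpumask.h>
#include <linux/memcontrol.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

#define FLUSH_TIME   (2UL * HZ)   /* mechanism 1: flush every 2 seconds */
#define STATS_BATCH  64           /* assumed per-cpu update batch */

static void flush_memcg_stats_dwork(struct work_struct *w);
static DECLARE_DELAYED_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
static DEFINE_SPINLOCK(stats_flush_lock);        /* mechanism 3: one flusher */
static atomic_t stats_updates = ATOMIC_INIT(0);  /* assumed name */

static void mem_cgroup_flush_stats(void)
{
	/* Only one flusher at a time; everyone else skips the flush. */
	if (!spin_trylock(&stats_flush_lock))
		return;

	/* Note: this call can sleep -- see the report quoted below. */
	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
	atomic_set(&stats_updates, 0);
	spin_unlock(&stats_flush_lock);
}

/*
 * Mechanism 2 (assumed shape): stat updaters tick a counter and pull the
 * flush work forward once enough updates have accumulated.
 */
static void memcg_rstat_updated(void)
{
	if (atomic_inc_return(&stats_updates) > STATS_BATCH * num_online_cpus())
		mod_delayed_work(system_unbound_wq, &stats_flush_dwork, 0);
}

/*
 * Mechanism 1: periodic flush, requeued from the work item itself
 * (initially queued from memcg init code, not shown).
 */
static void flush_memcg_stats_dwork(struct work_struct *w)
{
	mem_cgroup_flush_stats();
	queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
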
>
> This patch landed in today's linux-next (next-20210716) as commit
> 42265e014ac7 ("memcg: infrastructure to flush memcg stats"). On my test
> systems I found that it triggers a kernel BUG on all ARM64 boards:
>
> BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:200
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 7, name: kworker/u8:0
> 3 locks held by kworker/u8:0/7:
>  #0: ffff00004000c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x200/0x718
>  #1: ffff80001334bdd0 ((stats_flush_dwork).work){+.+.}-{0:0}, at: process_one_work+0x200/0x718
>  #2: ffff8000124f6d40 (stats_flush_lock){+.+.}-{2:2}, at: mem_cgroup_flush_stats+0x20/0x48
> CPU: 2 PID: 7 Comm: kworker/u8:0 Tainted: G W 5.14.0-rc1+ #3713
> Hardware name: Raspberry Pi 4 Model B (DT)
> Workqueue: events_unbound flush_memcg_stats_dwork
> Call trace:
>  dump_backtrace+0x0/0x1d0
>  show_stack+0x14/0x20
>  dump_stack_lvl+0x88/0xb0
>  dump_stack+0x14/0x2c
>  ___might_sleep+0x1dc/0x200
>  __might_sleep+0x4c/0x88
>  cgroup_rstat_flush+0x2c/0x58
>  mem_cgroup_flush_stats+0x34/0x48
>  flush_memcg_stats_dwork+0xc/0x38
>  process_one_work+0x2a8/0x718
>  worker_thread+0x48/0x460
>  kthread+0x12c/0x160
>  ret_from_fork+0x10/0x18
>
> This can also be reproduced with QEMU. Please let me know if I can help
> fix this issue.
>

Thanks for the report. The issue can be fixed by changing
cgroup_rstat_flush() to cgroup_rstat_flush_irqsafe() in
mem_cgroup_flush_stats(). I will send out the updated patch in a couple
of hours after a bit more testing.
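
[Editorial aside: a sketch of what that one-line change would look like in
mem_cgroup_flush_stats(); the surrounding function body is inferred from the
lock list in the trace above and is not copied from the actual patch.]

static void mem_cgroup_flush_stats(void)
{
	/* Only one flusher at a time (stats_flush_lock from the trace). */
	if (!spin_trylock(&stats_flush_lock))
		return;

	/*
	 * cgroup_rstat_flush() may sleep, which is what triggered the
	 * "sleeping function called from invalid context" splat while the
	 * spinlock above was held. cgroup_rstat_flush_irqsafe() flushes
	 * without sleeping, so it is safe to call under the spinlock.
	 */
	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);

	spin_unlock(&stats_flush_lock);
}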