Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp7864891rwl; Thu, 23 Mar 2023 09:32:05 -0700 (PDT) X-Google-Smtp-Source: AK7set//cL3Y2nsAEBOyMJP4nmVYODUdQ5kYcLHwTlWmMmgUM6GW7Td2wP8xKv9YxUwJs8OZ4dhi X-Received: by 2002:aa7:9e44:0:b0:624:2e60:f21e with SMTP id z4-20020aa79e44000000b006242e60f21emr7134560pfq.29.1679589125313; Thu, 23 Mar 2023 09:32:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679589125; cv=none; d=google.com; s=arc-20160816; b=jAE9hwwRBOdC8stdB9f2tEpOYAzddFCkT5Nmbz3OzGbl2P59FjRz9jtnfrsNdoL1A+ 1cyMOdqnzH6WXKg1AteYIAX7rsLxvaB2hctz2NZ6ZbrOXWImwfK7O1j48+qIkdbTLKjY GG1czzrKQNnWFcedaZEcPwK9eyiIC3BeiTKcEejL/5AvBDtJyOa2nhyyAA9xIjRSGaUP /jav8Jv0RnFXhe+HpLdrda73U5Uj0/aQpnW4RO0DSijJAH/Ph+aSZIPoZE/5U0RY/i/b kZg7W9pjmW94bR40aQAozPZvb8iCvkEEBkpE4EzzsUA83KP6+6ihBZvY/jr5QOZpIRS0 mKCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=DxwPj9zc4wVERdaRm3UD6tbyjSI8uDBIlR3KTexXHNg=; b=r8R87kqL+rFFTg5wpyJjWaH8wiLW419avUmYe6aWvSXkssuaK2/vVA/+TZaw4fIXFL caFvnhfr6gxRmepEuRGkI5ujnTkA82trlFp6NMLFfq7mszMcz3YpHW1nMPBPE8/VK31z fRYMUvN2gD+NjXW3sDDxhdlwEwWkcfaDPpT409izuq1P1bPf4eqryXwmMEJ8wYr6j1z5 +pK/my93eggAar020YNEedkRGZlKxb2belnK9OcMjrMklXnTuTky2MF406BMU6sqcCso IMCa3YmG4RnhfZ4cpyIyTFAVfAxTonOcZ2jcweeEGkzPCc/sWCUvOD40pO/vuS3fPmDE 3z3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=XhO6nZVp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u4-20020a056a00158400b00628217e3ea6si5641842pfk.316.2023.03.23.09.31.53; Thu, 23 Mar 2023 09:32:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=XhO6nZVp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229616AbjCWQSO (ORCPT + 99 others); Thu, 23 Mar 2023 12:18:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229923AbjCWQSN (ORCPT ); Thu, 23 Mar 2023 12:18:13 -0400 Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 590601711 for ; Thu, 23 Mar 2023 09:18:11 -0700 (PDT) Received: by mail-ed1-x52e.google.com with SMTP id eg48so89023243edb.13 for ; Thu, 23 Mar 2023 09:18:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679588290; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=DxwPj9zc4wVERdaRm3UD6tbyjSI8uDBIlR3KTexXHNg=; b=XhO6nZVp3cJ6GYNnQ2ZPBd/bevteGrZU/HIt7rhk+LHiYXbwkTEIAA/Gm2lDHIMikg gfoYKiDCpNo8lnfMBlGGsI0PLQz5CrW36qepgWsG7uExChYhRpji3UqRtZum7cH/vdbZ nMqTsTLwMkfU7eqB6QIAZhzjzcr6MfIBsV9+Mh7lY8ajGfGMxL7VuLMDYbs0rNlH1Lgf iV6mh1620CTOr/soDFy1jiw2Cv+8sGb2vRKHlFpQOFPw708c41xBPfnbqLdjoqoACMXl o1UnaPMAjjFpYJ904SCgAI0gxQMgUnTxaQY+oMY7qneZ4HxyRULhAJKXlDXAWNrPcPJl 8/jQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679588290; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DxwPj9zc4wVERdaRm3UD6tbyjSI8uDBIlR3KTexXHNg=; b=II+BtD9HwrPqEKtPWHcLowh5hvxtnSkkeX56k98ckrqP6HcdQRDvUVNC34lTLE+ZWi Ndy8MUzQVyo3B0ISLyB1vDKz78y0sohogRWqhcMkMGearZ0RiyoFH1uvmKzc8HO7QAEa vQ4rby17YZL0onuiugGIlhCFd59FWBjM+3j4flb/ZflzzTmLNqw7PHDpdy2HEKlZF6ow HohACOlkJ3894H2zk/wHGCTSxdZCUEeSjOUkCuMemE3Fqh1RSKEQD17MHsrKI9vA/fZQ NZOLMD2EJM8WryR9myQApf+xbqmQlc9jiN7cPoI1QEjiLOgHU2Z1mWvgXTfUBGzLB5sZ OQ8g== X-Gm-Message-State: AO0yUKX76gip3/Q0Jy1TZlfFcPXFgjCqGqzIr7D3VOoAzfRoTVYZ1Tz6 z87NP+OC/ruPicNt5OVat2BUpCAIOXhiA+3XN+ANmA== X-Received: by 2002:a50:9b55:0:b0:4fc:473d:3308 with SMTP id a21-20020a509b55000000b004fc473d3308mr3392273edj.8.1679588289734; Thu, 23 Mar 2023 09:18:09 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-1-yosryahmed@google.com> <20230323040037.2389095-2-yosryahmed@google.com> In-Reply-To: From: Yosry Ahmed Date: Thu, 23 Mar 2023 09:17:33 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Shakeel Butt Cc: Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 23, 2023 at 9:10=E2=80=AFAM Shakeel Butt = wrote: > > On Thu, Mar 23, 2023 at 8:46=E2=80=AFAM Shakeel Butt wrote: > > > > On Thu, Mar 23, 2023 at 8:43=E2=80=AFAM Yosry Ahmed wrote: > > > > > > On Thu, Mar 23, 2023 at 8:40=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > On Thu, Mar 23, 2023 at 6:36=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > [...] > > > > > > > > > > > > > > > 2. Are we really calling rstat flush in irq context? > > > > > > > > > > > > > > I think it is possible through the charge/uncharge path: > > > > > > > memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usag= e(). I > > > > > > > added the protection against flushing in an interrupt context= for > > > > > > > future callers as well, as it may cause a deadlock if we don'= t disable > > > > > > > interrupts when acquiring cgroup_rstat_lock. > > > > > > > > > > > > > > > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usage() = is only > > > > > > > > done for root memcg. Why is mem_cgroup_threshold() interest= ed in root > > > > > > > > memcg usage? Why not ignore root memcg in mem_cgroup_thresh= old() ? > > > > > > > > > > > > > > I am not sure, but the code looks like event notifications ma= y be set > > > > > > > up on root memcg, which is why we need to check thresholds. > > > > > > > > > > > > This is something we should deprecate as root memcg's usage is = ill defined. > > > > > > > > > > Right, but I think this would be orthogonal to this patch series. > > > > > > > > > > > > > I don't think we can make cgroup_rstat_lock a non-irq-disabling loc= k > > > > without either breaking a link between mem_cgroup_threshold and > > > > cgroup_rstat_lock or make mem_cgroup_threshold work without disabli= ng > > > > irqs. > > > > > > > > So, this patch can not be applied before either of those two tasks = are > > > > done (and we may find more such scenarios). > > > > > > > > > Could you elaborate why? > > > > > > My understanding is that with an in_task() check to make sure we only > > > acquire cgroup_rstat_lock from non-irq context it should be fine to > > > acquire cgroup_rstat_lock without disabling interrupts. > > > > From mem_cgroup_threshold() code path, cgroup_rstat_lock will be taken > > with irq disabled while other code paths will take cgroup_rstat_lock > > with irq enabled. This is a potential deadlock hazard unless > > cgroup_rstat_lock is always taken with irq disabled. > > Oh you are making sure it is not taken in the irq context through > should_skip_flush(). Hmm seems like a hack. Normally it is recommended > to actually remove all such users instead of silently > ignoring/bypassing the functionality. It is a workaround, we simply accept to read stale stats in irq context instead of the expensive flush operation. > > So, how about removing mem_cgroup_flush_stats() from > mem_cgroup_usage(). It will break the known chain which is taking > cgroup_rstat_lock with irq disabled and you can add > WARN_ON_ONCE(!in_task()). This changes the behavior in a more obvious way because: 1. The memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage() path is also exercised in a lot of paths outside irq context, this will change the behavior for any event thresholds on the root memcg. With proposed skipped flushing in irq context we only change the behavior in a small subset of cases. I think we can skip flushing in irq context for now, and separately deprecate threshold events for the root memcg. When that is done we can come back and remove should_skip_flush() and add a VM_BUG_ON or WARN_ON_ONCE instead. WDYT? 2. mem_cgroup_usage() is also used when reading usage from userspace. This should be an easy workaround though.