Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp8013148rwl; Thu, 23 Mar 2023 11:24:06 -0700 (PDT) X-Google-Smtp-Source: AK7set+T1rLNq5MgDWg/BcXaLTOqFfugdU/nWhK6gDXkcmzlDV4X5Ai14PdSJX8JK0L9kRfCV24J X-Received: by 2002:a05:6a00:309e:b0:625:5403:cd87 with SMTP id bh30-20020a056a00309e00b006255403cd87mr5404478pfb.11.1679595846021; Thu, 23 Mar 2023 11:24:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679595846; cv=none; d=google.com; s=arc-20160816; b=R/80OS1tfJVS6pQMlpjLavBG36xz8iHVcEG1LM/EPNlCbTxNytI+s3EuHv6N1r2iWo SxIAzQAnCGkTmwWmx7niMkPHjeYHl/TEsBBDjuLDWJOaEY+sPhTjU0We49CPpFc0M1ey df+88K9BKI7J0II8rn8bOX0OuSD7r8Ed2VUG6NDy4JPrqQ3sYkoZqfQHn63C3PijN9Jr W20SXt9Ak0GYHHMKzv5/vplqmT5RPOPte+Xr/HjQkkH2oK9z7rXywNMgw1e3Kk4Toomq vWOJ8+Or5BbJMdxQAyd/JW5UTttfINWKJObQ+i6qfMjpBa9cOU3dcULLmDJIupUR3mV9 k1Dw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=TG6r42DnDMDxeRAZYy//5B+CwRJPcnDTegQ6DfZfbA0=; b=PLSYUeNdn4Ro4q2Wlh3WkhSxpTawCQWUgHnhnEJ1T1Qq0lxmyv1xINU8r5hbAZg3KD IfZa5wW0HbvEqbyAbAuLBRL3uY1QiUEuECckAjgzicUDPwluDRKngB5E6ZUMNZe1+ipa M6C40CGco70cNtKiOF4f05FPjb9PNc7LmIaO0fezZi9vpSuhdWpA3dnz0eoWHfm+8FPw 7ug8c2xJAKtXPetUKIcZtA/n+o1UXflDZBJ57FwlK8MXtaoJ37JNyr/6OvoMd0iZcgLa fUtDlCBtVvKMQxBYVhG1dYbSgv2+L+wW3EeoQajx7oui3sa5lSUrtzKZnvBkQ4Ll7V8+ thZg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=lRkNETJE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o17-20020a639211000000b0050e60e3638csi16801860pgd.835.2023.03.23.11.23.34; Thu, 23 Mar 2023 11:24:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=lRkNETJE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231278AbjCWSKM (ORCPT + 99 others); Thu, 23 Mar 2023 14:10:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231225AbjCWSKK (ORCPT ); Thu, 23 Mar 2023 14:10:10 -0400 Received: from mail-ed1-x530.google.com (mail-ed1-x530.google.com [IPv6:2a00:1450:4864:20::530]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E70310412 for ; Thu, 23 Mar 2023 11:10:08 -0700 (PDT) Received: by mail-ed1-x530.google.com with SMTP id b20so57520570edd.1 for ; Thu, 23 Mar 2023 11:10:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1679595007; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=TG6r42DnDMDxeRAZYy//5B+CwRJPcnDTegQ6DfZfbA0=; b=lRkNETJE7iTItx4Oqh+NEjrOnfGYNHB2E7RmMla6GzKkIpwBLF9JrTmMnnasXhid+k 4GkqmkFXqD/kFADNx1MsVLXYWkgn95RMlwtqftoBICvHYDRIbDVOLp63NhjUNEROXM26 HENlsAwE1ibTbvJpzLEdF2luec/dHuzpc1xzDRetDxf7oeI0mlmlGL6AnXKlSsiN0V9a fkI7cVYLirYeR0xKgmG0B3pMlHl7NRM2QgUtb8qKGaaoz5Ty1xY2/kTy+5DFPQzsh5PD TtsaTo4siyBM23lo1jlLwtoGkyyzIpFokkg4XZiGVdC26hvQ6MBgzGvdKltKogoOZmS+ lKeg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679595007; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TG6r42DnDMDxeRAZYy//5B+CwRJPcnDTegQ6DfZfbA0=; b=yxED9UTL8j7vFusbOPrM0TkhIvGDFvHlXeMdXQFF3vS0uuM6FncmidGzYa0nZTa9yM w+6gqK9EuyxecGcMM2F52YcqBoLc1xpg8YUpkYUEeS+MtfmWJX4v9auUBcmr/Os5wNdJ fPjM55C/hrO1/Jkyht97nj+wEWJnOW3k3uiGivNUS3cV0yva55wuNWdeMApfcI7PylFJ +9C4JsV0KmyVHC/Z0XVOJ8kPMORLYr8pDetQ8puLuCrALyeCb/8ENtQFZLSrL3rEVNlt K3Kpsa3U/5/KUYY5HiD7N8Le2UajTg+Gx/cnuK8NXzNCfLC2q9+v6PvjteX2ppNHAsmt IQCg== X-Gm-Message-State: AO0yUKWVOA2VpNfLLPpMXVrEO6rzMd+1KYcV2BhqXvB8DmH+R+dBEqxR SPd9iF5oIcMJhc4tI50n3vJ542PmyDu2To2TYy3k2Q== X-Received: by 2002:a17:906:34cd:b0:8e5:411d:4d09 with SMTP id h13-20020a17090634cd00b008e5411d4d09mr5406251ejb.15.1679595006684; Thu, 23 Mar 2023 11:10:06 -0700 (PDT) MIME-Version: 1.0 References: <20230323040037.2389095-2-yosryahmed@google.com> <20230323173343.GF739026@cmpxchg.org> In-Reply-To: <20230323173343.GF739026@cmpxchg.org> From: Yosry Ahmed Date: Thu, 23 Mar 2023 11:09:30 -0700 Message-ID: Subject: Re: [RFC PATCH 1/7] cgroup: rstat: only disable interrupts for the percpu lock To: Johannes Weiner Cc: Shakeel Butt , Tejun Heo , Josef Bacik , Jens Axboe , Zefan Li , Michal Hocko , Roman Gushchin , Muchun Song , Andrew Morton , Vasily Averin , cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-15.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 23, 2023 at 10:33=E2=80=AFAM Johannes Weiner wrote: > > On Thu, Mar 23, 2023 at 09:17:33AM -0700, Yosry Ahmed wrote: > > On Thu, Mar 23, 2023 at 9:10=E2=80=AFAM Shakeel Butt wrote: > > > > > > On Thu, Mar 23, 2023 at 8:46=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > On Thu, Mar 23, 2023 at 8:43=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > > On Thu, Mar 23, 2023 at 8:40=E2=80=AFAM Shakeel Butt wrote: > > > > > > > > > > > > On Thu, Mar 23, 2023 at 6:36=E2=80=AFAM Yosry Ahmed wrote: > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > 2. Are we really calling rstat flush in irq context? > > > > > > > > > > > > > > > > > > I think it is possible through the charge/uncharge path: > > > > > > > > > memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_= usage(). I > > > > > > > > > added the protection against flushing in an interrupt con= text for > > > > > > > > > future callers as well, as it may cause a deadlock if we = don't disable > > > > > > > > > interrupts when acquiring cgroup_rstat_lock. > > > > > > > > > > > > > > > > > > > 3. The mem_cgroup_flush_stats() call in mem_cgroup_usag= e() is only > > > > > > > > > > done for root memcg. Why is mem_cgroup_threshold() inte= rested in root > > > > > > > > > > memcg usage? Why not ignore root memcg in mem_cgroup_th= reshold() ? > > > > > > > > > > > > > > > > > > I am not sure, but the code looks like event notification= s may be set > > > > > > > > > up on root memcg, which is why we need to check threshold= s. > > > > > > > > > > > > > > > > This is something we should deprecate as root memcg's usage= is ill defined. > > > > > > > > > > > > > > Right, but I think this would be orthogonal to this patch ser= ies. > > > > > > > > > > > > > > > > > > > I don't think we can make cgroup_rstat_lock a non-irq-disabling= lock > > > > > > without either breaking a link between mem_cgroup_threshold and > > > > > > cgroup_rstat_lock or make mem_cgroup_threshold work without dis= abling > > > > > > irqs. > > > > > > > > > > > > So, this patch can not be applied before either of those two ta= sks are > > > > > > done (and we may find more such scenarios). > > > > > > > > > > > > > > > Could you elaborate why? > > > > > > > > > > My understanding is that with an in_task() check to make sure we = only > > > > > acquire cgroup_rstat_lock from non-irq context it should be fine = to > > > > > acquire cgroup_rstat_lock without disabling interrupts. > > > > > > > > From mem_cgroup_threshold() code path, cgroup_rstat_lock will be ta= ken > > > > with irq disabled while other code paths will take cgroup_rstat_loc= k > > > > with irq enabled. This is a potential deadlock hazard unless > > > > cgroup_rstat_lock is always taken with irq disabled. > > > > > > Oh you are making sure it is not taken in the irq context through > > > should_skip_flush(). Hmm seems like a hack. Normally it is recommende= d > > > to actually remove all such users instead of silently > > > ignoring/bypassing the functionality. > > +1 > > It shouldn't silently skip the requested operation, rather it > shouldn't be requested from an incompatible context. > > > > So, how about removing mem_cgroup_flush_stats() from > > > mem_cgroup_usage(). It will break the known chain which is taking > > > cgroup_rstat_lock with irq disabled and you can add > > > WARN_ON_ONCE(!in_task()). > > > > This changes the behavior in a more obvious way because: > > 1. The memcg_check_events()->mem_cgroup_threshold()->mem_cgroup_usage() > > path is also exercised in a lot of paths outside irq context, this > > will change the behavior for any event thresholds on the root memcg. > > With proposed skipped flushing in irq context we only change the > > behavior in a small subset of cases. > > Can you do > > /* Note: stale usage data when called from irq context!! */ > if (in_task()) > mem_cgroup_flush_stats() > > directly in the callsite? Maybe even include the whole callchain in > the comment that's currently broken and needs fixing/removing. Yeah, we can do that in mem_cgroup_usage(), which is the only context that I am aware of that may flush from irq context. We can also add WARN_ON_ONCE(!in_task()) in the rstat core flushing code to catch any other code paths that we are not aware of -- which may result in a deadlock, but hopefully if there is a violation it will be caught soon enough.