From: Yosry Ahmed
Date: Sat, 12 Aug 2023 04:04:32 -0700
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
To: Michal Hocko
Cc: Shakeel Butt, Johannes Weiner, Roman Gushchin, Andrew Morton, Muchun Song, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Tejun Heo

On Sat, Aug 12, 2023 at 1:35 AM Michal Hocko wrote:
>
> On Fri 11-08-23 19:48:14, Shakeel Butt wrote:
> > On Fri, Aug 11, 2023 at 7:36 PM Yosry Ahmed wrote:
> > >
> > > On Fri, Aug 11, 2023 at 7:29 PM Shakeel Butt wrote:
> > > >
> > > > On Fri, Aug 11, 2023 at 7:12 PM Yosry Ahmed wrote:
> > > > >
> > > > [...]
> > > > >
> > > > > I am worried that writing to a stat for flushing then reading will
> > > > > increase the staleness window which we are trying to reduce here.
> > > > > Would it be acceptable to add a separate interface to explicitly read
> > > > > flushed stats without having to write first? If the distinction
> > > > > disappears in the future we can just short-circuit both interfaces.
> > > >
> > > > What is the acceptable staleness time window for your case? It is hard
> > > > to imagine that a write+read will always be worse than just a read.
> > > > Even the proposed patch can have an unintended and larger than
> > > > expected staleness window due to some processing on
> > > > return-to-userspace or some scheduling delay.
> > >
> > > Maybe I am worrying too much, we can just go for writing to
> > > memory.stat for explicit stats refresh.
> > >
> > > Do we still want to go with the mutex approach Michal suggested for
> > > do_flush_stats() to support either waiting for ongoing flushes
> > > (mutex_lock) or skipping (mutex_trylock)?
> >
> > I would say keep that as a separate patch.
>
> Separate patches would be better but please make the mutex conversion
> first. We really do not want to have any busy waiting depending on a
> sleep exported to the userspace. That is just no-go.

+tj@kernel.org

That makes sense.

Taking a step back though, and considering there have been other
complaints about unified flushing causing expensive reads from
memory.stat [1], I am wondering if we should tackle the fundamental
problem. We have a single global rstat lock for flushing, which
protects the global per-cgroup counters as far as I understand.
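The wait-or-skip choice for do_flush_stats() raised in the quoted thread can be illustrated with a minimal userspace sketch. It assumes POSIX threads as a stand-in for the kernel mutex API (pthread_mutex_lock/pthread_mutex_trylock in place of mutex_lock/mutex_trylock); flush_stats(), flush_lock, percpu_delta and NR_CPUS are hypothetical names and do not correspond to the actual memcg code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical: a few per-CPU deltas folded into one aggregated counter. */
#define NR_CPUS 4

static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER;
static long percpu_delta[NR_CPUS]; /* pending, not-yet-flushed updates */
static long total;                 /* value a reader of the stats would see */

/* Fold every pending per-CPU delta into total. Caller holds flush_lock. */
static void do_flush_locked(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		total += percpu_delta[cpu];
		percpu_delta[cpu] = 0;
	}
}

/*
 * wait == true:  block until an ongoing flush (if any) drops the lock,
 *                then flush, so the caller sees fresh values.
 * wait == false: if another flusher holds the lock, return immediately
 *                and accept possibly stale values.
 * Returns true if this caller performed the flush.
 */
static bool flush_stats(bool wait)
{
	if (wait)
		pthread_mutex_lock(&flush_lock);
	else if (pthread_mutex_trylock(&flush_lock) != 0)
		return false;	/* lock busy: skip the flush */

	do_flush_locked();
	pthread_mutex_unlock(&flush_lock);
	return true;
}
```

The blocking path waits on the mutex instead of busy-waiting, which is the property asked for before a flush can be triggered from userspace; the trylock path keeps the skip-if-someone-else-is-flushing behaviour for callers that prefer cheap, possibly stale reads.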
A single lock means a lot of contention, which is why we implemented
unified flushing on the memcg side in the first place: we only let one
flusher operate and everyone else skips, but that flusher needs to
flush the entire tree. This can be unnecessarily expensive (see [1]),
and to limit that cost we sacrifice accuracy (which is what this patch
is about).

I am exploring breaking down that lock into per-cgroup locks, where a
flusher acquires locks in a top-down fashion. This allows for some
concurrency in flushing and makes unified flushing unnecessary. If we
retire unified flushing, we fix both accuracy and expensive reads at
the same time, without sacrificing performance for concurrent
in-kernel flushers.

What do you think? I am prototyping something now and running some
tests. It seems promising and simple-ish (unless I am missing a big
correctness issue).

[1] https://lore.kernel.org/lkml/CABWYdi3YNwtPDwwJWmCO-ER50iP7CfbXkCep5TKb-9QzY-a40A@mail.gmail.com/

>
> Thanks!
> --
> Michal Hocko
> SUSE Labs
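The per-cgroup, top-down locking idea can be sketched in the same kind of userspace model. This is an assumption about one possible shape, not the prototype mentioned in the email: each node has its own lock, a flusher always locks a parent before any of its children, and a child records the delta it has folded but not yet handed to its parent (pending_up), so flushing a subtree never needs to lock an ancestor. struct cg, cg_init(), cg_add_child(), cg_update(), cg_flush() and all field names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical cgroup node with its own lock instead of one global lock. */
struct cg {
	pthread_mutex_t lock;
	long pending;    /* local updates not yet flushed */
	long total;      /* hierarchical total as of this node's last flush */
	long pending_up; /* already in total, not yet picked up by the parent */
	struct cg *children[MAX_CHILDREN];
	int nr_children;
};

static void cg_init(struct cg *cg)
{
	pthread_mutex_init(&cg->lock, NULL);
	cg->pending = cg->total = cg->pending_up = 0;
	cg->nr_children = 0;
}

/* Caller must keep nr_children below MAX_CHILDREN. */
static void cg_add_child(struct cg *parent, struct cg *child)
{
	parent->children[parent->nr_children++] = child;
}

static void cg_update(struct cg *cg, long n)
{
	pthread_mutex_lock(&cg->lock);
	cg->pending += n;
	pthread_mutex_unlock(&cg->lock);
}

/* Flush cg's subtree. Caller holds cg->lock. */
static void cg_flush_locked(struct cg *cg)
{
	long delta = cg->pending;

	cg->pending = 0;
	for (int i = 0; i < cg->nr_children; i++) {
		struct cg *child = cg->children[i];

		/* Parent lock is held while taking the child's: top-down order. */
		pthread_mutex_lock(&child->lock);
		cg_flush_locked(child);
		delta += child->pending_up;	/* consume what the child folded */
		child->pending_up = 0;
		pthread_mutex_unlock(&child->lock);
	}
	cg->total += delta;
	cg->pending_up += delta;	/* for the parent's next flush */
}

static void cg_flush(struct cg *cg)
{
	pthread_mutex_lock(&cg->lock);
	cg_flush_locked(cg);
	pthread_mutex_unlock(&cg->lock);
}
```

Any two locks a flusher holds at once are an ancestor and a descendant, taken ancestor first, so flushers cannot deadlock, and flushers working on disjoint subtrees run concurrently. For simplicity cg_update() takes the node lock; in the kernel, memcg updates go to per-CPU counters instead.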