From: Shakeel Butt
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long, kernel-team@cloudflare.com, Wei Xu, Greg Thelen, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
Date: Thu, 14 Sep 2023 22:58:44 +0000
Message-ID: <20230914225844.woz7mke6vnmwijh7@google.com>
References: <20230913073846.1528938-1-yosryahmed@google.com> <20230913073846.1528938-4-yosryahmed@google.com>

On Thu, Sep 14, 2023 at 10:56:52AM -0700, Yosry Ahmed wrote:
[...]
> >
> > 1. How much delayed/stale stats have you observed on real world workload?
>
> I am not really sure. We don't have a wide deployment of kernels with
> rstat yet. These are problems observed in testing and/or concerns
> expressed by our userspace team.
>

Why is sleep(2) not good enough for the tests?

> I am trying to solve this now because any problems that result from
> this staleness will be very hard to debug and link back to stale
> stats.
>

I think you first need to show whether this (2 sec stale stats) is
really a problem.

> >
> > 2. What is acceptable staleness in the stats for your use-case?
>
> Again, unfortunately I am not sure, but right now it can be O(seconds)
> which is not acceptable as we have workloads querying the stats every
> 1s (and sometimes more frequently).
>

It is 2 seconds in most cases, and if it is higher, the system is
already in bad shape. Saying O(seconds) makes it sound more dramatic
than it is. So, why is 2 seconds of staleness not acceptable? Is 1
second acceptable? Or 500 msec? Let's look at the use cases below.

> >
> > 3. What is your use-case?
>
> A few use cases we have that may be affected by this:
> - System overhead: calculations using memory.usage and some stats from
> memory.stat. If one of them is fresh and the other one isn't we have
> an inconsistent view of the system.
> - Userspace OOM killing: We use some stats in memory.stat to gauge the
> amount of memory that will be freed by killing a task as sometimes
> memory.usage includes shared resources that wouldn't be freed anyway.
> - Proactive reclaim: we read memory.stat in a proactive reclaim
> feedback loop, stale stats may cause us to mistakenly think reclaim is
> ineffective and prematurely stop.
>

I don't see why userspace OOM killing and proactive reclaim need
subsecond accuracy. Please explain. The same goes for system overhead,
though I can see the complication of having two different sources for
stats. Can you provide the formula for system overhead? I am wondering
why you need to read stats from memory.stat files at all. Why are
memory.current of the top-level cgroups and /proc/meminfo not enough?
Something like:

Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)

> >
> > I know I am going back on some of the previous agreements but this
> > whole locking back and forth has made in question the original
> > motivation.
>
> That's okay. Taking a step back, having flushing being indeterministic

I would say at most 2 seconds stale, rather than indeterministic.

> in this way is a time bomb in my opinion. Note that this also affects
> in-kernel flushers like reclaim or dirty isolation

Fix the in-kernel flushers separately. Also, the problem Cloudflare is
facing does not need to be tied to this.
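The overhead formula suggested above (Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)) can be sketched in Python as follows. This is a minimal illustration, not anyone's actual tooling: the helper names (parse_meminfo, read_top_cgroup_currents, system_overhead) are hypothetical, a cgroup v2 hierarchy mounted at /sys/fs/cgroup is assumed, and the sample numbers are made up. Note the unit mismatch it has to handle: /proc/meminfo reports "kB" (KiB), while memory.current is in bytes. As far as we understand, memory.current is read from the memcg page counter rather than from the rstat-flushed statistics, which is why this formula sidesteps memory.stat staleness.

```python
from pathlib import Path

# Hypothetical helpers for illustration; assumes cgroup v2 at /sys/fs/cgroup.


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes.

    Fields with a "kB" suffix are KiB and are converted to bytes;
    unitless fields (e.g. HugePages_Total) are counts kept as-is.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        amount = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[key.strip()] = amount
    return values


def read_top_cgroup_currents(root="/sys/fs/cgroup"):
    """Return memory.current (bytes) for every top-level cgroup under root."""
    currents = []
    for child in sorted(Path(root).iterdir()):
        current_file = child / "memory.current"
        if child.is_dir() and current_file.is_file():
            currents.append(int(current_file.read_text()))
    return currents


def system_overhead(meminfo_text, top_cgroup_currents):
    """Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current), in bytes."""
    info = parse_meminfo(meminfo_text)
    return info["MemTotal"] - info["MemFree"] - sum(top_cgroup_currents)


# Made-up example: 16 GiB total, 4 GiB free,
# two top-level cgroups charged 6 GiB and 2 GiB.
SAMPLE_MEMINFO = """\
MemTotal:       16777216 kB
MemFree:         4194304 kB
HugePages_Total:       0
"""
overhead = system_overhead(SAMPLE_MEMINFO, [6 * 1024**3, 2 * 1024**3])
print(overhead // 1024**3, "GiB")  # prints "4 GiB"
```

On a live system the same calculation would be system_overhead(Path("/proc/meminfo").read_text(), read_top_cgroup_currents()). The two sources are not read atomically, so the result is an approximation either way; keeping the arithmetic in a pure function separate from the file reads just makes it easy to check.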