Date: Thu, 18 Aug 2022 12:04:46 +0200
From: Michal Koutný
To: Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, David Hildenbrand, Yosry Ahmed, Greg Thelen, Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] Revert "memcg: cleanup racy sum avoidance code"
Message-ID: <20220818100446.GA789@blackbody.suse.cz>
In-Reply-To: <20220817172139.3141101-1-shakeelb@google.com>

On Wed, Aug 17, 2022 at 05:21:39PM +0000, Shakeel Butt wrote:
> $ grep "sock " /mnt/memory/job/memory.stat
> sock 253952
> total_sock 18446744073708724224
>
> Re-run after couple of seconds
>
> $ grep "sock " /mnt/memory/job/memory.stat
> sock 253952
> total_sock 53248
>
> For now we are only seeing this issue on large machines (256 CPUs) and
> only with 'sock' stat. I think the networking stack increase the stat on
> one cpu and decrease it on another cpu much more often. So, this
> negative sock is due to rstat flusher flushing the stats on the CPU that
> has seen the decrement of sock but missed the CPU that has increments. A
> typical race condition.

This theory adds up :-) (Provided the numbers.)

> For easy stable backport, revert is the most simple solution.

Sounds reasonable.

> For long term solution, I am thinking of two directions. First is just
> reduce the race window by optimizing the rstat flusher. Second is if
> the reader sees a negative stat value, force flush and restart the
> stat collection. Basically retry but limited.

Or just stick with the revert, since it already reduces the observed
error by rounding to zero in a simple way. (Or, if the imprecision were
worth the extra storage, use two-stage flushing to accumulate (cpus x
cgroups) and assign in two steps.)

Thanks,
Michal
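
Illustration (a userspace sketch, not kernel code): the model below
shows how flushing only the CPU that saw the decrements can leave a
negative aggregate that reads as a huge unsigned value, and how
clamping to zero hides it. The struct, the function names, NR_CPUS and
the per-CPU deltas are all hypothetical; the deltas were chosen only so
that the two reads reproduce the total_sock values quoted above.

```c
#include <stdint.h>

#define NR_CPUS 2 /* hypothetical; the report concerns 256-CPU machines */

/* Illustrative model: per-CPU deltas not yet folded into the aggregate. */
struct stat_model {
	int64_t percpu_delta[NR_CPUS];
	int64_t aggregate;
};

/* Fold one CPU's pending delta into the aggregate, as a flusher would. */
static void flush_cpu(struct stat_model *s, int cpu)
{
	s->aggregate += s->percpu_delta[cpu];
	s->percpu_delta[cpu] = 0;
}

/* Unclamped read: a transiently negative sum wraps to a huge value. */
static uint64_t read_raw(const struct stat_model *s)
{
	return (uint64_t)s->aggregate;
}

/* Clamped read: a transiently negative sum is reported as 0. */
static uint64_t read_clamped(const struct stat_model *s)
{
	return s->aggregate < 0 ? 0 : (uint64_t)s->aggregate;
}
```

With percpu_delta = { 880640, -827392 }, flushing only CPU 1 makes
read_raw() return 18446744073708724224 (that is, 2^64 - 827392) while
read_clamped() returns 0. After CPU 0 is flushed as well, both return
53248.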