From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Shakeel Butt, Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song, David Hildenbrand, Yosry Ahmed, Greg Thelen, Andrew Morton
Subject: [PATCH 5.15 112/136] Revert "memcg: cleanup racy sum avoidance code"
Date: Mon, 29 Aug 2022 12:59:39 +0200
Message-Id: <20220829105809.297702456@linuxfoundation.org>
In-Reply-To: <20220829105804.609007228@linuxfoundation.org>
References: <20220829105804.609007228@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Shakeel Butt

commit dbb16df6443c59e8a1ef21c2272fcf387d600ddf upstream.

This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

Recently we started running the kernel with rstat infrastructure on
production traffic and began to see negative memcg stats values.
In particular, the 'sock' stat is the one we observed having a negative
value.

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224

Re-run after a couple of seconds:

$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and
only with the 'sock' stat.  I think the networking stack increases the
stat on one CPU and decreases it on another CPU much more often.  So
this negative sock value is due to the rstat flusher flushing the stats
on the CPU that has seen the decrement of sock but missing the CPU that
has the increments.
A typical race condition.

For easy stable backport, a revert is the simplest solution.  For a
long-term solution, I am thinking of two directions.  The first is to
just reduce the race window by optimizing the rstat flusher.  The second
is, if the reader sees a negative stat value, to force a flush and
restart the stat collection.  Basically retry, but limited.

Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt
Cc: "Michal Koutný"
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Muchun Song
Cc: David Hildenbrand
Cc: Yosry Ahmed
Cc: Greg Thelen
Cc: [5.15]
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/memcontrol.h |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -966,19 +966,30 @@ static inline void mod_memcg_state(struc
 
 static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
 {
-	return READ_ONCE(memcg->vmstats.state[idx]);
+	long x = READ_ONCE(memcg->vmstats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
 					      enum node_stat_item idx)
 {
 	struct mem_cgroup_per_node *pn;
+	long x;
 
 	if (mem_cgroup_disabled())
 		return node_page_state(lruvec_pgdat(lruvec), idx);
 
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
-	return READ_ONCE(pn->lruvec_stats.state[idx]);
+	x = READ_ONCE(pn->lruvec_stats.state[idx]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
 }
 
 static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,