Date: Tue, 9 May 2023 16:01:51 +0100
From: Sudeep Holla
To: K Prateek Nayak, gregkh@linuxfoundation.org
Cc: linux-kernel@vger.kernel.org, Sudeep Holla, rafael@kernel.org,
    yongxuan.wang@sifive.com, pierre.gondois@arm.com, vincent.chen@sifive.com,
    greentime.hu@sifive.com, yangyicong@huawei.com, prime.zeng@hisilicon.com,
    palmer@rivosinc.com, puwen@hygon.cn
Subject: Re: [PATCH 0/2] drivers: base: cacheinfo: Fix shared_cpu_list inconsistency in event of CPU hotplug
Message-ID: <20230509150151.kscbev7qrycz5cqy@bogus>
References: <20230508084115.1157-1-kprateek.nayak@amd.com>
In-Reply-To: <20230508084115.1157-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 08, 2023 at 02:11:13PM +0530, K Prateek Nayak wrote:
> Since v6.3-rc1, the shared_cpu_list in per-cpu cacheinfo breaks in case
> of hotplug activity on x86. This can be traced back to two commits:
>
> o commit 198102c9103f ("cacheinfo: Fix shared_cpu_map to handle shared
>   caches at different levels") that matches cache instance IDs without
>   considering if the instance IDs belong to the same cache level or not.
>
> o commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported through
>   sysfs") which skips calling populate_cache_leaves() if
>   last_level_cache_is_valid(cpu) returns true. populate_cache_leaves()
>   on x86 would have populated the shared_cpu_map when CPU comes online,
>   which is now skipped, and the alternate path has an early bailout
>   before setting the CPU in the shared_cpu_map is even attempted.
>
> On x86, populate_cache_leaves() also sets the
> cpu_cacheinfo->cpu_map_populated flag when the cacheinfo is first
> populated, the cache_shared_cpu_map_setup() in the driver is bypassed
> when a thread comes back online during the hotplug activity. This leads
> to the shared_cpu_list displaying abnormal values for the CPU that was
> offlined and then onlined since the shared_cpu_maps are never
> re-evaluated.
>
> Following is the output from a dual socket 3rd Generation AMD EPYC
> processor (2 x 64C/128T) for cacheinfo when offlining and then onlining
> CPU8:
>
> o v6.3-rc5 with no changes:
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
>
> # echo 0 > /sys/devices/system/cpu/cpu8/online
> # echo 1 > /sys/devices/system/cpu/cpu8/online
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8
>
> # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
> 136
>
> # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
> 9-15,136-143
>
> o v6.3-rc5 with commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported
>   through sysfs") reverted (Behavior consistent with v6.2):
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
>
> # echo 0 > /sys/devices/system/cpu/cpu8/online
> # echo 1 > /sys/devices/system/cpu/cpu8/online
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
>
> # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
> 8,136
>
> # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
> 8-15,136-143
>
> This is not only limited to AMD processors but affects Intel processors
> too. Following is the output from the same experiment on a dual socket Intel
> Ice Lake server (2 x 32C/64T) running kernel v6.3-rc5:
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,
> 26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,
> 88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
>
> # echo 0 > /sys/devices/system/cpu/cpu8/online
> # echo 1 > /sys/devices/system/cpu/cpu8/online
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8
>
> # cat /sys/devices/system/cpu/cpu72/cache/index0/shared_cpu_list
> 72
>
> # cat /sys/devices/system/cpu/cpu72/cache/index3/shared_cpu_list
> 0,2,4,6,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,
> 66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,
> 120,122,124,126
>
> This patch addresses two issues
> associated with building
> shared_cpu_list:
>
> o Patch 1 fixes an ID matching issue that can lead to cacheinfo
>   associating CPUs from different cache levels in case IDs are not
>   unique across all the different cache levels.
>
> o Patch 2 clears the cpu_cacheinfo->cpu_map_populated flag when CPU goes
>   offline and is removed from the shared_cpu_map.
>
> Following are the results after applying the series on v6.3-rc5 on
> respective x86 platforms:
>
> o 3rd Generation AMD EPYC Processor (2 x 64C/128T)
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
>
> # echo 0 > /sys/devices/system/cpu/cpu8/online
> # echo 1 > /sys/devices/system/cpu/cpu8/online
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
>
> # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list
> 8,136
>
> # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list
> 8-15,136-143
>
> o Intel Ice Lake Xeon (2 x 32C/64T)
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,
> 28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,
> 92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
>
> # echo 0 > /sys/devices/system/cpu/cpu8/online
> # echo 1 > /sys/devices/system/cpu/cpu8/online
>
> # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,72
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,
> 28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,
> 92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
>
> # cat /sys/devices/system/cpu/cpu72/cache/index0/shared_cpu_list
> 8,72
>
> # cat /sys/devices/system/cpu/cpu72/cache/index3/shared_cpu_list
> 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,
> 68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,
> 124,126
>
> Running "grep -r 'cpu_map_populated' arch/" shows MIPS and loongarch too
> set cpu_cacheinfo->cpu_map_populated and might also be affected by
> the changes in commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported
> through sysfs") and this series. Changes from Patch 1 might also affect
> RISC-V since Yong-Xuan Wang from SiFive last
> made changes to cache_shared_cpu_map_setup() and
> cache_shared_cpu_map_remove() in commit 198102c9103f ("cacheinfo: Fix
> shared_cpu_map to handle shared caches at different levels").

I think they may be affected as well; it just was not caught in testing.

Thanks for the detailed explanation and output logs. Not sure how much
time it took you to write, but it saved a lot of time by making the
exact issue so simple to understand.

The changes look good.

Reviewed-by: Sudeep Holla

Hi Greg,

Can you please pick up these fixes in your next cycle of fixes for v6.4?

--
Regards,
Sudeep