Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp455263rwd; Sat, 20 May 2023 00:22:57 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5cqZbf30t+31PisSzeLPhQvL/DTzLXdt5eZiv3bAvdWpkHQ6Jut6C9Fja5e9cj1YyfaYLC X-Received: by 2002:a17:90a:b88a:b0:253:6e6f:f5c7 with SMTP id o10-20020a17090ab88a00b002536e6ff5c7mr4666332pjr.2.1684567377597; Sat, 20 May 2023 00:22:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684567377; cv=none; d=google.com; s=arc-20160816; b=s330QlF3ZRvF3C3fcaKnMrdgRuZ+xNnKveCbIzJ+Um6g5MTdT8uE3t6K6DR1zsmlTJ 8K324Bcw4r1RWgCSEcJbeGop0c6u028HnCXlP4SQ1WRMFkii8OzSoGsNCzqc9OnMc3Z3 bjEf1R8PN9gtPrrY0uUzBet0/ERzFELjrMJSAkFCMDvLTmkJ5M+39YpjrpSxHXu+6xXC 6X3KmEnyq1OygLGwQK8mdtvjt56/iI31/H+tvXJsfzNZqfQvRBnnLH2f6rgJSPII6aj7 SY/cD7uBq//5SLw8KWOOeNxEo5mJnzlrbXBoRLIyIAj3GHypE/bDK+F4D9zoQtkPL7fk Z5CA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:to:subject :cc; bh=331BowRdRsibT2K1w5kJ/L6dXb/QMSUmoWX7xZG9sxg=; b=gf7ZRnlamQ7lCbPIqUPaQaFzqevWzrnU8Cbu2lECDFVDoVMRc2OZctr5RLtolkL6iZ eNfbPdkJE/pjNin81X4xGvo0kYtA2kPvsv+Q1vie6i35O5yzykFvhONfbqi/xYE9aH16 iD4mDY/LTmhzuwZnCLuwGMW5QrzHawjxZ+HU5GtFcOErKuqMsznDVnCBX9zaicJSMssG zLs7rsncLcrzJM3oNB3tXMqmXukLnF9g6TqfqajO93ZIMvtZSFrFprSlc9esUiR+9yJq hHYdiXFT10QLICOWe+TZMAPLlMUennFrXseNdeZsyuQcY9KL0jfi0nVbBlcXVdVjJNvc alcw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q189-20020a632ac6000000b00534748f9cd8si1071257pgq.644.2023.05.20.00.22.44; Sat, 20 May 2023 00:22:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229602AbjETG4f (ORCPT + 99 others); Sat, 20 May 2023 02:56:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229508AbjETG4e (ORCPT ); Sat, 20 May 2023 02:56:34 -0400 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D1E11AD for ; Fri, 19 May 2023 23:56:31 -0700 (PDT) Received: from canpemm500009.china.huawei.com (unknown [172.30.72.53]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4QNZBp4PkNzqSHN; Sat, 20 May 2023 14:52:02 +0800 (CST) Received: from [10.67.102.169] (10.67.102.169) by canpemm500009.china.huawei.com (7.192.105.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Sat, 20 May 2023 14:56:26 +0800 CC: , , , , , , , , , Subject: Re: [PATCH 2/2] drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug To: K Prateek Nayak , , References: <20230508084115.1157-1-kprateek.nayak@amd.com> <20230508084115.1157-3-kprateek.nayak@amd.com> From: Yicong Yang Message-ID: <32bd2ca8-30f1-204e-898f-fc93bbdd2e14@huawei.com> Date: Sat, 20 May 2023 14:56:26 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1 MIME-Version: 1.0 In-Reply-To: <20230508084115.1157-3-kprateek.nayak@amd.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.67.102.169] X-ClientProxiedBy: dggems705-chm.china.huawei.com (10.3.19.182) To canpemm500009.china.huawei.com (7.192.105.203) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Prateek, On 2023/5/8 16:41, K Prateek Nayak wrote: > Until commit 5c2712387d48 ("cacheinfo: Fix LLC is not exported through > sysfs"), cacheinfo called populate_cache_leaves() for CPU coming online > which let the arch specific functions handle (at least on x86) > populating the shared_cpu_map. However, with the changes in the > aforementioned commit, populate_cache_leaves() is not called when a CPU > comes online as a result of hotplug since last_level_cache_is_valid() > returns true as the cacheinfo data is not discarded. The CPU coming Yes in free_cache_attributes() we only update the shared_cpu_map but make other attributes remained. From my feelings we should do all the work opposite to detect_cache_attributes(), including free the memory allocated. > online is not present in shared_cpu_map, however, it will not be added > since the cpu_cacheinfo->cpu_map_populated flag is set (it is set in > populate_cache_leaves() when cacheinfo is first populated for x86) > > This can lead to inconsistencies in the shared_cpu_map when an offlined > CPU comes online again. Example below depicts the inconsistency in the > shared_cpu_list in cacheinfo when CPU8 is offlined and onlined again on > a 3rd Generation EPYC processor: > > # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done > /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 > > # echo 0 > /sys/devices/system/cpu/cpu8/online > # echo 1 > /sys/devices/system/cpu/cpu8/online > > # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done > /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8 > /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8 > /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8 > /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8 > > # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list > 136 > > # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list > 9-15,136-143 > > Clear the flag when the CPU is removed from shared_cpu_map when > cache_shared_cpu_map_remove() is called during CPU hotplug. This will > allow cache_shared_cpu_map_setup() to add the CPU coming back online in > the shared_cpu_map. Set the flag again when the shared_cpu_map is setup. > Following are results of performing the same test as described above with > the changes: > > # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done > /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 > > # echo 0 > /sys/devices/system/cpu/cpu8/online > # echo 1 > /sys/devices/system/cpu/cpu8/online > > # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done > /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136 > /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143 > > # cat /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list > 8,136 > > # cat /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list > 8-15,136-143 > > Fixes: 5c2712387d48 ("cacheinfo: Fix LLC is not exported through sysfs") It's ok for me to have this tag but I don't think this is the root cause, the commit happens to expose the problem. Other arthitectures like arm64 never updates the this_cpu_ci->cpu_map_populated even after the cpumap is populated. > Signed-off-by: K Prateek Nayak Thanks for fixing this! Reviewed-by: Yicong Yang > --- > drivers/base/cacheinfo.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c > index d1ae443fd7a0..cbae8be1fe52 100644 > --- a/drivers/base/cacheinfo.c > +++ b/drivers/base/cacheinfo.c > @@ -410,11 +410,14 @@ static int cache_shared_cpu_map_setup(unsigned int cpu) > coherency_max_size = this_leaf->coherency_line_size; > } > > + /* shared_cpu_map is now populated for the cpu */ > + this_cpu_ci->cpu_map_populated = true; > return 0; > } > > static void cache_shared_cpu_map_remove(unsigned int cpu) > { > + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); > struct cacheinfo *this_leaf, *sib_leaf; > unsigned int sibling, index, sib_index; > > @@ -447,6 +450,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu) > } > } > } > + > + /* cpu is no longer populated in the shared map */ > + this_cpu_ci->cpu_map_populated = false; > } > > static void free_cache_attributes(unsigned int cpu) >