Date: Fri, 1 Mar 2024 13:19:31 +0000
From: Jonathan Cameron
To: Tony Luck
CC: Dan Williams, Shiju Jose, "linux-cxl@vger.kernel.org",
 "linux-acpi@vger.kernel.org", "linux-mm@kvack.org", "dave@stgolabs.net",
 "dave.jiang@intel.com", "alison.schofield@intel.com",
 "vishal.l.verma@intel.com", "ira.weiny@intel.com",
 "linux-edac@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "david@redhat.com", "Vilas.Sridharan@amd.com", "leo.duran@amd.com",
 "Yazen.Ghannam@amd.com", "rientjes@google.com", "jiaqiyan@google.com",
 "Jon.Grimm@amd.com", "dave.hansen@linux.intel.com", "rafael@kernel.org",
 "lenb@kernel.org", "naoya.horiguchi@nec.com", "james.morse@arm.com",
 "jthoughton@google.com", "somasundaram.a@hpe.com", "erdemaktas@google.com",
 "pgonda@google.com", "duenwen@google.com", "mike.malvestuto@intel.com",
 "gthelen@google.com", "wschwartz@amperecomputing.com",
 "dferguson@amperecomputing.com", tanxiaofei, "Zengtao (B)",
 "kangkang.shen@futurewei.com", wanghuiqiang, Linuxarm, John Groves
Subject: Re: [RFC PATCH v6 00/12] cxl: Add support for CXL feature commands,
 CXL device patrol scrub control and DDR5 ECS control features
Message-ID: <20240301131931.000070c7@Huawei.com>
References: <20240215111455.1462-1-shiju.jose@huawei.com>
 <65d6936952764_1138c7294e@dwillia2-xfh.jf.intel.com.notmuch>
 <54c55412e9374e4e9cacf8410a5a98cb@huawei.com>
 <65d8f5201f8cc_2509b29467@dwillia2-mobl3.amr.corp.intel.com.notmuch>
 <20240226102944.000070a3@Huawei.com>
Organization: Huawei Technologies Research and Development (UK) Ltd.
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 29 Feb 2024 12:41:53 -0800
Tony Luck wrote:

> > Obviously can't talk about who was involved in this feature
> > in its definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list.
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> >    for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> >    pre hotplug hacks that I think we can all agree should not be necessary
> >    and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> >
> > This isn't a fancy feature with a high level of complexity which helps.

Hi Tony,

>
> But how will users know what are appropriate scrubbing
> parameters for these devices?
>
> Car analogy: Fuel injection systems on internal combustion engines
> have tweakable controls. But no auto manufacturer wires them up to
> a user accessible dashboard control.

Good analogy - I believe performance tuning 3rd parties will change
them for you. So the controls are used - albeit not by every user.

>
> Back to computers:
>
> I'd expect the OEMs that produce memory devices to set appropriate
> scrubbing rates based on their internal knowledge of the components
> used in construction.

Absolutely agree that they will set a default / baseline value,
but the reality is that 'everyone' (for the first few OEMs I googled)
exposes tuning controls in their shipping BIOS menus to configure
this because there are users who want to change it. I'd expect them to
clamp the minimum scrub frequency to something that avoids them getting
hardware returned en masse for reliability, and the maximum at whatever
ensures the perf is good enough that they sell hardware in the first place.
I'd also expect a BIOS menu to allow cloud hosts etc to turn off exposing
RAS2 or similar.

>
> What is the use case where some user would need to override these
> parameters and scrub at a faster/slower rate than that set by the
> manufacturer?

It's a performance vs reliability trade off. If your larger scale
architecture (many servers) requires a few nodes to be super stable you
will pay pretty much any cost to keep them running.
If a single node failure makes little or no difference, you'll happily
crank this down (same with refresh) in order to save some power / get a
small performance lift. Or if you care about latency tails more than
reliability, you'll turn this off.

For comedy value, some BIOS guides point out that leaving scrub on may
affect performance benchmarking. Obviously not a good data point, but
a hint at the sort of market that cares. Same market that buys cheaper RAM
knowing they are going to have more system crashes.

There is probably a description gap. That might be a paperwork question
as part of system specification.
What is the relationship between scrub rate and error rate under particular
styles of workload (because you get a free scrub whenever you access
the memory)? The RAM DIMMs themselves could in theory provide inputs
but the workload dependence makes this hard. Probably fall back on a
test and tune loop over very long runs. Single bit error rates
used to detect when getting below a level people are happy with
for instance.

With the fancier units that can be supported, you can play more reliable
memory games by scanning subsets of the memory more frequently.

Though it was about a kernel daemon doing scrub, Jiaqi's RFC document here
https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
provided justification for on demand scrub - some interesting stuff in the
bit on hardware patrol scrubbing. I see you commented on the thread and
complexity of hardware solutions.

- Cheap memory makes this all more important.
- Need for configuration of how fast and when depending on system state.
- Lack of flexibility of what is scanned (RAS2 provides some by association
  with NUMA node + option to request particular ranges, CXL provides per
  end point controls).

There are some gaps on hardware scrubbers, but offloading this problem
is definitely attractive.

So my understanding is there is demand to tune this but it won't be exposed
on every system.

Jonathan

>
> -Tony