From: Thiago Macieira
To: Borislav Petkov, "Luck, Tony"
Cc: "Joseph, Jithu", "hdegoede@redhat.com", "markgross@kernel.org",
    "tglx@linutronix.de", "mingo@redhat.com",
    "dave.hansen@linux.intel.com", "x86@kernel.org", "hpa@zytor.com",
    "gregkh@linuxfoundation.org", "Raj, Ashok",
    "linux-kernel@vger.kernel.org",
    "platform-driver-x86@vger.kernel.org", "patches@lists.linux.dev",
    "Shankar, Ravi V", "Jimenez Gonzalez, Athenas", "Mehta, Sohil"
Subject: Re: [PATCH v2 12/14] platform/x86/intel/ifs: Add current_batch sysfs entry
Date: Sat, 12 Nov 2022 18:35:23 -0800
Message-ID: <2687702.9iZYToFQE1@tjmaciei-mobl5>
Organization: Intel Corporation
References: <20221021203413.1220137-1-jithu.joseph@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Saturday, 12 November 2022 15:32:47 PST Luck, Tony wrote:
> > Because if this is going to be run during downtime, as Thiago says, then
> > you can just as well use debugfs for this. And then there's no need to
> > cast any API in stone and so on.
>
> Did Thiago say “during downtime”?
> I think he talked about some users opportunistic use of scan tests.
> But that’s far from only during downtime. We fully expect CSPs to
> run these scans periodically on production machines.

Let me clarify. I did not mean full system downtime for maintenance, but
I did mean that there's a gap in consumer workload, for both threads of
one or more cores. As Tony said, it should have little observable effect
on any other core, meaning an IFS run can be scheduled *as* any other
workload (albeit a privileged one) for a subset of the machine, while the
rest of the system remains in production. This allows them a lot of
flexibility and is the reason I am talking about containers, with the
implied constraint that the container's view of the filesystem is
narrower than the kernel's.

There'll be some coordination required to get all cores to have run all
tests, but it should be doable over a period of time, and I'm thinking
days, not years. This should still be short enough to reveal if the
system can detect a defect or wear-out before any real workload is
impacted by it.

If an issue is detected, the admin can decide whether to offline the
core(s) reporting problems but keep the rest serving workloads and
generating revenue, or offline the entire machine for full maintenance
and to run more invasive and time-consuming tests.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering