Message-ID: <4062bb5c-1e9c-5e1f-5b27-2a4a8fb58078@intel.com>
Date: Thu, 10 Mar 2022 13:42:01 -0800
From: "Kok, Auke"
Subject: Re: [RFC 07/10] platform/x86/intel/ifs: Create kthreads for online cpus for scan test
To: "Luck, Tony", "Williams, Dan J", "Joseph, Jithu"
Cc: "hdegoede@redhat.com", "markgross@kernel.org", "corbet@lwn.net",
 "Raj, Ashok", "dave.hansen@linux.intel.com", "patches@lists.linux.dev",
 "linux-kernel@vger.kernel.org", "mingo@redhat.com", "rostedt@goodmis.org",
 "Shankar, Ravi V", "tglx@linutronix.de",
 "platform-driver-x86@vger.kernel.org", "linux-doc@vger.kernel.org",
 "hpa@zytor.com", "bp@alien8.de", "gregkh@linuxfoundation.org",
 "andriy.shevchenko@linux.intel.com", "x86@kernel.org"
References: <20220301195457.21152-1-jithu.joseph@intel.com>
 <20220301195457.21152-8-jithu.joseph@intel.com>
 <09b5b05018a8600ca8fab896790ab16827c80e4e.camel@intel.com>
 <1503c7940a7149679025173a46dd0daf@intel.com>
In-Reply-To: <1503c7940a7149679025173a46dd0daf@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/7/22 09:46, Luck, Tony wrote:
>>> These are software(driver) defined error codes. Rest of the error codes are supplied by
>>> the hardware. Software defined error codes were kept at the other end to provide ample space
>>> in case (future) hardware decides to provide extend error codes.
>> Why put them in the same number space? Separate software results from
>> the raw hardware results and have a separate mechanism to convey each.
> We wanted to include in the "details" file, which is otherwise a direct copy of
> the SCAN_STATUS MSR. Making sure the software error codes didn't overlap
> with any h/w generated codes seemed like a good idea.
>
> But maybe we should have done this with additional string values in the status
> file:
>
> Current:
>
> pass
> untested
> fail
>
> Add a couple of new options for the s/w cases:
>
> sw_timeout
> sw_retries_exceeded

We've already made a userspace implementation that uses this API, as
part of opendcdiag:

https://github.com/opendcdiag/opendcdiag/commit/0cbfcee30e0666b0f79a2e452d7f8167d2a0cb90

What I really like is that with this proposed API, we can unambiguously
determine whether "the core failed" or "everything is fine, for now" by
reading a single file.

I would hate to see this file become unusable because its content
changes from "pass" to "sw_timeout" or, even worse, from "fail" to
"sw_timeout". That would render it useless for the purpose I think our
users will be looking at it for.

So, my preference would be to keep this file functioning as-is in this
patch series.

I would think that some sort of expandable "statistics" file would be a
better way to output various metrics:

```
sw_timeout: 0
sw_retries_exceeded: 2
runs: 42
first_run: 1405529347
last_run: 1646948140
```

This is just a suggested alternative for additional or incompatible
output values, or for a complex, dynamic format.

I don't have any use in opendcdiag for these values and data. If someone
does, perhaps they should chime in.

Auke
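
To make the trade-off above concrete, here is a minimal Python sketch of how a userspace consumer might treat the two files. It is not taken from opendcdiag or from the patch series. The function names `core_failed` and `parse_statistics` are hypothetical, the "statistics" file is only the format suggested in this mail, and the functions take file contents as strings because no sysfs path is assumed.

```python
def core_failed(status_text):
    """Return True only when the status file content is exactly 'fail'.

    Assumption: the status file holds one word such as 'pass',
    'untested' or 'fail', as listed in the quoted mail.
    """
    return status_text.strip() == "fail"


def parse_statistics(text):
    """Parse 'key: value' lines from the suggested statistics format.

    Integer values are converted to int; anything else is kept as a
    string so that keys added later do not break the parser.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines or lines without a colon
        value = value.strip()
        try:
            stats[key.strip()] = int(value)
        except ValueError:
            stats[key.strip()] = value
    return stats


if __name__ == "__main__":
    sample = "sw_timeout: 0\nsw_retries_exceeded: 2\nruns: 42\n"
    print(core_failed("fail\n"))
    print(parse_statistics(sample))
```

Under these assumptions, a status of "sw_timeout" would make `core_failed` return False, which is the kind of silent change in meaning for existing consumers that the mail argues against. Unknown keys in the statistics output are simply carried along.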