From: "Luck, Tony"
To: Greg KH
CC: "Joseph, Jithu", "hdegoede@redhat.com", "markgross@kernel.org",
    "tglx@linutronix.de", "mingo@redhat.com", "bp@alien8.de",
    "dave.hansen@linux.intel.com", "x86@kernel.org", "hpa@zytor.com",
    "corbet@lwn.net", "andriy.shevchenko@linux.intel.com", "Raj, Ashok",
    "rostedt@goodmis.org", "linux-kernel@vger.kernel.org",
    "linux-doc@vger.kernel.org", "platform-driver-x86@vger.kernel.org",
    "patches@lists.linux.dev", "Shankar, Ravi V", "Williams, Dan J"
Subject: RE: [RFC 00/10] Introduce In Field Scan driver
Date: Tue, 15 Mar 2022 16:10:59 +0000
References: <20220301195457.21152-1-jithu.joseph@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> Again, I have no idea what you are doing at all with this driver, nor
> what you want to do with it.
>
> Start over please.

TL;DR is that silicon ages and some things break that don't have parity/ECC
checks. So systems start behaving erratically. If you are lucky they crash.
If you are less lucky they give incorrect results.

There's a paper (and even an 11-minute video) describing the research by
Google on this.

https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
(https://www.youtube.com/watch?v=QMF3rqhjYuM)

> What is the hardware you have to support?

The feature is first available in Sapphire Rapids (Xeon: coming later this
year).

> What is the expectation from userspace with regards to using the
> hardware?

The expectation from users is that they can run these tests frequently (many
times per day) to quickly catch silicon that has developed faults, and take
action to isolate the cores that have issues.

On HT-enabled systems both threads that share a core need to be put into
test mode together. The current version of the tests takes around 50
milliseconds, so many workloads don't need much prep. Those with high
sensitivity to latency would need some additional userspace task binding to
make sure they are moved to another core while the h/w test runs.

There are three outcomes from running a test:
1) The test passes all stages.
2) The test did not complete (for a variety of reasons, e.g. power states).
3) The test indicates failure. The recommendation is to run it one more time
   in case the failure was transient, e.g. caused by a neutron/alpha strike.

-Tony
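
The three outcomes and the "run one more time" recommendation above can be sketched as a small userspace policy. This is an illustration only, not the driver's interface: `run_test` is a hypothetical callable standing in for whatever mechanism puts both HT siblings of a core into test mode, and the verdict strings ("healthy", "retry-later", "transient", "isolate") are assumed labels that do not come from the thread.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"          # 1) the test passed all stages
    UNTESTED = "untested"  # 2) the test did not complete (e.g. power states)
    FAIL = "fail"          # 3) the test indicated failure


def classify_core(run_test: Callable[[], Outcome]) -> str:
    """Run the scan once, and once more only if the first run fails.

    `run_test` is a hypothetical stand-in for the real test trigger;
    the returned verdict strings are illustrative assumptions.
    """
    first = run_test()
    if first is Outcome.PASS:
        return "healthy"
    if first is Outcome.UNTESTED:
        return "retry-later"

    # First run failed: rerun once in case the failure was transient
    # (e.g. caused by a neutron/alpha strike).
    second = run_test()
    if second is Outcome.FAIL:
        return "isolate"
    if second is Outcome.PASS:
        return "transient"
    return "retry-later"
```

A caller would invoke `classify_core` once per physical core and move work off any core that comes back as "isolate". What to do with a "transient" verdict is a policy choice that the thread does not settle.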