Date: Tue, 26 Sep 2023 18:02:47 -0500
From: Bjorn Helgaas
To: Shuai Xue
Cc: "Rafael J. Wysocki", "tanxiaofei@huawei.com", "wangkefeng.wang@huawei.com", Miaohe Lin, gregkh@linuxfoundation.org, Jonathan Cameron, mahesh@linux.ibm.com, "linux-kernel@vger.kernel.org", "linux-acpi@vger.kernel.org", "bp@alien8.de", Baolin Wang, Linux PCI, bhelgaas@google.com, "james.morse@arm.com", "linuxppc-dev@lists.ozlabs.org", "lenb@kernel.org"
Subject: Re: Questions: Should kernel panic when PCIe fatal error occurs?
Message-ID: <20230926230247.GA429368@bhelgaas>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> ...
> Actually, this is a question from my colleague from firmware team.
> The original question is that:
>
> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> PCIe fatal error is detected? If set, kernel will always panic.
> Otherwise, kernel will always not panic."
>
> So I pull a question about desired behavior of Linux kernel first :)
> From the perspective of the kernel, CPER_SEV_FATAL for Generic Error
> Status Block is not reasonable. The kernel will attempt to recover
> Fatal errors, although recovery may fail.

I don't know the semantics of CPER_SEV_FATAL or why it's there.
With CPER, we have *two* error severities: a "native" one defined by
the PCIe spec and another defined by the platform via CPER.  I
speculate that the reason for the CPER severity could be to provide a
severity for error sources that don't have a "native" severity like
AER does, or for the vendor to force the OS to restart (for
CPER_SEV_FATAL, anyway) in cases where it might not otherwise.

In the native case, we only have the PCIe severity and don't have the
CPER severity at all, and I suspect that unless there's uncontained
data corruption, we would rather handle even the most severe PCIe
fatal error by disabling the specific device(s) instead of panicking
and restarting the whole machine.

So for PCIe errors, I'm not sure setting CPER_SEV_FATAL is beneficial
unless the platform wants to force the OS to panic, e.g., maybe the
platform knows about data corruption and/or the vendor wants the OS to
panic as part of a reliability story.

Presumably the platform has already logged the error, and I assume the
platform *could* restart without even returning to the OS, but maybe
it wants the OS to do a crashdump or shutdown in a more orderly way.

Bjorn
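
To make the two-severity distinction above concrete, here is a rough
sketch in C of the policy as described in this thread.  It is not
kernel code: every name below (the sketch_* enums and sketch_decide())
is hypothetical, the enum values are illustrative rather than the real
CPER or AER encodings, and the "recover the device" outcome stands in
for whatever device-level recovery the OS would attempt.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the platform (CPER) severity. */
enum sketch_cper_sev {
	SKETCH_CPER_SEV_RECOVERABLE,
	SKETCH_CPER_SEV_FATAL,
	SKETCH_CPER_SEV_CORRECTED,
	SKETCH_CPER_SEV_INFORMATIONAL,
};

/* Hypothetical stand-ins for the native PCIe (AER) severity. */
enum sketch_aer_sev {
	SKETCH_AER_CORRECTABLE,
	SKETCH_AER_NONFATAL,
	SKETCH_AER_FATAL,
};

enum sketch_action {
	SKETCH_LOG_ONLY,
	SKETCH_RECOVER_DEVICE,	/* reset/disable only the affected device(s) */
	SKETCH_PANIC,		/* crashdump and restart the whole machine */
};

/*
 * Assumed policy: the CPER severity exists only when the platform
 * reports the error (firmware-first).  In the native case it is
 * ignored and only the PCIe severity is consulted.
 */
enum sketch_action sketch_decide(bool firmware_first,
				 enum sketch_cper_sev cper,
				 enum sketch_aer_sev aer)
{
	/* The platform explicitly asks the OS to panic. */
	if (firmware_first && cper == SKETCH_CPER_SEV_FATAL)
		return SKETCH_PANIC;

	/* Correctable errors need no recovery, only logging. */
	if (aer == SKETCH_AER_CORRECTABLE)
		return SKETCH_LOG_ONLY;

	/* Even a fatal PCIe error is handled per device, not per machine. */
	return SKETCH_RECOVER_DEVICE;
}
```

Under these assumptions, a fatal AER error leads to a panic only when
the platform also marks the CPER record fatal; otherwise the same
error is handled by attempting recovery of the device.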