From: "Rafael J. Wysocki"
Date: Thu, 20 Jan 2022 17:24:24 +0100
Subject: Re: [PATCH v5] ACPI: Move sdei_init and ghes_init ahead to handle
 platform errors earlier
To: Bjorn Helgaas
Cc: Shuai Xue, "Rafael J. Wysocki", Borislav Petkov, Tony Luck,
 James Morse, Len Brown, "Rafael J. Wysocki", Bjorn Helgaas, luanshi,
 zhuo.song@linux.alibaba.com, Linux Kernel Mailing List,
 ACPI Devel Mailing List, Linux ARM, Linux PCI
In-Reply-To: <20220119204259.GA962224@bhelgaas>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 19, 2022 at 9:43 PM Bjorn Helgaas wrote:
>
> On Wed, Jan 19, 2022 at 02:40:11PM +0800, Shuai Xue wrote:
> > [+to Rafael, question about HEST/GHES/SDEI init]
> >
> > Hi, Bjorn,
> >
> > Thank you for your comments and quick reply.
> >
> > On 2022/1/19 6:49 AM, Bjorn Helgaas wrote:
> > > On Sun, Jan 16, 2022 at 04:43:10PM +0800, Shuai Xue wrote:
> > >> On an ACPI system, ACPI is initialised very early from a
> > >> subsys_initcall(), while SDEI is not ready until a
> > >> subsys_initcall_sync(). This patch is to reduce the time before GHES
> > >> initialization.
> > >>
> > >> The SDEI driver provides functions (e.g. apei_sdei_register_ghes(),
> > >> apei_sdei_unregister_ghes()) to register or unregister an event
> > >> callback for the dispatcher in firmware. When the GHES driver probes,
> > >> it registers the corresponding callback according to the notification
> > >> type specified by GHES. If the GHES notification type is SDEI, the
> > >> GHES driver will call apei_sdei_register_ghes() to register the event
> > >> callback.
> > >>
> > >> When the firmware emits an event, it migrates the handling of the event
> > >> into the kernel at the registered entry-point __sdei_asm_handler. And
> > >> finally, the kernel will call the registered event callback and return
> > >> status_code to indicate the status of event handling. SDEI_EV_FAILED
> > >> indicates that the kernel failed to handle the event.
> > >>
> > >> Consequently, when an error occurs during kernel booting, the kernel is
> > >> unable to handle and report errors until the GHES driver is initialized
> > >> by device_initcall(), in which the event callback is registered. For
> > >> example, when the kernel is booting, the firmware logs to the console
> > >> many times before the GHES driver is initialized on our platform:
> > >>
> > >>     Trip in MM PCIe RAS handle(Intr:910)
> > >>     Clean PE[1.1.1] ERR_STS:0x4000100 -> 0 INT_STS:F0000000
> > >>     Find RP(98:1.0)
> > >>     --Walk dev(98:1.0) CE:0 UCE:4000
> > >>     ...
> > >>     ERROR: sdei_dispatch_event(32a) ret:-1
> > >>     --handler(910) end
> > >
> > > If I understand correctly, the firmware noticed an error, tried to
> > > report it to the kernel, and is complaining because the kernel isn't
> > > ready to handle it yet. And the reason for this patch is to reduce
> > > these complaints from the firmware.
> >
> > My thoughts exactly :)
> >
> > > That doesn't seem like a very good reason for this patch. There is
> > > *always* a window before the kernel is ready to handle events from the
> > > firmware.
> >
> > Yes, there is always a window. But if we could do better in the kernel
> > and reduce the window by 90% (from 33 seconds to 3 seconds), why not?
> >
> > > Why is the firmware noticing these errors in the first place? If
> > > you're seeing these complaints regularly, my guess is that either you
> > > have some terrible hardware or (more likely) the firmware isn't
> > > clearing some expected error condition correctly. For example, maybe
> > > the Unsupported Request errors that happen while enumerating PCIe
> > > devices are being reported.
> > >
> > > If you register the callback function, the kernel will now start
> > > seeing these error reports. What happens then? Does the kernel log
> > > the errors somewhere? Is that better than the current situation where
> > > the firmware logs them?
> >
> > Yep, it is a hardware issue.
> > The firmware only logs to the console (ttyAMA0) and we cannot see
> > it on the kernel side. After the kernel starts seeing these error
> > reports, we can see detailed EDAC/ghes and efi/cper logs in dmesg.
> > We did not notice the problem until we checked the console log,
> > which inspired us to reduce the window during kernel startup, so
> > that we can see the messages clearly and properly. I think the
> > intuitive place to look is the dmesg log, not the console.
> >
> > > However, I DO think that:
> > >
> > > - Removing acpi_hest_init() from acpi_pci_root_init(), and
> > >
> > > - Converting ghes_init() and sdei_init() from initcalls to explicit
> > > calls
> > >
> > > are very good reasons to do something like this patch because HEST is
> > > not PCI-specific, and IMO, explicit calls are better than initcalls
> > > because initcall ordering is implicit and not well-defined within a
> > > level.
> >
> > Haha, if the above reasons still don't convince you, I would like to
> > accept yours :) Should we do it in one patch or separate it into two
> > patches?
>
> IMO, this can be done in one patch, but this would probably go via
> Rafael.

Yes, that would make sense IMO.
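
For readers following the initcall-ordering argument in the quoted text, here is a minimal userspace sketch in Python. It is a toy model, not kernel code: the Boot class, the boot_* helpers, other_driver_init and the two acpi_init_* variants are hypothetical names made up for illustration, and calling sdei_init() and ghes_init() directly from the ACPI init function is only an assumption about where the explicit calls could live. The model runs levels in a fixed order and runs calls within one level in registration order, which stands in for link order.

```python
from collections import defaultdict

# Subset of the kernel's initcall levels, in the order they run at boot.
LEVEL_ORDER = ["subsys", "subsys_sync", "fs", "device", "late"]


class Boot:
    """Toy dispatcher: levels run in a fixed order; calls inside one level
    run in registration order (a stand-in for link order)."""

    def __init__(self):
        self.calls = defaultdict(list)
        self.log = []

    def initcall(self, level, fn):
        self.calls[level].append(fn)

    def run(self):
        for level in LEVEL_ORDER:
            for fn in self.calls[level]:
                fn(self)
        return self.log


def sdei_init(boot):
    boot.log.append("sdei_init")


def ghes_init(boot):
    boot.log.append("ghes_init")


def other_driver_init(boot):
    # Hypothetical unrelated driver sharing the device level with GHES.
    boot.log.append("other_driver_init")


def acpi_init_implicit(boot):
    boot.log.append("acpi_init")


def acpi_init_explicit(boot):
    # Assumption: the explicit calls are made right after ACPI init.
    boot.log.append("acpi_init")
    sdei_init(boot)
    ghes_init(boot)


def boot_with_initcalls():
    boot = Boot()
    # Registration order matters only within a level.
    boot.initcall("device", other_driver_init)
    boot.initcall("device", ghes_init)
    boot.initcall("subsys_sync", sdei_init)
    boot.initcall("subsys", acpi_init_implicit)
    return boot.run()


def boot_with_explicit_calls():
    boot = Boot()
    boot.initcall("device", other_driver_init)
    boot.initcall("subsys", acpi_init_explicit)
    return boot.run()


if __name__ == "__main__":
    print(boot_with_initcalls())
    print(boot_with_explicit_calls())
```

In the initcall variant, where ghes_init runs relative to other_driver_init depends only on registration order inside the device level. In the explicit variant the sequence acpi_init, sdei_init, ghes_init is visible in one function, which is the property the quoted argument for explicit calls relies on.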