Date: Wed, 21 Apr 2021 11:58:40 +0100
Message-ID: <8735vjrjj3.wl-maz@kernel.org>
From: Marc Zyngier
To: dann frazier
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	Sumit Garg, kernel-team@android.com, Russell King, Catalin Marinas,
	Thomas Gleixner, Will Deacon
Subject: Re: [PATCH 08/11] irqchip/gic: Configure SGIs as standard interrupts
References: <20200519161755.209565-1-maz@kernel.org> <20200519161755.209565-9-maz@kernel.org>

Hi Dan,

On Tue, 20 Apr 2021 22:25:51 +0100,
dann frazier wrote:
>
> On Tue, Apr 20, 2021 at 02:37:10PM -0600, dann frazier wrote:
> > On Tue, May 19, 2020 at 05:17:52PM +0100, Marc Zyngier wrote:
> > > Change the way we deal with GIC SGIs by turning them into proper
> > > IRQs, and calling into the arch code to register the interrupt range
> > > instead of a callback.
> > >
> > > Signed-off-by: Marc Zyngier
> > hey Marc,
> >
> > I bisected a boot failure on our Gigabyte R120-T33 systems (ThunderX
> > CN88XX) down to this commit, but only when running in ACPI mode. See below:
> >
> >
> > EFI stub: Booting Linux Kernel...
> > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a11]
> > [ 0.000000] Linux version 5.11.0-13-generic (buildd@bos02-arm64-067) (gcc (Ubuntu 10.2.1-23ubuntu1) 10.2.1 20210312, GNU ld (GNU Binutils for Ubuntu) 2.36.1) #14-Ubuntu SMP Fri Mar 19 16:57:35 UTC 2021 (Ubuntu 5.11.0-13.14-generic 5.11.7)
>
> Sorry, realized I posted a log from an Ubuntu kernel. Here's an
> upstream one:

[...]

>
> [ 7.842174] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
> [ 7.849699] io scheduler mq-deadline registered
> [ 7.857591] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [ 7.865127] efifb: probing for efifb
> [ 7.868738] efifb: No BGRT, not showing boot graphics
> [ 7.873783] efifb: framebuffer at 0x881010000000, using 3072k, total 3072k
> [ 7.880649] efifb: mode is 1024x768x32, linelength=4096, pages=1
> [ 7.886647] efifb: scrolling: redraw
> [ 7.890212] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
> [ 7.895905] fbcon: Deferring console take-over
> [ 7.900350] fb0: EFI VGA frame buffer device
> [ 7.905289] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
> [ 7.913714] ACPI: button: Power Button [PWRB]
> [ 7.919549] ACPI GTDT: [Firmware Bug]: failed to get the Watchdog base address.
> [ 7.927289] Unable to handle kernel read from unreadable memory at virtual address 0000000000000028
> [ 7.936326] Mem abort info:
> [ 7.939108] ESR = 0x96000004
> [ 7.942151] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 7.947451] SET = 0, FnV = 0
> [ 7.950494] EA = 0, S1PTW = 0
> [ 7.953624] Data abort info:
> [ 7.956492] ISV = 0, ISS = 0x00000004
> [ 7.960316] CM = 0, WnR = 0
> [ 7.963273] [0000000000000028] user address but active_mm is swapper
> [ 7.969616] Internal error: Oops: 96000004 [#1] SMP
> [ 7.974483] Modules linked in:
> [ 7.977531] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc8 #19
> [ 7.983874] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS F02 08/06/2019
> [ 7.990737] pstate: 40400085 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
> [ 7.996732] pc : __ipi_send_mask+0x60/0x114
> [ 8.000910] lr : smp_cross_call+0x40/0xcc
> [ 8.004913] sp : ffff800012753c10
> [ 8.008216] x29: ffff800012753c10 x28: ffff000100de5d00
> [ 8.013521] x27: 000000000000000a x26: ffff80001225da20
> [ 8.018825] x25: 0000000000000000 x24: ffff000ff62719b0
> [ 8.024129] x23: ffff80001225d000 x22: ffff800012368108
> [ 8.029433] x21: ffff800010f69a20 x20: 0000000000000000
> [ 8.034737] x19: ffff000100143c60 x18: 0000000000000020
> [ 8.040041] x17: 000000008e74252f x16: 00000000bf0ab2ad
> [ 8.045345] x15: ffffffffffffffff x14: 0000000000000000
> [ 8.050649] x13: 003d090000000000 x12: 00003d0900000000
> [ 8.055953] x11: 0000000000000000 x10: 00003d0900000000
> [ 8.061257] x9 : ffff800010027f14 x8 : 0000000000000000
> [ 8.066561] x7 : 00000000ffffffff x6 : ffff000ff6148698
> [ 8.071865] x5 : ffff80001159d040 x4 : ffff80001159d110
> [ 8.077169] x3 : ffff800010f69a00 x2 : 0000000000000000
> [ 8.082473] x1 : ffff800010f69a20 x0 : 0000000000000000
> [ 8.087777] Call trace:
> [ 8.090213] __ipi_send_mask+0x60/0x114
> [ 8.094038] smp_cross_call+0x40/0xcc
> [ 8.097691] smp_send_reschedule+0x3c/0x50
> [ 8.101778] resched_curr+0x5c/0xb0
> [ 8.105258] check_preempt_curr+0x58/0x90
> [ 8.109258] ttwu_do_wakeup+0x2c/0x190
> [ 8.112996] ttwu_do_activate+0x7c/0x114
> [ 8.116909] try_to_wake_up+0x388/0x670
> [ 8.120735] wake_up_process+0x24/0x30
> [ 8.124474] swake_up_one+0x48/0x9c
> [ 8.127953] rcu_gp_kthread_wake+0x68/0x8c
> [ 8.132041] rcu_accelerate_cbs_unlocked+0xb4/0xf0
> [ 8.136822] rcu_core+0x520/0x694
> [ 8.140128] rcu_core_si+0x1c/0x2c
> [ 8.143520] __do_softirq+0x128/0x388
> [ 8.147172] irq_exit+0xc4/0xec
> [ 8.150304] __handle_domain_irq+0x8c/0xec
> [ 8.154394] gic_handle_irq+0xd8/0x2f0
> [ 8.158132] el1_irq+0xc0/0x180
> [ 8.161262] __pi_strcmp+0x20/0x158
> [ 8.164742] driver_register+0x68/0x140
> [ 8.168571] __platform_driver_register+0x34/0x40
> [ 8.173265] imx8mp_clk_driver_init+0x28/0x34
> [ 8.177614] do_one_initcall+0x50/0x260
> [ 8.181440] kernel_init_freeable+0x24c/0x2d4
> [ 8.185790] kernel_init+0x20/0x134
> [ 8.189271] ret_from_fork+0x10/0x18
> [ 8.192840] Code: a90363f7 aa0103f5 d0010957 f9401260 (b9402800)
> [ 8.198955] ---[ end trace c24172add816c1f0 ]---
> [ 8.203562] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [ 8.210442] SMP: stopping secondary CPUs
> [ 9.258360] SMP: failed to stop secondary CPUs 0,9
> [ 9.263141] Kernel Offset: disabled
> [ 9.266617] CPU features: 0x00040002,69101108
> [ 9.270963] Memory Limit: none
> [ 9.274024] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Please feed this stacktrace to scripts/decode_stacktrace.sh so that I
can get an idea about what is going wrong. I bet something is playing
ungodly games with one of the IPIs, and things go horribly wrong.

Now, here's a hunch: in the fine TX1 tradition, the firmware is broken
and the GTDT table looks unusable. Amusingly, the crash happens right
after the SBSA watchdog fails to probe. And looking at the code that
implements that driver, it looks dodgy as hell, as it unmaps an
interrupt it doesn't even know is valid. And it does that right when
the driver fails the way you experienced it.

If, by any chance, the interrupt field is 0 in the firmware table,
this results in SGI0 being unmapped. Given that this is the
rescheduling interrupt, fireworks happen.

Can you have a go with the patchlet below, and let me know if that
helps?

Thanks,

	M.

diff --git a/drivers/acpi/arm64/gtdt.c b/drivers/acpi/arm64/gtdt.c
index f2d0e5915dab..0a0a982f9c28 100644
--- a/drivers/acpi/arm64/gtdt.c
+++ b/drivers/acpi/arm64/gtdt.c
@@ -329,7 +329,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 					int index)
 {
 	struct platform_device *pdev;
-	int irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	int irq;
 
 	/*
 	 * According to SBSA specification the size of refresh and control
@@ -338,7 +338,7 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	struct resource res[] = {
 		DEFINE_RES_MEM(wd->control_frame_address, SZ_4K),
 		DEFINE_RES_MEM(wd->refresh_frame_address, SZ_4K),
-		DEFINE_RES_IRQ(irq),
+		{},
 	};
 	int nr_res = ARRAY_SIZE(res);
 
@@ -348,10 +348,11 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 
 	if (!(wd->refresh_frame_address && wd->control_frame_address)) {
 		pr_err(FW_BUG "failed to get the Watchdog base address.\n");
-		acpi_unregister_gsi(wd->timer_interrupt);
 		return -EINVAL;
 	}
 
+	irq = map_gt_gsi(wd->timer_interrupt, wd->timer_flags);
+	res[2] = (struct resource)DEFINE_RES_IRQ(irq);
 	if (irq <= 0) {
 		pr_warn("failed to map the Watchdog interrupt.\n");
 		nr_res--;
@@ -364,7 +365,8 @@ static int __init gtdt_import_sbsa_gwdt(struct acpi_gtdt_watchdog *wd,
 	 */
 	pdev = platform_device_register_simple("sbsa-gwdt", index, res, nr_res);
 	if (IS_ERR(pdev)) {
-		acpi_unregister_gsi(wd->timer_interrupt);
+		if (irq > 0)
+			acpi_unregister_gsi(wd->timer_interrupt);
 		return PTR_ERR(pdev);
 	}
 
-- 
Without deviation from the norm, progress is not possible.