Date: Fri, 17 Sep 2021 15:26:48 +0200
From: Anton Lundin
To: Corey Minyard
Cc: openipmi-developer@lists.sourceforge.net, LKML
Subject: Re: [Openipmi-developer] Issue with panic handling and ipmi
Message-ID: <20210917132648.GG108031@montezuma.acc.umu.se>

On 17 September, 2021 - Corey Minyard wrote:

> On Fri, Sep 17, 2021 at 02:55:25PM +0200, Anton Lundin wrote:
> > On 17 September, 2021 - Corey Minyard wrote:
> >
> > > On Fri, Sep 17, 2021 at 12:14:19PM +0200, Anton Lundin wrote:
> > > > On 16 September, 2021 - Corey Minyard wrote:
> > > >
> > > > > On Thu, Sep 16, 2021 at 04:53:00PM +0200, Anton Lundin wrote:
> > > > > > Hi.
> > > > > >
> > > > > > I've just done an upgrade of the kernel we're using in a product
> > > > > > from 4.19 to 5.10, and I noticed an issue.
> > > > > >
> > > > > > It started with us not getting panic and oops dumps in our
> > > > > > ERST-backed pstore, and while debugging that I noticed that the
> > > > > > reboot-on-panic timer didn't work either.
> > > > > >
> > > > > > I've bisected it down to 2033f6858970 ("ipmi: Free receive
> > > > > > messages when in an oops").
> > > > >
> > > > > Hmm. Unfortunately removing that will break other things. Can you
> > > > > try the following patch? It's a good idea, in general, to do as
> > > > > little as possible in the panic path; this should cover a multitude
> > > > > of issues.
> > > > >
> > > > > Thanks for the report.
> > > > >
> > > >
> > > > I'm sorry to report that the patch didn't solve the issue, and the
> > > > machine locked up in the panic path as before.
> > >
> > > I missed something. Can you try the following? If this doesn't work,
> > > I'm going to have to figure out how to reproduce this.
> > >
> >
> > Sorry, still no joy.
> >
> > My guess is that something is locking up because these Supermicro
> > machines have their ERST memory backed by the BMC, and the same BMC is
> > the other end of all the IPMI communications.
> >
> > I've reproduced this on Server/X11SCZ-F and Server/H11SSL-i, but I'm
> > guessing it can be reproduced on most, if not all, of their hardware
> > with the same setup.
> >
> > We're using the ERST backend for pstore because we're still
> > BIOS-booting them and don't have EFI services available to use as a
> > pstore backend.
> >
> >
> > I've tested just yanking the IPMI modules out of the kernel, and that
> > fixes the panic timer and we get crash dumps in pstore.
>
> Dang. I'm going to have to look deeper at what that could change to
> cause an issue like this. Are you using the IPMI watchdog? Do you have
> CONFIG_IPMI_PANIC_EVENT=y set in the config?

# CONFIG_IPMI_PANIC_EVENT is not set

We're using the IPMI watchdog.

//Anton