From: Tianxianting
To: "minyard@acm.org"
CC: "arnd@arndb.de", "gregkh@linuxfoundation.org", "openipmi-developer@lists.sourceforge.net", "linux-kernel@vger.kernel.org"
Subject: RE: [PATCH] ipmi: retry to get device id when error
Date: Sun, 13 Sep 2020 14:10:01 +0000
References:
 <20200913120203.3368-1-tian.xianting@h3c.com> <20200913123930.GH15602@minyard.net>
In-Reply-To: <20200913123930.GH15602@minyard.net>

Hi Corey,

Thanks for your quick reply.

We didn't try the method you mentioned; actually, I didn't know about it before you told me :(

The issue has occurred on both of our two Ceph storage servers, with low probability. We ended up using this patch because it recovers automatically when the issue happens, so there is no need to detect the failure and reload the ipmi driver manually, or to develop additional scripts to do it.

The 1 second delay is acceptable to us. If there really isn't a BMC out there, the ipmi driver will not be loaded, is that right?

Maybe we can adjust it to retry 3 times with a 500 ms interval?

Thanks in advance for any further feedback.

-----Original Message-----
From: Corey Minyard [mailto:tcminyard@gmail.com] On Behalf Of Corey Minyard
Sent: Sunday, September 13, 2020 8:40 PM
To: tianxianting (RD)
Cc: arnd@arndb.de; gregkh@linuxfoundation.org; openipmi-developer@lists.sourceforge.net; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ipmi: retry to get device id when error

On Sun, Sep 13, 2020 at 08:02:03PM +0800, Xianting Tian wrote:
> With low probability, we can't get the BMC's device id when loading the
> ipmi driver, which causes BMC device registration to fail. This issue may
> be caused by bad LPC signal quality.
> When this issue happened, we got the kernel messages below:
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: IPMI message handler: device id demangle failed: -22
> [Wed Sep 9 19:52:03 2020] IPMI BT: using default values
> [Wed Sep 9 19:52:03 2020] IPMI BT: req2rsp=5 secs retries=2
> [Wed Sep 9 19:52:03 2020] ipmi_si IPI0001:00: Unable to get the device id: -5
> [Wed Sep 9 19:52:04 2020] ipmi_si IPI0001:00: Unable to register device: error -5
>
> When this issue happened, we wanted to manually unload the driver and
> try to load it again, but it can't be unloaded by 'rmmod' as it is already 'in use'.

I'm not sure this patch is a good idea; it would cause a long boot delay in situations where there really isn't a BMC out there. Yes, it happens.

You don't have to reload the driver to add a device, though. You can hot-add devices using /sys/module/ipmi_si/parameters/hotmod. Look in Documentation/driver-api/ipmi.rst for details.

Does that work for you?

-corey

>
> We added the 'printk' below in handle_one_recv_msg(). When this issue
> happened, the msg we received was "Recv: 1c 01 d5", which means
> data_len is 1 and data[0] is 0xd5.
> Debug code:
> static int handle_one_recv_msg(struct ipmi_smi *intf,
>                                struct ipmi_smi_msg *msg) {
>         printk("Recv: %*ph\n", msg->rsp_size, msg->rsp);
>         ... ...
> }
> Then ipmi_demangle_device_id() returned '-EINVAL' because 'data_len < 7'
> and 'data[0] != 0'.
>
> We used this patch to retry getting the device id when an error happens.
> We reproduced the issue again and the first retry succeeded; we finally
> got the correct msg and then all was ok:
> Recv: 1c 01 00 01 81 05 84 02 af db 07 00 01 00 b9 00 10 00
>
> So use the retry mechanism in this patch to give the BMC more
> opportunity to respond correctly to the kernel.
>
> Signed-off-by: Xianting Tian
> ---
>  drivers/char/ipmi/ipmi_msghandler.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c
> index 737c0b6b2..bfb2de77a 100644
> --- a/drivers/char/ipmi/ipmi_msghandler.c
> +++ b/drivers/char/ipmi/ipmi_msghandler.c
> @@ -34,6 +34,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define IPMI_DRIVER_VERSION "39.2"
>
> @@ -60,6 +61,9 @@ enum ipmi_panic_event_op {
>  #else
>  #define IPMI_PANIC_DEFAULT IPMI_SEND_PANIC_EVENT_NONE
>  #endif
> +
> +#define GET_DEVICE_ID_MAX_RETRY	5
> +
>  static enum ipmi_panic_event_op ipmi_send_panic_event = IPMI_PANIC_DEFAULT;
>
>  static int panic_op_write_handler(const char *val,
> @@ -2426,19 +2430,26 @@ send_get_device_id_cmd(struct ipmi_smi *intf)
>  static int __get_device_id(struct ipmi_smi *intf, struct bmc_device *bmc)
>  {
>  	int rv;
> -
> -	bmc->dyn_id_set = 2;
> +	unsigned int retry_count = 0;
>
>  	intf->null_user_handler = bmc_device_id_handler;
>
> +retry:
> +	bmc->dyn_id_set = 2;
> +
>  	rv = send_get_device_id_cmd(intf);
>  	if (rv)
>  		return rv;
>
>  	wait_event(intf->waitq, bmc->dyn_id_set != 2);
>
> -	if (!bmc->dyn_id_set)
> +	if (!bmc->dyn_id_set) {
> +		msleep(1000);
> +		if (++retry_count <= GET_DEVICE_ID_MAX_RETRY)
> +			goto retry;
> +
>  		rv = -EIO; /* Something went wrong in the fetch. */
> +	}
>
>  	/* dyn_id_set makes the id data available. */
>  	smp_rmb();
> --
> 2.17.1
>
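
For illustration, the bounded retry discussed in this thread can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: flaky_bmc, flaky_fetch(), sleep_ms() and get_device_id_with_retry() are hypothetical names that do not exist in ipmi_msghandler.c, nanosleep() stands in for the kernel's msleep(), and the fetch callback stands in for sending Get Device ID and waiting for the reply. Unlike the quoted patch, which calls msleep() before checking the retry count, the sketch skips the sleep once the retry budget is used up.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a BMC that garbles its first few replies. */
struct flaky_bmc {
	int failures_left;	/* replies still to be garbled */
	int calls;		/* requests seen so far */
};

/* Simulated fetch: 0 on a valid reply, -EINVAL on a garbled one. */
int flaky_fetch(void *ctx)
{
	struct flaky_bmc *bmc = ctx;

	bmc->calls++;
	if (bmc->failures_left > 0) {
		bmc->failures_left--;
		return -EINVAL;
	}
	return 0;
}

/* Userspace stand-in for msleep(): block for roughly ms milliseconds. */
void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Try the fetch once, then retry up to max_retry more times with a fixed
 * delay between attempts. Returns 0 on success, -EIO once every attempt
 * has failed. No sleep happens after the final failed attempt.
 */
int get_device_id_with_retry(int (*fetch)(void *), void *ctx,
			     unsigned int max_retry, long delay_ms)
{
	unsigned int attempt;

	assert(fetch != NULL);

	for (attempt = 0; ; attempt++) {
		if (fetch(ctx) == 0)
			return 0;
		if (attempt >= max_retry)
			return -EIO;
		sleep_ms(delay_ms);
	}
}
```

With the schedule proposed in the reply (max_retry = 3, delay_ms = 500), a responder that garbles only its first reply succeeds on the second request, and a responder that never answers correctly adds at most 3 * 500 ms = 1.5 s of sleeping before -EIO is returned.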