Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752062AbbHRB77 (ORCPT ); Mon, 17 Aug 2015 21:59:59 -0400 Received: from mail7.hitachi.co.jp ([133.145.228.42]:60198 "EHLO mail7.hitachi.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750822AbbHRB75 (ORCPT ); Mon, 17 Aug 2015 21:59:57 -0400 From: =?utf-8?B?5rKz5ZCI6Iux5a6PIC8gS0FXQUnvvIxISURFSElSTw==?= To: "'minyard@acm.org'" CC: "openipmi-developer@lists.sourceforge.net" , "linux-kernel@vger.kernel.org" Subject: RE: [PATCH 6/7] ipmi: Handle queued messages more certainly on panic Thread-Topic: [PATCH 6/7] ipmi: Handle queued messages more certainly on panic Thread-Index: AQHQ1c5PO3ZkWhdT3kCECQhi1RcR354QC2sA Date: Tue, 18 Aug 2015 01:59:52 +0000 Message-ID: <04EAB7311EE43145B2D3536183D1A84454938028@GSjpTKYDCembx31.service.hitachi.net> References: <20150727055516.4759.34462.stgit@softrs> <20150727055516.4759.17665.stgit@softrs> <55CAC7CD.2050405@acm.org> In-Reply-To: <55CAC7CD.2050405@acm.org> Accept-Language: ja-JP, en-US Content-Language: ja-JP X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.198.219.41] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id t7I205S6013544 Content-Length: 8531 Lines: 238 Hello Corey, > -----Original Message----- > From: Corey Minyard [mailto:tcminyard@gmail.com] On Behalf Of Corey Minyard > Sent: Wednesday, August 12, 2015 1:13 PM > To: 河合英宏 / KAWAI,HIDEHIRO > Cc: openipmi-developer@lists.sourceforge.net; linux-kernel@vger.kernel.org > Subject: Re: [PATCH 6/7] ipmi: Handle queued messages more certainly on panic > > On 07/27/2015 12:55 AM, Hidehiro Kawai wrote: > > panic_event() called as a panic notifier tries to flush queued > > messages, but it can't handle them if the kernel panic happens > > while processing a message. What happens depends on when the > > kernel panics. > > Sorry this took so long, I've been traveling. No problem. > I have queued the patches before this one. They all look good and > necessary. Thank you for reviewing! > I'm not so sure about this patch. It looks like the only thing that is > a real issue is #2 below. > It's not so important to avoid dropping messages. Initially I thought dropping middle of queued messages breaks some consistencies if a message depends on the preceding dropped message. However, userland tools normally issue request messages in sequential manner, so the above situation wouldn't happen. Now, I think dropping a message is OK. > Can this be simplified somehow to work around the issue at panic time if > intf->curr_msg is set and smi_info->waiting_msg is not? There are two cases where intf->curr_msg is set and smi_info->waiting_msg is not; one is before (2) and the other is after (3). If we decide to drop intf->curr_msg in both cases, I can simplify this patch somewhat. Regards, Hidehiro Kawai Hitachi, Ltd. Research & Development Group > > Thank you, > > -corey > > > > > Here is the summary of message sending process. > > > > smi_send() > > smi_add_send_msg() > > (1) intf->curr_msg = msg > > sender() > > (2) smi_info->waiting_msg = msg > > > > > > check_start_timer_thread() > > start_next_msg() > > smi_info->curr_msg = smi_info->waiting_msg > > (3) smi_info->waiting_msg = NULL > > (4) smi_info->handlers->start_transaction() > > > > > > smi_event_handler() > > (5) handle_transaction_done() > > smi_info->curr_msg = NULL > > deliver_recv_msg() > > ipmi_smi_msg_received() > > intf->curr_msg = NULL > > > > If the kernel panics before (1), the requested message will be > > lost. But it can't be helped. > > > > If the kernel panics before (2), new message sent by > > send_panic_events() is queued to intf->xmit_msgs because > > intf->curr_msg is non-NULL. But the new message will be never > > sent because no one sends intf->curr_msg. As the result, the > > kernel hangs up. > > > > If the kernel panics before (3), intf->curr_msg will be sent by > > set_run_to_completion(). It's no problem. > > > > If the kernel panics before (4), intf->curr_msg will be lost. > > However, messages on intf->xmit_msgs will be handled. > > > > If the kernel panics before (5), we try to continue running the > > state machine. It may successfully complete. > > > > If the kernel panics after (5), we will miss the response message > > handling, but it's not much problem in the panic context. > > > > This patch tries to handle messages in intf->curr_msg and > > intf->xmit_msgs only once without losing them. To achieve this, > > this patch does that: > > - if a message is in intf->curr_msg or intf->xmit_msgs and > > start_transaction() for the message hasn't been done yet, > > resend it > > - if start_transaction() for a message has been called, > > just continue to run the state machine > > - if the transaction has been completed, do nothing > > > > >From the perspective of implementation, these are done by keeping > > smi_info->waiting_msg until start_transaction() is completed and > > by keeping new flag IPMI_MSG_RESEND_ON_PANIC just before starting > > the state machine. > > > > Signed-off-by: Hidehiro Kawai > > --- > > drivers/char/ipmi/ipmi_msghandler.c | 36 +++++++++++++++++++++++++++++++++++ > > drivers/char/ipmi/ipmi_si_intf.c | 5 ++++- > > include/linux/ipmi_smi.h | 5 +++++ > > 3 files changed, 45 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/char/ipmi/ipmi_msghandler.c b/drivers/char/ipmi/ipmi_msghandler.c > > index 5a2d9fe..3dcd814 100644 > > --- a/drivers/char/ipmi/ipmi_msghandler.c > > +++ b/drivers/char/ipmi/ipmi_msghandler.c > > @@ -1493,6 +1493,8 @@ static struct ipmi_smi_msg *smi_add_send_msg(ipmi_smi_t intf, > > struct ipmi_smi_msg *smi_msg, > > int priority) > > { > > + smi_msg->flags |= IPMI_MSG_RESEND_ON_PANIC; > > + > > if (intf->curr_msg) { > > if (priority > 0) > > list_add_tail(&smi_msg->link, &intf->hp_xmit_msgs); > > @@ -4223,6 +4225,7 @@ struct ipmi_smi_msg *ipmi_alloc_smi_msg(void) > > rv->done = free_smi_msg; > > rv->user_data = NULL; > > atomic_inc(&smi_msg_inuse_count); > > + rv->flags = 0; > > } > > return rv; > > } > > @@ -4531,7 +4534,40 @@ static int panic_event(struct notifier_block *this, > > spin_unlock(&intf->waiting_rcv_msgs_lock); > > > > intf->run_to_completion = 1; > > +restart: > > intf->handlers->set_run_to_completion(intf->send_info, 1); > > + > > + if (intf->curr_msg) { > > + /* > > + * This can happen if the kernel panics before > > + * setting msg to smi_info->waiting_msg or while > > + * processing a response. For the former case, we > > + * resend the message by re-queueing it. For the > > + * latter case, we simply ignore it because handling > > + * response is not much meaningful in the panic > > + * context. > > + */ > > + > > + /* > > + * Since we want to send the current message first, > > + * re-queue it into the high-prioritized queue. > > + */ > > + if (intf->curr_msg->flags & IPMI_MSG_RESEND_ON_PANIC) > > + list_add(&intf->curr_msg->link, > > + &intf->hp_xmit_msgs); > > + > > + intf->curr_msg = NULL; > > + } > > + > > + if (!list_empty(&intf->hp_xmit_msgs) || > > + !list_empty(&intf->xmit_msgs)) { > > + /* > > + * This can happen if the kernel panics while > > + * processing a response. Kick the queue and restart. > > + */ > > + smi_recv_tasklet((unsigned long)intf); > > + goto restart; > > + } > > } > > > > #ifdef CONFIG_IPMI_PANIC_EVENT > > diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c > > index 814b7b7..c5c7806 100644 > > --- a/drivers/char/ipmi/ipmi_si_intf.c > > +++ b/drivers/char/ipmi/ipmi_si_intf.c > > @@ -383,7 +383,6 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info) > > int err; > > > > smi_info->curr_msg = smi_info->waiting_msg; > > - smi_info->waiting_msg = NULL; > > debug_timestamp("Start2"); > > err = atomic_notifier_call_chain(&xaction_notifier_list, > > 0, smi_info); > > @@ -401,6 +400,7 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info) > > rv = SI_SM_CALL_WITHOUT_DELAY; > > } > > out: > > + smi_info->waiting_msg = NULL; > > return rv; > > } > > > > @@ -804,6 +804,9 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info, > > { > > enum si_sm_result si_sm_result; > > > > + if (smi_info->curr_msg) > > + smi_info->curr_msg->flags &= ~(IPMI_MSG_RESEND_ON_PANIC); > > + > > restart: > > /* > > * There used to be a loop here that waited a little while > > diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h > > index ba57fb1..1200872 100644 > > --- a/include/linux/ipmi_smi.h > > +++ b/include/linux/ipmi_smi.h > > @@ -47,6 +47,9 @@ > > /* Structure for the low-level drivers. */ > > typedef struct ipmi_smi *ipmi_smi_t; > > > > +/* Flags for flags member of struct ipmi_smi_msg */ > > +#define IPMI_MSG_RESEND_ON_PANIC 1 /* If set, resend in panic_event() */ > > + > > /* > > * Messages to/from the lower layer. The smi interface will take one > > * of these to send. After the send has occurred and a response has > > @@ -75,6 +78,8 @@ struct ipmi_smi_msg { > > /* Will be called when the system is done with the message > > (presumably to free it). */ > > void (*done)(struct ipmi_smi_msg *msg); > > + > > + int flags; > > }; > > > > struct ipmi_smi_handlers { > > > > ????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?