Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751739AbdFSOtN (ORCPT ); Mon, 19 Jun 2017 10:49:13 -0400 Received: from mail-pf0-f194.google.com ([209.85.192.194]:35657 "EHLO mail-pf0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750929AbdFSOtL (ORCPT ); Mon, 19 Jun 2017 10:49:11 -0400 Reply-To: minyard@acm.org Subject: Re: [PATCH] ipmi: use rcu lock around call to intf->handlers->sender() To: Tony Camuso , openipmi-developer@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org References: <1497365651-7413-1-git-send-email-tcamuso@redhat.com> <1a359a04-f1da-0e2e-8663-6527e84dd5a3@acm.org> <613dc158-e3de-f713-019e-cf8c190e72be@acm.org> <91da06cc-3311-845e-22b6-78b69dbcdeb2@acm.org> From: Corey Minyard Message-ID: <5399eaad-22e6-401f-8850-1d4cef2e122b@acm.org> Date: Mon, 19 Jun 2017 09:49:08 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8777 Lines: 229 On 06/19/2017 09:42 AM, Tony Camuso wrote: > On 06/19/2017 10:32 AM, Corey Minyard wrote: >> On 06/19/2017 09:29 AM, Tony Camuso wrote: >>> On 06/19/2017 09:31 AM, Corey Minyard wrote: >>>> On 06/16/2017 07:15 AM, Corey Minyard wrote: >>>>> On 06/15/2017 10:54 AM, Corey Minyard wrote: >>>>>> On 06/13/2017 09:54 AM, Tony Camuso wrote: >>>>>>> A vendor with a system having more than 128 CPUs occasionally >>>>>>> encounters a >>>>>>> crash during shutdown. This is not an easily reproduceable >>>>>>> event, but the >>>>>>> vendor was able to provide the following analysis of the crash, >>>>>>> which >>>>>>> exhibits the same footprint each time. >>>>>>> >>>>>>> crash> bt >>>>>>> PID: 0 TASK: ffff88017c70ce70 CPU: 5 COMMAND: "swapper/5" >>>>>>> #0 [ffff88085c143ac8] machine_kexec at ffffffff81059c8b >>>>>>> #1 [ffff88085c143b28] __crash_kexec at ffffffff811052e2 >>>>>>> #2 [ffff88085c143bf8] crash_kexec at ffffffff811053d0 >>>>>>> #3 [ffff88085c143c10] oops_end at ffffffff8168ef88 >>>>>>> #4 [ffff88085c143c38] no_context at ffffffff8167ebb3 >>>>>>> #5 [ffff88085c143c88] __bad_area_nosemaphore at ffffffff8167ec49 >>>>>>> #6 [ffff88085c143cd0] bad_area_nosemaphore at ffffffff8167edb3 >>>>>>> #7 [ffff88085c143ce0] __do_page_fault at ffffffff81691d1e >>>>>>> #8 [ffff88085c143d40] do_page_fault at ffffffff81691ec5 >>>>>>> #9 [ffff88085c143d70] page_fault at ffffffff8168e188 >>>>>>> [exception RIP: unknown or invalid address] >>>>>>> RIP: ffffffffa053c800 RSP: ffff88085c143e28 RFLAGS: 00010206 >>>>>>> RAX: ffff88017c72bfd8 RBX: ffff88017a8dc000 RCX: >>>>>>> ffff8810588b5ac8 >>>>>>> RDX: ffff8810588b5a00 RSI: ffffffffa053c800 RDI: >>>>>>> ffff8810588b5a00 >>>>>>> RBP: ffff88085c143e58 R8: ffff88017c70d408 R9: >>>>>>> ffff88017a8dc000 >>>>>>> R10: 0000000000000002 R11: ffff88085c143da0 R12: >>>>>>> ffff8810588b5ac8 >>>>>>> R13: 0000000000000100 R14: ffffffffa053c800 R15: >>>>>>> ffff8810588b5a00 >>>>>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 >>>>>>> --- --- >>>>>>> [exception RIP: cpuidle_enter_state+82] >>>>>>> RIP: ffffffff81514192 RSP: ffff88017c72be50 RFLAGS: 00000202 >>>>>>> RAX: 0000001e4c3c6f16 RBX: 000000000000f8a0 RCX: >>>>>>> 0000000000000018 >>>>>>> RDX: 0000000225c17d03 RSI: ffff88017c72bfd8 RDI: >>>>>>> 0000001e4c3c6f16 >>>>>>> RBP: ffff88017c72be78 R8: 000000000000237e R9: >>>>>>> 0000000000000018 >>>>>>> R10: 0000000000002494 R11: 0000000000000001 R12: >>>>>>> ffff88017c72be20 >>>>>>> R13: ffff88085c14f8e0 R14: 0000000000000082 R15: >>>>>>> 0000001e4c3bb400 >>>>>>> ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018 >>>>>>> >>>>>>> This is the corresponding stack trace >>>>>>> >>>>>>> It has crashed because the area pointed with RIP extracted from >>>>>>> timer >>>>>>> element is already removed during a shutdown process. >>>>>>> >>>>>>> The function is smi_timeout(). >>>>>>> >>>>>>> And we think ffff8810588b5a00 in RDX is a parameter struct smi_info >>>>>>> >>>>>>> crash> rd ffff8810588b5a00 20 >>>>>>> ffff8810588b5a00: ffff8810588b6000 0000000000000000 >>>>>>> .`.X............ >>>>>>> ffff8810588b5a10: ffff880853264400 ffffffffa05417e0 >>>>>>> .D&S......T..... >>>>>>> ffff8810588b5a20: 24a024a000000000 0000000000000000 >>>>>>> .....$.$........ >>>>>>> ffff8810588b5a30: 0000000000000000 0000000000000000 >>>>>>> ................ >>>>>>> ffff8810588b5a40: ffffffffa053a040 ffffffffa053a060 >>>>>>> @.S.....`.S..... >>>>>>> ffff8810588b5a50: 0000000000000000 0000000100000001 >>>>>>> ................ >>>>>>> ffff8810588b5a60: 0000000000000000 0000000000000e00 >>>>>>> ................ >>>>>>> ffff8810588b5a70: ffffffffa053a580 ffffffffa053a6e0 >>>>>>> ..S.......S..... >>>>>>> ffff8810588b5a80: ffffffffa053a4a0 ffffffffa053a250 >>>>>>> ..S.....P.S..... >>>>>>> ffff8810588b5a90: 0000000500000002 0000000000000000 >>>>>>> ................ >>>>>>> >>>>>>> Unfortunately the top of this area is already detroyed by someone. >>>>>>> But because of two reasonns we think this is struct smi_info >>>>>>> 1) The address included in between ffff8810588b5a70 and >>>>>>> ffff8810588b5a80: >>>>>>> are inside of ipmi_si_intf.c see crash> module ffff88085779d2c0 >>>>>>> >>>>>>> 2) We've found the area which point this. >>>>>>> It is offset 0x68 of ffff880859df4000 >>>>>>> >>>>>>> crash> rd ffff880859df4000 100 >>>>>>> ffff880859df4000: 0000000000000000 0000000000000001 >>>>>>> ................ >>>>>>> ffff880859df4010: ffffffffa0535290 dead000000000200 >>>>>>> .RS............. >>>>>>> ffff880859df4020: ffff880859df4020 ffff880859df4020 @.Y.... >>>>>>> @.Y.... >>>>>>> ffff880859df4030: 0000000000000002 0000000000100010 >>>>>>> ................ >>>>>>> ffff880859df4040: ffff880859df4040 ffff880859df4040 >>>>>>> @@.Y....@@.Y.... >>>>>>> ffff880859df4050: 0000000000000000 0000000000000000 >>>>>>> ................ >>>>>>> ffff880859df4060: 0000000000000000 ffff8810588b5a00 >>>>>>> .........Z.X.... >>>>>>> ffff880859df4070: 0000000000000001 ffff880859df4078 >>>>>>> ........x@.Y.... >>>>>>> >>>>>>> If we regards it as struct ipmi_smi in shutdown process >>>>>>> it looks consistent. >>>>>>> >>>>>>> The remedy for this apparent race is affixed below. >>>>>> >>>>>> I think you are right about this problem, but in_shutdown is >>>>>> checked already >>>>>> a bit before when newmsg is extracted from the list. Wouldn't it >>>>>> be better >>>>>> to add the rcu_read_lock() region starting right before the previous >>>>>> in_shutdown check to after the send? That would avoid a leak in >>>>>> this >>>>>> case. >>>>> >>>>> While lying awake unable to sleep, I realized that you can't call the >>>>> sender function while holding rcu_read_lock(). That will break RT, >>>>> because you can't claim a mutex while holding rcu_read_lock(), >>>>> and the sender function will claim normal spinlocks. >>>>> >>>>> So I need to think about this a bit. >>>>> >>>> >>>> I was wrong about this. An rcu_read_lock() around the whole thing >>>> should >>>> be all that is required to fix this. I can do a patch, or you can, >>>> if you like. >>>> >>>> Thanks again for pointing this out. >>>> >>>> -corey >>> >>> Is this what you have in mind? >>> >>> --- >>> drivers/char/ipmi/ipmi_msghandler.c | 11 +++++++++-- >>> 1 file changed, 9 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/char/ipmi/ipmi_msghandler.c >>> b/drivers/char/ipmi/ipmi_msghandler.c >>> index 9f69995..e20f8d7 100644 >>> --- a/drivers/char/ipmi/ipmi_msghandler.c >>> +++ b/drivers/char/ipmi/ipmi_msghandler.c >>> @@ -3880,6 +3880,9 @@ static void smi_recv_tasklet(unsigned long val) >>> */ >>> if (!run_to_completion) >>> spin_lock_irqsave(&intf->xmit_msgs_lock, flags); >>> + >>> + rcu_read_lock(); >>> + >>> if (intf->curr_msg == NULL && !intf->in_shutdown) { >>> struct list_head *entry = NULL; >>> >>> @@ -3894,11 +3897,15 @@ static void smi_recv_tasklet(unsigned long val) >>> newmsg = list_entry(entry, struct ipmi_smi_msg, link); >>> intf->curr_msg = newmsg; >>> } >>> + >>> + if (newmsg) >>> + intf->handlers->sender(intf->send_info, newmsg); >>> } >>> + >>> + rcu_read_unlock(); >>> + >>> if (!run_to_completion) >>> spin_unlock_irqrestore(&intf->xmit_msgs_lock, flags); >>> - if (newmsg) >>> - intf->handlers->sender(intf->send_info, newmsg); >>> >>> handle_new_recv_msgs(intf); >>> } >> >> No, you definitely cannot call the sender function while holding the >> lock. >> >> I was talk about adding rcu_read_lock() before the spin_lock_irqsave() >> and rcu_read_unlock() after the sender() call. > > More like this, then. > > diff --git a/drivers/char/ipmi/ipmi_msghandler.c > b/drivers/char/ipmi/ipmi_msghandler.c > index 9f69995..49a7e96 100644 > --- a/drivers/char/ipmi/ipmi_msghandler.c > +++ b/drivers/char/ipmi/ipmi_msghandler.c > @@ -3878,6 +3878,9 @@ static void smi_recv_tasklet(unsigned long val) > * because the lower layer is allowed to hold locks while calling > * message delivery. > */ > + > + rcu_read_lock(); > + > if (!run_to_completion) > spin_lock_irqsave(&intf->xmit_msgs_lock, flags); > if (intf->curr_msg == NULL && !intf->in_shutdown) { > @@ -3900,6 +3903,8 @@ static void smi_recv_tasklet(unsigned long val) > if (newmsg) > intf->handlers->sender(intf->send_info, newmsg); > > + rcu_read_unlock(); > + > handle_new_recv_msgs(intf); > } > Perfect, yes. -corey >> >> IIRC, this code was moved from a section that was under the rcu read >> lock, but the lock was not added when moved. >> >> -corey >> >