From: Cheng Jian <cj.chengjian@huawei.com>
Subject: [RFC PATCH] panic: fix deadlock in panic()
Date: Wed, 3 Jun 2020 14:19:15 +0000
Message-ID: <20200603141915.38739-1-cj.chengjian@huawei.com>

A deadlock on logbuf_lock can occur during panic():

a) The panic CPU is running in non-NMI context.
b) The panic CPU sends the shutdown IPI via the NMI vector.
c) One of the CPUs brought down via the NMI vector still holds
   logbuf_lock.
d) The panic CPU tries to take logbuf_lock and deadlocks.
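For illustration, the ordering on the panic CPU before this patch is
roughly the following (a condensed sketch of the relevant calls inside
panic() in kernel/panic.c, not the exact upstream code):

	/* b) bring the other CPUs down, possibly via the NMI vector */
	crash_smp_send_stop();

	/*
	 * d) notifier callbacks may printk() and spin on logbuf_lock,
	 *    which a CPU stopped above (c) may still hold
	 */
	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

	/*
	 * The flush and possible re-init of logbuf_lock happen only
	 * here, too late for the callbacks above.
	 */
	printk_safe_flush_on_panic();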
We already try to re-init logbuf_lock in printk_safe_flush_on_panic()
to avoid this deadlock, but it does not help here, for two reasons:

Firstly, the num_online_cpus() check is not appropriate. When the other
CPUs are brought down via the NMI vector, the panic CPU does not wait
very long for them to stop, so by the time this problem occurs
num_online_cpus() may still be greater than 1.

Secondly, printk_safe_flush_on_panic() is called after the panic
notifier callbacks, so if printk() is called from a panic notifier
callback, the deadlock still occurs. For example, if
ftrace_dump_on_oops is set, we print some debug information and end up
trying to take logbuf_lock.

To avoid this deadlock, drop the num_online_cpus() check and also call
printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
so that the panic CPU can re-init logbuf_lock early enough.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
---
 kernel/panic.c              | 3 +++
 kernel/printk/printk_safe.c | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..8dbcb2227b60 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -255,6 +255,9 @@ void panic(const char *fmt, ...)
 		crash_smp_send_stop();
 	}
 
+	/* Call flush even twice. It tries harder with a single online CPU */
+	printk_safe_flush_on_panic();
+
 	/*
 	 * Run any panic handlers, including those that might need to
 	 * add information to the kmsg dump output.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..9ebc1723e1a4 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -269,9 +269,6 @@ void printk_safe_flush_on_panic(void)
 	 * Do not risk a double release when more CPUs are up.
 	 */
 	if (raw_spin_is_locked(&logbuf_lock)) {
-		if (num_online_cpus() > 1)
-			return;
-
 		debug_locks_off();
 		raw_spin_lock_init(&logbuf_lock);
 	}
-- 
2.17.1
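For reference, with this patch applied the panic-CPU ordering becomes
roughly the following (again a condensed sketch, not the exact
resulting code):

	/* bring the other CPUs down, possibly via the NMI vector */
	crash_smp_send_stop();

	/*
	 * New: flush, and if logbuf_lock is still held by a stopped
	 * CPU, re-init it before anything else can printk().
	 */
	printk_safe_flush_on_panic();

	/* notifier callbacks (e.g. ftrace_dump_on_oops) may now printk() */
	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

	/* the existing flush still runs before the kmsg dump */
	printk_safe_flush_on_panic();
	kmsg_dump(KMSG_DUMP_PANIC);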