From: Sergey Senozhatsky
Date: Thu, 4 Jun 2020 16:59:22 +0900
To: Cheng Jian
Cc: linux-kernel@vger.kernel.org, chenwandun@huawei.com, xiexiuqi@huawei.com,
    bobo.shaobowang@huawei.com, huawei.libin@huawei.com, pmladek@suse.com,
    sergey.senozhatsky@gmail.com, rostedt@goodmis.org
Subject: Re: [RFC PATCH] panic: fix deadlock in panic()
Message-ID: <20200604075922.GA143696@jagdpanzerIV.localdomain>
In-Reply-To: <20200603141915.38739-1-cj.chengjian@huawei.com>

On (20/06/03 14:19), Cheng Jian wrote:
> A deadlock caused by logbuf_lock occurs on panic:
>
> a) The panic CPU is running in non-NMI context.
> b) The panic CPU sends out the shutdown IPI via the NMI vector.
> c) One of the CPUs that we bring down via the NMI vector held logbuf_lock.
> d) The panic CPU tries to take logbuf_lock, and the deadlock occurs.
>
> We try to re-init logbuf_lock in printk_safe_flush_on_panic()
> to avoid the deadlock, but it does not work here, because:
>
> Firstly, it is inappropriate to check num_online_cpus() here.
> When CPUs are brought down via the NMI vector, the panic CPU won't
> wait long for the other cores to stop, so when this problem
> occurs, num_online_cpus() may be greater than 1.
>
> Secondly, printk_safe_flush_on_panic() is called after the panic
> notifier callbacks, so if printk() is called in a panic notifier
> callback, the deadlock still occurs. E.g., if ftrace_dump_on_oops
> is set, we print some debug information, which tries to take
> logbuf_lock.
>
> To avoid this deadlock, drop the num_online_cpus() check and call
> printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
> so that the panic CPU attempts to re-init logbuf_lock first.

We will hopefully get rid of some of these locks (maybe around the 5.9
kernel), so deadlocks (at least in the printk code) should become less
common.

	-ss
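To make the two failure modes in the quoted patch description concrete, here is a minimal userspace toy model in C11. It is not kernel code: `struct toy_lock`, `toy_lock_try()`, `toy_printk()`, `toy_reinit_if_locked()` and `toy_panic()` are hypothetical names, a bounded spin stands in for a raw spinlock that would spin forever, and a CPU "stopped via NMI" is simply one that took the lock and never releases it. `toy_reinit_if_locked()` only loosely mirrors the re-init step the patch attributes to printk_safe_flush_on_panic(), including the num_online_cpus() check the patch proposes to drop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy model of logbuf_lock; not the kernel implementation. */
#define TOY_LOCK_FREE (-1)
#define TOY_SPIN_LIMIT 100000 /* stand-in for "spins forever" */

struct toy_lock {
    atomic_int owner; /* CPU id of the holder, or TOY_LOCK_FREE */
};

static void toy_lock_init(struct toy_lock *l)
{
    atomic_store(&l->owner, TOY_LOCK_FREE);
}

/* Bounded spin: returns false where a real raw spinlock would hang. */
static bool toy_lock_try(struct toy_lock *l, int cpu)
{
    for (int i = 0; i < TOY_SPIN_LIMIT; i++) {
        int expected = TOY_LOCK_FREE;
        if (atomic_compare_exchange_strong(&l->owner, &expected, cpu))
            return true;
    }
    return false;
}

static void toy_unlock(struct toy_lock *l, int cpu)
{
    assert(atomic_load(&l->owner) == cpu);
    atomic_store(&l->owner, TOY_LOCK_FREE);
}

/* In this model every printk() needs logbuf_lock. */
static bool toy_printk(struct toy_lock *logbuf_lock, int cpu)
{
    if (!toy_lock_try(logbuf_lock, cpu))
        return false; /* a deadlock in the real kernel */
    /* ... store the message in the log buffer ... */
    toy_unlock(logbuf_lock, cpu);
    return true;
}

/* Re-init step; check_online models the num_online_cpus() > 1 bail-out. */
static void toy_reinit_if_locked(struct toy_lock *logbuf_lock, int online_cpus,
                                 bool check_online)
{
    if (atomic_load(&logbuf_lock->owner) == TOY_LOCK_FREE)
        return;
    if (check_online && online_cpus > 1)
        return; /* gives up and leaves the lock held */
    toy_lock_init(logbuf_lock);
}

/*
 * Returns true if the panic CPU gets all of its messages out.
 * notifier_prints: a panic notifier calls printk() (e.g. ftrace_dump_on_oops).
 * patched: re-init before the notifiers and without the online check.
 */
static bool toy_panic(struct toy_lock *logbuf_lock, int panic_cpu,
                      int online_cpus, bool notifier_prints, bool patched)
{
    if (patched)
        toy_reinit_if_locked(logbuf_lock, online_cpus, false);

    /* panic_notifier_list callbacks run here */
    if (notifier_prints && !toy_printk(logbuf_lock, panic_cpu))
        return false;

    if (!patched)
        toy_reinit_if_locked(logbuf_lock, online_cpus, true);

    /* the final flush of pending messages also needs the lock */
    return toy_printk(logbuf_lock, panic_cpu);
}
```

In this model the original ordering fails both when a second CPU is still counted as online and when a notifier prints before the re-init, while re-initializing before the notifiers succeeds in both cases. The toy deliberately ignores what the num_online_cpus() check guards against: a CPU that is still running could release the lock after the panic CPU has re-initialized it.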