From: Muchun Song
Date: Mon, 8 Feb 2021 23:40:07 +0800
Subject: Re: [External] Re: [PATCH v2] printk: fix deadlock when kernel panic
To: Sergey Senozhatsky
Cc: Petr Mladek, Steven Rostedt, john.ogness@linutronix.de, Andrew Morton, LKML
References: <20210206054124.6743-1-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 8, 2021 at 9:12 PM Sergey Senozhatsky wrote:
>
> On (21/02/08 16:49), Muchun Song wrote:
> > On Mon, Feb 8, 2021 at 2:38 PM Sergey Senozhatsky wrote:
> > >
> > > On (21/02/06 13:41), Muchun Song wrote:
> > > > We found a deadlock bug on our server when the kernel panic. It can be
> > > > described in the following diagram.
> > > >
> > > > CPU0:                                         CPU1:
> > > > panic                                         rcu_dump_cpu_stacks
> > > >   kdump_nmi_shootdown_cpus                      nmi_trigger_cpumask_backtrace
> > > >     register_nmi_handler(crash_nmi_callback)      printk_safe_flush
> > > >                                                     __printk_safe_flush
> > > >                                                       raw_spin_lock_irqsave(&read_lock)
> > > >     // send NMI to other processors
> > > >     apic_send_IPI_allbutself(NMI_VECTOR)
> > > >                                                         // NMI interrupt, dead loop
> > > >                                                         crash_nmi_callback
> > >
> > > At what point does this decrement num_online_cpus()? Any chance that
> > > panic CPU can apic_send_IPI_allbutself() and printk_safe_flush_on_panic()
> > > before num_online_cpus() becomes 1?
> >
> > I took a closer look at the code. IIUC, it seems that there is no point
> > which decreases num_online_cpus.
>
> So then this never re-inits the safe_read_lock?

Right. If we encounter this case, we do not flush the printk buffer. So it
seems my previous patch is the right fix. Right?

https://lore.kernel.org/patchwork/patch/1373563/

>
>         if (num_online_cpus() > 1)
>                 return;
>
>         debug_locks_off();
>         raw_spin_lock_init(&safe_read_lock);
>
>         -ss
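
To make the case above concrete, here is a rough user-space model of the
quoted check. It is only a sketch, not kernel code: struct model_state,
struct model_lock and model_flush_on_panic() are hypothetical names made up
for illustration, and "held" merely stands in for the state of
safe_read_lock. The assumption, as discussed in this thread, is that the
online CPU count is never decremented on this path, so the early return is
taken and the lock is never re-initialised.

```c
#include <assert.h>   /* only needed if you add checks like the ones below */
#include <stdbool.h>

/* Hypothetical stand-in for a raw spinlock; not the kernel's type. */
struct model_lock {
	bool held;                    /* true while some CPU owns the lock */
};

/* Hypothetical snapshot of the state the panic path looks at. */
struct model_state {
	int online_cpus;              /* stand-in for num_online_cpus() */
	struct model_lock read_lock;  /* stand-in for safe_read_lock */
};

/*
 * Models the quoted check. Returns true if the lock was re-initialised
 * (so a flush could proceed), false if the function bailed out early.
 */
static bool model_flush_on_panic(struct model_state *s)
{
	if (s->online_cpus > 1)
		return false;         /* bail out: lock left as it was */

	s->read_lock.held = false;    /* models raw_spin_lock_init() */
	return true;
}
```

With online_cpus == 2 and the lock held by the stopped CPU, the model
returns false and leaves the lock held, which corresponds to the buffer not
being flushed. Only with online_cpus == 1 does it reset the lock.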