Date: Tue, 9 Feb 2021 09:39:34 +0100
From: Petr Mladek
To: Muchun Song
Cc: Sergey Senozhatsky, Steven Rostedt, john.ogness@linutronix.de, Andrew Morton, LKML
Subject: Re: [External] Re: [PATCH v2] printk: fix deadlock when kernel panic
References: <20210206054124.6743-1-songmuchun@bytedance.com>

On Mon 2021-02-08 23:40:07, Muchun Song wrote:
> On Mon, Feb 8, 2021 at 9:12 PM Sergey Senozhatsky wrote:
> >
> > On (21/02/08 16:49), Muchun Song wrote:
> > > On Mon, Feb 8, 2021 at 2:38 PM Sergey Senozhatsky wrote:
> > > >
> > > > On (21/02/06 13:41), Muchun Song wrote:
> > > > > We found a deadlock bug on our server when the kernel panic. It can be
> > > > > described in the following diagram.
> > > > >
> > > > > CPU0:                                         CPU1:
> > > > > panic                                         rcu_dump_cpu_stacks
> > > > >   kdump_nmi_shootdown_cpus                      nmi_trigger_cpumask_backtrace
> > > > >     register_nmi_handler(crash_nmi_callback)      printk_safe_flush
> > > > >                                                     __printk_safe_flush
> > > > >                                                       raw_spin_lock_irqsave(&read_lock)
> > > > >     // send NMI to other processors
> > > > >     apic_send_IPI_allbutself(NMI_VECTOR)
> > > > >                                                         // NMI interrupt, dead loop
> > > > >                                                         crash_nmi_callback
> > > >
> > > > At what point does this decrement num_online_cpus()? Any chance that
> > > > panic CPU can apic_send_IPI_allbutself() and printk_safe_flush_on_panic()
> > > > before num_online_cpus() becomes 1?
> > >
> > > I took a closer look at the code. IIUC, It seems that there is no point
> > > which decreases num_online_cpus.
> >
> > So then this never re-inits the safe_read_lock?

Yes, but it will also not cause the deadlock. printk_safe_flush_on_panic()
will return without flushing the buffers.

> Right. If we encounter this case, we do not flush printk
> buffer. So, it seems my previous patch is the right fix.
> Right?
>
> https://lore.kernel.org/patchwork/patch/1373563/

No, there is a risk of deadlock caused by logbuf_lock, see
https://lore.kernel.org/lkml/YB0nggSa7a95UCIK@alley/

> >         if (num_online_cpus() > 1)
> >                 return;
> >
> >         debug_locks_off();
> >         raw_spin_lock_init(&safe_read_lock);
> >
> >         -ss

I prefer this approach. It is straightforward because it handles
read_lock the same way as logbuf_lock.

IMHO, it is not worth inventing any more complexity. Both logbuf_lock
and read_lock are obsoleted by the lockless ringbuffer. And we need
something simple to get backported to the already released kernels.

Best Regards,
Petr
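
For readers following the thread, the approach quoted above can be modeled
in plain userspace C. This is only an illustrative sketch, not the kernel
code or the patch under discussion: struct sketch_lock, struct sketch_state
and sketch_flush_on_panic() are hypothetical names, and the "lock" is a
simple flag rather than a raw_spinlock_t. It shows the two cases from the
discussion: while more than one CPU is still counted online, the flush
returns early and never touches a lock that a stopped CPU may hold; with a
single CPU left, the lock is re-initialized before flushing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; NOT the kernel's raw_spinlock_t. */
struct sketch_lock {
	bool held;
};

static void sketch_lock_init(struct sketch_lock *lock)
{
	lock->held = false;
}

/* Report failure instead of spinning forever, so the model cannot hang. */
static bool sketch_trylock(struct sketch_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void sketch_unlock(struct sketch_lock *lock)
{
	lock->held = false;
}

/* Hypothetical model of the state the panic path looks at. */
struct sketch_state {
	int online_cpus;              /* models num_online_cpus() */
	struct sketch_lock read_lock; /* models the flush lock */
	int pending;                  /* messages waiting in the buffer */
	int flushed;                  /* messages already flushed */
};

enum flush_result { FLUSH_SKIPPED, FLUSH_DONE, FLUSH_DEADLOCK };

static enum flush_result sketch_flush_on_panic(struct sketch_state *s)
{
	assert(s->online_cpus >= 1);

	/*
	 * Other CPUs are still counted online and one of them may have
	 * been stopped while holding the lock: give up on flushing.
	 */
	if (s->online_cpus > 1)
		return FLUSH_SKIPPED;

	/* Only this CPU is left; a stale owner can never release the lock. */
	sketch_lock_init(&s->read_lock);

	if (!sketch_trylock(&s->read_lock))
		return FLUSH_DEADLOCK; /* not reachable after the re-init */

	s->flushed += s->pending;
	s->pending = 0;
	sketch_unlock(&s->read_lock);
	return FLUSH_DONE;
}
```

Calling sketch_flush_on_panic() on a state with online_cpus == 2 and the
lock marked as held leaves the buffer untouched, which corresponds to the
"return without flushing" case described above. Setting online_cpus to 1
and calling it again flushes the pending messages even though the lock was
left held by the earlier owner.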