Date: Wed, 3 Oct 2018 13:37:04 -0400
From: Steven Rostedt
To: Daniel Wang
Cc: Petr Mladek, stable@vger.kernel.org, Alexander.Levin@microsoft.com, akpm@linux-foundation.org, byungchul.park@lge.com, dave.hansen@intel.com, hannes@cmpxchg.org, jack@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mathieu Desnoyers, Mel Gorman, mhocko@kernel.org, pavel@ucw.cz, penguin-kernel@i-love.sakura.ne.jp, peterz@infradead.org, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, Cong Wang, Peter Feiner
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"
Message-ID: <20181003133704.43a58cf5@gandalf.local.home>
References: <20180927194601.207765-1-wonderfly@google.com> <20181001152324.72a20bea@gandalf.local.home>
 <20181002084225.6z2b74qem3mywukx@pathway.suse.cz> <20181002212327.7aab0b79@vmware.local.home> <20181003091400.rgdjpjeaoinnrysx@pathway.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 3 Oct 2018 10:16:08 -0700 Daniel Wang wrote:

> On Wed, Oct 3, 2018 at 2:14 AM Petr Mladek wrote:
> >
> > On Tue 2018-10-02 21:23:27, Steven Rostedt wrote:
> > > I don't see the big deal of backporting this. The biggest complaints
> > > about backports are from fixes that were added to late -rc releases
> > > where the fixes didn't get much testing. This commit was added in 4.16,
> > > and hasn't had any issues due to the design. Although a fix has been
> > > added:
> > >
> > >   c14376de3a1 ("printk: Wake klogd when passing console_lock owner")
> >
> > As I said, I am fine with backporting the console_lock owner stuff
> > into the stable release.
> >
> > I just wonder (like Sergey) what the real problem is. The console_lock
> > owner handshake is not fully reliable. It might be good enough

I'm not sure what you mean by 'not fully reliable'.

> > to prevent softlockup. But we should not rely on it to prevent
> > a deadlock.
>
> Yes. I myself was curious too. :)
>
> >
> > My new theory ;-)
> >
> > printk_safe_flush() is called in nmi_trigger_cpumask_backtrace().
> > => watchdog_timer_fn() is blocked until all backtraces are printed.
> >
> > Now, the original report complained that the system rebooted before
> > all backtraces were printed. It means that panic() was called
> > on another CPU. My guess is that it is from the hardlockup detector.
> > And the panic() was not able to flush the console because it was
> > not able to take console_lock.
> >
> > IMHO, there was not a real deadlock.
> > The console_lock owner handshake just helped to get console_lock in
> > panic() and flush all messages before reboot => it is a reasonable
> > and acceptable fix.

Agreed.

>
> I had the same speculation. Tried to capture a lockdep snippet with
> CONFIG_PROVE_LOCKING turned on but didn't get anything. But
> maybe I was doing it wrong.
>
> >
> > Just to be sure. Daniel, could you please send a log with
> > the console_lock owner stuff backported? There we would see
> > who called the panic() and why it rebooted early.
>
> Sure. Here is one. It's a bit long but complete. I attached another log
> snippet below it which is what I got when `softlockup_panic` was turned
> off. The log was from the IRQ task that was flushing the printk buffer. I
> will be taking a closer look at it too but in case you'll find it helpful.

Just so I understand correctly: does the panic hit both with and without
the suggested backport patch? Is the only difference that you get the full
output with the patch and limited output without it?

-- Steve