Date: Mon, 1 Oct 2018 15:23:24 -0400
From: Steven Rostedt
To: Daniel Wang
Cc: stable@vger.kernel.org, pmladek@suse.com, Alexander.Levin@microsoft.com, akpm@linux-foundation.org, byungchul.park@lge.com, dave.hansen@intel.com, hannes@cmpxchg.org, jack@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mathieu.desnoyers@efficios.com, mgorman@suse.de, mhocko@kernel.org, pavel@ucw.cz, penguin-kernel@I-love.SAKURA.ne.jp, peterz@infradead.org, tj@kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz, xiyou.wangcong@gmail.com, pfeiner@google.com
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"
Message-ID: <20181001152324.72a20bea@gandalf.local.home>
In-Reply-To: <20180927194601.207765-1-wonderfly@google.com>

On Thu, 27 Sep 2018 12:46:01 -0700
Daniel Wang wrote:

> Prior to this change, the combination of `softlockup_panic=1` and
> `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot path
> is trying to grab the console lock that is held by the stack trace printing
> path. What seems to be happening is that while there are multiple CPUs, only one
> of them is tasked to print the back trace of all CPUs. On a machine with many
> CPUs and a slow serial console (on Google Compute Engine for example), the stack
> trace printing routine hits a timeout and the reboot path kicks in. The latter
> then tries to print something else, but can't get the lock because it's still
> held by earlier printing path. This is easily reproducible on a VM with 16+
> vCPUs on Google Compute Engine - which is a very common scenario.
>
> A quick repro is available at
> https://github.com/wonderfly/printk-deadlock-repro. The system hangs 3 seconds
> into executing repro.sh. Both deadlock analysis and repro are credits to Peter
> Feiner.
>
> Note that I have read previous discussions on backporting this to stable [1].
> The argument for objecting the backport was that this is a non-trivial fix and
> is supported to prevent hypothetical soft lockups. What we are hitting is a real
> deadlock, in production, however. Hence this request.
>
> [1] https://lore.kernel.org/lkml/20180409081535.dq7p5bfnpvd3xk3t@pathway.suse.cz/T/#u
>
> Serial console logs leading up to the deadlock. As can be seen the stack trace
> was incomplete because the printing path hit a timeout.

I'm fine with having this backported.

-- Steve