Date: Mon, 1 Oct 2018 22:13:10 +0200
From: Pavel Machek
To: Steven Rostedt
Cc: Daniel Wang, stable@vger.kernel.org, pmladek@suse.com,
    Alexander.Levin@microsoft.com, akpm@linux-foundation.org,
    byungchul.park@lge.com, dave.hansen@intel.com, hannes@cmpxchg.org,
    jack@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    mathieu.desnoyers@efficios.com, mgorman@suse.de, mhocko@kernel.org,
    penguin-kernel@I-love.SAKURA.ne.jp, peterz@infradead.org, tj@kernel.org,
    torvalds@linux-foundation.org, vbabka@suse.cz, xiyou.wangcong@gmail.com,
    pfeiner@google.com
Subject: Re: 4.14 backport request for dbdda842fe96f: "printk: Add console owner and waiter logic to load balance console writes"
Message-ID: <20181001201309.GA9835@amd>
References: <20180927194601.207765-1-wonderfly@google.com>
    <20181001152324.72a20bea@gandalf.local.home>
In-Reply-To: <20181001152324.72a20bea@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 2018-10-01 15:23:24, Steven Rostedt wrote:
> On Thu, 27 Sep 2018 12:46:01 -0700
> Daniel Wang wrote:
>
> > Prior to this change, the combination of `softlockup_panic=1` and
> > `softlockup_all_cpu_stacktrace=1` may result in a deadlock when the reboot path
> > is trying to grab the console lock that is held by the stack trace printing
> > path. What seems to be happening is that while there are multiple CPUs, only one
> > of them is tasked to print the back trace of all CPUs. On a machine with many
> > CPUs and a slow serial console (on Google Compute Engine for example), the stack
> > trace printing routine hits a timeout and the reboot path kicks in. The latter
> > then tries to print something else, but can't get the lock because it's still
> > held by earlier printing path. This is easily reproducible on a VM with 16+
> > vCPUs on Google Compute Engine - which is a very common scenario.
> >
> > A quick repro is available at
> > https://github.com/wonderfly/printk-deadlock-repro. The system hangs 3 seconds
> > into executing repro.sh. Both deadlock analysis and repro are credits to Peter
> > Feiner.
> >
> > Note that I have read previous discussions on backporting this to stable [1].
> > The argument for objecting the backport was that this is a non-trivial fix and
> > is supported to prevent hypothetical soft lockups. What we are hitting is a real
> > deadlock, in production, however. Hence this request.
> >
> > [1] https://lore.kernel.org/lkml/20180409081535.dq7p5bfnpvd3xk3t@pathway.suse.cz/T/#u
> >
> > Serial console logs leading up to the deadlock. As can be seen the stack trace
> > was incomplete because the printing path hit a timeout.
>
> I'm fine with having this backported.

Dunno. Is the patch perhaps a bit too complex? This is not exactly a
trivial bugfix.

pavel@duo:/data/l/clean-cg$ git show dbdda842fe96f | diffstat
 printk.c |  108 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-

I see that it is pretty critical to Daniel, but maybe a kernel with
console locking redone should no longer be called 4.4?

                                                                Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html