Date: Tue, 2 Jul 2019 10:47:02 +0800
From: Baoquan He
To: Dave Young
Cc: airlied@redhat.com, kexec@lists.infradead.org, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: mgag200 fails kdump kernel booting
Message-ID: <20190702024702.GD3178@localhost.localdomain>
References: <20190626081522.GX24419@MiWiFi-R3L-srv> <20190702022140.GA3327@dhcp-128-65.nay.redhat.com>
In-Reply-To: <20190702022140.GA3327@dhcp-128-65.nay.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/02/19 at 10:21am, Dave Young wrote:
> On 06/26/19 at 04:15pm, Baoquan He wrote:
> > Hi Dave,
> >
> > We hit a kdump kernel boot failure on a Lenovo system. The kdump
> > kernel failed to boot and just reset to firmware to reboot the
> > system, and nothing was printed out.
> >
> > The machine is a big server with 6T of memory and many CPUs; its
> > graphics driver module is mgag200.
> >
> > When 'earlyprintk=ttyS0' was added to the kernel command line, only
> > one line was printed to the console during kdump kernel boot:
> > KASLR disabled: 'nokaslr' on cmdline.
> >
> > Then it reset to firmware to reboot the system.
> >
> > Further debugging showed that the failure happens in
> > arch/x86/boot/compressed/misc.c, during the kernel decompression
> > stage. It is triggered by the VGA printing. As you can see,
> > __putstr() in arch/x86/boot/compressed/misc.c checks whether
> > earlyprintk= is specified and, if so, prints to that target. Whether
> > or not earlyprintk= is given, it also prints to VGA, and printing to
> > VGA caused the reset to firmware. That's why we see nothing when
> > earlyprintk= is not specified, and see only the one 'KASLR disabled'
> > line when it is.
> >
> > To confirm it's caused by VGA printing, I blacklisted mgag200 by
> > adding it to /etc/modprobe.d/blacklist.conf, and the kdump kernel
> > then booted successfully. Adding 'nomodeset' also makes it work. So
> > the mgag driver or related code surely does something wrong when the
> > boot code tries to re-init it.
> >
> > This is the only such case we have ever seen, so we tend to pursue a
> > fix on the mgag200 driver side. Any ideas or suggestions? We have two
> > machines that reproduce it reliably.
>
> Personally I think early code should not blindly write to VGA; there
> are cases where that does not work:
> 1. EFI-booted machine: there is just no output
> 2. kdump kernel booted: writing to VGA causes undefined state; for
>    example, in your case it caused a system reset.
>
> So I suggest only writing to VGA when we see earlyprintk=vga in the
> kernel cmdline.

I remember a customer once attached a picture, taken from the monitor,
of a kernel boot hang. I had planned to disable VGA output when it's not
specified, but changed my mind because not all machines are servers
without a monitor. Many people still use laptops and PCs; they have VGA
printing and possibly no console. When a crash happens, maybe randomly,
the VGA printing could be the only witness.

Of the cases listed above, case 1 gives no output, so it seems EFI needs
to be fixed, but I can't see why it matters here. As for case 2, do you
have a specific example other than this one? Printing to VGA has been
done for a long time; if it does cause trouble, we need to mute it now.

Thanks
Baoquan
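
For reference, the suggestion quoted above (write to VGA only when earlyprintk=vga appears on the kernel command line) could be sketched roughly as follows. This is a standalone user-space illustration, not the code in arch/x86/boot/compressed/misc.c: cmdline_wants_vga(), putstr_targets() and the OUT_* flags are hypothetical names, and the parsing is an assumption that handles only the plain "earlyprintk=vga" and "earlyprintk=vga,..." forms.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical output-target flags, used only in this sketch. */
enum { OUT_SERIAL = 1, OUT_VGA = 2 };

/*
 * Hypothetical helper (not kernel code): return true if the command line
 * has an "earlyprintk=" parameter whose first value is exactly "vga".
 */
static bool cmdline_wants_vga(const char *cmdline)
{
	static const char key[] = "earlyprintk=";
	const size_t key_len = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept only a match that starts a parameter. */
		bool at_start = (p == cmdline) || (p[-1] == ' ');

		if (at_start && strncmp(p + key_len, "vga", 3) == 0) {
			/* The value must end here or continue with ",option". */
			char end = p[key_len + 3];

			if (end == '\0' || end == ' ' || end == ',')
				return true;
		}
		p += key_len;
	}
	return false;
}

/*
 * Decide where early boot messages would go under the proposed policy:
 * serial if a serial console was set up, VGA only on explicit request.
 */
static unsigned int putstr_targets(bool serial_configured, const char *cmdline)
{
	unsigned int targets = 0;

	if (serial_configured)
		targets |= OUT_SERIAL;
	if (cmdline_wants_vga(cmdline))
		targets |= OUT_VGA;
	return targets;
}
```

Under this policy, a command line like the one in the report ('earlyprintk=ttyS0' plus 'nokaslr') would select only the serial target, so the VGA write suspected of causing the reset would be skipped. The trade-off raised in the reply still applies: machines with a monitor and no serial console would lose early messages unless earlyprintk=vga is given.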