From: Rui Wang
To: daniel.vetter@ffwll.ch
Cc: bp@alien8.de, tony.luck@intel.com, airlied@redhat.com, robdclark@gmail.com, matthew.d.roper@intel.com, gong.chen@intel.com, linux-kernel@vger.kernel.org, Rui Wang
Subject: Re: drm/mgag200: doesn't work in panic context
Date: Wed, 1 Jul 2015 15:26:03 +0800
Message-Id: <1435735563-5820-1-git-send-email-rui.y.wang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday, June 30, 2015 11:24 PM, Daniel Vetter wrote:
> On Tue, Jun 30, 2015 at 9:23 AM, Rui Wang wrote:
> > But einj does something more than what an IPI can do, it injects hardware
> > errors which trigger exceptions in NMI context... and the exception handler
> > usually panics on fatal errors. And the display may be the only way to catch
> > what has happened. I'm just hoping that the future version may work in
> > NMI context.
>
> NMI sounds ... ambiguous ;-) But yeah if we can somehow inject
> something as an NMI too then that would be even better. What I want to
> avoid is forcing reboots, since that means you can't run a basic
> modeset test afterwards to make sure nothing was trampled too badly.
> Of course we'd have to replace the screen contents, but the important
> part is that the panic handler doesn't touch anything if the driver is
> in modeset code right now (because it'll massively increase the risk
> of dying completely), and an easy way to check that it didn't step all
> over modeset state unduly is to do a modeset afterwards. If that works
> we'll be fine.
>
> Also with that approach we can make sure that no real errors get into
> dmesg (as opposed to a real panic), which means we can capture dmesg
> afterwards and if there is a serious log message (or even backtrace)
> then drm panic handling has a bug.
>
> All that isn't possible when we force a real panic to happen.
>
> Actually thinking more about NMI that shouldn't be a problem. The
> important thing with nmi vs. hardirq is that you can't even reliably
> grab an irqsave spinlock, it's trylocks all the way down. But that
> also holds for the panic handler, it's trylocks only. Could we somehow
> just check that using lockdep - is there an NMI lockdep context
> somewhere we could fake-grab? That's another upside of using an IPI
> btw: Real panics kill lockdep ;-)

Einj is supported by ACPI in combination with the hardware. The injected
errors result in true MCEs, truly non-maskable. Lockdep might not be useful
in this case. Corrected Errors (CEs) don't result in a panic, but I guess it
might be possible to let them invoke your future mode-setting code for
testing purposes, without rebooting. (Notice that MCEs can happen right from
inside your mode-setting code while accessing any memory address.)

But anyway we're not looking for a 100% working solution, so if it could only
work in normal irq or ipi context, it'd already be a big plus compared to what
we have now.
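
The trylock-only rule from the quoted discussion can be illustrated with a
small userspace sketch. This is not DRM or kernel code: modeset_lock,
modeset_trylock(), modeset_unlock() and panic_show_message() are hypothetical
names, and a C11 atomic_flag stands in for a kernel spinlock. The point is
only that a panic-time display hook never waits for the lock: if modeset code
holds it, the hook gives up and leaves the hardware alone.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock that protects modeset state. */
static atomic_flag modeset_lock = ATOMIC_FLAG_INIT;

/* Counts how often the panic path actually touched the (fake) display. */
static int panic_draw_count;

/* Non-blocking acquire: returns true if the lock was taken, never spins. */
static bool modeset_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&modeset_lock,
						  memory_order_acquire);
}

static void modeset_unlock(void)
{
	atomic_flag_clear_explicit(&modeset_lock, memory_order_release);
}

/*
 * Hypothetical panic-time display hook. It may run in a context where
 * blocking or spinning on a lock could deadlock (panic, or NMI in the
 * real kernel), so it only ever tries the lock once.
 *
 * Returns true if the message was drawn, false if modeset code held the
 * lock and the hook backed off without touching any state.
 */
static bool panic_show_message(void)
{
	if (!modeset_trylock())
		return false;	/* driver is mid-modeset: do nothing */

	panic_draw_count++;	/* placeholder for writing to the screen */
	modeset_unlock();
	return true;
}
```

A test in the spirit of the thread would trigger the hook from a
non-fatal context, then run an ordinary modeset and check that it still
succeeds and that dmesg stays clean.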
Thanks
Rui