Date: Wed, 1 Jul 2015 11:59:34 +0200
From: Daniel Vetter
To: Rui Wang
Cc: Borislav Petkov, "Luck, Tony", Dave Airlie, "Clark, Rob", Matthew D Roper, "Chen, Gong", Linux Kernel Mailing List
Subject: Re: drm/mgag200: doesn't work in panic context
In-Reply-To: <1435735563-5820-1-git-send-email-rui.y.wang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 1, 2015 at 9:26 AM, Rui Wang wrote:
> On Tuesday, June 30, 2015 11:24 PM, Daniel Vetter wrote:
>> On Tue, Jun 30, 2015 at 9:23 AM, Rui Wang wrote:
>> > But einj does something more than what an IPI can do, it injects hardware
>> > errors which trigger exceptions in NMI context... and the exception handler
>> > usually panics on fatal errors. And the display may be the only way to catch
>> > what has happened. I'm just hoping that the future version may work in
>> > NMI context.
>>
>> NMI sounds ... ambiguous ;-) But yeah if we can somehow inject
>> something as an NMI too then that would be even better. What I want to
>> avoid is forcing reboots, since that means you can't run a basic
>> modeset test afterwards to make sure nothing was trampled too badly.
>> Of course we'd have to replace the screen contents, but the important
>> part is that the panic handler doesn't touch anything if the driver is
>> in modeset code right now (because it'll massively increase the risk
>> of dying completely), and an easy way to check that it didn't step all
>> over modeset state unduly is to do a modeset afterwards. If that works
>> we'll be fine.
>>
>> Also with that approach we can make sure that no real errors get into
>> dmesg (as opposed to a real panic), which means we can capture dmesg
>> afterwards and if there is a serious log message (or even backtrace)
>> then drm panic handling has a bug.
>>
>> All that isn't possible when we force a real panic to happen.
>>
>> Actually thinking more about NMI that shouldn't be a problem. The
>> important thing with nmi vs. hardirq is that you can't even reliably
>> grab an irqsave spinlock, it's trylocks all the way down. But that
>> also holds for the panic handler, it's trylocks only. Could we somehow
>> just check that using lockdep - is there an NMI lockdep context
>> somewhere we could fake-grab? That's another upside of using an IPI
>> btw: Real panics kill lockdep ;-)
>
> Einj is supported by ACPI in combination with the hardware. The injected
> errors result in true MCEs, truly non-maskable. Lockdep might not be useful
> in this case. Corrected Errors (CEs) don't result in panic but I guess it
> might be possible to let it invoke your future mode-setting code for testing
> purposes, without rebooting. (Notice that MCEs can happen right from inside
> your mode-setting code while accessing any memory address)

Yeah NMI can happen anywhere and that's about the worst-case panic context
we have. The problem is that NMI bugs are a giant pain to debug, so for
testing I think it'd be better to just have a hardirq context + the help of
lockdep (if possible) to make sure we only do try_lock and lockless stuff.
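A minimal user-space sketch of the "trylocks only" rule, assuming a single hypothetical `modeset_lock` stands in for the driver's modeset locking: the panic path never blocks, and it backs off entirely if modeset code currently holds the lock. `modeset_lock` and `panic_try_draw()` are illustrative names, not DRM APIs; POSIX `pthread_mutex_trylock()` plays the role a kernel trylock (e.g. `spin_trylock()`) would play in the real handler.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's modeset locking (not a DRM API). */
static pthread_mutex_t modeset_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical panic-path hook. It only ever uses a trylock, so it can
 * never block or spin waiting for modeset code. Returns true if it drew
 * the message, false if it backed off because the lock was already held.
 */
static bool panic_try_draw(const char *msg)
{
    /* Non-blocking: returns nonzero (EBUSY) if the lock is held by anyone,
     * including the calling thread for a default-type mutex. */
    if (pthread_mutex_trylock(&modeset_lock) != 0)
        return false; /* driver is mid-modeset: touch nothing */

    /* Stand-in for writing the message into the scanout buffer. */
    printf("panic: %s\n", msg);

    pthread_mutex_unlock(&modeset_lock);
    return true;
}
```

With the lock free, `panic_try_draw("oops")` prints the message and returns `true`; while modeset code holds `modeset_lock`, the same call returns `false` without touching anything. The trade-off is deliberate: a panic message that is occasionally not shown is preferable to a handler that deadlocks or tramples modeset state.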
> But anyway we're not looking for a 100% working solution so if it could only
> work in normal irq or ipi context, it'd already be a big plus compared to
> what we have now.

NMI vs ipi vs other stuff is just a question of which debug/testing
strategy works best. Most of the work there will really be in writing tons
of testcases to race the drm panic handler against drm modeset ioctls.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch