Date: Mon, 9 Mar 2009 07:59:37 -0700 (PDT)
From: Linus Torvalds
To: Alan Stern
cc: "Rafael J. Wysocki", Jeremy Fitzhardinge, LKML, Jesse Barnes, Thomas Gleixner, "Eric W. Biederman", Ingo Molnar, pm list, Arve Hjønnevåg
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 8 Mar 2009, Alan Stern wrote:
>
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times.  The hardware or
> the BIOS may have done improper initialization, for example.  On a
> shared IRQ this led to interrupt storms.  IIRC, the solution was to add
> a PCI quirk routine to disable IRQ generation at an early stage.
> Didn't e100 have this problem?

.. and this is exactly the reason why we've done all these changes. There are tons of drivers that are unable to cope with interrupts that happen after they've done their "pci_set_power_state(PCI_D3hot)".

With shared interrupts (and _another_ device still live), they do stupid things like read the interrupt status register, getting all-ones (because the device is dead), and then deciding that that means that they need to handle the interrupt. And that goes on in a loop. Forever.

Or they do _that_ part right, but their suspend also free'd some data structure, so now the interrupt handler will follow a NULL pointer and/or scribble to freed memory.

The source of bugs is infinite, and not fixable (because, quite frankly, most device driver writers are very focused on the hardware, and have a hard time thinking about it as part of the bigger system - and even if they happen to test suspend/resume, they probably won't be testing it with shared interrupts, so it will work _for_them_ even if it's totally broken).

So what all the PCI changes try to do is to basically not have the driver do the "pci_set_power_state(PCI_D3)" at _all_, and do it in the PCI layer. But more importantly, it needs to be done _after_ interrupts have been disabled for this all to work.

And, for exactly the same reason, the PCI layer needs to wake the device up and restore its config space _before_ enabling interrupts again, and _before_ doing any ->resume calls.

And that, in turn, means that since we have all these ACPI ordering things, and many cases want to use ACPI to wake things up, and/or have delays etc, we end up actually wanting things like timer interrupts working at that time - but not normal "device" interrupts. Because many delays do need them, even delays as simple as the (fairly short, but not "busy loop" short) one for turning the device back into PCI_D0 again.

So this literally explains all the re-ordering, and all the interrupt games we now play in Rafael's patch-set. The _whole_ (and only) point is to make it easier for device drivers, while also changing the environment so that we can call ACPI and we can sleep even before the devices have really resumed (or even early_resume'd).
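
The all-ones failure mode described above can be illustrated with a small user-space model. This is only a sketch, not kernel code and not taken from Rafael's patch-set: `struct fake_dev`, `read_status`, `naive_handler`, `careful_handler`, `count_claims` and the `STATUS_RX_DONE` bit are all hypothetical names invented for the illustration. It assumes that a register read from a powered-down device returns all-ones, and models the shared line re-firing for as long as a handler keeps claiming it.

```c
/* User-space model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_ALL_ONES 0xFFFFFFFFu /* assumed result of reading a dead device */
#define STATUS_RX_DONE  0x00000001u /* hypothetical "work pending" bit */

struct fake_dev {
    bool powered;    /* false once the device has been put into D3 */
    uint32_t status; /* contents of the interrupt status register */
};

/* Reads from a powered-down device come back as all-ones. */
static uint32_t read_status(const struct fake_dev *dev)
{
    return dev->powered ? dev->status : STATUS_ALL_ONES;
}

/* Naive handler: a set pending bit means "this interrupt is ours". */
static bool naive_handler(struct fake_dev *dev)
{
    uint32_t st = read_status(dev);

    if (!(st & STATUS_RX_DONE))
        return false;
    dev->status &= ~STATUS_RX_DONE; /* ack; invisible while the device is off */
    return true;
}

/* Defensive handler: all-ones means the device is gone, so do not claim. */
static bool careful_handler(struct fake_dev *dev)
{
    uint32_t st = read_status(dev);

    if (st == STATUS_ALL_ONES || !(st & STATUS_RX_DONE))
        return false;
    dev->status &= ~STATUS_RX_DONE;
    return true;
}

/*
 * Call the handler repeatedly while it keeps claiming the interrupt,
 * giving up after max_iters calls. Returns the number of claims.
 */
static int count_claims(bool (*handler)(struct fake_dev *),
                        struct fake_dev *dev, int max_iters)
{
    int n = 0;

    while (n < max_iters && handler(dev))
        n++;
    return n;
}
```

In this model the naive handler never stops claiming the interrupt once the device is powered down, and only the iteration cap ends the loop. The careful handler declines immediately. With the ordering described above (power-down only after device interrupts are disabled, power-up and config restore before they are re-enabled), neither handler would be run against a dead device in the first place, which is why the fix lives in the PCI layer rather than in each driver.
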

		Linus