Date: Sat, 21 Dec 2002 14:44:23 -0800 (PST)
From: Linus Torvalds
To: "Eric W. Biederman"
cc: davidm@hpl.hp.com, Ivan Kokshaysky ,
Subject: Re: PATCH 2.5.x disable BAR when sizing

[ Ivan added to the cc, to see if he has any ideas on turning things off ]

On 21 Dec 2002, Eric W. Biederman wrote:
>
> Actually it is not quite as bad as that.
> - We can reasonably assume there are no pci to pci transactions going
>   on, so the only accesses to a pci resource are generated by the
>   kernel from printk.

Actually, I think it's certainly valid to not allow "printk()" to happen
around the BAR probing, at least at bootup when we control all the CPUs
tightly anyway. And hotplug devices should be disabled at plug-in, so
later BAR probing should be pretty harmless too (and, as you point out
about bridging, they should be shielded by the hotplug bridge itself).

> - If the large device is behind a pci bridge it should be shielded
>   from the chaos.
>
> - If we don't call printk until we restore the old BAR value (which
>   is currently the case (drivers/pci/probe.c:60)) there should be no
>   transactions on the pci bus, that get a conflicting routing.

The problem, at least in the case I saw, has been that there are devices
that aren't entirely quiescent, often because we haven't even _gotten_ to
them yet and the boot sequence left them active.
The one I saw was USB, and that's likely to be the worst case, since it's
one of very few devices that tends to "do stuff" even when inactive (i.e.,
a USB host controller walks the USB command tables in memory continuously,
even if nothing is happening). It's also one of the few classes of devices
that many PCs have SMM support for, so they are still alive even after the
BIOS has otherwise given up control.

> As long as the pci bus is quiet while we are sizing a bar the current
> method [is] safe.

Well, the thing is, as long as the PCI bus is 100% quiet, it simply
doesn't _matter_ which method we use. The interesting cases are all "some
activity that we don't know about is going on". That's the thing that
breaks disabling the PCI device, but it's also the thing that can break
_not_ disabling the PCI device.

So if we can guarantee a quiescent PCI bus, then I could also accept the
patch that disables MEM/IO resources for BAR probing. At that point it
simply shouldn't matter any more, and then I'd happily drop my concerns
about it.

This is why I repeated my "turn the power off at the whole house" analogy,
even if David didn't like it. It's _fine_ to turn the power off if we know
things are quiet; it's just that, as things stand now, we don't actually
know that.

If somebody wants to try to follow that method, I can try to dig out the
machine that I had problems on before and test things at least on that
setup, to make myself happier about the fact that it really solves the
problem.

The solution may be as simple as just making our current two-phase PCI
scanning a _three_-phase one:

 - (new) phase 1: scan for and turn off all devices
 - phase 2: go back and check the resources (BAR probing etc.)
 - phase 3: allocate unassigned resources.

One of the problems with turning off devices is that we actually have a
hard time doing so. We can trivially turn off IO/MEM/DMA, but PCI doesn't
have a good way to turn off interrupts (which in turn can become SCI
events).
This still makes me worry about legacy USB in particular, simply because I
wonder what happens if the USB controller raises an interrupt which causes
an SMM event, which then causes trouble because the SMM handler will be
unhappy when the device isn't there any more. We've actually had those
kinds of problems in real life; see the quirk_piix3_usb() quirks, for
example.

So I'm really not trying to be difficult here; it's just that PC BIOS
issues, and SMM in _particular_, tend to be quite a horrible mess for the
early boot sequence.

		Linus